Deployments

TL;DR

  • A deployment is the runtime materialization of a backend — the typed graph running as containers on a runtime, behind the endpoint URLs that serve traffic, with the operational state the team watches.
  • A deployment is 1:1 with a backend. Each backend has at most one live deployment; running the same workflow in two environments means forking the backend first and deploying each fork independently. That constraint is what makes "what is currently in production" unambiguous.
  • Endpoint URLs are a backend property, not a deployment property. The backend publishes them as soon as it declares aliases; the deployment is the live process that makes those URLs serve. Tear the deployment down and the URLs remain but answer 404.
  • The deployment owns the live runtime; it does not own the graph definition (that lives in the backend) or the UI (that lives in the application). The runtime can be torn down, replaced, or migrated without touching either of the other layers.
  • "Healthy" means runtime readiness, not behavioral correctness. A running deployment confirms containers came up; representative input producing expected output is what confirms the solution works.

What a deployment actually is

A deployment is the runtime materialization of a backend. The op log on the backend is the input; the containers running on a runtime are the output. The backend is a typed graph definition that lives in the workspace; the deployment is one materialization of that graph on actual compute, with a lifecycle measured in minutes to weeks rather than months.

The split between graph and runtime is deliberate. The backend stays the inspectable definition, and the deployment is the disposable runtime — torn down on demand, restarted after runtime maintenance, replaced wholesale on a version bump, migrated to a different runtime without rewriting the graph. A backend edit therefore does not risk taking down a running system, a redeploy is not entangled with graph editing, and an environment difference is a separate backend rather than duplicated graph state.

The unit of the runtime side is the container, not the graph. Every vertex in the deployed backend becomes one component container; every edge becomes platform transport between them. Whether the vertex is a Python model server, a C++ stream processor, or a transformation primitive does not change that shape — it changes what runs inside. Operating a deployment looks like operating any other containerized workload, so existing container-debugging instincts apply directly.
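The vertex-to-container and edge-to-transport mapping can be sketched in a few lines. This is an illustrative model only: the names Container, TransportLink, and materialize are assumptions made for this example, not platform APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    vertex: str  # the vertex this container materializes
    kind: str    # what runs inside; it does not change the runtime shape


@dataclass(frozen=True)
class TransportLink:
    source: str
    target: str


def materialize(vertices, edges):
    """Return the runtime shape: one container per vertex, one link per edge."""
    containers = [Container(vertex=name, kind=kind) for name, kind in vertices.items()]
    links = [TransportLink(source=src, target=dst) for src, dst in edges]
    return containers, links


# Hypothetical two-vertex graph.
containers, links = materialize(
    {"ingest": "cpp-stream-processor", "score": "python-model-server"},
    [("ingest", "score")],
)
print(len(containers), len(links))  # prints: 2 1
```

Swapping the kind of a vertex changes the contents of one container, not the number of containers or links.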

Mental model — one backend, one deployment

   ┌────────────────────┐ owns endpoint    ┌──────────────────┐
   │ Backend  prod-bid  │── aliases ──▶    │ Deployment       │
   └─────────┬──────────┘                  │ Runtime X · prod │
             │ fork                        └──────────────────┘
             ├─────────────────────────┐
             ▼                         ▼
   ┌────────────────────┐   ┌────────────────────┐
   │ Backend  stg-bid   │   │ Backend  dbg-bid   │
   └─────────┬──────────┘   └──────────┬─────────┘
             │ owns aliases            │ owns aliases
             ▼                         ▼
   ┌────────────────────┐   ┌────────────────────┐
   │ Deployment         │   │ Deployment         │
   │ Runtime Y · stg    │   │ Runtime Z · debug  │
   └────────────────────┘   └────────────────────┘

The backend is the typed graph definition; the deployment is the live materialization of that exact backend on a specific runtime. To stand up "the same graph" somewhere else, fork the backend into a sibling object and deploy the fork. The forks share an ancestor and a starting graph; from the fork point onward they are independent objects with their own operation histories and their own deployments.

Each deployment row carries its ID, its status, the runtime it landed on, the backend it pins, how many containers it owns, an expiry timestamp, and a debug flag. The IDs are what every deployment-scoped command takes — container list --deployment, backend undeploy --deployment, and so on.
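As a rough picture of that row, the sketch below models it as a record. The field names, ID format, and sample values are assumptions for illustration; the platform's actual column names may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DeploymentRow:
    id: str               # what deployment-scoped commands take
    status: str
    runtime: str          # runtime the deployment landed on
    backend: str          # backend the deployment pins
    containers: int       # number of containers it owns
    expires_at: datetime  # expiry timestamp
    debug: bool           # debug flag


# Hypothetical rows; IDs and runtime names are made up.
rows = [
    DeploymentRow("dep-001", "running", "runtime-x", "prod-bid", 3,
                  datetime(2030, 1, 1, tzinfo=timezone.utc), False),
    DeploymentRow("dep-002", "running", "runtime-z", "dbg-bid", 3,
                  datetime(2030, 1, 2, tzinfo=timezone.utc), True),
]

# The debug flag makes debug runs easy to pick out.
debug_ids = [row.id for row in rows if row.debug]
print(debug_ids)  # prints: ['dep-002']
```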

Why the 1:1 constraint matters

Use this framing whenever someone proposes "let's deploy the same backend twice".

A backend with two concurrent deployments would have no single answer to "which one is the truth?" The operation log on the backend would diverge from at least one of the two runtimes, observability tooling would have to separate runtime state from graph state for each deployment, and the question "what is in production" would always need qualification.

The fork-then-deploy pattern keeps each deployment paired with a backend whose state is exactly the state being run. Staging is a different backend; production is a different backend; a debug copy of last week's incident is a different backend. Each one has its own operation log, its own forwarding URLs, its own audit trail. The cost is creating sibling backends explicitly; the result is that the runtime model always reflects what code is running where.

The 1:1 link is also what makes "redeploy" a meaningful operation. There is one deployment per backend; redeploying replaces the containers of that one deployment with fresh ones reflecting the current backend state. The forwarding URLs (bound to backend + vertex + endpoint, not to the deployment) survive the swap, so callers see a brief gap rather than a URL rotation.
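A toy model makes the constraint concrete: a second deploy of the same backend is rejected, while a fork deploys independently. The Workspace class and its methods are hypothetical stand-ins for this sketch, not the platform's API.

```python
class BackendAlreadyDeployed(Exception):
    """Raised when a backend that already has a live deployment is deployed again."""


class Workspace:
    def __init__(self):
        self.ancestor = {}  # backend name -> backend it was forked from (or None)
        self.live = {}      # backend name -> runtime of its single live deployment

    def create(self, backend, forked_from=None):
        self.ancestor[backend] = forked_from

    def fork(self, source, name):
        if source not in self.ancestor:
            raise KeyError(source)
        self.create(name, forked_from=source)
        return name

    def deploy(self, backend, runtime):
        # At most one live deployment per backend.
        if backend in self.live:
            raise BackendAlreadyDeployed(backend)
        self.live[backend] = runtime


ws = Workspace()
ws.create("prod-bid")
ws.deploy("prod-bid", "runtime-x")

try:
    ws.deploy("prod-bid", "runtime-y")  # second deployment of the same backend
    second_deploy_rejected = False
except BackendAlreadyDeployed:
    second_deploy_rejected = True

staging = ws.fork("prod-bid", "stg-bid")  # sibling backend with a shared ancestor
ws.deploy(staging, "runtime-y")
```

After this runs, "what is in production" has one answer: whatever prod-bid records, running on runtime-x.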

Lifecycle

   deploy ─▶ deploying ─▶ running ─┬─▶ tearing_down ─▶ (gone)
                                   └─▶ failed

A deployment walks a small lifecycle:

  • Deploying: the row exists and the platform is bringing containers up. The backend's endpoint URLs already resolve but answer 404 until containers are listening.
  • Running: every container is up and the URLs serve traffic. This is the only state where the deployment is "live" in the sense callers expect.
  • Tearing down: an explicit undeploy, an owning lease's commit-or-rollback, or an expiry has reclaimed the runtime. Endpoints drop back to 404.
  • Failed: a terminal state reached when a container exhausted its crashloop budget, transport could not be established, or a startup precondition was not met. Endpoints stay at 404 until the team intervenes.
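The lifecycle above can be read as a small state machine. The sketch below is one possible encoding, not the platform's implementation. The running-to-tearing_down and running-to-failed arrows come from the diagram; the deploying-to-failed arrow is an assumption inferred from the startup failure causes, and whatever the team does to a failed deployment is not modeled.

```python
from enum import Enum


class State(Enum):
    DEPLOYING = "deploying"
    RUNNING = "running"
    TEARING_DOWN = "tearing_down"
    FAILED = "failed"


ALLOWED = {
    # DEPLOYING -> FAILED is an assumption (startup failures), not shown in the diagram.
    State.DEPLOYING: {State.RUNNING, State.FAILED},
    State.RUNNING: {State.TEARING_DOWN, State.FAILED},
    State.TEARING_DOWN: set(),  # the deployment is gone afterwards
    State.FAILED: set(),        # terminal; intervention paths are not modeled here
}


def advance(current, target):
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    return target


def serves_traffic(state):
    """Endpoints answer 404 in every state except running."""
    return state is State.RUNNING


state = advance(State.DEPLOYING, State.RUNNING)
print(state.value, serves_traffic(state))  # prints: running True
```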

A deploy can also be rejected before any container starts: if the only available runtime is full or its runtime class does not match what the graph needs, the platform rejects the deploy up front rather than landing a partial runtime. That is intentional — a deploy that cannot satisfy its placement constraints fails fast instead of producing containers that cannot run.
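The fail-fast placement check might look roughly like this. The Runtime fields (runtime_class, free_slots) and the place function are assumptions made for the sketch; the platform's real capacity model is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Runtime:
    name: str
    runtime_class: str
    free_slots: int  # assumed capacity measure: containers the runtime can still host


class PlacementRejected(Exception):
    """Raised before any container starts."""


def place(runtimes, needed_class, needed_containers):
    """Pick a runtime that satisfies the constraints, or reject the deploy up front."""
    for runtime in runtimes:
        if runtime.runtime_class == needed_class and runtime.free_slots >= needed_containers:
            return runtime
    raise PlacementRejected(
        f"no runtime of class {needed_class!r} with room for {needed_containers} containers"
    )


runtimes = [Runtime("runtime-x", "gpu", 1), Runtime("runtime-y", "cpu", 8)]

chosen = place(runtimes, "cpu", 3)

try:
    place(runtimes, "gpu", 3)  # the only gpu runtime is too full
    rejected = False
except PlacementRejected:
    rejected = True
```

The check is all-or-nothing: it never returns a runtime that could host only some of the containers.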

Anything other than running is, from a caller's perspective, "not ready". The forwarding URLs may exist throughout, but the contract is that real traffic should not arrive until the deployment is running. The Test-with-a-live-backend flow's wait step gates "start sending fixtures" on this transition.
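A caller-side wait step can be sketched as a polling loop that only reports ready on running. How the status is fetched is left to an injected get_status callable, which is a placeholder for this example; the timeout and poll interval are arbitrary defaults.

```python
import time


def wait_until_running(get_status, timeout_s=60.0, poll_s=1.0, sleep=time.sleep):
    """Poll until the deployment is running. Return False on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "running":
            return True
        if status in ("failed", "tearing_down"):
            return False  # this deployment will not become ready
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


# Simulated status sequence in place of a real status lookup.
statuses = iter(["deploying", "deploying", "running"])
ready = wait_until_running(lambda: next(statuses), sleep=lambda _seconds: None)
print(ready)  # prints: True
```

Only after this returns True should fixtures or real traffic be sent.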

Endpoint URLs belong to the backend

Use this framing whenever a caller's URL needs to survive a redeploy.

The forwarding URL pattern binds tokens to (backend, vertex, endpoint) — not to (deployment, vertex, endpoint). That binding shape is why callers do not re-fetch URLs every time the deployment churns. Tear down the deployment, stand a new one up against the same backend, and the same URL serves the same endpoint. The deployment is replaceable runtime; the URL is part of the backend's public contract.

The flip side is that endpoint URLs exist whether or not a deployment is currently live. The backend can publish its aliases the moment they are declared; the URLs are routable; they answer 404 while no deployment is running behind them. This lets callers be configured with URLs ahead of the deployment, and it makes the gap between "deployment torn down" and "new deployment up" a brief 404 rather than a URL rotation event.
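The binding shape can be illustrated with a toy router keyed by (backend, vertex, endpoint). The Router class, the token string, and the 200 status for a served request are assumptions for this sketch; only the 404 behavior comes from the text above.

```python
class Router:
    def __init__(self):
        self.bindings = {}    # token -> (backend, vertex, endpoint)
        self.running = set()  # backends that currently have a running deployment

    def publish(self, token, backend, vertex, endpoint):
        # The URL exists as soon as the backend declares the alias.
        self.bindings[token] = (backend, vertex, endpoint)

    def resolve(self, token):
        binding = self.bindings.get(token)
        if binding is None or binding[0] not in self.running:
            return 404, None
        return 200, binding  # 200 is an assumed success status


router = Router()
router.publish("tok-score", "prod-bid", "score", "predict")

before_deploy = router.resolve("tok-score")[0]

router.running.add("prod-bid")       # deployment reaches running
while_running = router.resolve("tok-score")

router.running.discard("prod-bid")   # deployment torn down
after_teardown = router.resolve("tok-score")[0]

router.running.add("prod-bid")       # new deployment of the same backend
after_redeploy = router.resolve("tok-score")

print(before_deploy, while_running[0], after_teardown, after_redeploy[0])
# prints: 404 200 404 200
```

The token never changes across the teardown and redeploy, which is why callers do not need to re-fetch URLs.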

See Solutions for the alias model and Deploy and monitor for the forwarding token contract in detail.

What can change on a live backend

Use this framing whenever you wonder whether a backend edit needs a redeploy.

A deployed backend accepts a narrow set of in-place edits: parameter values and file values on slots that their component declares mutable: true, plus similar runtime-safe values the component author opted into. The platform pushes the new value into the running container, and the component picks it up through its subscribe-config hook on the next invocation. No redeploy and no container restart is needed.

Topology changes are different. Adding a vertex, removing a vertex, swapping a component release, changing an edge, changing an endpoint alias, binding a different file to a non-mutable slot — all of these are rejected while a deployment is live. The deployment is the materialization of this exact graph, with the op log as the input; changing the graph would make the materialization disagree with the op log, leaving the running containers out of sync with the recorded backend state. The platform rejects the change instead.
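The accept-or-reject rule for edits against a live deployment can be summarized as a small predicate. The edit kind names below are labels invented for this sketch, not operation names from the platform.

```python
from dataclasses import dataclass

VALUE_EDITS = {"set_param", "set_file"}
TOPOLOGY_EDITS = {
    "add_vertex", "remove_vertex", "swap_release", "change_edge", "change_alias",
}


@dataclass(frozen=True)
class Edit:
    kind: str
    slot_mutable: bool = False  # only meaningful for value edits


def accepted_while_live(edit):
    """Value edits pass only on mutable slots; topology edits are always rejected."""
    if edit.kind in VALUE_EDITS:
        return edit.slot_mutable
    return False


print(accepted_while_live(Edit("set_param", slot_mutable=True)))  # prints: True
print(accepted_while_live(Edit("set_file", slot_mutable=False)))  # prints: False
print(accepted_while_live(Edit("add_vertex")))                    # prints: False
```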

For a topology change against a backend that already has a live deployment, the workflow is: fork the backend, edit the fork, deploy the fork as a separate deployment. The original backend keeps running with its current graph; the fork carries the topology change. Once the fork is proven, the team retires the original deployment, or promotes the fork as the new production backend, depending on the workflow. This is the same pattern that makes parallel environments work: the way to change the graph without affecting production is to deploy a fork rather than edit the running graph.

Operating modes

Use these as the small set of knobs that change how a deployment runs.

Debug deployments. --debug enables richer log capture from each container. Startup is slower, so reserve it for chasing specific problems rather than as a default. The deployment row carries debug: true so debug runs are easy to tell apart at a glance.

Pinned lifetime. By default a deployment auto-extends up to the runtime's ceiling. Passing --fixed-duration <go-duration> at deploy time pins a hard deadline — auto-extension is disabled and the platform reclaims the runtime on schedule. This fits known wall-clock budgets such as a demo window or a scheduled batch, and is not the choice for production.
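Go duration strings look like 45m, 1h30m, or 1.5h. To reason about the deadline a fixed duration implies, the sketch below parses a simplified subset of that syntax (no sign, no microsecond symbol) and adds it to the deploy time. The parser and the sample timestamps are illustrative and are not part of the platform CLI.

```python
import re
from datetime import datetime, timedelta, timezone

_UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|s|m|h)")


def parse_go_duration(text):
    """Parse a subset of Go duration syntax, for example '1h30m' or '300ms'."""
    position, seconds = 0, 0.0
    for match in _TOKEN.finditer(text):
        if match.start() != position:
            raise ValueError(f"invalid duration: {text!r}")
        seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    if position == 0 or position != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(seconds=seconds)


# Hypothetical deploy time for a 45-minute demo window.
deployed_at = datetime(2030, 1, 1, 9, 0, tzinfo=timezone.utc)
deadline = deployed_at + parse_go_duration("45m")
print(deadline.isoformat())  # prints: 2030-01-01T09:45:00+00:00
```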

Lease-owned deployments. A deployment created inside a lease is owned by that lease and is reclaimed when the lease closes. That is the standard shape for ephemeral test runs and for batch computations that should not survive their job — see Leases and The lease lifecycle.

What "healthy" means

A running deployment confirms that every container came up. It does not confirm whether the workflow produces correct outputs, whether the model loaded the right weights, or whether the typed values flowing across edges are semantically right. Health is runtime readiness; behavioral correctness is a separate question, answered by the proof loop.

The useful discipline is to gate traffic on running (the runtime is ready) and to gate promotion on the proof loop (the runtime produces the right results). Conflating them produces both kinds of failure — sending traffic before the runtime is ready, and promoting a deployment that runs without producing the right results.
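The two gates can be kept separate in code as well. In the sketch below, call stands in for invoking the deployed endpoint and fixtures for representative input and expected-output pairs; both are placeholders, and the real proof loop is described elsewhere.

```python
def proof_passes(call, fixtures):
    """True when every representative input produces its expected output."""
    return all(call(given) == expected for given, expected in fixtures)


def may_receive_traffic(status):
    """Traffic gate: runtime readiness only."""
    return status == "running"


def may_promote(status, call, fixtures):
    """Promotion gate: readiness plus behavioral correctness."""
    return may_receive_traffic(status) and proof_passes(call, fixtures)


fixtures = [(2, 4), (3, 6)]  # hypothetical (input, expected output) pairs


def doubles(x):
    return x * 2  # stand-in for a deployment that behaves correctly


def adds_two(x):
    return x + 2  # runs fine, but is wrong for input 3


promote_correct = may_promote("running", doubles, fixtures)
promote_wrong = may_promote("running", adds_two, fixtures)
print(promote_correct, promote_wrong)  # prints: True False
```

Both stand-ins are "healthy" in the runtime sense; only the proof step tells them apart.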

See Prove behavior for the proof-loop side of this contract.

Where this fits

A deployment is the runtime side of the platform: the operating surface is a small set of verbs, the lifecycle is four states plus the up-front placement check, and the endpoint contract is one alias model on the backend. Keeping the graph definition in the backend and the UI in the application is what lets the deployment stay focused on running and observing the containers, even as the workflows it runs grow more complex.
