Deploy and monitor

For what a deployment is (runtime materialization, the 1:1 backend↔deployment relationship, container-level health, and stable forwarding URLs), see Deployments. This page has one goal: take a checked backend, run it on real compute, and keep it healthy. The loop is deploy → forward → watch → redeploy → undeploy, and every step below is one move in that loop.

How a deployment moves

A deployment has a small, predictable lifecycle. deploy schedules containers and the deployment moves from deploying into running; redeploy swaps those containers for the current backend state while the forwarding URLs stay valid; undeploy tears it down. Anything other than running is not ready for traffic.

   ppl backend deploy


   ┌───────────┐   up    ┌─────────┐  undeploy  ┌──────────────┐
   │ deploying │ ───────▶│ running │ ──────────▶│ tearing_down │
   └─────┬─────┘         └────┬────┘            └──────────────┘
         │ error              │ redeploy
         ▼                    │ (same forward URLs)
   ┌─────────┐                ▼
   │ failed  │           new containers
   └─────────┘

The rest of this page walks that loop in order: pick a runtime, launch, expose endpoints, watch containers, then redeploy or tear down.
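
The lifecycle diagram above can be read as a small transition table. The sketch below is illustrative only: the event names are copied from the diagram labels, and treating redeploy as running → running is an assumption, since the text does not name a status for the swap window.

```python
# Transition table read off the lifecycle diagram. Event names come from the
# diagram labels. Modelling "redeploy" as running -> running is an assumption:
# the page does not say which status a deployment reports during the swap.
TRANSITIONS = {
    ("deploying", "up"): "running",
    ("deploying", "error"): "failed",
    ("running", "redeploy"): "running",
    ("running", "undeploy"): "tearing_down",
}


def next_status(status: str, event: str) -> str:
    """Return the status after `event`, or raise if the diagram has no such edge."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"no transition from {status!r} on {event!r}") from None


def ready_for_traffic(status: str) -> bool:
    """Only a running deployment should receive traffic."""
    return status == "running"
```

A table like this makes the "anything other than running is not ready" rule a one-line check in calling code.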

Picking a runtime

Use this step to decide where the backend runs.

A runtime is a pool of compute nodes the workspace can deploy onto. It carries the runtime constraints: which nodes belong to it, whether they have GPUs, how much headroom is left after current deployments, and how long a deployment may live before the runtime reclaims it. Runtimes keep "which compute does this run on" separate from "what does this run", so backend editing and capacity management stay distinct decisions.

Picking a runtime takes a single list call. Each runtime reports its name, node count, current deployments, and the timeout that bounds how long a deployment may live on it. Pick one with free nodes and enough runway for the workload.

ppl runtime list

If the only available runtime is full, or its runtime class does not match what the backend needs (for example, a CPU-only runtime for a GPU vertex), deploy rejects the request before any container starts. The check happens up front so a deployment is never scheduled onto a runtime that cannot host it.
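
The selection rule described above (free nodes, matching class, enough runway) can be sketched as a filter over the listed runtimes. Everything here is hypothetical: the Runtime record, its field names, and the idea that one deployment occupies one node are assumptions made for illustration, not the CLI's actual schema or capacity model.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Iterable, Optional


@dataclass
class Runtime:
    # Hypothetical record; field names are illustrative, not the CLI's schema.
    name: str
    node_count: int
    deployments: int
    timeout: timedelta
    has_gpu: bool = False


def free_nodes(runtime: Runtime) -> int:
    # Assumption: each current deployment occupies exactly one node.
    return runtime.node_count - runtime.deployments


def pick_runtime(
    runtimes: Iterable[Runtime], needs_gpu: bool, min_runway: timedelta
) -> Optional[Runtime]:
    """Return the eligible runtime with the most free nodes, or None."""
    eligible = [
        r for r in runtimes
        if free_nodes(r) > 0                 # not full
        and (r.has_gpu or not needs_gpu)     # class matches what the backend needs
        and r.timeout >= min_runway          # deployment may live long enough
    ]
    return max(eligible, key=free_nodes, default=None)
```

Returning None rather than raising mirrors the up-front rejection: the caller learns that no runtime fits before anything is scheduled.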

See Runtimes and nodes.

Launching the deployment

Use this step to turn a checked backend into running containers.

Deploying schedules the backend's containers onto a runtime for a given lifetime. The deploy call only names the backend and the runtime.

ppl backend deploy --backend <backend_id> --runtime <runtime_id>

The deployment record returns id, status, and expires_at. Treat everything other than running as not ready for traffic. For the status lifecycle and the 1:1 backend↔deployment rule (fork the backend for parallel environments), see Deployments.

Two flags shape the runtime envelope:

  • --debug enables richer log capture. Startup is slower; reserve it for chasing problems, not as a default.
  • --fixed-duration 1h30m pins the deployment's lifetime instead of letting it auto-extend. Useful when the work has a known wall-clock budget such as a demo window or a scheduled batch.

Repeating --backend batch-deploys multiple backends in one call; in that mode --runtime is mandatory, since the platform does not pick a runtime for a batch.
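
The flag rules above can be checked before the command is run. The helper below is a sketch, and deploy_argv is a hypothetical name. It uses only the flags named on this page, and it assumes --runtime may be omitted for a single backend, which the text implies but does not state outright.

```python
from typing import List, Optional, Sequence


def deploy_argv(
    backends: Sequence[str],
    runtime: Optional[str] = None,
    debug: bool = False,
    fixed_duration: Optional[str] = None,
) -> List[str]:
    """Assemble a `ppl backend deploy` argument list from the flags on this page."""
    if not backends:
        raise ValueError("at least one backend id is required")
    if len(backends) > 1 and runtime is None:
        # Batch mode: the platform does not pick a runtime for a batch.
        raise ValueError("--runtime is mandatory when batch-deploying")
    argv = ["ppl", "backend", "deploy"]
    for backend_id in backends:
        argv += ["--backend", backend_id]
    if runtime is not None:
        argv += ["--runtime", runtime]
    if debug:
        argv.append("--debug")
    if fixed_duration is not None:
        argv += ["--fixed-duration", fixed_duration]
    return argv
```

The resulting list can be handed to subprocess.run as-is, which avoids shell quoting issues with the ids.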

See Deployments.

Exposing the endpoints

Use this step after the deployment is running to give callers, or your application, a URL.

Forwarding issues a public URL and a token for each endpoint role on the backend. Forwarding tokens survive redeploy — see Deployments for how the binding works.

ppl forward <backend_id>
ppl forward <backend_id> --endpoint image-input --expiration 1h

--expiration caps a token's lifetime; omitting it lets the token live as long as the binding does.

If the URL is called before every vertex container is running, the call returns 404: the container that serves it has not started yet. Poll containers until everything is running before pointing real traffic at the deployment.
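
The "poll until everything is running" advice can be sketched as a loop with a deadline. The list_statuses callable is a placeholder for however the container statuses are fetched (for example by wrapping ppl container list). Its shape is an assumption, as is the choice to stop early when a container reports failed.

```python
import time
from typing import Callable, Iterable


def wait_until_running(
    list_statuses: Callable[[], Iterable[str]],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every container reports running; False on timeout or failure."""
    deadline = time.monotonic() + timeout_s
    while True:
        statuses = list(list_statuses())
        if statuses and all(s == "running" for s in statuses):
            return True
        if "failed" in statuses:
            return False  # assumption: waiting will not revive a failed container
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Passing sleep as a parameter keeps the loop testable without real delays.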

See Solutions.

Watching what's running

Use this step continuously while the deployment is alive.

Monitoring a deployment is reading its container state.

ppl container list --deployment <deployment_id>

Each row reports the container id, which node_id it landed on, the vertex it serves, and its current status: one of starting, running, restarting, stopping, or failed. A container that keeps crashing on startup shows as restarting, which usually points to a component-level problem such as a missing model file, an exception during initialization, a misconfigured serving service dependency, or an unbound parameter the component treated as required. Container logs are the next step:

ppl container logs <container_id>
ppl container logs <container_id> --tail 200
ppl container logs <container_id> --since 10m
ppl container logs <container_id> --follow --timestamps

Time arguments such as --since accept RFC3339 timestamps, Unix seconds, or Go duration strings (30m, 2h). --follow streams; every other invocation is a one-shot fetch.
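
The three accepted time formats can be told apart with a small classifier. This is a sketch rather than the CLI's own parser: the function names are hypothetical, the duration grammar is a simplified subset of what Go accepts (no sign, no leading-dot decimals), and datetime.fromisoformat is used as a loose stand-in for a strict RFC3339 parser.

```python
import re
from datetime import datetime

_UNITS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
# Longer unit names come first so "ms" is not read as "m" followed by "s".
_TERM = r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)"


def go_duration_seconds(text: str) -> float:
    """Convert a Go-style duration such as 30m or 1h30m to seconds."""
    if not re.fullmatch(f"(?:{_TERM})+", text):
        raise ValueError(f"not a Go duration: {text!r}")
    return sum(float(number) * _UNITS[unit] for number, unit in re.findall(_TERM, text))


def classify_time_arg(text: str) -> str:
    """Return 'unix', 'duration', or 'rfc3339'; raise ValueError otherwise."""
    if text.isdigit():
        return "unix"
    try:
        go_duration_seconds(text)
        return "duration"
    except ValueError:
        pass
    # Older Python versions reject a trailing "Z", so rewrite it as an offset.
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"
    datetime.fromisoformat(text)  # raises ValueError if this is not a timestamp
    return "rfc3339"
```

Validating the value locally gives a clearer error than a rejected log request would.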

For broader failure patterns, see Common failures.
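
The triage described in this section (find the containers that are restarting or failed, then read their logs) can be sketched as follows. The dictionary keys are assumptions modelled on the columns named above, not a documented output format, and log_commands is a hypothetical helper.

```python
from typing import Dict, Iterable, List

# Statuses that warrant a look at the logs.
NEEDS_LOGS = {"restarting", "failed"}


def log_commands(
    containers: Iterable[Dict[str, str]], tail: int = 200
) -> List[List[str]]:
    """Build one `ppl container logs` call per container that needs attention."""
    return [
        ["ppl", "container", "logs", c["id"], "--tail", str(tail)]
        for c in containers
        if c["status"] in NEEDS_LOGS
    ]
```

Healthy containers produce no command, so an empty result means there is nothing to chase.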

Redeploying

Use this step after bumping a component version or recovering from a runtime restart.

Redeploy swaps the running containers for the current backend state in one operation, keeping the forwarding URLs:

ppl backend redeploy --backend <backend_id>
ppl backend redeploy --backend <backend_id> --runtime <runtime_id>

Pass exactly one of --backend or --deployment. Adding --runtime migrates the deployment to a different runtime in the same redeploy. Forwarding URLs are unchanged across the swap, with only a brief window where requests may queue or return 404 while the new containers come up.
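
Callers can ride out the brief 404 window with a bounded retry. The sketch below assumes a send callable that performs one request against the forwarding URL and returns the HTTP status code. That callable, the attempt count, and the backoff schedule are all illustrative choices, not platform recommendations.

```python
import time
from typing import Callable


def call_with_retry(
    send: Callable[[], int],
    attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns something other than 404, backing off between tries."""
    status = send()
    for attempt in range(1, attempts):
        if status != 404:
            break
        sleep(base_delay_s * 2 ** (attempt - 1))  # delay doubles on each retry
        status = send()
    return status
```

If the final attempt still returns 404, the status is handed back to the caller, which is a signal to check container state rather than keep retrying.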

The typical trigger is a vertex version bump made earlier with ppl backend change-version. The graph mutation lands first; the redeploy applies it to running compute.

Tearing down

Use this step when the deployment's work is finished.

Undeploy stops the containers and releases the runtime headroom. The backend definition is untouched; only its runtime instance is removed.

ppl backend undeploy --deployment <deployment_id>
ppl backend undeploy --backend <backend_id>

Pass exactly one of --deployment or --backend. The --save flag uploads every file declared in each vertex's generated_file_schema into the workspace file store before teardown. Use it when the deployment produced outputs worth keeping — computed artifacts, model checkpoints, derived datasets — and omit it for ephemeral runs.
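
The "exactly one of" rule is easy to enforce before the command runs. This sketch uses only the flags named above; undeploy_argv is a hypothetical helper name.

```python
from typing import List, Optional


def undeploy_argv(
    deployment: Optional[str] = None,
    backend: Optional[str] = None,
    save: bool = False,
) -> List[str]:
    """Assemble a `ppl backend undeploy` argument list."""
    # True when both targets are given or both are missing.
    if (deployment is None) == (backend is None):
        raise ValueError("pass exactly one of --deployment or --backend")
    argv = ["ppl", "backend", "undeploy"]
    if deployment is not None:
        argv += ["--deployment", deployment]
    else:
        argv += ["--backend", backend]
    if save:
        argv.append("--save")  # keep generated files before teardown
    return argv
```

Keeping save off by default matches the guidance to omit it for ephemeral runs.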

Where this fits

The production loop is small: deploy, forward, watch, redeploy, undeploy. Changes to the graph itself belong back in backend editing, before deployment. For why the definition layer and the runtime layer are separate, see Deployments.
