Test with a live backend

TL;DR

  • A live-backend test runs the candidate component (or model artifact, or graph change) on the same runtime production uses, with an I/O harness wrapped around it. The candidate runs in the production container environment, over production typed streams, with no mocked transport and no separate test SDK.
  • The canonical loop is manifest-driven: ppl lease run --test=<manifest> --runtime=<id> mints a lease, publishes a prerelease from the working directory, uploads local fixtures, opens a live-test event stream, and rolls the lease back at the end. Exit 0 on ready, exit 1 on failed or transport close.
  • The harness shape is input ⇒ CUT ⇒ output, where the CUT is the component under test and input_<format>_http / output_<format>_http components sit on the edges. Inputs are HTTP POST; outputs are WebSocket frames (JSON for serializable types, metadata-then-binary for media).
  • A lease is the cleanup contract: the test backend, the prerelease, the temporary fixtures, and the deployment all live inside the lease and are removed together when the lease closes. The TTL guarantees teardown even if the test runner exits early.
  • The hand-built path (backend create / add-vertex / connect / change-parameter / deploy) covers the cases the manifest grammar cannot express — one-off graphs, pre-existing harness combinations, non-standard test runners.

What a live-backend test is

A live-backend test runs the candidate component on the runtime it will run on in production, inside a real backend graph, with an I/O harness wrapped around it so fixtures can flow in and results can flow out. The candidate sees the same containerized environment, the same typed streams, and the same serving-service dependencies it sees in production. The harness components (input_<format>_http, output_<format>_http) are themselves released components, so the test path and the production path are the same path.

This avoids the usual trade-off between fast feedback and a realistic environment. A mocked model gives a quick test that proves nothing about the real model. A hand-run model in a notebook is faithful but offers no isolation, reproducibility, or CI story. A separate "test mode" SDK drifts from production over time. A live-backend test keeps the realistic environment while avoiding each of those drawbacks.

The lease is what makes this practical as a CI pattern. A real backend stood up per test would otherwise leak — the backend, the deployment, and the temporary fixtures would all persist, and every CI run would leave orphans behind. Inside a lease, the whole apparatus is one owned scope that is removed together, so the test is both realistic and clean.

See Leases and The lease lifecycle for the ownership model behind this.

The harness shape

     fixture
        │
        ▼
input_<fmt>_http ──► CUT (component under test) ──► output_<fmt>_http
   (POST or WS)      (the candidate)                 (WS frames)

The harness gives the candidate component a real input source and a real output sink, so the test exercises typed streams the same way production does. The input component takes an HTTP request and emits a typed stream into the CUT's input port. The output component consumes the CUT's output stream and emits it as a WebSocket frame the test can read.

The harness is two normal released components added as vertices on a normal backend — not special infrastructure. You substitute different harness components (an Input Audio HTTP for an audio CUT, an Output Image HTTP for an image-emitting CUT) by changing the vertex on each end. The shape stays the same. A CUT with no output_type — a sink such as log-message — is still live-testable: it produces no harness-readable frame, so you assert against its container logs instead.

Two ways to run one

There are two paths to the same live test. Pick by whether you will repeat it.

  • Manifest-driven: declarative and repeatable, driven by ppl lease run --test. Reach for it for CI, regression suites, proof loops, and version comparisons. Detailed in Path 1 below.
  • Hand-built backend: the raw backend create / add-vertex / connect / deploy primitives. Reach for it for one-off graphs and harness combinations the manifest grammar cannot express. Detailed in Path 2 below.

Path 1 — manifest-driven (ppl lease run --test=…)

Use this path for anything you want to repeat: CI, regression suites, proof loops, version comparisons.

The manifest describes the test declaratively: what the candidate is, what fixtures it should see, what graph wraps it, and what outputs should be captured. The CLI reads the manifest, publishes the candidate if needed, uploads local fixtures, opens the live-test event stream, and emits server events as NDJSON until the test reaches a terminal state.

ppl lease run --test=tests/live-test.yml --runtime=<runtime_id>

The CLI does five things in order, all derived from the manifest: parse, optionally publish the CUT prerelease from the working directory, pre-upload any local-path fixtures (rewriting their manifest entries to file_id), open the live-test stream, and emit every server event as one NDJSON line on stdout. The exit code is 0 on ready and 1 on failed or transport close, which is the contract a CI step needs.

The remaining flags adjust the run:

  • --cut-version-id points the test at a specific prerelease instead of publishing one from cwd.
  • --label sets the lease label (used for later bulk cleanup).
  • --ttl caps the lease's lifetime (the server applies a default plus a hard cap).
  • --keep-lease skips the rollback so you can inspect the live environment afterwards. Use it whenever you intend to drive the deployed backend by hand after the test setup completes.
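
As a rough sketch of how a CI wrapper might assemble this command, the helper below builds the argument list from the flags described above. The function name and its parameters are hypothetical, and the --flag=value spelling for --cut-version-id, --label, and --ttl is an assumption carried over from how --test and --runtime are written on this page.

```python
from typing import List, Optional


def lease_run_argv(
    manifest: str,
    runtime_id: str,
    cut_version_id: Optional[str] = None,
    label: Optional[str] = None,
    ttl: Optional[str] = None,
    keep_lease: bool = False,
) -> List[str]:
    """Build the argv for `ppl lease run` (hypothetical helper, not part of the CLI)."""
    argv = ["ppl", "lease", "run", f"--test={manifest}", f"--runtime={runtime_id}"]
    # Assumption: value flags accept the --flag=value form, like --test and --runtime.
    if cut_version_id is not None:
        argv.append(f"--cut-version-id={cut_version_id}")
    if label is not None:
        argv.append(f"--label={label}")
    if ttl is not None:
        argv.append(f"--ttl={ttl}")
    if keep_lease:
        # Boolean switch: present means "skip the rollback".
        argv.append("--keep-lease")
    return argv
```

A CI step could pass the result to subprocess.run(argv) and treat a return code of 0 as ready and 1 as failed or transport close, matching the exit-code contract above.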

The manifest

The manifest is a declarative description of the whole run. The vertices section names harness components on the edges and leaves the CUT vertex unmarked — a vertex without a harness is the CUT. The edges section wires them. Fixtures attach by name and resolve to exactly one of path / url / file_id. Secrets bind workspace secrets into vertex parameters. Output files declare which generated files the platform should save into the workspace at teardown.

release:
  auto: true                    # CLI publishes a prerelease from cwd

deploy:
  fixed_duration: 10m           # Pin deployment lifetime (Go duration)
  lease_ttl: 30m                # Whole-lease TTL (server caps)

fixtures:
  - {name: dog, path: ./tests/dog.jpg}

secrets:
  - {vertex: cut, key: HF_TOKEN, workspace_secret: hf_token}

graph:
  vertices:
    in:  {harness: "Input Image HTTP"}
    cut: {params: {model_cfg: {type: String, value: '"fastvit_t8"'}}}
    out: {harness: "Output JSON HTTP"}
  edges:
    - {from: in,  from_output: 0, to: cut, to_input: 0}
    - {from: cut, from_output: 0, to: out, to_input: 0}

output_files:
  - {name: report, vertex: cut, key: report.json}
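
Two of the manifest rules above lend themselves to a quick local check before a run: each fixture resolves to exactly one of path / url / file_id, and the CUT is the vertex without a harness. The sketch below applies both rules to a manifest that has already been parsed into a Python dict. The function names are hypothetical, and this is not the validation the CLI itself performs.

```python
from typing import Any, Dict, List

SOURCE_KEYS = ("path", "url", "file_id")


def fixture_errors(fixtures: List[Dict[str, Any]]) -> List[str]:
    """Report fixtures that do not declare exactly one of path / url / file_id."""
    errors = []
    for fixture in fixtures:
        present = [key for key in SOURCE_KEYS if key in fixture]
        if len(present) != 1:
            name = fixture.get("name", "?")
            errors.append(f"fixture {name!r}: expected exactly one source, got {present}")
    return errors


def cut_vertices(vertices: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the vertex names that carry no harness, i.e. the CUT candidates."""
    return [name for name, spec in vertices.items() if "harness" not in spec]


# The dict below mirrors the example manifest above (only the parts being checked).
manifest = {
    "fixtures": [{"name": "dog", "path": "./tests/dog.jpg"}],
    "graph": {
        "vertices": {
            "in": {"harness": "Input Image HTTP"},
            "cut": {"params": {"model_cfg": {"type": "String", "value": '"fastvit_t8"'}}},
            "out": {"harness": "Output JSON HTTP"},
        }
    },
}

print(fixture_errors(manifest["fixtures"]))
print(cut_vertices(manifest["graph"]["vertices"]))
```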

The event stream

The server reports each step of the test setup over the live-test stream. Each event is a JSON object with a type field; the CLI prints them verbatim, one per line.

type              Meaning
lease_created     Server minted a lease for this run.
fixture_fetched   One per URL-mode fixture.
backend_created   Server allocated a backend id.
vertex_added      One per vertex; carries the server-assigned vertex_id.
edge_connected    One per edge.
param_set         One per vertex parameter set.
file_bound        One per vertex file binding.
deploy_started    Deployment id minted.
container         One per spawned container.
ready             Terminal success. CLI exits 0.
failed            Terminal failure. Payload {stage, error, debug_bundle?}. CLI exits 1.

The stream covers the test setup; the fixture-in / result-out exchange happens against the forwarded endpoints after ready. Map your assertions back onto the server-assigned vertex_id carried in each vertex_added event.
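
A test driver that wraps the CLI can consume this stream line by line. The following is a minimal sketch that reads NDJSON lines, collects the vertex_id from each vertex_added event, and stops at the first terminal event. It assumes vertex_id sits at the top level of the event object; the page only says the event carries it, so adjust the lookup if the real payload nests it. The function name and the "closed" return value are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple

TERMINAL_TYPES = {"ready", "failed"}


def summarize_events(lines: Iterable[str]) -> Tuple[str, List[str], Dict[str, Any]]:
    """Return (outcome, vertex_ids, terminal_event) for a live-test event stream.

    outcome is "ready", "failed", or "closed" when the stream ends without
    a terminal event (the transport-close case).
    """
    vertex_ids: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between events
        event = json.loads(line)
        event_type = event.get("type")
        if event_type == "vertex_added":
            # Assumption: vertex_id is a top-level field of the event.
            vertex_ids.append(event.get("vertex_id"))
        if event_type in TERMINAL_TYPES:
            return event_type, vertex_ids, event
    return "closed", vertex_ids, {}
```

The lines argument can be any iterable of strings, for example the stdout of a subprocess.Popen started with text=True, so the same function works on a live run and on a saved log.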

Path 2 — hand-built backend

Use this path when the manifest grammar cannot describe the test: one-off graphs, harness combinations the schema does not cover, or drivers written in a runner the manifest flow does not embed.

The hand-built path uses the same CLI primitives the platform uses internally: create a backend, add the input / CUT / output vertices, connect their ports, bind any parameters the CUT needs, and deploy.

PID=$(ppl backend create --name "live test $(date +%s)" | jq -r .data.backend_id)

# component versions returns a bare JSON array (no {count, items} envelope):
INPUT_V=$(ppl component versions <input_component_id>  | jq -r '.[] | select(.tags[]? == "latest") | .id')
CUT_V=$(ppl component versions <cut_component_id>      | jq -r '.[] | select(.tags[]? == "latest") | .id')
OUTPUT_V=$(ppl component versions <output_component_id> | jq -r '.[] | select(.tags[]? == "latest") | .id')

# add-vertex has no vertex-id flag — the server assigns it and returns
# it in .data.vertex_id. Capture each id for the connect calls.
IN=$(ppl  backend add-vertex $PID --version $INPUT_V  --alias in  | jq -r .data.vertex_id)
CUT=$(ppl backend add-vertex $PID --version $CUT_V    --alias cut | jq -r .data.vertex_id)
OUT=$(ppl backend add-vertex $PID --version $OUTPUT_V --alias out | jq -r .data.vertex_id)

ppl backend connect $PID --from-vertex $IN  --from-output 0 --to-vertex $CUT --to-input 0
ppl backend connect $PID --from-vertex $CUT --from-output 0 --to-vertex $OUT --to-input 0

ppl backend change-parameter $PID --vertex $CUT --name model_cfg \
    --type String --value '"fastvit_t8"'

ppl backend deploy --runtime <runtime_id> --backend $PID

These verbs do not attach to a lease — only the manifest-driven lease run path stamps the backend, prerelease, fixtures, and deployment as one lease-owned scope that rolls back together. So the hand-built sequence above is the form for one-off exploration, where you discard the resources by hand (ppl backend undeploy --backend $PID, then ppl backend delete $PID). For a repeatable CI loop with automatic cleanup, use Path 1.
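
Because nothing cleans up a hand-built backend automatically, a driver script may want to guarantee the two discard commands run even when the test body raises. The sketch below wraps them in a context manager. The helper names are hypothetical, and the only CLI calls used are the undeploy and delete commands quoted above. The command runner is injectable so the logic can be exercised without the CLI installed.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], None]


def run_cli(argv: List[str]) -> None:
    """Default runner: execute the command and raise if it exits non-zero."""
    subprocess.run(argv, check=True)


@contextmanager
def hand_built_backend(backend_id: str, run: Runner = run_cli) -> Iterator[str]:
    """Yield the backend id, then undeploy and delete it on the way out."""
    try:
        yield backend_id
    finally:
        try:
            run(["ppl", "backend", "undeploy", "--backend", backend_id])
        finally:
            # Attempt the delete even if the undeploy call raised.
            run(["ppl", "backend", "delete", backend_id])
```

Used as `with hand_built_backend(pid): ...`, the body holds the connect, deploy, and assertion steps, and the cleanup runs whether or not they succeed.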

Choosing the harness

The harness on each side has to match the CUT's I/O type:

CUT input type          Harness ingress
Image                   Input Image HTTP (output 0 = Image)
AudioFrame              Input Audio HTTP
Tensor                  Input NumPy HTTP
anything serializable   Input JSON HTTP (output 0 = t)

CUT output type         Harness egress
Image                   Output Image HTTP
Polygon<Double>         Output JSON HTTP
[BoundingBox]           Output JSON HTTP
anything generic        Output JSON HTTP

Output JSON HTTP is the default egress: it accepts any backend-typed message and serializes it to JSON on the WebSocket, which is the easiest shape to assert against in tests. Use Output Image HTTP (metadata-then-binary frame protocol) when the CUT emits an Image and you want the raw bytes.
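
The two tables reduce to a small lookup with a JSON fallback on each side. The sketch below encodes them as a hypothetical helper a test generator might use. The fallback to Input JSON HTTP assumes the CUT's input type is serializable, which the helper does not verify.

```python
# Specific ingress harnesses from the table above; everything else falls back to JSON.
INGRESS_BY_TYPE = {
    "Image": "Input Image HTTP",
    "AudioFrame": "Input Audio HTTP",
    "Tensor": "Input NumPy HTTP",
}


def ingress_for(cut_input_type: str) -> str:
    """Pick the ingress harness for a CUT input type."""
    # Assumption: any type not listed is serializable, so Input JSON HTTP applies.
    return INGRESS_BY_TYPE.get(cut_input_type, "Input JSON HTTP")


def egress_for(cut_output_type: str, want_raw_bytes: bool = True) -> str:
    """Pick the egress harness; Output JSON HTTP is the default."""
    if cut_output_type == "Image" and want_raw_bytes:
        return "Output Image HTTP"
    return "Output JSON HTTP"


print(ingress_for("AudioFrame"))
print(egress_for("[BoundingBox]"))
```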

Discover harness components by query — the lean list cap is 20 records, and --query searches past it:

ppl component list --query=Input
ppl component list --query=Output

Driving the test

After deploy, get the endpoint URLs with ppl forward $PID. Each row carries the endpoint_name (whatever the harness component declared in its http: block), the vertex_id, the url, and the token. Treat the URL and token as bearer credentials; pass them as environment variables to your test driver rather than logging them.

The output URL is an HTTP URL, but tests connect to it as a WebSocket (replace the https:// scheme with wss://). The input URL is plain HTTP POST. Open the output WebSocket before posting the input; otherwise the response can arrive while the reader is still connecting and the test misses it. For Output JSON HTTP, one text frame per backend message carries the JSON body. For Output Image HTTP, one text frame {"type": "metadata", "metadata": {…}} (keys configurable via metadata_keys) is followed by one binary frame with the image bytes.
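
Two pieces of this are easy to get wrong in a driver: the scheme rewrite and the pairing of metadata and binary frames. The sketch below handles both with the standard library only. It assumes the WebSocket client hands text frames over as str and binary frames as bytes, and it leaves the actual connection to whichever client library the driver uses. The function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple, Union
from urllib.parse import urlsplit, urlunsplit

Frame = Union[str, bytes]


def to_ws_url(url: str) -> str:
    """Rewrite https:// to wss:// (and http:// to ws://), keeping the rest intact."""
    parts = urlsplit(url)
    scheme = {"https": "wss", "http": "ws"}.get(parts.scheme, parts.scheme)
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


def pair_image_frames(frames: Iterable[Frame]) -> Iterator[Tuple[Dict[str, Any], bytes]]:
    """Pair each metadata text frame with the binary frame that follows it."""
    pending: Optional[Dict[str, Any]] = None
    for frame in frames:
        if isinstance(frame, str):
            message = json.loads(frame)
            if message.get("type") != "metadata":
                raise ValueError(f"unexpected text frame: {message!r}")
            if pending is not None:
                raise ValueError("metadata frame arrived before the previous image bytes")
            pending = message["metadata"]
        else:
            if pending is None:
                raise ValueError("binary frame without a preceding metadata frame")
            yield pending, bytes(frame)
            pending = None


print(to_ws_url("https://example.test/out?token=abc"))
```

In a real driver, the frames iterable would come from a WebSocket that was opened before the input POST was sent, as the ordering rule above requires.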

Timing realities

A fresh backend takes 30 seconds to 4 minutes to deploy — image pull from the runtime's local registry, container start, type inference run, endpoint registration. Cold runtimes sit at the top end of that range. If the CUT lazily loads a model (HuggingFace pull, ONNX initialisation, Triton model load), the first request after deploy can take an additional 10–30 seconds. Test drivers should set per-frame read timeouts at ≥30 seconds for the first request and budget cold-start time at the harness level, not per-test.

These numbers shape test design because every fresh backend pays them again — they do not amortize across backends. A CI suite that respects them stands up one backend, runs many fixtures against it, and tears it down, rather than standing up a new backend per fixture. The manifest path supports this directly: one lease holds one deployment for the whole run.
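
To make the amortization argument concrete, the sketch below compares total deploy time for a suite that shares one backend against one that stands up a backend per fixture, and shows one way to give the first request a longer read timeout. Both helpers are hypothetical. The 30-second first-request timeout comes from the guidance above, while the 10-second steady-state value is an illustrative assumption, not a platform figure.

```python
def setup_cost_seconds(fixtures: int, deploy_seconds: float, shared_backend: bool) -> float:
    """Total deploy time a suite pays for the given number of fixtures."""
    backends = 1 if shared_backend else fixtures
    return backends * deploy_seconds


def read_timeout_seconds(request_index: int, first: float = 30.0, steady: float = 10.0) -> float:
    """Longer read timeout for the first request, which may trigger a lazy model load."""
    # steady is an illustrative default; tune it to the CUT's normal latency.
    return first if request_index == 0 else steady


# 20 fixtures at the 4-minute end of the deploy range:
print(setup_cost_seconds(20, 240, shared_backend=False))  # 4800 seconds of deploys
print(setup_cost_seconds(20, 240, shared_backend=True))   # 240 seconds of deploys
```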

When the test fails

Failures fall into a small set of shapes:

  • Type mismatch at a connect edge — the platform refused the graph before deploy. The error names the edge and the conflict; fix the connection (often a missing transformation between mismatched but compatible types).
  • Endpoint resolves but POST returns 404 — the backend is still deploying. Poll until containers are running before sending input.
  • WebSocket reads time out — the CUT crashed inside the container. Read its logs (ppl container logs <container_id>).
  • WebSocket connects but no frames arrive — the CUT is running but failing silently (no exception, no emit). Component-level logging is the next step.
  • Deploy rejects with "no nodes available" — the runtime lacks GPU or RAM headroom for the candidate. Pick a different runtime or wait.
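
For the "still deploying" case in the list above, a driver needs a bounded polling loop rather than a fixed sleep. The following is a minimal sketch of one. The readiness check is a caller-supplied callable (for example, one that inspects container state or retries the POST), since this page does not specify a particular readiness API. The clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() until it returns True or timeout_s elapses.

    Returns True as soon as check() succeeds, False once the deadline passes.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A timeout of about 240 seconds lines up with the upper end of the deploy range quoted under Timing realities; the 2-second interval is an arbitrary default.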

For broader patterns, see Common failures.

Where this fits

A live-backend test is the proof step between build and ship. It answers whether the candidate component, model artifact, or graph revision behaves correctly on the runtime it will run on in production. The runtime is identical to production and the lease guarantees cleanup, so the result is both faithful and repeatable — the evidence a promotion decision needs.

The manifest path makes that loop a CI step rather than a hand-driven sequence. The hand-built path covers tests that do not fit the manifest grammar. Both produce the same proof; they differ only in how you describe the test.

Related

  • Prove behavior — broader proof modes (model integration, data pipelines, version comparison) that all build on this loop.
  • The lease lifecycle — the cleanup contract behind every live-test run.
  • Leases — ownership model for ephemeral resources.
  • Deploy and monitor — the non-ephemeral counterpart of this loop.
  • Backends — the graph the test wraps.
  • Common failures — symptom → fix lookup.
