Transformations
TL;DR
- A transformation reshapes one stream into another: pack two fields into a record, drop the third element of a tuple, flatten a list, filter a value stream against a predicate stream, repeat a value a counted number of times, fan two streams into one.
- Transformations have the same typed contract as components (positional typed inputs, positional typed outputs, typed per-vertex parameters), so the backend's type inference still validates the whole graph at edit time.
- They are an inlined, built-in graph operation: no container, no image build, no requirements.txt, no scheduling, no network hop. The platform runs them inline as graph plumbing.
- The backend can auto-insert a transformation when a connect involves two slots that are bridgeable but not identical. It inserts only transformations, never a component, and the response lists every implicit insertion so the author can inspect or remove what landed.
- sketch is the deploy-blocking placeholder: variadic typed I/O plus a readme parameter for free-form notes. The surrounding graph can be wired and type-checked immediately; deploy refuses until every sketch is replaced.
Why transformations are their own primitive
A transformation reshapes one stream into another — pack two fields into a record, drop the third element of a tuple, repeat a value a counted number of times. The platform treats it as an inlined built-in graph operation, not a component you write and deploy.
Pipelogic splits the two concerns along a single line. Anything that runs code is a component. ML inference, external API calls, business rules — anything where the work is "compute something based on the input" — gets the full container treatment for isolation, lifecycle, and observability. Anything that is "same data, different shape" is a transformation. No container, no image; the platform recognises the operation and runs it inline.
The architectural reason for the split is that a pure shape change has none of the properties a container exists to provide. There is no code to isolate, no process lifecycle to manage, no logs to surface — only a deterministic reshaping of typed data. So the platform inlines it: no container to build, no image to push, nothing to schedule, no network hop between vertices. Typing is unaffected, because transformations carry the same typed contract as components, so the graph stays validated end to end.
Mental model — transformation vs component
Component                      Transformation
┌────────────────────────────┐ ┌────────────────────────────┐
│ input (positional, typed)  │ │ input (positional, typed)  │
│ output (positional, typed) │ │ output (positional, typed) │
│ config (typed)             │ │ config (typed)             │
│                            │ │                            │
│ runs in its own container  │ │ runs inline                │
│ has a container lifecycle  │ │ built-in graph operation   │
└────────────────────────────┘ └────────────────────────────┘
The shape of the contract is identical — same positional typed slots, same typed parameters, same way of appearing as a vertex in a backend graph. The difference is operational: a component is materialised as a container at deploy time, a transformation is woven into the runtime between its neighbouring components. From the graph's perspective both are vertices; from the runtime's perspective only components have container lifecycles.
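The shared contract can be pictured as one data structure with a single flag that decides the operational treatment. The sketch below is illustrative only: the Vertex class, its field names, the detector component, and the Image type are hypothetical and are not the platform's real schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    COMPONENT = "component"
    TRANSFORMATION = "transformation"


@dataclass
class Vertex:
    """Hypothetical model of a graph vertex (not the platform's schema)."""
    name: str
    kind: Kind
    inputs: tuple[str, ...]    # positional input types
    outputs: tuple[str, ...]   # positional output types
    params: dict[str, str] = field(default_factory=dict)  # parameter name -> type

    def needs_container(self) -> bool:
        # Only components are materialised as containers at deploy time.
        return self.kind is Kind.COMPONENT


# "detector" and "Image" are made-up names for illustration.
detector = Vertex("detector", Kind.COMPONENT, ("Image",), ("Double", "Double"))
packer = Vertex("pack_tuple", Kind.TRANSFORMATION,
                ("Double", "Double"), ("(Double, Double)",))

graph = [detector, packer]
containers = [v.name for v in graph if v.needs_container()]
print(containers)
```

Both vertices carry the same typed slots; only the kind flag changes what the runtime has to materialise.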
See Backend operations for the verbs that add and wire vertices of either kind.
When a transformation is the right tool
Use this framing whenever the next step is "reshape this typed stream" rather than "compute something from it".
The most common shapes a transformation handles, in roughly the order of frequency:
- Pack and unpack: pack_record combines several typed streams into a single record-typed stream; unpack_record does the inverse. Same for tuples (pack_tuple / unpack_tuple), named types (pack_named / unpack_named), and sum types (pack_union / unpack_union). The right primitive any time the upstream produces several pieces and the downstream wants them as one composite (or vice versa).
- Reshape cardinality: flatten turns a stream of lists into a stream of elements; lift_unroll splits a list-typed stream into a size stream plus an element stream (and lift_reroll is the inverse, collecting elements back into lists of the given sizes); length produces the count of an incoming list; repeat reads a count n on one input and a value on another, emitting that value n times per tick. The right primitives for "I need many where I have one" or "I need one where I have many".
- Combine streams: join is a key-matched inner join of two streams, emitting a tuple when both produce the same key; unite_streams merges several same-type streams into one in arrival order; select_stream passes through the input chosen by a numeric index; shuffle groups key-value pairs by key, emitting each key with the list of values that shared it. The right primitives for fan-in patterns.
- Type conversion: convert_value converts one atomic value into another atomic type tick-for-tick; constant injects a fixed configured value into the graph. The right primitives when the shapes match but the specific type does not.
- Flow control: filter pairs a value stream with a Bool predicate stream and drops the values whose predicate tick was false; delay_by_one emits a configured initial value first, then each tick re-emits the value seen on the previous tick. The right primitives for thinning, gating, or aligning streams.
- Placeholder: sketch stands in for a not-yet-built step; see below.
The grouping above is a reading aid, not a queryable category. The live set is discoverable with ppl component list --query=<name> like any other catalog entry.
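To make the stream semantics concrete, here is a minimal Python model of a few of the transformations listed above. It is a sketch under stated assumptions, not the platform's implementation: finite lists stand in for streams (one item per tick), filter is spelled filter_stream to avoid shadowing the Python builtin, and shuffle is assumed to see a finite batch of pairs.

```python
from collections import defaultdict
from itertools import islice

# Finite Python lists stand in for streams; one list item is one tick.


def flatten(lists):
    """Stream of lists -> stream of elements."""
    return [x for xs in lists for x in xs]


def lift_unroll(lists):
    """Stream of lists -> (size stream, element stream)."""
    sizes = [len(xs) for xs in lists]
    return sizes, flatten(lists)


def lift_reroll(sizes, elements):
    """Inverse of lift_unroll: regroup elements into lists of the given sizes."""
    it = iter(elements)
    return [list(islice(it, n)) for n in sizes]


def repeat(counts, values):
    """Emit each value n times, where n arrives on the paired count stream."""
    return [v for n, v in zip(counts, values) for _ in range(n)]


def filter_stream(values, predicates):
    """Keep only the values whose paired predicate tick is true."""
    return [v for v, keep in zip(values, predicates) if keep]


def shuffle(pairs):
    """Group key-value pairs by key (assumes a finite batch of pairs)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


batches = [[1, 2], [], [3]]
sizes, elements = lift_unroll(batches)
print(sizes, elements)
print(lift_reroll(sizes, elements) == batches)
print(repeat([2, 0, 1], ["a", "b", "c"]))
print(filter_stream([10, 20, 30], [True, False, True]))
print(shuffle([("x", 1), ("y", 2), ("x", 3)]))
```

The lift_unroll / lift_reroll round trip is the property to notice: the size stream carries exactly the information needed to rebuild the original lists, including empty ones.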
Auto-insertion makes the connect verb smarter
Use this framing the first time ppl backend connect succeeds and the response mentions vertices you did not add.
When ppl backend connect is called with an upstream slot and a downstream slot whose shapes are compatible-by-coercion but not identical (a Double into a slot expecting (Double,), or a stream of values into a slot expecting a stream of lists), the platform may insert a bridging transformation automatically. The response includes created_vertices and created_connections fields listing every implicit addition.
The platform will not invent connections it cannot type-check, and it inserts only transformations, never a component. Inspecting the response after a connect is the way to see what was added; removing an unwanted auto-insertion is the normal ppl backend disconnect plus ppl backend delete-vertex pair.
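A small script can make that inspection routine. The example below is a sketch only: the two field names created_vertices and created_connections come from the description above, but the JSON encoding, the shape of each entry (id, name, from, to), and the choice of pack_tuple as the bridging vertex are assumptions made for illustration.

```python
import json

# Hypothetical response body. Only the two top-level field names are taken
# from the documentation; everything inside them is an assumed shape.
response_text = """
{
  "created_vertices": [{"id": "v7", "name": "pack_tuple"}],
  "created_connections": [
    {"from": "A:0", "to": "v7:0"},
    {"from": "v7:0", "to": "B:0"}
  ]
}
"""


def implicit_additions(response: dict) -> list[str]:
    """Summarise every vertex and connection the platform added implicitly."""
    lines = []
    for vertex in response.get("created_vertices", []):
        lines.append(f"vertex {vertex['id']} ({vertex['name']})")
    for conn in response.get("created_connections", []):
        lines.append(f"connection {conn['from']} -> {conn['to']}")
    return lines


summary = implicit_additions(json.loads(response_text))
for line in summary:
    print(line)
```

An empty summary means the connect was wired exactly as requested; a non-empty one is the list of candidates to keep or remove.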
The discipline this enables is that backend authors can wire graphs at the conceptual level — "output 0 of A goes to input 0 of B" — without writing every shape adapter by hand. The platform handles the boilerplate; the author handles the design.
sketch — typed scaffolding for what is not built yet
Use this when the surrounding graph needs to be wired before a particular step exists.
sketch is the placeholder transformation. Its inputs and outputs are variadic (neither side constrains the other), and its only parameter is a free-form readme: String for whatever notes the author wants to leave for the future implementer. The graph stays type-checkable around it — the platform accepts wirings into and out of the sketch — so the rest of the work can proceed without waiting for the missing piece to be built.
The constraint that makes the design honest is that a backend containing any sketch is refused at deploy. The platform will not run a graph with placeholder vertices; the sketch has to be replaced with a real component or transformation before deployment can succeed. That refusal is the load-bearing safety property — sketches let authors compose ahead of time without ever leading to a deployment that silently does nothing.
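The deploy-time refusal amounts to a simple gate over the graph's vertices. The following is a minimal sketch of that idea, not the platform's actual check: GraphVertex, DeployRefused, and check_deployable are hypothetical names, and the vertex ids are invented.

```python
from dataclasses import dataclass


@dataclass
class GraphVertex:
    """Hypothetical vertex record: an id, the catalog name, and sketch notes."""
    id: str
    name: str
    readme: str = ""


class DeployRefused(Exception):
    """Raised when a graph still contains placeholder vertices."""


def check_deployable(vertices):
    """Refuse deployment while any sketch vertex remains in the graph."""
    pending = [v for v in vertices if v.name == "sketch"]
    if pending:
        notes = "; ".join(f"{v.id}: {v.readme or 'no readme'}" for v in pending)
        raise DeployRefused(f"replace sketch vertices before deploy: {notes}")


graph = [
    GraphVertex("v1", "constant"),
    GraphVertex("v2", "sketch", readme="score the record"),
    GraphVertex("v3", "pack_record"),
]

try:
    check_deployable(graph)
    refused, message = False, ""
except DeployRefused as err:
    refused, message = True, str(err)

print(refused, message)
```

Surfacing the readme text in the refusal is a design choice of this sketch: it turns the error into a to-do list for whoever replaces the placeholder.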
Per-vertex configuration applies the same way
Where a transformation does take a parameter, you set it via ppl backend change-parameter, identically to components: constant takes the value to inject; delay_by_one takes the initial value emitted before the first delayed tick. Many transformations take no parameters at all — pack_record derives its field names from the downstream record type rather than from a parameter, and operands like repeat's count and filter's predicate arrive on input streams, not as configuration. The parameter contract, where present, is part of the transformation's manifest, type-checked at the change-parameter call and validated against the rest of the graph at edit time.
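The distinction between a configured parameter and a stream operand can be modelled in a few lines. This is an illustrative sketch, assuming lists as finite streams and one output per input tick; configure_delay and the parameter name initial are hypothetical, and the isinstance check only stands in for the platform's type check at the change-parameter call.

```python
from functools import partial


def delay_by_one(stream, *, initial):
    """Emit `initial` first, then on each tick the value seen one tick earlier."""
    out, previous = [], initial
    for value in stream:
        out.append(previous)
        previous = value
    return out


def configure_delay(initial, element_type):
    """Bind the parameter once, rejecting a value of the wrong type up front."""
    if not isinstance(initial, element_type):
        raise TypeError(
            f"initial must be {element_type.__name__}, got {type(initial).__name__}"
        )
    return partial(delay_by_one, initial=initial)


# The parameter is fixed at configuration time...
delay = configure_delay(0, int)
# ...while the data arrives tick by tick on the input stream.
delayed = delay([1, 2, 3])
print(delayed)

try:
    configure_delay("zero", int)
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```

The mismatched value is rejected when the parameter is set, before any data flows, which mirrors the edit-time validation described above.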
That uniformity matters because it keeps the mental model small. Backend authors do not need a second vocabulary for "configuring transformations vs configuring components" — it is the same vocabulary, the same verbs, the same operation log entries.
Where this fits
Transformations are the platform's primitive for pure shape changes. They keep the type system intact while taking the container lifecycle off operations that do not need one — there is no code to isolate or observe, so the platform inlines them as graph plumbing instead of materialising a container per vertex. The price is one extra concept — a kind of vertex that is not a component — and the payoff is that shape adapters stay inline, built-in graph operations rather than deployed units.
The discipline that uses them well is to reach for a transformation first whenever the work is "same data, different shape", and reach for a component only when the work is real computation. The catalog makes the distinction easy: if ppl component list --query=<keyword> already shows a transformation that does what you need, the right answer is to wire it in, not to write a component around the same job.
Related
- Types — the shape language transformations bridge between.
- Components — the containerized counterpart.
- Backend operations — the verbs that add and wire vertices.
- Backends — where transformations are composed with components.
- Solutions — how the four primitives compose into a shipped product.