Generate Text (Ollama)

1 VersionGenerate

Ollama-backed chat completion. The worker takes a user-prompt {{type:String}} and a system-prompt {{type:String}}, replays the past {{param:history_messages}} turns, and chats with the Ollama-served model at {{param:model_name}} at {{param:temperature}}; output is either streamed chunk-by-chunk under {{param:streaming}} or returned whole, with a final {{type:StreamEnd}} marker when {{param:emit_stream_end}} is true.

How it fits

({{type:String}} user_prompt, {{type:String}} system_prompt) -> {{component:generate_text_ollama}} -> ({{type:String}} chunks, {{type:StreamEnd}})
                          |
                          +-- ensure the Ollama backend has pulled {{param:model_name}} at startup
                          +-- chat at {{param:temperature}} with up to {{param:history_messages}} prior turns
                          +-- emit chunks under {{param:streaming}}; close with {{type:StreamEnd}} when {{param:emit_stream_end}} is true
                          +-- abort in-flight generation on new input when {{param:preempt_on_new_input}} is true

Pick this for fully-local LLM chat under your own infrastructure. For a vLLM-hosted alternative prefer {{component:generate_text_sglang}} or {{component:generate_text_vllm}}.

Typical backends

Voice assistant: {{component:transcribe_audio_faster_whisper}} -> {{component:generate_text_ollama}} -> {{component:generate_speech_kokoro}} -> {{component:output_audio_file}}.
Document Q&A: {{component:extract_text_paddleocr}} -> {{component:generate_text_ollama}} -> {{component:send_http}}.
Caption then summarise: {{component:generate_image_caption_blip}} -> {{component:generate_text_ollama}} -> {{component:output_browser_stream}}.

Caveats

A local Ollama backend MUST be reachable from the worker. The component declares ollama as a dependency; the worker reads the backend address from the OLLAMA_HOST and OLLAMA_PORT env vars at startup. Both vars MUST be set or the worker aborts at startup.
{{param:model_name}} is pulled at startup with progress streamed from the backend; the worker calls show first and only pulls when the model is missing. An unreachable backend or a non-existent model aborts startup.
The worker accepts ONE user prompt per tick; multiple prompts in the same input is a HARD ERROR.
{{param:streaming}} true emits chunks progressively on the first output port; false emits the whole response in a single chunk at the end.
{{param:emit_stream_end}} true closes every generation with a {{type:StreamEnd}} marker on the second output port — downstream consumers that need an end-of-message signal MUST keep this on.
{{param:preempt_on_new_input}} true (default) cancels in-flight generation when a new prompt arrives; the partial answer is appended to history and the user sees a truncated stream.
{{param:history_messages}} 0 keeps no history (stateless mode), -1 keeps all messages seen on the session, and a positive value keeps the last N. Stateless mode is required for {{param:max_parallel}} > 1.
{{param:max_parallel}} caps concurrent LLM requests; it is only EFFECTIVE when {{param:history_messages}} is 0. With non-zero history the worker runs SEQUENTIALLY regardless of {{param:max_parallel}}.
An internal prompt queue is fixed at 16 entries; under sustained back-pressure the OLDEST queued prompt is dropped — fast bursts can lose messages.
{{param:display_reasoning}} true injects a system-prompt suffix instructing the model to emit [Reasoning] and [Answer] blocks; the worker does NOT enforce the structure — it relies on the model to follow the hint, and it costs significant tokens and latency.
{{param:temperature}} is the only model sampling knob exposed; downstream prompt engineering must cover anything else (top_p, top_k, etc.).
The history accumulator is PROCESS-LOCAL; a restart resets the conversation.
All eight config keys are captured ONCE at startup; runtime changes have NO effect and require a redeploy.

Versions

74be85e8latestdefaultlinux/amd64
Automated release
5/8/2026