
Generate Text (Ollama)
Ollama-backed chat completion. The worker takes a user-prompt {{type:String}} and a system-prompt {{type:String}}, replays the past {{param:history_messages}} turns, and chats with the Ollama-served model at {{param:model_name}} at {{param:temperature}}; output is either streamed chunk-by-chunk under {{param:streaming}} or returned whole, with a final {{type:StreamEnd}} marker when {{param:emit_stream_end}} is true.
How it fits
({{type:String}} user_prompt, {{type:String}} system_prompt) -> {{component:generate_text_ollama}} -> ({{type:String}} chunks, {{type:StreamEnd}})
|
+-- ensure the Ollama backend has pulled {{param:model_name}} at startup
+-- chat at {{param:temperature}} with up to {{param:history_messages}} prior turns
+-- emit chunks under {{param:streaming}}; close with {{type:StreamEnd}} when {{param:emit_stream_end}} is true
+-- abort in-flight generation on new input when {{param:preempt_on_new_input}} is true
Pick this for fully-local LLM chat under your own infrastructure. For a vLLM-hosted alternative prefer {{component:generate_text_sglang}} or {{component:generate_text_vllm}}.
Typical backends
- Voice assistant: {{component:transcribe_audio_faster_whisper}} -> {{component:generate_text_ollama}} -> {{component:generate_speech_kokoro}} -> {{component:output_audio_file}}.
- Document Q&A: {{component:extract_text_paddleocr}} -> {{component:generate_text_ollama}} -> {{component:send_http}}.
- Caption then summarise: {{component:generate_image_caption_blip}} -> {{component:generate_text_ollama}} -> {{component:output_browser_stream}}.
Caveats
- A local Ollama backend MUST be reachable from the worker. The component declares
ollamaas a dependency; the worker reads the backend address from theOLLAMA_HOSTandOLLAMA_PORTenv vars at startup. Both vars MUST be set or the worker aborts at startup. - {{param:model_name}} is pulled at startup with progress streamed from the backend; the worker calls
showfirst and only pulls when the model is missing. An unreachable backend or a non-existent model aborts startup. - The worker accepts ONE user prompt per tick; multiple prompts in the same input is a HARD ERROR.
- {{param:streaming}}
trueemits chunks progressively on the first output port;falseemits the whole response in a single chunk at the end. - {{param:emit_stream_end}}
truecloses every generation with a {{type:StreamEnd}} marker on the second output port — downstream consumers that need an end-of-message signal MUST keep this on. - {{param:preempt_on_new_input}}
true(default) cancels in-flight generation when a new prompt arrives; the partial answer is appended to history and the user sees a truncated stream. - {{param:history_messages}}
0keeps no history (stateless mode),-1keeps all messages seen on the session, and a positive value keeps the last N. Stateless mode is required for {{param:max_parallel}}> 1. - {{param:max_parallel}} caps concurrent LLM requests; it is only EFFECTIVE when {{param:history_messages}} is
0. With non-zero history the worker runs SEQUENTIALLY regardless of {{param:max_parallel}}. - An internal prompt queue is fixed at 16 entries; under sustained back-pressure the OLDEST queued prompt is dropped — fast bursts can lose messages.
- {{param:display_reasoning}}
trueinjects a system-prompt suffix instructing the model to emit[Reasoning]and[Answer]blocks; the worker does NOT enforce the structure — it relies on the model to follow the hint, and it costs significant tokens and latency. - {{param:temperature}} is the only model sampling knob exposed; downstream prompt engineering must cover anything else (top_p, top_k, etc.).
- The history accumulator is PROCESS-LOCAL; a restart resets the conversation.
- All eight config keys are captured ONCE at startup; runtime changes have NO effect and require a redeploy.
Sürümler
- 74be85e8latestdefaultlinux/amd64
Automated release

