Assistant Threads
Threads contain the interaction between Deepdesk assistants and the LLM (e.g. OpenAI). A thread is a list of message objects with a specific structure: system prompt, tool calls and their responses, and any extra context passed in during evaluations (conversation transcript/metadata, parameters, memory).
When threads are persisted
Threads are persisted and reused only when the assistant is evaluated in the context of a conversation. That allows:
- Keeping the history of what the assistant did across evaluations in that conversation
- Continuing from where it left off in future evaluations in the same conversation
Scoping:
- Each thread is tied to one assistant and one conversation.
- If multiple assistants are evaluated for the same conversation, each has its own thread.
- If the same assistant is evaluated for different conversations, there are separate threads.
- For evaluations without a conversation, a new thread is created for each evaluation (no persistence).
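The scoping rules above can be read as a lookup keyed by assistant and conversation. The following is a minimal sketch of that idea, not Deepdesk's implementation: `ThreadStore`, `get_thread`, and the in-memory dictionary are hypothetical names standing in for whatever storage the platform actually uses.

```python
from typing import Optional

Message = dict
ThreadKey = tuple[str, str]  # (assistant code, conversation ID)


class ThreadStore:
    """Hypothetical in-memory stand-in for persisted threads."""

    def __init__(self) -> None:
        self._threads: dict[ThreadKey, list[Message]] = {}

    def get_thread(self, assistant_code: str, conversation_id: Optional[str]) -> list[Message]:
        # No conversation: hand back a fresh thread that is never stored.
        if conversation_id is None:
            return []
        # One thread per (assistant, conversation) pair, created on first use.
        return self._threads.setdefault((assistant_code, conversation_id), [])


store = ThreadStore()
first = store.get_thread("my-assistant", "575")
first.append({"role": "system", "content": "..."})
```

Calling `get_thread` again with the same assistant and conversation returns the same list, so later evaluations continue from the earlier messages; any other combination yields a separate thread.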
Structure of threads
A thread is an array of messages with different roles. Example:
[
  {
    "role": "system",
    "content": "\nYou are an assistant receiving an ongoing conversation...\n\n<instructions>\nBased on the conversation return a summary...\n</instructions>\n"
  },
  {
    "role": "user",
    "content": "<conversation>\n<transcript>\n - 76c9b7570f343b0cf3a2d5a4737115694ec7d660416f233fa1bb4d2516892a6e\n</transcript>\n<metadata>\n - Internal conversation ID: 575\n - External conversation ID: f996158c-c1d4-4725-910d-691673a840c0\n - Profile: acme\n</metadata>\n</conversation>\n<parameters>\n - metadata: {}\n - platform_agent_id: 29\n</parameters>\n"
  },
  {
    "role": "assistant",
    "audio": null,
    "content": "[ {\n \"code\": \"my-assistant\",\n \"name\": \"My Assistant\",\n \"response\": \"The agent greeted the customer...\"\n }\n]",
    "refusal": null,
    "annotations": []
  }
]
Role meanings:
- system: System prompt. Includes hardcoded instructions (depending on context) plus the current assistant’s instructions.
- user: Extra context for the evaluation: conversation transcript and metadata, parameters, and memory.
- assistant: LLM response. Can be plain content or tool calls; tool responses are also represented as assistant (or tool) messages.
Evaluation context
The system uses slightly different system prompt templates and user message content in three cases. In the templates, {instruction} is a placeholder for the assistant’s instructions.
| Context | What’s in user messages | Code reference |
|---|---|---|
| With conversation transcript and metadata | Transcript, metadata, parameters, memory | thread.py (transcript + metadata template) (lines 55–88) |
| With conversation metadata only | Metadata, parameters, memory (no transcript) | thread.py (metadata-only template) (lines 29–53) |
| Without conversation | Parameters and memory only | thread.py (no-conversation template) (lines 9–27) |
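To illustrate how the three contexts differ, here is a rough sketch that assembles the user message from whichever inputs are available. It is modeled on the example messages in this page, not on the actual templates in thread.py: `build_user_message` and `_section` are hypothetical helpers, the layout of the metadata-only case is an assumption, and omitting the memory block when memory is empty is inferred from the examples rather than confirmed.

```python
from typing import Optional


def _section(tag: str, lines: list[str]) -> str:
    # Render one XML-like block with " - " bullet lines, as in the examples.
    body = "".join(f" - {line}\n" for line in lines)
    return f"<{tag}>\n{body}</{tag}>\n"


def build_user_message(
    parameters: dict[str, object],
    memory: dict[str, object],
    metadata: Optional[dict[str, object]] = None,
    transcript: Optional[list[str]] = None,
) -> str:
    parts = []
    if metadata is not None:
        inner = ""
        # The transcript is included only in the "transcript and metadata" context.
        if transcript is not None:
            inner += _section("transcript", transcript)
        inner += _section("metadata", [f"{k}: {v}" for k, v in metadata.items()])
        parts.append(f"<conversation>\n{inner}</conversation>\n")
    parts.append(_section("parameters", [f"{k}: {v}" for k, v in parameters.items()]))
    if memory:  # assumption: no <memory> block when nothing is stored yet
        parts.append(_section("memory", [f"{k}: {v}" for k, v in memory.items()]))
    return "".join(parts)
```

Passing neither `metadata` nor `transcript` corresponds to the "without conversation" row, passing only `metadata` to the metadata-only row, and passing both to the first row.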
Anonymization of user messages
Threads are stored in the database. To avoid persisting personally identifiable information (PII), messages in the transcript are hashed before saving. At evaluation time, hashes are replaced with the real message content so the model sees readable text.
When persisted (in DB), the transcript in the user message may look like:
<transcript>
- 76c9b7570f343b0cf3a2d5a4737115694ec7d660416f233fa1bb4d2516892a6e
</transcript>
When used for evaluation (after substitution), the user message looks like:
<conversation>
<transcript>
- customer: customer message
- agent: agent message
</transcript>
<metadata>
- Internal conversation ID: 1234
- External conversation ID: 4321
- Profile: some_profile
</metadata>
</conversation>
<parameters>
- foo: bar
- lupsum: 1
</parameters>
<memory>
- key: value
- key2: value2
</memory>
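The hash-then-substitute flow can be sketched as below. This is only an illustration under stated assumptions: the page does not name the hash function (the 64-character hex value in the example is consistent with SHA-256, which is used here), and `hash_message`, `anonymize`, and `restore` are hypothetical names. The sketch also assumes each transcript line is hashed individually and that the live conversation is available at evaluation time to map hashes back to text.

```python
import hashlib


def hash_message(text: str) -> str:
    # Assumption: SHA-256 over the UTF-8 bytes of one transcript line.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def anonymize(transcript: list[str]) -> list[str]:
    # What would be stored: one hash per transcript line, no readable text.
    return [hash_message(line) for line in transcript]


def restore(hashed: list[str], live_transcript: list[str]) -> list[str]:
    # At evaluation time, rebuild readable lines by matching hashes against
    # the current conversation. Unknown hashes are left unchanged.
    lookup = {hash_message(line): line for line in live_transcript}
    return [lookup.get(h, h) for h in hashed]


live = ["customer: customer message", "agent: agent message"]
stored = anonymize(live)
readable = restore(stored, live)
```

Because a hash cannot be reversed, restoring depends on having the original messages available from the conversation itself; the stored thread alone carries no readable transcript text.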
Threadless assistants
An assistant can be configured as threadless: the system does not save a thread for it, regardless of evaluation context. Every evaluation then uses a new thread and no history is persisted. This is useful when you do not want to retain conversation history for that assistant.
For when to choose thread-based vs threadless, see Assistant threads vs threadless.
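Taken together with the persistence rules earlier on this page, the decision to save a thread reduces to two conditions. The function below is a hypothetical sketch of that rule; the name `should_persist_thread` and the `threadless` flag are illustrative and not taken from the Deepdesk codebase.

```python
from typing import Optional


def should_persist_thread(threadless: bool, conversation_id: Optional[str]) -> bool:
    # A thread is saved only for a thread-based assistant that is
    # evaluated in the context of a conversation.
    return not threadless and conversation_id is not None
```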
External reference
Example trace (with tool calls)
The following example shows a thread that includes tool calls and tool responses. The first user message has transcript and metadata; the assistant responds with two tool calls (write_to_memory, vasil-demo-ka); tool results are returned as tool role messages; then the assistant sends a final text response. A second user message follows with updated transcript and memory (including the stored demo value), after which the assistant makes the same two tool calls again and receives their results.
{
"kwargs": {
"messages": [
{"role": "system", "content": "You are an assistant receiving an ongoing conversation...\n\n<instructions>use write_to_memory tool with:\n- key: demo\n- data: demo\n\nthen call vasil_demo_ka</instructions>\n"},
{"role": "user", "content": "<conversation>\n<transcript>\n- agent: hello\n- customer: hello</transcript>\n<metadata>\n- Internal conversation ID: 814\n- External conversation ID: b587859b-c8b4-454a-b622-7d9c3fdbf91c\n- Profile: acme\n</metadata>\n</conversation>\n<parameters>\n- metadata: {}\n- platform_agent_id: 25\n</parameters>\n"},
{"role": "assistant", "content": null, "tool_calls": [{"id": "call_Z8UFWOwNJTWpNNl7gUQdqUA5", "type": "function", "function": {"name": "write_to_memory", "arguments": {"key": "demo", "data": "demo"}}}, {"id": "call_nfM2RHQXcLlsH0d4OnfBEpyT", "type": "function", "function": {"name": "vasil-demo-ka", "arguments": {}}}]},
{"role": "tool", "content": "OK, current memory: {'demo': 'demo'}", "tool_call_id": "call_Z8UFWOwNJTWpNNl7gUQdqUA5"},
{"role": "tool", "content": "Hello! How can I assist you today?", "tool_call_id": "call_nfM2RHQXcLlsH0d4OnfBEpyT"},
{"role": "assistant", "content": "The memory has been updated with the key \"demo\" and the value \"demo\". The agent is now ready to assist the customer."},
{"role": "user", "content": "<conversation>\n<transcript>\n- agent: how can I help\n</transcript>\n<metadata>\n- Internal conversation ID: 814\n- External conversation ID: b587859b-c8b4-454a-b622-7d9c3fdbf91c\n- Profile: acme\n</metadata>\n</conversation>\n<parameters>\n- metadata: {}\n- platform_agent_id: 25\n</parameters>\n<memory>\n- demo: demo\n</memory>\n"},
{"role": "assistant", "tool_calls": [{"id": "call_BkxGNnwO2VC9LsabYXCINkgT", "function": {"name": "write_to_memory", "arguments": {"key": "demo", "data": "demo"}}}, {"id": "call_BhH5bJoikcqKr5Vrm55yeRbu", "function": {"name": "vasil-demo-ka", "arguments": {}}}]},
{"role": "tool", "content": "OK, current memory: {'demo': 'demo'}", "tool_call_id": "call_BkxGNnwO2VC9LsabYXCINkgT"},
{"role": "tool", "content": "Hello again! What would you like to discuss or ask about today?", "tool_call_id": "call_BhH5bJoikcqKr5Vrm55yeRbu"}
],
"model": "gpt-4o-mini",
"temperature": 0,
"top_p": 1,
"tools": [...]
}
}
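When reading a trace like this, each tool message is linked to the call that produced it through tool_call_id. The helper below is a small sketch of how the two can be matched up for inspection; `pair_tool_calls` is a hypothetical name, and the shortened call IDs in the sample data are made up for readability.

```python
from typing import Optional


def pair_tool_calls(messages: list[dict]) -> list[tuple[str, Optional[str]]]:
    # Index tool results by the ID of the call they answer.
    results = {
        m["tool_call_id"]: m["content"]
        for m in messages
        if m.get("role") == "tool"
    }
    pairs = []
    for m in messages:
        if m.get("role") != "assistant":
            continue
        # Assistant messages without tool calls contribute nothing.
        for call in m.get("tool_calls") or []:
            pairs.append((call["function"]["name"], results.get(call["id"])))
    return pairs


messages = [
    {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "write_to_memory", "arguments": {"key": "demo", "data": "demo"}}},
        {"id": "call_2", "type": "function",
         "function": {"name": "vasil-demo-ka", "arguments": {}}},
    ]},
    {"role": "tool", "content": "OK, current memory: {'demo': 'demo'}", "tool_call_id": "call_1"},
    {"role": "tool", "content": "Hello! How can I assist you today?", "tool_call_id": "call_2"},
    {"role": "assistant", "content": "The memory has been updated."},
]
pairs = pair_tool_calls(messages)
```

A call with no matching tool message is paired with `None`, which makes incomplete rounds easy to spot.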
See also
- Assistant threads vs threadless — When to use thread-based vs threadless
- How Deepdesk constructs assistant prompts — Prompt construction and structure