# Experiments

## Overview
The experiment system allows testing and evaluating assistant performance using predefined conversation data. It consists of the Experiment model and the run_experiment task that processes experiments asynchronously.
## Experiment Model
The Experiment model represents a single experiment instance with the following fields:
| Field | Description |
|---|---|
| assistant | Foreign key to the Assistant being tested |
| code | Unique UUID identifier for the experiment |
| name | Optional human-readable name |
| state | Current state: Created, Scheduled, Running, Completed, Failed |
| experiment_type | Type of experiment to run (see below) |
| input_file | CSV file containing conversation data |
| output_file | Generated output file (created after completion) |
| feedback | Error message if the experiment fails |
## Experiment Types

### 1. Conversation new message
Tests how the assistant responds to new messages in a conversation. The assistant is evaluated for each message in each conversation; messages and their metadata accumulate as the conversation progresses, while parameters do not.
### 2. Conversation accepted
Tests assistant behavior when a conversation is accepted. Evaluates the assistant only for the first message, with its metadata and parameters.
### 3. Conversation ended
Tests assistant behavior at the end of conversations. Evaluates the assistant only for the last message in a conversation, including messages and metadata from the whole conversation and the parameters for the last message.
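The three experiment types differ only in which messages the assistant is evaluated for. A minimal sketch of that selection logic, using illustrative names (the actual `run_experiment` task implementation may differ):

```python
# Illustrative sketch of which messages each experiment type evaluates.
# The function and type names are hypothetical, not the real implementation.

def evaluated_messages(experiment_type, conversation):
    """Return the messages the assistant is evaluated for.

    conversation: list of message dicts in chronological order.
    """
    if experiment_type == "conversation_new_message":
        # Every message is evaluated; conversation history and metadata
        # accumulate along the way (parameters do not accumulate).
        return list(conversation)
    if experiment_type == "conversation_accepted":
        # Only the first message is evaluated.
        return conversation[:1]
    if experiment_type == "conversation_ended":
        # Only the last message is evaluated, with the whole
        # conversation's messages and metadata as context.
        return conversation[-1:]
    raise ValueError(f"Unknown experiment type: {experiment_type}")
```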
## Input File Format
The input file must be a CSV containing conversation data. Each row is a single message in a conversation; there can be multiple conversations with multiple messages.
Make sure to properly escape characters in CSV and JSON fields.
### Example

```csv
conversation_id,role,message,metadata,parameters
1,visitor,"Hello there","{""key"": ""value""}","{""param"": ""value""}"
1,agent,"Hi! How can I help?","{""key"": ""value""}","{""param"": ""value""}"
2,visitor,"Another conversation","{""key"": ""value""}","{""param"": ""value""}"
```
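One way to avoid escaping mistakes is to let a CSV library handle the quoting and serialize the JSON fields programmatically. A sketch using Python's standard library (the file name is an example):

```python
import csv
import json

# Build the input file programmatically so CSV quoting and JSON
# serialization are handled by the libraries rather than by hand.
rows = [
    ("1", "visitor", "Hello there", {"key": "value"}, {"param": "value"}),
    ("1", "agent", "Hi! How can I help?", {"key": "value"}, {"param": "value"}),
    ("2", "visitor", "Another conversation", {"key": "value"}, {"param": "value"}),
]

with open("experiment_input.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["conversation_id", "role", "message", "metadata", "parameters"])
    for conversation_id, role, message, metadata, parameters in rows:
        writer.writerow([
            conversation_id,
            role,
            message,
            json.dumps(metadata),    # stringified JSON, quoted by csv as needed
            json.dumps(parameters),
        ])
```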
### Required columns
| Column | Description |
|---|---|
| conversation_id | Unique identifier for each conversation |
| role | Message sender role (e.g. visitor, agent) |
| message | The actual message text |
### Optional columns
| Column | Description |
|---|---|
| metadata | Stringified JSON object with additional conversation metadata |
| parameters | Stringified JSON object with assistant parameters |
### CSV format requirements

- Encoding: UTF-8
- Headers: must match the column names exactly
- Separator: `,` (comma)
- JSON fields: must be valid JSON strings; empty JSON fields are treated as `null`
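These requirements can be checked before uploading. A minimal pre-upload validator sketch (function name and messages are illustrative; server-side validation may differ):

```python
import csv
import json

REQUIRED = ["conversation_id", "role", "message"]
OPTIONAL = ["metadata", "parameters"]

def validate_input_csv(path):
    """Return a list of problems found in an experiment input file.

    A hypothetical client-side check, not the server's validation.
    """
    problems = []
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        for column in REQUIRED:
            if column not in header:
                problems.append(f"missing required column: {column}")
        for line_number, row in enumerate(reader, start=2):
            for column in OPTIONAL:
                value = row.get(column)
                if value:  # empty JSON fields are treated as null
                    try:
                        json.loads(value)
                    except json.JSONDecodeError:
                        problems.append(
                            f"line {line_number}: invalid JSON in {column!r}"
                        )
    return problems
```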
## Output Format
The experiment generates a CSV file with these columns:
| Column | Description |
|---|---|
| conversation_id | Original conversation identifier |
| role | Message role |
| message | Original message text |
| metadata | Original metadata (if any) |
| parameters | Original parameters (if any) |
| output | Assistant's response (only for the evaluated message) |
| row_id | Unique identifier for the experiment row |
The output contains all original conversation data plus the assistant's response for the message that was evaluated.
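Since only evaluated messages carry a non-empty `output`, downstream analysis usually starts by filtering for them. A small sketch, assuming the output file has been downloaded locally (the file name and function name are illustrative):

```python
import csv

def collect_responses(path):
    """Map row_id -> assistant output for rows that were evaluated.

    Rows with an empty output column (messages that served only as
    context) are skipped.
    """
    responses = {}
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f):
            if row.get("output"):
                responses[row["row_id"]] = row["output"]
    return responses
```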
## How to Run an Experiment

The endpoint for starting a run is documented in the API docs (Swagger). Use the Swagger instance for the account where the experiment exists; substitute your account's API docs URL if it differs (e.g. production or another environment).
## See also

- Assistants – Overview of assistants
- Assistants Developer Guide – APIs and integration