Experiments

Overview

The experiment system allows testing and evaluating assistant performance using predefined conversation data. It consists of the Experiment model and the run_experiment task that processes experiments asynchronously.

Experiment Model

The Experiment model represents a single experiment instance with the following fields:

| Field | Description |
| --- | --- |
| assistant | Foreign key to the Assistant being tested |
| code | Unique UUID identifier for the experiment |
| name | Optional human-readable name |
| state | Current state: Created, Scheduled, Running, Completed, Failed |
| experiment_type | Type of experiment to run (see below) |
| input_file | CSV file containing conversation data |
| output_file | Generated output file (created after completion) |
| feedback | Error message if the experiment fails |
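For orientation, the model can be sketched as a plain Python dataclass. The field names come from the table above; the types, the `State` enum values, and the `assistant_id` representation of the foreign key are assumptions for illustration, not the actual ORM definition:

```python
import enum
import uuid
from dataclasses import dataclass, field
from typing import Optional


class State(enum.Enum):
    CREATED = "Created"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"


@dataclass
class Experiment:
    assistant_id: int                       # stands in for the Assistant foreign key
    experiment_type: str                    # one of the experiment types below
    input_file: str                         # path to the CSV with conversation data
    code: uuid.UUID = field(default_factory=uuid.uuid4)  # unique experiment identifier
    name: Optional[str] = None              # optional human-readable name
    state: State = State.CREATED            # lifecycle state
    output_file: Optional[str] = None       # set after completion
    feedback: Optional[str] = None          # error message if the run fails
```

A new experiment starts in the Created state and only gains an `output_file` (or `feedback`) once `run_experiment` has processed it.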

Experiment Types

1. Conversation new message

Tests how the assistant responds to new messages in a conversation. The assistant is evaluated for each message in each conversation. For each evaluated message, the preceding messages in the conversation are accumulated along with their metadata; parameters are not accumulated.

2. Conversation accepted

Tests assistant behavior when a conversation is accepted. The assistant is evaluated only for the first message, with its metadata and parameters.

3. Conversation ended

Tests assistant behavior at the end of a conversation. The assistant is evaluated only for the last message, with the messages and metadata of the whole conversation and the parameters of the last message.
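The three types differ only in which messages of a conversation the assistant is evaluated for. A minimal sketch (the function and the type keys are illustrative, not part of the actual API):

```python
def evaluated_indices(messages, experiment_type):
    """Return the indices of messages the assistant is evaluated for.

    messages: list of messages in conversation order.
    experiment_type: one of "new_message", "accepted", "ended" (illustrative keys).
    """
    if experiment_type == "new_message":
        return list(range(len(messages)))   # every message, with accumulated history
    if experiment_type == "accepted":
        return [0]                          # only the first message
    if experiment_type == "ended":
        return [len(messages) - 1]          # only the last message
    raise ValueError(f"unknown experiment type: {experiment_type}")
```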

Input File Format

The input file must be a CSV containing conversation data. Each row is a single message; the file can contain multiple conversations, each with multiple messages.

Info: Make sure to properly escape quotes and other special characters in CSV and JSON fields.

Example

```csv
conversation_id,role,message,metadata,parameters
1,visitor,"Hello there","{""key"": ""value""}","{""param"": ""value""}"
1,agent,"Hi! How can I help?","{""key"": ""value""}","{""param"": ""value""}"
2,visitor,"Another conversation","{""key"": ""value""}","{""param"": ""value""}"
```
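Rather than escaping quotes by hand, the input file can be generated with the standard `csv` and `json` modules, which handle quoting automatically. A sketch, using the column names from the format above:

```python
import csv
import io
import json

rows = [
    (1, "visitor", "Hello there", {"key": "value"}, {"param": "value"}),
    (1, "agent", "Hi! How can I help?", {"key": "value"}, {"param": "value"}),
    (2, "visitor", "Another conversation", {"key": "value"}, {"param": "value"}),
]

buf = io.StringIO()  # write to a real file in practice; a buffer keeps the sketch self-contained
writer = csv.writer(buf)
writer.writerow(["conversation_id", "role", "message", "metadata", "parameters"])
for conversation_id, role, message, metadata, parameters in rows:
    # json.dumps produces the stringified JSON; csv.writer quotes and escapes it
    writer.writerow([conversation_id, role, message,
                     json.dumps(metadata), json.dumps(parameters)])

print(buf.getvalue())
```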

Required columns

| Column | Description |
| --- | --- |
| conversation_id | Unique identifier for each conversation |
| role | Message sender role (e.g. visitor, agent) |
| message | The actual message text |

Optional columns

| Column | Description |
| --- | --- |
| metadata | Stringified JSON object with additional conversation metadata |
| parameters | Stringified JSON object with assistant parameters |

CSV format requirements

  • Encoding: UTF-8
  • Headers: must match the column names above exactly
  • Separator: , (comma)
  • JSON fields: must be valid JSON strings; empty JSON fields are treated as null
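These requirements can be checked before uploading. A sketch with the standard library (the function name and error messages are illustrative):

```python
import csv
import json

REQUIRED = ["conversation_id", "role", "message"]
OPTIONAL = ["metadata", "parameters"]


def validate_rows(path):
    """Yield parsed rows; raise ValueError on missing headers or invalid JSON."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"missing required columns: {missing}")
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            for col in OPTIONAL:
                value = row.get(col)
                if value:
                    try:
                        row[col] = json.loads(value)
                    except json.JSONDecodeError as exc:
                        raise ValueError(f"line {line_no}, column {col}: {exc}")
                else:
                    row[col] = None  # empty JSON fields are treated as null
            yield row
```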

Output Format

The experiment generates a CSV file with these columns:

| Column | Description |
| --- | --- |
| conversation_id | Original conversation identifier |
| role | Message role |
| message | Original message text |
| metadata | Original metadata (if any) |
| parameters | Original parameters (if any) |
| output | Assistant's response (only for the evaluated message) |
| row_id | Unique identifier for the experiment row |

The output contains all original conversation data plus the assistant's response for the message that was evaluated.
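For instance, the assistant's responses can be pulled out of the output file by filtering on the output column, since only evaluated rows carry a value there (a sketch; the layout follows the table above):

```python
import csv


def assistant_responses(path):
    """Map row_id -> assistant output, keeping only the evaluated rows."""
    responses = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("output"):  # non-evaluated rows have an empty output field
                responses[row["row_id"]] = row["output"]
    return responses
```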

How to Run an Experiment

The endpoint for starting a run is documented in the API docs (Swagger). Use the Swagger instance for the account where the experiment exists; replace the URL with your account's API docs URL if different (e.g. production or another environment).
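As a starting point, a run request might be constructed like this. The endpoint path, base URL, and payload below are placeholders, not the actual API; check your Swagger instance for the real route, authentication, and body:

```python
import json
import urllib.request

# Placeholders: replace with the values from your account's Swagger docs.
API_BASE = "https://example.com/api"                       # hypothetical base URL
experiment_code = "00000000-0000-0000-0000-000000000000"   # the experiment's UUID code

req = urllib.request.Request(
    f"{API_BASE}/experiments/{experiment_code}/run",       # hypothetical endpoint path
    data=json.dumps({}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would submit the request; it is not executed here.
print(req.get_method(), req.full_url)
```

Once accepted, the experiment moves through Scheduled and Running, and the output file appears when the state reaches Completed.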
