Experiments

Overview

The experiment system allows testing and evaluating assistant performance using predefined conversation data. It consists of the Experiment model and the run_experiment task that processes experiments asynchronously.

Experiment Model

The Experiment model represents a single experiment instance with the following fields:

| Field | Description |
| --- | --- |
| assistant | Foreign key to the Assistant being tested |
| code | Unique UUID identifier for the experiment |
| name | Optional human-readable name |
| state | Current state: Created, Scheduled, Running, Completed, Failed |
| experiment_type | Type of experiment to run (see below) |
| input_file | CSV file containing conversation data |
| output_file | Generated output file (created after completion) |
| feedback | Error message if the experiment fails |
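For orientation, the model can be sketched as a plain Python dataclass. The field names come from the table above; the types, the `State` enum values, and the `assistant_id` representation of the foreign key are assumptions for illustration, not the actual ORM definition:

```python
import enum
import uuid
from dataclasses import dataclass, field
from typing import Optional


class State(enum.Enum):
    CREATED = "Created"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"


@dataclass
class Experiment:
    assistant_id: int                       # stands in for the Assistant foreign key
    experiment_type: str                    # one of the experiment types below
    input_file: str                         # path to the CSV with conversation data
    code: uuid.UUID = field(default_factory=uuid.uuid4)  # unique experiment identifier
    name: Optional[str] = None              # optional human-readable name
    state: State = State.CREATED            # lifecycle state
    output_file: Optional[str] = None       # set after completion
    feedback: Optional[str] = None          # error message if the run fails
```

A new experiment starts in the Created state and only gains an `output_file` (or `feedback`) once `run_experiment` has processed it.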

Experiment Types

1. Conversation new message

Tests how the assistant responds to new messages in a conversation. The assistant is evaluated for each message in each conversation. For each evaluated message, the preceding messages in the conversation are accumulated along with their metadata; parameters are not accumulated.

2. Conversation accepted

Tests assistant behavior when a conversation is accepted. The assistant is evaluated only for the first message, with its metadata and parameters.

3. Conversation ended

Tests assistant behavior at the end of a conversation. The assistant is evaluated only for the last message, with the messages and metadata of the whole conversation and the parameters of the last message.
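The three types differ only in which messages of a conversation the assistant is evaluated for. A minimal sketch (the function and the type keys are illustrative, not part of the actual API):

```python
def evaluated_indices(messages, experiment_type):
    """Return the indices of messages the assistant is evaluated for.

    messages: list of messages in conversation order.
    experiment_type: one of "new_message", "accepted", "ended" (illustrative keys).
    """
    if experiment_type == "new_message":
        return list(range(len(messages)))   # every message, with accumulated history
    if experiment_type == "accepted":
        return [0]                          # only the first message
    if experiment_type == "ended":
        return [len(messages) - 1]          # only the last message
    raise ValueError(f"unknown experiment type: {experiment_type}")
```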

Input File Format

The input file must be a CSV containing conversation data. Each row is a single message; the file can contain multiple conversations, each with multiple messages.

Info: Make sure to properly escape quotes and other special characters in CSV and JSON fields.

Example

```csv
conversation_id,role,message,metadata,parameters
1,visitor,"Hello there","{""key"": ""value""}","{""param"": ""value""}"
1,agent,"Hi! How can I help?","{""key"": ""value""}","{""param"": ""value""}"
2,visitor,"Another conversation","{""key"": ""value""}","{""param"": ""value""}"
```
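Rather than escaping quotes by hand, the input file can be generated with the standard `csv` and `json` modules, which handle quoting automatically. A sketch, using the column names from the format above:

```python
import csv
import io
import json

rows = [
    (1, "visitor", "Hello there", {"key": "value"}, {"param": "value"}),
    (1, "agent", "Hi! How can I help?", {"key": "value"}, {"param": "value"}),
    (2, "visitor", "Another conversation", {"key": "value"}, {"param": "value"}),
]

buf = io.StringIO()  # write to a real file in practice; a buffer keeps the sketch self-contained
writer = csv.writer(buf)
writer.writerow(["conversation_id", "role", "message", "metadata", "parameters"])
for conversation_id, role, message, metadata, parameters in rows:
    # json.dumps produces the stringified JSON; csv.writer quotes and escapes it
    writer.writerow([conversation_id, role, message,
                     json.dumps(metadata), json.dumps(parameters)])

print(buf.getvalue())
```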

Required columns

| Column | Description |
| --- | --- |
| conversation_id | Unique identifier for each conversation |
| role | Message sender role (e.g. visitor, agent) |
| message | The actual message text |

Optional columns

| Column | Description |
| --- | --- |
| metadata | Stringified JSON object with additional conversation metadata |
| parameters | Stringified JSON object with assistant parameters |

CSV format requirements

  • Encoding: UTF-8
  • Headers: must match the column names above exactly
  • Separator: , (comma)
  • JSON fields: must be valid JSON strings; empty JSON fields are treated as null
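These requirements can be checked before uploading. A sketch with the standard library (the function name and error messages are illustrative):

```python
import csv
import json

REQUIRED = ["conversation_id", "role", "message"]
OPTIONAL = ["metadata", "parameters"]


def validate_rows(path):
    """Yield parsed rows; raise ValueError on missing headers or invalid JSON."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
        if missing:
            raise ValueError(f"missing required columns: {missing}")
        for line_no, row in enumerate(reader, start=2):  # line 1 is the header
            for col in OPTIONAL:
                value = row.get(col)
                if value:
                    try:
                        row[col] = json.loads(value)
                    except json.JSONDecodeError as exc:
                        raise ValueError(f"line {line_no}, column {col}: {exc}")
                else:
                    row[col] = None  # empty JSON fields are treated as null
            yield row
```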

Output Format

The experiment generates a CSV file with these columns:

| Column | Description |
| --- | --- |
| conversation_id | Original conversation identifier |
| role | Message role |
| message | Original message text |
| metadata | Original metadata (if any) |
| parameters | Original parameters (if any) |
| output | Assistant's response (only for the evaluated message) |
| row_id | Unique identifier for the experiment row |

The output contains all original conversation data plus the assistant's response for the message that was evaluated.
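For instance, the assistant's responses can be pulled out of the output file by filtering on the output column, since only evaluated rows carry a value there (a sketch; the layout follows the table above):

```python
import csv


def assistant_responses(path):
    """Map row_id -> assistant output, keeping only the evaluated rows."""
    responses = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("output"):  # non-evaluated rows have an empty output field
                responses[row["row_id"]] = row["output"]
    return responses
```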

How to Run an Experiment

The endpoint for starting a run is documented in the API docs (Swagger). Use the Swagger instance for the account where the experiment exists; replace the URL with your account's API docs URL if different (e.g. production or another environment).
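As a starting point, a run request might be constructed like this. The endpoint path, base URL, and payload below are placeholders, not the actual API; check your Swagger instance for the real route, authentication, and body:

```python
import json
import urllib.request

# Placeholders: replace with the values from your account's Swagger docs.
API_BASE = "https://example.com/api"                       # hypothetical base URL
experiment_code = "00000000-0000-0000-0000-000000000000"   # the experiment's UUID code

req = urllib.request.Request(
    f"{API_BASE}/experiments/{experiment_code}/run",       # hypothetical endpoint path
    data=json.dumps({}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would submit the request; it is not executed here.
print(req.get_method(), req.full_url)
```

Once accepted, the experiment moves through Scheduled and Running, and the output file appears when the state reaches Completed.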
