Pipeline Config

The pipeline config is a YAML file that defines the steps in your evaluation. Each step is a named judge with a prompt, output format, and optional dependencies.

Top-level structure

steps:
  - name: string          # Required. Unique step identifier
    prompt: string        # Required (or prompt_file / prompt_xml)
    output_format: json   # Required. Always "json"
    depends_on:           # Optional. List of step names to wait for
      - string
    on_failure: string    # Optional. Step to run if this one fails
    variables:            # Optional. Step-level variable overrides
      key: value
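
The field rules above can be checked before a pipeline runs. The following is a minimal sketch, not part of the library: `validate_pipeline` is a hypothetical helper, and it assumes the YAML has already been parsed into a plain Python dict with a top-level `steps` list.

```python
# Hypothetical pre-flight check for a parsed pipeline config (not a library API).
PROMPT_KEYS = ("prompt", "prompt_file", "prompt_xml")


def validate_pipeline(config):
    """Return a list of problems; an empty list means the config looks well-formed."""
    problems = []
    steps = config.get("steps") or []
    names = [step.get("name") for step in steps]

    for step in steps:
        name = step.get("name")
        label = name or "<unnamed>"
        if not name:
            problems.append("step is missing 'name'")
        elif names.count(name) > 1:
            problems.append(f"{label}: duplicate step name")

        # A step needs at least one prompt source.
        if not any(key in step for key in PROMPT_KEYS):
            problems.append(f"{label}: needs prompt, prompt_file, or prompt_xml")

        if step.get("output_format") != "json":
            problems.append(f"{label}: output_format must be 'json'")

        # depends_on and on_failure must point at steps that exist.
        for dep in step.get("depends_on", []):
            if dep not in names:
                problems.append(f"{label}: depends_on references unknown step '{dep}'")
        target = step.get("on_failure")
        if target is not None and target not in names:
            problems.append(f"{label}: on_failure references unknown step '{target}'")

    return problems
```

Calling `validate_pipeline(parsed_config)` on a well-formed config returns an empty list; each string in a non-empty result names the offending step.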

Prompt sources

Inline prompt

steps:
  - name: relevance
    prompt: |
      Evaluate whether this response is relevant to the question.
 
      Question: {{input.user_message}}
      Response: {{input.assistant_message}}
 
      Return SUCCESS if relevant, ERROR if not.
      Include a root_cause explaining your decision.
    output_format: json

External file

steps:
  - name: safety
    prompt_file: ./prompts/safety-check.txt
    output_format: json

XML structured prompt

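For multi-section prompts, use the structured XML format. Note that output_format appears twice in this example: nested under prompt_xml it is prompt text describing the expected fields, and at the step level it is the required json setting.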

steps:
  - name: quality
    prompt_xml:
      role: You are an expert evaluator of AI assistant responses.
      task: Assess the quality of the response below.
      context: |
        The assistant is a customer support agent for a SaaS product.
        Responses should be helpful, accurate, and professional.
      input_format: |
        Conversation: {{input}}
      output_format: |
        Return a JSON object with:
        - status: "SUCCESS" or "ERROR"
        - root_cause: brief explanation
        - thinking: your reasoning process
    output_format: json

Template variables

Inside any prompt, use double-brace syntax:

Variable	Description
`{{input}}`	The entire input passed to `judge.run()`
`{{input.field}}`	A specific field from a JSON input
`{{variables.name}}`	A variable from `judge.run(transcript, variables)`
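
To make the substitution rules in the table concrete, here is a rough sketch of how double-brace placeholders could be resolved. It is an illustration only: `render` is a hypothetical function, and two behaviours are assumptions rather than documented facts, namely that non-string values are serialized as JSON and that unknown placeholders are left untouched.

```python
import json
import re

# Matches {{input}}, {{input.field}}, {{variables.name}}, with optional inner spaces.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def render(template, input_value, variables=None):
    """Replace double-brace placeholders in a prompt template (hypothetical helper)."""
    scope = {"input": input_value, "variables": variables or {}}

    def lookup(match):
        value = scope
        for part in match.group(1).split("."):
            if not isinstance(value, dict) or part not in value:
                return match.group(0)  # assumption: unknown placeholders stay as-is
            value = value[part]
        # Assumption: non-string values are inserted as JSON text.
        return value if isinstance(value, str) else json.dumps(value)

    return PLACEHOLDER.sub(lookup, template)


prompt = render(
    "Question: {{input.question}} (tone: {{variables.tone}})",
    {"question": "How do I reset my password?"},
    {"tone": "formal"},
)
```

Here `prompt` becomes `Question: How do I reset my password? (tone: formal)`.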

Multi-step pipeline

steps:
  - name: relevance
    prompt: |
      Is the response relevant to the question?
      Question: {{input.question}}
      Response: {{input.response}}
    output_format: json
 
  - name: safety
    prompt: |
      Does the response contain harmful content?
      Response: {{input.response}}
    output_format: json
 
  - name: quality
    depends_on:
      - relevance
      - safety
    prompt: |
      Given that relevance and safety checks passed,
      evaluate the overall quality of the response.
      Response: {{input.response}}
    output_format: json

Info

Steps without depends_on start immediately and run in parallel. A step with dependencies waits until every step listed in its depends_on has completed.
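
The scheduling rule above can be pictured as grouping steps into waves. The sketch below is a simplification under stated assumptions: `execution_waves` is a hypothetical helper operating on the parsed `steps` list, and a real scheduler may start each step as soon as its own dependencies finish rather than waiting for a whole wave.

```python
def execution_waves(steps):
    """Group step names into waves; all steps in one wave can run in parallel."""
    remaining = {step["name"]: set(step.get("depends_on", [])) for step in steps}
    done, waves = set(), []

    while remaining:
        # A step is ready once every dependency has completed.
        ready = [name for name, deps in remaining.items() if deps <= done]
        if not ready:
            raise ValueError(f"cycle or unknown dependency among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]

    return waves


pipeline = [
    {"name": "relevance"},
    {"name": "safety"},
    {"name": "quality", "depends_on": ["relevance", "safety"]},
]
waves = execution_waves(pipeline)
```

For the three-step pipeline shown earlier, `waves` is `[["relevance", "safety"], ["quality"]]`: the first two steps have no dependencies, and quality waits for both.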

Failure handling

Use on_failure to run a fallback step when a step produces ERROR:

steps:
  - name: primary_eval
    prompt: ...
    output_format: json
    on_failure: fallback_eval
 
  - name: fallback_eval
    prompt: |
      The primary evaluation failed. Provide a simplified assessment.
      ...
    output_format: json
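
The fallback behaviour can be sketched as a small loop. This is an illustration, not the library's implementation: `run_with_fallback` and the `run_step` callable are hypothetical, and the sketch assumes each step result is a dict with a `status` field and that a fallback step may itself declare on_failure.

```python
def run_with_fallback(steps, step_name, run_step):
    """Run a step; while the result is ERROR and on_failure is set, run the fallback.

    Returns a list of (step name, result) pairs in execution order.
    """
    by_name = {step["name"]: step for step in steps}
    executed, seen = [], set()
    current = step_name

    # 'seen' guards against two steps naming each other as fallbacks.
    while current is not None and current not in seen:
        seen.add(current)
        step = by_name[current]
        result = run_step(step)  # hypothetical callable that evaluates one step
        executed.append((current, result))
        failed = result.get("status") == "ERROR"
        current = step.get("on_failure") if failed else None

    return executed


def stub_run_step(step):
    """Stand-in for a real judge call: the primary step always reports ERROR."""
    status = "ERROR" if step["name"] == "primary_eval" else "SUCCESS"
    return {"status": status}


config_steps = [
    {"name": "primary_eval", "on_failure": "fallback_eval"},
    {"name": "fallback_eval"},
]
history = run_with_fallback(config_steps, "primary_eval", stub_run_step)
```

With the stub above, `history` contains primary_eval with status ERROR followed by fallback_eval with status SUCCESS.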
