KyroJudge Overview
The core evaluation engine. Define LLM-powered judges in YAML, run them as a DAG, and get structured pass/fail results.
TypeScript implementation of Kyro – The programmable evaluation layer for LLM applications
Table of Contents
- /judge
Overview
Kyro is a programmable evaluation framework that lets you define complex multi-agent judging pipelines using YAML configuration files. Perfect for testing AI applications, evaluating LLM outputs, and ensuring quality in production.
Key Features
- Declarative YAML Configuration – Define evaluation pipelines without writing code
- DAG-Based Orchestration – Automatic dependency resolution and parallel execution
- Multi-Provider Support – Works with Gemini, OpenAI, Azure OpenAI, Ollama
- Structured Prompts – Build XML-formatted prompts with automatic multiline formatting
- Runtime Variables – Override variable defaults at evaluation time
- Type-Safe – Full TypeScript support with strict validation
- Test Framework Integration – Works seamlessly with Jest and Vitest
Installation
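For example (the published package name here is an assumption — check the repository for the actual name):

```bash
npm install @kyro/judge
```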
Or install the unified entry point which re-exports everything:
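For instance (again, a hypothetical package name):

```bash
npm install kyro
```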
Quick Start
1. Create a Configuration
Create a file kyro.config.yml:
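A minimal configuration might look like this (judge and step names are illustrative):

```yaml
version: "1.0"

judges:
  TONE_CHECK:
    prompt: "Evaluate whether the response is polite and professional. Answer PASS or FAIL with a short justification."

pipeline:
  - id: tone
    judge: TONE_CHECK
```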
2. Initialize Judge
3. Run Evaluation
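A sketch of steps 2 and 3 together. The import path, the `ProviderFactory` config fields, and the `passed` result field are assumptions based on this README, not verified API:

```typescript
import { Judge, ProviderFactory } from "@kyro/judge";

// 2. Initialize a judge with a provider and the YAML config
const provider = ProviderFactory.create({
  provider: "openai",
  model: "gpt-4o",
  apiKey: process.env.OPENAI_API_KEY,
});
const judge = new Judge("kyro.config.yml", provider);

// 3. Run the evaluation pipeline on an input
const result = await judge.run("Hello! Thanks for contacting support.");
console.log(result.passed);
```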
Configuration
Configuration File Structure
A Kyro configuration has three required top-level fields:
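Based on the sections below, these are presumably `version`, `judges`, and `pipeline`:

```yaml
version: "1.0"   # string or number
judges:          # named judge definitions (UPPERCASE_WITH_UNDERSCORES)
  MY_JUDGE:
    prompt: "Evaluate the input."
pipeline:        # ordered steps forming the DAG
  - id: step-1
    judge: MY_JUDGE
```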
Judges
Judges are AI evaluators that analyze your input. Each judge has:
- `prompt` – The evaluation instruction (inline, file, or structured)
- `variables` – Optional variables for dynamic prompts
Inline Prompts
Simple string prompts:
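For example:

```yaml
judges:
  GRAMMAR_CHECK:
    prompt: "Check the input for grammatical errors. Answer PASS or FAIL."
```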
File-Based Prompts
Reference external prompt files:
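A hedged sketch — the exact key for referencing a file is an assumption:

```yaml
judges:
  GRAMMAR_CHECK:
    prompt:
      file: ./prompts/grammar-check.md   # hypothetical key name
```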
The file path is relative to the configuration file directory.
Structured Prompts
Build XML-formatted prompts with multiple sections:
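For example (section names and content are illustrative, and the exact YAML shape is an assumption):

```yaml
judges:
  QUALITY_CHECK:
    prompt:
      role: You are a strict quality reviewer.
      criteria: |
        Accuracy of the answer
        Politeness of the tone
```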
Generated Output:
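For instance, a structured prompt with a single-line `role` section and a multiline `criteria` section (illustrative content) would render as:

```xml
<role>You are a strict quality reviewer.</role>
<criteria>
  Accuracy of the answer
  Politeness of the tone
</criteria>
```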
Key Features:
- Multiline content is automatically indented (2 spaces per line)
- Single-line content stays on one line
- Variables are interpolated before formatting
- Each section becomes an XML tag
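The formatting rules above can be sketched as a small self-contained function. `formatSections` is a hypothetical helper written for illustration, not the library's actual export:

```typescript
// Each key becomes an XML tag; multiline content is indented 2 spaces per
// line, single-line content stays inline.
function formatSections(sections: Record<string, string>): string {
  return Object.entries(sections)
    .map(([tag, content]) => {
      if (content.includes("\n")) {
        // Multiline content: every line indented by 2 spaces inside the tag.
        const indented = content
          .split("\n")
          .map((line) => "  " + line)
          .join("\n");
        return `<${tag}>\n${indented}\n</${tag}>`;
      }
      // Single-line content stays on one line.
      return `<${tag}>${content}</${tag}>`;
    })
    .join("\n");
}

const prompt = formatSections({
  role: "You are a strict grader.",
  criteria: "Accuracy\nTone",
});
// <role>You are a strict grader.</role>
// <criteria>
//   Accuracy
//   Tone
// </criteria>
```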
Variables
Each variable in a judge is a definition object with a type, an optional default, and an optional required flag:
Variable definition fields:
| Field | Required | Description |
|---|---|---|
| `type` | Yes | `string`, `number`, or `boolean` |
| `default` | No | Value used when no runtime value is provided |
| `required` | No | If `true` and there is no runtime value or default, throws at evaluation time |
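For example:

```yaml
judges:
  TONE_CHECK:
    prompt: "Evaluate the ${style} tone of: ${text}"
    variables:
      style:
        type: string
        default: professional
      text:
        type: string
        required: true
```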
Interpolation:
- Variables use `${variableName}` syntax
- Undefined variables are replaced with empty strings
- Works in all prompt types (inline, file, structured)
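The interpolation rules can be sketched as follows. `interpolate` is a hypothetical helper for illustration, not the library's actual export:

```typescript
type VariableValue = string | number | boolean;

// Substitute ${name} placeholders; undefined variables become empty strings.
function interpolate(
  template: string,
  variables: Record<string, VariableValue>
): string {
  return template.replace(/\$\{(\w+)\}/g, (_m: string, name: string) =>
    name in variables ? String(variables[name]) : ""
  );
}

console.log(interpolate("Grade this ${kind}: ${text}", { kind: "email" }));
// → "Grade this email: "
```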
Runtime Variables
Pass values at evaluation time via run() to override defaults:
Variables not defined in the judge's variables block but present in the runtime object are still interpolated if the prompt references them:
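A sketch (assumed API: the second argument to `run()` supplies runtime values):

```typescript
const result = await judge.run("input text", {
  style: "casual",        // overrides a declared variable's default
  customer: "Acme Corp",  // not declared in `variables`, but still
                          // interpolated wherever the prompt uses ${customer}
});
```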
Variable Precedence:
1. Runtime value (highest) – passed to `run()`
2. `default` from the variable definition
3. Empty string (if the variable is not required and has no value)
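The precedence rules above can be sketched as a self-contained resolver (assumed behavior, not the library's actual code):

```typescript
type VariableValue = string | number | boolean;

interface VariableDefinition {
  type: "string" | "number" | "boolean";
  default?: VariableValue;
  required?: boolean;
}

function resolveVariable(
  name: string,
  def: VariableDefinition,
  runtime: Record<string, VariableValue>
): VariableValue {
  if (name in runtime) return runtime[name];         // 1. runtime (highest)
  if (def.default !== undefined) return def.default; // 2. YAML default
  if (def.required) {
    // Required variable with no value at all: fail at evaluation time.
    throw new Error(`Missing required variable: ${name}`);
  }
  return ""; // 3. empty string
}
```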
Pipeline Steps
Pipeline steps define execution order and dependencies.
Basic Step
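A minimal step (the `id` and `judge` field names are consistent with this README, but the exact shape is an assumption):

```yaml
pipeline:
  - id: grammar
    judge: GRAMMAR_CHECK
```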
Step with Dependencies
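A hedged sketch; `depends_on` is an assumed key name:

```yaml
pipeline:
  - id: grammar
    judge: GRAMMAR_CHECK
  - id: tone
    judge: TONE_CHECK
    depends_on: [grammar]   # hypothetical key; waits for `grammar` to finish
```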
Step with Failure Handling
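`on_failure` appears in the execution-flow notes below; the nested shape here is an assumption:

```yaml
pipeline:
  - id: grammar
    judge: GRAMMAR_CHECK
    on_failure:
      - id: diagnose
        judge: ERROR_ANALYZER   # runs only if `grammar` fails
```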
Subagent Steps
Run multiple judges in parallel:
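For example (judge names illustrative):

```yaml
pipeline:
  - id: quality
    subagents:
      - GRAMMAR_CHECK
      - TONE_CHECK
      - FACT_CHECK
```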
Execution Flow:
- Steps without dependencies run immediately
- Steps with dependencies wait for all dependencies to complete
- Subagents within a step run in parallel
- On failure, `on_failure` steps are triggered
Schema Validation
All configuration files are validated against JSON Schema:
Rules:
- `version` must be a string or number
- Judge names must be `UPPERCASE_WITH_UNDERSCORES` (regex: `^[A-Z_]+$`)
- Each step must have either `judge` or `subagents` (mutually exclusive)
- Step IDs must be unique
- Referenced judges must exist
Validation errors provide detailed messages:
API Reference
Judge
Main class for running evaluations.
Constructor
Parameters:
- `configPath` – Path to YAML configuration file
- `provider` – An AI provider instance (use `ProviderFactory.create()` or instantiate directly)
Example:
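A sketch, assuming the import path (`GeminiProvider`-style direct construction is described under Provider Configuration below; the config field names are guesses):

```typescript
import { Judge, ProviderFactory } from "@kyro/judge";

const provider = ProviderFactory.create({
  provider: "gemini",
  model: "gemini-1.5-pro",
  apiKey: process.env.GEMINI_API_KEY,
});

const judge = new Judge("./kyro.config.yml", provider);
```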
run()
Execute the evaluation pipeline.
Parameters:
- `input` – Input to evaluate (string, `.json` file, or `.txt` file)
- `variables?` – Runtime values that override variable defaults defined in the YAML
Returns:
Example:
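Given a `Judge` instance from the constructor above, a sketch (the `passed` field name is a guess based on "structured pass/fail results"):

```typescript
const result = await judge.run("./transcripts/ticket-123.txt", {
  customerTier: "premium", // runtime variable overriding a YAML default
});

if (!result.passed) {
  console.error("Evaluation failed", result);
}
```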
Types
ModelConfig
VariableValue
VariableDefinition
KyroResult
ExecutionResult
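Rough sketches of these types, inferred from this README. The field names below (`passed`, `judge`, `output`, and so on) are illustrative guesses — consult the package's type declarations for the real shapes:

```typescript
type VariableValue = string | number | boolean;

interface VariableDefinition {
  type: "string" | "number" | "boolean";
  default?: VariableValue;
  required?: boolean;
}

interface ModelConfig {
  provider: string; // e.g. "gemini", "openai", "ollama"
  model: string;    // model name, e.g. "gpt-4o"
  apiKey?: string;
}

interface ExecutionResult {
  judge: string;   // judge name from the YAML
  passed: boolean; // this judge's verdict
  output: string;  // raw model output
}

interface KyroResult {
  passed: boolean;            // overall pass/fail
  results: ExecutionResult[]; // per-step results
}
```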
Provider Configuration
All providers are instantiated via ProviderFactory.create(config) or directly via their class constructors.
Gemini
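A sketch (config field names are assumptions):

```typescript
const gemini = ProviderFactory.create({
  provider: "gemini",
  model: "gemini-1.5-pro",
  apiKey: process.env.GEMINI_API_KEY,
});
```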
OpenAI
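For OpenAI (field names assumed):

```typescript
const openai = ProviderFactory.create({
  provider: "openai",
  model: "gpt-4o",
  apiKey: process.env.OPENAI_API_KEY,
});
```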
Azure OpenAI
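Azure deployments typically need an endpoint and a deployment name; all field names here are guesses:

```typescript
const azure = ProviderFactory.create({
  provider: "azure-openai",
  model: "gpt-4o",
  endpoint: process.env.AZURE_OPENAI_ENDPOINT,   // hypothetical field
  deployment: "my-gpt4o-deployment",             // hypothetical field
  apiKey: process.env.AZURE_OPENAI_API_KEY,
});
```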
Ollama
For local models:
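A sketch (field names are assumptions; 11434 is Ollama's default port):

```typescript
const ollama = ProviderFactory.create({
  provider: "ollama",
  model: "llama3",
  baseUrl: "http://localhost:11434", // hypothetical field
});
```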
Testing Integration
Jest / Vitest
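A sketch of a Vitest test (the import path and the `passed` result field are assumptions; the same shape works in Jest with its own globals):

```typescript
import { describe, it, expect } from "vitest";
import { Judge, ProviderFactory } from "@kyro/judge";

describe("support reply quality", () => {
  it("passes the tone judge", async () => {
    const provider = ProviderFactory.create({
      provider: "openai",
      model: "gpt-4o",
      apiKey: process.env.OPENAI_API_KEY,
    });
    const judge = new Judge("./kyro.config.yml", provider);

    const result = await judge.run("Thanks for reaching out! Happy to help.");
    expect(result.passed).toBe(true);
  });
});
```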
Examples
See the main README for complete examples:
- Customer Service Quality Assurance
- Multi-Agent Evaluation
Related Packages
- `/batch` – For large-scale offline evaluation using the OpenAI Batch API. Same judge definitions, async workflow, 50% cheaper.
Contributing
Contributions are welcome! Please see the main CONTRIBUTING.md for guidelines.
License
MIT License - see LICENSE for details.
Need help? Open an issue