KyroKyro

KyroBatch Overview

Async batch evaluation at scale. Submit thousands of transcripts via provider Batch APIs, exit your process, and poll later.

Async batch evaluation for LLM applications using the OpenAI Batch API

Evaluate thousands of transcripts at 50% cost compared to synchronous requests. Submit a job, let your process exit, and poll for results later — state is automatically persisted to disk.

Table of Contents

Overview

/batch is designed for offline, large-scale evaluation workflows:

  • Async by designsubmit() returns a job ID immediately; check() polls without blocking
  • Process-exit safe – state is saved to .kyro/<config>.batch-state.json before your process exits
  • Multiple runs – run each judge N times and aggregate via majority vote for more reliable results
  • Cost-efficient – uses the OpenAI Batch API (50% cheaper than synchronous calls)

Installation

npm install /batch

Quick Start

1. Define Judges

Create judges.yml — only version and judges are needed (no pipeline):

version: 1
 
judges:
  QUALITY_CHECK:
    prompt: "Evaluate the quality of this ${type} support conversation."
    variables:
      type:
        type: string
        default: "customer"
 
  SAFETY_CHECK:
    prompt: "Check if this conversation contains any harmful content."

Prompt types are the same as /judge: inline strings, file paths (./prompt.txt), or structured XML objects.

2. Submit a Batch Job

import { BatchJudge, OpenAIBatchProvider } from '@kyro/batch';
 
const provider = new OpenAIBatchProvider({
  model: 'gpt-4o-mini',
  apiKey: process.env.OPENAI_API_KEY,
});
 
const batch = new BatchJudge(provider, './judges.yml');
 
const jobId = await batch.submit(
  [
    './transcripts/conv-001.json',
    './transcripts/conv-002.txt',
    'User: Hi\nAssistant: Hello!',  // inline string
  ],
  {
    runs: 3,
    aggregation: 'majority_vote',
    variables: { type: 'car' },
  }
);
 
console.log('Job submitted:', jobId);
// State saved to .kyro/judges.batch-state.json — safe to exit now

3. Poll for Results

Run this on a schedule (cron job, queue worker, etc.) until the result is non-null:

import { BatchJudge, OpenAIBatchProvider } from '@kyro/batch';
 
const batch = new BatchJudge(provider, './judges.yml');
 
// Pass jobId explicitly, or omit to load from the saved state file
const result = await batch.check(process.env.BATCH_JOB_ID);
 
if (result === null) {
  console.log('Still processing — check again later.');
  process.exit(0);
}
 
console.log('Status:', result.status); // 'SUCCESS' | 'PARTIAL' | 'ERROR'
console.log('Token usage:', result.usage);
 
for (const transcript of result.results) {
  console.log(`\n--- ${transcript.transcript} ---`);
  for (const [judgeName, evaluation] of Object.entries(transcript.judges)) {
    console.log(`  ${judgeName}: ${evaluation.aggregated.status}`);
    console.log(`  rootCause: ${evaluation.aggregated.rootCause}`);
    evaluation.runs.forEach(r => console.log(`    run ${r.run}: ${r.status}`));
  }
}

Configuration

Judges Config

The YAML config for batch only requires version and judges:

version: 1
 
judges:
  JUDGE_NAME:
    prompt: "Inline prompt with optional ${variable} interpolation"
    variables:
      variable:
        type: string        # 'string' | 'number' | 'boolean'
        default: "value"    # used when no runtime value is provided
        required: false     # if true, must be supplied at runtime

Prompt variants:

judges:
  # Inline string
  SIMPLE_CHECK:
    prompt: "Is this conversation helpful?"
 
  # File reference (relative to the judges.yml directory)
  FILE_CHECK:
    prompt: "./prompts/detailed-check.txt"
 
  # Structured XML sections
  STRUCTURED_CHECK:
    prompt:
      role: "You are an expert quality analyst."
      task: "Evaluate whether the agent resolved the user's issue."

Run Options

interface RunOptions {
  runs?: number;                    // Number of times to run each judge (default: 1)
  aggregation?: 'majority_vote';   // Aggregation strategy for multiple runs
  variables?: Record<string, string | number | boolean>;  // Runtime variable overrides
  webhook?: { url: string; secret?: string };             // Optional webhook on completion
}

API Reference

BatchJudge

new BatchJudge(provider: BatchProvider, judgesConfigPath: string, options?: { stateDir?: string })

Parameters:

ParameterTypeDescription
providerBatchProviderProvider instance (e.g. OpenAIBatchProvider)
judgesConfigPathstringPath to the judges YAML config file
options.stateDirstring?Custom directory for state files (default: .kyro/ next to the config file)

submit()

async submit(transcripts: string[], options?: RunOptions): Promise<string>

Submits all transcripts × all judges × runs as a single batch job. Returns the OpenAI batch job ID. State is saved automatically before returning.

Transcripts can be:

  • Inline strings
  • Paths to .json files (parsed and stringified)
  • Paths to .txt files (read as-is)

check()

async check(jobId?: string, options?: RunOptions): Promise<BatchRunResult | null>

Single-poll check — does not block or retry. Returns:

  • null — job is still in progress
  • BatchRunResult — job is complete (state file is cleared automatically)
  • Throws — job ended with a non-completed terminal status (failed, expired, cancelled)

If jobId is omitted, it is loaded from the state file saved by submit().

OpenAIBatchProvider

new OpenAIBatchProvider({ model: string; apiKey: string; clientConfig?: Record<string, unknown> })

Wraps the OpenAI Batch API. Only OpenAI is currently supported for batch.

Types

BatchRunResult

interface BatchRunResult {
  status: 'SUCCESS' | 'PARTIAL' | 'ERROR';
  batchJobId: string;
  completedAt: string;            // ISO 8601
  results: TranscriptEvaluation[];
  usage: { inputTokens: number; outputTokens: number };
}

Overall status logic:

  • SUCCESS — all judge evaluations passed (SUCCESS or NA)
  • ERROR — all judge evaluations failed
  • PARTIAL — mix of successes and failures

TranscriptEvaluation

interface TranscriptEvaluation {
  transcript: string;                          // Path or inline identifier
  judges: Record<string, JudgeEvaluation>;
}
 
interface JudgeEvaluation {
  aggregated: {
    status: 'SUCCESS' | 'ERROR' | 'NA';
    rootCause: string;
    thinkingPath: string;
  };
  runs: RunEvaluation[];   // One entry per run
}
 
interface RunEvaluation {
  run: number;             // 1-indexed
  status: 'SUCCESS' | 'ERROR' | 'NA';
  rootCause: string;
  thinkingPath: string;
}

BatchValidationError

Thrown on config or state file errors:

class BatchValidationError extends Error {
  errors?: unknown[];
}

State Management

After submit(), state is saved to:

<config-dir>/.kyro/<config-basename>.batch-state.json

For example, with ./judges.yml the state file is .kyro/judges.batch-state.json.

The state file stores the batch job ID and the full request map so that check() can reconstruct results without re-reading the transcripts. It is automatically deleted after a successful check().

To use a custom state directory:

const batch = new BatchJudge(provider, './judges.yml', { stateDir: '/tmp/kyro-state' });

Examples

See examples/javascript/batch/ for a complete working example including transcripts and a multi-step submit/check script.

License

MIT License - see LICENSE for details.