ADK.Eval (ADK v0.0.1-alpha.1)


Evaluation framework for testing agent quality.

Run an agent against a suite of test cases and score the responses using pluggable scorers.
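A scorer is a pluggable module. The sketch below shows what one could look like. It assumes a scorer receives the agent's final response text plus its keyword options and returns a score between 0.0 and 1.0. The `ScorerSketch` behaviour, its `score/2` callback, and `ScorerSketch.Contains` are hypothetical illustrations, not the documented `ADK.Eval.Scorer` API.

```elixir
defmodule ScorerSketch do
  # Hypothetical scorer contract (not the documented ADK.Eval.Scorer API):
  # takes the agent's response text and keyword options, returns 0.0..1.0.
  @callback score(response :: String.t(), opts :: keyword()) :: float()
end

defmodule ScorerSketch.Contains do
  @behaviour ScorerSketch

  # Scores 1.0 if `:text` occurs in the response, else 0.0.
  # Defaulting :case_sensitive to true is an assumption of this sketch.
  @impl true
  def score(response, opts) do
    text = Keyword.fetch!(opts, :text)

    {haystack, needle} =
      if Keyword.get(opts, :case_sensitive, true) do
        {response, text}
      else
        {String.downcase(response), String.downcase(text)}
      end

    if String.contains?(haystack, needle), do: 1.0, else: 0.0
  end
end

# Usage: mirrors the {Scorer.Contains, text: "hello", case_sensitive: false} spec.
ScorerSketch.Contains.score("Hello there!", text: "hello", case_sensitive: false)
```

Because every scorer would share one callback, the runner could treat them uniformly. Adding a new check would then mean writing one more module.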

Quick start

alias ADK.Eval
alias ADK.Eval.{Case, Scorer}

cases = [
  Case.new(
    name: "greeting",
    input: "Say hello",
    scorers: [
      {Scorer.Contains, text: "hello", case_sensitive: false},
      {Scorer.ResponseLength, min: 1, max: 500}
    ]
  )
]

# `my_agent` is a placeholder for an agent you have already defined.
runner = ADK.Runner.new(app_name: "test", agent: my_agent)
report = Eval.run(runner, cases)
IO.puts(Eval.Report.format(report))
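In the quick start, each scorer is given as a `{module, opts}` tuple. The sketch below shows one plausible way such specs could be applied to a response. `SpecSketch`, `SpecSketch.ResponseLength`, and its `score/2` function are hypothetical. Treating `:min` and `:max` as inclusive character counts is an assumption about `Scorer.ResponseLength`, not documented behaviour.

```elixir
defmodule SpecSketch.ResponseLength do
  # Hypothetical stand-in for Scorer.ResponseLength: 1.0 when the response's
  # character (grapheme) count lies within [min, max] inclusive, else 0.0.
  def score(response, opts) do
    min_len = Keyword.get(opts, :min, 0)
    max_len = Keyword.get(opts, :max, :infinity)
    len = String.length(response)

    if len >= min_len and (max_len == :infinity or len <= max_len), do: 1.0, else: 0.0
  end
end

defmodule SpecSketch do
  # Apply every {module, opts} spec to one response; return {module, score} pairs.
  def score_all(response, specs) do
    Enum.map(specs, fn {mod, opts} -> {mod, mod.score(response, opts)} end)
  end
end

# Usage: same tuple shape as {Scorer.ResponseLength, min: 1, max: 500}.
SpecSketch.score_all("hello", [{SpecSketch.ResponseLength, min: 1, max: 500}])
```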

Options for run/3

  • :threshold - minimum aggregate score to pass a case (default: 1.0, meaning all scorers must pass)
  • :user_id - user ID for the session (default: "eval_user")
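This page does not say how the aggregate score is computed. The sketch below assumes it is the arithmetic mean of per-scorer scores in the range 0.0 to 1.0. Under that assumption, the default threshold of 1.0 requires every scorer to score 1.0, which matches the option description. `ThresholdSketch` is a hypothetical helper, not part of ADK.

```elixir
defmodule ThresholdSketch do
  # Assumption: aggregate = arithmetic mean of per-scorer scores in 0.0..1.0.
  def aggregate(scores) when is_list(scores) and scores != [] do
    Enum.sum(scores) / length(scores)
  end

  # A case passes when its aggregate meets :threshold (default 1.0).
  def passed?(scores, opts \\ []) do
    threshold = Keyword.get(opts, :threshold, 1.0)
    aggregate(scores) >= threshold
  end
end

ThresholdSketch.passed?([1.0, 0.0])
ThresholdSketch.passed?([1.0, 0.0], threshold: 0.5)
```

With one of two scorers failing, the mean is 0.5. That fails under the default threshold but passes with `threshold: 0.5`.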


Functions

run(runner, cases, opts \\ [])

Run an agent against a list of eval cases, returning a report.
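The sketch below gives the rough shape of such a run. It runs the agent on each case's input, scores the response, and compares the aggregate with `:threshold`. To stay self-contained, it makes three substitutions:

- a plain function from input to response replaces the `ADK.Runner`
- anonymous functions replace the `{module, opts}` scorer specs
- the aggregate is assumed to be the mean of the scores

`RunSketch` and the keys of its result map are hypothetical. The real `run/3` returns a report intended for `Eval.Report.format/1`, and this page does not describe that report's structure.

```elixir
defmodule RunSketch do
  # Hypothetical outline of an eval run. The "agent" is a plain function from
  # input string to response string; each scorer is a 1-arity function that
  # returns a score between 0.0 and 1.0. The real run/3 drives an ADK.Runner.
  def run(agent, cases, opts \\ []) do
    threshold = Keyword.get(opts, :threshold, 1.0)

    results =
      Enum.map(cases, fn %{name: name, input: input, scorers: scorers} ->
        response = agent.(input)
        scores = Enum.map(scorers, fn scorer -> scorer.(response) end)

        # Assumed aggregation: mean of the scores; a case with no scorers gets 0.0.
        aggregate = if scores == [], do: 0.0, else: Enum.sum(scores) / length(scores)

        %{name: name, response: response, score: aggregate, passed: aggregate >= threshold}
      end)

    %{results: results, passed: Enum.count(results, & &1.passed), total: length(results)}
  end
end

# Usage with a stub agent that echoes its input.
echo = fn input -> "Hello! You said: " <> input end

contains_hello = fn response ->
  if String.contains?(String.downcase(response), "hello"), do: 1.0, else: 0.0
end

cases = [%{name: "greeting", input: "Say hello", scorers: [contains_hello]}]
report = RunSketch.run(echo, cases)
```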