Evaluation framework for testing agent quality.
Run an agent against a suite of test cases and score the responses using pluggable scorers.
Quick start
    alias ADK.Eval
    alias ADK.Eval.{Case, Scorer}

    cases = [
      Case.new(
        name: "greeting",
        input: "Say hello",
        scorers: [
          {Scorer.Contains, text: "hello", case_sensitive: false},
          {Scorer.ResponseLength, min: 1, max: 500}
        ]
      )
    ]

    runner = ADK.Runner.new(app_name: "test", agent: my_agent)
    report = Eval.run(runner, cases)
    IO.puts(Eval.Report.format(report))

Options for run/3
:threshold - minimum aggregate score to pass a case (default: 1.0, meaning all scorers must pass)
:user_id - user ID for the session (default: "eval_user")
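
The effect of :threshold can be illustrated with a small standalone sketch. It does not call ADK at all: the module name `EvalThresholdSketch` and its functions are hypothetical, and it assumes that the aggregate score is the arithmetic mean of per-scorer scores in the range 0.0 to 1.0, which this page does not confirm. Under that assumption, the default of 1.0 only passes a case when every scorer returns a full score.

```elixir
defmodule EvalThresholdSketch do
  @moduledoc """
  Hypothetical illustration only; not part of ADK.
  """

  # Assumption: the aggregate is the arithmetic mean of the per-scorer
  # scores, each between 0.0 and 1.0. An empty scorer list scores 0.0.
  def aggregate([]), do: 0.0
  def aggregate(scores), do: Enum.sum(scores) / length(scores)

  # A case passes when its aggregate score reaches the threshold.
  # The default of 1.0 mirrors the documented default for :threshold.
  def passed?(scores, threshold \\ 1.0), do: aggregate(scores) >= threshold
end

# Two scorers: one passes (1.0) and one fails (0.0), so the mean is 0.5.
scores = [1.0, 0.0]

IO.inspect(EvalThresholdSketch.passed?(scores))      # prints false
IO.inspect(EvalThresholdSketch.passed?(scores, 0.5)) # prints true
```

With the library itself, a relaxed threshold would be passed through the options keyword list, for example `Eval.run(runner, cases, threshold: 0.5)`.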
Functions
@spec run(ADK.Runner.t(), [ADK.Eval.Case.t()], keyword()) :: ADK.Eval.Report.t()
Run an agent against a list of eval cases, returning a report.