ADK.Eval (ADK v0.0.1)

Evaluation framework for testing agent quality.

Run an agent against a suite of test cases and score the responses using pluggable scorers.

Quick start

alias ADK.Eval
alias ADK.Eval.{Case, Scorer}

cases = [
  Case.new(
    name: "greeting",
    input: "Say hello",
    scorers: [
      {Scorer.Contains, text: "hello", case_sensitive: false},
      {Scorer.ResponseLength, min: 1, max: 500}
    ]
  )
]

# `my_agent` is an agent you have already defined elsewhere.
runner = ADK.Runner.new(app_name: "test", agent: my_agent)
report = Eval.run(runner, cases)
IO.puts(Eval.Report.format(report))
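
The two scorers in the quick start are configured with `text`/`case_sensitive` and `min`/`max`. As a rough illustration of what checks of this kind might compute, here is a standalone sketch. `ScorerSketch` and its functions are hypothetical and are not part of ADK; the 1.0/0.0 scores, the case-sensitive default, and measuring length in characters are all assumptions.

```elixir
defmodule ScorerSketch do
  # Hypothetical "contains" check: 1.0 if `text` occurs in the response,
  # 0.0 otherwise. Assumes matching is case-sensitive unless told otherwise.
  def contains(response, opts) do
    text = Keyword.fetch!(opts, :text)
    case_sensitive = Keyword.get(opts, :case_sensitive, true)

    {haystack, needle} =
      if case_sensitive do
        {response, text}
      else
        {String.downcase(response), String.downcase(text)}
      end

    if String.contains?(haystack, needle), do: 1.0, else: 0.0
  end

  # Hypothetical length check: 1.0 if the character count lies within
  # [min, max], 0.0 otherwise. A missing :max means no upper bound.
  def response_length(response, opts) do
    min = Keyword.get(opts, :min, 0)
    max = Keyword.get(opts, :max, :infinity)
    len = String.length(response)

    if len >= min and (max == :infinity or len <= max), do: 1.0, else: 0.0
  end
end
```

Under these assumptions, a response of "Hello there" would satisfy both scorers from the "greeting" case, while an empty response would fail both.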

Options for `run/3`

:threshold - minimum aggregate score to pass a case (default: 1.0, meaning all scorers must pass)
:user_id - user ID for the session (default: "eval_user")
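
To make the `:threshold` semantics concrete, the sketch below shows one way a pass/fail decision could follow from per-scorer scores. It assumes the aggregate is the arithmetic mean of scores between 0.0 and 1.0; this page does not say how ADK aggregates, so treat `ThresholdSketch` as a hypothetical illustration rather than the library's implementation.

```elixir
defmodule ThresholdSketch do
  # Assumption: the aggregate score is the mean of the per-scorer scores.
  # An empty score list is treated as 0.0 here (also an assumption).
  def aggregate([]), do: 0.0
  def aggregate(scores), do: Enum.sum(scores) / length(scores)

  # A case passes when its aggregate reaches the threshold.
  # The default of 1.0 mirrors the documented default for :threshold.
  def passed?(scores, threshold \\ 1.0), do: aggregate(scores) >= threshold
end

ThresholdSketch.passed?([1.0, 1.0])       # => true
ThresholdSketch.passed?([1.0, 0.0])       # => false
ThresholdSketch.passed?([1.0, 0.0], 0.5)  # => true
```

With the documented options, a more lenient run might be written as `Eval.run(runner, cases, threshold: 0.5, user_id: "ci_user")`, where `"ci_user"` is an illustrative value.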

Summary

Functions

run(runner, cases, opts \\ [])

Run an agent against a list of eval cases, returning a report.

Functions

run(runner, cases, opts \\ [])

@spec run(ADK.Runner.t(), [ADK.Eval.Case.t()], keyword()) :: ADK.Eval.Report.t()

Run an agent against a list of eval cases, returning a report.
