When building production AI systems, evaluating model performance is crucial. Evalite provides a focused way to evaluate LLM outputs with real-time feedback, while the Vercel AI SDK makes it easy to generate text from various providers.
Setup
Install dependencies:
npm install @ai-sdk/anthropic ai evalite
Add your API key to .env:
ANTHROPIC_API_KEY=your_key_here
Example: Country Capitals
Here’s a complete example that evaluates an LLM’s ability to answer questions about country capitals:
import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';
import { evalite } from 'evalite';

evalite('Capitals', {
  // Test cases: each pairs a question with the answer we expect.
  data: () => [
    {
      input: 'What is the capital of France?',
      expected: 'Paris',
    },
    {
      input: 'What is the capital of Germany?',
      expected: 'Berlin',
    },
    {
      input: 'What is the capital of Italy?',
      expected: 'Rome',
    },
  ],
  // Task: send the question to the model and return its text reply.
  task: async (input) => {
    const capitalResult = await generateText({
      model: anthropic('claude-3-5-haiku-20241022'),
      prompt: `
        You are a helpful assistant that can answer questions about the capital of countries.
        <question>
        ${input}
        </question>
        Answer the question.
        Reply only with the capital of the country.
      `,
    });
    return capitalResult.text;
  },
  // Scorers: return 1 if the reply contains the expected capital, otherwise 0.
  scorers: [
    {
      name: 'includes',
      scorer: ({ output, expected }) => {
        return output.includes(expected!) ? 1 : 0;
      },
    },
  ],
});
Components
Data: Array of test cases with input and expected values.
Task: Async function that calls the LLM and returns the generated text.
Scorers: Custom functions that evaluate output quality. The example checks if the output includes the expected answer.
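To make the relationship between the three components concrete, here is a minimal sketch of the data, task, scorers flow in plain TypeScript. It is not how Evalite is implemented: `runMiniEval`, `fakeTask`, and the `EvalCase` and `Scorer` types are hypothetical names used only for illustration, and the task is a stub lookup rather than a real LLM call so the sketch runs offline.

```typescript
// Hypothetical types for illustration; Evalite's real types may differ.
type EvalCase = { input: string; expected: string };
type Scorer = {
  name: string;
  scorer: (args: { input: string; output: string; expected: string }) => number;
};
type EvalResult = { input: string; output: string; scores: Record<string, number> };

// Stub standing in for the LLM call (assumption: canned answers only).
const fakeTask = async (input: string): Promise<string> => {
  const answers: Record<string, string> = {
    'What is the capital of France?': 'Paris',
    'What is the capital of Germany?': 'The capital is Berlin.',
  };
  return answers[input] ?? 'I do not know.';
};

// Same check as the 'includes' scorer in the example above.
const includes: Scorer = {
  name: 'includes',
  scorer: ({ output, expected }) => (output.includes(expected) ? 1 : 0),
};

// Runs each case through the task, then applies every scorer to the output.
async function runMiniEval(
  data: EvalCase[],
  task: (input: string) => Promise<string>,
  scorers: Scorer[],
): Promise<EvalResult[]> {
  const results: EvalResult[] = [];
  for (const { input, expected } of data) {
    const output = await task(input);
    const scores: Record<string, number> = {};
    for (const s of scorers) {
      scores[s.name] = s.scorer({ input, output, expected });
    }
    results.push({ input, output, scores });
  }
  return results;
}

const cases: EvalCase[] = [
  { input: 'What is the capital of France?', expected: 'Paris' },
  { input: 'What is the capital of Germany?', expected: 'Berlin' },
  { input: 'What is the capital of Italy?', expected: 'Rome' },
];

runMiniEval(cases, fakeTask, [includes]).then((results) => {
  for (const r of results) console.log(r.input, r.scores);
});
```

With the stubbed answers, the France and Germany cases score 1 on `includes` and the Italy case scores 0, because the stub has no answer for it. Swapping `fakeTask` for a function that calls `generateText` gives roughly the shape of the real eval.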
Running
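npm run eval:dev
This starts the Evalite runner with real-time results showing pass/fail status, scores, and performance metrics. The command assumes that package.json defines an eval:dev script (the Evalite docs suggest mapping it to evalite watch) and that the eval is saved in a file ending in .eval.ts.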
Extending
Multiple Scorers:
scorers: [
  {
    name: 'includes',
    scorer: ({ output, expected }) => output.includes(expected!) ? 1 : 0,
  },
  {
    name: 'exact_match',
    scorer: ({ output, expected }) => output.trim() === expected ? 1 : 0,
  },
],
Dynamic Data:
data: async () => {
  // fetchCountriesFromAPI is a placeholder for your own data loader.
  const countries = await fetchCountriesFromAPI();
  return countries.map(country => ({
    input: `What is the capital of ${country.name}?`,
    expected: country.capital,
  }));
},
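`fetchCountriesFromAPI` is not defined in the snippet above. As a rough sketch, assuming a hypothetical `Country` shape with `name` and `capital` fields, it can be any async function that returns such records. Here it is stubbed with static data so the mapping into test cases can be run offline; `toTestCases` is an illustrative helper, not part of Evalite.

```typescript
// Hypothetical record shape; adjust to whatever your data source returns.
type Country = { name: string; capital: string };

// Stand-in loader (assumption): a real version might call an HTTP API or read a file.
async function fetchCountriesFromAPI(): Promise<Country[]> {
  return [
    { name: 'Spain', capital: 'Madrid' },
    { name: 'Japan', capital: 'Tokyo' },
  ];
}

// Converts country records into the { input, expected } shape used by `data`,
// skipping records with a blank name or capital.
function toTestCases(countries: Country[]): { input: string; expected: string }[] {
  return countries
    .filter((c) => c.name.trim() !== '' && c.capital.trim() !== '')
    .map((c) => ({
      input: `What is the capital of ${c.name}?`,
      expected: c.capital,
    }));
}

// This function could be passed as the `data` option of an eval.
const data = async () => toTestCases(await fetchCountriesFromAPI());
```

Keeping the mapping in a separate pure function means the test-case shape can be checked without touching the network or the model.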