mclenhard/mcp-evals
A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring. This helps ensure your MCP server's tools are working correctly and performing well.
Platform-specific configuration:
{
"mcpServers": {
"mcp-evals": {
"command": "npx",
"args": [
"-y",
"mcp-evals"
]
}
}
}Add the config above to .claude/settings.json under the mcpServers key.
A Node.js package and GitHub Action for evaluating MCP (Model Context Protocol) tool implementations using LLM-based scoring, with built-in observability support. This helps ensure your MCP server's tools are working correctly, performing well, and are fully observable with integrated monitoring and metrics.
npm install mcp-evalsAdd the following to your workflow file:
name: Run MCP Evaluations
on:
pull_request:
types: [opened, synchronize, reopened]
jobs:
evaluate:
runs-on: ubuntu-latest
permissions:
contents: read
pull-requests: write
steps:
- uses: actions/checkout@v4
- name: Setup Node.js
uses: actions/setup-node@v4
with:
node-version: '20'
- name: Install dependencies
run: npm install
- name: Run MCP Evaluations
uses: mclenhard/mcp-evals@v1.0.9
with:
evals_path: 'src/evals/evals.ts' # Can also use .yaml files
server_path: 'src/index.ts'
openai_api_key: ${{ secrets.OPENAI_API_KEY }}
model: 'gpt-4' # Optional, defaults to gpt-4You can create evaluation configurations in either TypeScript or YAML format.
Create a file (e.g., evals.ts) that exports your evaluation configuration:
import { EvalConfig } from 'mcp-evals';
import { openai } from "@ai-sdk/openai";
import { grade, EvalFunction} from "mcp-evals";
const weatherEval: EvalFunction = {
name: 'Weather Tool Evaluation',
description: 'Evaluates the accuracy and completeness of weather information retrieval',
run: async () => {
const result = await grade(openai("gpt-4"), "What is the weather in New York?");
return JSON.parse(result);
}
};
const config: EvalConfig = {
model: openai("gpLoading reviews...