Overview
Features
Single-criterion evaluation (evaluate_llm_response)
Evaluates an LLM's response against a single evaluation criterion and returns a dictionary with a numeric score and a textual critique.
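A consumer of that single-criterion result might look like the sketch below. The key names `score` and `critique` are assumed from the description above, not confirmed against the Atla API:

```python
# Hypothetical result from evaluate_llm_response (key names assumed).
result = {"score": 4, "critique": "Accurate, but the answer omits an edge case."}

def passes_threshold(result: dict, minimum: float = 3.0) -> bool:
    """Return True when the numeric score meets the minimum."""
    return float(result["score"]) >= minimum

print(passes_threshold(result))  # → True
```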
Multi-criteria evaluation (evaluate_llm_response_on_multiple_criteria)
Evaluates an LLM's response against multiple criteria and returns a list of dictionaries with scores and critiques.
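Since the multi-criteria tool returns a list of per-criterion dictionaries, results can be aggregated client-side. A minimal sketch, with the key names (`criterion`, `score`, `critique`) assumed rather than taken from the Atla API reference:

```python
# Hypothetical multi-criteria output (key names assumed).
results = [
    {"criterion": "accuracy", "score": 5, "critique": "Factually correct."},
    {"criterion": "conciseness", "score": 3, "critique": "Somewhat verbose."},
]

def average_score(results: list[dict]) -> float:
    """Mean score across all evaluated criteria."""
    return sum(float(r["score"]) for r in results) / len(results)

print(average_score(results))  # → 4.0
```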
Standardized MCP interface
Provides a standardized interface for LLMs to interact with the Atla API via MCP.
Atla evaluation model backend
Uses the Atla evaluation model to produce scores and actionable feedback for LLM outputs.
API key required
Requires an Atla API key to operate the MCP server.
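Because the server cannot operate without a key, it is worth failing fast when one is missing. A minimal sketch, assuming the key is supplied via an `ATLA_API_KEY` environment variable (the variable name is an assumption, not confirmed by this document):

```python
import os

def require_atla_api_key() -> str:
    """Read the Atla API key from the environment (variable name assumed)."""
    key = os.environ.get("ATLA_API_KEY")
    if not key:
        raise RuntimeError("ATLA_API_KEY is not set; the MCP server cannot authenticate.")
    return key
```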
Local server startup via uvx
Can be started locally with uvx by pointing the command at the atla-mcp-server package.
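The startup described above might look like the following (a sketch; it assumes the package is published under the name atla-mcp-server and that the server reads `ATLA_API_KEY` from the environment):

```shell
# Provide the API key (variable name assumed), then launch the server via uvx.
export ATLA_API_KEY="<your-atla-api-key>"
uvx atla-mcp-server
```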
OpenAI Agents SDK integration
Provides guidance and examples for connecting the MCP server to OpenAI Agents SDK clients.
Claude Desktop & Cursor configuration templates
Includes configuration snippets for Claude Desktop and Cursor to integrate the MCP server.
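For illustration, a Claude Desktop entry for a stdio MCP server typically follows the shape below. The server label `atla`, the `args` value, and the `ATLA_API_KEY` variable name are assumptions; consult the project's own configuration snippets for the exact values:

```json
{
  "mcpServers": {
    "atla": {
      "command": "uvx",
      "args": ["atla-mcp-server"],
      "env": { "ATLA_API_KEY": "<your-atla-api-key>" }
    }
  }
}
```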
Who Is This For?
- Developers: Integrate Atla-based LLM evaluation into MCP-enabled workflows for testing and benchmarking.
