Overview
Features
MCP tool call translation
Translates MCP tool calls from clients into authenticated HTTPS requests to VisionAgent REST APIs.
Response streaming
Streams JSON results and any base64 media (images, masks) back to the MCP client.
OpenAPI-driven tool map
Fetches the VisionAgent OpenAPI spec and auto-generates the MCP tool map, including validation schemas.
Argument validation with Zod
Validates incoming tool arguments against Zod schemas derived from the live OpenAPI spec.
File-based arg handling
Reads and base64-encodes file-based arguments (e.g., imagePath, pdfPath) for upload.
Output visualization & storage
Optionally post-processes outputs (masks, boxes, depth maps) and saves them to OUTPUT_DIRECTORY.
Local, private operation
Runs locally over STDIN/STDOUT with no telemetry; data is sent only to VisionAgent APIs.
Developer tooling & quick start
Includes a generate-tools script and an example MCP client configuration for easy setup.
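As an illustration of the file-based argument handling described above, here is a minimal TypeScript sketch, not the server's actual implementation. It assumes a hypothetical convention: any string argument whose key ends in "Path" names a local file, and the encoded payload is stored under the key with that suffix removed (imagePath becomes image). The function name encodeFileArgs and the ToolArgs type are made up for this example.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical shape for tool arguments; the real server derives its
// schemas from the OpenAPI spec.
type ToolArgs = Record<string, unknown>;

// Assumption: keys ending in "Path" (e.g. imagePath, pdfPath) point at
// local files. Each such file is read and replaced by its base64 payload
// under the key with the "Path" suffix stripped.
function encodeFileArgs(args: ToolArgs): ToolArgs {
  const out: ToolArgs = {};
  for (const [key, value] of Object.entries(args)) {
    if (key.endsWith("Path") && typeof value === "string") {
      const encodedKey = key.slice(0, -"Path".length); // imagePath -> image
      out[encodedKey] = fs.readFileSync(value).toString("base64");
    } else {
      // Non-file arguments pass through unchanged.
      out[key] = value;
    }
  }
  return out;
}

// Usage: write a small temporary file, encode it, then clean up.
const tmpFile = path.join(os.tmpdir(), `encode-sketch-${process.pid}.bin`);
fs.writeFileSync(tmpFile, Buffer.from("hello"));
const encoded = encodeFileArgs({ imagePath: tmpFile, prompt: "find the cat" });
console.log(encoded); // image holds the base64 string, prompt is untouched
fs.unlinkSync(tmpFile);
```

Reading the file synchronously keeps the sketch short; a server handling large images or PDFs would more likely use the asynchronous fs.promises.readFile.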
Who Is This For?
- MCP client users: Developers using Claude Desktop, Cursor, Cline, or other MCP clients who want vision and document-analysis tools from VisionAgent without writing custom REST code.
- LLM engineers: Teams integrating image, video, and document reasoning into their workflows through text prompts.
- ML/AI integrators: Integrators building local toolchains that rely on VisionAgent endpoints for vision tasks.




