VisionAgent MCP Server 2026: Features & Setup Guide

Overview

VisionAgent MCP Server is a lightweight sidecar MCP server that runs locally over STDIN/STDOUT. It translates each tool call from an MCP-compatible client (Claude Desktop, Cursor, Cline, etc.) into an authenticated HTTPS request to Landing AI’s VisionAgent REST APIs, then streams the response JSON, plus any images or masks, back to the model. This lets you issue natural-language computer-vision and document-analysis commands from your editor without writing custom REST code or loading an extra SDK.

v0.1 adds support for five tools: agentic-document-analysis (text extraction from PDFs and images), text-to-object-detection (bounding boxes), text-to-instance-segmentation (pixel masks), activity-recognition (video actions with timestamps), and depth-pro (monocular depth maps).

The server auto-generates tool definitions with the generate-tools script, validates inputs against Zod schemas derived from the live OpenAPI spec, reads file-based inputs, and forwards authenticated requests to VisionAgent. When IMAGE_DISPLAY_ENABLED is true, outputs can be visualized and saved to OUTPUT_DIRECTORY. The server sends no telemetry and is configured through a minimal JSON config; sample MCP client entries are included.

Features

MCP tool call translation

Translates MCP tool calls from clients into authenticated HTTPS requests to VisionAgent REST APIs.
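
To make the translation step concrete, here is a minimal TypeScript sketch of how a tool call might be mapped onto an HTTPS request description. The base URL, route table, and Bearer auth scheme are placeholders invented for illustration; this guide does not state the real endpoints or authentication header.

```typescript
// Params of an MCP "tools/call" request: a tool name plus its arguments.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Plain description of the outgoing request, kept separate from the code
// that would send it so the mapping is easy to inspect.
interface HttpRequestSpec {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical values: the real base URL and routes are not given here.
const BASE_URL = "https://api.example.com";
const ROUTES: Record<string, string> = {
  "text-to-object-detection": "/v1/tools/text-to-object-detection",
};

function toHttpRequest(call: ToolCall, apiKey: string): HttpRequestSpec {
  const path = ROUTES[call.name];
  if (path === undefined) {
    throw new Error(`Unknown tool: ${call.name}`);
  }
  return {
    url: BASE_URL + path,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`, // assumed auth scheme
      "Content-Type": "application/json",
    },
    body: JSON.stringify(call.arguments),
  };
}

const spec = toHttpRequest(
  { name: "text-to-object-detection", arguments: { prompts: ["car"] } },
  "demo-key",
);
console.log(spec.method, spec.url);
```

Returning a request description rather than calling the network directly keeps the mapping testable without an API key.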

Response streaming

Streams JSON results and any base64 media (images, masks) back to the MCP client.
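
As a rough illustration, an MCP tool result carries a content array whose items are text blocks or base64 image blocks. The sketch below packs a response into that shape; the ApiResponse interface is a hypothetical stand-in, since the actual VisionAgent response format is not described in this guide.

```typescript
// Content blocks as used in MCP tool results: text, or base64 image data.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

interface ToolResult {
  content: ContentBlock[];
}

// Hypothetical API response: arbitrary JSON plus optional base64 media.
interface ApiResponse {
  data: unknown;
  media?: { base64: string; mimeType: string }[];
}

function toToolResult(response: ApiResponse): ToolResult {
  // The JSON payload always goes first, serialized as a text block.
  const content: ContentBlock[] = [
    { type: "text", text: JSON.stringify(response.data) },
  ];
  // Each image or mask follows as its own image block.
  for (const item of response.media ?? []) {
    content.push({ type: "image", data: item.base64, mimeType: item.mimeType });
  }
  return { content };
}

const result = toToolResult({
  data: { boxes: [[10, 20, 110, 220]] },
  media: [{ base64: "iVBORw0KGgo=", mimeType: "image/png" }],
});
console.log(result.content.map((block) => block.type));
```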

OpenAPI-driven tool map

Fetches the VisionAgent OpenAPI spec and auto-generates the MCP tool map, including validation schemas.
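
The generation step can be pictured as a walk over the spec's paths that emits one tool definition per POST operation. This is only a sketch under assumptions: the OpenApiDoc type covers just the fields read here, the sample paths are invented, and the real generate-tools script may work differently.

```typescript
// Minimal slice of an OpenAPI 3 document: only the fields this sketch reads.
interface OpenApiDoc {
  paths: Record<string, {
    post?: {
      operationId: string;
      summary?: string;
      requestBody?: {
        content: { "application/json"?: { schema: object } };
      };
    };
  }>;
}

// MCP tools expose a name, a description, and a JSON Schema for inputs.
// The path field is an extra kept here so calls can be routed later.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: object;
  path: string;
}

function buildToolMap(doc: OpenApiDoc): Record<string, ToolDefinition> {
  const tools: Record<string, ToolDefinition> = {};
  for (const path of Object.keys(doc.paths)) {
    const op = doc.paths[path].post;
    if (!op) continue; // skip paths with no POST operation
    const schema = op.requestBody?.content["application/json"]?.schema;
    tools[op.operationId] = {
      name: op.operationId,
      description: op.summary ?? "",
      inputSchema: schema ?? { type: "object", properties: {} },
      path,
    };
  }
  return tools;
}

// Invented example document, not the real VisionAgent spec.
const tools = buildToolMap({
  paths: {
    "/v1/tools/depth-pro": {
      post: {
        operationId: "depth-pro",
        summary: "Monocular depth estimation",
        requestBody: {
          content: { "application/json": { schema: { type: "object" } } },
        },
      },
    },
    "/healthz": {},
  },
});
console.log(Object.keys(tools));
```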

Argument validation with Zod

Validates incoming tool arguments against Zod schemas derived from the live OpenAPI spec.
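
The sketch below shows the idea of rejecting bad arguments before any request is sent. It is a dependency-free stand-in, not Zod itself: the rule types and the validate function are invented for illustration, and the success/failure result only loosely mirrors what Zod's safeParse returns.

```typescript
// Hand-rolled rules standing in for a generated Zod schema.
type FieldRule =
  | { kind: "string" }
  | { kind: "stringArray"; minItems: number };

type Schema = Record<string, FieldRule>;

type ValidationResult =
  | { success: true; data: Record<string, unknown> }
  | { success: false; errors: string[] };

function validate(schema: Schema, args: Record<string, unknown>): ValidationResult {
  const errors: string[] = [];
  for (const key of Object.keys(schema)) {
    const rule = schema[key];
    const value = args[key];
    if (rule.kind === "string") {
      if (typeof value !== "string") {
        errors.push(`${key}: expected string`);
      }
    } else if (!Array.isArray(value) || !value.every((v) => typeof v === "string")) {
      errors.push(`${key}: expected array of strings`);
    } else if (value.length < rule.minItems) {
      errors.push(`${key}: expected at least ${rule.minItems} item(s)`);
    }
  }
  if (errors.length > 0) {
    return { success: false, errors };
  }
  return { success: true, data: args };
}

// Hypothetical schema for an object-detection tool.
const detectionSchema: Schema = {
  imagePath: { kind: "string" },
  prompts: { kind: "stringArray", minItems: 1 },
};

const good = validate(detectionSchema, { imagePath: "cat.png", prompts: ["cat"] });
const bad = validate(detectionSchema, { imagePath: 42, prompts: [] });
console.log(good.success, bad.success);
```

With Zod, the schema object would be built from the OpenAPI spec rather than written by hand, and the error details would come from the library.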

File-based argument handling

Reads and base64-encodes file-based arguments (e.g., imagePath, pdfPath) for upload.
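
A minimal sketch of this step, using Node's built-in fs module, might look like the following. The rule that any argument ending in "Path" is a file, and the renaming of imagePath to image, are assumptions made for illustration; the server's actual field mapping is not described here.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Assumed convention: string arguments whose name ends in "Path" point at
// a local file, and the encoded payload is sent under the name without it.
function encodeFileArgs(args: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(args)) {
    const value = args[key];
    if (key.endsWith("Path") && typeof value === "string") {
      const fieldName = key.slice(0, -"Path".length);
      out[fieldName] = readFileSync(value).toString("base64");
    } else {
      out[key] = value; // non-file arguments pass through unchanged
    }
  }
  return out;
}

// Demo with a throwaway file so the sketch runs anywhere.
const demoDir = mkdtempSync(join(tmpdir(), "va-demo-"));
const demoFile = join(demoDir, "sample.bin");
writeFileSync(demoFile, "hello");

const encoded = encodeFileArgs({ imagePath: demoFile, prompts: ["cat"] });
console.log(encoded.image); // aGVsbG8=
```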

Output visualization & storage

Optionally post-processes outputs (masks, boxes, depth maps) and saves them to OUTPUT_DIRECTORY.
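
The storage half of this feature can be sketched as a small helper that decodes a base64 payload and writes it under OUTPUT_DIRECTORY. The sketch assumes IMAGE_DISPLAY_ENABLED gates saving as well as display, and it omits the drawing of masks or boxes; the saveOutput function and file name are hypothetical.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// The two environment variables named in this guide, passed in explicitly
// so the helper does not depend on global process state.
interface OutputEnv {
  OUTPUT_DIRECTORY?: string;
  IMAGE_DISPLAY_ENABLED?: string;
}

// Returns the path written, or null when output saving is switched off.
function saveOutput(env: OutputEnv, fileName: string, base64: string): string | null {
  if (env.IMAGE_DISPLAY_ENABLED !== "true" || !env.OUTPUT_DIRECTORY) {
    return null;
  }
  mkdirSync(env.OUTPUT_DIRECTORY, { recursive: true });
  const target = join(env.OUTPUT_DIRECTORY, fileName);
  writeFileSync(target, Buffer.from(base64, "base64"));
  return target;
}

const outputDir = join(mkdtempSync(join(tmpdir(), "va-out-")), "results");
const saved = saveOutput(
  { OUTPUT_DIRECTORY: outputDir, IMAGE_DISPLAY_ENABLED: "true" },
  "mask.bin",
  "aGVsbG8=",
);
const skipped = saveOutput({ OUTPUT_DIRECTORY: outputDir }, "mask.bin", "aGVsbG8=");
console.log(saved, skipped);
```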

Local, private operation

Runs locally on STDIN/STDOUT with no telemetry; data is only sent to VisionAgent APIs.

Developer tooling & quick start

Includes a generate-tools script and an example MCP client configuration for easy setup.
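
For orientation, MCP clients such as Claude Desktop typically register servers under an mcpServers key with a command, arguments, and environment variables. The sketch below builds such an entry as a TypeScript object and prints it as JSON. The package name is left as a placeholder and the API key variable name is an assumption; only OUTPUT_DIRECTORY and IMAGE_DISPLAY_ENABLED come from this guide.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

function buildClientConfig(apiKey: string, outputDir: string) {
  const entry: McpServerEntry = {
    command: "npx",
    args: ["<visionagent-mcp-package>"], // placeholder, not a real package name
    env: {
      VISION_AGENT_API_KEY: apiKey, // assumed variable name
      OUTPUT_DIRECTORY: outputDir,
      IMAGE_DISPLAY_ENABLED: "true",
    },
  };
  return { mcpServers: { VisionAgent: entry } };
}

const clientConfig = buildClientConfig("<YOUR_API_KEY>", "/path/to/output");
console.log(JSON.stringify(clientConfig, null, 2));
```

Check the project's own sample configuration for the exact package name and variable names before copying this into a client.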