Overview
Features
locate_objects
Detect and locate objects in an image using HuggingFace zero-shot object-detection pipelines. Inputs are image_path, candidate_labels, and an optional hf_model (defaults to google/owlvit-large-patch14); results are returned in the HF object-detection format.
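To make the HF object-detection format concrete, here is a sketch of what a result list looks like and a small helper that filters it by score. The sample detections, file name, and the `labels_above` helper are illustrative assumptions, not part of the mcp-vision API.

```python
# Illustrative sketch: locate_objects returns a list of detections in the
# HuggingFace object-detection format. The values below are made up.
detections = [
    {"score": 0.92, "label": "cat",
     "box": {"xmin": 10, "ymin": 20, "xmax": 200, "ymax": 310}},
    {"score": 0.41, "label": "remote control",
     "box": {"xmin": 220, "ymin": 40, "xmax": 300, "ymax": 90}},
]

def labels_above(detections, threshold):
    """Collect labels whose detection score clears the threshold (hypothetical helper)."""
    return [d["label"] for d in detections if d["score"] >= threshold]

print(labels_above(detections, 0.5))  # ['cat']
```

A downstream client would typically apply a threshold like this before acting on low-confidence detections.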
zoom_to_object
Crop the image to the bounding box of a specified label; if multiple objects match, the best-scoring one is used. Returns an MCPImage, or None when no match is found.
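The "best-scoring match" behavior can be sketched in pure Python. The real tool returns an MCPImage; this hypothetical `best_box` helper only computes the crop box from detections in the HF format, mirroring the documented None return when nothing matches.

```python
def best_box(detections, label):
    """Return the box of the highest-scoring detection for `label`, else None."""
    matches = [d for d in detections if d["label"] == label]
    if not matches:
        return None
    return max(matches, key=lambda d: d["score"])["box"]

# Made-up detections: two candidate boxes for the same label.
detections = [
    {"score": 0.31, "label": "cat",
     "box": {"xmin": 5, "ymin": 5, "xmax": 50, "ymax": 50}},
    {"score": 0.88, "label": "cat",
     "box": {"xmin": 60, "ymin": 10, "xmax": 200, "ymax": 180}},
]
print(best_box(detections, "cat"))
# {'xmin': 60, 'ymin': 10, 'xmax': 200, 'ymax': 180}
```

With Pillow, the actual crop would then be `image.crop((box["xmin"], box["ymin"], box["xmax"], box["ymax"]))`.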
Default and configurable models
Defaults to google/owlvit-large-patch14, but an alternative model can be specified via hf_model; the large default model is slower on CPU.
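A hypothetical set of tool-call arguments showing the override might look like this. The parameter names come from the locate_objects description above, but the exact call schema and the smaller model choice are assumptions.

```python
# Hypothetical locate_objects arguments; the image path is made up.
args = {
    "image_path": "photo.jpg",
    "candidate_labels": ["cat", "dog"],
    # Swap in a smaller OWL-ViT variant for faster CPU inference (assumption:
    # any HF zero-shot object-detection checkpoint should be accepted).
    "hf_model": "google/owlvit-base-patch32",
}
print(args["hf_model"])
```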
Hardware support
Supports GPU-accelerated runs (Docker with NVIDIA runtime and GPUs) or CPU-only runs; CPU mode may incur longer load and inference times.
Docker deployment
Build and run the MCP server via Docker; includes local build (make build-docker) and an optional public image (groundlight/mcp-vision:latest).
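On the command line, the steps above might look like the following. The make target and image name are taken from the text; the docker run flags are assumptions, so verify them against the repository's README and Makefile.

```shell
# Build the image locally via the provided make target
make build-docker

# Or pull the optional public image
docker pull groundlight/mcp-vision:latest

# GPU run (assumes the NVIDIA container runtime is installed);
# drop --gpus all for a CPU-only run
docker run -it --rm --gpus all groundlight/mcp-vision:latest
```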
Claude Desktop integration
Configured via claude_desktop_config.json to run mcp-vision as an MCP server; examples cover CPU and GPU setups.
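A minimal claude_desktop_config.json entry might look like this sketch. The server name and docker arguments are assumptions based on the deployment notes above, not a verified configuration.

```json
{
  "mcpServers": {
    "mcp-vision": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "groundlight/mcp-vision:latest"]
    }
  }
}
```

For a GPU setup, flags such as `--gpus all` would presumably be added to the args list.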
Development and build workflow
Development flow uses the uv package manager for installation and running (uv install; uv run python mcp_vision); includes Docker image build/run and pushing to Docker Hub.
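The workflow above condenses to a few commands; these are taken from the text, so check them against the repository before relying on them.

```shell
uv install                 # install dependencies with the uv package manager
uv run python mcp_vision   # run the MCP server locally
make build-docker          # build the Docker image for deployment
```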
Who Is This For?
- AI developer: Integrate CV tools into LLM/VL workflows via the MCP server to extend vision capabilities.
- Model integrator: Configure clients such as Claude Desktop or other MCP-enabled tools to call mcp-vision as an MCP server for image-analysis tasks.
