mcp-vision

An MCP server exposing HuggingFace computer vision models as tools (e.g., zero-shot object detection).

Stars
46
Forks
6
Releases
0

Overview

mcp-vision is a Model Context Protocol (MCP) server that exposes HuggingFace computer vision models as callable tools, extending the vision capabilities of large language and vision-language models. The repository is in active development and currently provides two tools: locate_objects and zoom_to_object.

locate_objects detects and locates objects in an image using zero-shot object detection pipelines. It accepts image_path, candidate_labels, and an optional hf_model (default google/owlvit-large-patch14), and returns results in HuggingFace object-detection format. zoom_to_object crops the image to the bounding box of a specified object label, returning the cropped image or None; when multiple objects match, it uses the best-scoring one. The default model can be overridden via hf_model, and CPU-only usage may be noticeably slower.

Deployment is via Docker, with local build commands and an optional public image (groundlight/mcp-vision:latest). Claude Desktop integration is documented through claude_desktop_config.json examples for both CPU and GPU setups. Development guidance covers uv-based setup, building and running the Docker image, and troubleshooting notes.

Details

Owner
groundlight
Language
Python
License
MIT License
Updated
2025-12-07

Features

locate_objects

Detect and locate objects in an image using HuggingFace zero-shot object detection pipelines; inputs include image_path, candidate_labels, and an optional hf_model (defaults to google/owlvit-large-patch14); returns results in HF object-detection format.
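Results in HuggingFace object-detection format are a list of dicts, each carrying a score, a label, and a box in pixel coordinates. A minimal sketch of that shape, with made-up values and a hypothetical post-processing helper (not part of the repo's API):

```python
# Illustrative locate_objects result in HuggingFace object-detection
# format: one dict per detection. Values are invented for illustration.
detections = [
    {"score": 0.87, "label": "cat", "box": {"xmin": 40, "ymin": 32, "xmax": 210, "ymax": 180}},
    {"score": 0.42, "label": "dog", "box": {"xmin": 5, "ymin": 10, "xmax": 120, "ymax": 150}},
]

def filter_detections(detections, label, min_score=0.5):
    """Hypothetical helper: keep detections of one candidate label
    at or above a confidence threshold."""
    return [d for d in detections if d["label"] == label and d["score"] >= min_score]
```

A client consuming locate_objects output could filter like this before deciding which region to inspect further.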

zoom_to_object

Crop the image to the bounding box of a specified label and return the cropped image; if multiple objects match, returns the best-scoring one; returns MCPImage or None.
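The best-match selection described above can be sketched in plain Python; detections and best_box below are illustrative stand-ins, not the server's actual internals:

```python
# Illustrative detections in HuggingFace object-detection format
# (made-up values, including two matches for the same label).
detections = [
    {"score": 0.87, "label": "cat", "box": {"xmin": 40, "ymin": 32, "xmax": 210, "ymax": 180}},
    {"score": 0.61, "label": "cat", "box": {"xmin": 300, "ymin": 50, "xmax": 380, "ymax": 140}},
]

def best_box(detections, label):
    """Return the best-scoring box for `label` as a (left, upper, right, lower)
    crop tuple, or None if no detection matches -- mirroring the documented
    zoom_to_object behavior. Hypothetical helper, not the repo's code."""
    matches = [d for d in detections if d["label"] == label]
    if not matches:
        return None
    box = max(matches, key=lambda d: d["score"])["box"]
    return (box["xmin"], box["ymin"], box["xmax"], box["ymax"])
```

With a PIL image in hand, `image.crop(best_box(detections, "cat"))` would yield the zoomed crop.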

Default and configurable models

Defaults to google/owlvit-large-patch14; other models can be selected via the optional hf_model parameter. Larger models may load and run slowly on CPU.

Hardware support

Supports GPU-accelerated runs (Docker with NVIDIA runtime and GPUs) or CPU-only runs; CPU mode may incur longer load and inference times.

Docker deployment

Build and run the MCP server via Docker; includes local build (make build-docker) and an optional public image (groundlight/mcp-vision:latest).

Claude Desktop integration

Configured via claude_desktop_config.json to run mcp-vision as an MCP server; examples cover CPU and GPU setups.
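An illustrative claude_desktop_config.json entry for a CPU-only setup. The exact command and args are documented in the repo; the invocation below is an assumption based on the public Docker image mentioned above:

```json
{
  "mcpServers": {
    "mcp-vision": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "groundlight/mcp-vision:latest"]
    }
  }
}
```

A GPU setup would additionally pass the NVIDIA runtime flags (e.g. `--gpus all`) in the args list, per the repo's GPU example.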

Development and build workflow

Development uses the uv package manager for installation and running (uv install; uv run python mcp_vision); the workflow also covers building and running the Docker image and pushing it to Docker Hub.

Audience

AI developer
Integrate CV tools into LLM/VL workflows via the MCP server to extend vision capabilities.
Model integrator
Configure clients like Claude Desktop or other MCP-enabled tools to call mcp-vision as an MCP server for image analysis tasks.

Tags

mcp-vision, MCP, computer vision, HuggingFace, zero-shot object detection, object detection, Claude Desktop, docker, GPU, CPU