mcp-vision

An MCP server exposing HuggingFace computer vision models as tools (e.g., zero-shot object detection).

Stars
46
Forks
6
Releases
0

Overview

mcp-vision is a Model Context Protocol (MCP) server that exposes HuggingFace computer vision models as callable tools, extending the vision capabilities of large language and vision-language models. The repository is in active development and currently provides two tools: locate_objects and zoom_to_object.

locate_objects detects and locates objects in an image using zero-shot object detection pipelines. It accepts image_path, candidate_labels, and an optional hf_model (default google/owlvit-large-patch14), and returns results in HuggingFace object-detection format. zoom_to_object crops the image to the bounding box of a specified object label, returning the cropped image or None; when multiple objects match, it uses the best-scoring one. The default model can be overridden via hf_model, and CPU-only usage may be noticeably slower.

Deployment is via Docker, with local build commands and an optional public image (groundlight/mcp-vision:latest). Claude Desktop integration is documented through claude_desktop_config.json examples for both CPU and GPU setups. Development guidance covers uv-based setup, building and running the Docker image, and troubleshooting notes.

Details

Owner
groundlight
Language
Python
License
MIT License
Updated
2025-12-07

Features

locate_objects

Detect and locate objects in an image using HuggingFace zero-shot object detection pipelines; inputs include image_path, candidate_labels, and an optional hf_model (defaults to google/owlvit-large-patch14); returns results in HF object-detection format.
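Results in HuggingFace object-detection format are a list of dicts, each carrying a score, a label, and a box in pixel coordinates. A minimal sketch of that shape, with made-up values and a hypothetical post-processing helper (not part of the repo's API):

```python
# Illustrative locate_objects result in HuggingFace object-detection
# format: one dict per detection. Values are invented for illustration.
detections = [
    {"score": 0.87, "label": "cat", "box": {"xmin": 40, "ymin": 32, "xmax": 210, "ymax": 180}},
    {"score": 0.42, "label": "dog", "box": {"xmin": 5, "ymin": 10, "xmax": 120, "ymax": 150}},
]

def filter_detections(detections, label, min_score=0.5):
    """Hypothetical helper: keep detections of one candidate label
    at or above a confidence threshold."""
    return [d for d in detections if d["label"] == label and d["score"] >= min_score]
```

A client consuming locate_objects output could filter like this before deciding which region to inspect further.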

zoom_to_object

Crop the image to the bounding box of a specified label and return the cropped image; if multiple objects match, returns the best-scoring one; returns MCPImage or None.
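The best-match selection described above can be sketched in plain Python; detections and best_box below are illustrative stand-ins, not the server's actual internals:

```python
# Illustrative detections in HuggingFace object-detection format
# (made-up values, including two matches for the same label).
detections = [
    {"score": 0.87, "label": "cat", "box": {"xmin": 40, "ymin": 32, "xmax": 210, "ymax": 180}},
    {"score": 0.61, "label": "cat", "box": {"xmin": 300, "ymin": 50, "xmax": 380, "ymax": 140}},
]

def best_box(detections, label):
    """Return the best-scoring box for `label` as a (left, upper, right, lower)
    crop tuple, or None if no detection matches -- mirroring the documented
    zoom_to_object behavior. Hypothetical helper, not the repo's code."""
    matches = [d for d in detections if d["label"] == label]
    if not matches:
        return None
    box = max(matches, key=lambda d: d["score"])["box"]
    return (box["xmin"], box["ymin"], box["xmax"], box["ymax"])
```

With a PIL image in hand, `image.crop(best_box(detections, "cat"))` would yield the zoomed crop.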

Default and configurable models

Defaults to google/owlvit-large-patch14; other models can be selected via the optional hf_model parameter. Larger models may load and run slowly on CPU.

Hardware support

Supports GPU-accelerated runs (Docker with NVIDIA runtime and GPUs) or CPU-only runs; CPU mode may incur longer load and inference times.

Docker deployment

Build and run the MCP server via Docker; includes local build (make build-docker) and an optional public image (groundlight/mcp-vision:latest).

Claude Desktop integration

Configured via claude_desktop_config.json to run mcp-vision as an MCP server; examples cover CPU and GPU setups.
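An illustrative claude_desktop_config.json entry for a CPU-only setup. The exact command and args are documented in the repo; the invocation below is an assumption based on the public Docker image mentioned above:

```json
{
  "mcpServers": {
    "mcp-vision": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "groundlight/mcp-vision:latest"]
    }
  }
}
```

A GPU setup would additionally pass the NVIDIA runtime flags (e.g. `--gpus all`) in the args list, per the repo's GPU example.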

Development and build workflow

Development uses the uv package manager for installation and running (uv install; uv run python mcp_vision); the workflow also covers building and running the Docker image and pushing it to Docker Hub.

Audience

AI developer
Integrate CV tools into LLM/VL workflows via the MCP server to extend vision capabilities.
Model integrator
Configure clients like Claude Desktop or other MCP-enabled tools to call mcp-vision as an MCP server for image analysis tasks.

Tags

mcp-vision, MCP, computer vision, HuggingFace, zero-shot object detection, object detection, Claude Desktop, docker, GPU, CPU