computer-control-mcp

MCP server providing computer control capabilities (mouse, keyboard) and OCR features.

Stars: 70
Forks: 11
Releases: 0

Overview

Computer Control MCP is a Python-based MCP server that exposes a set of actions for remotely controlling a computer's user interface. It uses PyAutoGUI for mouse and keyboard input and RapidOCR (on ONNXRuntime) to extract text from screenshots, and it aims to need no external dependencies beyond these libraries.

The server provides tools to move and click the mouse, drag and drop, type text at the current cursor position, press and hold keys, and execute multi-key sequences. It can capture the entire display or a specific window, optionally saving the image to a downloads folder, and an OCR-enabled variant returns the recognized text together with its coordinates. Additional utilities list open windows, activate a window by title or pattern, report the current screen size, and wait for a specified duration in milliseconds.

The repository documents deployment via uvx with a sample mcpServers config, or via a standard pip install, and it includes development setup, test instructions, and an API reference. The project emphasizes ease of use.
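The uvx deployment mentioned above relies on an mcpServers entry in the MCP client's configuration. The sketch below builds such an entry in Python and prints it as JSON. It assumes the package is published under the repository name and launched as `computer-control-mcp@latest`; the server label and the helper name `build_mcp_config` are hypothetical, so check the repository's own sample before copying it.

```python
import json


def build_mcp_config(package: str = "computer-control-mcp@latest") -> dict:
    """Return an mcpServers config that launches the server through uvx.

    Assumptions: the package name matches the repository name, and the
    "computer-control-mcp" key is only a label chosen for this example.
    """
    return {
        "mcpServers": {
            "computer-control-mcp": {
                "command": "uvx",
                "args": [package],
            }
        }
    }


if __name__ == "__main__":
    # Paste the printed JSON into the MCP client's configuration file.
    print(json.dumps(build_mcp_config(), indent=2))
```

Generating the entry from code is optional; the same structure can be written by hand in the client's JSON config.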

Details

Owner: AB498
Language: Python
License: MIT License
Updated: 2025-12-07

Features

Control mouse movements and clicks

Move the cursor, click at coordinates, and perform drag-and-drop with configurable buttons and durations.
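As a rough illustration of what such mouse actions look like underneath, the sketch below calls PyAutoGUI directly. It is not the server's implementation: the helper names (`clamp_to_screen`, `click_at`, `drag`) are hypothetical, and clamping the target to the screen bounds is an added safety step the source does not mention. PyAutoGUI is imported lazily because it needs a desktop session.

```python
from typing import Tuple


def clamp_to_screen(x: int, y: int, width: int, height: int) -> Tuple[int, int]:
    """Keep a target point inside the screen bounds (0..width-1, 0..height-1)."""
    return max(0, min(x, width - 1)), max(0, min(y, height - 1))


def click_at(x: int, y: int, button: str = "left", duration: float = 0.2) -> None:
    """Glide the cursor to (x, y) over `duration` seconds, then click."""
    import pyautogui  # lazy import: requires a running desktop session

    width, height = pyautogui.size()
    tx, ty = clamp_to_screen(x, y, width, height)
    pyautogui.moveTo(tx, ty, duration=duration)
    pyautogui.click(button=button)


def drag(start: Tuple[int, int], end: Tuple[int, int],
         button: str = "left", duration: float = 0.5) -> None:
    """Press `button` at `start`, drag to `end` over `duration` seconds, release."""
    import pyautogui  # lazy import: requires a running desktop session

    pyautogui.moveTo(start[0], start[1])
    pyautogui.dragTo(end[0], end[1], duration=duration, button=button)
```

A call such as `click_at(400, 300, button="right")` would move the pointer and right-click; a nonzero duration makes the movement visible instead of instantaneous.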

Type text at the current cursor position

Type a string at the active cursor position; separate key-down and key-up events and key sequences are also supported.

Take screenshots of the entire screen or specific windows with optional saving to downloads directory

Capture images of the full screen or a target window, with optional save to a downloads folder.

Extract text from screenshots using OCR

Run OCR on captured images to extract text and coordinates.
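To show how text and coordinates can come out of an OCR pass, here is a minimal sketch. It assumes the `rapidocr_onnxruntime` package, whose engine returns a list of `[box, text, score]` entries where `box` holds four corner points; newer RapidOCR releases may return a different structure. The helper names and the output shape are hypothetical and are not the server's actual response format.

```python
from typing import Any, Dict, List, Optional, Sequence


def summarize_ocr(result: Optional[Sequence[Sequence[Any]]]) -> List[Dict[str, Any]]:
    """Convert [box, text, score] entries into dicts with a center point.

    Assumption: `box` is four (x, y) corner points, as returned by
    rapidocr_onnxruntime. A None result (no text found) yields [].
    """
    items = []
    for box, text, score in result or []:
        xs = [point[0] for point in box]
        ys = [point[1] for point in box]
        center = (sum(xs) / len(xs), sum(ys) / len(ys))
        items.append({"text": text, "center": center, "score": float(score)})
    return items


def ocr_image(path: str) -> List[Dict[str, Any]]:
    """Run OCR on an image file and return text with center coordinates."""
    from rapidocr_onnxruntime import RapidOCR  # lazy import: optional dependency

    engine = RapidOCR()
    result, _elapsed = engine(path)
    return summarize_ocr(result)
```

The center of each box is a convenient click target, which is presumably why returning coordinates alongside text is useful for UI automation.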

List and activate windows

Enumerate open windows and bring a chosen window to the foreground by title or pattern.
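The source does not say how a title or pattern is matched against open windows. The sketch below shows one plausible approach: a case-insensitive substring match by default, with an optional regular-expression mode. The function name `match_titles` and the `use_regex` flag are hypothetical, and the list of titles would come from whatever window library the server uses.

```python
import re
from typing import Iterable, List


def match_titles(titles: Iterable[str], pattern: str,
                 use_regex: bool = False) -> List[str]:
    """Return the window titles that match `pattern`, ignoring case.

    With use_regex=False the pattern is treated as a plain substring;
    with use_regex=True it is compiled as a regular expression.
    """
    if use_regex:
        compiled = re.compile(pattern, re.IGNORECASE)
        return [title for title in titles if compiled.search(title)]
    needle = pattern.lower()
    return [title for title in titles if needle in title.lower()]
```

For example, with the titles `"Untitled - Notepad"` and `"Terminal"`, calling `match_titles(titles, "notepad")` selects only the first; the caller can then activate the first match or report an ambiguity if several windows match.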

Press keyboard keys

Press single keys, key sequences, and multi-key combinations.
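The difference between a single key press and a multi-key combination can be sketched with PyAutoGUI's `press` and `hotkey` functions. The `"ctrl+shift+s"` string format and the helper names below are assumptions made for illustration, not the server's documented input format.

```python
from typing import List


def parse_combo(combo: str) -> List[str]:
    """Split a string such as "Ctrl + Shift + S" into lowercase key names.

    Limitation of this sketch: the "+" key itself cannot be expressed.
    """
    keys = [part.strip().lower() for part in combo.split("+")]
    if not all(keys):
        raise ValueError(f"empty key name in combination: {combo!r}")
    return keys


def press_combo(combo: str) -> None:
    """Press one key, or hold several keys together as a hotkey."""
    import pyautogui  # lazy import: requires a running desktop session

    keys = parse_combo(combo)
    if len(keys) == 1:
        pyautogui.press(keys[0])
    else:
        # hotkey presses the keys in order and releases them in reverse order.
        pyautogui.hotkey(*keys)
```

Under these assumptions, `press_combo("enter")` sends a single key press, while `press_combo("ctrl+shift+s")` holds the modifiers while pressing `s`.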

Drag and drop operations

Perform drag-and-drop actions between coordinates with optional duration.

Tags

computer control, mouse control, keyboard control, OCR, screen capture, window management, PyAutoGUI, RapidOCR, ONNXRuntime, automation, MCP, uvx, Python