Browser MCP

MCP-enabled multimodal AI agent kernel that mounts MCP servers to connect to real-world tools.

Stars: 19,584
Forks: 1,866
Releases: 20

Overview

Agent TARS is an MCP-enabled multimodal AI agent kernel that helps LLMs operate across terminals, desktops, and browsers. The kernel is built on MCP and can mount MCP Servers to connect to real-world tools, so workflows resemble the way a person completes a task. It emphasizes GUI Agent and Vision capabilities, and it controls browsers through a GUI Agent, the DOM, or a hybrid of the two. A protocol-driven Event Stream architecture drives both Context Engineering and the interactive Agent UI, which also helps with inspecting data flows and debugging.

The quick-start CLI can run headful or headless, with an optional Web UI. The project aims to be cross-platform and offers configuration options and integration with various MCP tools. The Agent TARS ecosystem includes documentation and examples that help developers deploy the kernel and connect MCP Servers to automation tasks.

Details

Owner: bytedance
Language: TypeScript
License: Apache License 2.0
Updated: 2025-12-07

Features

One-Click Out-of-the-box CLI

Runs headful with a Web UI or headless as a server, with quick setup via npx or a global npm installation.

Hybrid Browser Agent

Controls browsers using GUI Agent, DOM, or a hybrid strategy for versatile automation.
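One way to picture the hybrid strategy is as a fallback rule: prefer a DOM selector when the element is addressable, and fall back to vision-derived screen coordinates otherwise. The sketch below is only an illustration under that assumption; `ControlMode`, `Target`, `ClickAction`, and `planClick` are hypothetical names and not part of the Agent TARS API.

```typescript
// Hypothetical types for illustration; not the Agent TARS API.
type ControlMode = "gui" | "dom" | "hybrid";

interface Target {
  selector?: string;                 // CSS selector, if the DOM exposes the element
  point?: { x: number; y: number };  // screen coordinates, e.g. from a vision model
}

type ClickAction =
  | { via: "dom"; selector: string }
  | { via: "gui"; x: number; y: number };

// Pick how to perform a click for the given mode.
// Assumed rule: "hybrid" tries the DOM first and falls back to the GUI path.
function planClick(mode: ControlMode, target: Target): ClickAction {
  const dom =
    target.selector !== undefined
      ? { via: "dom" as const, selector: target.selector }
      : undefined;
  const gui =
    target.point !== undefined
      ? { via: "gui" as const, x: target.point.x, y: target.point.y }
      : undefined;

  if (mode === "dom" && dom) return dom;
  if (mode === "gui" && gui) return gui;
  if (mode === "hybrid") {
    const action = dom ?? gui;
    if (action) return action;
  }
  throw new Error(`no usable target for mode "${mode}"`);
}
```

With both a selector and a point available, `planClick("hybrid", ...)` returns the DOM action; with only a point, it returns the GUI action. A real agent would likely also fall back after a DOM action fails at runtime, which this sketch does not model.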

Event Stream

Protocol-driven Event Stream powers Context Engineering and the Agent UI for structured data flows.
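To make the idea concrete, an event stream of this kind can be sketched as an append-only log of typed events with two consumers: subscribers such as a UI, and a function that assembles model context from recent events. The event shapes and the `EventStream` class below are assumptions made for illustration; the actual Agent TARS protocol may look quite different.

```typescript
// Hypothetical event shapes; the real protocol may differ.
type AgentEvent =
  | { type: "user_message"; text: string }
  | { type: "tool_call"; tool: string; args: Record<string, unknown> }
  | { type: "tool_result"; tool: string; output: string }
  | { type: "assistant_message"; text: string };

type Listener = (event: AgentEvent) => void;

// Render one event as a single context line.
function renderEvent(event: AgentEvent): string {
  switch (event.type) {
    case "user_message":
      return `user: ${event.text}`;
    case "tool_call":
      return `call ${event.tool}(${JSON.stringify(event.args)})`;
    case "tool_result":
      return `result ${event.tool}: ${event.output}`;
    case "assistant_message":
      return `assistant: ${event.text}`;
  }
}

class EventStream {
  private readonly events: AgentEvent[] = [];
  private readonly listeners = new Set<Listener>();

  // Append an event and notify subscribers (for example, a UI renderer).
  emit(event: AgentEvent): void {
    this.events.push(event);
    this.listeners.forEach((listener) => listener(event));
  }

  // Register a listener; the returned function unsubscribes it.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  // Build context lines from at most the last `maxEvents` events.
  buildContext(maxEvents: number): string[] {
    if (maxEvents <= 0) return [];
    return this.events.slice(-maxEvents).map(renderEvent);
  }
}
```

Because the UI and the context builder read from the same log, what the model sees and what the user sees come from one source, which is one plausible reason such a design helps with debugging.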

MCP Integration

The kernel is built on MCP and supports mounting MCP Servers to connect to real-world tools.
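Mounting can be thought of as registering named servers with the kernel and exposing their tools under one namespace. The following is a minimal sketch under that assumption: `ToolServer` is an in-process stand-in for a real MCP server (which would speak the MCP protocol over a transport), and `Kernel`, `mount`, `tools`, and `call` are hypothetical names rather than the project's API.

```typescript
// Hypothetical in-process stand-in for an MCP server.
interface ToolServer {
  listTools(): string[];
  callTool(name: string, args: Record<string, unknown>): Promise<string>;
}

class Kernel {
  private readonly mounts = new Map<string, ToolServer>();

  // Register a server under a unique name.
  mount(name: string, server: ToolServer): void {
    if (this.mounts.has(name)) {
      throw new Error(`server "${name}" is already mounted`);
    }
    this.mounts.set(name, server);
  }

  // Expose tools as "<server>.<tool>" so names from different servers cannot collide.
  tools(): string[] {
    const names: string[] = [];
    this.mounts.forEach((server, name) => {
      server.listTools().forEach((tool) => names.push(`${name}.${tool}`));
    });
    return names;
  }

  // Route a qualified tool name to the server that owns it.
  async call(qualified: string, args: Record<string, unknown>): Promise<string> {
    const dot = qualified.indexOf(".");
    if (dot < 0) throw new Error(`expected "<server>.<tool>", got "${qualified}"`);
    const server = this.mounts.get(qualified.slice(0, dot));
    if (!server) throw new Error(`no server mounted for "${qualified}"`);
    return server.callTool(qualified.slice(dot + 1), args);
  }
}
```

Prefixing each tool with its server name is a design choice of this sketch, not something the source describes; it keeps two servers that both offer, say, a `search` tool from shadowing each other.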

Audience

Developers: to build and extend MCP-enabled multimodal agents and connect to real-world tools via MCP.
AI Engineers: to explore multimodal LLM workflows with GUI, Vision, and browser automation.
Product Teams: to deploy automated tasks and workflows using CLI or Web UI.

Tags

MCP, Multimodal AI, GUI Agent, Vision, Browser Automation, Event Stream, CLI, Cross-platform