mcp-read-website-fast

Fast, token-efficient web content extraction for AI agents - converts websites to clean Markdown.

112 stars, 18 forks, 0 releases

Overview

Fast, token-efficient web content extraction for AI agents, converting websites to clean Markdown. This MCP server fetches pages locally, strips boilerplate, and preserves links, using Mozilla Readability for extraction and Turndown for conversion to GitHub Flavored Markdown. It has minimal dependencies and is designed for Claude Code, IDEs, and LLM pipelines.

The server caches results under SHA-256 hashes of URLs for fast lookups, and it crawls politely, respecting robots.txt and applying rate limiting. Pages are fetched concurrently with configurable crawl depth, while a stream-first design keeps memory usage low. Optional chunking supports downstream processing, and preserved link structure keeps the output usable for knowledge graphs. Startup is fast thanks to lazy loading on top of the official MCP SDK, with core crawling and Markdown conversion powered by @just-every/crawl.

Tools include read_website, which fetches a page and converts it to Markdown, plus status and clear-cache endpoints for cache observability. The server can be integrated through MCP clients via JSON configuration or npm/npx commands, and its tools are exposed under the read-website-fast namespace.
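As an illustration of that JSON-based integration (a sketch only: the top-level key varies by MCP client, and the npm package name @just-every/mcp-read-website-fast is an assumption inferred from the project name), a client entry might look like:

```json
{
  "mcpServers": {
    "read-website-fast": {
      "command": "npx",
      "args": ["-y", "@just-every/mcp-read-website-fast"]
    }
  }
}
```

With a configuration like this, the client launches the server on demand via npx and exposes its tools (e.g. read_website) under the read-website-fast namespace.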

Details

Owner
just-every
Language
JavaScript
License
MIT License
Updated
2025-12-07

Features

Fast startup with lazy loading

Starts quickly using the official MCP SDK, deferring heavy modules until they are first needed.
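The pattern behind that lazy loading can be sketched as a generic helper (an illustration of the technique, not the server's actual code): expensive setup runs at most once, on first access, so startup stays cheap.

```javascript
// Lazy-initialization helper: the factory runs at most once, on first
// access, so expensive setup is deferred until the value is needed.
function lazy(factory) {
  let initialized = false;
  let value;
  return function get() {
    if (!initialized) {
      value = factory();
      initialized = true;
    }
    return value;
  };
}
```

A server would wrap its heavy extraction stack in `lazy()` and call the getter inside the tool handler, so the cost is paid on the first request rather than at startup.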

Mozilla Readability extraction

Content extraction using Mozilla Readability (same as Firefox Reader View) to capture the main article content.

HTML to Markdown conversion

HTML to Markdown conversion with Turndown and GitHub Flavored Markdown support.

Smart caching

Caching with SHA-256 hashed URLs for efficient repeated fetches.

Polite crawling

Robots.txt support with rate limiting to crawl sites responsibly.
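The rate-limiting half can be sketched as a per-host interval gate (a simplification; the server's actual limiter and its robots.txt parsing are not shown here):

```javascript
// Returns an async gate that spaces calls at least minIntervalMs apart,
// so a crawler never hits a host faster than the configured rate.
function createRateLimiter(minIntervalMs) {
  let nextSlot = 0; // timestamp when the next request may start
  return async function wait() {
    const now = Date.now();
    const delay = Math.max(0, nextSlot - now);
    nextSlot = Math.max(now, nextSlot) + minIntervalMs;
    if (delay > 0) await new Promise((resolve) => setTimeout(resolve, delay));
  };
}
```

In a crawler you would keep one limiter per hostname and `await wait()` before each fetch to that host.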

Concurrent fetching

Configurable depth and concurrency for parallel page retrieval.
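A common shape for that kind of bounded parallelism, sketched here as a generic helper (not the server's actual implementation):

```javascript
// Map over items with at most `limit` promises in flight, preserving order.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let nextIndex = 0;
  async function worker() {
    // Each worker pulls the next unclaimed index until the list is drained.
    while (nextIndex < items.length) {
      const i = nextIndex++;
      results[i] = await fn(items[i], i);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

A depth-limited crawl would call a helper like this on each frontier of discovered links, with `limit` set from the concurrency configuration.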

Stream-first design

Low memory usage achieved through stream-first processing.

Link preservation

Preserves links to support knowledge graphs and downstream linking.

Audience

Claude Code: Integrates with Claude Code to fetch pages and convert them to clean Markdown for prompts.
IDEs: Used in IDEs to generate token-efficient Markdown content for AI workflows.
LLM pipelines: Supports LLM pipelines and agents with clean Markdown representations for training and reasoning.
Knowledge graphs: Facilitates knowledge graph construction with preserved links and Markdown data.

Tags

web, crawler, markdown, readability, turndown, gfm, caching, robots.txt, concurrency, streaming, link-preservation