eBook-mcp

Lightweight MCP server enabling LLMs to read and interact with your PDFs and EPUBs.

Stars: 137
Forks: 23
Releases: 0

Overview

Ebook-MCP is an MCP server for processing e-books. Built on the Model Context Protocol, it gives LLM applications a standardized API for e-book processing and currently supports the EPUB and PDF formats.

The server exposes endpoints to discover EPUB and PDF files and to extract their metadata, tables of contents, and structural information. Content can be extracted at the chapter or page level, with Markdown output to ease ingestion by LLMs, and multiple EPUB or PDF files can be processed in a batch. The EPUB API includes get_all_epub_files, get_metadata, get_toc, and get_chapter_markdown. The corresponding PDF API includes get_all_pdf_files, get_pdf_metadata, get_pdf_toc, get_pdf_page_text, and get_pdf_chapter_content.

Dependencies include ebooklib, PyPDF2, PyMuPDF, beautifulsoup4, html2text, pydantic, and fastmcp.

The project describes a three-layer architecture: an Agent Layer for translation and LLM interactions, an MCP Tool Layer for extraction and generation, and a System/Base Layer for IO and parsing. Ebook-MCP aims to enable natural language conversations and interactive reading experiences over the content of a personal library.

Details

Owner: onebirdrocks
Language: Python
License: Apache License 2.0
Updated: 2025-12-07

Features

EPUB metadata extraction

Extracts EPUB metadata such as title, author, and publication date.
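
To make concrete what EPUB metadata extraction involves, here is a minimal sketch that uses only the Python standard library. It is not the project's implementation (the listing names ebooklib as a dependency); the function names `read_epub_metadata` and `build_sample_epub` and the sample book data are assumptions made for illustration. An EPUB is a ZIP archive whose `META-INF/container.xml` points to a package document holding Dublin Core fields.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container and package (OPF) documents.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_epub_metadata(source):
    """Return title, author, and date from an EPUB (path or file-like object).

    Hypothetical helper for illustration; not part of Ebook-MCP's API.
    """
    with zipfile.ZipFile(source) as zf:
        # container.xml names the package document (.opf) inside the archive.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        package = ET.fromstring(zf.read(rootfile.get("full-path")))

    metadata = package.find("opf:metadata", NS)

    def first(tag):
        # Return the text of the first matching Dublin Core element, if any.
        node = metadata.find(f"dc:{tag}", NS) if metadata is not None else None
        return node.text.strip() if node is not None and node.text else None

    return {"title": first("title"), "author": first("creator"), "date": first("date")}


def build_sample_epub():
    """Build a tiny in-memory archive with made-up metadata for the demo."""
    container = (
        '<container xmlns="urn:oasis:names:tc:opendocument:xmlns:container" version="1.0">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles></container>'
    )
    opf = (
        '<package xmlns="http://www.idpf.org/2007/opf" version="3.0">'
        '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
        "<dc:title>Sample Book</dc:title>"
        "<dc:creator>Jane Doe</dc:creator>"
        "<dc:date>2024-01-15</dc:date>"
        "</metadata></package>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", container)
        zf.writestr("OEBPS/content.opf", opf)
    buf.seek(0)
    return buf


info = read_epub_metadata(build_sample_epub())
```

Fields missing from the package document come back as `None`, so a caller can tell "absent" from "empty".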

EPUB table of contents extraction

Extracts the EPUB's table of contents for navigation.

EPUB chapter content extraction (Markdown)

Retrieves chapter content in Markdown format.
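
EPUB chapters are stored as XHTML, so Markdown output implies an HTML-to-Markdown conversion step. The listing names html2text and beautifulsoup4 as dependencies; the sketch below is a deliberately small standard-library stand-in, not the project's converter. The class `MiniMarkdown` and the function `html_to_markdown` are hypothetical names, and only headings, paragraphs, emphasis, and bold are handled.

```python
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Convert a small subset of HTML to Markdown (illustrative only)."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    INLINE = {"em": "*", "i": "*", "strong": "**", "b": "**"}
    BLOCKS = {"p", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished Markdown blocks
        self.current = []  # text fragments of the block being built

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.current.append(self.HEADINGS[tag])
        elif tag in self.INLINE:
            self.current.append(self.INLINE[tag])

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.current.append(self.INLINE[tag])
        elif tag in self.BLOCKS:
            self._flush()

    def handle_data(self, data):
        self.current.append(data)

    def _flush(self):
        # Collapse runs of whitespace and store the block if it has content.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []


def html_to_markdown(html):
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep any trailing text outside a block element
    return "\n\n".join(parser.blocks)


markdown = html_to_markdown(
    "<h1>Chapter 1</h1>\n<p>Hello <em>brave</em> <strong>world</strong>.</p>"
)
```

A production converter also has to handle links, lists, images, and nested markup, which is the reason to rely on a dedicated library.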

EPUB batch processing

Batch processes multiple EPUB files.
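
Batch processing starts with discovering which files to process. The following sketch shows one way a directory scan could work; `list_ebook_files` is a hypothetical helper, and its signature is an assumption rather than that of the project's get_all_epub_files or get_all_pdf_files.

```python
import tempfile
from pathlib import Path


def list_ebook_files(directory, suffix=".epub"):
    """Return sorted names of files in `directory` with the given extension.

    The extension match is case-insensitive. Hypothetical helper.
    """
    root = Path(directory)
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == suffix.lower()
    )


# Usage: build a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.epub", "a.EPUB", "notes.txt", "paper.pdf"):
        (Path(tmp) / name).touch()
    epubs = list_ebook_files(tmp)
    pdfs = list_ebook_files(tmp, suffix=".pdf")
```

The resulting list can then be fed, one file at a time, to the metadata or content extraction step.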

PDF metadata extraction

Extracts PDF metadata such as title, author, and creation date.

PDF table of contents extraction

Extracts the PDF's table of contents.

PDF content extraction by page number

Extracts content from a specific PDF page number.

PDF chapter content extraction

Retrieves content for a given PDF chapter title.
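
One plausible way to map a chapter title to content is to treat the table of contents as a list of (title, start page) pairs and let each chapter run up to the page before the next entry. This is a sketch under that assumption, not a description of how get_pdf_chapter_content works; `chapter_page_range`, `chapter_text`, and the sample data are hypothetical, and page texts are plain strings here rather than real PDF pages.

```python
def chapter_page_range(toc, title, page_count):
    """Return the 1-based (start, end) page range for the chapter `title`.

    `toc` is a list of (title, start_page) pairs in document order.
    """
    for i, (entry_title, start) in enumerate(toc):
        if entry_title == title:
            # The chapter ends just before the next entry, or at the last page.
            end = toc[i + 1][1] - 1 if i + 1 < len(toc) else page_count
            # If the next chapter starts on the same page, keep that one page.
            return start, max(start, end)
    raise KeyError(title)


def chapter_text(toc, pages, title):
    """Join the text of all pages that belong to the chapter `title`."""
    start, end = chapter_page_range(toc, title, len(pages))
    return "\n".join(pages[start - 1:end])


# Made-up table of contents and page texts for demonstration.
toc = [("Intro", 1), ("Methods", 3), ("Results", 4)]
pages = ["p1", "p2", "p3", "p4", "p5"]
intro = chapter_text(toc, pages, "Intro")
```

Page-level granularity means a chapter that begins mid-page also picks up the tail of the previous chapter on that page.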

Audience

LLM developers: Integrate e-book APIs into LLM applications for reading and conversational access to EPUB/PDF content.
AI reading assistants: Enable natural language queries and interactive reading experiences over personal ebooks.
E-book app teams: Build AI-powered ebook readers and chat interfaces for personal libraries.

Tags

MCP server, ebook processing, EPUB, PDF, Model Context Protocol, LLM integration, metadata extraction, table of contents, chapter content, Markdown output, batch processing, ebooklib, PyPDF2, PyMuPDF