RAG Local

An MCP server for storing and retrieving text passages locally based on their semantic meaning.

Stars: 15
Forks: 2
Releases: 0

Overview

Memory Server (mcp-rag-local) provides a simple API to store and retrieve text passages by semantic meaning rather than by keyword. It uses Ollama to generate text embeddings and ChromaDB for vector storage and fast similarity search. You can memorize a single text or multiple texts at once and later query for the most relevant passages; results include the matching texts along with a human-readable description of their relevance.

The server also memorizes PDF content via the memorize_pdf_file tool: the reader processes up to 20 pages at a time, the LLM chunks the text into meaningful segments, and memorize_multiple_texts stores the chunks, repeating until the entire document is memorized. For very large texts it supports conversational chunking, where the LLM iteratively chunks and memorizes.

The server runs with Docker Compose, exposing ports for ChromaDB and Ollama, and provides a web-based admin GUI at http://localhost:8322. Setup involves pulling the embedding model (all-minilm:l6-v2) and configuring the MCP client to run main.py.
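
To make the flow concrete, here is a minimal sketch using the Python clients for Ollama and ChromaDB. It is illustrative only: the ChromaDB port, the collection name, and the sample texts are assumptions, not taken from the server's code.

```python
import chromadb
import ollama

# Connect to the ChromaDB instance started by Docker Compose (port assumed).
chroma = chromadb.HttpClient(host="localhost", port=8000)
collection = chroma.get_or_create_collection("memories")

# Embed a passage with the all-minilm:l6-v2 model pulled during setup.
text = "The warranty covers parts and labor for two years."
embedding = ollama.embeddings(model="all-minilm:l6-v2", prompt=text)["embedding"]

# Store the passage together with its embedding.
collection.add(ids=["memory-1"], embeddings=[embedding], documents=[text])

# Later: embed a query and retrieve the most similar stored passages.
query = "How long is the warranty?"
query_embedding = ollama.embeddings(model="all-minilm:l6-v2", prompt=query)["embedding"]
results = collection.query(query_embeddings=[query_embedding], n_results=3)
print(results["documents"][0])
```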

Details

Owner: renl
Language: Python
License: MIT License
Updated: 2025-12-07

Features

Semantic memory storage

Store and retrieve text passages based on semantic meaning using embeddings rather than keywords.

Memorize multiple texts

Memorize several texts in a single operation for later semantic retrieval.

PDF memorization

Memorize contents of a PDF by reading up to 20 pages at a time, chunking into meaningful segments, and storing.
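
A rough sketch of the page-batching loop is below, assuming pypdf as the reader; the library choice, function names, and the naive fallback chunker are assumptions, since the real server delegates chunking to the LLM.

```python
from pypdf import PdfReader  # reader library assumed; the server may use a different one

def naive_chunk(text: str, max_chars: int = 500) -> list[str]:
    """Stand-in for the LLM chunking step: split on lines, cap chunk size."""
    chunks, current = [], ""
    for line in text.splitlines():
        if current and len(current) + len(line) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += line + "\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks

def memorize_pdf_in_batches(path: str, batch_size: int = 20) -> list[str]:
    """Read up to 20 pages at a time, chunk each batch, repeat until the document is done."""
    pages = PdfReader(path).pages
    all_chunks: list[str] = []
    for start in range(0, len(pages), batch_size):
        batch_text = "\n".join(page.extract_text() or "" for page in pages[start:start + batch_size])
        all_chunks.extend(naive_chunk(batch_text))  # the real tool stores these via memorize_multiple_texts
    return all_chunks
```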

Conversational chunking

LLM-assisted splitting of long texts into short, meaningful chunks and memorizing them iteratively.
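
One way to picture the LLM-assisted step is a single chat call that returns one chunk per line; the model name and prompt below are assumptions, and the real server drives this iteratively through the MCP conversation rather than in one shot.

```python
import ollama

def llm_chunk(text: str, model: str = "llama3") -> list[str]:
    """Ask a local LLM to split a long text into short, self-contained chunks (illustrative)."""
    prompt = (
        "Split the following text into short, self-contained chunks. "
        "Return one chunk per line.\n\n" + text
    )
    reply = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    return [line.strip() for line in reply["message"]["content"].splitlines() if line.strip()]
```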

Semantic retrieval

Retrieve the most relevant stored texts for a query, with human-readable relevance descriptions.
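
The relevance description can be derived from the distances ChromaDB returns alongside each match; the thresholds and wording below are illustrative, not the server's own.

```python
def describe_relevance(distance: float) -> str:
    """Map a ChromaDB distance to a human-readable relevance label (thresholds assumed)."""
    if distance < 0.3:
        return "highly relevant"
    if distance < 0.6:
        return "somewhat relevant"
    return "loosely related"

# Usage with a query result (see the storage sketch in the overview):
# results = collection.query(query_embeddings=[query_embedding], n_results=3,
#                            include=["documents", "distances"])
# for doc, dist in zip(results["documents"][0], results["distances"][0]):
#     print(f"{describe_relevance(dist)}: {doc}")
```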

Local vector store (ChromaDB)

Uses ChromaDB for vector storage and fast similarity search.

Embedding generation with Ollama

Generates embeddings using Ollama for the vector store.

MCP tooling and deployment

Includes MCP configuration and Docker Compose-based deployment to run ChromaDB, Ollama, and the MCP server.
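
As a sketch of how such tools can be exposed, the snippet below registers one tool with the MCP Python SDK's FastMCP helper. The server name and tool body are placeholders; only the tool name memorize_multiple_texts comes from the documentation above, and the actual main.py may be structured differently.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("mcp-rag-local")

@mcp.tool()
def memorize_multiple_texts(texts: list[str]) -> str:
    """Store several texts for later semantic retrieval (placeholder body)."""
    # A real implementation would embed each text with Ollama and add it to ChromaDB.
    return f"Memorized {len(texts)} texts."

if __name__ == "__main__":
    mcp.run()  # stdio transport, as expected when an MCP client launches main.py
```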

Audience

Developers: Build local semantic memory systems by storing and retrieving text passages using embeddings and vector search.
Data scientists / ML engineers: Prototype and test retrieval workflows with PDFs and long documents in a local setup.
Researchers: Study content-based retrieval methods in privacy-conscious, local deployments with an admin GUI for inspection.

Tags

semantic-memory, text-embedding, vector-search, ChromaDB, Ollama, PDF, chunking, local-memory, retrieval, memory-server, MCP