DataHub

Search your data assets, traverse data lineage, write SQL queries, and more using DataHub metadata.

Stars: 64
Forks: 27
Releases: 14

Overview

DataHub MCP Server is a Model Context Protocol (MCP) server built on DataHub’s metadata platform. It gives AI agents and developers a set of MCP tools for searching datasets and other assets, retrieving lineage paths, and inspecting schema fields.

Structured Search with Context Filtering supports wildcards, field-specific searches, and Boolean logic for narrowing results. The SQL intelligence component supports accessing existing queries and generating new ones, which helps in understanding join patterns, filters, and aggregation behavior. The server exposes table- and column-level lineage, so data flow, transformations, and dependencies can be traced upstream and downstream across multiple hops. It also exposes domains, owners, tags, glossaries, and data platforms, so teams can understand the context of their data before searching.

Usage is covered in the DataHub MCP server docs, which demonstrate an end-to-end agent workflow: search for assets, inspect their metadata, inspect lineage, retrieve example queries, and construct SQL.

Details

Owner: acryldata
Language: Python
License: Apache License 2.0
Updated: 2025-12-07

Features

Structured Search with Context Filtering

Go beyond keyword matching with structured query syntax, including wildcard matching, field searches, and boolean logic.
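
To make the three query features concrete, here is a minimal sketch of wildcard matching, field-specific matching, and boolean composition over a small in-memory catalog. It is illustrative only: the catalog contents, the tuple-based query form, and the names field_matches, evaluate, and search are assumptions made for this example, not the DataHub MCP server's query syntax or API.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory catalog; real metadata would come from DataHub.
ASSETS = [
    {"name": "orders_daily", "platform": "snowflake", "tags": ["pii"]},
    {"name": "orders_raw", "platform": "kafka", "tags": []},
    {"name": "customers", "platform": "snowflake", "tags": ["pii", "gold"]},
]


def field_matches(asset, field, pattern):
    """Case-insensitive wildcard match of one field (scalar or list)."""
    value = asset.get(field, "")
    values = value if isinstance(value, list) else [value]
    return any(fnmatchcase(str(v).lower(), pattern.lower()) for v in values)


def evaluate(asset, query):
    """Evaluate a nested query tuple: match / and / or / not."""
    op = query[0]
    if op == "match":
        _, field, pattern = query
        return field_matches(asset, field, pattern)
    if op == "and":
        return all(evaluate(asset, q) for q in query[1:])
    if op == "or":
        return any(evaluate(asset, q) for q in query[1:])
    if op == "not":
        return not evaluate(asset, query[1])
    raise ValueError(f"unknown operator: {op}")


def search(assets, query):
    """Return the names of all assets that satisfy the query."""
    return [a["name"] for a in assets if evaluate(a, query)]


# Names starting with "orders", excluding anything on the kafka platform.
query = ("and", ("match", "name", "orders*"), ("not", ("match", "platform", "kafka")))
print(search(ASSETS, query))  # ['orders_daily']
```

The query is a nested tuple rather than a parsed string so that the sketch does not imply any particular search grammar.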

SQL Intelligence & Query Generation

Access existing SQL queries and generate new ones, drawing on join patterns, common filters, and production query patterns.
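
As a rough illustration of what "understanding join patterns" can mean, the sketch below counts which tables appear after JOIN in a handful of example queries. It is a naive regex-based approach chosen for brevity, not a SQL parser and not how the server itself works; the sample queries and the join_counts helper are hypothetical, and subqueries or quoted identifiers are not handled.

```python
import re
from collections import Counter

# Matches the table name following any JOIN keyword (INNER, LEFT, etc.).
JOIN_RE = re.compile(r"\bJOIN\s+([\w.]+)", re.IGNORECASE)


def join_counts(queries):
    """Count how often each table is joined across the given SQL strings."""
    counts = Counter()
    for sql in queries:
        counts.update(name.lower() for name in JOIN_RE.findall(sql))
    return counts


# Hypothetical example queries, standing in for queries retrieved from metadata.
QUERIES = [
    "SELECT o.id, c.name FROM orders o JOIN customers c ON o.customer_id = c.id",
    "SELECT * FROM orders o LEFT JOIN customers c ON o.customer_id = c.id "
    "WHERE o.status = 'paid'",
    "SELECT * FROM orders o join payments p ON p.order_id = o.id",
]

print(join_counts(QUERIES).most_common())  # [('customers', 2), ('payments', 1)]
```

An agent could use a summary like this to prefer the most frequently used join when drafting a new query.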

Table & Column-Level Lineage

Trace data flow upstream and downstream at both table and column levels, including transformations and multiple hops.
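
Multi-hop lineage traversal can be pictured as a breadth-first walk over a dependency graph with a hop limit. The following sketch assumes a small hypothetical edge map (DOWNSTREAM) and a hypothetical lineage function; it shows the traversal idea only and does not reflect the server's tool signatures or DataHub's lineage model.

```python
from collections import deque

# Hypothetical downstream edges: each source maps to the tables derived from it.
DOWNSTREAM = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.orders_by_day"],
    "marts.revenue": ["dash.finance"],
}


def invert(edges):
    """Flip edge direction so the same walk can go upstream."""
    inverted = {}
    for src, targets in edges.items():
        for target in targets:
            inverted.setdefault(target, []).append(src)
    return inverted


def lineage(start, direction="downstream", max_hops=3):
    """Return {node: hop_count} for nodes reachable within max_hops."""
    if direction not in ("upstream", "downstream"):
        raise ValueError(f"unknown direction: {direction}")
    edges = DOWNSTREAM if direction == "downstream" else invert(DOWNSTREAM)
    hops = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # hop limit reached; do not expand further
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                hops[nxt] = depth + 1
                queue.append((nxt, depth + 1))
    return hops


print(lineage("raw.orders", max_hops=2))
# {'staging.orders': 1, 'marts.revenue': 2, 'marts.orders_by_day': 2}
print(lineage("dash.finance", direction="upstream"))
# {'marts.revenue': 1, 'staging.orders': 2, 'raw.orders': 3}
```

Breadth-first order guarantees that each node is recorded with its shortest hop distance from the starting asset.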

Understands Your Data Ecosystem

Discover data domains, owners, tags, glossary terms, and browse across data platforms and environments.

Audience

AI agents: Enable AI agents to search, reason over lineage, and generate SQL queries.
Data engineers: Discover data assets, understand provenance, and optimize data workflows.
Data scientists: Support rapid data discovery and prototyping by querying rich metadata.

Tags

DataHub, MCP Server, Model Context Protocol, data discovery, data lineage, SQL generation, structured search, metadata