Databricks Smart SQL

MCP server enabling LLM agents to explore Unity Catalog metadata and lineage to generate SQL queries.

Stars: 29 | Forks: 16 | Releases: 0

Overview

The Databricks MCP Server provides an agent-facing interface to Databricks Unity Catalog metadata and data lineage. It is designed to let LLMs autonomously understand data assets (catalogs, schemas, tables, and columns), inspect data-processing code (notebooks and jobs), and compose SQL queries against a Databricks SQL warehouse.

Core capabilities include tools to list catalogs, describe catalogs and schemas, describe tables with optional column details and lineage, and execute SQL queries through the Databricks SDK. In agent mode, the server supports iterative exploration: discovering sources, inspecting table structures, analyzing upstream and downstream lineage, and reading notebooks for business logic and data-quality checks, culminating in query construction. Descriptive tools return Markdown suited to LLM consumption.

The project also supports metadata-as-code workflows, with Terraform examples that define Unity Catalog assets and a Terraform import workflow. For long-running queries, a 50-second wait timeout is documented. The server can run standalone via main.py or be integrated with Cursor. Dependencies include databricks-sdk, python-dotenv, mcp[cli], asyncio, and httpx.
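As a rough illustration of how such a server can be wired together, the sketch below assumes the FastMCP helper shipped in the mcp package (mcp.server.fastmcp) and WorkspaceClient from databricks-sdk, which resolves credentials such as DATABRICKS_HOST and DATABRICKS_TOKEN from the environment (here loaded from a .env file by python-dotenv). The server name and the single tool are hypothetical, not the project's actual code. FastMCP uses each tool function's docstring as the description the agent sees.

```python
def build_server():
    """Create an MCP server exposing one Unity Catalog tool (illustrative sketch)."""
    # Imports are deferred so this module imports cleanly even when
    # mcp, python-dotenv, or databricks-sdk are not installed.
    from databricks.sdk import WorkspaceClient
    from dotenv import load_dotenv
    from mcp.server.fastmcp import FastMCP

    load_dotenv()  # reads DATABRICKS_HOST / DATABRICKS_TOKEN from a local .env, if present
    workspace = WorkspaceClient()  # resolves credentials from the environment
    mcp = FastMCP("databricks-smart-sql")  # hypothetical server name

    @mcp.tool()
    def list_catalogs() -> str:
        """List the Unity Catalog catalogs visible to the caller, as Markdown."""
        lines = [
            f"- `{c.name}`: {c.comment or '(no comment)'}"
            for c in workspace.catalogs.list()
        ]
        return "\n".join(lines) or "_No catalogs found._"

    return mcp
```

A main.py entry point would then call build_server().run(); FastMCP defaults to the stdio transport, which is how MCP clients such as Cursor typically launch local servers.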

Details

Owner: RafaelCartenet
Language: Python
License: MIT License
Updated: 2025-12-07

Features

Execute SQL Queries

Run arbitrary SQL queries using the Databricks SDK via execute_sql_query(sql: str) and return formatted results.

Unity Catalog Discovery Tools

List and describe Unity Catalog assets (catalogs, schemas, and tables) to discover available data sources.
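The discovery step can be sketched as three small functions that walk the catalog, schema, table hierarchy. This assumes the databricks-sdk methods catalogs.list(), schemas.list(catalog_name=...), and tables.list(catalog_name=..., schema_name=...); the function names describe_catalog and describe_schema are hypothetical. The client is passed in, so a real WorkspaceClient or a test double both work.

```python
def list_catalogs(client) -> str:
    """Markdown bullet list of catalogs visible to the caller."""
    items = [
        f"- `{c.name}`: {c.comment or '(no comment)'}" for c in client.catalogs.list()
    ]
    return "\n".join(items) or "_No catalogs found._"


def describe_catalog(client, catalog_name: str) -> str:
    """Markdown section listing the schemas in one catalog."""
    lines = [f"## Catalog `{catalog_name}`", ""]
    for s in client.schemas.list(catalog_name=catalog_name):
        lines.append(f"- `{s.full_name}`: {s.comment or '(no comment)'}")
    return "\n".join(lines)


def describe_schema(client, catalog_name: str, schema_name: str) -> str:
    """Markdown section listing the tables (and their types) in one schema."""
    lines = [f"## Schema `{catalog_name}.{schema_name}`", ""]
    for t in client.tables.list(catalog_name=catalog_name, schema_name=schema_name):
        kind = t.table_type.value if t.table_type else "UNKNOWN"  # e.g. MANAGED, VIEW
        lines.append(f"- `{t.name}` ({kind}): {t.comment or '(no comment)'}")
    return "\n".join(lines)
```

With a live workspace this would be called as list_catalogs(WorkspaceClient()), then narrowed step by step as the agent picks a catalog and schema.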

Schema and Table Description with Columns

Describe schemas and tables; optionally include column-level details for query construction.
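A sketch of a table description with an optional column table, assuming databricks-sdk's tables.get(full_name=...), whose result exposes table_type, comment, and a columns list with name, type_text, and comment. The include_columns flag mirrors the optional column details described above; the exact output layout is an assumption.

```python
def describe_table(client, full_name: str, include_columns: bool = False) -> str:
    """Describe one table as Markdown, optionally with its columns."""
    t = client.tables.get(full_name=full_name)
    kind = t.table_type.value if t.table_type else "UNKNOWN"
    lines = [f"## Table `{full_name}` ({kind})", "", t.comment or "_No table comment._"]
    if include_columns:
        # Column names, SQL types, and comments are what query construction needs most.
        lines += ["", "| Column | Type | Comment |", "| --- | --- | --- |"]
        for c in t.columns or []:
            lines.append(f"| {c.name} | {c.type_text} | {c.comment or ''} |")
    return "\n".join(lines)
```

Keeping columns opt-in lets an agent skim many tables cheaply and request the full column list only for candidates it intends to query.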

Table Lineage and Code Discovery

Describe table lineage, including upstream and downstream tables, notebooks, and jobs, so agents can explore the code behind data transformations.
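The lineage summary can be sketched as a formatter over an already-fetched lineage payload. This assumes the payload comes from the Databricks table-lineage REST endpoint (/api/2.0/lineage-tracking/table-lineage with include_entity_lineage enabled) and that its upstreams and downstreams entries carry tableInfo, notebookInfos, and jobInfos fields; that shape is an assumption to verify against the API reference for your workspace.

```python
def lineage_to_markdown(table_name: str, lineage: dict) -> str:
    """Summarize upstream/downstream tables, notebooks, and jobs as Markdown.

    Assumes the table-lineage payload shape described in the lead-in;
    field names should be checked against the API version in use.
    """
    lines = [f"## Lineage for `{table_name}`"]
    for direction in ("upstreams", "downstreams"):
        tables, notebooks, jobs = set(), set(), set()
        for entry in lineage.get(direction, []):
            info = entry.get("tableInfo")
            if info:
                tables.add(f"{info['catalog_name']}.{info['schema_name']}.{info['name']}")
            for nb in entry.get("notebookInfos", []):
                notebooks.add(str(nb["notebook_id"]))
            for job in entry.get("jobInfos", []):
                jobs.add(str(job["job_id"]))
        lines += ["", f"### {direction.capitalize()}"]
        lines.append("- Tables: " + (", ".join(sorted(tables)) or "none"))
        lines.append("- Notebooks (IDs): " + (", ".join(sorted(notebooks)) or "none"))
        lines.append("- Jobs (IDs): " + (", ".join(sorted(jobs)) or "none"))
    return "\n".join(lines)
```

Notebook and job IDs are where an agent would go next to read the transformation logic and data-quality checks behind a table.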

Markdown Output for LLMs

All descriptive tools return information in Markdown format optimized for consumption by LLMs.

Agent Workflow Support

Supports iterative agent workflows: catalog/schema discovery, table inspection, lineage analysis, and query execution.

Metadata-as-Code with Terraform

Illustrates managing Unity Catalog metadata as code with Terraform resources and import workflows.
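For the import half of that workflow, the sketch below generates terraform import commands for an existing catalog tree. It assumes the Databricks Terraform provider's import ID formats (catalog name for databricks_catalog, catalog.schema for databricks_schema, catalog.schema.table for databricks_sql_table), which should be checked against the provider documentation. The resource addresses are hypothetical naming choices, and each one needs a matching resource block in the configuration before terraform import will succeed.

```python
def terraform_import_commands(catalog: str, schemas: dict[str, list[str]]) -> list[str]:
    """Build `terraform import` commands for a catalog, its schemas, and tables.

    `schemas` maps schema name -> table names. Resource addresses such as
    databricks_schema.<catalog>_<schema> are hypothetical; they must be valid
    Terraform identifiers, so unusual characters in UC names need sanitizing.
    """
    cmds = [f"terraform import databricks_catalog.{catalog} {catalog}"]
    for schema, tables in schemas.items():
        cmds.append(f"terraform import databricks_schema.{catalog}_{schema} {catalog}.{schema}")
        for table in tables:
            cmds.append(
                f"terraform import databricks_sql_table.{catalog}_{schema}_{table} "
                f"{catalog}.{schema}.{table}"
            )
    return cmds
```

Once imported, descriptions maintained in Terraform (for example comment attributes) are the same metadata the discovery tools read back for the agent.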

Long-Running Query Handling

Documents a 50-second wait_timeout for SQL execution and describes how long-running queries are polled.
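One way to handle queries that outlast the 50-second wait is sketched below, assuming the databricks-sdk Statement Execution methods execute_statement, get_statement, and cancel_execution. With on_wait_timeout left at its API default (CONTINUE), the statement keeps running server-side after the wait expires, so the client polls by statement ID. The poll interval, the overall cap, and the injectable sleep are hypothetical parameters chosen for illustration.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELED", "CLOSED"}


def _state(resp) -> str:
    # The SDK returns a StatementState enum; plain strings are accepted for test doubles.
    state = resp.status.state
    return getattr(state, "value", state)


def run_statement(client, sql, warehouse_id, poll_interval=2.0, max_wait=600.0,
                  sleep=time.sleep):
    """Execute SQL, then poll until a terminal state or until max_wait seconds of polling."""
    resp = client.statement_execution.execute_statement(
        statement=sql,
        warehouse_id=warehouse_id,
        wait_timeout="50s",  # block up to 50 s; on_wait_timeout defaults to CONTINUE
    )
    waited = 0.0
    while _state(resp) not in TERMINAL_STATES:
        if waited >= max_wait:
            # Give up and stop the warehouse from doing further work on this query.
            client.statement_execution.cancel_execution(resp.statement_id)
            raise TimeoutError(
                f"statement {resp.statement_id} still running after {max_wait}s of polling"
            )
        sleep(poll_interval)
        waited += poll_interval
        resp = client.statement_execution.get_statement(resp.statement_id)
    return resp
```

Counting polling time explicitly (rather than reading a wall clock) keeps the loop deterministic under a fake sleep, which makes it easy to test.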

Audience

LLM agents: Enable autonomous discovery of Unity Catalog metadata and lineage for SQL generation.
Data engineers: Govern Unity Catalog assets and query lineage via MCP tooling and Terraform.
Data scientists: Understand data context and transformation logic for exploratory analysis via notebooks.
Cursor users: Integrate with Cursor to manage MCP workflows and agent queries.

Tags

Databricks, Unity Catalog, Unity Catalog Metadata, MCP, LLM, AI agents, SQL queries, Lineage, Notebook, Jobs, Terraform, Cursor, Databricks SDK