Dataset Viewer

Browse and analyze Hugging Face datasets with features like search, filtering, statistics, and data export

Stars: 30
Forks: 13
Releases: 0

Overview

An MCP server for the Hugging Face Dataset Viewer API. It lets clients browse, analyze, and export datasets hosted on the Hugging Face Hub. Datasets are addressed through the dataset:// URI scheme, dataset configurations and splits are supported, and contents are returned in pages. Private datasets can be accessed by supplying an authentication token. The server can also search and filter dataset contents and retrieve statistics.

Core capabilities are exposed as MCP endpoints: validate (check dataset existence and accessibility), get_info (detailed dataset information), get_rows (paginated data), get_first_rows (initial rows of a split), get_statistics (split-level statistics), search_dataset (text search within a dataset), filter (SQL-like conditional filtering with optional ordering and paging), and get_parquet (download the entire dataset as Parquet).

Installation requires Python 3.12+ and uv; the development-mode setup uses uv add. Set HUGGINGFACE_TOKEN to access private datasets. The README lists the prerequisites, setup steps, and environment variables, and includes a config snippet for Claude Desktop integration.
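As a rough illustration of the Claude Desktop integration, the sketch below generates a config entry under the `mcpServers` key that Claude Desktop reads. The server directory, the entry name `dataset-viewer`, and the script name passed to `uv run` are placeholders assumed for this example; the README's own snippet is the authoritative version.

```python
import json
from typing import Optional


def claude_desktop_entry(server_dir: str, token: Optional[str] = None) -> dict:
    """Build a Claude Desktop config fragment for a uv-managed MCP server.

    The entry name and script name ("dataset-viewer") are assumptions made
    for illustration, not values taken from the project.
    """
    entry = {
        "command": "uv",
        # `uv --directory <dir> run <script>` runs the script from that project.
        "args": ["--directory", server_dir, "run", "dataset-viewer"],
    }
    if token:
        # Only needed for private datasets.
        entry["env"] = {"HUGGINGFACE_TOKEN": token}
    return {"mcpServers": {"dataset-viewer": entry}}


# Placeholder path; replace with the local checkout location.
config = claude_desktop_entry("/path/to/dataset-viewer")
print(json.dumps(config, indent=2))
```

The printed JSON would be merged into the Claude Desktop configuration file; passing a token adds an `env` block so the server can reach private datasets.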

Details

Owner: privetin
Language: Python
License: MIT License
Updated: 2025-12-07

Features

validate

Check if a dataset exists and is accessible, with optional auth_token for private datasets.
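A check like this can be pictured as a request to the public Dataset Viewer API's `/is-valid` endpoint. The sketch below only builds the request and does not send it; whether the server uses exactly this endpoint internally is an assumption, and the helper name `build_is_valid_request` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Public base URL of the Hugging Face Dataset Viewer API.
API_BASE = "https://datasets-server.huggingface.co"


def build_is_valid_request(dataset: str, auth_token: Optional[str] = None) -> Request:
    """Build (but do not send) a GET request to the /is-valid endpoint."""
    url = f"{API_BASE}/is-valid?{urlencode({'dataset': dataset})}"
    headers = {}
    if auth_token:
        # Private datasets require a bearer token.
        headers["Authorization"] = f"Bearer {auth_token}"
    return Request(url, headers=headers)


# Example dataset name; any public Hub dataset ID works the same way.
req = build_is_valid_request("stanfordnlp/imdb")
print(req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return a JSON body describing which viewer features are available for the dataset.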

get_info

Get detailed information about a dataset, with optional auth_token.

get_rows

Get paginated contents of a dataset, given dataset, config, and split, with optional page and auth_token.
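From an MCP client's side, a paginated read could look like the following JSON-RPC `tools/call` message. This is a minimal sketch assuming get_rows is exposed as an MCP tool whose argument names match the parameters listed above; the helper names and the example dataset values are illustrative only.

```python
import json
from typing import Any, Optional


def get_rows_arguments(
    dataset: str,
    config: str,
    split: str,
    page: Optional[int] = None,
    auth_token: Optional[str] = None,
) -> dict[str, Any]:
    """Collect get_rows arguments, leaving out the optional ones when unset."""
    args: dict[str, Any] = {"dataset": dataset, "config": config, "split": split}
    if page is not None:
        args["page"] = page
    if auth_token:
        args["auth_token"] = auth_token
    return args


def make_tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Example values: request page 2 of the train split.
payload = make_tool_call(
    "get_rows",
    get_rows_arguments("stanfordnlp/imdb", "plain_text", "train", page=2),
)
print(payload)
```

In practice an MCP client library builds this message itself; the sketch only shows which arguments travel with the call.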

get_first_rows

Get first rows from a dataset split, given dataset, config, and split, with optional auth_token.

get_statistics

Get statistics about a dataset split, given dataset, config, and split, with optional auth_token.

search_dataset

Search for text within a dataset, given dataset, config, split, and query, with optional auth_token.

filter

Filter rows using SQL-like conditions (where), with optional ordering, paging, and auth_token.
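To make the where and ordering options concrete, the sketch below builds a URL for the public Dataset Viewer API's `/filter` endpoint, which takes `where`, `orderby`, `offset`, and `length` query parameters. Mapping a page number to `offset = page * length` is an assumption about how paging might be implemented, and the column names are example values.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Public base URL of the Hugging Face Dataset Viewer API.
API_BASE = "https://datasets-server.huggingface.co"


def build_filter_url(
    dataset: str,
    config: str,
    split: str,
    where: str,
    orderby: Optional[str] = None,
    page: int = 0,
    length: int = 100,
) -> str:
    """Build a /filter URL; `page` is zero-based (an assumption of this sketch)."""
    params = {
        "dataset": dataset,
        "config": config,
        "split": split,
        "where": where,
        "offset": page * length,  # assumed page-to-offset mapping
        "length": length,
    }
    if orderby:
        params["orderby"] = orderby
    return f"{API_BASE}/filter?{urlencode(params)}"


# Column names go in double quotes, string literals in single quotes.
url = build_filter_url(
    "stanfordnlp/imdb",
    "plain_text",
    "train",
    where='"label" = 1',
    orderby='"text" ASC',
    page=1,
)
query = parse_qs(urlsplit(url).query)
print(query["where"][0], "|", query["orderby"][0], "|", query["offset"][0])
```

Using `urlencode` keeps quotes and spaces in the condition safely escaped, so the clause round-trips unchanged when the query string is parsed again.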

get_parquet

Download the entire dataset in Parquet format, with optional auth_token.

Audience

Developers: Integrate dataset browsing, querying, and analysis into applications using MCP endpoints.
Data scientists: Explore dataset contents, run searches, and retrieve statistics for analysis via the API.

Tags

huggingface, dataset-viewer, dataset, MCP, API, data-browsing, query, filter, statistics, Parquet, authentication, private-datasets