What features does Cartesia MCP server provide?

Text-to-speech generation: Convert text into audio using Cartesia's TTS capabilities, enabling natural-sounding speech.
Voice localization: Localize speech into different languages or locales to support multilingual outputs.
Infill audio between segments: Infill or bridge audio between two existing clips to create seamless transitions.
Voice cloning / voice switching: Change the voice in a clip or clone a voice across outputs.
Multi-client integration: Connect with clients like Cursor, Claude Desktop, and OpenAI agents via a common MCP interface.
Configurable CLI server: Run the MCP server as a CLI tool installed with pip; specify the executable path and environment variables in config.
API key authentication: Authenticate with Cartesia using the CARTESIA_API_KEY environment variable.
Output directory support: Optionally specify OUTPUT_DIRECTORY to store generated audio and related files.

Cartesia MCP Server 2026: Features & Setup Guide

Overview

The Cartesia MCP server is a bridge that lets MCP clients such as Cursor, Claude Desktop, and OpenAI agents call Cartesia's API. It supports generating speech from text, localizing speech into different languages, and infilling audio between existing segments. The server exposes commands to list voices, convert text to audio, localize speech, infill audio, and switch voices.

The server is installed via pip and runs as a command-line executable. Each client registers it through its own integration file: Claude Desktop uses claude_desktop_config.json, and Cursor uses .cursor/mcp.json or a global config. The README's Claude Desktop example shows how to set the environment variables CARTESIA_API_KEY and OUTPUT_DIRECTORY.

To use the server, you need a Cartesia account and an API key, which you can create in the Cartesia Playground (API Keys, then New). OUTPUT_DIRECTORY is optional and sets where generated files are stored. The README also mentions a free tier with 20,000 credits per month.

Features

Text-to-speech generation

Convert text into audio using Cartesia's TTS capabilities, enabling natural-sounding speech.

Voice localization

Localize speech into different languages or locales to support multilingual outputs.

Infill audio between segments

Infill or bridge audio between two existing clips to create seamless transitions.

Voice cloning / voice switching

Change the voice in a clip or clone a voice across outputs.

Multi-client integration

Connect with clients like Cursor, Claude Desktop, and OpenAI agents via a common MCP interface.
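
Because each client reads its own config file, registering the server usually means adding one entry to that file without disturbing entries for other servers. The sketch below shows one way to do that in Python. It assumes the client stores servers under a top-level "mcpServers" key, which is the common convention for MCP client configs; the helper name register_server and the sample entry are hypothetical.

```python
import json
from pathlib import Path


def register_server(config_path: Path, name: str, entry: dict) -> dict:
    """Add or replace one server under "mcpServers", keeping other entries."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8").strip()
        config = json.loads(text) if text else {}
    # Assumption: servers live under a top-level "mcpServers" object.
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

The same function could be pointed at claude_desktop_config.json for Claude Desktop or at .cursor/mcp.json for Cursor; the location of each file depends on the client and operating system, so it is passed in rather than guessed.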

Configurable CLI server

Run the MCP server as a CLI tool installed with pip; specify the executable path and environment variables in config.
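
To make the "executable path plus environment variables" idea concrete, here is a minimal Python sketch that builds such a config entry and prints it as JSON. The "command" and "env" field names follow the usual MCP client config layout; the executable name cartesia-mcp and the helper build_server_entry are assumptions for illustration, so check the README for the exact values.

```python
import json
import shutil
from typing import Optional


def build_server_entry(executable_path: str, api_key: str,
                       output_directory: Optional[str] = None) -> dict:
    """Return one server entry: the command to run plus its environment."""
    env = {"CARTESIA_API_KEY": api_key}
    if output_directory:  # OUTPUT_DIRECTORY is optional, so omit it when unset
        env["OUTPUT_DIRECTORY"] = output_directory
    return {"command": executable_path, "env": env}


if __name__ == "__main__":
    # Assumption: pip placed an executable named "cartesia-mcp" on PATH.
    path = shutil.which("cartesia-mcp") or "<absolute-path-to-executable>"
    entry = build_server_entry(path, "<your-api-key>", "/tmp/cartesia-output")
    print(json.dumps({"mcpServers": {"cartesia-mcp": entry}}, indent=2))
```

Using shutil.which mirrors running "which" in a shell: clients generally need the absolute path because they may not inherit your shell's PATH.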

API key authentication

Authenticate with Cartesia using the CARTESIA_API_KEY environment variable.
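
A missing or empty key is an easy mistake to make when editing a client config by hand. The following sketch shows a pre-flight check you might run before starting a client; the function require_api_key is hypothetical and is not part of the Cartesia server itself.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Cartesia API key from the environment, or fail with a clear message."""
    key = env.get("CARTESIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "CARTESIA_API_KEY is not set; add it to the 'env' block of your MCP config."
        )
    return key
```

Passing the environment as a mapping keeps the check easy to test, for example require_api_key({"CARTESIA_API_KEY": "abc"}) returns "abc".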

Output directory support

Optionally specify OUTPUT_DIRECTORY to store generated audio and related files.
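
The source does not say what happens when OUTPUT_DIRECTORY is unset, so the sketch below only illustrates one plausible way a tool could treat an optional output directory: use the variable if present, otherwise fall back to a default, and create the folder if needed. The function resolve_output_directory and the default name "cartesia-output" are assumptions, not documented server behavior.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_output_directory(env: Mapping[str, str] = os.environ,
                             default: str = "cartesia-output") -> Path:
    """Pick the output folder from OUTPUT_DIRECTORY (or a default) and create it."""
    raw = env.get("OUTPUT_DIRECTORY") or default  # hypothetical fallback
    path = Path(raw).expanduser().resolve()
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Resolving to an absolute path up front avoids surprises when the client launches the server from a different working directory.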