Overview
Features
Text-to-speech generation
Convert text into natural-sounding speech audio using Cartesia's TTS capabilities.
Voice localization
Localize speech into different languages or locales to support multilingual outputs.
Infill audio between segments
Infill or bridge audio between two existing clips to create seamless transitions.
Voice cloning / voice switching
Change the voice in an existing clip, or clone a voice and reuse it across outputs.
Multi-client integration
Connect with clients like Cursor, Claude Desktop, and OpenAI agents via a common MCP interface.
Configurable CLI server
Install the MCP server with pip and run it as a CLI tool; specify the executable path and environment variables in the client config.
API key authentication
Authenticate with Cartesia using the CARTESIA_API_KEY environment variable.
Output directory support
Optionally set the OUTPUT_DIRECTORY environment variable to choose where generated audio and related files are stored.
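The configuration features above (executable path, CARTESIA_API_KEY, optional OUTPUT_DIRECTORY) can be sketched as a small Python helper that prints a client config. This is a minimal sketch under stated assumptions, not the project's official setup: the `mcpServers` layout follows the convention used by clients such as Claude Desktop and Cursor, while the entry name `cartesia-mcp`, the executable name passed to `shutil.which`, and the helper `build_mcp_config` are hypothetical.

```python
import json
import os
import shutil


def build_mcp_config(executable_path, api_key, output_directory=None):
    """Build a client config dict with a single MCP server entry.

    The entry name "cartesia-mcp" is an assumption for illustration;
    use whatever name your client expects.
    """
    env = {"CARTESIA_API_KEY": api_key}
    # OUTPUT_DIRECTORY is optional, so only include it when provided.
    if output_directory:
        env["OUTPUT_DIRECTORY"] = output_directory
    return {
        "mcpServers": {
            "cartesia-mcp": {
                "command": executable_path,
                "env": env,
            }
        }
    }


if __name__ == "__main__":
    # Assumed executable name; fall back to a placeholder if it is not on PATH.
    path = shutil.which("cartesia-mcp") or "<absolute-path-to-executable>"
    key = os.environ.get("CARTESIA_API_KEY", "<your-api-key>")
    config = build_mcp_config(path, key, os.environ.get("OUTPUT_DIRECTORY"))
    print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the client's MCP configuration file. Keeping the API key in an environment variable avoids hard-coding it in scripts.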
Who Is This For?
- Developers: Integrate Cartesia TTS and voice features into apps or assistants using the MCP server and config-based setups.
- AI/voice engineers and product teams: Build voice-enabled workflows with Cursor, Claude Desktop, or OpenAI agents by accessing Cartesia's API through MCP.
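
To give a rough idea of what "accessing Cartesia's API through MCP" looks like on the wire, the sketch below builds the JSON-RPC 2.0 request an MCP client sends to invoke a server tool. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification. The tool name `text_to_speech` and its `text` argument are hypothetical placeholders, not confirmed names from this server; a real client would discover the available tools with `tools/list` first.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC requires a unique id per request.
_request_ids = count(1)


def make_tool_call(tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    # Hypothetical tool name and argument, for illustration only.
    request = make_tool_call("text_to_speech", {"text": "Hello from MCP"})
    print(json.dumps(request, indent=2))
```

In practice the MCP client (Cursor, Claude Desktop, or an agent framework) builds and sends these messages itself, so users normally only supply the config.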




