What features does Fish Audio MCP server provide?

High-Quality TTS: Leverage Fish Audio's state-of-the-art TTS models to generate natural-sounding speech.
Streaming Support: Real-time audio streaming for low-latency applications.
Multiple Voices: Support for custom voice models via reference IDs.
Smart Voice Selection: Select voices by ID, name, or tags.
Voice Library Management: Configure and manage multiple voice references.
Flexible Configuration: Environment variable-based configuration for easy setup.
Multiple Audio Formats: Support for MP3, WAV, PCM, and Opus.
Easy Integration: Simple setup with any MCP-compatible client.

Fish Audio MCP Server 2026: Features & Setup Guide

Overview

The Fish Audio MCP server connects Fish Audio's Text-to-Speech API to LLMs such as Claude, so that speech synthesis can be driven by natural-language requests. It exposes two tools: fish_audio_tts, which generates speech from text with configurable voice reference, streaming, format, bitrate, latency, and playback options; and fish_audio_list_references, which lists all configured voice references.

Voices can be selected by ID, name, or tags, and both single-reference and multi-reference setups are supported. Output formats are MP3, WAV, PCM, and Opus, with a configurable bitrate for MP3. All configuration is done through environment variables: API key, model ID, references, default reference, output format, streaming mode, latency, and output directory.

The server targets low-latency streaming and real-time playback and is meant to work with any MCP-compatible client. The project includes API documentation, error handling, and troubleshooting guidance, and its source is organized around a dedicated Fish Audio API client and a TTS tool.
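To make the two tools more concrete, here is a rough sketch of how an MCP client might build the JSON-RPC messages that invoke them. The tools/call method and its name/arguments parameters come from the MCP specification, and the tool names come from the description above. The argument keys (text, format, streaming) are assumptions made for illustration, not the server's documented schema.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# The argument keys below are hypothetical; check the server's tool schema
# (returned by tools/list) for the real parameter names.
tts_request = build_tool_call(
    1,
    "fish_audio_tts",
    {"text": "Hello from the MCP server.", "format": "mp3", "streaming": False},
)

# Listing configured voice references is assumed to need no arguments.
list_request = build_tool_call(2, "fish_audio_list_references", {})
```

In practice an MCP client library builds and sends these messages for you; the sketch only shows the shape of the request.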

Features

High-Quality TTS

Leverage Fish Audio's state-of-the-art TTS models to generate natural-sounding speech.

Streaming Support

Real-time audio streaming for low-latency applications.

Multiple Voices

Support for custom voice models via reference IDs.

Smart Voice Selection

Select voices by ID, name, or tags.
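One plausible way to implement selection by ID, name, or tags is sketched below. The VoiceReference record, the select_voice function, and the lookup order (ID first, then name, then tag, then a default) are all assumptions made for illustration; the server's actual matching rules may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceReference:
    """Hypothetical record for one configured voice; field names are illustrative."""
    id: str
    name: str
    tags: tuple[str, ...] = ()


def select_voice(
    voices: list[VoiceReference],
    *,
    voice_id: Optional[str] = None,
    name: Optional[str] = None,
    tag: Optional[str] = None,
    default_id: Optional[str] = None,
) -> Optional[VoiceReference]:
    """Return the first voice matching ID, then name, then tag, then the default.

    A criterion that matches nothing falls through to the next one.
    """
    if voice_id is not None:
        for voice in voices:
            if voice.id == voice_id:
                return voice
    if name is not None:
        wanted = name.casefold()  # names are compared case-insensitively
        for voice in voices:
            if voice.name.casefold() == wanted:
                return voice
    if tag is not None:
        wanted = tag.casefold()  # tags are compared case-insensitively
        for voice in voices:
            if any(t.casefold() == wanted for t in voice.tags):
                return voice
    if default_id is not None:
        for voice in voices:
            if voice.id == default_id:
                return voice
    return None
```

Checking the ID first keeps the most specific criterion authoritative, while the default gives a predictable fallback when a name or tag is misspelled.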

Voice Library Management

Configure and manage multiple voice references.

Flexible Configuration

Environment variable-based configuration for easy setup.
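As an illustration of environment-variable configuration, the sketch below reads a few settings from a mapping and validates them. The variable names (EXAMPLE_TTS_API_KEY and so on), the TTSConfig fields, and the defaults are hypothetical placeholders, not the server's real setting names; only the list of output formats is taken from this page.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Formats listed on this page as supported output formats.
SUPPORTED_FORMATS = ("mp3", "wav", "pcm", "opus")


@dataclass(frozen=True)
class TTSConfig:
    """Hypothetical settings container; field names and defaults are illustrative."""
    api_key: str
    output_format: str = "mp3"
    streaming: bool = False
    output_dir: str = "."
    default_reference: Optional[str] = None


def load_config(env: Mapping[str, str] = os.environ) -> TTSConfig:
    """Build a TTSConfig from environment-style key/value pairs."""
    # All variable names here are placeholders, not the server's real names.
    api_key = env.get("EXAMPLE_TTS_API_KEY", "").strip()
    if not api_key:
        raise ValueError("EXAMPLE_TTS_API_KEY is required")

    output_format = env.get("EXAMPLE_TTS_OUTPUT_FORMAT", "mp3").strip().lower()
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")

    # Accept a few common spellings of "true" for the boolean flag.
    streaming = env.get("EXAMPLE_TTS_STREAMING", "false").strip().lower() in ("1", "true", "yes")

    return TTSConfig(
        api_key=api_key,
        output_format=output_format,
        streaming=streaming,
        output_dir=env.get("EXAMPLE_TTS_OUTPUT_DIR", "."),
        default_reference=env.get("EXAMPLE_TTS_DEFAULT_REFERENCE") or None,
    )
```

Passing the mapping as a parameter, rather than reading os.environ directly inside the function, makes the loader easy to exercise with a plain dictionary.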

Multiple Audio Formats

Support for MP3, WAV, PCM, and Opus.

Easy Integration

Simple setup with any MCP-compatible client.