Fish Audio


Text-to-Speech integration with Fish Audio's API, supporting multiple voices, streaming, and real-time playback

Stars: 10 | Forks: 5 | Releases: 0

Overview

An MCP server that connects Fish Audio's Text-to-Speech API to LLMs such as Claude, so that speech synthesis can be requested in natural language. It exposes two tools:

fish_audio_tts: generates speech from text, with options for the voice reference, streaming, output format, MP3 bitrate, latency, and playback.
fish_audio_list_references: lists all configured voice references.

Voices can be selected by ID, name, or tags, and both single-reference and multi-reference setups can be configured through environment variables. Supported output formats are MP3, WAV, PCM, and Opus, with a configurable MP3 bitrate.

All configuration is environment-variable driven: API key, model ID, voice references and the default reference, output format, streaming mode, latency, and output directory. The server targets low-latency streaming and real-time playback and works with any MCP-compatible client.

The project includes API documentation, error handling, and troubleshooting guidance. Its code is split into a dedicated Fish Audio API client and a TTS tool, which keeps the codebase maintainable.
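
To make the fish_audio_tts options concrete, here is a minimal sketch of how a tool handler might validate its input and fill in defaults. The field names (text, reference_id, format, mp3_bitrate, latency, streaming, auto_play), the allowed bitrates, the latency modes, and the defaults are all assumptions for illustration, not the server's actual schema.

```typescript
// Hypothetical argument shape for the fish_audio_tts tool. Field names,
// allowed values, and defaults are assumptions, not the project's schema.
type AudioFormat = "mp3" | "wav" | "pcm" | "opus";
type LatencyMode = "normal" | "balanced";

interface TtsArgs {
  text: string;
  reference_id?: string; // voice reference to use (optional)
  format: AudioFormat;
  mp3_bitrate?: number; // only set when format is "mp3"
  latency: LatencyMode;
  streaming: boolean;
  auto_play: boolean;
}

const FORMATS: readonly AudioFormat[] = ["mp3", "wav", "pcm", "opus"];
const LATENCIES: readonly LatencyMode[] = ["normal", "balanced"];
const MP3_BITRATES: readonly number[] = [64, 128, 192]; // assumed allowed set

// Validate raw tool input and fill in defaults; throws on invalid input.
function parseTtsArgs(raw: Record<string, unknown>): TtsArgs {
  const text = raw.text;
  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("text must be a non-empty string");
  }
  const format = (raw.format ?? "mp3") as AudioFormat;
  if (!FORMATS.includes(format)) {
    throw new Error(`unsupported format: ${String(raw.format)}`);
  }
  const latency = (raw.latency ?? "balanced") as LatencyMode;
  if (!LATENCIES.includes(latency)) {
    throw new Error(`unsupported latency: ${String(raw.latency)}`);
  }
  let mp3_bitrate: number | undefined;
  if (format === "mp3") {
    // Bitrate only applies to MP3 output; other formats leave it undefined.
    mp3_bitrate = (raw.mp3_bitrate ?? 128) as number;
    if (!MP3_BITRATES.includes(mp3_bitrate)) {
      throw new Error(`unsupported mp3_bitrate: ${String(raw.mp3_bitrate)}`);
    }
  }
  return {
    text,
    reference_id: typeof raw.reference_id === "string" ? raw.reference_id : undefined,
    format,
    mp3_bitrate,
    latency,
    streaming: raw.streaming === true,
    auto_play: raw.auto_play === true,
  };
}
```

For example, parseTtsArgs({ text: "Hello" }) yields MP3 output at 128 kbps with "balanced" latency under these assumed defaults, while an unknown format such as "flac" is rejected before any API call is made.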

Details

Owner
da-okazaki
Language
TypeScript
License
MIT License
Updated
2025-12-07

Features

High-Quality TTS

Uses Fish Audio's TTS models to generate natural-sounding speech.

Streaming Support

Real-time audio streaming for low-latency applications.
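
The latency benefit of streaming comes from handling audio chunk by chunk as it arrives, rather than waiting for the whole file. The sketch below shows that pattern with an async iterable of byte chunks. The chunk source and the sink callback are hypothetical stand-ins: in a real client the source might be an HTTP response body and the sink an audio player or file writer.

```typescript
// Forward each audio chunk to a sink as soon as it arrives, instead of
// buffering the full response first. Returns the total bytes received.
async function pipeAudio(
  chunks: AsyncIterable<Uint8Array>,
  onChunk: (chunk: Uint8Array) => void,
): Promise<number> {
  let total = 0;
  for await (const chunk of chunks) {
    onChunk(chunk); // hand off immediately to minimize time-to-first-audio
    total += chunk.byteLength;
  }
  return total;
}

// Hypothetical stand-in for a network stream: yields zero-filled chunks.
async function* fakeSource(sizes: number[]): AsyncGenerator<Uint8Array> {
  for (const size of sizes) {
    yield new Uint8Array(size);
  }
}
```

With this design, playback can start after the first chunk instead of after the last one; the total byte count is only known once the stream ends.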

Multiple Voices

Support for custom voice models via reference IDs.

Smart Voice Selection

Select voices by ID, name, or tags.
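
Selecting a voice by ID, name, or tag amounts to a lookup over the configured references. A minimal sketch, assuming a simple reference shape and a precedence order (exact ID, then case-insensitive name, then first matching tag) chosen for illustration rather than taken from the server:

```typescript
// Hypothetical shape of a configured voice reference.
interface VoiceReference {
  id: string;
  name?: string;
  tags?: string[];
}

// Resolve a voice: exact ID first, then case-insensitive name, then the
// first reference carrying the requested tag. Precedence is an assumption.
function selectVoice(
  refs: VoiceReference[],
  query: { id?: string; name?: string; tag?: string },
): VoiceReference | undefined {
  if (query.id !== undefined) {
    const byId = refs.find((r) => r.id === query.id);
    if (byId) return byId;
  }
  if (query.name !== undefined) {
    const wantedName = query.name.toLowerCase();
    const byName = refs.find((r) => r.name?.toLowerCase() === wantedName);
    if (byName) return byName;
  }
  if (query.tag !== undefined) {
    const wantedTag = query.tag.toLowerCase();
    return refs.find((r) => (r.tags ?? []).some((t) => t.toLowerCase() === wantedTag));
  }
  return undefined;
}
```

Returning undefined on a miss lets the caller fall back to a configured default reference.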

Voice Library Management

Configure and manage multiple voice references.
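
A voice library configured through environment variables needs to be parsed and then listed. The sketch below assumes one convention for illustration: multiple references supplied as a JSON array, with a single reference ID as a fallback, plus a plain-text listing similar to what a tool like fish_audio_list_references might return. The JSON layout and the output format are assumptions.

```typescript
// Hypothetical shape of one configured voice reference.
interface VoiceReference {
  id: string;
  name?: string;
  tags?: string[];
}

// Build the library from two raw environment values (assumed convention):
// a JSON array for multi-reference setups, or a single reference ID.
function loadReferences(multiJson: string | undefined, singleId: string | undefined): VoiceReference[] {
  if (multiJson !== undefined && multiJson.trim() !== "") {
    const parsed: unknown = JSON.parse(multiJson);
    if (!Array.isArray(parsed)) {
      throw new Error("references must be a JSON array");
    }
    return parsed.map((entry: unknown, i: number): VoiceReference => {
      const e = (typeof entry === "object" && entry !== null ? entry : {}) as Record<string, unknown>;
      const id = e.id;
      if (typeof id !== "string") {
        throw new Error(`reference ${i} needs a string "id"`);
      }
      const name = e.name;
      const rawTags = e.tags;
      return {
        id,
        name: typeof name === "string" ? name : undefined,
        // Keep only string tags; ignore anything else in the array.
        tags: Array.isArray(rawTags) ? rawTags.filter((t): t is string => typeof t === "string") : undefined,
      };
    });
  }
  return singleId ? [{ id: singleId }] : [];
}

// Render the library one reference per line, marking the default.
function describeReferences(refs: VoiceReference[], defaultId?: string): string {
  return refs
    .map((r) => {
      const tags = r.tags && r.tags.length > 0 ? ` [${r.tags.join(", ")}]` : "";
      const mark = r.id === defaultId ? " (default)" : "";
      return `${r.name ?? r.id} (${r.id})${tags}${mark}`;
    })
    .join("\n");
}
```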

Flexible Configuration

Environment-variable-based configuration for easy setup.
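
Environment-driven configuration usually means reading a handful of variables, validating them, and applying defaults. In the sketch below, every variable name (FISH_API_KEY, FISH_MODEL_ID, FISH_OUTPUT_FORMAT, FISH_STREAMING, FISH_LATENCY, AUDIO_OUTPUT_DIR) and every default value is a hypothetical placeholder, not the server's documented setting. The function takes the environment as a plain object so it can be tested; in Node you would pass process.env.

```typescript
type AudioFormat = "mp3" | "wav" | "pcm" | "opus";

interface ServerConfig {
  apiKey: string;
  modelId?: string;
  defaultFormat: AudioFormat;
  streaming: boolean;
  latency: "normal" | "balanced";
  outputDir: string;
}

// Read and validate configuration. All variable names and defaults below
// are hypothetical placeholders.
function loadConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env.FISH_API_KEY;
  if (!apiKey) throw new Error("FISH_API_KEY is required");

  const format = (env.FISH_OUTPUT_FORMAT ?? "mp3").toLowerCase();
  if (!["mp3", "wav", "pcm", "opus"].includes(format)) {
    throw new Error(`unsupported output format: ${format}`);
  }

  const latency = env.FISH_LATENCY ?? "balanced";
  if (latency !== "normal" && latency !== "balanced") {
    throw new Error(`unsupported latency: ${latency}`);
  }

  return {
    apiKey,
    modelId: env.FISH_MODEL_ID || undefined, // treat empty string as unset
    defaultFormat: format as AudioFormat,
    streaming: (env.FISH_STREAMING ?? "false").toLowerCase() === "true",
    latency: latency as "normal" | "balanced",
    outputDir: env.AUDIO_OUTPUT_DIR ?? "./audio_output",
  };
}
```

Failing fast on a missing API key or an unknown format surfaces misconfiguration at startup rather than on the first TTS request.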

Multiple Audio Formats

Support for MP3, WAV, PCM, and Opus.
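
Each output format needs a file extension when audio is saved, and the MP3 bitrate directly sets the file size. The sketch below maps formats to extensions and builds an output path; the tts_ filename pattern and the ".pcm" extension for raw PCM are illustrative assumptions. It also estimates constant-bitrate MP3 size as kbps × 1000 / 8 bytes per second (ignoring frame headers), so 10 seconds at 128 kbps is about 16,000 × 10 = 160,000 bytes.

```typescript
type AudioFormat = "mp3" | "wav" | "pcm" | "opus";

// Extension per format. Raw PCM has no container, so ".pcm" is a convention.
const EXTENSIONS: Record<AudioFormat, string> = {
  mp3: ".mp3",
  wav: ".wav",
  pcm: ".pcm",
  opus: ".opus",
};

// Build a hypothetical output path such as "out/tts_1700000000000.mp3".
function outputPath(dir: string, format: AudioFormat, timestampMs: number): string {
  const base = dir.endsWith("/") ? dir.slice(0, -1) : dir;
  return `${base}/tts_${timestampMs}${EXTENSIONS[format]}`;
}

// Approximate constant-bitrate MP3 payload size in bytes.
function estimateMp3Bytes(bitrateKbps: number, seconds: number): number {
  return Math.round(((bitrateKbps * 1000) / 8) * seconds);
}
```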

Easy Integration

Simple setup with any MCP-compatible client.

Audience

Developers: Integrate Fish Audio TTS into MCP workflows to generate speech from text within LLM-enabled applications.
Voice model managers: Configure and manage multiple voice references (IDs, names, tags) for customizable voices.

Tags

TTS, Fish Audio, MCP server, Streaming, Voice references, Voice selection, Voice library, Environment variable configuration, Audio formats, Real-time playback