Convert text to speech
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Specify which TTS model to use. We recommend s1.
Allowed values: s1, speech-1.6, speech-1.5
Request body for text-to-speech synthesis.
Text to convert to speech.
Controls expressiveness. Higher is more varied, lower is more consistent.
Range: 0 <= x <= 1
Controls diversity via nucleus sampling.
Range: 0 <= x <= 1
Inline voice references for zero-shot cloning. Requires MessagePack (not JSON). Ignored if reference_id is provided.
Voice model ID from the Fish Audio library or your custom models.
Speed and volume adjustments for the output.
Text segment size for processing.
Range: 100 <= x <= 300
Normalizes text for English and Chinese, improving stability for numbers.
Output audio format.
Allowed values: wav, pcm, mp3, opus
Audio sample rate in Hz. When null, uses the format's default (44100 Hz for most formats, 48000 Hz for opus).
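The default sample rate rule described above can be sketched as a small lookup. This is only an illustration: the helper name `resolve_sample_rate` is hypothetical, and the table assumes that every non-opus format listed above shares the 44100 Hz default, which the text states only for "most formats".

```python
from typing import Optional

# Defaults as described above: 48000 Hz for opus, 44100 Hz otherwise.
# Assumption: wav, pcm and mp3 all fall under "most formats".
DEFAULT_SAMPLE_RATES = {"wav": 44100, "pcm": 44100, "mp3": 44100, "opus": 48000}


def resolve_sample_rate(audio_format: str, sample_rate: Optional[int] = None) -> int:
    """Return the explicit sample rate, or the format's default when it is None."""
    if audio_format not in DEFAULT_SAMPLE_RATES:
        raise ValueError(f"unsupported format: {audio_format!r}")
    if sample_rate is not None:
        return sample_rate
    return DEFAULT_SAMPLE_RATES[audio_format]
```

Calling `resolve_sample_rate("opus")` falls back to the opus default, while `resolve_sample_rate("wav", 16000)` keeps the explicit value.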
MP3 bitrate in kbps. Only applies when format is mp3.
Allowed values: 64, 128, 192
Opus bitrate in bps. -1000 for automatic. Only applies when format is opus.
Allowed values: -1000, 24, 32, 48, 64
Latency-quality trade-off. normal: best quality, balanced: reduced latency, low: lowest latency.
Allowed values: low, normal, balanced
Maximum audio tokens to generate per text chunk.
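Because each bitrate option only applies to its own format, a client may want to check the values and drop the irrelevant one before sending. The following is a minimal sketch under stated assumptions: the field names `mp3_bitrate`, `opus_bitrate` and `latency` and the helper `encoding_options` are hypothetical, and only `format` is named in the text above.

```python
from typing import Optional

# Allowed values copied from the descriptions above.
MP3_BITRATES = {64, 128, 192}
OPUS_BITRATES = {-1000, 24, 32, 48, 64}  # -1000 selects automatic
LATENCY_MODES = {"low", "normal", "balanced"}


def encoding_options(
    audio_format: str,
    latency: str,
    mp3_bitrate: Optional[int] = None,
    opus_bitrate: Optional[int] = None,
) -> dict:
    """Validate encoding options and keep only those that apply to the format."""
    if latency not in LATENCY_MODES:
        raise ValueError(f"invalid latency mode: {latency!r}")
    options = {"format": audio_format, "latency": latency}
    # Field names below are assumptions, not confirmed by the text above.
    if audio_format == "mp3" and mp3_bitrate is not None:
        if mp3_bitrate not in MP3_BITRATES:
            raise ValueError(f"invalid MP3 bitrate: {mp3_bitrate}")
        options["mp3_bitrate"] = mp3_bitrate
    if audio_format == "opus" and opus_bitrate is not None:
        if opus_bitrate not in OPUS_BITRATES:
            raise ValueError(f"invalid Opus bitrate: {opus_bitrate}")
        options["opus_bitrate"] = opus_bitrate
    return options
```

With `audio_format="mp3"`, any `opus_bitrate` passed in is silently left out of the result, mirroring the "only applies when" wording.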
Penalty for repeating audio patterns. Values above 1.0 reduce repetition.
Minimum characters before splitting into a new chunk.
Range: 0 <= x <= 100
Use previous audio as context for voice consistency.
Early stopping threshold for batch processing.
Range: 0 <= x <= 1
Request fulfilled, document follows
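To tie the pieces together, here is a hedged sketch of how a JSON request might be assembled with the Python standard library. Several details are assumptions rather than facts from this page: the endpoint URL, the `model` header name, and every body field name except `format` and `reference_id`. Inline voice references are left out because they require MessagePack rather than JSON. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Assumed endpoint; the section above does not state the URL.
API_URL = "https://api.fish.audio/v1/tts"

# Inclusive ranges quoted above, keyed by hypothetical field names.
RANGES = {
    "temperature": (0, 1),           # expressiveness
    "top_p": (0, 1),                 # nucleus sampling
    "chunk_length": (100, 300),      # text segment size
    "min_chunk_length": (0, 100),    # minimum characters before a split
    "early_stop_threshold": (0, 1),  # early stopping threshold
}


def build_tts_request(token, text, model="s1", **options):
    """Build (but do not send) a JSON text-to-speech request."""
    for name, (low, high) in RANGES.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be in [{low}, {high}], got {value}")
    body = {"text": text, **options}
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "model": model,  # assumed header name for the model selector
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_tts_request("YOUR_TOKEN", "Hello, world.", temperature=0.7, format="mp3")
```

Passing the resulting object to `urllib.request.urlopen` would perform the call. Validating ranges locally is a design choice that surfaces mistakes before a network round trip, not something the API is known to require.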