This endpoint requires a PRO subscription or higher to access voice cloning features.
Request Parameters
The text content to be synthesized into speech. Maximum length: 5000 characters.
The target language for speech synthesis with locale (e.g., “en-US”, “fr-FR”, “es-ES”).
Must match one of the supported languages from Get Options.
One or more audio files containing reference voice samples for cloning.
Supported formats: WAV, MP3, M4A, FLAC
Maximum file size: 50MB per file
Recommended length: 3-30 seconds for optimal results
Quality requirements: Clear audio with minimal background noise
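The documented limits can be checked on the client before uploading, which avoids sending a request that is likely to be rejected. The following is a minimal sketch, not part of the API itself: the function name `validate_clone_request` and the shape of its arguments are hypothetical, and "50MB" is assumed to mean 50 × 1024 × 1024 bytes. The format check looks only at the file extension, not at the file contents.

```python
import os

MAX_TEXT_CHARS = 5000                      # documented text limit
MAX_FILE_BYTES = 50 * 1024 * 1024          # assumption: "50MB" read as 50 MiB
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def validate_clone_request(text, sample_sizes):
    """Return a list of problems; an empty list means all checks passed.

    sample_sizes maps a file name to its size in bytes (hypothetical shape).
    """
    problems = []
    if len(text) > MAX_TEXT_CHARS:
        problems.append(
            f"text has {len(text)} characters; limit is {MAX_TEXT_CHARS}"
        )
    if not sample_sizes:
        problems.append("at least one reference audio file is required")
    for name, size in sample_sizes.items():
        # Extension check only; the file contents are not inspected.
        ext = os.path.splitext(name)[1].lower()
        if ext not in SUPPORTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format {ext or '(none)'}")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the per-file limit")
    return problems
```

For real files, the size for each entry could be taken from `os.path.getsize(path)`. The recommended 3-30 second length and the audio quality requirements are not covered here, since checking them requires decoding the audio.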
Response
Success Response (200)
Synthesized audio as a binary WAV file.
Content type: audio/wav
File Format: 16-bit PCM WAV, 44.1kHz sample rate
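Once the response body has been received, its format can be compared against the documented one. The sketch below uses Python's standard `wave` module; the helper name `check_wav_format` is hypothetical, and the code assumes the full response body is already held in memory as bytes.

```python
import io
import wave


def check_wav_format(audio_bytes):
    """Return (bits_per_sample, sample_rate_hz, matches_documented_format)."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        bits = wav.getsampwidth() * 8   # sample width is reported in bytes
        rate = wav.getframerate()
    # Documented format: 16-bit PCM at a 44.1 kHz sample rate.
    return bits, rate, (bits == 16 and rate == 44100)
```

If the body is not a valid PCM WAV file, `wave.open` raises `wave.Error`, which a caller may want to treat as a failed synthesis response.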