Voice Cloning & Synthesis
Generate high-quality emotional text-to-speech with voice cloning capabilities
POST
The Voice Cloning & Synthesis API generates natural-sounding speech with emotional transfer. The endpoint clones a voice from reference audio samples and synthesizes speech that matches the characteristics and emotional tone of those samples.
Content-Type:
File Format: 16-bit PCM WAV, 44.1kHz sample rate
This endpoint requires a PRO subscription or higher to access voice cloning features.
Request Parameters
The text content to be synthesized into speech. Maximum length: 5000 characters.
The target language for speech synthesis with locale (e.g., “en-US”, “fr-FR”, “es-ES”).
Must match one of the supported languages from Get Options.
One or more audio files containing reference voice samples for cloning.
Supported formats: WAV, MP3, M4A, FLAC
Maximum file size: 50MB per file
Recommended length: 3-30 seconds for optimal results
Quality requirements: Clear audio with minimal background noise
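The documented limits can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: the function name `validate_request` and its argument names are hypothetical (this page does not show the actual parameter names), the 50MB limit is read as 50 × 1024 × 1024 bytes, and the supported-language set is assumed to come from Get Options.

```python
from pathlib import Path

MAX_TEXT_CHARS = 5000                  # documented maximum text length
MAX_FILE_BYTES = 50 * 1024 * 1024      # "50MB per file", interpreted as MiB (assumption)
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac"}


def validate_request(text, language, reference_paths, supported_languages=None):
    """Return a list of problems; an empty list means the inputs look valid.

    All names here are illustrative and do not come from the API reference.
    """
    problems = []

    if not text:
        problems.append("text is empty")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(
            f"text has {len(text)} characters; the limit is {MAX_TEXT_CHARS}"
        )

    # supported_languages would be fetched from Get Options; skipped if not given.
    if supported_languages is not None and language not in supported_languages:
        problems.append(f"language {language!r} is not in the supported list")

    if not reference_paths:
        problems.append("at least one reference audio file is required")

    for raw in reference_paths:
        path = Path(raw)
        if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            problems.append(f"{path.name}: unsupported format {path.suffix!r}")
        elif not path.is_file():
            problems.append(f"{path.name}: file not found")
        elif path.stat().st_size > MAX_FILE_BYTES:
            problems.append(f"{path.name}: larger than 50MB")

    return problems
```

Calling `validate_request("Hello there", "en-US", ["sample.wav"], {"en-US", "fr-FR"})` returns an empty list when `sample.wav` exists and is within the size limit. The sketch does not check the recommended 3-30 second length or audio quality, since both require decoding the file.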
Response
Success Response (200)
Synthesized audio as a binary WAV file.
Content-Type: audio/wav
File Format: 16-bit PCM WAV, 44.1kHz sample rate
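Once the response body has been received, its format can be confirmed with Python's standard `wave` module. This is a sketch under the assumption that the raw bytes of a 200 response are already in memory; the helper name `check_wav` is hypothetical.

```python
import io
import wave


def check_wav(data: bytes) -> dict:
    """Parse WAV bytes and report whether they match the documented format."""
    # wave.open raises wave.Error if the bytes are not a PCM WAV file.
    with wave.open(io.BytesIO(data), "rb") as wav:
        info = {
            "channels": wav.getnchannels(),
            "sample_width_bits": wav.getsampwidth() * 8,
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
    # Documented output: 16-bit PCM at a 44.1kHz sample rate.
    info["matches_documented_format"] = (
        info["sample_width_bits"] == 16 and info["sample_rate_hz"] == 44100
    )
    return info
```

For example, `check_wav(body)["matches_documented_format"]` is `True` when `body` holds 16-bit, 44.1kHz PCM audio. The channel count is reported but not checked, because this page does not state whether the output is mono or stereo.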