Resemble AI: Voice Cloning & Deepfake Detection API

Enterprise AI platform for generative voice (TTS, voice cloning) and multimodal deepfake detection. Trusted by Fortune 500s. Open-source Chatterbox TTS, DETECT-3B Omni detection, PerTH watermarking.

Resemble AI provides two core capabilities: generative voice AI through Chatterbox (MIT-licensed TTS with zero-shot voice cloning from 5 seconds of audio) and deepfake detection via DETECT-3B Omni (98%+ accurate multimodal detector ranked #1 on public leaderboards). The platform includes PerTH watermarking for invisible content provenance tracking. Designed for enterprises, it's trusted by Fortune 500s and government agencies, with on-premise deployment and SOC 2 compliance available.

Pricing

Flex Plan (pay-as-you-go): TTS $0.0005/sec, Detection $0.04/sec, Voice Agents $0.001/sec, Intelligence $0.03/sec.
Creator Plan: $1 for the first month, then $29/month (10,000 seconds of TTS).
Professional: $99/month (80,000 seconds).
Business: $499/month (320,000 seconds).
Enterprise: custom pricing with volume discounts of up to 80%. SOC 2 and on-premise deployment available.
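To make the per-second billing concrete, here is a minimal sketch that estimates a Flex Plan bill from the rates listed above. The function name `flex_cost` and the service keys are hypothetical and chosen only for illustration; they are not part of any Resemble AI SDK. The estimate ignores taxes, plan allowances, and volume discounts.

```python
from decimal import Decimal

# Flex Plan per-second rates in USD, copied from the pricing above.
# The dictionary keys are illustrative names, not official API identifiers.
FLEX_RATES = {
    "tts": Decimal("0.0005"),
    "detection": Decimal("0.04"),
    "voice_agents": Decimal("0.001"),
    "intelligence": Decimal("0.03"),
}


def flex_cost(usage_seconds):
    """Return the estimated Flex Plan cost in USD for {service: seconds}."""
    total = Decimal("0")
    for service, seconds in usage_seconds.items():
        if service not in FLEX_RATES:
            raise KeyError(f"unknown service: {service}")
        if seconds < 0:
            raise ValueError("seconds must be non-negative")
        total += FLEX_RATES[service] * Decimal(seconds)
    return total


# One hour of TTS plus five minutes of detection.
estimate = flex_cost({"tts": 3600, "detection": 300})
print(f"${estimate:.2f}")  # $13.80
```

Decimal is used instead of float so that small per-second rates add up without rounding drift.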

Frequently Asked Questions

Does Resemble AI have a free tier?

Yes. The Flex Plan is pay-as-you-go and starts at $0, with per-second billing (TTS $0.0005/sec, Detection $0.04/sec). Credits never expire. The Creator Plan costs $1 for the first month and $29/month after that, with 10,000 seconds of TTS included.

How accurate is DETECT-3B Omni deepfake detection?

DETECT-3B Omni achieves 98% accuracy across 40+ languages and is ranked #1 on public leaderboards (Speech DeepFake Arena, DFBench). Its equal error rate (EER) is below 6% for audio, about 4.5% for video, and about 9% for images. Real-time detection runs in under 300 ms.
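For readers unfamiliar with EER: it is the error rate at the decision threshold where the share of genuine samples wrongly flagged as fake equals the share of fake samples that pass as genuine. The sketch below is a generic illustration of that metric, not Resemble's evaluation procedure; the function name and the sample scores are made up, and it assumes a detector whose higher scores mean "more likely fake".

```python
def equal_error_rate(genuine_scores, fake_scores):
    """Approximate the EER for a detector where a higher score means 'fake'.

    Every observed score is tried as a threshold; the result is the average
    of the two error rates at the threshold where they are closest.
    """
    if not genuine_scores or not fake_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap, best_eer = None, None
    for threshold in sorted(set(genuine_scores) | set(fake_scores)):
        # Genuine samples scored at or above the threshold are false alarms.
        false_alarm = sum(s >= threshold for s in genuine_scores) / len(genuine_scores)
        # Fake samples scored below the threshold are misses.
        miss = sum(s < threshold for s in fake_scores) / len(fake_scores)
        gap = abs(false_alarm - miss)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (false_alarm + miss) / 2
    return best_eer


# Made-up detector scores for four genuine and four fake clips.
genuine = [0.1, 0.2, 0.3, 0.6]
fake = [0.4, 0.7, 0.8, 0.9]
print(equal_error_rate(genuine, fake))  # 0.25
```

A lower EER means the two score distributions overlap less, so an EER of about 4.5% indicates that roughly one sample in twenty-two is misclassified at the balanced threshold.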

What is Chatterbox and why is it open-source?

Chatterbox is Resemble's state-of-the-art text-to-speech model (MIT licensed, 22.5k GitHub stars) that outperforms ElevenLabs in blind evaluations (63.75% user preference). It features zero-shot voice cloning from 5 seconds of audio, emotion control, and built-in PerTH watermarking. It is open-source because Resemble believes in transparency and developer empowerment.

Can I deploy Resemble AI on-premise?

Yes. Chatterbox and DETECT-3B Omni can run entirely on your infrastructure, including air-gapped environments, with zero data egress and no external API calls. Full on-premise deployment is available on the Enterprise tier and in custom deployments.

What languages does Chatterbox support?

Chatterbox supports 23 languages natively: English, Spanish, French, German, Italian, Portuguese, Russian, Dutch, Polish, Turkish, Swedish, Danish, Finnish, Norwegian, Greek, Hebrew, Arabic, Hindi, Japanese, Korean, Chinese (Simplified), Malay, and Romanian.

How does PerTH watermarking work?

PerTH embeds imperceptible neural watermarks using psychoacoustic principles, exploiting frequency ranges that human hearing cannot perceive. The watermark survives compression (MP3, AAC, OGG), resampling, and editing while maintaining near-100% detection accuracy. It is embedded automatically in all Chatterbox-generated audio.

Is Resemble AI SOC 2 compliant?

Yes. The Enterprise tier includes SOC 2 Type II compliance. Custom SLAs, SSO/SAML authentication, and enterprise-grade security are available. On-premise and private cloud deployment options support the strictest data residency requirements.