Replacing Exotel with a Self-Hosted Voice AI Gateway — 60% Cost Reduction at 500K Calls/Day
A production-grade telephony gateway that bridges PSTN networks with AI voice applications. It let a large social-impact foundation cut per-minute call costs by 60% on a platform designed for half a million calls a day, with zero changes to existing AI application code.
Platform: On-Premises / Cloud (DigitalOcean + Azure)
Duration: 4 Months
60% cost reduction
500K calls/day capacity
<200ms one-way audio latency
Project overview
Replaced Exotel Voicebot with a self-hosted Asterisk-based gateway that connects directly to BSNL SIP trunks. Achieved 60% cost reduction (from Rs 0.50 to Rs 0.20 per minute), 100% Exotel WebSocket protocol compatibility with zero AI code changes, and a distributed architecture that handles the BSNL VPN constraint.
Type: Voice AI & Telephony
Stack: 12 technologies
The challenge
A large social-impact foundation running AI-powered voice surveys across India was spending over Rs 10 million per month on Exotel Voicebot at Rs 0.50 per minute. Migrating away from Exotel looked impractical because the AI applications were tightly integrated with Exotel’s proprietary WebSocket protocol. The cheaper BSNL SIP trunk also required a Pritunl VPN that blocked the gateway’s outbound internet access, so a single server could not reach both the trunk and the AI applications.
Exotel vendor lock-in at Rs 0.50/min, 2.5x the direct BSNL trunk rate of Rs 0.20/min
AI applications hardcoded to Exotel’s WebSocket protocol — migration would require rewriting every voice bot
BSNL SIP trunk requires Pritunl VPN which blocks internet access — the gateway cannot reach AI apps
Need to scale from 100K to 500K calls/day (3,000–6,000 concurrent) with high availability
Real-time audio quality requirements: <200ms latency, <1% packet loss across the bridge
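The cost figures above can be checked with a few lines of arithmetic. This is a sketch using only the numbers quoted in this case study; the monthly spend is stated as "over Rs 10 million", so it is treated here as a lower bound, and the variable names are illustrative.

```python
# Figures quoted in this case study.
EXOTEL_RATE_RS = 0.50            # Rs per minute on Exotel Voicebot
BSNL_RATE_RS = 0.20              # Rs per minute on the direct BSNL SIP trunk
EXOTEL_MONTHLY_RS = 10_000_000   # "over Rs 10 million per month", used as a floor

# Call minutes per month implied by the Exotel bill.
monthly_minutes = EXOTEL_MONTHLY_RS / EXOTEL_RATE_RS

# The same traffic billed at the BSNL rate (trunk charges only).
bsnl_monthly_rs = monthly_minutes * BSNL_RATE_RS

# Saving relative to Exotel, and Exotel's rate as a multiple of BSNL's.
reduction = 1 - BSNL_RATE_RS / EXOTEL_RATE_RS
rate_ratio = EXOTEL_RATE_RS / BSNL_RATE_RS

print(f"Implied minutes per month: {monthly_minutes:,.0f}")
print(f"Same traffic on BSNL:      Rs {bsnl_monthly_rs:,.0f}")
print(f"Per-minute cost reduction: {reduction:.0%}")
print(f"Exotel rate vs BSNL rate:  {rate_ratio:.1f}x")
```

At these rates the Exotel bill implies at least 20 million call minutes a month, and the same minutes on the BSNL trunk would cost about Rs 4 million in trunk charges.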
What we set out to do
- 01 Build a drop-in Exotel replacement with 100% WebSocket protocol compatibility
- 02 Connect directly to BSNL SIP trunks at Rs 0.20/min, cutting per-minute costs by 60%
- 03 Solve the VPN networking constraint without compromising security or reliability
- 04 Achieve sub-200ms audio latency for natural AI voice conversations
- 05 Design for 500K calls/day with horizontal scaling and zero single points of failure
How we solved it
Exotel Protocol Reverse Engineering
Built a WebSocket client that connects to the existing AI app servers and speaks the Exotel protocol format exactly: connected, start, and media events carrying Base64-encoded PCM audio. The AI apps see the gateway as Exotel, so no code changes were required.
Key decision: The gateway acts as the WebSocket client (not the server), matching Exotel’s connection pattern.
Result: 100% protocol compatibility. Instant migration with zero AI code rewrites.
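To make the event flow concrete, here is a minimal sketch of how the three event types could be serialized before being sent over the WebSocket. Only the event names (connected, start, media) and the Base64 PCM payload come from this write-up. Every other field name (`stream_id`, `media_format`, `sequence`, `payload`) is a hypothetical placeholder, not Exotel's actual schema, which would have to be taken from Exotel's documentation or captured traffic.

```python
import base64
import json


def connected_event() -> str:
    """First message after the WebSocket handshake."""
    return json.dumps({"event": "connected"})


def start_event(stream_id: str, sample_rate: int = 8000) -> str:
    """Announces a new call stream. Field names are hypothetical."""
    return json.dumps({
        "event": "start",
        "stream_id": stream_id,
        "media_format": {"encoding": "pcm16", "sample_rate": sample_rate},
    })


def media_event(stream_id: str, pcm: bytes, sequence: int) -> str:
    """Wraps one chunk of 16-bit PCM audio as a Base64 string."""
    return json.dumps({
        "event": "media",
        "stream_id": stream_id,
        "sequence": sequence,
        "payload": base64.b64encode(pcm).decode("ascii"),
    })


def decode_media(message: str) -> bytes:
    """Extracts raw PCM bytes from a media event (the reverse direction)."""
    event = json.loads(message)
    if event.get("event") != "media":
        raise ValueError("not a media event")
    return base64.b64decode(event["payload"])
```

Keeping serialization in small pure functions like these makes it easy to compare the gateway's output byte for byte against messages recorded from the original provider.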
Custom Audio Transcoding Pipeline
Built a real-time bidirectional audio bridge: PSTN μ-law RTP (8 kHz, 160 samples per 20 ms frame) ↔ 16-bit PCM ↔ Base64 string. Custom codec functions process each 20 ms frame in under 1 ms, 30x faster than real time.
Key decision: Custom codec over FFmpeg for minimal latency and zero external dependencies.
Result: Sub-1ms transcoding per chunk. <150ms end-to-end audio latency.
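The transcoding chain can be sketched in pure Python using the standard G.711 μ-law expansion and compression formulas. This is an illustration of the technique, not the project's codec: the function names are invented here, little-endian PCM byte order is an assumption, and a production version would more likely use precomputed lookup tables for speed.

```python
import base64
import struct

BIAS = 0x84    # G.711 mu-law bias added before compression
CLIP = 32635   # largest magnitude that can be encoded


def ulaw_to_linear(u: int) -> int:
    """Expand one mu-law byte to a signed 16-bit PCM sample."""
    u = ~u & 0xFF
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -sample if u & 0x80 else sample


def linear_to_ulaw(sample: int) -> int:
    """Compress one signed 16-bit PCM sample to a mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_frame_to_base64_pcm(frame: bytes) -> str:
    """PSTN -> AI app: mu-law frame -> 16-bit little-endian PCM -> Base64."""
    samples = [ulaw_to_linear(b) for b in frame]
    pcm = struct.pack(f"<{len(samples)}h", *samples)
    return base64.b64encode(pcm).decode("ascii")


def base64_pcm_to_ulaw_frame(payload: str) -> bytes:
    """AI app -> PSTN: Base64 -> 16-bit little-endian PCM -> mu-law frame."""
    pcm = base64.b64decode(payload)
    samples = struct.unpack(f"<{len(pcm) // 2}h", pcm)
    return bytes(linear_to_ulaw(s) for s in samples)
```

For one 20 ms frame, 160 μ-law bytes expand to 320 bytes of PCM, which Base64 encodes as a 428-character string.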
Distributed Architecture for VPN Constraint
Separated the system into two servers: the Gateway (DigitalOcean, internet-connected, no VPN) and the Asterisk PBX (Azure, VPN-connected to BSNL). RTP media flows between them over a private network link, adding only 1–2ms of latency.
Key decision: Split Gateway and Asterisk across servers to solve the VPN/internet conflict.
Result: BSNL trunk access and AI app connectivity work simultaneously.
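The media crossing the link between the two servers is ordinary RTP, so the Gateway side has to unwrap each packet before transcoding. The sketch below parses the fixed 12-byte RTP header defined in RFC 3550. The class and function names are illustrative, and the sketch deliberately ignores header extensions and padding, which a plain G.711 stream would not normally use.

```python
import struct
from typing import NamedTuple


class RtpPacket(NamedTuple):
    payload_type: int   # 0 is PCMU (G.711 mu-law) in the RFC 3551 static table
    sequence: int       # increments by 1 per packet, wraps at 65536
    timestamp: int      # increments by 160 per 20 ms frame at 8 kHz
    ssrc: int           # identifies the media source
    payload: bytes      # 160 mu-law bytes for one 20 ms frame


def parse_rtp(packet: bytes) -> RtpPacket:
    """Parse the fixed RTP header and return the payload that follows it."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP header")
    first, second, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if first >> 6 != 2:
        raise ValueError("not RTP version 2")
    csrc_count = first & 0x0F            # each CSRC entry adds 4 header bytes
    offset = 12 + 4 * csrc_count
    return RtpPacket(second & 0x7F, sequence, timestamp, ssrc, packet[offset:])


def build_rtp(sequence: int, timestamp: int, ssrc: int, payload: bytes) -> bytes:
    """Build a minimal PCMU packet (version 2, no padding, extension or CSRCs)."""
    header = struct.pack("!BBHII", 0x80, 0, sequence & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc)
    return header + payload
```

Tracking `sequence` and `timestamp` per stream is also what makes packet loss and jitter across the bridge measurable.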
Capacity-Aware Session Management
Built a stateless Session Manager with PostgreSQL persistence and Redis caching. A node-pair allocator distributes calls across Gateway-Asterisk pairs using least-connections routing, and session state survives gateway restarts.
Key decision: Stateless orchestrator + stateful workers with Redis capacity tracking.
Result: Even load distribution. Session persistence across restarts.
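Least-connections routing with a capacity limit can be sketched as follows. This is a simplified in-memory illustration: the class names and the per-pair `capacity` field are assumptions, and the plain counters stand in for the Redis capacity tracking described above.

```python
from dataclasses import dataclass


@dataclass
class NodePair:
    name: str          # identifies one Gateway-Asterisk pair
    capacity: int      # maximum concurrent calls this pair may carry
    active: int = 0    # calls currently assigned to this pair


class NodePairAllocator:
    """Assigns each new call to the pair with the fewest active calls."""

    def __init__(self, pairs: list[NodePair]) -> None:
        self._pairs = {pair.name: pair for pair in pairs}

    def allocate(self) -> str:
        # Only pairs with spare capacity are candidates.
        candidates = [p for p in self._pairs.values() if p.active < p.capacity]
        if not candidates:
            raise RuntimeError("all node pairs are at capacity")
        chosen = min(candidates, key=lambda p: p.active)
        chosen.active += 1
        return chosen.name

    def release(self, name: str) -> None:
        # Called when a call ends; never lets the counter go negative.
        pair = self._pairs[name]
        if pair.active > 0:
            pair.active -= 1


allocator = NodePairAllocator([NodePair("pair-a", 2), NodePair("pair-b", 2)])
print([allocator.allocate() for _ in range(4)])
```

With shared counters in a store such as Redis, the increment and the capacity check would need to be atomic so that several orchestrator instances cannot oversubscribe the same pair.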
Pluggable Trunk Provider Interface
Designed an abstraction layer that supports BSNL, Exotel, and Twilio as interchangeable SIP trunk providers. Switching providers requires only a YAML config change, with no code modifications.
Key decision: TrunkProvider interface with YAML-based configuration.
Result: Zero vendor lock-in. Failover between providers in seconds.
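A provider abstraction of this kind might look like the sketch below. Only the `TrunkProvider` name and the idea of config-driven selection come from this write-up. The method, the config keys, the number-formatting rules and the use of Asterisk's `PJSIP/number@endpoint` dial-string form are all assumptions made for illustration. The config is shown as a plain dictionary, standing in for the parsed YAML file.

```python
from abc import ABC, abstractmethod


class TrunkProvider(ABC):
    """Common interface every SIP trunk provider implements."""

    def __init__(self, settings: dict) -> None:
        self.settings = settings

    @abstractmethod
    def dial_string(self, number: str) -> str:
        """Return the Asterisk dial string for an outbound call."""


class BsnlTrunk(TrunkProvider):
    def dial_string(self, number: str) -> str:
        # Hypothetical rule: this trunk takes the national number as-is.
        return f"PJSIP/{number}@{self.settings['endpoint']}"


class TwilioTrunk(TrunkProvider):
    def dial_string(self, number: str) -> str:
        # Hypothetical rule: this trunk expects an E.164 number for India.
        return f"PJSIP/+91{number}@{self.settings['endpoint']}"


PROVIDERS: dict[str, type[TrunkProvider]] = {
    "bsnl": BsnlTrunk,
    "twilio": TwilioTrunk,
}


def load_provider(config: dict) -> TrunkProvider:
    """Instantiate the provider named in the (already parsed) config."""
    trunk = config["trunk"]
    try:
        provider_class = PROVIDERS[trunk["provider"]]
    except KeyError:
        raise ValueError(f"unknown trunk provider: {trunk['provider']!r}") from None
    return provider_class(trunk)


config = {"trunk": {"provider": "bsnl", "endpoint": "bsnl-trunk"}}
print(load_provider(config).dial_string("9876543210"))
```

Changing `provider` to `twilio` in the config swaps the implementation without touching the calling code, which is the property the design relies on.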
Measurable impact
60% cost reduction (Rs 0.50 to Rs 0.20/min)
0 AI application code changes required
<200ms one-way audio latency achieved
500K daily call capacity (design target)
<1% packet loss rate
$400/mo gateway infrastructure cost, excluding BSNL per-minute trunk charges (previous Exotel spend: over Rs 10M/mo)
What we learned
This project proved that enterprise telephony vendor lock-in is solvable with precise protocol engineering. By replicating Exotel’s exact WebSocket format, we eliminated the migration barrier entirely. The distributed architecture — born from a real-world VPN constraint — became a strength: it enables independent scaling of the media gateway and PBX layers.
- 01 Protocol-level compatibility eliminates migration friction: the AI apps never knew they had switched providers
- 02 Network constraints (VPN conflicts) can be solved architecturally by separating concerns across servers
- 03 A small custom codec can beat general-purpose tools such as FFmpeg when the latency budget is tight (<200ms)
- 04 Pluggable provider interfaces prevent future lock-in: switching trunks is now a config change, not a code change