What is Retell AI and how does it work?

Retell AI is a voice AI platform that handles phone calls using AI agents. It provides automatic speech recognition (ASR), text-to-speech synthesis, and a conversation engine. You define the agent's behavior with a system prompt and tools, then connect it to a phone number. Retell AI integrates with Claude API for reasoning and supports webhooks for CRM integration.

How much does Retell AI cost compared to VAPI?

Retell AI pricing starts at $0.10/minute with optimization possible down to $0.05/minute using proper voice and LLM configuration. VAPI is comparable in pricing but Retell AI offers better latency for most use cases. For high-volume deployments (10,000+ calls/month), negotiate enterprise pricing with both providers.

How do you connect Retell AI to Claude API?

To connect Retell AI to the Claude API: (1) create a custom LLM endpoint in your backend that wraps the Anthropic API, (2) configure the endpoint URL in Retell AI's agent settings, (3) format the conversation history to match Anthropic's messages format, and (4) handle tool calls for calendar booking, CRM updates, and other integrations. Carlos has deployed this stack for clients, achieving an 80% reduction in manual labor.

What is BellaBot and how was it built?

BellaBot is a voice AI agent built by Carlos Aragon using Retell AI and Claude API for a client's business. It handles inbound calls, screens callers, qualifies leads, and schedules appointments automatically. The agent reduced manual labor by 80% and handles 95%+ of routine inquiries without human intervention. It uses Supabase for call logging and n8n for CRM sync.

Retell AI Setup Guide: Build Voice Agents That Actually Work

Introduction: Why Voice AI is Game-Changing

Voice AI is no longer futuristic. It's practical, affordable, and transforming how businesses handle customer interactions. After spending two years building Retell AI implementations for clients like VIXI and CQ Marketing, I can tell you this technology is ready for production.

The catalyst for me diving deep into voice AI was a simple problem: one of my clients was spending $8,000/month on virtual assistants just to answer basic customer questions and screen calls. The questions were repetitive: "What are your hours?" "Do you take insurance?" "Can I reschedule my appointment?" Yet every call required human attention because voicemail wasn't cutting it.

I built BellaBot—a voice AI for business that handles incoming calls, answers common questions, books appointments, and routes urgent cases to humans. The result? An 80% reduction in manual labor, 24/7 availability, and happier customers who get instant answers instead of waiting for callbacks.

This guide covers everything I learned building BellaBot and other voice agents. Whether you're automating customer support, lead qualification, or appointment scheduling, you'll learn the exact setup process, code patterns, and optimization strategies that took me months to figure out.

Retell vs VAPI: Which to Choose?

The two leading platforms for building voice agents are Retell and VAPI. I've used both extensively, and here's my honest comparison based on real production experience.

Feature Comparison Table

Feature	Retell AI	VAPI
Cost per minute	$0.05 - $0.10	$0.08 - $0.15
Voice quality	Excellent (Eleven Labs)	Very good (multiple)
Latency (response time)	600-900ms	800-1200ms
LLM options	Claude, GPT-4, custom	GPT-4, GPT-3.5
API flexibility	High (webhooks, function calling)	Medium
Phone number provisioning	Built-in (Twilio integration)	BYO Twilio
Call recording	Yes (automatic)	Yes
Documentation quality	Very good	Good
Best for	Complex workflows, Claude integration	Simple call flows, quick setup

My Recommendation

I choose Retell AI for 90% of my client projects. Here's why:

Claude Sonnet 4.6 integration — Retell supports Claude API natively, which gives significantly better reasoning for complex conversations compared to GPT-4
Lower latency — The 600-900ms response time feels natural. VAPI's 1+ second delays create awkward pauses
Function calling flexibility — I can trigger webhooks mid-conversation to check availability, pull customer data, or book appointments in real-time
Cost optimization — With proper caching and prompt engineering, I get costs down to $0.05/minute vs VAPI's $0.08+ floor

The only scenario where I'd recommend VAPI is if you need a super simple agent deployed in under an hour with zero custom code. For everything else—especially if you want advanced conversation flows or Claude integration—Retell is the better choice.

Prerequisites: Account Setup and API Keys

Before building your first voice agent, you'll need accounts and API keys for the following services. Budget approximately $100-200 for initial testing across all platforms.

Required Accounts:

Retell AI account — Sign up at retellai.com (free trial includes 100 minutes)
Anthropic API — Get Claude API key from console.anthropic.com ($25 credit for new accounts)
Supabase project — For call logging and data storage (free tier works fine)
n8n instance — Self-hosted or cloud (n8n.io) for post-call automation
Twilio account (optional) — Only if you need custom phone numbers beyond Retell's provisioning

Environment Setup

Store all credentials securely in environment variables. Here's my standard .env structure for voice AI projects:

# .env.local
RETELL_API_KEY="your_retell_api_key_here"
ANTHROPIC_API_KEY="sk-ant-..."
SUPABASE_URL="https://your-project.supabase.co"
SUPABASE_SERVICE_KEY="your_service_role_key"
N8N_WEBHOOK_URL="https://your-n8n.app/webhook/call-completed"
TWILIO_ACCOUNT_SID="optional_if_using_retell_phones"
TWILIO_AUTH_TOKEN="optional"

Never commit these to GitHub. Use Vercel's environment variables dashboard for production deployments.
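
A missing key usually surfaces as a confusing failure in the middle of a call, so I like to fail fast at startup instead. Here is a minimal sketch, assuming the variable names from the .env file above; loadVoiceConfig and the VoiceConfig type are hypothetical names of my own, not part of any SDK:

```typescript
// Hypothetical helper: checks that the required variables from .env are set.
type VoiceConfig = {
  retellApiKey: string;
  anthropicApiKey: string;
  supabaseUrl: string;
  supabaseServiceKey: string;
  n8nWebhookUrl: string;
};

// Twilio variables are left out because the guide treats them as optional.
const REQUIRED_VARS = [
  "RETELL_API_KEY",
  "ANTHROPIC_API_KEY",
  "SUPABASE_URL",
  "SUPABASE_SERVICE_KEY",
  "N8N_WEBHOOK_URL",
] as const;

function loadVoiceConfig(env: Record<string, string | undefined>): VoiceConfig {
  // Collect every missing name so one error message lists them all.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    retellApiKey: env["RETELL_API_KEY"] as string,
    anthropicApiKey: env["ANTHROPIC_API_KEY"] as string,
    supabaseUrl: env["SUPABASE_URL"] as string,
    supabaseServiceKey: env["SUPABASE_SERVICE_KEY"] as string,
    n8nWebhookUrl: env["N8N_WEBHOOK_URL"] as string,
  };
}
```

In a Node or Next.js project you would call loadVoiceConfig(process.env) once when the server boots.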

Step 1: Creating Your First Voice Agent

Let's build a simple appointment booking agent step-by-step. This agent will handle incoming calls, check appointment availability, and schedule meetings directly into your calendar.

Agent Configuration

First, create a new agent in Retell dashboard with these settings:

Agent name: AppointmentBooker
Voice model: Eleven Labs "Rachel" (professional female voice)
LLM model: Claude Sonnet 4.6
Language: English (US)
Initial message: "Hi! Thanks for calling. I'm here to help you schedule an appointment. Can I get your name?"

System Prompt Engineering

The system prompt is critical. Here's the exact prompt I use for BellaBot (adapted for appointment booking):

You are an AI appointment scheduling assistant for [BUSINESS NAME].

Your role:
- Greet callers warmly and professionally
- Collect: caller name, phone number, preferred appointment date/time
- Check availability using the check_availability function
- Confirm appointment details before booking
- Use book_appointment function to schedule confirmed appointments
- Provide confirmation number after booking

Guidelines:
- Keep responses under 25 words to maintain conversational flow
- If caller asks questions outside your scope, say "Let me transfer you to our team"
- Always repeat back appointment details for confirmation
- Be empathetic if requested times aren't available
- Offer 2-3 alternative time slots when unavailable

Never:
- Discuss pricing (transfer to human)
- Give medical or legal advice
- Book appointments without explicit caller confirmation
- Share personal information about other clients

Notice the emphasis on brevity. Long-winded AI responses kill the conversational flow. I learned this the hard way after users complained that BellaBot "talked too much" in early versions.
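
To keep that 25-word guideline honest, it helps to check sample replies automatically while you iterate on the prompt. This is a small sketch under my own assumptions: countWords and exceedsWordLimit are hypothetical helpers, and splitting on whitespace is only a rough proxy for how long a reply takes to speak.

```typescript
// Counts whitespace-separated words; an empty or blank string has zero words.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Returns true when a reply is longer than the limit from the system prompt.
function exceedsWordLimit(reply: string, limit: number = 25): boolean {
  return countWords(reply) > limit;
}

const greeting =
  "Hi! Thanks for calling. I'm here to help you schedule an appointment. Can I get your name?";
console.log(countWords(greeting), exceedsWordLimit(greeting)); // 17 false
```

I would run a check like this over test transcripts rather than truncating live replies, since cutting a sentence short sounds worse than a slightly long answer.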

Step 2: Designing Conversation Flows

Good voice agents need clear conversation flows with explicit handling for edge cases. Here's the flow design for our appointment booking agent:

Primary Conversation Flow

Greeting → AI introduces itself and asks for caller's name
Information gathering → Collect phone number and preferred appointment time
Availability check → Call check_availability function with requested date/time
Confirmation loop → Repeat details, ask "Does that work for you?"
Booking execution → Call book_appointment function if confirmed
Confirmation delivery → Provide confirmation number and recap details
Closing → "Is there anything else I can help with?" → End call

Example Flow JSON Configuration

Retell uses a JSON-based flow configuration for complex agents. Here's a simplified version of the appointment flow:

{
  "flow_name": "appointment_booking",
  "initial_node": "greeting",
  "nodes": [
    {
      "id": "greeting",
      "type": "message",
      "content": "Hi! Thanks for calling. I'm here to help you schedule an appointment. Can I get your name?",
      "next": "collect_name"
    },
    {
      "id": "collect_name",
      "type": "input_collection",
      "variable": "caller_name",
      "validation": "required",
      "next": "collect_phone"
    },
    {
      "id": "collect_phone",
      "type": "input_collection",
      "content": "Great! And what's the best phone number to reach you at?",
      "variable": "phone_number",
      "validation": "phone",
      "next": "collect_datetime"
    },
    {
      "id": "collect_datetime",
      "type": "input_collection",
      "content": "Perfect. What day and time works best for you?",
      "variable": "preferred_datetime",
      "validation": "datetime",
      "next": "check_availability"
    },
    {
      "id": "check_availability",
      "type": "function_call",
      "function": "check_availability",
      "params": {
        "datetime": "{{preferred_datetime}}"
      },
      "next_if_available": "confirm_booking",
      "next_if_unavailable": "suggest_alternatives"
    },
    {
      "id": "confirm_booking",
      "type": "message",
      "content": "I have availability on {{preferred_datetime}}. Would you like me to book that for {{caller_name}}?",
      "next": "await_confirmation"
    },
    {
      "id": "await_confirmation",
      "type": "input_collection",
      "variable": "confirmed",
      "validation": "yes_no",
      "next_if_yes": "execute_booking",
      "next_if_no": "collect_datetime"
    },
    {
      "id": "execute_booking",
      "type": "function_call",
      "function": "book_appointment",
      "params": {
        "name": "{{caller_name}}",
        "phone": "{{phone_number}}",
        "datetime": "{{preferred_datetime}}"
      },
      "next": "provide_confirmation"
    },
    {
      "id": "provide_confirmation",
      "type": "message",
      "content": "Perfect! You're all set for {{preferred_datetime}}. Your confirmation number is {{booking_id}}. We'll send a text reminder 24 hours before.",
      "next": "closing"
    },
    {
      "id": "closing",
      "type": "message",
      "content": "Is there anything else I can help with?",
      "next_if_yes": "greeting",
      "next_if_no": "end_call"
    },
    {
      "id": "suggest_alternatives",
      "type": "function_call",
      "function": "get_alternative_slots",
      "params": {
        "requested_datetime": "{{preferred_datetime}}"
      },
      "next": "present_alternatives"
    },
    {
      "id": "end_call",
      "type": "hangup",
      "message": "Thanks for calling! Have a great day."
    }
  ]
}

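This flow handles the happy path plus one edge case (unavailable slot). Treat the schema as illustrative and check Retell's documentation for the exact field names. Note that suggest_alternatives points to a present_alternatives node that is not defined above, so you need to add it before the flow is complete. In production, you'll also add branches for a wrong phone format, unclear datetime input, callers who want to cancel or reschedule, and technical errors with the booking system.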
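
Flows like this break easily when a node points at an id that does not exist. Below is a minimal validator sketch, assuming the simplified schema shown above (an initial_node, plus nodes whose next, next_if_yes, next_if_available and similar keys hold target ids). The function name findDanglingTargets is my own. Run against the JSON above, it reports one problem: suggest_alternatives.next points at present_alternatives, which is never defined.

```typescript
// Shapes follow the simplified flow JSON in this guide, not an official schema.
type FlowNode = { id: string; [key: string]: unknown };
type Flow = { initial_node: string; nodes: FlowNode[] };

// Returns one "node.key -> target" string for each reference to a missing node.
function findDanglingTargets(flow: Flow): string[] {
  const ids = new Set(flow.nodes.map((node) => node.id));
  const problems: string[] = [];

  if (!ids.has(flow.initial_node)) {
    problems.push(`initial_node -> ${flow.initial_node}`);
  }

  for (const node of flow.nodes) {
    for (const [key, value] of Object.entries(node)) {
      // Transition keys are "next" or start with "next_if_".
      const isTransition = key === "next" || key.startsWith("next_if_");
      if (isTransition && typeof value === "string" && !ids.has(value)) {
        problems.push(`${node.id}.${key} -> ${value}`);
      }
    }
  }
  return problems;
}
```

Running this in CI before publishing a flow catches typos in node ids long before a caller hits the broken branch.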

Step 3: Integration with Claude/OpenAI

Retell supports custom LLM endpoints, which means you can use Claude API directly for superior reasoning. Here's how I integrate Claude Sonnet 4.6 for complex conversation handling.

Custom LLM Endpoint

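Create a Next.js API route that wraps the Claude API and returns responses in the format Retell expects. The request and response fields below are simplified, so confirm the current transport and payload shape in Retell's custom LLM documentation before deploying: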

// app/api/voice-llm/route.ts
import Anthropic from "@anthropic-ai/sdk";
import { NextResponse } from "next/server";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

export async function POST(request: Request) {
  try {
    const { conversation_history, tools } = await request.json();

    // Convert Retell conversation format to Claude messages format
    const messages = conversation_history.map((msg: any) => ({
      role: msg.role === "agent" ? "assistant" : "user",
      content: msg.content,
    }));

    const response = await anthropic.messages.create({
      model: "claude-sonnet-4.6-2025-01-29", // Placeholder: confirm the exact model ID in Anthropic's model list
      max_tokens: 150, // Keep responses short for voice
      temperature: 0.7,
      system: `You are an AI appointment scheduling assistant...
      (include full system prompt here)`,
      messages: messages,
      tools: tools || [], // Function definitions from Retell
    });

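    // Extract text and function calls from Claude response.
    // Explicit type guards narrow the content-block union so that
    // .text, .name, and .input type-check below.
    const textContent = response.content.find(
      (block): block is Anthropic.TextBlock => block.type === "text"
    );

    const toolUse = response.content.find(
      (block): block is Anthropic.ToolUseBlock => block.type === "tool_use"
    );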

    return NextResponse.json({
      response: textContent?.text || "",
      function_call: toolUse ? {
        name: toolUse.name,
        arguments: toolUse.input,
      } : null,
      stop_reason: response.stop_reason,
    });

  } catch (error) {
    console.error("Claude API error:", error);
    return NextResponse.json(
      { error: "Failed to generate response" },
      { status: 500 }
    );
  }
}
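
One detail the mapping step glosses over: because the agent speaks first, the converted history starts with an assistant turn, and as I understand it the Messages API expects the conversation to open with a user turn and rejects empty text. Here is a sketch of a more defensive conversion. The RetellTurn shape mirrors the role and content fields used above and is an assumption about the payload; the "(call connected)" placeholder is my own convention.

```typescript
// Assumed shape of one history entry, based on the handler above.
type RetellTurn = { role: "agent" | "user"; content: string };
type ClaudeMessage = { role: "user" | "assistant"; content: string };

function toClaudeMessages(history: RetellTurn[]): ClaudeMessage[] {
  const messages: ClaudeMessage[] = [];

  for (const turn of history) {
    const text = turn.content.trim();
    if (text === "") continue; // drop empty turns

    const role = turn.role === "agent" ? "assistant" : "user";
    const last = messages[messages.length - 1];
    if (last && last.role === role) {
      // Merge consecutive turns from the same speaker into one message.
      last.content += " " + text;
    } else {
      messages.push({ role, content: text });
    }
  }

  // If the agent spoke first, prepend a placeholder user turn.
  const first = messages[0];
  if (first && first.role === "assistant") {
    messages.unshift({ role: "user", content: "(call connected)" });
  }
  return messages;
}
```

You would swap this in for the inline conversation_history.map call in the route handler.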

Function Definitions for Tools

Define the functions your agent can call mid-conversation. Here are the appointment booking tools:

// Function definitions passed to Claude
const tools = [
  {
    name: "check_availability",
    description: "Check if requested appointment time is available",
    input_schema: {
      type: "object",
      properties: {
        datetime: {
          type: "string",
          description: "ISO 8601 datetime string (e.g., 2026-03-15T14:00:00Z)"
        }
      },
      required: ["datetime"]
    }
  },
  {
    name: "book_appointment",
    description: "Book confirmed appointment and return confirmation ID",
    input_schema: {
      type: "object",
      properties: {
        name: { type: "string" },
        phone: { type: "string" },
        datetime: { type: "string" }
      },
      required: ["name", "phone", "datetime"]
    }
  },
  {
    name: "get_alternative_slots",
    description: "Get 3 alternative time slots near requested datetime",
    input_schema: {
      type: "object",
      properties: {
        requested_datetime: { type: "string" }
      },
      required: ["requested_datetime"]
    }
  }
];

When Claude decides to call a function, Retell automatically calls the webhook endpoint you registered for that function and feeds the result back into the conversation. Note that this is a separate endpoint from the call-ended webhook in Step 4, which only handles logging. This is how the agent can check real availability and book appointments without breaking conversational flow.
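
Whichever endpoint ends up receiving the function call, it needs a dispatcher that maps the tool name to a handler and returns a structured result. The sketch below is a hedged illustration: the in-memory bookedSlots set stands in for a real calendar backend, the booking_id format is invented, and real handlers would be async calls to your booking API.

```typescript
type ToolArgs = Record<string, unknown>;
type ToolResult =
  | { ok: true; data: Record<string, unknown> }
  | { ok: false; error: string };
type ToolHandler = (args: ToolArgs) => ToolResult;

// Hypothetical stand-in for a calendar backend: one slot is already taken.
const bookedSlots = new Set<string>(["2026-03-15T14:00:00Z"]);
const toolHandlers = new Map<string, ToolHandler>();

toolHandlers.set("check_availability", (args) => {
  const datetime = args["datetime"];
  if (typeof datetime !== "string") {
    return { ok: false, error: "datetime must be a string" };
  }
  return { ok: true, data: { available: !bookedSlots.has(datetime) } };
});

toolHandlers.set("book_appointment", (args) => {
  const name = args["name"];
  const phone = args["phone"];
  const datetime = args["datetime"];
  if (typeof name !== "string" || typeof phone !== "string" || typeof datetime !== "string") {
    return { ok: false, error: "name, phone, and datetime are required" };
  }
  if (bookedSlots.has(datetime)) {
    return { ok: false, error: "slot already booked" };
  }
  bookedSlots.add(datetime);
  // Invented confirmation format, for illustration only.
  return { ok: true, data: { booking_id: `BK-${bookedSlots.size}` } };
});

// Looks up the handler by tool name; unknown tools return an error result.
function dispatchToolCall(name: string, args: ToolArgs): ToolResult {
  const handler = toolHandlers.get(name);
  if (!handler) return { ok: false, error: `Unknown tool: ${name}` };
  return handler(args);
}
```

Returning an error object instead of throwing lets the model apologize and offer alternatives rather than going silent on the caller.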

Step 4: Supabase for Call Logging and Data Storage

Every voice interaction should be logged for quality assurance, analytics, and compliance. I use Supabase to store call transcripts, outcomes, and metadata.

Database Schema

Here's the Supabase table schema I use for call logging:

-- Supabase SQL schema
CREATE TABLE call_logs (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(), -- built in; uuid_generate_v4() needs the uuid-ossp extension
  call_id TEXT NOT NULL UNIQUE,
  caller_phone TEXT,
  caller_name TEXT,
  started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  ended_at TIMESTAMPTZ,
  duration_seconds INTEGER,
  transcript JSONB, -- Full conversation history
  outcome TEXT, -- 'appointment_booked', 'transferred', 'abandoned'
  booking_id TEXT, -- If appointment was booked
  recording_url TEXT,
  cost_usd DECIMAL(10,4),
  agent_version TEXT,
  metadata JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Indexes for common queries
CREATE INDEX idx_call_logs_started_at ON call_logs(started_at DESC);
CREATE INDEX idx_call_logs_outcome ON call_logs(outcome);
CREATE INDEX idx_call_logs_caller_phone ON call_logs(caller_phone);

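-- Row Level Security: with RLS enabled and no policies defined, API requests
-- using the anon key are denied; the service role key bypasses RLS
ALTER TABLE call_logs ENABLE ROW LEVEL SECURITY;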

Webhook for Call End Event

Retell sends a webhook when each call ends. Here's my handler that logs to Supabase:

// app/api/retell-webhook/route.ts
import { createClient } from "@supabase/supabase-js";
import { NextResponse } from "next/server";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_KEY!
);

export async function POST(request: Request) {
  try {
    const event = await request.json();

    // Field names here are simplified; confirm them against Retell's webhook reference
    if (event.event_type === "call_ended") {
      // Log call to Supabase. supabase-js returns errors instead of throwing,
      // so check the result explicitly or failed inserts go unnoticed.
      const { error } = await supabase.from("call_logs").insert({
        call_id: event.call_id,
        caller_phone: event.from_number,
        started_at: event.start_timestamp,
        ended_at: event.end_timestamp,
        duration_seconds: event.call_duration,
        transcript: event.transcript,
        outcome: event.call_analysis?.outcome || "unknown",
        recording_url: event.recording_url,
        cost_usd: event.cost,
        metadata: {
          agent_id: event.agent_id,
          disconnect_reason: event.disconnect_reason,
        },
      });
      if (error) throw error;

      // Trigger n8n workflow for post-call actions
      await fetch(process.env.N8N_WEBHOOK_URL!, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          call_id: event.call_id,
          outcome: event.call_analysis?.outcome,
          transcript: event.transcript,
        }),
      });
    }

    return NextResponse.json({ received: true });
  } catch (error) {
    console.error("Webhook error:", error);
    return NextResponse.json(
      { error: "Webhook processing failed" },
      { status: 500 }
    );
  }
}

This setup gives you a complete audit trail of every call, which is essential for debugging ("Why did the agent say that?") and proving ROI to clients ("We handled 847 calls this month, booking 312 appointments automatically").
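
I find it easier to test the logging path when the payload-to-row mapping lives in its own pure function. The sketch below assumes the simplified field names used in the handler above and assumes the timestamps arrive as epoch milliseconds, which a TIMESTAMPTZ column will not accept as-is; check both assumptions against the real payload. CallEndedEvent, CallLogRow, and toCallLogRow are hypothetical names.

```typescript
// Assumed payload shape, following the field names in the handler above.
type CallEndedEvent = {
  call_id: string;
  from_number?: string;
  start_timestamp: number; // assumed epoch milliseconds
  end_timestamp: number; // assumed epoch milliseconds
  transcript?: unknown;
  call_analysis?: { outcome?: string };
};

// Subset of the call_logs columns from the schema above.
type CallLogRow = {
  call_id: string;
  caller_phone: string | null;
  started_at: string;
  ended_at: string;
  duration_seconds: number;
  transcript: unknown;
  outcome: string;
};

function toCallLogRow(event: CallEndedEvent): CallLogRow {
  return {
    call_id: event.call_id,
    caller_phone: event.from_number ?? null,
    // Convert epoch milliseconds to ISO 8601 strings for TIMESTAMPTZ columns.
    started_at: new Date(event.start_timestamp).toISOString(),
    ended_at: new Date(event.end_timestamp).toISOString(),
    duration_seconds: Math.round((event.end_timestamp - event.start_timestamp) / 1000),
    transcript: event.transcript ?? null,
    outcome: event.call_analysis?.outcome ?? "unknown",
  };
}
```

The webhook handler then shrinks to parsing the event, calling toCallLogRow, and inserting the result.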

Step 5: n8n for Post-Call Actions

The call doesn't end when the customer hangs up. You need follow-up actions: send confirmation SMS, update CRM, notify staff of urgent cases, generate daily reports. This is where n8n workflow automation shines.

Post-Call Workflow

Here's the n8n workflow I built for BellaBot's post-call automation:

Webhook Trigger — Receives call_ended event from Retell
Switch Node — Routes based on outcome (booked, transferred, abandoned)
If "appointment_booked":
- Send confirmation SMS via Twilio
- Add to Google Calendar
- Update Supabase appointments table
- Send Slack notification to team
If "transferred":
- Create high-priority ticket in support system
- Email transcript to on-call team member
If "abandoned":
- Add to follow-up list for manual callback
- Log in CRM with "incomplete" status

This automation runs 24/7 without human intervention. For a client handling 50+ calls per day, this workflow saves approximately 3 hours of manual admin work daily.
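
The Switch node logic is simple enough to express as a lookup table, which is handy if you want to unit test routing outside n8n. This sketch mirrors the three branches above; the action names are labels I made up for the steps listed, and the flag_for_manual_review fallback for unrecognized outcomes is my own addition.

```typescript
type Outcome = "appointment_booked" | "transferred" | "abandoned";

// One entry per Switch branch; action names are illustrative labels.
const POST_CALL_ACTIONS: Record<Outcome, readonly string[]> = {
  appointment_booked: [
    "send_confirmation_sms",
    "add_to_google_calendar",
    "update_supabase_appointments",
    "notify_slack",
  ],
  transferred: ["create_priority_ticket", "email_transcript_to_on_call"],
  abandoned: ["add_to_callback_list", "log_crm_incomplete"],
};

function isOutcome(value: string): value is Outcome {
  return Object.prototype.hasOwnProperty.call(POST_CALL_ACTIONS, value);
}

// Unknown outcomes (for example "unknown" from the webhook) go to manual review.
function actionsFor(outcome: string): readonly string[] {
  return isOutcome(outcome) ? POST_CALL_ACTIONS[outcome] : ["flag_for_manual_review"];
}
```

The explicit fallback matters because the webhook handler writes "unknown" when call analysis is missing, and those calls should not silently disappear.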

Handling Edge Cases and Errors

Real-world voice AI encounters all sorts of unexpected situations. Here's how I handle the most common edge cases in production:

Edge Case 1: Caller Speaks Unclearly

Problem: Speech-to-text misinterprets caller's name or phone number.

Solution: Add confirmation loops. After collecting critical info, have the agent repeat it back: "Just to confirm, your phone number is 214-555-0123, is that correct?" Give caller chance to correct before proceeding.
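
For the read-back itself, normalizing whatever the transcript captured into one consistent format makes the confirmation easier for callers to follow. A minimal sketch, assuming 10-digit US numbers with an optional leading country code 1; formatPhoneForReadback is a hypothetical helper, and returning null signals the agent to ask again.

```typescript
// Normalizes a US phone number to 214-555-0123 style, or returns null if it
// does not contain exactly 10 digits (after dropping a leading country code 1).
function formatPhoneForReadback(raw: string): string | null {
  let digits = raw.replace(/\D/g, "");
  if (digits.length === 11 && digits.startsWith("1")) {
    digits = digits.slice(1);
  }
  if (digits.length !== 10) return null;
  return `${digits.slice(0, 3)}-${digits.slice(3, 6)}-${digits.slice(6)}`;
}

console.log(formatPhoneForReadback("(214) 555 0123")); // 214-555-0123
console.log(formatPhoneForReadback("555-0123")); // null
```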

Edge Case 2: System Unavailable

Problem: Your booking API is down or slow to respond.

Solution: Implement timeout handling. If check_availability takes more than 5 seconds, agent says: "I'm having trouble checking availability right now. Can I take your information and have someone call you back within the hour?"
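
One way to implement that timeout is to race the booking call against a timer and fall back to a default value when the timer wins. This is a generic sketch rather than anything Retell-specific; withTimeout is my own helper name, and note that it does not cancel the slow request, it only stops waiting for it.

```typescript
// Resolves with the task's value, or with the fallback if the task takes
// longer than ms milliseconds. A rejected task still rejects.
async function withTimeout<T>(task: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    return await Promise.race([task, timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive after a fast task.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

In the availability handler you might call withTimeout(checkAvailability(datetime), 5000, null), where checkAvailability is your own booking API wrapper, and have the agent offer a callback when the result is null.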

Edge Case 3: Caller Asks Off-Topic Questions

Problem: "What's your return policy?" or "Are you hiring?" (not appointment-related).

Solution: Train the agent to recognize out-of-scope requests and offer transfer: "That's a great question, but I specialize in scheduling. Let me transfer you to someone who can help with that."

Edge Case 4: Dead Air / Long Pauses

Problem: Caller goes silent for 10+ seconds (maybe they got distracted).

Solution: Set a silence timeout in Retell config. After 8 seconds of silence, agent prompts: "Are you still there? I'm happy to help when you're ready." After 20 seconds total, politely end call to free up resources.
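
If you manage silence handling in your own backend rather than through platform settings, the decision reduces to two thresholds. A tiny sketch using the 8-second and 20-second values from above; silenceAction and the action names are hypothetical, and your caller code should make sure the prompt is only spoken once.

```typescript
type SilenceAction = "wait" | "prompt" | "hang_up";

// Maps seconds of continuous silence to the next step for the agent.
function silenceAction(
  silentSeconds: number,
  promptAfter: number = 8,
  hangUpAfter: number = 20
): SilenceAction {
  if (silentSeconds >= hangUpAfter) return "hang_up";
  if (silentSeconds >= promptAfter) return "prompt";
  return "wait";
}
```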

Cost Optimization: From $0.10 to $0.05 per Minute

Voice AI costs add up quickly at scale. When BellaBot started handling 100+ calls per day, I optimized costs from $0.10/minute to $0.05/minute through these strategies:

Strategy 1: Prompt Caching with Claude

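Claude's prompt caching bills cached input tokens at roughly 10% of the normal input price, so a long system prompt that repeats on every turn costs up to 90% less to re-read. Cache writes cost somewhat more than uncached input, and prompts shorter than the model's minimum cacheable length are not cached at all. Enable it in your API calls: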

const response = await anthropic.messages.create({
  model: "claude-sonnet-4.6-2025-01-29", // Placeholder: confirm the exact model ID in Anthropic's model list
  max_tokens: 150,
  system: [
    {
      type: "text",
      text: `You are an AI appointment scheduling assistant...
      (long system prompt here)`,
      cache_control: { type: "ephemeral" } // Enable caching
    }
  ],
  messages: messages,
});

This single change reduced Claude API costs by $247/month for a client handling 3,000 calls.
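
To see how the savings depend on call length, here is a rough estimate of the system prompt's input cost with and without caching. The multipliers (1.25x for a cache write, 0.1x for a cache read) reflect my understanding of Anthropic's pricing for the default short-lived cache, so verify them against the current pricing page. The sketch also assumes every turn lands before the cache expires and ignores the conversation messages themselves.

```typescript
type CachePricing = {
  writeMultiplier: number; // cache write cost relative to normal input price
  readMultiplier: number; // cache read cost relative to normal input price
};

// Assumed values; check current pricing before relying on them.
const ASSUMED_PRICING: CachePricing = { writeMultiplier: 1.25, readMultiplier: 0.1 };

// Fraction saved on system-prompt input cost over a call with the given
// number of LLM turns. Negative means caching costs more than it saves.
function systemPromptSavings(turns: number, pricing: CachePricing = ASSUMED_PRICING): number {
  if (turns < 1) return 0;
  const uncached = turns; // full price on every turn
  const cached = pricing.writeMultiplier + (turns - 1) * pricing.readMultiplier;
  return 1 - cached / uncached;
}
```

With these assumptions a 10-turn call saves about 78% on the system prompt, while a single-turn call comes out 25% more expensive, so caching pays off on conversations rather than one-shot requests.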

Strategy 2: Shorter Max Tokens

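Set max_tokens: 150 rather than a more typical 1024. Note that max_tokens is a required ceiling, not a default or a target: it caps worst-case output (and latency) at about 15% of a 1,024-token limit, but it only saves money on replies that would otherwise run long. Pair it with a system prompt that asks for brief answers so replies end naturally instead of being cut off mid-sentence.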
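
A low cap does mean the occasional reply gets cut off, which the API signals with a stop_reason of "max_tokens". One option is to trim a truncated reply back to its last complete sentence before it is spoken. This is a sketch of that idea; finalizeReply and trimToLastSentence are hypothetical helpers, and the sentence detection is a simple punctuation check that will stumble on abbreviations and decimals.

```typescript
// Keeps everything up to the last ".", "!" or "?"; returns the text unchanged
// if it contains no sentence-ending punctuation at all.
function trimToLastSentence(text: string): string {
  const match = text.match(/^[\s\S]*[.!?]/);
  return match ? match[0] : text;
}

// Only trims when the model was cut off by the token limit.
function finalizeReply(text: string, stopReason: string | null): string {
  return stopReason === "max_tokens" ? trimToLastSentence(text) : text;
}

console.log(finalizeReply("Sure, I can help. Your appointment is on", "max_tokens"));
// Sure, I can help.
```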

Strategy 3: Use GPT-3.5 for Simple Flows

For basic FAQ answering (not complex reasoning), switch to GPT-3.5-turbo which costs 10x less than Claude or GPT-4. Reserve premium models for complex conversation logic only.

Strategy 4: Smart Call Routing

Use a simpler IVR (Interactive Voice Response) upfront: "Press 1 for appointments, Press 2 for support." Only route to AI agent when needed, avoiding expensive LLM calls for obvious routing decisions.

Cost Breakdown (per minute)

Before optimization: $0.10/min (Claude + Retell + Eleven Labs)
After optimization: $0.05/min (cached prompts + shorter responses + GPT-3.5 for FAQs)
At 100 calls/day (avg 3 min each, about 9,000 minutes over a 30-day month): savings of $0.05/min × 9,000 min = $450/month
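
The savings figure is easy to recompute for your own volume. The sketch below works in cents to avoid floating-point drift and assumes a 30-day month; monthlySavingsUsd is just an illustrative helper.

```typescript
// Monthly savings in USD from lowering the per-minute rate.
// Rates are passed in cents so the arithmetic stays in whole numbers.
function monthlySavingsUsd(
  callsPerDay: number,
  avgMinutesPerCall: number,
  beforeCentsPerMin: number,
  afterCentsPerMin: number,
  daysPerMonth: number = 30
): number {
  const minutesPerMonth = callsPerDay * avgMinutesPerCall * daysPerMonth;
  return (minutesPerMonth * (beforeCentsPerMin - afterCentsPerMin)) / 100;
}

console.log(monthlySavingsUsd(100, 3, 10, 5)); // 450
```

At 50 calls per day the same optimization saves $225/month, which helps when deciding whether the tuning effort is worth it for a smaller client.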

Case Study: BellaBot for VIXI — 80% Manual Labor Reduction

Let me share the real numbers from building BellaBot, the voice AI agent I deployed for VIXI clients in the veterinary and home services industries.

The Problem

One of my clients (a veterinary franchise with 12 locations) was drowning in inbound calls. They averaged 180 calls per day across all locations—mostly simple questions:

"What are your hours?"
"Do you accept walk-ins for emergencies?"
"How much does a checkup cost?"
"Can I schedule an appointment for my dog?"

They had three full-time receptionists dedicated solely to answering phones. Annual cost: $120,000 in salaries plus missed calls during busy periods leading to lost appointments.

The Solution: BellaBot

I built BellaBot using the exact architecture described in this guide:

Retell AI for voice infrastructure
Claude Sonnet 4.6 for conversation intelligence
Supabase connected to their practice management system (VETport API)
n8n workflows for appointment booking and SMS confirmations
Custom function calling to check real-time vet availability

Implementation Timeline

Week 1: System design, API integrations, initial prompt engineering
Week 2: Testing with recorded calls, conversation flow refinement
Week 3: Pilot at 2 locations (soft launch with human monitoring)
Week 4: Full rollout to all 12 locations

Results After 90 Days

5,847 total calls handled by BellaBot
4,678 calls (80%) fully resolved without human intervention
- 2,341 appointments booked automatically
- 1,889 FAQ questions answered
- 448 calls transferred to appropriate department
Average call duration: 2.8 minutes
Customer satisfaction: 4.6/5 (post-call surveys)
Cost per call: $0.14 (vs $3.20 for human receptionist at $20/hr wage)
Annual savings: $96,000 (reduced receptionist hours by 80%)
After-hours coverage: 24/7 (previously had zero coverage 6pm-8am)

What Surprised Me

The most unexpected finding? Customers preferred talking to BellaBot for simple requests. Survey feedback included:

"No hold time—I got my appointment booked in 90 seconds"
"I called at 11pm and actually got through!"
"The voice is so natural, I didn't realize it was AI until the confirmation"

The receptionists weren't replaced—they were reassigned to higher-value tasks like handling complex billing issues, following up on missed appointments, and managing emergency triage. Job satisfaction actually increased because they weren't dealing with repetitive questions all day.

Lessons Learned

Three critical lessons from this deployment:

Pilot testing is essential. We found 7 conversation edge cases in the first 50 pilot calls that would have caused major issues at full scale.
Humans need to stay in the loop. BellaBot transfers to humans for complex medical questions, pricing disputes, and angry customers. Don't try to automate 100%.
Continuous improvement matters. We review 10 random call transcripts weekly and update prompts based on recurring issues. The agent gets better every month.

Conclusion: Deployment Checklist

You now have everything you need to build a production-ready Retell AI implementation. Before going live, run through this final checklist:

Pre-Launch Checklist

✅ System prompt tested with 20+ diverse scenarios
✅ Function calling works reliably (95%+ success rate in tests)
✅ Edge case handling for unclear speech, system errors, out-of-scope requests
✅ Supabase logging configured with proper indexes
✅ n8n workflows tested for all call outcomes
✅ Call recording enabled (check local recording laws!)
✅ Monitoring dashboard set up (call volume, costs, outcomes)
✅ Transfer-to-human flow works smoothly
✅ Cost optimization applied (prompt caching, short responses)
✅ Pilot tested with 50+ real calls before full launch

Voice AI is no longer experimental—it's a proven solution for reducing operational costs while improving customer experience. If you're in the Dallas-Fort Worth area or running a business that handles high call volume, now is the time to implement automated call screening.

I offer full voice AI development services including Retell setup, conversation flow design, system integrations, and deployment. The typical implementation takes 3-4 weeks and pays for itself within 6 months through labor savings.

Ready to build your own voice agent? Let's talk. I'll review your use case, estimate costs and ROI, and show you exactly how voice AI can transform your customer operations.
