From Idea to Deployed AI Agent: My Full Stack in Allen, TX
16 min read · By Carlos Aragon


Most tutorials show you one tool in isolation. Reality is messier. Shipping a real AI agent means six or seven moving parts working together without breaking. I've built and iterated this stack from Allen, TX for two years. Here's the full picture.

The Gap Between "It Works in a Notebook" and "It's Running in Production"

The first AI agent I shipped took me three weeks. Not because Claude was hard to call — the API is straightforward. Not because the logic was complicated. It took three weeks because I kept discovering new layers of infrastructure I hadn't thought about. Where does the data go? How does the agent get triggered? What happens when it fails? How do clients see the output?

Two years and a dozen production agents later, I have answers to all of those questions. The answers look like a specific set of tools, each doing one job, all wired together. This post is the map I wish I had when I started.

I run VIXI LLC out of Allen, TX. We build AI automation systems for marketing agencies and B2B companies — lead qualification agents, voice screeners, attribution pipelines, client reporting bots. Every one of them runs on the same underlying stack. The tools change slightly depending on the use case, but the architecture stays the same.

The seven layers of my production AI stack:

  • Claude — reasoning, tool use, and generation
  • OpenClaw — agent runtime and CLI orchestration
  • OpenMOSS — multi-agent task queue and coordination
  • n8n — integration glue and workflow automation
  • Retell AI — voice interface for phone-based agents
  • Supabase — persistent memory, database, and real-time events
  • Vercel — deployment, API routes, and edge functions

Let's go through each one, what it does, why I chose it, and how it connects to the others.

Layer 1: Claude — The AI Brain

Claude is the reasoning layer for every agent I build. I've tested GPT-4, Gemini, and a handful of open-source models. For production work that involves long context, nuanced instruction-following, and tool use that doesn't hallucinate, Claude consistently outperforms the alternatives.

The specific capability I rely on most is tool use — Anthropic's name for structured function calling. You define a set of tools (essentially JSON Schema describing an action and its parameters), include them in the API request, and Claude will choose when to call them and with what arguments. This is how agents take real actions instead of just generating text.

Here's a minimal example of a Claude agent that can query a Supabase table:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const tools = [
  {
    name: "query_leads",
    description: "Query the leads table from Supabase",
    input_schema: {
      type: "object",
      properties: {
        status: {
          type: "string",
          enum: ["new", "contacted", "qualified", "closed"],
          description: "Filter leads by status"
        },
        limit: {
          type: "number",
          description: "Max number of results to return"
        }
      },
      required: ["status"]
    }
  }
];

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  tools,
  messages: [
    {
      role: "user",
      content: "Show me the 10 newest uncontacted leads"
    }
  ]
});

// Claude returns a tool_use block when it wants to call query_leads
if (response.stop_reason === "tool_use") {
  const toolUse = response.content.find(b => b.type === "tool_use");
  console.log("Tool called:", toolUse.name);
  console.log("Arguments:", toolUse.input);
  // → { status: "new", limit: 10 }
}

Every agent I build starts here: a system prompt defining the agent's role, a set of tools it can call, and a loop that processes tool results and continues the conversation until the task is done. The complexity comes from the infrastructure around this loop, not the loop itself.
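That loop is worth seeing once in full. Here's a minimal sketch in TypeScript, with the model call and the tool execution injected as plain functions so the control flow stands alone; the type and function names here are illustrative, not from the Anthropic SDK:

```typescript
// Minimal sketch of the agent loop. The types loosely mirror the Messages
// API shapes; createMessage and executeTool are injected stand-ins.
interface ToolUseBlock { type: "tool_use"; id: string; name: string; input: any }
interface TextBlock { type: "text"; text: string }
type Block = ToolUseBlock | TextBlock;
interface ModelTurn { stop_reason: string; content: Block[] }
type Message = { role: "user" | "assistant"; content: any };

async function runAgentLoop(
  createMessage: (messages: Message[]) => Promise<ModelTurn>,
  initial: Message[],
  executeTool: (name: string, input: any) => string
): Promise<string> {
  const messages: Message[] = [...initial];
  for (;;) {
    const turn = await createMessage(messages);
    messages.push({ role: "assistant", content: turn.content });
    if (turn.stop_reason !== "tool_use") {
      // Done: return the final text block, if any.
      const text = turn.content.find((b): b is TextBlock => b.type === "text");
      return text ? text.text : "";
    }
    // Execute each requested tool and feed the results back as a user turn.
    const results = turn.content
      .filter((b): b is ToolUseBlock => b.type === "tool_use")
      .map((b) => ({
        type: "tool_result",
        tool_use_id: b.id,
        content: executeTool(b.name, b.input),
      }));
    messages.push({ role: "user", content: results });
  }
}
```

In production, `createMessage` wraps `client.messages.create` and `executeTool` dispatches to real handlers, but the loop itself never changes.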

Layer 2: OpenClaw — Agent Runtime and CLI Orchestration

Writing raw Claude API calls works fine for one agent. For a fleet of agents that share tools, memory, and file systems, you need a consistent runtime — a layer that handles the boilerplate so each agent only defines what's unique to it.

OpenClaw is that runtime. It's a CLI-first agent framework that wraps Claude API calls with a structured workspace model. Each agent lives in its own workspace directory with defined tool access, a persistent memory file, and a standard way to hand off results to other systems.

The workspace convention matters because it lets OpenMOSS know where to look. When a MOSS task completes, it reads the agent's workspace output directory to pick up the result. When a new task starts, it writes the task context into the workspace input directory. The agents themselves don't need to know anything about the task queue — they just read from a known location and write to a known location.

# OpenClaw workspace structure
~/.openclaw/
  workspace/
    agents/
      bella-bot/
        system-prompt.md
        tools.json
        memory/
        output/
      dev-bot/
        system-prompt.md
        tools.json
        memory/
        output/
    moss-tasks/       # task lock files + outputs
    shared/           # cross-agent shared context

# Trigger a specific agent on a task
openclaw run bella-bot --task "Qualify lead: John Smith, ACME Corp"

# Run with a specific MOSS task context
openclaw run dev-bot --moss-task f1a75d48-67a0-45ba-923e-b6633daa96fe

For agents that run on a schedule or in response to events, I wrap the OpenClaw CLI call in an n8n Execute Command node. The n8n workflow handles the trigger logic; OpenClaw handles the actual agent execution. Clean separation of concerns.

Layer 3: OpenMOSS — The Multi-Agent Task Queue

OpenMOSS is a self-hosted REST API that acts as a shared task queue for all my agents. The core insight behind it: agents running in parallel need a source of truth about what's in progress, what's done, and what needs to happen next. Without that, you get race conditions, duplicated work, and failures with no audit trail.

Every significant agent action in my stack is represented as a MOSS task. Tasks have a lifecycle: pending → running → completed (or failed). Any agent — or any external system — can create a task, claim it, update it, and mark it complete via the REST API.

# Create a new MOSS task
curl -X POST "http://localhost:6565/api/tasks" \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Qualify inbound lead: Sarah Chen",
    "agent": "bella-bot",
    "priority": "high",
    "metadata": {
      "lead_id": "lead_8823",
      "source": "website_form"
    }
  }'

# Response includes the task ID
# { "id": "abc123", "status": "pending", ... }

# Agent claims the task when it starts
curl -X PUT "http://localhost:6565/api/tasks/abc123/status" \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"status": "running"}'

# Mark complete when done
curl -X PUT "http://localhost:6565/api/tasks/abc123/status" \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"status": "completed", "result": "Lead qualified. Score: 87/100."}'

The task ID is the shared reference point across all systems. n8n creates the task and stores the ID. OpenClaw picks it up and runs the agent. Supabase logs the result with the same ID. That ID is what lets you reconstruct exactly what happened and when — without manually correlating logs from three different systems.
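As a hypothetical sketch, that lifecycle collapses into one wrapper. This helper and its names are mine, not part of OpenMOSS; the injected `api` would wrap `fetch()` calls to the `POST /api/tasks` and `PUT /api/tasks/:id/status` endpoints shown in the curl examples above:

```typescript
// Hypothetical wrapper tying the create → running → completed/failed
// lifecycle to one shared task ID. The API client is injected so the
// flow is testable without a live OpenMOSS instance.
interface MossApi {
  createTask(body: object): Promise<{ id: string }>;
  setStatus(id: string, body: { status: string; result?: string }): Promise<void>;
}

async function withMossTask<T>(
  api: MossApi,
  task: { title: string; agent: string; metadata?: object },
  work: (taskId: string) => Promise<T>
): Promise<T> {
  const { id } = await api.createTask(task);      // task starts as pending
  await api.setStatus(id, { status: "running" }); // claim it
  try {
    const result = await work(id);
    await api.setStatus(id, { status: "completed", result: String(result) });
    return result;
  } catch (err) {
    // Failures leave an audit trail under the same task ID.
    await api.setStatus(id, { status: "failed", result: String(err) });
    throw err;
  }
}
```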

Layer 4: n8n — The Integration Glue

n8n is where external triggers meet my AI infrastructure. A new form submission, a Calendly booking, a Stripe payment, a Hyros attribution event — all of these enter the stack through n8n webhooks. From there, n8n creates MOSS tasks, triggers OpenClaw agents, updates Supabase records, and fires notifications.

The reason I use n8n instead of writing webhook handlers in code: the visual editor makes it trivial to debug, modify, and hand off to non-engineers. When a client wants to add a new trigger ("also fire the agent when a deal stage changes in HubSpot"), I add two nodes in n8n, not a new deployment.

I also maintain n8n-nodes-hyros, a custom community node for Hyros attribution that has over 4,500 installs. It's a good example of how n8n's extensibility pays off — one custom node and any Hyros event can trigger any agent in my stack.

// n8n workflow (simplified JSON representation)
// Trigger: Webhook (POST /webhook/new-lead)
// Step 1: HTTP Request → Create MOSS task
{
  "method": "POST",
  "url": "http://localhost:6565/api/tasks",
  "headers": { "Authorization": "Bearer {{ $env.MOSS_KEY }}" },
  "body": {
    "title": "Qualify: {{ $json.name }}",
    "agent": "bella-bot",
    "metadata": {
      "lead_id": "{{ $json.id }}",
      "email": "{{ $json.email }}"
    }
  }
}
// Step 2: Supabase → Insert pending lead record
// Step 3: Execute Command → openclaw run bella-bot
// Step 4: Wait for MOSS task completion (polling loop)
// Step 5: Slack notification with result

One pattern I use constantly: a polling sub-workflow in n8n that checks the MOSS task status every 30 seconds and waits for completed before proceeding. It sounds primitive but it's reliable — and because it's n8n, I can see exactly where the workflow is at any point in time.
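The same pattern, sketched outside n8n for clarity. In production, `getStatus` would hit the OpenMOSS task status endpoint (the endpoint shape is an assumption); `sleep` is injectable so the logic can be tested without actually waiting 30 seconds per attempt:

```typescript
// Poll a MOSS task until it reaches a terminal state or we give up.
async function waitForMossTask(
  getStatus: () => Promise<string>,
  opts: { intervalMs: number; maxAttempts: number },
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms))
): Promise<string> {
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    const status = await getStatus();
    // Terminal states end the wait; anything else means keep polling.
    if (status === "completed" || status === "failed") return status;
    if (attempt < opts.maxAttempts) await sleep(opts.intervalMs);
  }
  throw new Error("Timed out waiting for MOSS task to finish");
}
```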

Layer 5: Retell AI — Giving Agents a Voice

Not every agent interaction happens in a chat interface. For my VIXI clients who qualify inbound leads over the phone, the interface is voice — and Retell is the layer that makes that work.

Retell handles the telephony complexity: inbound call routing, real-time speech-to-text, LLM response generation, and text-to-speech output — all in under 800ms latency end-to-end. Under the hood, you configure a Retell agent with a Claude model and a system prompt. Retell handles the streaming, the turn-taking, and the audio. I handle the business logic in the system prompt and the tools.

The integration with the rest of my stack works through webhooks. When a Retell call ends, it POSTs the full transcript to an n8n webhook. n8n parses the outcome ("qualified", "not interested", "left voicemail"), creates a MOSS task for any follow-up work, and writes the call log to Supabase with the MOSS task ID for traceability.
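The outcome-parsing step can be sketched as a small pure function (in my stack it lives in an n8n node). The payload shape below is an assumption for illustration, not Retell's actual webhook schema; map the field names to the real event body when wiring it up:

```typescript
// Classify a finished call into one of the outcomes n8n routes on.
// CallPayload is an assumed shape, not Retell's webhook schema.
interface CallPayload {
  call_id: string;
  answered: boolean;
  transcript: string;
}

type CallOutcome = "qualified" | "not_interested" | "voicemail";

function parseCallOutcome(p: CallPayload): CallOutcome {
  if (!p.answered) return "voicemail";
  const t = p.transcript.toLowerCase();
  // Naive keyword pass for illustration; the production version asks
  // Claude to classify the transcript instead.
  if (t.includes("not interested") || t.includes("remove me")) {
    return "not_interested";
  }
  return "qualified";
}
```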

BellaBot — my inbound call screening agent:

  • Retell handles inbound calls to a dedicated DID number
  • Claude (claude-sonnet-4-6) processes the conversation in real-time
  • On call end, Retell webhook → n8n → Supabase call_logs table
  • If qualified: MOSS task created → dev-bot or Orus picks up next step
  • Client sees results in a Vercel-hosted dashboard, updated in real-time via Supabase

Voice changes the value proposition significantly for agency clients. Sending AI-written emails is table stakes. Picking up the phone — and doing it consistently at 2am when humans aren't available — is something different. Retell + Claude is how that works.

Layer 6: Supabase — Persistent Memory and Real-Time Data

AI agents are stateless by default. Each conversation starts fresh. For production use, that's not acceptable — agents need to remember context across sessions, clients need their data isolated from each other, and every significant action needs a durable audit trail.

Supabase solves all three. It's PostgreSQL with a REST API, built-in auth, row-level security, and real-time subscriptions. The real-time layer is especially useful: when an agent writes a result to a table, Supabase fires an event that n8n picks up immediately — no polling required.

My core schema looks like this:

-- Core tables in my production Supabase instance

-- Agent task audit trail (synced from OpenMOSS)
create table agent_tasks (
  id uuid primary key default gen_random_uuid(),
  moss_task_id text unique not null,
  agent_name text not null,
  status text not null default 'pending',
  result jsonb,
  client_id uuid references clients(id),
  created_at timestamptz default now(),
  completed_at timestamptz
);

-- Call logs from Retell webhooks
create table call_logs (
  id uuid primary key default gen_random_uuid(),
  retell_call_id text unique not null,
  transcript text,
  outcome text,         -- 'qualified', 'not_interested', 'voicemail'
  lead_score int,
  moss_task_id text,    -- links to agent_tasks
  client_id uuid references clients(id),
  created_at timestamptz default now()
);

-- Per-client agent memory (long-term context)
create table agent_memory (
  id uuid primary key default gen_random_uuid(),
  agent_name text not null,
  client_id uuid references clients(id),
  memory_key text not null,
  memory_value jsonb not null,
  updated_at timestamptz default now(),
  unique(agent_name, client_id, memory_key)
);

Row-level security on every table means each client's data is completely isolated. A single Supabase project handles all clients because RLS policies ensure no query can return rows belonging to a different client_id. That's multi-tenancy without the infrastructure overhead of running separate databases.
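As a sketch, an RLS policy for call_logs might look like the following. The policy name and the JWT claim are assumptions about how auth is wired, not a copy of my production policies:

```sql
-- Illustrative RLS setup for call_logs; the client_id JWT claim is assumed.
alter table call_logs enable row level security;

create policy "clients read own calls"
  on call_logs for select
  using (client_id = (auth.jwt() ->> 'client_id')::uuid);
```

With a policy like this in place, a client's dashboard session can only ever select its own rows, no matter what query it sends.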

Layer 7: Vercel — Deployment and Edge API Routes

Every client-facing surface in my stack — dashboards, onboarding forms, agent chat interfaces, this website — is a Next.js app deployed on Vercel. The deploy story is the best thing about Vercel: git push origin main and you're live in 60 seconds.

Beyond static deploys, I use Vercel API routes as lightweight Claude proxy endpoints. Instead of exposing my Anthropic API key to the client, the browser calls a /api/chat route on Vercel that forwards the request to Claude and streams the response back. The key stays server-side; the latency stays low because Vercel Edge runs close to users.

// app/api/chat/route.ts — Vercel Edge API route
import Anthropic from "@anthropic-ai/sdk";

export const runtime = "edge";

export async function POST(req: Request) {
  const { messages, systemPrompt } = await req.json();

  const client = new Anthropic({
    apiKey: process.env.ANTHROPIC_API_KEY,
  });

  const stream = await client.messages.stream({
    model: "claude-sonnet-4-6",
    max_tokens: 1024,
    system: systemPrompt,
    messages,
  });

  // Stream response back to the browser
  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        if (
          chunk.type === "content_block_delta" &&
          chunk.delta.type === "text_delta"
        ) {
          controller.enqueue(
            encoder.encode(
              `data: ${JSON.stringify({ text: chunk.delta.text })}\n\n`
            )
          );
        }
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}

Edge runtime is the right choice here because Claude streaming responses need to stay open for several seconds. Traditional serverless functions time out; Edge functions stay open as long as the stream is active.

How It All Connects: A Real Workflow End to End

Let me walk through a complete example: a new lead submits the contact form on a client's Vercel-hosted landing page. Here's what happens in the next 90 seconds.

  1. Form submit → Vercel API route

     The Next.js form POSTs to /api/leads. The route validates input, inserts a record into the Supabase leads table, and returns 200.

  2. Supabase real-time → n8n webhook

     The new row triggers a Supabase real-time event. n8n is subscribed via a webhook URL configured in Supabase, and receives the full row payload.

  3. n8n creates MOSS task

     n8n POSTs to the OpenMOSS API to create a task (bella-bot: Qualify lead [name]) and stores the MOSS task ID alongside the lead record in Supabase.

  4. OpenClaw runs BellaBot

     n8n executes openclaw run bella-bot --moss-task [id]. OpenClaw reads the task context, initializes BellaBot's workspace, and starts a Claude conversation with the lead data.

  5. Retell dials the lead

     BellaBot's tools include a Retell API call. Claude decides to call initiate_call with the lead's phone number. Retell places the outbound call and streams the conversation through Claude.

  6. Result → Supabase + MOSS complete

     The call ends. The Retell webhook fires, and n8n logs the transcript to call_logs. BellaBot marks the MOSS task completed with the outcome, and the Supabase leads row is updated with the qualification score.

  7. Client dashboard updates in real-time

     The Vercel-hosted client dashboard subscribes to the Supabase leads table. The new row appears instantly with the qualification score, call transcript, and MOSS task status — no refresh required.

Total elapsed time from form submission to a qualified lead appearing on the client dashboard: under 90 seconds on average. Without this stack, a human SDR would check the form submission hours later, make a call, and manually update a CRM. The agent does it while they sleep.

What I'd Tell Someone Starting This Stack Today

If you're building your first production AI agent, don't install all seven layers on day one. You'll spend two weeks on infrastructure and zero time on the actual agent logic. Here's the order that makes sense:

Week 1: Claude + Supabase

Get Claude calling tools and writing results to a Supabase table. That's a working agent. Ship it.

Week 2: Add n8n for triggers

Stop writing custom webhook handlers. Set up n8n to receive external events and trigger your agent. One n8n workflow replaces a lot of glue code.

Week 3–4: Add Vercel for the front end

If you need a client-facing interface, put it on Vercel. Use API routes to proxy Claude calls so your key stays server-side.

When you have 3+ agents: Add OpenClaw + OpenMOSS

You'll know you need this when you start manually coordinating between agents across sessions. OpenMOSS eliminates that coordination tax.

When a client asks for voice: Add Retell

Don't pre-build the voice layer unless you have a confirmed use case. Retell onboarding is fast when you need it.

The biggest mistake I see is over-engineering the stack before you understand the problem. Start with the simplest thing that works. Add layers when you feel the pain that layer solves — not before.

Stack Summary

| Tool | Role | When to add it |
| --- | --- | --- |
| Claude | AI reasoning + tool use | Day 1 |
| Supabase | Database + memory + real-time events | Day 1 |
| n8n | Integration glue + triggers | Week 2 |
| Vercel | Front end + API routes + Edge streaming | Week 3 |
| OpenClaw | Agent runtime + workspace conventions | 3+ agents |
| OpenMOSS | Multi-agent task queue + coordination | 3+ agents |
| Retell AI | Voice interface + telephony | Voice use case confirmed |

Building from Allen, TX

I get asked sometimes why I publish this stuff — the stack details, the tool choices, the code examples. The honest answer is that I learned everything here from other people writing in public, and I think the AI agent ecosystem moves faster when practitioners share what actually works versus what they wish worked.

This stack isn't perfect. There are rough edges in every layer. But it's the stack that runs in production today, handles real client work, and has earned real money from real outcomes. That's the bar I hold it to.

If you're building something with this stack or thinking about hiring someone who runs it in production, I work with a small number of clients through VIXI. Feel free to reach out.

Need this stack built for your business?

I build production AI agent systems for marketing agencies and B2B companies. If you need lead qualification, voice AI, or automation pipelines — let's talk.