AI/LLM Developer

Production AI Systems, End to End

Shipped a Claude-powered lead qualification platform that processed 500+ real leads with zero downtime. Built a production RAG pipeline scoring 94.6% accuracy on a 28-fixture eval suite that includes adversarial cases. Published mcp-server-toolkit to PyPI. Contributing to LiteLLM (27K+ stars). 9,956+ automated tests across production repos.

500+ Leads Processed · 94.6% RAG Accuracy · PyPI Published · MCP Server Toolkit · 9,956+ Tests · Open Source Contributor

What I Build

Production AI infrastructure — from retrieval pipelines to multi-agent systems

Production RAG Pipelines

Hybrid retrieval (BM25 + cosine + RRF), citation-aware answers, agentic ReAct reasoning, semantic caching (88% hit rate), 28-fixture adversarial eval suite with prompt injection defense. CI regression gate at 94.6% accuracy.
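
A minimal sketch of the RRF step that merges the BM25 and cosine rankings. The function name, the k=60 constant, and the sample document IDs are assumptions for illustration, not taken from the project code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from the two retrievers.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
cosine_ranking = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([bm25_ranking, cosine_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, without having to normalize BM25 scores against cosine similarities.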

Agentic AI / Multi-Agent Systems

Domain-specific agent mesh with ReAct orchestration, 3-tier cache (L1 memory, L2 Redis, L3 PostgreSQL), per-agent model routing (Haiku/Sonnet/Opus), circuit-breaker failover, human handoff protocols. MCP server toolkit published to PyPI.
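
One way the 3-tier lookup could work, as a sketch only: plain dicts stand in for Redis and PostgreSQL, and the class name, key format, and promotion policy are assumptions rather than the production design.

```python
class TieredCache:
    """Read-through lookup across ordered tiers (fastest first)."""

    def __init__(self, tiers):
        self.tiers = tiers  # e.g. [l1_memory, l2_redis, l3_postgres]

    def get(self, key):
        for i, tier in enumerate(self.tiers):
            if key in tier:
                value = tier[key]
                # Promote the hit into every faster tier so the next read is cheaper.
                for faster in self.tiers[:i]:
                    faster[key] = value
                return value, i + 1  # value and the tier level that served it
        return None, None

    def set(self, key, value):
        # Write through to every tier.
        for tier in self.tiers:
            tier[key] = value


# Dict stand-ins for the three tiers; only the slowest tier holds the key.
l1, l2, l3 = {}, {}, {"lead:42": "qualified"}
cache = TieredCache([l1, l2, l3])
first = cache.get("lead:42")   # served by tier 3, then promoted
second = cache.get("lead:42")  # served by tier 1
```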

LLM Evaluation Infrastructure

Golden eval suites with RAGAS scoring, LLM-as-judge CI gates, adversarial fixtures (prompt injection, data exfiltration, roleplay override). Levenshtein similarity, Brier score calibration, field-level weighted accuracy.
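
For reference, two of the metrics named above in a few lines of plain Python. This is a sketch using the standard definitions; the function names and the length-normalized similarity are assumptions, not the eval suite's actual code.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def levenshtein(a, b):
    """Edit distance via dynamic programming, keeping only the previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(a, b):
    """Normalize edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest
```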

Projects

Production · Live Client

Jorge Real Estate AI

3 Claude-powered SMS bots handling lead qualification for a real estate firm. 500+ leads processed, under 500ms response time, bilingual EN/ES, zero downtime over 3-month production run.

Capabilities

  • Lead Intake, Buyer, and Seller qualification bots
  • Tiered model routing (Haiku/Sonnet/Opus)
  • GoHighLevel CRM integration via webhooks
  • Bilingual English/Spanish with no quality degradation

Stack

  • Python, FastAPI, Redis, PostgreSQL
  • Claude API (tool_use, streaming, multi-turn)
  • GoHighLevel API, Twilio SMS
  • 1,702 tests · Render deployment

Production RAG · Live Demo

DocExtract AI

Async document processing with hybrid retrieval, citation-aware answers, and agentic ReAct reasoning. 94.6% extraction accuracy on 28-fixture golden eval (12 adversarial cases including 4 prompt injection attacks).

Capabilities

  • Hybrid retrieval: BM25 + cosine + RRF
  • Semantic caching (88% hit rate)
  • Circuit breaker model fallback
  • RAGAS evaluation + LLM-as-judge CI gate
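
A rough sketch of the circuit breaker model fallback pattern. The thresholds, class and function names, and the callables standing in for the primary and fallback models are all hypothetical.

```python
import time


class CircuitBreaker:
    """Opens after consecutive failures; allows a trial call once the cooldown has passed."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def call_with_fallback(primary, fallback, breaker, prompt):
    """Try the primary model unless the circuit is open; otherwise use the fallback."""
    if breaker.allow():
        try:
            result = primary(prompt)
            breaker.record_success()
            return result
        except Exception:
            breaker.record_failure()
    return fallback(prompt)
```

While the circuit is open, requests skip the failing model entirely, so a degraded provider adds no latency to each call.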

Stack

  • FastAPI, ARQ, pgvector, Claude API
  • Sentence Transformers, Streamlit
  • Kubernetes manifests, AWS Terraform IaC
  • 1,185 tests · 87%+ coverage

Multi-Agent Orchestration

EnterpriseHub

Domain-specific agent mesh with 3-tier cache achieving 88% aggregate hit rate. 8 agent capabilities, circuit-breaker failover, per-agent model routing, OWASP-hardened security, and OpenTelemetry instrumentation.

Capabilities

  • Lead Intake, Buyer, Seller agent mesh
  • L1 memory, L2 Redis, L3 PostgreSQL cache
  • Per-agent model routing (Haiku/Sonnet/Opus)
  • Ed25519 webhook verification, Redis rate limiting

Stack

  • FastAPI, PostgreSQL, Redis, LangGraph
  • Claude API, Prometheus, Grafana
  • OpenTelemetry, 9-panel dashboard configs
  • 6,657 tests

PyPI Package · Published

mcp-server-toolkit

9 pre-built MCP servers with A2A adapter, auto-caching, rate limiting, auth middleware. MCPTestClient for testing without live API keys. Reduces LLM tool integration from days to a single import.

9 MCP servers · A2A adapter · 412 tests · 88% coverage

Open Source Contributions

LiteLLM · 27K+ stars

Typed Exception Mapping for Router Fallback

PR #24551: Surfaces AuthenticationError, RateLimitError, and NotFoundError distinctly through the Router fallback chain instead of swallowing them as a generic Exception. This lets callers implement an appropriate recovery strategy for each error type.
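
To illustrate why distinct error types matter to callers, here is a sketch of per-error recovery. The exception classes below are local stand-ins defined for this example, not imports from LiteLLM, and the recovery action names are hypothetical.

```python
# Local stand-ins for the typed errors; a real caller would catch the library's own classes.
class AuthenticationError(Exception):
    pass


class RateLimitError(Exception):
    pass


class NotFoundError(Exception):
    pass


def recovery_action(exc):
    """Pick a recovery strategy based on the concrete error type."""
    if isinstance(exc, AuthenticationError):
        return "rotate_credentials"   # retrying with the same key will not help
    if isinstance(exc, RateLimitError):
        return "retry_with_backoff"   # transient; wait and retry
    if isinstance(exc, NotFoundError):
        return "switch_model"         # model or deployment does not exist
    return "raise"                    # unknown error: surface it to the caller


def run_with_recovery(call):
    """Run a completion callable and report which recovery path an error would take."""
    try:
        return "ok", call()
    except Exception as exc:
        return recovery_action(exc), None
```

With only a generic Exception, all three cases would collapse into the final "raise" branch.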

Also: open PRs in FastAPI (80K+ stars, #15217) and pgvector-python (#151)

AI/LLM Certifications

IBM Generative AI Engineering · 144 hours
DeepLearning.AI Deep Learning Specialization · 120 hours
Microsoft AI & ML Engineering · 75 hours
Duke University LLMOps Specialization · 48 hours
IBM RAG and Agentic AI · 24 hours
Google Cloud Generative AI Leader · 25 hours
Claude Code in Action (Anthropic) · 3 hours

7 AI/LLM certifications · 439 hours · IBM, DeepLearning.AI, Microsoft, Duke, Google, Anthropic

Open to AI/LLM Roles — Remote

Targeting teams building production LLM applications, RAG systems, agentic AI, and developer tooling. Open to: AI/LLM Developer, MLOps Engineer, Agentic AI Engineer, AI Platform Engineer.

US-based (Cathedral City, CA) · Canadian citizen, no sponsorship required

caymanroden@gmail.com