Case Studies

Real engineering challenges I solved, with quantified outcomes you can verify in the code. Each case study maps to a public repo unless it is marked as private client work, where code is available on request.

Production · Delivered · Closed · Solutions Engineering · Private client (code on request)

Production AI Lead Qualification: 500+ Leads, Zero Downtime, 3 Months

Challenge

Real estate leads go cold fast: 40% are lost when response time exceeds 5 minutes. A firm's agents were responding manually, and overnight and weekend leads went unanswered. The CRM had no way to distinguish buyers from sellers or to route conversations automatically to the right specialist.

Solution

Served as the sole technical point of contact from requirements through production deployment. Built 3 specialized SMS bots (Lead Intake, Buyer Qualification, Seller Qualification) integrated into GoHighLevel CRM, with incoming webhooks verified under two signature schemes (HMAC and RSA). Each bot runs a structured Q0–Q4 qualification flow, scores leads hot/warm/cold, and books appointments on the agent's GHL calendar. Cross-bot handoff routes leads to the right specialist without loops or errors, using a 0.7 confidence threshold, a 30-minute circular-prevention window, contact-level locking to prevent race conditions, and a 3/hr rate limit. Bilingual EN/ES throughout.
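The handoff safeguards can be sketched as a small guard object. This is a minimal illustration, not the production code: the `HandoffGuard` class, its in-memory storage, and the reading of circular prevention as "do not send a contact back along a route used in the last 30 minutes" are assumptions. Only the 0.7 threshold, the 30-minute window, and the 3/hr limit come from the case study, and contact-level locking is left out.

```python
import time
from collections import defaultdict, deque

CONFIDENCE_THRESHOLD = 0.7    # from the case study
CIRCULAR_WINDOW_S = 30 * 60   # 30-minute circular-prevention window
MAX_HANDOFFS_PER_HOUR = 3     # 3/hr limit, assumed here to be per contact


class HandoffGuard:
    """Hypothetical in-memory guard; a real deployment would likely keep this state in Redis."""

    def __init__(self, clock=time.time):
        self._clock = clock
        # contact_id -> deque of (timestamp, source_bot, target_bot)
        self._history = defaultdict(deque)

    def allow(self, contact_id, source, target, confidence):
        now = self._clock()
        history = self._history[contact_id]
        # Entries older than one hour matter to neither rule, so drop them.
        while history and now - history[0][0] > 3600:
            history.popleft()
        if confidence < CONFIDENCE_THRESHOLD:
            return False
        if len(history) >= MAX_HANDOFFS_PER_HOUR:
            return False
        # Circular prevention: refuse to reverse a recent handoff.
        for ts, src, dst in history:
            if now - ts <= CIRCULAR_WINDOW_S and (src, dst) == (target, source):
                return False
        history.append((now, source, target))
        return True
```

Injecting the clock keeps the window and rate-limit rules testable without sleeping.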

FastAPI · Claude API · GoHighLevel CRM · Redis · PostgreSQL · EN/ES

Result

500+

Leads qualified (Jan–Mar 2026)

0

Downtime events, 3-month run

<500ms

Response time

Delivered: 3 production bots, GoHighLevel CRM integration, 1,824 automated tests, full handoff documentation and runbooks. Sole technical lead — requirements through deployment.
Verified in Code · Automation · View Repo

AI-Powered Prospecting Pipeline with Security-First Design

Challenge

15+ hours per week spent on manual prospecting and proposal writing. No systematic way to evaluate job fit, qualify opportunities, or detect prompt injection attacks in AI-generated content pipelines.

Solution

Built 3 integrated products: an AI job scanner with a 105-point scoring rubric for automated qualification, a 4-agent proposal pipeline (Prospecting, Credential Sync, Proposal Architect, Engagement) for tailored proposal generation, and a prompt injection tester with 60+ attack patterns across 8 MITRE ATLAS threat categories. Includes a RAG Cost Optimizer for token budget management.
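A pattern-based injection tester of the kind described can be sketched in a few lines. The three regexes and their category labels below are illustrative placeholders, not the project's 60+ patterns and not MITRE ATLAS identifiers; `scan` is a hypothetical helper name.

```python
import re

# Illustrative patterns only; labels are placeholders, not MITRE ATLAS identifiers.
PATTERNS = {
    "instruction_override": re.compile(
        r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.I
    ),
    "system_prompt_leak": re.compile(
        r"(reveal|print|show)\s+(your\s+)?system\s+prompt", re.I
    ),
    "role_hijack": re.compile(r"you\s+are\s+now\s+(in\s+)?\w+\s+mode", re.I),
}


def scan(text):
    """Return the sorted names of every pattern that matches the text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))
```

Regex matching catches only known phrasings, so a suite like this is usually paired with model-level evaluation on adversarial cases.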

Python · FastAPI · Claude API · BeautifulSoup · Pandas

Result

240

Tests passing

105-pt

Scoring rubric

60+

Injection patterns

Verify: product_1_launch_kit/ (injection tester), product_2_rag_cost_optimizer/ (cost optimization), product_3_agent_orchestrator/ (proposal pipeline).
Verified in Code · RAG / Embeddable AI · View Repo

From Idea to Embeddable AI Product in One Sprint

Challenge

Businesses want AI chat on their websites, but SaaS chatbot platforms charge $100–500/month, lock you into their ecosystem, and give limited control over the AI behavior. Self-hosting typically requires a full engineering team. At those prices, a small business running 5 sites pays $6,000–30,000/year for basic chat widgets.

Solution

Built a self-hosted AI chatbot that embeds on any website with a single <script> tag. No npm install, no build step, no framework dependency. The ~14KB vanilla JS widget runs inside a Shadow DOM for CSS isolation. The backend uses FastAPI with pgvector for RAG-based knowledge retrieval, Redis for session state, and Claude for response generation via WebSocket streaming.

FastAPI · pgvector · sentence-transformers · Shadow DOM · WebSocket · Claude AI

Result

38

Tests passing

<2s

RAG response time

$1.2–6K/yr

Saved vs SaaS

Verify: api/ (FastAPI backend), widget/ (Shadow DOM embed), tests/ (38 tests).
Verified in Code · Voice AI · View Repo

Real-Time Voice AI with Sub-3-Second End-to-End Latency

Challenge

Most voice AI demos fake latency by pre-loading responses or measuring from text input. A real production pipeline — audio capture, voice activity detection, speech-to-text, LLM reasoning, text-to-speech, and playback — has six stages of compounding latency. Without careful engineering, round-trips exceed 5–10 seconds, which is unusable for conversation.

Solution

Built a complete voice pipeline over WebSocket: OGG/Opus from the browser, FFmpeg transcoding to PCM16, Silero VAD (ONNX, ~20MB vs ~400MB PyTorch) for speech endpoint detection, Deepgram Nova-3 for streaming STT, Claude Sonnet for reasoning, sentence-level buffering with 500ms flush timeout, and Deepgram Aura-2 TTS back to the browser. 16 architectural fixes address production issues from OOM prevention to WebSocket keepalive.
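Sentence-level buffering with a flush timeout can be sketched as follows. This is an assumption-laden illustration rather than the pipeline's code: `SentenceBuffer` is a hypothetical name, sentences are split naively on `.`, `!`, or `?` followed by whitespace (so abbreviations such as "Dr." would split early), and the 500ms timeout is measured from the last time anything was emitted.

```python
import re
import time

# A sentence ends at ., ! or ? followed by whitespace (naive; an assumption).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
FLUSH_TIMEOUT_S = 0.5  # the 500ms flush timeout from the case study


class SentenceBuffer:
    """Collects streamed LLM text and releases it to TTS one sentence at a time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._pending = ""
        self._last_flush = clock()

    def feed(self, token):
        """Add streamed text; return any complete sentences ready for TTS."""
        self._pending += token
        parts = SENTENCE_END.split(self._pending)
        # Every part except the last ended with sentence punctuation.
        ready, self._pending = parts[:-1], parts[-1]
        if ready:
            self._last_flush = self._clock()
        return ready

    def poll(self):
        """Flush a partial sentence if nothing was emitted for the timeout."""
        if self._pending and self._clock() - self._last_flush >= FLUSH_TIMEOUT_S:
            text, self._pending = self._pending, ""
            self._last_flush = self._clock()
            return [text]
        return []
```

Sending whole sentences to TTS lets playback start before the LLM has finished, while the timeout keeps a long clause without punctuation from stalling the audio.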

FastAPI · WebSocket · FFmpeg · Silero VAD · Deepgram · Claude AI

Result

<3s

End-to-end latency

20MB

VAD model (ONNX)

16

Arch fixes applied

Verify: app/ (full pipeline), README.md (latency budget table, 16 architecture fixes).
Verified in Code · RAG / MLOps · View Repo · Live Demo

95.5% F1 on a CI-Replayed Golden Baseline with Hybrid RAG

Challenge

Pure vector search fails on structured documents. An invoice number like INV-2024-0047 has no semantic relationship to the query "what is the invoice number?" — it's a lexical match. Naive RAG pipelines using only cosine similarity miss exact-field lookups, causing extraction failures on the documents that matter most: invoices, contracts, medical records, and identity documents.

Solution

Built a 12-step async document extraction pipeline: upload → format detection → OCR/text extraction → document classification → type-aware chunking → pgvector embedding → hybrid retrieval (BM25 keyword + cosine similarity + Reciprocal Rank Fusion) → citation-aware LLM generation. Added production-grade reliability: AsyncCircuitBreaker model fallback (Sonnet → Haiku), OpenTelemetry + Prometheus metrics, LangSmith tracing per pipeline step, and a golden evaluation CI gate that blocks deploys if F1 regresses below the 95.5% baseline.
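The fusion step of the hybrid retrieval can be illustrated with Reciprocal Rank Fusion, where each document scores the sum of 1 / (k + rank) over the rankings it appears in. The sketch below is a generic version of that formula, not the contents of app/services/retrieval.py; the function name and the constant k = 60 (a commonly used default) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first.

    rankings: iterable of lists, each ordered from most to least relevant.
    k: smoothing constant; 60 is a common default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Example: an exact-match chunk ranks first for BM25 but last for vectors.
bm25_hits = ["inv-0047", "chunk-a", "chunk-b"]
vector_hits = ["chunk-a", "chunk-c", "inv-0047"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Because RRF uses ranks rather than raw scores, BM25 and cosine similarity can be combined without normalizing their very different score scales.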

FastAPI · pgvector · BM25 + RRF · ARQ Workers · LangSmith · Claude API · Terraform / AWS

Result

95.5%

F1 on a 28-case CI-replayed golden baseline

1,280

Automated tests (1,273 passing), 81% coverage

72

Eval corpus cases (51 golden + 21 adversarial)

Verify: app/services/retrieval.py (hybrid RRF), app/langsmith_tracing.py (observability), deploy/aws/ (Terraform IaC), tests/golden/ (accuracy gate).
Verified in Code · Data & BI · View Repo · Live Demo

Five Lending Risk Findings That Would Change a Credit Committee's Decisions

Context

SQL and Python analysis on 1.2M+ real 2022 CFPB HMDA mortgage records covering credit risk, fair lending, and market risk. Deliverables in the formats a finance team actually opens: interactive dashboard, Excel workbook, Power BI report, SQL case studies, and written business memos with prioritized recommendations.

Key Findings

  • DTI beats FICO among prime borrowers. Among 720+ FICO borrowers, DTI ratio explains ~38% of default variance. Tightening DTI thresholds in the 720–760 band reduces expected losses by $600K–$900K per $100M with no impact on approval volume.
  • Grade B, not Grade G, is the true loss concentration. ~35% of exposure, ~42% of stressed expected loss under severe recession. Misallocation: ~$35M per $500M portfolio if capital is sized by per-loan PD rather than volume-weighted loss.
  • Parametric VaR understates tail risk by 12–18%. For a $1B equity portfolio: $12–18M in unhedged tail exposure invisible in the daily risk report.
  • Geographic fair lending risk is invisible in flat-file analysis. County-level HMDA data shows 20+ percentage point approval rate gaps. Regulatory exposure: $5–15M per fair lending consent order.

Deliverables

25-page Streamlit dashboard · 8-tab Excel workbook · Power BI (14 DAX measures) · 11 SQL case studies · Written business memos · Tableau Public dashboard

Result

1.2M+

Real CFPB HMDA records

25

Dashboard pages

17

Business questions answered

Verify: analysis/ (26 modules), sql/ (11 DuckDB case studies), dashboard/pages/ (25 pages), notebooks/13_real_data_hmda.ipynb (real HMDA cleaning walkthrough).
Verified in Code · QA Automation

Test Architecture Designed From Day One Across Production Repos

Challenge

Every production system in this portfolio handles async state machines, third-party webhooks, or LLM outputs — categories where a production bug has no safe rollback. Retroactive test coverage misses the edge cases that actually fail: race conditions on concurrent webhook delivery, Redis TTL boundary conditions, LLM responses that are fluent but factually wrong. The constraint: every repo had to be designed for testability from the first commit.

Approach

TDD-first across all 15 repos: a failing test is written before any implementation code. Framework selection per project type: pytest-asyncio for async event loops (Jorge bots, EnterpriseHub), Playwright for E2E, Vitest for frontend components, dbt schema tests for data model integrity. CI gates on every repo block merge on SAST failure (bandit), CVE detection (pip-audit), type errors (mypy strict), and coverage regression. For LLM outputs, there is a 72-case evaluation corpus (51 golden + 21 adversarial: contradictory inputs, truncated documents, multi-language, high-noise OCR), and CI blocks any merge where F1 drops below the 95.5% baseline.
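The F1 regression gate reduces to a small check that a CI job can run after replaying the golden cases. The sketch below assumes the evaluation yields true-positive, false-positive, and false-negative counts; how the real pipeline counts them (per field, per document) is not stated, and the function names are hypothetical.

```python
BASELINE_F1 = 0.955  # the 95.5% baseline from the case study


def f1_score(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def gate(tp, fp, fn, baseline=BASELINE_F1):
    """Return True when the run meets the baseline, False when CI should block."""
    return f1_score(tp, fp, fn) >= baseline
```

In a CI script the boolean would typically be turned into a non-zero exit code so the merge is blocked.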

pytest · pytest-asyncio · Playwright · RAGAS · GitHub Actions · bandit · mypy strict

Result

3,500+

Tests across production repos

95.5%

LLM eval F1 (28-case CI-replayed baseline)

Zero

Downtime on the jorge production run (Jan–Mar 2026)

Verify: DocExtract (1,280 tests, eval gate), mcp-server-toolkit (600 tests, 82.87% cov), jorge (1,700+ tests, zero downtime)
Verified in Code · Compliance & Security · View Repo

HIPAA-Ready Document Processing Pipeline with PII Detection and Audit Trails

Challenge

AI document processing systems often handle sensitive data without proper controls. Patient records, insurance documents, and medical forms contain PHI that must be detected, logged, and handled according to HIPAA requirements. Most AI implementations skip this layer entirely, creating regulatory risk that gets discovered during audits.

Solution

Built compliance controls directly into the extraction pipeline. PII detection runs on every document before processing, flagging SSNs, phone numbers, email addresses, and health record numbers. All processing events are written to an audit trail with timestamps, user IDs, and data lineage. The system includes a COMPLIANCE.md covering HIPAA requirements, SOC 2 considerations, data retention policies, and encryption at rest and in transit. An automated eval gate blocks deployments that regress below the 95.5% F1 golden baseline (28-case CI-replayed), with adversarial cases including prompt-injection tests.
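A pre-processing PII pass for the four types named above can be sketched with regular expressions. The patterns are simplified US-style formats chosen for illustration, and the `MRN` prefix for health record numbers is a hypothetical format; none of this is taken from app/guardrails/.

```python
import re

# Simplified illustrative patterns; the MRN format is a hypothetical example.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"(?<!\d)\d{3}[-.\s]\d{3}[-.\s]\d{4}(?!\d)"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "mrn": re.compile(r"\bMRN[-:\s]?\d{6,10}\b"),
}


def detect_pii(text):
    """Return (kind, start, end) spans for every match, ordered by position.

    Spans are returned instead of the matched values so an audit log can
    record what was found and where without copying the sensitive text.
    """
    findings = []
    for kind, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((kind, match.start(), match.end()))
    return sorted(findings, key=lambda f: f[1])
```

Regex detection is a baseline only; free-text identifiers such as names and addresses need a different approach.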

FastAPI · Python · Claude API · pgvector · Redis · ARQ Workers · OpenTelemetry · Prometheus

Result

4

PII types detected (SSN, phone, email, health IDs)

95.5%

F1 on 28-case CI-replayed golden baseline

21

Adversarial eval cases (incl. prompt injection)

Verify: app/guardrails/ (PII detection), COMPLIANCE.md (HIPAA documentation), tests/golden/ (accuracy gate), app/langsmith_tracing.py (audit trail)
Verified in Code · Cloud Infrastructure · View Repo

Production Kubernetes Deployment for a 3-Service AI Platform

Challenge

AI applications have different infrastructure needs than standard web apps. Background document processing workers need different scaling rules than API servers. Vector database queries need low-latency storage. LLM API calls need circuit breakers to handle provider outages without cascading failures. Standard deployment patterns don't account for these requirements.

Solution

Designed a 3-service Kubernetes deployment: API server, ARQ background worker, and Streamlit frontend. Each service has its own Horizontal Pod Autoscaler with independent scaling rules. AWS RDS (PostgreSQL + pgvector) handles persistent storage with automated backups. ElastiCache Redis handles the task queue and semantic cache. Terraform provisions all infrastructure reproducibly. Prometheus and Grafana provide observability across all 3 services, with 9 dashboard panels tracking request latency, queue depth, cache hit rate, and LLM API costs. An AsyncCircuitBreaker handles model fallback from Sonnet to Haiku when the primary model is slow.
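The model-fallback behavior can be sketched as a small asynchronous circuit breaker. This is a generic version of the pattern that borrows the `AsyncCircuitBreaker` name from the text; the thresholds, the timeout, and the half-open rule are assumptions, not the contents of app/circuit_breaker.py.

```python
import asyncio
import time


class AsyncCircuitBreaker:
    """Illustrative breaker: route to a fallback after repeated primary failures."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def _is_open(self):
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self.reset_after_s:
            # Half-open: allow a trial call; one more failure re-opens the circuit.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
            return False
        return True

    async def call(self, primary, fallback, timeout_s=5.0):
        """Await primary() unless the circuit is open; otherwise await fallback()."""
        if not self._is_open():
            try:
                result = await asyncio.wait_for(primary(), timeout=timeout_s)
                self._failures = 0
                return result
            except Exception:
                # Timeouts and provider errors both count as failures.
                self._failures += 1
                if self._failures >= self.failure_threshold:
                    self._opened_at = self._clock()
        return await fallback()
```

Here `primary` and `fallback` are zero-argument coroutine functions, for example one wrapping a Sonnet request and one wrapping a Haiku request. While the circuit is open the slow model is skipped entirely, so callers do not wait for a timeout on every request.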

Kubernetes (Kustomize) · Terraform · AWS RDS · ElastiCache · Prometheus · Grafana · Docker · GitHub Actions

Result

3

Kubernetes services with independent HPA

9

Grafana dashboard panels

2

Managed AWS services (RDS + ElastiCache)

Verify: deploy/k8s/ (Kubernetes manifests), deploy/aws/ (Terraform IaC), deploy/grafana/ (dashboard configs), app/circuit_breaker.py (model fallback)

YAML-Driven AI Workflow Engine with SSE Streaming and 9 MCP Servers

Challenge

AI automation tools often require code changes to add new workflow steps. Business teams can't modify Python code, so every workflow change requires a developer. Meanwhile, connecting AI workflows to external tools (calendar, email, Slack, GitHub) requires building custom integrations for each use case.

Solution

Built two complementary systems. The AI Workflow API uses YAML configuration files to define multi-step pipelines: each step specifies an action type (LLM call, webhook, data transform, notification), its inputs, and its success conditions. Non-technical users can add new workflow steps without touching code. Server-Sent Events (SSE) stream progress updates to the client in real time so long-running workflows don't appear to hang. The MCP Server Toolkit extends this to Claude Desktop: 9 custom MCP servers connect AI to calendar, email, Slack, GitHub, and custom business tools. Published on PyPI as mcp-server-toolkit==0.3.0.
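The combination of config-driven steps and SSE progress updates can be sketched as a generator that runs each step and yields one framed event per step. The step dictionaries stand in for what a YAML file would parse to; the handler names, step fields, and event names are hypothetical, and only the SSE wire format (`event:` and `data:` lines ending in a blank line) is standard.

```python
import json


def sse_event(event, data):
    """Frame one Server-Sent Events message: event name, JSON data, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


# Hypothetical handlers keyed by action type; each takes and returns a context dict.
HANDLERS = {
    "transform": lambda ctx, step: {**ctx, step["output"]: ctx[step["input"]].upper()},
    "notify": lambda ctx, step: {**ctx, "notified": step["channel"]},
}


def run_workflow(steps, ctx):
    """Run steps in order, yielding an SSE progress event after each one."""
    for index, step in enumerate(steps, start=1):
        ctx = HANDLERS[step["action"]](ctx, step)
        yield sse_event(
            "progress", {"step": step["name"], "index": index, "total": len(steps)}
        )
    yield sse_event("done", {"result": ctx})


# Steps as they might look after parsing a YAML pipeline definition (assumed shape).
steps = [
    {"name": "shout", "action": "transform", "input": "text", "output": "loud"},
    {"name": "tell", "action": "notify", "channel": "slack"},
]
events = list(run_workflow(steps, {"text": "hi"}))
```

In a FastAPI service a generator like this could be returned through a streaming response with the `text/event-stream` media type.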

FastAPI · ARQ Workers · Redis · SSE Streaming · YAML Config · MCP (Model Context Protocol) · Python · PyPI

Result

148 + 600

Tests (ai-workflow-api + mcp-server-toolkit)

9

MCP servers

1

PyPI package (mcp-server-toolkit 0.3.0)

Verify: app/workflows/ (YAML pipeline engine), app/streaming/ (SSE implementation), src/tools/ (9 MCP servers)

Capability Demonstrations

These projects demonstrate what the systems can do. Each is a fully functional, tested application you can clone and run.

Capability Demo · RAG · View Repo

Document Q&A with Hybrid Retrieval and Source Citations

A RAG system that ingests PDFs, DOCX, and text documents, then answers questions with cited sources. Uses hybrid retrieval (BM25 keyword search + dense vector similarity + Reciprocal Rank Fusion) to find relevant passages. Includes a prompt engineering lab for A/B testing answer quality and per-query cost tracking.

94

Automated tests

3

Retrieval methods

Mock Mode

No API keys needed

Capability Demo · Data Analytics · View Repo

CSV Upload to Instant Dashboards, Attribution, and Predictions

Upload a CSV or Excel file and get auto-profiled data, interactive Plotly dashboards, marketing attribution (first-touch, last-touch, linear, time-decay), predictive modeling with SHAP explanations, automated data cleaning, and one-click PDF reports. Three demo datasets included (e-commerce, marketing touchpoints, HR attrition).
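The four attribution models differ only in how they weight the touchpoints on a conversion path, which a short function can show. This is a sketch under assumptions: the function name is hypothetical, and time-decay is approximated by position in the path (weight halves every `half_life` steps before the conversion) rather than by real timestamps.

```python
def attribute(touchpoints, model="linear", half_life=2.0):
    """Split one conversion's credit across an ordered list of channels."""
    n = len(touchpoints)
    if n == 0:
        return {}
    if model == "first_touch":
        weights = [1.0] + [0.0] * (n - 1)
    elif model == "last_touch":
        weights = [0.0] * (n - 1) + [1.0]
    elif model == "linear":
        weights = [1.0 / n] * n
    elif model == "time_decay":
        # Weight halves for every `half_life` steps away from the conversion.
        raw = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
        total = sum(raw)
        weights = [w / total for w in raw]
    else:
        raise ValueError(f"unknown attribution model: {model}")
    credit = {}
    for channel, weight in zip(touchpoints, weights):
        # A channel that appears more than once accumulates its credit.
        credit[channel] = credit.get(channel, 0.0) + weight
    return credit
```

For the path ads, email, ads, organic, the linear model gives ads half the credit because it appears twice, while last-touch gives everything to organic.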

63

Automated tests

4

Attribution models

6

Modules

Capability Demo · Automation · View Repo

End-to-End Automation: Job Scanning, Proposal Generation, Security Testing

An automated pipeline that scans job listings with a 105-point scoring rubric, generates tailored proposals via a 4-agent pipeline (Prospecting, Credential Sync, Proposal Architect, Engagement), and includes a prompt injection testing suite with 60+ detection patterns across 8 MITRE ATLAS threat categories.

240

Automated tests

105-pt

Scoring rubric

60+

Injection patterns