Case Studies

Real engineering challenges I solved, with quantified outcomes you can verify in the code. Each case study maps to a public repo.

Verified in Code LLMOps View Repo

89% Token Cost Reduction Across a 3-Bot AI Platform

Challenge

A real estate AI platform with 3 specialized chatbots (lead qualification, buyer matching, seller advisory) was consuming 93,000 tokens per workflow. Each bot needed conversation context, system prompts, user data, and market intelligence — all sent on every API call. Token costs were scaling linearly with conversation volume.

Solution

Built a 3-tier caching system (L1 in-memory, L2 Redis with TTL, L3 PostgreSQL fallback), context window optimization that sends only relevant turns instead of full history (2.3x efficiency), and model routing by task complexity (TaskComplexity enum routes ROUTINE tasks to faster/cheaper models).
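The lookup path can be sketched with stdlib stand-ins (plain dicts play the role of Redis and PostgreSQL here; `MODEL_BY_COMPLEXITY` and the model names are illustrative placeholders, not the repo's actual identifiers):

```python
import time
from enum import Enum

class TaskComplexity(Enum):
    ROUTINE = "routine"
    COMPLEX = "complex"

# Illustrative routing table; model names are placeholders
MODEL_BY_COMPLEXITY = {
    TaskComplexity.ROUTINE: "fast-cheap-model",
    TaskComplexity.COMPLEX: "frontier-model",
}

class TieredCache:
    """L1 in-process dict; l2/l3 are stand-ins for Redis and PostgreSQL."""
    def __init__(self, l2, l3, ttl=300):
        self.l1, self.l2, self.l3, self.ttl = {}, l2, l3, ttl

    def get(self, key):
        if key in self.l1:                        # L1: in-memory, no network hop
            return self.l1[key]
        entry = self.l2.get(key)                  # L2: Redis entry with TTL
        if entry is not None and entry[1] > time.time():
            self.l1[key] = entry[0]               # promote to L1
            return entry[0]
        value = self.l3.get(key)                  # L3: PostgreSQL fallback
        if value is not None:
            self.set(key, value)                  # repopulate upper tiers
        return value

    def set(self, key, value):
        self.l1[key] = value
        self.l2[key] = (value, time.time() + self.ttl)
        self.l3[key] = value
```

The promotion on read is what drives the hit rate: any key served once from the fallback tier is served from memory afterward, until its TTL lapses.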

FastAPI Claude AI Redis PostgreSQL

Result

93K → 7.8K

Tokens per workflow

87%

Cache hit rate

<200ms

Orchestrator overhead

Verify: services/claude_orchestrator.py (cache layers), core/llm_client.py (TaskComplexity routing). Full benchmarks →
Production · Delivered · Closed · Solutions Engineering · Private client (code on request)

Production AI Lead Qualification: 500+ Leads, Zero Downtime, 3 Months

Challenge

Real estate leads go cold fast — 40% are lost when response time exceeds 5 minutes. A firm's agents were responding manually, with overnight and weekend leads going unanswered. No way to distinguish buyers from sellers in the CRM or automatically route conversations to the right specialist.

Solution

Sole technical point of contact from requirements through production deployment. Built 3 specialized SMS bots (Lead Intake, Buyer Qualification, Seller Qualification) integrated into GoHighLevel CRM via dual-scheme webhook verification (HMAC + RSA). Each bot runs a structured Q0–Q4 qualification flow, scores leads hot/warm/cold, and books appointments on the agent's GHL calendar. Cross-bot handoff routes leads to the right specialist without loops or errors: a 0.7 confidence threshold, a 30-minute circular-prevention window, contact-level locking to prevent race conditions, and 3/hr rate limiting. Bilingual EN/ES throughout.
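The handoff guard can be sketched as a minimal synchronous stand-in (the class, state layout, and injected clock are assumptions for illustration; the thresholds are the ones stated above):

```python
import time
from collections import defaultdict

HANDOFF_CONFIDENCE = 0.7        # minimum confidence to hand a lead off
CIRCULAR_WINDOW_S = 30 * 60     # no bounce back to same bot within 30 min
RATE_LIMIT_PER_HOUR = 3         # max handoffs per contact per hour

class HandoffGuard:
    def __init__(self, now=time.time):
        self.now = now
        self.recent = {}                     # (contact, target) -> last handoff ts
        self.counts = defaultdict(list)      # contact -> handoff timestamps
        self.locks = set()                   # contacts currently mid-handoff

    def allow(self, contact, target, confidence):
        t = self.now()
        if confidence < HANDOFF_CONFIDENCE:
            return False
        if contact in self.locks:                      # contact-level lock
            return False
        last = self.recent.get((contact, target))
        if last is not None and t - last < CIRCULAR_WINDOW_S:
            return False                               # circular-prevention window
        window = [ts for ts in self.counts[contact] if t - ts < 3600]
        if len(window) >= RATE_LIMIT_PER_HOUR:
            return False                               # 3/hr rate limit
        self.recent[(contact, target)] = t
        self.counts[contact] = window + [t]
        return True
```

Each check fails closed: a lead that trips any guard simply stays with its current bot instead of looping.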

FastAPI Claude API GoHighLevel CRM Redis PostgreSQL EN/ES

Result

500+

Leads qualified (Jan–Mar 2026)

0

Downtime events, 3-month run

<500ms

Response time

Delivered: 3 production bots, GoHighLevel CRM integration, 1,824 automated tests, full handoff documentation and runbooks. Sole technical lead — requirements through deployment.
Verified in Code Automation View Repo

AI-Powered Prospecting Pipeline with Security-First Design

Challenge

15+ hours per week spent on manual prospecting and proposal writing. No systematic way to evaluate job fit, qualify opportunities, or detect prompt injection attacks in AI-generated content pipelines.

Solution

Built 3 integrated products: an AI job scanner with a 105-point scoring rubric for automated qualification, a 4-agent proposal pipeline (Prospecting, Credential Sync, Proposal Architect, Engagement) for tailored proposal generation, and a prompt injection tester with 60+ attack patterns across 8 MITRE ATLAS threat categories. Includes a RAG Cost Optimizer for token budget management.
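At its core, the injection tester is pattern matching over pipeline inputs and outputs. A minimal sketch (these three patterns and category names are illustrative; the repo ships 60+ mapped to MITRE ATLAS categories):

```python
import re

# Illustrative subset of attack patterns; the real suite has 60+
INJECTION_PATTERNS = [
    ("instruction_override",
     re.compile(r"ignore (all|previous|prior) instructions", re.IGNORECASE)),
    ("role_hijack",
     re.compile(r"you are now\b", re.IGNORECASE)),
    ("exfiltration",
     re.compile(r"(reveal|print|repeat) (your )?(system prompt|instructions)",
                re.IGNORECASE)),
]

def scan_for_injection(text):
    """Return the name of every attack category whose pattern fires."""
    return [name for name, pattern in INJECTION_PATTERNS if pattern.search(text)]
```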

Python FastAPI Claude API BeautifulSoup Pandas

Result

240

Tests passing

105-pt

Scoring rubric

60+

Injection patterns

Verify: product_1_launch_kit/ (injection tester), product_2_rag_cost_optimizer/ (cost optimization), product_3_agent_orchestrator/ (proposal pipeline).
Verified in Code RAG / Embeddable AI View Repo

From Idea to Embeddable AI Product in One Sprint

Challenge

Businesses want AI chat on their websites but SaaS chatbot platforms charge $100–500/month, lock you into their ecosystem, and give limited control over the AI behavior. Self-hosting typically requires a full engineering team. Small businesses running 5 sites pay $6,000–12,000/year for basic chat widgets.

Solution

Built a self-hosted AI chatbot that embeds on any website with a single <script> tag. No npm install, no build step, no framework dependency. The ~14KB vanilla JS widget runs inside a Shadow DOM for CSS isolation. The backend uses FastAPI with pgvector for RAG-based knowledge retrieval, Redis for session state, and Claude for response generation via WebSocket streaming.

FastAPI pgvector sentence-transformers Shadow DOM WebSocket Claude AI

Result

38

Tests passing

<2s

RAG response time

$1.2–6K/yr

Saved vs SaaS

Verify: api/ (FastAPI backend), widget/ (Shadow DOM embed), tests/ (38 tests). Read full case study →
Verified in Code Voice AI View Repo

Real-Time Voice AI with Sub-3-Second End-to-End Latency

Challenge

Most voice AI demos fake latency by pre-loading responses or measuring from text input. A real production pipeline — audio capture, voice activity detection, speech-to-text, LLM reasoning, text-to-speech, and playback — has six stages of compounding latency. Without careful engineering, round-trips exceed 5–10 seconds, which is unusable for conversation.

Solution

Built a complete voice pipeline over WebSocket: OGG/Opus from the browser, FFmpeg transcoding to PCM16, Silero VAD (ONNX, ~20MB vs ~400MB PyTorch) for speech endpoint detection, Deepgram Nova-3 for streaming STT, Claude Sonnet for reasoning, sentence-level buffering with 500ms flush timeout, and Deepgram Aura-2 TTS back to the browser. 16 architectural fixes address production issues from OOM prevention to WebSocket keepalive.
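Sentence-level buffering is the piece that keeps TTS from waiting for the full LLM response: emit a chunk at each sentence boundary, or after 500 ms of token silence. A minimal sketch (class and method names are assumptions; the injected clock stands in for the event loop's timer):

```python
import re
import time

SENTENCE_END = re.compile(r"[.!?][\"')\]]?\s*$")   # end punctuation, optional close-quote
FLUSH_TIMEOUT_S = 0.5                              # flush partials after 500 ms

class SentenceBuffer:
    """Accumulate LLM tokens; emit on a sentence boundary or on timeout."""
    def __init__(self, now=time.monotonic):
        self.now = now
        self.buf = ""
        self.last_token = self.now()

    def feed(self, token):
        self.buf += token
        self.last_token = self.now()
        if SENTENCE_END.search(self.buf):          # complete sentence: send to TTS
            return self._flush()
        return None

    def poll(self):
        # Called periodically: flush a stalled partial sentence after the timeout
        if self.buf and self.now() - self.last_token >= FLUSH_TIMEOUT_S:
            return self._flush()
        return None

    def _flush(self):
        out, self.buf = self.buf.strip(), ""
        return out
```

The trade-off is latency versus prosody: flushing per token would start audio sooner but produce choppy speech, while waiting for the full response would blow the 3-second budget.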

FastAPI WebSocket FFmpeg Silero VAD Deepgram Claude AI

Result

<3s

End-to-end latency

20MB

VAD model (ONNX)

16

Arch fixes applied

Verify: app/ (full pipeline), README.md (latency budget table, 16 architecture fixes). Read full case study →
Verified in Code RAG / MLOps View Repo Live Demo

94.6% Extraction Accuracy on 16 Document Types with Hybrid RAG

Challenge

Pure vector search fails on structured documents. An invoice number like INV-2024-0047 has no semantic relationship to the query "what is the invoice number?" — it's a lexical match. Naive RAG pipelines using only cosine similarity miss exact-field lookups, causing extraction failures on the documents that matter most: invoices, contracts, medical records, and identity documents.

Solution

Built a 12-step async document extraction pipeline: upload → format detection → OCR/text extraction → document classification → type-aware chunking → pgvector embedding → hybrid retrieval (BM25 keyword + cosine similarity + Reciprocal Rank Fusion) → citation-aware LLM generation. Added production-grade reliability: AsyncCircuitBreaker model fallback (Sonnet → Haiku), OpenTelemetry + Prometheus metrics, LangSmith tracing per pipeline step, and a golden evaluation CI gate that blocks deploys if accuracy regresses below 94.6%.
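Reciprocal Rank Fusion, the final fusion step named above, reduces to a few lines: each document scores the sum of 1/(k + rank) over every ranked list it appears in. This is the standard formulation (not necessarily the repo's exact code):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked doc-id lists: score(d) = sum over lists of 1 / (k + rank).

    rankings: iterable of lists, each ordered best-first (e.g. BM25, vector).
    k=60 is the conventional damping constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only consumes ranks, not raw scores, it needs no calibration between BM25's unbounded scores and cosine similarity's [0, 1] range, which is exactly why it suits hybrid keyword-plus-vector retrieval.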

FastAPI pgvector BM25 + RRF ARQ Workers LangSmith Claude API Terraform / AWS

Result

94.6%

Extraction accuracy (16 doc types)

1,183

Automated tests, 90%+ coverage

12%

Retrieval gain vs. pure vector

Verify: app/services/retrieval.py (hybrid RRF), app/langsmith_tracing.py (observability), deploy/aws/ (Terraform IaC), tests/golden/ (accuracy gate). Try it live →
Verified in Code Data & BI View Repo Live Demo

Five Lending Risk Findings That Would Change a Credit Committee's Decisions

Context

SQL and Python analysis on 1.2M+ real 2022 CFPB HMDA mortgage records covering credit risk, fair lending, and market risk. Deliverables in the formats a finance team actually opens: interactive dashboard, Excel workbook, Power BI report, SQL case studies, and written business memos with prioritized recommendations.

Key Findings

  • DTI beats FICO among prime borrowers. Among 720+ FICO borrowers, DTI ratio explains ~38% of default variance. Tightening DTI thresholds in the 720–760 band reduces expected losses by $600K–$900K per $100M with no impact on approval volume.
  • Grade B, not Grade G, is the true loss concentration. ~35% of exposure, ~42% of stressed expected loss under severe recession. Misallocation: ~$35M per $500M portfolio if capital is sized by per-loan PD rather than volume-weighted loss.
  • Parametric VaR understates tail risk by 12–18%. For a $1B equity portfolio: $12–18M in unhedged tail exposure invisible in the daily risk report.
  • Geographic fair lending risk is invisible in flat-file analysis. County-level HMDA data shows 20+ percentage point approval rate gaps. Regulatory exposure: $5–15M per fair lending consent order.
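The VaR finding above comes down to the normal assumption flattening the empirical tail. A toy comparison with illustrative numbers (not the HMDA analysis itself) shows the mechanism:

```python
import statistics

def parametric_var(returns, z=2.326):
    """Normal-assumption 99% VaR: -(mean - z * stdev); z ~ 2.326 one-tailed."""
    mu = statistics.mean(returns)
    sigma = statistics.stdev(returns)
    return -(mu - z * sigma)

def historical_var(returns, confidence=0.99):
    """Empirical VaR: the loss at the (1 - confidence) quantile of realized returns."""
    ordered = sorted(returns)
    return -ordered[int((1 - confidence) * len(ordered))]
```

On a fat-tailed sample the empirical quantile sits well beyond what mean and standard deviation imply, which is the gap the daily parametric report never shows.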

Deliverables

25-page Streamlit dashboard 8-tab Excel workbook Power BI (14 DAX measures) 11 SQL case studies Written business memos Tableau Public dashboard

Result

1.2M+

Real CFPB HMDA records

25

Dashboard pages

17

Business questions answered

Verify: analysis/ (26 modules), sql/ (11 DuckDB case studies), dashboard/pages/ (25 pages), notebooks/13_real_data_hmda.ipynb (real HMDA cleaning walkthrough). Live dashboard →
Verified in Code QA Automation

12,000+ Tests Across 15 Repos: Test Architecture Designed From Day One

Challenge

Every production system in this portfolio handles async state machines, third-party webhooks, or LLM outputs — categories where a production bug has no safe rollback. Retroactive test coverage misses the edge cases that actually fail: race conditions on concurrent webhook delivery, Redis TTL boundary conditions, LLM responses that are fluent but factually wrong. The constraint: every repo had to be designed for testability from the first commit.

Approach

TDD-first across all 15 repos: failing test written before any implementation code. Framework selection per project type: pytest-asyncio for async event loops (Jorge bots, EnterpriseHub), Playwright for E2E, Vitest for frontend components, dbt schema tests for data model integrity. CI gates on every repo block merge on SAST failure (bandit), CVE detection (pip-audit), type errors (mypy strict), and coverage regression. For LLM outputs: 24-fixture golden evaluation set with 8 adversarial cases (contradictory inputs, truncated documents, multi-language, high-noise OCR) — CI blocks any accuracy drop below 94.6% baseline before merge.
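The golden-set gate pattern reduces to two functions; a sketch under assumed names (the real gate lives in each repo's CI configuration):

```python
BASELINE_ACCURACY = 0.946   # the accuracy floor CI enforces

def evaluate(fixtures, extract):
    """fixtures: [(document, expected)] pairs; returns exact-match accuracy."""
    hits = sum(1 for doc, expected in fixtures if extract(doc) == expected)
    return hits / len(fixtures)

def gate(accuracy, baseline=BASELINE_ACCURACY):
    """CI gate: exit non-zero (failing the build) on any regression below baseline."""
    if accuracy < baseline:
        raise SystemExit(f"eval gate failed: {accuracy:.3f} < {baseline}")
    return True
```

The point of pinning the baseline in code rather than in a dashboard is that a regression cannot merge quietly: the build itself fails.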

pytest pytest-asyncio Playwright RAGAS GitHub Actions bandit mypy strict

Result

12,000+

Tests across 15 repos

94.6%

LLM eval accuracy (24-fixture golden set)

0

Production regressions (Jan–Mar 2026)

Verify: EnterpriseHub (7,678 tests, CI gate), DocExtract (1,183 tests, RAGAS eval), Finance Analytics (1,611 tests, 4 test layers)
Verified in Code Compliance & Security View Repo

HIPAA-Ready Document Processing Pipeline with PII Detection and Audit Trails

Challenge

AI document processing systems often handle sensitive data without proper controls. Patient records, insurance documents, and medical forms contain PHI that must be detected, logged, and handled according to HIPAA requirements. Most AI implementations skip this layer entirely, creating regulatory risk that gets discovered during audits.

Solution

Built compliance controls directly into the extraction pipeline. PII detection runs on every document before processing, flagging SSNs, phone numbers, email addresses, and health record numbers. All processing events are written to an audit trail with timestamps, user IDs, and data lineage. The system includes a COMPLIANCE.md covering HIPAA requirements, SOC 2 considerations, data retention policies, and encryption at rest and in transit. An automated eval gate blocks deployments that regress below 94.6% accuracy on the 28-fixture test suite, including 4 prompt injection adversarial tests.
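The PII layer is, at its core, pattern scanning before any document reaches the model. A minimal sketch (these regexes are simplified illustrations; production rules cover far more formats and validate check digits where applicable):

```python
import re

# Simplified patterns for the four PII types; illustrative only
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\(?\d{3}\)?[-. ]\d{3}[-.]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "health_id": re.compile(r"\bMRN[-: ]?\d{6,10}\b", re.IGNORECASE),
}

def detect_pii(text):
    """Return {pii_type: [matches]} for every pattern that fires on the text."""
    found = {}
    for name, pattern in PII_PATTERNS.items():
        hits = pattern.findall(text)
        if hits:
            found[name] = hits
    return found
```

Every hit is then written to the audit trail before the document proceeds, so the lineage record exists even if downstream processing fails.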

FastAPI Python Claude API pgvector Redis ARQ Workers OpenTelemetry Prometheus

Result

4

PII types detected (SSN, phone, email, health IDs)

94.6%

Extraction accuracy (28-fixture eval)

12

Adversarial test cases (incl. 4 prompt injection)

Verify: app/guardrails/ (PII detection), COMPLIANCE.md (HIPAA documentation), tests/golden/ (accuracy gate), app/langsmith_tracing.py (audit trail)
Verified in Code Cloud Infrastructure View Repo

Production Kubernetes Deployment for a 3-Service AI Platform

Challenge

AI applications have different infrastructure needs than standard web apps. Background document processing workers need different scaling rules than API servers. Vector database queries need low-latency storage. LLM API calls need circuit breakers to handle provider outages without cascading failures. Standard deployment patterns don't account for these requirements.

Solution

Designed a 3-service Kubernetes deployment: API server, ARQ background worker, and Streamlit frontend. Each service has its own Horizontal Pod Autoscaler with independent scaling rules. AWS RDS (PostgreSQL + pgvector) handles persistent storage with automated backups. ElastiCache Redis handles the task queue and semantic cache. Terraform provisions all infrastructure reproducibly. Prometheus and Grafana provide observability across all 3 services, with 9 dashboard panels tracking request latency, queue depth, cache hit rate, and LLM API costs. An AsyncCircuitBreaker handles model fallback from Sonnet to Haiku when the primary provider is slow.
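The fallback path can be sketched as a minimal async breaker (deliberately simplified: the production version also has half-open recovery timers; apart from the `AsyncCircuitBreaker` name, identifiers here are assumptions):

```python
import asyncio

class AsyncCircuitBreaker:
    """Open after `threshold` consecutive primary failures; while open,
    skip the primary entirely and serve every call from the fallback."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.threshold

    async def call(self, primary, fallback, *args):
        if not self.is_open:
            try:
                result = await primary(*args)
                self.failures = 0           # any success closes the breaker
                return result
            except Exception:
                self.failures += 1
        return await fallback(*args)        # degraded path, e.g. a cheaper model
```

The breaker prevents the cascade the challenge describes: when the primary provider is slow, requests stop queuing behind its timeouts and degrade to the fallback immediately.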

Kubernetes (Kustomize) Terraform AWS RDS ElastiCache Prometheus Grafana Docker GitHub Actions

Result

3

Kubernetes services with independent HPA

9

Grafana dashboard panels

2

AWS managed data services (RDS + ElastiCache)

Verify: deploy/k8s/ (Kubernetes manifests), deploy/aws/ (Terraform IaC), deploy/grafana/ (dashboard configs), app/circuit_breaker.py (model fallback)

YAML-Driven AI Workflow Engine with SSE Streaming and 9 MCP Servers

Challenge

AI automation tools often require code changes to add new workflow steps. Business teams can't modify Python code, so every workflow change requires a developer. Meanwhile, connecting AI workflows to external tools (calendar, email, Slack, GitHub) requires building custom integrations for each use case.

Solution

Built two complementary systems. The AI Workflow API uses YAML configuration files to define multi-step pipelines: each step specifies an action type (LLM call, webhook, data transform, notification), its inputs, and its success conditions. Non-technical users can add new workflow steps without touching code. Server-Sent Events (SSE) stream progress updates to the client in real time so long-running workflows don't appear to hang. The MCP Server Toolkit extends this to Claude Desktop: 9 custom MCP servers connect AI to calendar, email, Slack, GitHub, and custom business tools. Published on PyPI as mcp-server-toolkit==0.2.0.
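The dispatch loop behind the YAML engine is simple. A sketch assuming the workflow file parses to the dict shown below (the step schema, handler names, and URL are illustrative, not the actual config format):

```python
# Shape a YAML workflow file might parse to (illustrative, not the real schema)
WORKFLOW = {
    "name": "lead_followup",
    "steps": [
        {"action": "llm_call", "input": "draft reply for {lead}"},
        {"action": "webhook", "input": "https://example.com/crm"},
        {"action": "notify", "input": "#sales"},
    ],
}

def run_workflow(workflow, handlers, context):
    """Dispatch each step to the handler registered for its action type."""
    events = []
    for i, step in enumerate(workflow["steps"], start=1):
        handler = handlers[step["action"]]
        result = handler(step["input"].format(**context))
        # In the real system, each event is pushed to the client over SSE
        events.append({"step": i, "action": step["action"], "result": result})
    return events
```

Because new step types are just new entries in the handler registry, and new pipelines are just new YAML files, neither change touches the engine itself, which is what lets non-technical users modify workflows.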

FastAPI ARQ Workers Redis SSE Streaming YAML Config MCP (Model Context Protocol) Python PyPI

Result

148 + 412

Tests (ai-workflow-api + mcp-server-toolkit)

9

MCP servers

1

PyPI package (mcp-server-toolkit 0.2.0)

Verify: app/workflows/ (YAML pipeline engine), app/streaming/ (SSE implementation), src/tools/ (9 MCP servers)

Capability Demonstrations

These projects demonstrate what the systems can do. Each is a fully functional, tested application you can clone and run.

Capability Demo RAG View Repo

Document Q&A with Hybrid Retrieval and Source Citations

A RAG system that ingests PDFs, DOCX, and text documents, then answers questions with cited sources. Uses hybrid retrieval (BM25 keyword search + dense vector similarity + Reciprocal Rank Fusion) to find relevant passages. Includes a prompt engineering lab for A/B testing answer quality and per-query cost tracking.

94

Automated tests

3

Retrieval methods

Mock Mode

No API keys needed

Capability Demo Data Analytics View Repo

CSV Upload to Instant Dashboards, Attribution, and Predictions

Upload a CSV or Excel file and get auto-profiled data, interactive Plotly dashboards, marketing attribution (first-touch, last-touch, linear, time-decay), predictive modeling with SHAP explanations, automated data cleaning, and one-click PDF reports. Three demo datasets included (e-commerce, marketing touchpoints, HR attrition).

63

Automated tests

4

Attribution models

6

Modules

Capability Demo Automation View Repo

End-to-End Automation: Job Scanning, Proposal Generation, Security Testing

An automated pipeline that scans job listings with a 105-point scoring rubric, generates tailored proposals via a 4-agent pipeline (Prospecting, Credential Sync, Proposal Architect, Engagement), and includes a prompt injection testing suite with 60+ detection patterns across 8 MITRE ATLAS threat categories.

240

Automated tests

105-pt

Scoring rubric

60+

Injection patterns