89% Token Cost Reduction Across a 3-Bot AI Platform
Challenge
A real estate AI platform with 3 specialized chatbots (lead qualification, buyer matching, seller advisory) was consuming 93,000 tokens per workflow. Each bot needed conversation context, system prompts, user data, and market intelligence — all sent on every API call. Token costs were scaling linearly with conversation volume.
Solution
Built a 3-tier caching system: L1 in-memory, L2 Redis with TTLs, and L3 PostgreSQL as a persistent fallback. Added context window optimization that sends only the relevant conversation turns instead of the full history (a 2.3x efficiency gain), plus model routing by task complexity: a TaskComplexity enum sends ROUTINE tasks to faster, cheaper models.
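The tiered lookup described above can be sketched as follows. The class and method names are illustrative, not the actual services/claude_orchestrator.py API, and the Redis and PostgreSQL tiers are stood in by plain dicts so the sketch runs without external services:

```python
import time

class TieredCache:
    """Illustrative 3-tier cache: L1 in-memory, L2 Redis-style with TTL,
    L3 durable PostgreSQL-style fallback. L2/L3 are dict stand-ins here."""

    def __init__(self, l2_ttl_seconds=300):
        self.l1 = {}                  # hot, process-local, no serialization cost
        self.l2 = {}                  # stand-in for Redis: key -> (value, expires_at)
        self.l3 = {}                  # stand-in for PostgreSQL: survives restarts
        self.l2_ttl = l2_ttl_seconds

    def get(self, key):
        if key in self.l1:            # L1 hit: fastest path
            return self.l1[key]
        entry = self.l2.get(key)
        if entry:
            value, expires_at = entry
            if time.time() < expires_at:
                self.l1[key] = value  # promote to L1 for next call
                return value
            del self.l2[key]          # TTL expired; fall through to L3
        if key in self.l3:            # durable fallback
            value = self.l3[key]
            self.set(key, value)      # re-warm the upper tiers
            return value
        return None                   # full miss: caller rebuilds context

    def set(self, key, value):
        self.l1[key] = value
        self.l2[key] = (value, time.time() + self.l2_ttl)
        self.l3[key] = value

cache = TieredCache()
cache.set("system_prompt:lead_qual", "You are a lead qualification assistant...")
cache.l1.clear()                      # simulate a process restart
print(cache.get("system_prompt:lead_qual"))  # served from L2, promoted back to L1
```

The design choice that matters for token cost is the promotion on read: once any tier holds the reusable context (system prompts, user data, market intelligence), subsequent calls avoid resending it.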
Result
93K → 7.8K tokens per workflow
87% cache hit rate
<200ms orchestrator overhead
Key files: services/claude_orchestrator.py (cache layers), core/llm_client.py (TaskComplexity routing). Full benchmarks →
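The complexity-based routing in core/llm_client.py can be sketched roughly like this; the enum members and model names below are placeholders, not the platform's actual identifiers:

```python
from enum import Enum, auto

class TaskComplexity(Enum):
    ROUTINE = auto()    # e.g. field extraction, yes/no lead triage
    STANDARD = auto()   # typical multi-turn conversation replies
    COMPLEX = auto()    # market analysis, multi-step advisory reasoning

# Hypothetical model tiers; real model IDs would go here.
MODEL_BY_COMPLEXITY = {
    TaskComplexity.ROUTINE: "fast-cheap-model",
    TaskComplexity.STANDARD: "balanced-model",
    TaskComplexity.COMPLEX: "frontier-model",
}

def route_model(complexity: TaskComplexity) -> str:
    """Pick the cheapest model tier that can handle the task."""
    return MODEL_BY_COMPLEXITY[complexity]

print(route_model(TaskComplexity.ROUTINE))  # fast-cheap-model
```

Routing ROUTINE traffic to a cheaper tier is what lets token *cost* fall further than token *count* alone would suggest.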