Real numbers from real code. Every metric here is traceable to a public repo you can inspect and run yourself. For a detailed walkthrough of the cost optimization techniques, read the blog post.
Source: EnterpriseHub — services/claude_orchestrator.py, core/llm_client.py
93K
Tokens before
7.8K
Tokens after
89%
Reduction
2.3x
Context efficiency
| Technique | Implementation | Token Impact | Notes |
|---|---|---|---|
| 3-Tier Cache | L1: in-memory dict L2: Redis with TTL L3: PostgreSQL |
~60% of savings | Repeat prompts hit cache instead of API. 87% hit rate observed. |
| Context Windowing | Sliding window over conversation history | ~25% of savings | Only relevant conversation turns included. 2.3x context efficiency. |
| Model Routing | TaskComplexity enum routes to appropriate model |
~15% of savings | ROUTINE tasks → smaller model, HIGH_STAKES → full model. Router adds <50ms. |
Methodology: Token counts measured via API response metadata (input_tokens, output_tokens) on the lead qualification workflow. Before = no cache, full context. After = all optimizations enabled.
Source: EnterpriseHub — services/jorge/performance_tracker.py
| Operation | P50 | P95 | P99 | SLA Target |
|---|---|---|---|---|
| Lead Bot response | ~800ms | <2,000ms | <3,000ms | P95 < 2,000ms |
| Buyer Bot response | ~1,000ms | <2,500ms | <3,500ms | P95 < 2,500ms |
| Seller Bot response | ~1,000ms | <2,500ms | <3,500ms | P95 < 2,500ms |
| Cross-bot handoff | ~150ms | <500ms | <800ms | P95 < 500ms |
| Orchestrator overhead | <200ms added latency | Target: <200ms | ||
Methodology: PerformanceTracker uses rolling-window percentile calculation with interpolation. SLA targets defined in performance_tracker.py lines 24-28.
Source: EnterpriseHub — services/analytics_service.py, agents/jorge_seller_bot.py
Overall Cache Hit Rate
87%
Target: 90%+
L1 (In-Memory)
<1ms
Dict lookup, per-request scope
L2 (Redis)
~2ms
TTL-based, cross-request
L3 (PostgreSQL) serves as a fallback when Redis is unavailable. Intelligence context cache in the Seller Bot tracks per-session cache hits for market intelligence lookups.
Source: EnterpriseHub — services/jorge/jorge_handoff_service.py
| Safeguard | Parameter | Purpose |
|---|---|---|
| Confidence threshold | 0.7 |
Minimum score to trigger handoff (tested against 200+ transcripts) |
| Circular prevention | 30min window |
Same source→target blocked within window |
| Rate limiting | 3/hr, 10/day |
Per-contact caps prevent abuse |
| Concurrent locking | Contact-level | Prevents race conditions between bots |
| Pattern learning | Min 10 data points | Dynamic threshold adjustment from outcome history |
| Repository | Tests | CI | Python Versions |
|---|---|---|---|
| EnterpriseHub | 4,992 | Passing | 3.11 |
| Insight Engine | 521 | Passing | 3.11, 3.12 |
| DocQA Engine | 501 | Passing | 3.11, 3.12 |
| AgentForge | 423 | Passing | 3.11 |
| Scrape-and-Serve | 302 | Passing | 3.11, 3.12 |
| Jorge Bots | 279 | Passing | 3.11 |
| Revenue-Sprint | 240 | Passing | 3.10, 3.11, 3.12 |
| LLM Integration Starter | 220 | Passing | 3.11, 3.12 |
| Prompt Engineering Lab | 190 | Passing | 3.11, 3.12 |
| MCP Toolkit | 184 | Passing | 3.11, 3.12 |
| Total | 7,852 | All Green |
Source: EnterpriseHub — services/jorge/alerting_service.py, services/jorge/bot_metrics_collector.py