Benchmarks

Real numbers from real code. Every metric here is traceable to a public repo you can inspect and run yourself. For a detailed walkthrough of the cost optimization techniques, read the blog post.

Token Cost Optimization

Source: EnterpriseHub (services/claude_orchestrator.py, core/llm_client.py)

  • Tokens before: 93K
  • Tokens after: 7.8K
  • Reduction: 89%
  • Context efficiency: 2.3x

Breakdown by Technique

| Technique | Implementation | Token Impact | Notes |
|---|---|---|---|
| 3-Tier Cache | L1: in-memory dict; L2: Redis with TTL; L3: PostgreSQL | ~60% of savings | Repeat prompts hit the cache instead of the API; 87% hit rate observed. |
| Context Windowing | Sliding window over conversation history | ~25% of savings | Only relevant conversation turns are included; 2.3x context efficiency. |
| Model Routing | TaskComplexity enum routes each task to an appropriate model | ~15% of savings | ROUTINE tasks → smaller model, HIGH_STAKES → full model; the router adds <50ms. |
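As a rough illustration of the 3-tier lookup path, here is a minimal sketch. The class name, the Redis/PostgreSQL handles, and the TTL are assumptions for illustration; the real implementation lives in services/claude_orchestrator.py and core/llm_client.py and may differ. The L3 (PostgreSQL) leg is elided for brevity.

```python
import hashlib
import json

class ThreeTierCache:
    """Sketch of a 3-tier prompt cache: L1 dict -> L2 Redis -> (L3 PostgreSQL).

    `redis_client` is any object with redis-py style get()/setex();
    handles and TTL are hypothetical, not the repo's actual API.
    """

    def __init__(self, redis_client=None, ttl_seconds=3600):
        self.l1 = {}               # L1: in-memory, per-process dict
        self.redis = redis_client  # L2: shared across processes, TTL-based
        self.ttl = ttl_seconds

    @staticmethod
    def key_for(prompt: str) -> str:
        # Stable key so identical prompts map to the same cache entry
        return hashlib.sha256(prompt.encode()).hexdigest()

    def get(self, prompt: str):
        key = self.key_for(prompt)
        if key in self.l1:                   # L1 hit: plain dict lookup
            return self.l1[key]
        if self.redis is not None:
            cached = self.redis.get(key)     # L2 hit: one Redis round trip
            if cached is not None:
                value = json.loads(cached)
                self.l1[key] = value         # promote to L1 for this process
                return value
        # L3 (PostgreSQL) fallback and the actual API call are omitted here
        return None

    def put(self, prompt: str, value):
        key = self.key_for(prompt)
        self.l1[key] = value
        if self.redis is not None:
            self.redis.setex(key, self.ttl, json.dumps(value))
```

A repeat prompt returns from `get()` without touching the API, which is where the bulk of the token savings in the table comes from.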

Methodology: Token counts measured via API response metadata (input_tokens, output_tokens) on the lead qualification workflow. Before = no cache, full context. After = all optimizations enabled.
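The model-routing technique amounts to a small mapping from a complexity enum to a model tier. A minimal sketch follows; the model IDs and the STANDARD tier are assumptions (only ROUTINE and HIGH_STAKES are named above), not the repo's actual router.

```python
from enum import Enum

class TaskComplexity(Enum):
    ROUTINE = "routine"          # named in the table above
    STANDARD = "standard"        # hypothetical middle tier
    HIGH_STAKES = "high_stakes"  # named in the table above

# Hypothetical model IDs; the real claude_orchestrator.py may use
# different names and more tiers.
MODEL_BY_COMPLEXITY = {
    TaskComplexity.ROUTINE: "claude-haiku",      # cheap, fast
    TaskComplexity.STANDARD: "claude-sonnet",
    TaskComplexity.HIGH_STAKES: "claude-opus",   # full model
}

def route_model(complexity: TaskComplexity) -> str:
    """Pick the cheapest model adequate for the task's complexity tier."""
    return MODEL_BY_COMPLEXITY[complexity]
```

Because the lookup is a dict access, a router like this adds effectively no latency; the <50ms figure above presumably covers classifying the task into a tier as well.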

Bot Response Latency

Source: EnterpriseHub (services/jorge/performance_tracker.py)

| Operation | P50 | P95 | P99 | SLA Target |
|---|---|---|---|---|
| Lead Bot response | ~800ms | <2,000ms | <3,000ms | P95 < 2,000ms |
| Buyer Bot response | ~1,000ms | <2,500ms | <3,500ms | P95 < 2,500ms |
| Seller Bot response | ~1,000ms | <2,500ms | <3,500ms | P95 < 2,500ms |
| Cross-bot handoff | ~150ms | <500ms | <800ms | P95 < 500ms |
| Orchestrator overhead | <200ms added latency | | | <200ms |

Methodology: PerformanceTracker uses rolling-window percentile calculation with interpolation. SLA targets defined in performance_tracker.py lines 24-28.
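A rolling-window percentile with interpolation amounts to sorting the window and linearly interpolating between neighboring ranks. A minimal sketch, not the PerformanceTracker API:

```python
def percentile(samples, pct):
    """Percentile of a rolling window of latency samples (ms),
    using linear interpolation between the two nearest ranks."""
    if not samples:
        raise ValueError("empty window")
    ordered = sorted(samples)
    # Fractional rank in [0, len-1]; e.g. p95 of 5 samples -> rank 3.8
    rank = (pct / 100) * (len(ordered) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(ordered) - 1)
    frac = rank - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac
```

For example, `percentile([100, 200, 300, 400, 500], 95)` interpolates 80% of the way between the 4th and 5th samples, giving 480.0 rather than snapping to 500.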

Cache Performance

Source: EnterpriseHub (services/analytics_service.py, agents/jorge_seller_bot.py)

Overall cache hit rate: 87% (target: 90%+)

  • L1 (in-memory): <1ms (dict lookup, per-request scope)
  • L2 (Redis): ~2ms (TTL-based, cross-request)

L3 (PostgreSQL) serves as a fallback when Redis is unavailable. Intelligence context cache in the Seller Bot tracks per-session cache hits for market intelligence lookups.
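The Redis-to-PostgreSQL fallback described above reduces to catching a connection failure on L2 and reading L3 instead. A minimal sketch; the handles, table name, and exception type are assumptions, not the analytics_service.py internals:

```python
def cached_lookup(key, redis_client, pg_conn):
    """Try L2 (Redis) first; fall back to L3 (PostgreSQL) if Redis is down.

    `redis_client` is redis-py style; `pg_conn` is a DB-API connection.
    The `cache_entries` table is hypothetical.
    """
    try:
        value = redis_client.get(key)
        if value is not None:
            return value                 # L2 hit
    except ConnectionError:
        pass  # Redis unavailable: degrade to L3 rather than fail the request
    with pg_conn.cursor() as cur:
        cur.execute("SELECT value FROM cache_entries WHERE key = %s", (key,))
        row = cur.fetchone()
        return row[0] if row else None   # L3 hit or miss
```

The design point is graceful degradation: an L2 outage costs a few milliseconds per lookup, not correctness.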

Multi-Agent Handoff Safeguards

Source: EnterpriseHub (services/jorge/jorge_handoff_service.py)

| Safeguard | Parameter | Purpose |
|---|---|---|
| Confidence threshold | 0.7 | Minimum score to trigger a handoff (tested against 200+ transcripts) |
| Circular prevention | 30-minute window | Same source→target handoff blocked within the window |
| Rate limiting | 3/hr, 10/day | Per-contact caps prevent abuse |
| Concurrent locking | Contact-level | Prevents race conditions between bots |
| Pattern learning | Min 10 data points | Dynamic threshold adjustment from outcome history |
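Taken together, the first three safeguards amount to a short series of pre-handoff checks. A minimal sketch using the parameters from the table; the class and method names are hypothetical, and concurrent locking and pattern learning are omitted:

```python
import time

class HandoffGuard:
    """Sketch of pre-handoff checks: confidence threshold,
    circular prevention, and per-contact rate limiting."""

    CONFIDENCE_THRESHOLD = 0.7
    CIRCULAR_WINDOW_S = 30 * 60   # same source->target blocked for 30 min
    MAX_PER_HOUR = 3
    MAX_PER_DAY = 10

    def __init__(self):
        self._recent = {}       # (source, target) -> last handoff timestamp
        self._per_contact = {}  # contact_id -> handoff timestamps

    def allow(self, source, target, contact_id, confidence, now=None):
        now = time.time() if now is None else now
        if confidence < self.CONFIDENCE_THRESHOLD:
            return False                              # below 0.7: no handoff
        last = self._recent.get((source, target))
        if last is not None and now - last < self.CIRCULAR_WINDOW_S:
            return False                              # circular prevention
        history = [t for t in self._per_contact.get(contact_id, [])
                   if now - t < 86400]
        if len(history) >= self.MAX_PER_DAY:
            return False                              # 10/day cap
        if len([t for t in history if now - t < 3600]) >= self.MAX_PER_HOUR:
            return False                              # 3/hr cap
        self._recent[(source, target)] = now
        self._per_contact[contact_id] = history + [now]
        return True
```

In the real service these checks would run inside the contact-level lock so two bots cannot pass them simultaneously.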

Test Coverage Across Repos

| Repository | Tests | CI | Python Versions |
|---|---|---|---|
| EnterpriseHub | 7,678 | Passing | 3.11 |
| Insight Engine | 640 | Passing | 3.11, 3.12 |
| DocQA Engine | 550 | Passing | 3.11, 3.12 |
| AgentForge | 550 | Passing | 3.11 |
| Scrape-and-Serve | 370 | Passing | 3.11, 3.12 |
| Jorge Bots | 360 | Passing | 3.11 |
| Revenue-Sprint | 315 | Passing | 3.10, 3.11, 3.12 |
| LLM Integration Starter | 250 | Passing | 3.11, 3.12 |
| Prompt Engineering Lab | 220 | Passing | 3.11, 3.12 |
| MCP Toolkit | 185 | Passing | 3.11, 3.12 |
| Total | 11,118 | All green | |

Production Monitoring

Source: EnterpriseHub (services/jorge/alerting_service.py, services/jorge/bot_metrics_collector.py)

7 Default Alert Rules

  • High error rate (>5%)
  • P95 latency exceeds SLA
  • Cache hit rate drops below 80%
  • Handoff failure rate >10%
  • Token consumption spike
  • Bot health check failure
  • Queue depth threshold
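A threshold rule with a per-rule cooldown can be sketched as follows. Field names and the 5-minute default cooldown are assumptions, not the alerting_service.py API; the sketch covers upper-bound rules such as error rate, while a rule like "cache hit rate drops below 80%" would invert the comparison.

```python
from dataclasses import dataclass, field

@dataclass
class AlertRule:
    """Sketch of an upper-bound alert rule with a per-rule cooldown."""
    name: str
    threshold: float
    cooldown_s: float = 300.0                   # assumed 5-minute default
    _last_fired: float = field(default=-1e9, repr=False)

    def evaluate(self, value: float, now: float) -> bool:
        """Fire when the metric breaches the threshold and the
        cooldown since the last firing has elapsed."""
        if value > self.threshold and now - self._last_fired >= self.cooldown_s:
            self._last_fired = now
            return True
        return False

# Two of the seven default rules above, expressed in this sketch:
error_rate_rule = AlertRule("high_error_rate", threshold=0.05)       # >5%
handoff_rule = AlertRule("handoff_failure_rate", threshold=0.10)     # >10%
```

The cooldown is what keeps a sustained breach from paging once per metrics tick.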

Per-Bot Metrics

  • Response count & throughput
  • Latency percentiles (P50/P95/P99)
  • Cache hit rate per bot
  • Error rate with categorization
  • Handoff success/failure ratio
  • Token usage per interaction
  • Configurable alert cooldowns per rule