Benchmarks

Real numbers from real code. Every metric here is traceable to a public repo you can inspect and run yourself. For a detailed walkthrough of the cost optimization techniques, read the blog post.

Token Cost Optimization

Source: EnterpriseHub (services/claude_orchestrator.py, core/llm_client.py)

| Metric | Value |
|---|---|
| Tokens before | 93K |
| Tokens after | 7.8K |
| Reduction | 89% |
| Context efficiency | 2.3x |

Breakdown by Technique

| Technique | Implementation | Token Impact | Notes |
|---|---|---|---|
| 3-Tier Cache | L1: in-memory dict; L2: Redis with TTL; L3: PostgreSQL | ~60% of savings | Repeat prompts hit the cache instead of the API. 87% hit rate observed. |
| Context Windowing | Sliding window over conversation history | ~25% of savings | Only relevant conversation turns included. 2.3x context efficiency. |
| Model Routing | TaskComplexity enum routes to the appropriate model | ~15% of savings | ROUTINE tasks → smaller model, HIGH_STAKES → full model. Router adds <50ms. |
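
The cache row is the biggest lever, so here is a minimal read-through sketch of that L1/L2/L3 layering. It assumes a redis-py client, a psycopg-style connection, and an llm_cache table; the class and method names are illustrative, not the actual API in claude_orchestrator.py.

```python
import hashlib
import json

class ThreeTierCache:
    """Illustrative read-through cache: L1 dict -> L2 Redis -> L3 PostgreSQL."""

    def __init__(self, redis_client, pg_conn, ttl_seconds=3600):
        self.l1 = {}               # L1: per-request, in-memory
        self.redis = redis_client  # L2: cross-request, TTL-based
        self.pg = pg_conn          # L3: durable fallback
        self.ttl = ttl_seconds

    def _key(self, prompt: str) -> str:
        return "llm:" + hashlib.sha256(prompt.encode()).hexdigest()

    def get(self, prompt: str):
        key = self._key(prompt)
        if key in self.l1:                # L1 hit: dict lookup, <1ms
            return self.l1[key]
        cached = self.redis.get(key)      # L2 hit: ~2ms round trip
        if cached is not None:
            value = json.loads(cached)
            self.l1[key] = value          # promote to L1
            return value
        with self.pg.cursor() as cur:     # L3: survives Redis restarts
            cur.execute("SELECT response FROM llm_cache WHERE key = %s", (key,))
            row = cur.fetchone()
        if row is not None:
            self.put(prompt, row[0])      # warm L1 and L2
            return row[0]
        return None                       # true miss: caller falls through to the API

    def put(self, prompt: str, value) -> None:
        key = self._key(prompt)
        self.l1[key] = value
        self.redis.setex(key, self.ttl, json.dumps(value))  # L3 write omitted for brevity
```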

Methodology: Token counts measured via API response metadata (input_tokens, output_tokens) on the lead qualification workflow. Before = no cache, full context. After = all optimizations enabled.
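
The other two techniques reduce to a few lines each. TaskComplexity is named in the repo; the STANDARD tier, window size, and model identifiers below are placeholders, not values from the code.

```python
from collections import deque
from enum import Enum

def windowed_context(history: list, max_turns: int = 8) -> list:
    """Context windowing: keep only the most recent turns, not the full transcript."""
    return list(deque(history, maxlen=max_turns))

class TaskComplexity(Enum):
    ROUTINE = "routine"
    STANDARD = "standard"        # assumed middle tier
    HIGH_STAKES = "high_stakes"

# Model routing: cheap model for routine work, full model when the stakes are high.
MODEL_FOR = {
    TaskComplexity.ROUTINE: "small-fast-model",   # placeholder model IDs
    TaskComplexity.STANDARD: "mid-tier-model",
    TaskComplexity.HIGH_STAKES: "frontier-model",
}

def route(complexity: TaskComplexity) -> str:
    return MODEL_FOR[complexity]  # a dict lookup, which keeps routing overhead well under 50ms
```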

Bot Response Latency

Source: EnterpriseHub (services/jorge/performance_tracker.py)

| Operation | P50 | P95 | P99 | SLA Target |
|---|---|---|---|---|
| Lead Bot response | ~800ms | <2,000ms | <3,000ms | P95 < 2,000ms |
| Buyer Bot response | ~1,000ms | <2,500ms | <3,500ms | P95 < 2,500ms |
| Seller Bot response | ~1,000ms | <2,500ms | <3,500ms | P95 < 2,500ms |
| Cross-bot handoff | ~150ms | <500ms | <800ms | P95 < 500ms |
| Orchestrator overhead | <200ms added latency | | | Target: <200ms |

Methodology: PerformanceTracker uses rolling-window percentile calculation with interpolation. SLA targets defined in performance_tracker.py lines 24-28.
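
The interpolated-percentile calculation is the standard closest-ranks method; this standalone sketch mirrors what a rolling-window tracker plausibly does, though PerformanceTracker's internals may differ.

```python
from collections import deque

class RollingPercentiles:
    """Rolling window of latency samples with interpolated percentiles."""

    def __init__(self, window_size: int = 1000):
        self.samples = deque(maxlen=window_size)  # oldest samples fall off automatically

    def record(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def percentile(self, p: float) -> float:
        """Linear interpolation between closest ranks, p in [0, 100]."""
        data = sorted(self.samples)
        if not data:
            return 0.0
        rank = (p / 100.0) * (len(data) - 1)
        lo = int(rank)
        hi = min(lo + 1, len(data) - 1)
        return data[lo] + (data[hi] - data[lo]) * (rank - lo)

# e.g. flag an SLA breach when tracker.percentile(95) > 2_000 (the Lead Bot target)
```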

Cache Performance

Source: EnterpriseHub (services/analytics_service.py, agents/jorge_seller_bot.py)

Overall cache hit rate: 87% (target: 90%+)

| Tier | Latency | Characteristics |
|---|---|---|
| L1 (in-memory) | <1ms | Dict lookup, per-request scope |
| L2 (Redis) | ~2ms | TTL-based, cross-request |

L3 (PostgreSQL) serves as a fallback when Redis is unavailable. An intelligence context cache in the Seller Bot tracks per-session cache hits for market intelligence lookups.
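
A sketch of that fallback path, assuming a redis-py client and the same llm_cache table as above; the exception handling shown is an assumption, not the repo's code.

```python
import json
import redis

def cache_get_with_fallback(redis_client, pg_conn, key: str):
    """Try L2 (Redis) first; degrade to L3 (PostgreSQL) if Redis is unreachable."""
    try:
        cached = redis_client.get(key)
        if cached is not None:
            return json.loads(cached)
    except redis.exceptions.ConnectionError:
        pass  # Redis down: fall through to L3 instead of failing the request
    with pg_conn.cursor() as cur:
        cur.execute("SELECT response FROM llm_cache WHERE key = %s", (key,))
        row = cur.fetchone()
    return row[0] if row else None
```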

Multi-Agent Handoff Safeguards

Source: EnterpriseHub (services/jorge/jorge_handoff_service.py)

| Safeguard | Parameter | Purpose |
|---|---|---|
| Confidence threshold | 0.7 | Minimum score to trigger a handoff (tested against 200+ transcripts) |
| Circular prevention | 30-min window | Same source→target blocked within the window |
| Rate limiting | 3/hr, 10/day | Per-contact caps prevent abuse |
| Concurrent locking | Contact-level | Prevents race conditions between bots |
| Pattern learning | Min 10 data points | Dynamic threshold adjustment from outcome history |
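
Composed into a single gate, the first three safeguards might look like the sketch below. The parameter values come from the table; the function shape and contact-history format are hypothetical, and contact-level locking and pattern learning are omitted.

```python
import time

MIN_CONFIDENCE = 0.7           # minimum score to trigger a handoff
HANDOFF_WINDOW_S = 30 * 60     # circular-prevention window
HOURLY_CAP, DAILY_CAP = 3, 10  # per-contact rate limits

def may_handoff(confidence: float, recent: list, source: str, target: str) -> bool:
    """recent: list of (timestamp, source, target) handoffs for one contact."""
    now = time.time()
    if confidence < MIN_CONFIDENCE:
        return False
    # Circular prevention: block a repeat source->target pair inside the window
    if any((s, t) == (source, target) and now - ts < HANDOFF_WINDOW_S
           for ts, s, t in recent):
        return False
    # Rate limiting: per-contact hourly and daily caps
    last_hour = sum(1 for ts, _, _ in recent if now - ts < 3600)
    last_day = sum(1 for ts, _, _ in recent if now - ts < 86400)
    return last_hour < HOURLY_CAP and last_day < DAILY_CAP
```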

Test Coverage Across Repos

| Repository | Tests | CI | Python Versions |
|---|---|---|---|
| EnterpriseHub | 4,992 | Passing | 3.11 |
| Insight Engine | 521 | Passing | 3.11, 3.12 |
| DocQA Engine | 501 | Passing | 3.11, 3.12 |
| AgentForge | 423 | Passing | 3.11 |
| Scrape-and-Serve | 302 | Passing | 3.11, 3.12 |
| Jorge Bots | 279 | Passing | 3.11 |
| Revenue-Sprint | 240 | Passing | 3.10, 3.11, 3.12 |
| LLM Integration Starter | 220 | Passing | 3.11, 3.12 |
| Prompt Engineering Lab | 190 | Passing | 3.11, 3.12 |
| MCP Toolkit | 184 | Passing | 3.11, 3.12 |
| Total | 7,852 | All Green | |

Production Monitoring

Source: EnterpriseHub (services/jorge/alerting_service.py, services/jorge/bot_metrics_collector.py)

7 Default Alert Rules

  • High error rate (>5%)
  • P95 latency exceeds SLA
  • Cache hit rate drops below 80%
  • Handoff failure rate >10%
  • Token consumption spike
  • Bot health check failure
  • Queue depth threshold

Each rule supports a configurable cooldown, as sketched below.
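
These rules map naturally onto threshold checks with a per-rule cooldown. Below is a hedged sketch of that shape with four of the seven rules; the actual schema in alerting_service.py may differ.

```python
from dataclasses import dataclass

@dataclass
class AlertRule:
    name: str
    metric: str
    threshold: float
    direction: str         # fire when the metric goes "above" or "below" the threshold
    cooldown_s: int = 300  # configurable per rule

    def fires(self, value: float) -> bool:
        return value > self.threshold if self.direction == "above" else value < self.threshold

RULES = [
    AlertRule("high_error_rate", "error_rate", 0.05, "above"),
    AlertRule("p95_latency_sla", "p95_latency_ms", 2_000, "above"),  # Lead Bot SLA
    AlertRule("low_cache_hit_rate", "cache_hit_rate", 0.80, "below"),
    AlertRule("handoff_failure_rate", "handoff_failure_rate", 0.10, "above"),
]
```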

Per-Bot Metrics

  • Response count & throughput
  • Latency percentiles (P50/P95/P99)
  • Cache hit rate per bot
  • Error rate with categorization
  • Handoff success/failure ratio
  • Token usage per interaction
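
As a data structure, a per-bot snapshot covering these fields might look like the following; the names mirror the list above, not bot_metrics_collector.py's actual fields.

```python
from dataclasses import dataclass, field

@dataclass
class BotMetrics:
    """Hypothetical per-bot snapshot; field names are illustrative."""
    bot_name: str
    response_count: int = 0
    p50_ms: float = 0.0
    p95_ms: float = 0.0
    p99_ms: float = 0.0
    cache_hit_rate: float = 0.0
    errors_by_category: dict = field(default_factory=dict)  # e.g. {"timeout": 3}
    handoff_success: int = 0
    handoff_failure: int = 0
    tokens_per_interaction: float = 0.0
```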