QA Automation Engineer

9,900+ Tests. All TDD-First.

15 production repositories built test-first from day one, using pytest, GitHub Actions CI/CD, and LLM evaluation frameworks with RAGAS scoring and adversarial fixtures. Zero test failures across a 3-month production run.

9,900+ Tests · 15 Production Repos · 87-92% Coverage · TDD-First · Playwright (37 tests) · Pact Contract Testing · Locust Load Testing · LLM Evaluation Frameworks

What I Test

Three pillars of test engineering across every project.

TDD-First Methodology

Every test written before the implementation it validates. Zero tests retroactively added. Coverage thresholds enforced from project inception, not bolted on after launch.

LLM Quality Infrastructure

Golden eval sets with adversarial fixtures. RAGAS scoring pipelines. Brier score confidence calibration. CI gates that block accuracy regressions before merge.
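
As a minimal sketch of the Brier score calibration check mentioned above: the score is the mean squared difference between a model's stated confidence and the 0/1 outcome, so lower is better. The function names, the baseline value, and the tolerance are illustrative assumptions, not code from the repositories.

```python
from typing import Sequence


def brier_score(confidences: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted confidence and the 0/1 outcome."""
    if len(confidences) != len(outcomes) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    total = sum((p - y) ** 2 for p, y in zip(confidences, outcomes))
    return total / len(confidences)


def passes_calibration_gate(score: float, baseline: float, tolerance: float = 0.01) -> bool:
    """Hypothetical CI gate: fail when the score regresses past baseline + tolerance."""
    return score <= baseline + tolerance
```

A CI job could compute the score over a golden eval set and exit non-zero when `passes_calibration_gate` returns `False`, which is one way to block a regression before merge.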

Production Security Gates

SAST scanning (bandit), dependency CVE auditing (pip-audit), strict type checking (mypy), and coverage enforcement on every commit. Merge blocked on any failure.

Test Automation Portfolio

Multi-Agent Platform · Full CI Gate

EnterpriseHub — 7,678 Tests

1,100+ tests CI-verified

pytest · pytest-asyncio · mypy strict · bandit · pip-audit · GitHub Actions

View on GitHub →

LLM Evaluation · Niche Differentiator

DocExtract AI — 1,183 Tests, 87%+ Coverage

94.6% accuracy

pytest · RAGAS · LLM-as-judge · Brier score calibration

Production Client System · Zero Failures

Jorge Real Estate AI — 1,824 Tests

0 failures in production

pytest · pytest-asyncio · Pact v3 · Locust · GitHub Actions

Private client project (code available on request)

E2E · Visual Regression · Performance

Jewkes Consulting — 37 Playwright Tests

TypeScript · Next.js 14

Playwright · TypeScript · 4 viewport breakpoints

View on GitHub →

Cross-Stack Test Architecture

Finance Analytics Portfolio — 1,611 Tests

4 test layers

pytest · dbt tests · Vitest · API integration tests

View on GitHub →

Open Source Test Contributions

Writing tests for codebases I didn't build — the clearest signal of test-first thinking.

PR #24551 · Open · 27K+ stars

LiteLLM

Typed exception mapping for BaseLLMHTTPHandler._handle_error on Anthropic messages API. 6 tests covering RateLimitError, ContextWindowExceededError, AuthenticationError, InternalServerError, no-model backward compatibility, and already-typed pass-through. Eliminates silent failure in Router fallback chains.
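
The idea behind typed exception mapping can be sketched roughly as follows. This is not the code from the PR and not LiteLLM's API: the exception classes are local stand-ins that only borrow the names listed above, and both the status-code table and the "context" message check are assumptions made for illustration.

```python
class LLMError(Exception):
    """Stand-in base class (hypothetical, defined here only for the sketch)."""

    def __init__(self, message: str, status_code: int):
        super().__init__(message)
        self.status_code = status_code


class AuthenticationError(LLMError): ...
class RateLimitError(LLMError): ...
class ContextWindowExceededError(LLMError): ...
class InternalServerError(LLMError): ...


# Assumed mapping from HTTP status to a typed error.
_STATUS_TO_ERROR = {
    401: AuthenticationError,
    429: RateLimitError,
    500: InternalServerError,
}


def to_typed_error(error: Exception, status_code: int) -> LLMError:
    """Convert a raw error into a typed one; already-typed errors pass through."""
    if isinstance(error, LLMError):
        return error
    message = str(error)
    # Assumption: a 400 whose message mentions "context" means the prompt was too long.
    if status_code == 400 and "context" in message.lower():
        return ContextWindowExceededError(message, status_code)
    error_cls = _STATUS_TO_ERROR.get(status_code, LLMError)
    return error_cls(message, status_code)
```

With typed errors, a fallback chain can branch on the class (for example, retry elsewhere on `RateLimitError`) instead of treating every failure as the same generic exception.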

View PR #24551 →

CI/CD & Methodology

CI Pipeline (every commit)

  • SAST: bandit security scanning, blocks merge on findings
  • CVE scan: pip-audit for dependency vulnerabilities
  • Type safety: mypy strict mode across all modules
  • Linting: ruff for style and correctness
  • Coverage: 87-92% enforced per repo
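
The gate sequence above could be driven by a small script along these lines. It is a sketch under assumptions: the `src` path, the 87% floor (taken as the low end of the stated range), and the use of pytest-cov's `--cov-fail-under` flag are illustrative choices, and the real pipelines run in GitHub Actions rather than through this helper.

```python
import subprocess
from typing import Callable, Sequence

# Each gate is (label, command). Paths and thresholds are assumed values.
GATES: list[tuple[str, list[str]]] = [
    ("SAST", ["bandit", "-r", "src"]),
    ("CVE scan", ["pip-audit"]),
    ("Type safety", ["mypy", "--strict", "src"]),
    ("Linting", ["ruff", "check", "src"]),
    ("Coverage", ["pytest", "--cov=src", "--cov-fail-under=87"]),
]


def _run(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code without raising on failure."""
    return subprocess.run(cmd, check=False).returncode


def run_gates(runner: Callable[[Sequence[str]], int] = _run) -> list[str]:
    """Run every gate and return the labels of the ones that failed."""
    return [label for label, cmd in GATES if runner(cmd) != 0]
```

A caller would exit non-zero when `run_gates()` returns a non-empty list, which is what blocks the merge. The `runner` parameter exists so the gate logic itself can be tested without the tools installed.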

Test Methodology

  • TDD-first: Failing test written before any implementation code. Zero tests written retroactively.
  • Test pyramid: Unit + integration + E2E + API contract + consumer-driven contract (Pact) layers per project
  • Load testing: Locust — p50/p95/p99 latency, req/s, error rate, concurrency under load
  • Contract testing: Pact v3 consumer contracts on third-party API boundaries; pact JSON versioned in repo
  • Adversarial fixtures: Contradictory inputs, truncated data, multi-language edge cases
  • LLM eval: RAGAS scoring, LLM-as-judge rubrics, Brier score calibration
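
To make the load-testing metrics concrete, here is a rough sketch of how p50/p95/p99 latency, req/s, and error rate can be derived from raw request results. Locust reports these figures itself; the function names and the nearest-rank percentile method below are assumptions chosen for clarity, not Locust's implementation.

```python
import math
from typing import Sequence, Tuple


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(results: Sequence[Tuple[float, bool]], duration_s: float) -> dict:
    """Summarize (latency_ms, succeeded) pairs collected over duration_s seconds."""
    latencies = [latency for latency, _ in results]
    failures = sum(1 for _, ok in results if not ok)
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "req_per_s": len(results) / duration_s,
        "error_rate": failures / len(results),
    }
```

Tail percentiles matter because a healthy median can hide a slow p99; asserting on all three in CI catches regressions that an average would smooth over.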

Open to QA Roles — Remote

AI companies and SaaS companies with LLM products are a particularly strong fit (direct experience designing LLM evaluation frameworks). Open to: QA Automation Engineer, SDET, Software Engineer in Test, Test Automation Engineer.

US-based (Cathedral City, CA) · Canadian citizen, no sponsorship required · $35-50/hr or $70-90K salaried

caymanroden@gmail.com