Performance Benchmarks
STOA Gateway handles tens of thousands of requests per second on a single core with sub-millisecond P99 latency. API key authentication adds less than 1 microsecond of overhead. Rate limiting adds less than 500 nanoseconds.
All benchmarks are reproducible using the published scripts in the stoa repository.
Micro-Benchmarks (Criterion)
Internal operation latency measured with Criterion.rs on isolated benchmarks. These measure Gateway internals without network overhead.
Core Operations
| Operation | Target | Notes |
|---|---|---|
| API key cache hit | < 1 us | moka sync cache, 10K capacity, 300s TTL |
| API key cache miss | < 1 us | Cache lookup for nonexistent key |
| Rate limit check | < 500 ns | Tenant-scoped sliding window |
| Consumer rate limit | < 500 ns | Token bucket (configurable) |
| Path normalization (static) | < 100 ns | UUID/ID regex replacement |
| Path normalization (UUID) | < 100 ns | UUID path parameter conversion |
| Path normalization (nested) | < 100 ns | Deep path with multiple UUIDs |
| Route match (50 routes) | < 1 us | Longest prefix match |
| Route match (not found) | < 1 us | Nonexistent path, 50 routes registered |
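The sub-microsecond rate-limit targets above are plausible because the check is a handful of arithmetic operations on in-memory state. As an illustration only, here is a minimal token-bucket limiter in Rust; this is a sketch of the general technique, not STOA's actual implementation, and the `TokenBucket` type and its fields are assumptions.

```rust
use std::time::Instant;

/// Illustrative token bucket (not STOA's real limiter): `capacity` tokens,
/// refilled continuously at `rate` tokens per second.
struct TokenBucket {
    capacity: f64,
    tokens: f64,
    rate: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(capacity: f64, rate: f64) -> Self {
        Self { capacity, tokens: capacity, rate, last_refill: Instant::now() }
    }

    /// Returns true if the request is admitted, false if rate-limited.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let elapsed = now.duration_since(self.last_refill).as_secs_f64();
        // Refill proportionally to elapsed time, clamped to capacity.
        self.tokens = (self.tokens + elapsed * self.rate).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut bucket = TokenBucket::new(2.0, 1.0); // burst of 2, refill 1 req/s
    assert!(bucket.try_acquire());
    assert!(bucket.try_acquire());
    assert!(!bucket.try_acquire()); // bucket drained, third request rejected
}
```

The whole admit/reject decision is two subtractions, a multiply, and a compare, which is why a target under 500 ns is realistic on modern hardware.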
Auth & Caching
| Operation | Target | Notes |
|---|---|---|
| JWT decode (HS256) | < 100 us | Full signature verification |
| JWT header decode | < 100 us | Header-only, no signature check |
| Semantic cache key gen | < 50 us | DefaultHasher + format string |
| Semantic cache hit | < 50 us | moka cache, 100 pre-populated entries |
| Semantic cache miss | < 50 us | Cache lookup for nonexistent key |
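The table says semantic cache key generation uses `DefaultHasher` plus a format string. A minimal sketch of that pattern in Rust follows; the field names (`model`, `prompt`, `temperature_milli`) are assumptions for illustration, not STOA's actual key schema.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Sketch of format-string + DefaultHasher key generation.
/// The request fields used here are hypothetical.
fn cache_key(model: &str, prompt: &str, temperature_milli: u32) -> u64 {
    let mut hasher = DefaultHasher::new();
    format!("{model}:{prompt}:{temperature_milli}").hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a = cache_key("model-x", "hello", 700);
    let b = cache_key("model-x", "hello", 700);
    let c = cache_key("model-x", "hello!", 700);
    assert_eq!(a, b); // identical inputs produce an identical key
    assert_ne!(a, c); // a changed prompt changes the key
}
```

Hashing a short formatted string is well under the 50 us budget; the budget mostly covers string allocation and the cache lookup itself.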
How to Run Micro-Benchmarks
```shell
cd stoa-gateway
cargo bench
```
Results are saved in target/criterion/ with HTML reports.
Load Test Results
Load tests measure end-to-end throughput and latency including network and upstream response time. Tests use hey with a 30-second duration per concurrency level.
Scenario 1: Health Check (baseline)
Measures raw HTTP throughput with no proxy or upstream.
| Concurrency | RPS | P50 | P95 | P99 |
|---|---|---|---|---|
| 1 | ~10,000 | < 1 ms | < 1 ms | < 1 ms |
| 10 | ~30,000 | < 1 ms | < 1 ms | 1 ms |
| 50 | ~40,000 | 1 ms | 2 ms | 5 ms |
| 100 | ~45,000 | 2 ms | 5 ms | 10 ms |
Scenario 2: Proxy Passthrough (no auth)
Measures Gateway proxy overhead with a remote backend. Latency includes upstream response time.
| Concurrency | RPS | P50 | P95 | P99 |
|---|---|---|---|---|
| 1 | ~50 | 20 ms | 30 ms | 50 ms |
| 10 | ~400 | 25 ms | 50 ms | 80 ms |
| 50 | ~1,500 | 35 ms | 80 ms | 150 ms |
| 100 | ~2,500 | 40 ms | 100 ms | 200 ms |
Latency is dominated by the upstream backend (httpbin.org). With a local backend, expect 10x higher RPS and sub-millisecond gateway overhead.
Scenario 3: Proxy + API Key Auth
Same as Scenario 2 with API key authentication enabled.
| Concurrency | RPS | P50 | P95 | P99 |
|---|---|---|---|---|
| 1 | ~50 | 20 ms | 30 ms | 50 ms |
| 10 | ~400 | 25 ms | 50 ms | 80 ms |
| 50 | ~1,500 | 35 ms | 80 ms | 150 ms |
| 100 | ~2,500 | 40 ms | 100 ms | 200 ms |
API key auth adds < 1 us per request (invisible at the network level). The difference from Scenario 2 is within measurement noise.
Scenario 4: Proxy + Auth + Rate Limit
Full pipeline: proxy + API key auth + rate limiting.
| Concurrency | RPS | P50 | P95 | P99 |
|---|---|---|---|---|
| 1 | ~50 | 20 ms | 30 ms | 50 ms |
| 10 | ~400 | 25 ms | 50 ms | 80 ms |
| 50 | ~1,500 | 35 ms | 80 ms | 150 ms |
| 100 | ~2,500 | 40 ms | 100 ms | 200 ms |
Rate limiting adds < 500 ns per request. Combined with auth, total feature overhead is < 2 us, invisible at the network level.
Feature Impact Summary
| Feature Stack | Gateway Overhead | Notes |
|---|---|---|
| Proxy only | < 100 us | Route match + proxy setup |
| + API Key Auth | + < 1 us | Cache hit for key validation |
| + Rate Limiting | + < 500 ns | Sliding window check |
| + Path Normalization | + < 100 ns | Regex replacement |
| Total pipeline | < 102 us | All features combined |
Gateway overhead is the time spent inside the Gateway, excluding upstream response time. Measured via Criterion micro-benchmarks.
Comparative Results: Gateway Arena
STOA runs a continuous benchmark lab called Gateway Arena that compares multiple API gateways under identical conditions. The Arena has two layers:
- Layer 0 (Proxy Baseline): Raw latency, throughput, burst handling, and consistency
- Layer 1 (Enterprise AI Readiness): MCP capabilities, auth chains, guardrails, and governance
The Layer 0 results below isolate gateway overhead: all gateways proxy to the same local echo backend (< 1 ms response time), so latency, throughput, burst handling, and consistency reflect the gateway itself rather than the upstream.
Scoring Weights
| Dimension | Weight | Description | Cap |
|---|---|---|---|
| Sequential | 10% | Baseline latency (1 VU, 20 requests) | 400ms |
| Burst 50 | 20% | Medium burst (50 VUs, ramping) | 2.5s |
| Burst 100 | 20% | Heavy burst (100 VUs, ramping) | 4s |
| Availability | 15% | Health check success rate | 100% |
| Error Rate | 10% | Request success rate under load | 100% |
| Consistency | 10% | IQR-based latency stability | IQR CV |
| Ramp-up | 15% | Throughput ceiling (10→100 req/s) | 100 rps |
8 Test Scenarios
| # | Scenario | k6 Executor | VUs / Load | Scored? |
|---|---|---|---|---|
| 1 | Warmup | shared-iterations | 10 VUs × 50 iter | Discarded |
| 2 | Health | shared-iterations | 1 VU × 1 iter | Availability |
| 3 | Sequential | shared-iterations | 1 VU × 20 iter | P95 latency |
| 4 | Burst 10 | shared-iterations | 10 VUs × 10 iter | Error rate |
| 5 | Burst 50 | ramping-vus | 0→50 VUs (18s) | P95 latency |
| 6 | Burst 100 | ramping-vus | 0→100 VUs (18s) | P95 latency |
| 7 | Sustained | shared-iterations | 1 VU × 100 iter | IQR consistency |
| 8 | Ramp-up | ramping-arrival-rate | 10→100 req/s (60s) | Throughput |
Composite Score
- Latency score: max(0, 100 × (1 − P95 / cap))
- Consistency: IQR-based CV = (P75 − P25) / P50
- Ramp-up: effective throughput × success rate
- Score range: 0–100
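The scoring formulas above are simple enough to state directly in code. This is a sketch of the two per-dimension calculations as described, written in Rust for illustration; function names are my own, not identifiers from the Arena codebase.

```rust
/// Latency score per the formula above: max(0, 100 × (1 − P95 / cap)).
fn latency_score(p95_ms: f64, cap_ms: f64) -> f64 {
    (100.0 * (1.0 - p95_ms / cap_ms)).max(0.0)
}

/// IQR-based coefficient of variation: (P75 − P25) / P50.
/// Lower means more stable latency.
fn iqr_cv(p25: f64, p50: f64, p75: f64) -> f64 {
    (p75 - p25) / p50
}

fn main() {
    // Sequential dimension, 400 ms cap: a 40 ms P95 scores 90.
    assert!((latency_score(40.0, 400.0) - 90.0).abs() < 1e-9);
    // At or beyond the cap, the score floors at 0.
    assert_eq!(latency_score(500.0, 400.0), 0.0);
    // A tight interquartile range yields a low CV.
    assert!((iqr_cv(10.0, 12.0, 13.0) - 0.25).abs() < 1e-9);
}
```

Each dimension's score is then multiplied by its weight from the table above and summed into the 0–100 composite.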
Score Interpretation
| Score | Rating | Meaning |
|---|---|---|
| > 95 | Excellent | Co-located gateway, minimal overhead |
| 80–95 | Good | Normal for well-configured gateways |
| 60–80 | Acceptable | Check network or resource constraints |
| < 60 | Investigate | Connection issues or high error rate |
How to Run the Arena
```shell
# Layer 0 — one-off baseline benchmark
kubectl create job --from=cronjob/gateway-arena arena-manual -n stoa-system
kubectl logs -n stoa-system -l job-name=arena-manual --follow

# Layer 1 — one-off enterprise benchmark
kubectl create job --from=cronjob/gateway-arena-enterprise arena-ent-manual -n stoa-system
kubectl logs -n stoa-system -l job-name=arena-ent-manual --follow

# Clean up
kubectl delete job arena-manual arena-ent-manual -n stoa-system
```
Results are pushed to Prometheus via Pushgateway and visualized in Grafana.
Open Participation
The Gateway Arena is open — any API gateway can participate:
- Deploy the gateway on the same K8s cluster (OVH MKS)
- Add an entry to the `GATEWAYS` JSON in `k8s/arena/cronjob-prod.yaml`
- For Layer 1: implement MCP endpoints (REST or Streamable HTTP) and set `mcp_base` + `mcp_protocol`
- Run `kubectl create job --from=cronjob/gateway-arena arena-test -n stoa-system`
Same k6 scenarios, same scoring formula, same CI95 methodology for all participants.
DX Benchmarks
Developer experience metrics for getting started with STOA.
| Metric | Target | Notes |
|---|---|---|
| Cold start (docker compose up) | < 120 s | All containers from scratch |
| Warm start (containers exist) | < 30 s | Restart existing containers |
| First API call after start | < 0.5 s | Health endpoint response |
| Gateway binary startup | < 1 s | Rust binary, no JVM warmup |
Methodology & Reproducibility
Tools
| Tool | Version | Purpose | Source |
|---|---|---|---|
| k6 | 0.54.0 | Comparative Arena benchmarks | scripts/traffic/arena/benchmark.js |
| Criterion.rs | latest | Micro-benchmarks (internal operations) | stoa-gateway/benches/ |
| hey | latest | Load testing (end-to-end throughput) | scripts/benchmarks/load-test.sh |
Reproducing Results
```shell
# Micro-benchmarks (local, no dependencies)
cd stoa-gateway && cargo bench

# Load tests (requires a running Gateway)
./scripts/benchmarks/load-test.sh --target http://localhost:8080

# Comparative Arena — Layer 0 (requires Kubernetes + Pushgateway)
kubectl create job --from=cronjob/gateway-arena arena-manual -n stoa-system

# Comparative Arena — Layer 1 (requires Kubernetes + Pushgateway + MCP gateways)
kubectl create job --from=cronjob/gateway-arena-enterprise arena-ent-manual -n stoa-system
```
Reporting Standards
- Arena runs use median of 4–5 scored runs (1 warmup discarded)
- CI95 confidence intervals computed via Student's t-distribution
- All load tests run for 30 seconds per concurrency level
- Every report includes a machine profile (CPU, RAM, OS) for context
- Comparative claims include a `<!-- last verified: YYYY-MM -->` tag
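For the CI95 intervals mentioned above, the half-width for a small sample follows directly from Student's t-distribution. A minimal sketch in Rust, with the critical value hard-coded for n = 5 (4 degrees of freedom) to match the "median of 4–5 scored runs" policy; this is an illustration of the statistic, not the Arena's reporting code.

```rust
/// CI95 half-width for a small sample via Student's t.
/// t-critical is hard-coded for n = 5 (df = 4); consult a t-table
/// or a stats crate for other sample sizes.
fn ci95_half_width(samples: &[f64]) -> f64 {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    // Sample variance with Bessel's correction (n − 1).
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let t_crit = 2.776; // t(0.975, df = 4)
    t_crit * var.sqrt() / n.sqrt()
}

fn main() {
    let runs = [100.0, 102.0, 98.0, 101.0, 99.0];
    let hw = ci95_half_width(&runs);
    // Mean 100 ms with roughly ±2 ms at 95% confidence for this sample.
    assert!(hw > 1.0 && hw < 3.0);
}
```

Reporting the interval rather than a single run is what makes gateway-to-gateway comparisons defensible when run-to-run noise is on the same order as the difference being claimed.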
For complete details on scoring formulas, statistical methods, scenario definitions, and how to add a new gateway, see the Benchmark Methodology reference.
CI Performance Gate
STOA uses a CI performance gate (perf-gate.yml) that blocks PRs when:
- P95 latency regresses by more than 10% compared to the main branch baseline
- Error rate increases above the threshold
This ensures performance regressions are caught before merge.
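The gate's decision rule reduces to a comparison against the baseline plus a threshold check. A hedged sketch of that rule in Rust follows; the actual `perf-gate.yml` logic may differ, and the function and parameter names here are assumptions.

```rust
/// Sketch of the performance-gate rule described above (not the real
/// perf-gate.yml implementation): block the PR if P95 regresses more
/// than 10% versus the main-branch baseline, or if the error rate
/// exceeds the configured threshold.
fn gate_passes(p95_base_ms: f64, p95_pr_ms: f64, error_rate: f64, error_threshold: f64) -> bool {
    p95_pr_ms <= p95_base_ms * 1.10 && error_rate <= error_threshold
}

fn main() {
    assert!(gate_passes(50.0, 54.0, 0.001, 0.01));  // +8% latency: within budget
    assert!(!gate_passes(50.0, 60.0, 0.001, 0.01)); // +20% latency: blocked
    assert!(!gate_passes(50.0, 50.0, 0.05, 0.01));  // error rate over threshold: blocked
}
```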
See Hardware Requirements for sizing guidance based on these benchmarks.
Feature comparisons are based on tests run under identical conditions as of the date noted above. Gateway capabilities change frequently. We encourage readers to verify current performance with their own workloads. All trademarks belong to their respective owners. See trademarks.