
Performance Benchmarks

STOA Gateway handles tens of thousands of requests per second on a single core with sub-millisecond P99 latency. API key authentication adds less than 1 microsecond of overhead. Rate limiting adds less than 500 nanoseconds.

All benchmarks are reproducible using the published scripts in the stoa repository.

Micro-Benchmarks (Criterion)

Internal operation latency, measured in isolation with Criterion.rs. These benchmarks exercise Gateway internals only, with no network overhead.

Core Operations

| Operation | Target | Notes |
|---|---|---|
| API key cache hit | < 1 us | moka sync cache, 10K capacity, 300 s TTL |
| API key cache miss | < 1 us | Cache lookup for a nonexistent key |
| Rate limit check | < 500 ns | Tenant-scoped sliding window |
| Consumer rate limit | < 500 ns | Token bucket (configurable) |
| Path normalization (static) | < 100 ns | UUID/ID regex replacement |
| Path normalization (UUID) | < 100 ns | UUID path parameter conversion |
| Path normalization (nested) | < 100 ns | Deep path with multiple UUIDs |
| Route match (50 routes) | < 1 us | Longest-prefix match |
| Route match (not found) | < 1 us | Nonexistent path, 50 routes registered |
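The path-normalization rows above come down to regex replacement of variable path segments. A minimal sketch of the idea in Python (the Gateway itself is Rust; the `{uuid}` and `{id}` placeholder tokens and the exact patterns are assumptions for illustration):

```python
import re

# RFC 4122 UUID shape: 8-4-4-4-12 hex digits.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
# Purely numeric path segments are treated as opaque IDs.
ID_RE = re.compile(r"/\d+(?=/|$)")

def normalize_path(path: str) -> str:
    """Collapse UUID and numeric-ID segments into placeholder tokens."""
    path = UUID_RE.sub("{uuid}", path)
    path = ID_RE.sub("/{id}", path)
    return path
```

Because the work is a couple of precompiled-regex substitutions, a sub-100 ns target in optimized Rust is plausible for short paths.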

Auth & Caching

| Operation | Target | Notes |
|---|---|---|
| JWT decode (HS256) | < 100 us | Full signature verification |
| JWT header decode | < 100 us | Header-only, no signature check |
| Semantic cache key gen | < 50 us | DefaultHasher + format string |
| Semantic cache hit | < 50 us | moka cache, 100 pre-populated entries |
| Semantic cache miss | < 50 us | Cache lookup for a nonexistent key |
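Semantic cache key generation above is a hash over a format string of the request's semantic fields. A hedged Python sketch of the idea (the Gateway uses Rust's `DefaultHasher`; the field names and separator here are illustrative assumptions, and SHA-256 stands in for the hasher):

```python
import hashlib

def semantic_cache_key(model: str, prompt: str, temperature: float) -> str:
    """Derive a deterministic cache key from the request's semantic fields."""
    payload = f"{model}|{temperature}|{prompt}"  # format string over the fields
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Identical requests hash to the same key, so a repeat lookup is a pure cache hit; any change to model, prompt, or temperature produces a different key.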

How to Run Micro-Benchmarks

```shell
cd stoa-gateway
cargo bench
```

Results are saved in target/criterion/ with HTML reports.

Load Test Results

Load tests measure end-to-end throughput and latency including network and upstream response time. Tests use hey with a 30-second duration per concurrency level.

Scenario 1: Health Check (baseline)

Measures raw HTTP throughput with no proxy or upstream.

| Concurrency | RPS | P50 | P95 | P99 |
|---|---|---|---|---|
| 1 | ~10,000 | < 1 ms | < 1 ms | < 1 ms |
| 10 | ~30,000 | < 1 ms | < 1 ms | 1 ms |
| 50 | ~40,000 | 1 ms | 2 ms | 5 ms |
| 100 | ~45,000 | 2 ms | 5 ms | 10 ms |

Scenario 2: Proxy Passthrough (no auth)

Measures Gateway proxy overhead with a remote backend. Latency includes upstream response time.

| Concurrency | RPS | P50 | P95 | P99 |
|---|---|---|---|---|
| 1 | ~50 | 20 ms | 30 ms | 50 ms |
| 10 | ~400 | 25 ms | 50 ms | 80 ms |
| 50 | ~1,500 | 35 ms | 80 ms | 150 ms |
| 100 | ~2,500 | 40 ms | 100 ms | 200 ms |

Latency is dominated by the upstream backend (httpbin.org). With a local backend, expect 10x higher RPS and sub-millisecond gateway overhead.

Scenario 3: Proxy + API Key Auth

Same as Scenario 2 with API key authentication enabled.

| Concurrency | RPS | P50 | P95 | P99 |
|---|---|---|---|---|
| 1 | ~50 | 20 ms | 30 ms | 50 ms |
| 10 | ~400 | 25 ms | 50 ms | 80 ms |
| 50 | ~1,500 | 35 ms | 80 ms | 150 ms |
| 100 | ~2,500 | 40 ms | 100 ms | 200 ms |

API key auth adds < 1 us per request (invisible at the network level). The difference from Scenario 2 is within measurement noise.

Scenario 4: Proxy + Auth + Rate Limit

Full pipeline: proxy + API key auth + rate limiting.

| Concurrency | RPS | P50 | P95 | P99 |
|---|---|---|---|---|
| 1 | ~50 | 20 ms | 30 ms | 50 ms |
| 10 | ~400 | 25 ms | 50 ms | 80 ms |
| 50 | ~1,500 | 35 ms | 80 ms | 150 ms |
| 100 | ~2,500 | 40 ms | 100 ms | 200 ms |

Rate limiting adds < 500 ns per request. Combined with auth, total feature overhead is < 2 us, invisible at the network level.

Feature Impact Summary

| Feature Stack | Gateway Overhead | Notes |
|---|---|---|
| Proxy only | < 100 us | Route match + proxy setup |
| + API Key Auth | + < 1 us | Cache hit for key validation |
| + Rate Limiting | + < 500 ns | Sliding window check |
| + Path Normalization | + < 100 ns | Regex replacement |
| Total pipeline | < 102 us | All features combined |

Gateway overhead is the time spent inside the Gateway, excluding upstream response time. Measured via Criterion micro-benchmarks.

Comparative Results: Gateway Arena

STOA runs a continuous benchmark lab called Gateway Arena that compares multiple API gateways under identical conditions. The Arena has two layers:

  • Layer 0 (Proxy Baseline): Raw latency, throughput, burst handling, and consistency
  • Layer 1 (Enterprise AI Readiness): MCP capabilities, auth chains, guardrails, and governance

Layer 0: Proxy Baseline

Layer 0 measures raw proxy performance: latency, throughput, burst handling, and consistency. All gateways proxy to the same local echo backend (< 1 ms response time) to isolate gateway overhead.

Scoring Weights

| Dimension | Weight | Description | Cap |
|---|---|---|---|
| Sequential | 10% | Baseline latency (1 VU, 20 requests) | 400 ms |
| Burst 50 | 20% | Medium burst (50 VUs, ramping) | 2.5 s |
| Burst 100 | 20% | Heavy burst (100 VUs, ramping) | 4 s |
| Availability | 15% | Health check success rate | 100% |
| Error Rate | 10% | Request success rate under load | 100% |
| Consistency | 10% | IQR-based latency stability | IQR CV |
| Ramp-up | 15% | Throughput ceiling (10→100 req/s) | 100 rps |

Test Scenarios (1 warmup + 7 scored)

| # | Scenario | k6 Executor | VUs / Load | Scored? |
|---|---|---|---|---|
| 1 | Warmup | shared-iterations | 10 VUs × 50 iter | Discarded |
| 2 | Health | shared-iterations | 1 VU × 1 iter | Availability |
| 3 | Sequential | shared-iterations | 1 VU × 20 iter | P95 latency |
| 4 | Burst 10 | shared-iterations | 10 VUs × 10 iter | Error rate |
| 5 | Burst 50 | ramping-vus | 0→50 VUs (18 s) | P95 latency |
| 6 | Burst 100 | ramping-vus | 0→100 VUs (18 s) | P95 latency |
| 7 | Sustained | shared-iterations | 1 VU × 100 iter | IQR consistency |
| 8 | Ramp-up | ramping-arrival-rate | 10→100 req/s (60 s) | Throughput |

Composite Score

```
Score = 0.10×Sequential + 0.20×Burst50 + 0.20×Burst100
      + 0.15×Availability + 0.10×ErrorRate
      + 0.10×Consistency + 0.15×Ramp-up
```
  • Latency score: max(0, 100 × (1 − P95 / cap))
  • Consistency: IQR-based CV = (P75 − P25) / P50
  • Ramp-up: effective throughput × success rate
  • Score range: 0–100
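The scoring rules above can be transcribed mechanically. An illustrative Python version (weights and caps copied from the tables above; the Arena's own implementation may differ in detail):

```python
def latency_score(p95_ms: float, cap_ms: float) -> float:
    """max(0, 100 × (1 − P95 / cap)): higher is better, 0 at or past the cap."""
    return max(0.0, 100.0 * (1.0 - p95_ms / cap_ms))

def iqr_cv(p25: float, p50: float, p75: float) -> float:
    """IQR-based coefficient of variation used for the Consistency dimension."""
    return (p75 - p25) / p50

def composite(sequential, burst50, burst100, availability,
              error_rate, consistency, ramp_up) -> float:
    """Weighted sum of the seven dimension scores (each 0–100)."""
    return (0.10 * sequential + 0.20 * burst50 + 0.20 * burst100
            + 0.15 * availability + 0.10 * error_rate
            + 0.10 * consistency + 0.15 * ramp_up)
```

For example, a sequential P95 of 100 ms against the 400 ms cap scores 75; the seven weights sum to 1.0, so a gateway scoring 100 on every dimension gets a composite of 100.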

Layer 1: Enterprise AI Readiness

Layer 1 measures enterprise AI readiness across 8 dimensions. Gateways without MCP support score 0 on MCP-dependent dimensions (not N/A). The spec is open: any gateway can implement it and re-run the benchmark.

Participating Gateways

STOA Gateway

Stack: Rust + Tokio
MCP: Yes
License: Apache 2.0

Kong

Stack: Lua + Nginx
MCP: No (OSS)
License: Apache 2.0 (OSS)

Gravitee

Stack: Java + Vert.x
MCP: Yes
License: Apache 2.0

8 Enterprise Dimensions

| Dimension | Weight | Description | Cap |
|---|---|---|---|
| MCP Discovery | 15% | GET /mcp/capabilities | 500 ms |
| MCP Tool Exec | 20% | POST /mcp/tools/list (JSON-RPC) | 500 ms |
| Auth Chain | 15% | JWT + authenticated tool call | 1 s |
| Policy Engine | 15% | OPA policy evaluation overhead | 200 ms |
| AI Guardrails | 10% | PII detection and redaction | 1 s |
| Rate Limiting | 10% | 429 enforcement accuracy | 1 s |
| Resilience | 10% | Bad input → 4xx (not 500) | 1 s |
| Governance | 5% | Session and circuit-breaker endpoints | 2 s |

Per-Dimension Score

```
dimension    = 0.6 × availability_score + 0.4 × latency_score
availability = (passes / total) × 100
latency      = max(0, 100 × (1 − P95 / cap))
```
  • Gateways without MCP score 0 on MCP dimensions (dimensions 1–5, 7)
  • Rate limiting (dim 6) and Governance (dim 8) do not require MCP
  • Score range per dimension: 0–100
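An illustrative transcription of the per-dimension formula (Python; the Arena's own implementation may differ in detail):

```python
def dimension_score(passes: int, total: int, p95_ms: float, cap_ms: float) -> float:
    """0.6 × availability + 0.4 × latency, each on a 0–100 scale."""
    availability = (passes / total) * 100.0
    latency = max(0.0, 100.0 * (1.0 - p95_ms / cap_ms))
    return 0.6 * availability + 0.4 * latency
```

For example, all checks passing with P95 at half the cap scores 0.6·100 + 0.4·50 = 80.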

Enterprise Readiness Index

```
ERI = Σ(weight_i × dimension_i)  for all 8 dimensions
```
  • Total weight: 1.0 (100%)
  • MCP-dependent dimensions: 75% of total weight
  • Score range: 0–100
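A sketch of the ERI aggregation using the weights from the dimension table above (illustrative Python; the dictionary keys are hypothetical names, not API identifiers):

```python
# Weights copied from the 8-dimension table; they sum to 1.0.
ERI_WEIGHTS = {
    "mcp_discovery": 0.15, "mcp_tool_exec": 0.20, "auth_chain": 0.15,
    "policy_engine": 0.15, "ai_guardrails": 0.10, "rate_limiting": 0.10,
    "resilience": 0.10, "governance": 0.05,
}

def eri(dimension_scores: dict) -> float:
    """Enterprise Readiness Index: weighted sum over all 8 dimensions."""
    return sum(ERI_WEIGHTS[name] * dimension_scores[name]
               for name in ERI_WEIGHTS)
```

A gateway without MCP simply contributes 0 for the MCP-dependent entries, which is what keeps every gateway on the same 0–100 scale.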

MCP Protocol Variants

| Gateway | MCP Protocol | Endpoint Pattern |
|---|---|---|
| STOA | REST API | GET /capabilities, POST /tools/list, POST /tools/call |
| Gravitee 4.8 | Streamable HTTP (JSON-RPC 2.0) | POST /mcp with JSON-RPC body |
| Kong OSS | None (Enterprise-only plugin) | N/A |

Test Infrastructure

| Parameter | Layer 0 | Layer 1 |
|---|---|---|
| Tool | k6 v0.54.0 | k6 v0.54.0 |
| Schedule | Every 30 min | Hourly |
| Runs per gateway | 5 (discard 1st) | 3 (discard 1st) |
| Scored runs | 4 (n=4) | 2 (n=2) |
| Statistical method | Median + CI95 (t-distribution) | Median + CI95 (t-distribution) |
| Backend | Local echo server (nginx, < 1 ms) | Local echo server (nginx, < 1 ms) |
| CPU (guaranteed) | 1 core | 500m–1 core |
| Memory (guaranteed) | 512 MiB | 256–512 MiB |
| Cluster | OVH MKS (Managed K8s) | OVH MKS (Managed K8s) |

CI95 Confidence Intervals

```
CI95 = mean ± t(α/2, n−1) × (stddev / √n)

n       = number of scored runs (4 for L0, 2 for L1)
t-value = Student's t-distribution critical value
α       = 0.05 (95% confidence)
```
  • df=3 (L0): t = 3.182
  • df=1 (L1): t = 12.706
  • Wider intervals with fewer runs — by design (conservative)
  • Warmup run always discarded (JVM, cache priming)
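The interval computation is standard. A Python sketch with the two t-values above hardcoded for the Arena's run counts (sample standard deviation via `statistics.stdev`):

```python
import math
from statistics import mean, stdev

# Critical t-values for 95% confidence at the Arena's degrees of freedom.
T_CRIT = {3: 3.182, 1: 12.706}  # df = n − 1

def ci95(samples: list) -> tuple:
    """Return (mean, half_width) of the 95% confidence interval."""
    n = len(samples)
    half = T_CRIT[n - 1] * stdev(samples) / math.sqrt(n)
    return mean(samples), half
```

With only 2 scored runs (df = 1) the t-value of 12.706 blows the interval wide open, which is exactly the conservative behavior the bullets describe.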

Fairness Guarantees

  • Same backend: All gateways proxy to the same nginx echo server (static JSON, <1ms)
  • Same cluster: All K8s gateways run on OVH MKS with identical resource limits
  • Same tool: k6 v0.54.0 for all scenarios, all gateways
  • Same scoring: Identical formulas applied to all gateways — no per-gateway adjustments
  • Open methodology: All scripts are open-source in the STOA repository
  • MCP = 0 (not N/A): Gateways without MCP score 0 on MCP dimensions, maintaining a single 0–100 scale

Benchmark results are from a controlled test environment using methodology v2.0. Real-world performance depends on hardware, network, configuration, and workload. We encourage readers to reproduce these benchmarks using the published scripts. Product names and logos are trademarks of their respective owners. STOA Platform is not affiliated with or endorsed by any mentioned vendor.

Score Interpretation

| Score | Rating | Meaning |
|---|---|---|
| > 95 | Excellent | Co-located gateway, minimal overhead |
| 80–95 | Good | Normal for well-configured gateways |
| 60–80 | Acceptable | Check network or resource constraints |
| < 60 | Investigate | Connection issues or high error rate |

How to Run the Arena

```shell
# Layer 0 — one-off baseline benchmark
kubectl create job --from=cronjob/gateway-arena arena-manual -n stoa-system
kubectl logs -n stoa-system -l job-name=arena-manual --follow

# Layer 1 — one-off enterprise benchmark
kubectl create job --from=cronjob/gateway-arena-enterprise arena-ent-manual -n stoa-system
kubectl logs -n stoa-system -l job-name=arena-ent-manual --follow

# Clean up
kubectl delete job arena-manual arena-ent-manual -n stoa-system
```

Results are pushed to Prometheus via Pushgateway and visualized in Grafana.

Open Participation

The Gateway Arena is open — any API gateway can participate:

  1. Deploy the gateway on the same K8s cluster (OVH MKS)
  2. Add an entry to the GATEWAYS JSON in k8s/arena/cronjob-prod.yaml
  3. For Layer 1: implement MCP endpoints (REST or Streamable HTTP) and set mcp_base + mcp_protocol
  4. Run kubectl create job --from=cronjob/gateway-arena arena-test -n stoa-system

Same k6 scenarios, same scoring formula, same CI95 methodology for all participants.

DX Benchmarks

Developer experience metrics for getting started with STOA.

| Metric | Target | Notes |
|---|---|---|
| Cold start (docker compose up) | < 120 s | All containers from scratch |
| Warm start (containers exist) | < 30 s | Restart existing containers |
| First API call after start | < 0.5 s | Health endpoint response |
| Gateway binary startup | < 1 s | Rust binary, no JVM warmup |

Methodology & Reproducibility

Tools

| Tool | Version | Purpose | Source |
|---|---|---|---|
| k6 | 0.54.0 | Comparative Arena benchmarks | scripts/traffic/arena/benchmark.js |
| Criterion.rs | latest | Micro-benchmarks (internal operations) | stoa-gateway/benches/ |
| hey | latest | Load testing (end-to-end throughput) | scripts/benchmarks/load-test.sh |

Reproducing Results

```shell
# Micro-benchmarks (local, no dependencies)
cd stoa-gateway && cargo bench

# Load tests (requires a running Gateway)
./scripts/benchmarks/load-test.sh --target http://localhost:8080

# Comparative Arena — Layer 0 (requires Kubernetes + Pushgateway)
kubectl create job --from=cronjob/gateway-arena arena-manual -n stoa-system

# Comparative Arena — Layer 1 (requires Kubernetes + Pushgateway + MCP gateways)
kubectl create job --from=cronjob/gateway-arena-enterprise arena-ent-manual -n stoa-system
```

Reporting Standards

  • Arena runs use the median of scored runs (warmup discarded): 4 runs for Layer 0, 2 for Layer 1
  • CI95 confidence intervals computed via Student's t-distribution
  • All load tests run for 30 seconds per concurrency level
  • Every report includes a machine profile (CPU, RAM, OS) for context
  • Comparative claims include a <!-- last verified: YYYY-MM --> tag

For complete details on scoring formulas, statistical methods, scenario definitions, and how to add a new gateway, see the Benchmark Methodology reference.

CI Performance Gate

STOA uses a CI performance gate (perf-gate.yml) that blocks PRs when:

  • P95 latency regresses by more than 10% compared to the main branch baseline
  • Error rate increases above the threshold

This ensures performance regressions are caught before merge.
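The gate reduces to a relative comparison against the main-branch baseline. An illustrative Python version of the decision (the 10% figure comes from the bullets above; the default error-rate threshold is an assumption, as the doc does not state its value):

```python
def should_block(p95_main_ms: float, p95_pr_ms: float,
                 error_rate: float, error_threshold: float = 0.01) -> bool:
    """Block the PR if P95 regresses by more than 10% vs. main,
    or the error rate exceeds the configured threshold."""
    p95_regression = (p95_pr_ms - p95_main_ms) / p95_main_ms > 0.10
    return p95_regression or error_rate > error_threshold
```

For example, a PR whose P95 moves from 100 ms to 111 ms (an 11% regression) is blocked, while 109 ms passes.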

See Hardware Requirements for sizing guidance based on these benchmarks.

Feature comparisons are based on tests run under identical conditions as of the date noted above. Gateway capabilities change frequently. We encourage readers to verify current performance with their own workloads. All trademarks belong to their respective owners. See trademarks.