
Performance Benchmarks

STOA Gateway handles tens of thousands of requests per second on a single core with sub-millisecond P99 latency. API key authentication adds less than 1 microsecond of overhead. Rate limiting adds less than 500 nanoseconds.

All benchmarks are reproducible using the published scripts in the stoa repository.

Micro-Benchmarks (Criterion)

Internal operation latency, measured in isolation with Criterion.rs. These benchmarks exercise Gateway internals only, with no network overhead.

Core Operations

| Operation | Target | Notes |
|---|---|---|
| API key cache hit | < 1 us | moka sync cache, 10K capacity, 300 s TTL |
| API key cache miss | < 1 us | Cache lookup for a nonexistent key |
| Rate limit check | < 500 ns | Tenant-scoped sliding window |
| Consumer rate limit | < 500 ns | Token bucket (configurable) |
| Path normalization (static) | < 100 ns | UUID/ID regex replacement |
| Path normalization (UUID) | < 100 ns | UUID path parameter conversion |
| Path normalization (nested) | < 100 ns | Deep path with multiple UUIDs |
| Route match (50 routes) | < 1 us | Longest-prefix match |
| Route match (not found) | < 1 us | Nonexistent path, 50 routes registered |
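The path-normalization rows above come down to regex replacement of variable path segments. A minimal sketch of the idea in Python (the Gateway itself is Rust; the `{uuid}` and `{id}` placeholder tokens and the exact patterns are assumptions for illustration):

```python
import re

# RFC 4122 UUID shape: 8-4-4-4-12 hex digits.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
# Purely numeric path segments are treated as opaque IDs.
ID_RE = re.compile(r"/\d+(?=/|$)")

def normalize_path(path: str) -> str:
    """Collapse UUID and numeric-ID segments into placeholder tokens."""
    path = UUID_RE.sub("{uuid}", path)
    path = ID_RE.sub("/{id}", path)
    return path
```

Because the work is a couple of precompiled-regex substitutions, a sub-100 ns target in optimized Rust is plausible for short paths.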

Auth & Caching

| Operation | Target | Notes |
|---|---|---|
| JWT decode (HS256) | < 100 us | Full signature verification |
| JWT header decode | < 100 us | Header-only, no signature check |
| Semantic cache key gen | < 50 us | DefaultHasher + format string |
| Semantic cache hit | < 50 us | moka cache, 100 pre-populated entries |
| Semantic cache miss | < 50 us | Cache lookup for a nonexistent key |
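Semantic cache key generation above is a hash over a format string of the request's semantic fields. A hedged Python sketch of the idea (the Gateway uses Rust's `DefaultHasher`; the field names and separator here are illustrative assumptions, and SHA-256 stands in for the hasher):

```python
import hashlib

def semantic_cache_key(model: str, prompt: str, temperature: float) -> str:
    """Derive a deterministic cache key from the request's semantic fields."""
    payload = f"{model}|{temperature}|{prompt}"  # format string over the fields
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Identical requests hash to the same key, so a repeat lookup is a pure cache hit; any change to model, prompt, or temperature produces a different key.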

How to Run Micro-Benchmarks

```shell
cd stoa-gateway
cargo bench
```

Results are saved in target/criterion/ with HTML reports.

Load Test Results

Load tests measure end-to-end throughput and latency including network and upstream response time. Tests use hey with a 30-second duration per concurrency level.

Scenario 1: Health Check (baseline)

Measures raw HTTP throughput with no proxy or upstream.

| Concurrency | RPS | P50 | P95 | P99 |
|---|---|---|---|---|
| 1 | ~10,000 | < 1 ms | < 1 ms | < 1 ms |
| 10 | ~30,000 | < 1 ms | < 1 ms | 1 ms |
| 50 | ~40,000 | 1 ms | 2 ms | 5 ms |
| 100 | ~45,000 | 2 ms | 5 ms | 10 ms |

Scenario 2: Proxy Passthrough (no auth)

Measures Gateway proxy overhead with a remote backend. Latency includes upstream response time.

| Concurrency | RPS | P50 | P95 | P99 |
|---|---|---|---|---|
| 1 | ~50 | 20 ms | 30 ms | 50 ms |
| 10 | ~400 | 25 ms | 50 ms | 80 ms |
| 50 | ~1,500 | 35 ms | 80 ms | 150 ms |
| 100 | ~2,500 | 40 ms | 100 ms | 200 ms |

Latency is dominated by the upstream backend (httpbin.org). With a local backend, expect 10x higher RPS and sub-millisecond gateway overhead.

Scenario 3: Proxy + API Key Auth

Same as Scenario 2 with API key authentication enabled.

| Concurrency | RPS | P50 | P95 | P99 |
|---|---|---|---|---|
| 1 | ~50 | 20 ms | 30 ms | 50 ms |
| 10 | ~400 | 25 ms | 50 ms | 80 ms |
| 50 | ~1,500 | 35 ms | 80 ms | 150 ms |
| 100 | ~2,500 | 40 ms | 100 ms | 200 ms |

API key auth adds < 1 us per request (invisible at the network level). The difference from Scenario 2 is within measurement noise.

Scenario 4: Proxy + Auth + Rate Limit

Full pipeline: proxy + API key auth + rate limiting.

| Concurrency | RPS | P50 | P95 | P99 |
|---|---|---|---|---|
| 1 | ~50 | 20 ms | 30 ms | 50 ms |
| 10 | ~400 | 25 ms | 50 ms | 80 ms |
| 50 | ~1,500 | 35 ms | 80 ms | 150 ms |
| 100 | ~2,500 | 40 ms | 100 ms | 200 ms |

Rate limiting adds < 500 ns per request. Combined with auth, total feature overhead is < 2 us, invisible at the network level.

Feature Impact Summary

| Feature Stack | Gateway Overhead | Notes |
|---|---|---|
| Proxy only | < 100 us | Route match + proxy setup |
| + API Key Auth | + < 1 us | Cache hit for key validation |
| + Rate Limiting | + < 500 ns | Sliding window check |
| + Path Normalization | + < 100 ns | Regex replacement |
| Total pipeline | < 102 us | All features combined |

Gateway overhead is the time spent inside the Gateway, excluding upstream response time. Measured via Criterion micro-benchmarks.

Comparative Results: Gateway Arena

STOA runs a continuous benchmark lab called Gateway Arena that compares multiple API gateways under identical conditions. The Arena has two layers:

  • Layer 0 (Proxy Baseline): Raw latency, throughput, burst handling, and consistency
  • Layer 1 (Enterprise AI Readiness): MCP capabilities, auth chains, guardrails, and governance

Layer 0: Proxy Baseline

Layer 0 measures raw proxy performance: latency, throughput, burst handling, and consistency. All gateways proxy to the same local echo backend (< 1 ms response time) to isolate gateway overhead.

Scoring Weights

| Dimension | Weight | Description | Cap |
|---|---|---|---|
| Sequential | 10% | Baseline latency (1 VU, 20 requests) | 400 ms |
| Burst 50 | 20% | Medium burst (50 VUs, ramping) | 2.5 s |
| Burst 100 | 20% | Heavy burst (100 VUs, ramping) | 4 s |
| Availability | 15% | Health check success rate | 100% |
| Error Rate | 10% | Request success rate under load | 100% |
| Consistency | 10% | IQR-based latency stability | IQR CV |
| Ramp-up | 15% | Throughput ceiling (10→100 req/s) | 100 rps |

Test Scenarios (1 warmup + 7 scored)

| # | Scenario | k6 Executor | VUs / Load | Scored? |
|---|---|---|---|---|
| 1 | Warmup | shared-iterations | 10 VUs × 50 iter | Discarded |
| 2 | Health | shared-iterations | 1 VU × 1 iter | Availability |
| 3 | Sequential | shared-iterations | 1 VU × 20 iter | P95 latency |
| 4 | Burst 10 | shared-iterations | 10 VUs × 10 iter | Error rate |
| 5 | Burst 50 | ramping-vus | 0→50 VUs (18 s) | P95 latency |
| 6 | Burst 100 | ramping-vus | 0→100 VUs (18 s) | P95 latency |
| 7 | Sustained | shared-iterations | 1 VU × 100 iter | IQR consistency |
| 8 | Ramp-up | ramping-arrival-rate | 10→100 req/s (60 s) | Throughput |

Composite Score

```
Score = 0.10×Sequential + 0.20×Burst50 + 0.20×Burst100
      + 0.15×Availability + 0.10×ErrorRate
      + 0.10×Consistency + 0.15×Ramp-up
```
  • Latency score: max(0, 100 × (1 − P95 / cap))
  • Consistency: IQR-based CV = (P75 − P25) / P50
  • Ramp-up: effective throughput × success rate
  • Score range: 0–100
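The scoring rules above can be transcribed mechanically. An illustrative Python version (weights and caps copied from the tables above; the Arena's own implementation may differ in detail):

```python
def latency_score(p95_ms: float, cap_ms: float) -> float:
    """max(0, 100 × (1 − P95 / cap)): higher is better, 0 at or past the cap."""
    return max(0.0, 100.0 * (1.0 - p95_ms / cap_ms))

def iqr_cv(p25: float, p50: float, p75: float) -> float:
    """IQR-based coefficient of variation used for the Consistency dimension."""
    return (p75 - p25) / p50

def composite(sequential, burst50, burst100, availability,
              error_rate, consistency, ramp_up) -> float:
    """Weighted sum of the seven dimension scores (each 0–100)."""
    return (0.10 * sequential + 0.20 * burst50 + 0.20 * burst100
            + 0.15 * availability + 0.10 * error_rate
            + 0.10 * consistency + 0.15 * ramp_up)
```

For example, a sequential P95 of 100 ms against the 400 ms cap scores 75; the seven weights sum to 1.0, so a gateway scoring 100 on every dimension gets a composite of 100.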

Layer 1: Enterprise AI Readiness

Layer 1 measures enterprise AI readiness across 8 dimensions. Gateways without MCP support score 0 on MCP-dependent dimensions (not N/A). The spec is open: any gateway can implement it and re-run the benchmark.

Participating Gateways

STOA Gateway

Stack: Rust + Tokio
MCP: Yes
License: Apache 2.0

Kong

Stack: Lua + Nginx
MCP: No (OSS)
License: Apache 2.0 (OSS)

Gravitee

Stack: Java + Vert.x
MCP: Yes
License: Apache 2.0

8 Enterprise Dimensions

| Dimension | Weight | Description | Cap |
|---|---|---|---|
| MCP Discovery | 15% | GET /mcp/capabilities | 500 ms |
| MCP Tool Exec | 20% | POST /mcp/tools/list (JSON-RPC) | 500 ms |
| Auth Chain | 15% | JWT + authenticated tool call | 1 s |
| Policy Engine | 15% | OPA policy evaluation overhead | 200 ms |
| AI Guardrails | 10% | PII detection and redaction | 1 s |
| Rate Limiting | 10% | 429 enforcement accuracy | 1 s |
| Resilience | 10% | Bad input → 4xx (not 500) | 1 s |
| Governance | 5% | Session and circuit-breaker endpoints | 2 s |

Per-Dimension Score

```
dimension    = 0.6 × availability_score + 0.4 × latency_score
availability = (passes / total) × 100
latency      = max(0, 100 × (1 − P95 / cap))
```
  • Gateways without MCP score 0 on MCP dimensions (dimensions 1–5, 7)
  • Rate limiting (dim 6) and Governance (dim 8) do not require MCP
  • Score range per dimension: 0–100
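An illustrative transcription of the per-dimension formula (Python; the Arena's own implementation may differ in detail):

```python
def dimension_score(passes: int, total: int, p95_ms: float, cap_ms: float) -> float:
    """0.6 × availability + 0.4 × latency, each on a 0–100 scale."""
    availability = (passes / total) * 100.0
    latency = max(0.0, 100.0 * (1.0 - p95_ms / cap_ms))
    return 0.6 * availability + 0.4 * latency
```

For example, all checks passing with P95 at half the cap scores 0.6·100 + 0.4·50 = 80.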

Enterprise Readiness Index

```
ERI = Σ(weight_i × dimension_i)  for all 8 dimensions
```
  • Total weight: 1.0 (100%)
  • MCP-dependent dimensions: 75% of total weight
  • Score range: 0–100
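A sketch of the ERI aggregation using the weights from the dimension table above (illustrative Python; the dictionary keys are hypothetical names, not API identifiers):

```python
# Weights copied from the 8-dimension table; they sum to 1.0.
ERI_WEIGHTS = {
    "mcp_discovery": 0.15, "mcp_tool_exec": 0.20, "auth_chain": 0.15,
    "policy_engine": 0.15, "ai_guardrails": 0.10, "rate_limiting": 0.10,
    "resilience": 0.10, "governance": 0.05,
}

def eri(dimension_scores: dict) -> float:
    """Enterprise Readiness Index: weighted sum over all 8 dimensions."""
    return sum(ERI_WEIGHTS[name] * dimension_scores[name]
               for name in ERI_WEIGHTS)
```

A gateway without MCP simply contributes 0 for the MCP-dependent entries, which is what keeps every gateway on the same 0–100 scale.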

MCP Protocol Variants

| Gateway | MCP Protocol | Endpoint Pattern |
|---|---|---|
| STOA | REST API | GET /capabilities, POST /tools/list, POST /tools/call |
| Gravitee 4.8 | Streamable HTTP (JSON-RPC 2.0) | POST /mcp with JSON-RPC body |
| Kong OSS | None (Enterprise-only plugin) | N/A |

Test Infrastructure

| Parameter | Layer 0 | Layer 1 |
|---|---|---|
| Tool | k6 v0.54.0 | k6 v0.54.0 |
| Schedule | Every 30 min | Hourly |
| Runs per gateway | 5 (discard 1st) | 3 (discard 1st) |
| Scored runs | 4 (n=4) | 2 (n=2) |
| Statistical method | Median + CI95 (t-distribution) | Median + CI95 (t-distribution) |
| Backend | Local echo server (nginx, < 1 ms) | Local echo server (nginx, < 1 ms) |
| CPU (guaranteed) | 1 core | 500m–1 core |
| Memory (guaranteed) | 512 MiB | 256–512 MiB |
| Cluster | OVH MKS (Managed K8s) | OVH MKS (Managed K8s) |

CI95 Confidence Intervals

```
CI95 = mean ± t(α/2, n−1) × (stddev / √n)

n       = number of scored runs (4 for L0, 2 for L1)
t-value = Student's t-distribution critical value
α       = 0.05 (95% confidence)
```
  • df=3 (L0): t = 3.182
  • df=1 (L1): t = 12.706
  • Wider intervals with fewer runs — by design (conservative)
  • Warmup run always discarded (JVM, cache priming)
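The interval computation is standard. A Python sketch with the two t-values above hardcoded for the Arena's run counts (sample standard deviation via `statistics.stdev`):

```python
import math
from statistics import mean, stdev

# Critical t-values for 95% confidence at the Arena's degrees of freedom.
T_CRIT = {3: 3.182, 1: 12.706}  # df = n − 1

def ci95(samples: list) -> tuple:
    """Return (mean, half_width) of the 95% confidence interval."""
    n = len(samples)
    half = T_CRIT[n - 1] * stdev(samples) / math.sqrt(n)
    return mean(samples), half
```

With only 2 scored runs (df = 1) the t-value of 12.706 blows the interval wide open, which is exactly the conservative behavior the bullets describe.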

Fairness Guarantees

  • Same backend: All gateways proxy to the same nginx echo server (static JSON, <1ms)
  • Same cluster: All K8s gateways run on OVH MKS with identical resource limits
  • Same tool: k6 v0.54.0 for all scenarios, all gateways
  • Same scoring: Identical formulas applied to all gateways — no per-gateway adjustments
  • Open methodology: All scripts are open-source in the STOA repository
  • MCP = 0 (not N/A): Gateways without MCP score 0 on MCP dimensions, maintaining a single 0–100 scale

Benchmark results are from a controlled test environment using methodology v2.0. Real-world performance depends on hardware, network, configuration, and workload. We encourage readers to reproduce these benchmarks using the published scripts. Product names and logos are trademarks of their respective owners. STOA Platform is not affiliated with or endorsed by any mentioned vendor.

Score Interpretation

| Score | Rating | Meaning |
|---|---|---|
| > 95 | Excellent | Co-located gateway, minimal overhead |
| 80–95 | Good | Normal for well-configured gateways |
| 60–80 | Acceptable | Check network or resource constraints |
| < 60 | Investigate | Connection issues or high error rate |

How to Run the Arena

```shell
# Layer 0 — one-off baseline benchmark
kubectl create job --from=cronjob/gateway-arena arena-manual -n stoa-system
kubectl logs -n stoa-system -l job-name=arena-manual --follow

# Layer 1 — one-off enterprise benchmark
kubectl create job --from=cronjob/gateway-arena-enterprise arena-ent-manual -n stoa-system
kubectl logs -n stoa-system -l job-name=arena-ent-manual --follow

# Clean up
kubectl delete job arena-manual arena-ent-manual -n stoa-system
```

Results are pushed to Prometheus via Pushgateway and visualized in Grafana.

Open Participation

The Gateway Arena is open — any API gateway can participate:

  1. Deploy the gateway on the same K8s cluster (OVH MKS)
  2. Add an entry to the GATEWAYS JSON in k8s/arena/cronjob-prod.yaml
  3. For Layer 1: implement MCP endpoints (REST or Streamable HTTP) and set mcp_base + mcp_protocol
  4. Run kubectl create job --from=cronjob/gateway-arena arena-test -n stoa-system

Same k6 scenarios, same scoring formula, same CI95 methodology for all participants.

DX Benchmarks

Developer experience metrics for getting started with STOA.

| Metric | Target | Notes |
|---|---|---|
| Cold start (docker compose up) | < 120 s | All containers from scratch |
| Warm start (containers exist) | < 30 s | Restart existing containers |
| First API call after start | < 0.5 s | Health endpoint response |
| Gateway binary startup | < 1 s | Rust binary, no JVM warmup |

Methodology & Reproducibility

Tools

| Tool | Version | Purpose | Source |
|---|---|---|---|
| k6 | 0.54.0 | Comparative Arena benchmarks | scripts/traffic/arena/benchmark.js |
| Criterion.rs | latest | Micro-benchmarks (internal operations) | stoa-gateway/benches/ |
| hey | latest | Load testing (end-to-end throughput) | scripts/benchmarks/load-test.sh |

Reproducing Results

```shell
# Micro-benchmarks (local, no dependencies)
cd stoa-gateway && cargo bench

# Load tests (requires a running Gateway)
./scripts/benchmarks/load-test.sh --target http://localhost:8080

# Comparative Arena — Layer 0 (requires Kubernetes + Pushgateway)
kubectl create job --from=cronjob/gateway-arena arena-manual -n stoa-system

# Comparative Arena — Layer 1 (requires Kubernetes + Pushgateway + MCP gateways)
kubectl create job --from=cronjob/gateway-arena-enterprise arena-ent-manual -n stoa-system
```

Reporting Standards

  • Arena runs use the median of scored runs (warmup discarded): 4 runs for Layer 0, 2 for Layer 1
  • CI95 confidence intervals computed via Student's t-distribution
  • All load tests run for 30 seconds per concurrency level
  • Every report includes a machine profile (CPU, RAM, OS) for context
  • Comparative claims include a <!-- last verified: YYYY-MM --> tag

For complete details on scoring formulas, statistical methods, scenario definitions, and how to add a new gateway, see the Benchmark Methodology reference.

CI Performance Gate

STOA uses a CI performance gate (perf-gate.yml) that blocks PRs when:

  • P95 latency regresses by more than 10% compared to the main branch baseline
  • Error rate increases above the threshold

This ensures performance regressions are caught before merge.
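The gate reduces to a relative comparison against the main-branch baseline. An illustrative Python version of the decision (the 10% figure comes from the bullets above; the default error-rate threshold is an assumption, as the doc does not state its value):

```python
def should_block(p95_main_ms: float, p95_pr_ms: float,
                 error_rate: float, error_threshold: float = 0.01) -> bool:
    """Block the PR if P95 regresses by more than 10% vs. main,
    or the error rate exceeds the configured threshold."""
    p95_regression = (p95_pr_ms - p95_main_ms) / p95_main_ms > 0.10
    return p95_regression or error_rate > error_threshold
```

For example, a PR whose P95 moves from 100 ms to 111 ms (an 11% regression) is blocked, while 109 ms passes.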

See Hardware Requirements for sizing guidance based on these benchmarks.

Feature comparisons are based on tests run under identical conditions as of the date noted above. Gateway capabilities change frequently. We encourage readers to verify current performance with their own workloads. All trademarks belong to their respective owners. See trademarks.