
ADR-034: Python to Rust Migration Strategy

Metadata

| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-02-06 |
| Linear | N/A (Strategic) |
| Related | ADR-024 (Gateway Unified Modes) |

Context

The MCP Gateway is currently implemented in Python (FastAPI). While Python excels at rapid development, Rust offers:

  • Performance: lower latency, higher throughput
  • Memory Safety: no GC pauses, predictable performance
  • Native OPA: regorus library for embedded policy evaluation
  • Single Binary: simplified deployment

The Problem

"How do we migrate from Python to Rust without disrupting production?"

Decision

Implement a phased migration with shadow validation, targeting Q4 2026 for full Rust deployment.

Migration Phases

Python → Rust Migration Timeline: phased migration with shadow validation. No disruption to production.

| Quarter | Stage | Traffic |
|---|---|---|
| Q1 2026 | Python Production | 100% Python |
| Q2 2026 | Rust Edge-MCP | Canary |
| Q3 2026 | Rust Majority | Majority |
| Q4 2026 | Rust Complete | 100% Rust |

Q1 2026 detail: the Python mcp-gateway handles all production traffic and baseline metrics are established (scope: baseline metrics; validation: performance benchmarks recorded).

Shadow Mirror Validation: during Phase 2, both implementations run in parallel. Python serves traffic; Rust shadows and compares. An incoming request reaches the load balancer, which sends it to Python (PRIMARY, response returned to client) and mirrors it to Rust (SHADOW, response compared, not returned). The comparison covers response body diff, latency delta, and error rate.

Feature Parity & Performance Targets: tracking Python feature coverage in Rust and expected performance improvements.

| Feature | Python | Rust |
|---|---|---|
| MCP SSE Transport | ✓ | ✓ |
| Tool Registry | ✓ | ✓ |
| OPA Policy Evaluation | ✓ | ✓ |
| Keycloak JWT | ✓ | ✓ |
| OpenTelemetry | ✓ | ✓ |
| Semantic Cache | ✓ | ○ |
| Error Snapshots | ✓ | ○ |
| K8s CRD Watcher | ✓ | ○ |

Phase Details

| Phase | Timeline | Scope | Validation |
|---|---|---|---|
| 1 | Q1 2026 | Python production | Baseline metrics |
| 2 | Q2 2026 | Rust edge-mcp mode | Shadow mirror |
| 3 | Q3 2026 | Rust proxy + sidecar | Canary deployment |
| 4 | Q4 2026 | Rust shadow mode | Full migration |

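Shadow Mirror Validation

During Phase 2, both implementations run in parallel: Python serves production traffic, while Rust receives a mirrored copy of each request and its responses are compared, not returned to the client.

See the Shadow Validation view under Migration Phases above for the full request flow and comparison metrics.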
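
The comparison step can be illustrated with a small sketch. It assumes the three signals named in the Shadow Validation view (response body diff, latency delta, error rate) are derived per request from a pair of observed responses. The `Observed` and `Comparison` types, the `compare` function, and the choice to treat 5xx statuses as errors are hypothetical, not part of the actual gateway.

```rust
use std::time::Duration;

/// One observed response, from either the Python primary or the Rust shadow.
/// (Hypothetical type for illustration.)
struct Observed {
    status: u16,
    body: String,
    latency: Duration,
}

/// Per-request comparison result. Error *rate* would be aggregated from
/// `shadow_errored` over many requests.
struct Comparison {
    body_matches: bool,
    status_matches: bool,
    /// Shadow latency minus primary latency, in ms (negative: shadow was faster).
    latency_delta_ms: i64,
    /// Assumption: a 5xx status from the shadow counts as an error.
    shadow_errored: bool,
}

fn compare(primary: &Observed, shadow: &Observed) -> Comparison {
    Comparison {
        body_matches: primary.body == shadow.body,
        status_matches: primary.status == shadow.status,
        latency_delta_ms: shadow.latency.as_millis() as i64
            - primary.latency.as_millis() as i64,
        shadow_errored: shadow.status >= 500,
    }
}

fn main() {
    let primary = Observed {
        status: 200,
        body: String::from("{\"ok\":true}"),
        latency: Duration::from_millis(15),
    };
    let shadow = Observed {
        status: 200,
        body: String::from("{\"ok\":true}"),
        latency: Duration::from_millis(5),
    };
    let c = compare(&primary, &shadow);
    println!(
        "body_matches={} status_matches={} latency_delta_ms={} shadow_errored={}",
        c.body_matches, c.status_matches, c.latency_delta_ms, c.shadow_errored
    );
}
```

A byte-for-byte body comparison is the strictest choice. A real comparator would probably need to ignore fields that legitimately differ between runs, such as timestamps or request IDs.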

Feature Parity Checklist

| Feature | Python | Rust | Status |
|---|---|---|---|
| MCP SSE Transport | ✅ | 🔄 | In progress |
| Tool Registry | ✅ | 📋 | Planned |
| OPA Policy Evaluation | ✅ (sidecar) | ✅ (regorus) | Complete |
| Semantic Cache | ✅ | 📋 | Planned |
| Error Snapshots | ✅ | 📋 | Planned |
| K8s CRD Watcher | ✅ | 📋 | Planned |
| OpenTelemetry | ✅ | 🔄 | API stabilizing |
| Keycloak JWT | ✅ | 📋 | Planned |

Rust Implementation Structure

stoa-gateway/
├── src/
│   ├── main.rs             # Entry point, --mode flag
│   ├── modes/
│   │   ├── mod.rs          # Mode trait
│   │   ├── edge_mcp.rs     # MCP protocol
│   │   ├── sidecar.rs      # Behind existing gateway
│   │   ├── proxy.rs        # Inline enforcement
│   │   └── shadow.rs       # Traffic observation
│   ├── auth/
│   │   └── oidc.rs         # Keycloak JWT validation
│   ├── policy/
│   │   └── opa.rs          # regorus integration
│   └── observability/
│       └── metrics.rs      # Prometheus
├── Cargo.toml
└── Dockerfile
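
To make the `--mode` flag in `main.rs` concrete, here is a minimal, standard-library-only sketch of selecting one of the four modes from the command line. The flag values (`edge-mcp`, `sidecar`, `proxy`, `shadow`), the `Mode` enum, and the `parse_mode` helper are assumptions for illustration. The tree above says `modes/mod.rs` defines a Mode trait, which this sketch does not attempt to reproduce.

```rust
use std::str::FromStr;

/// The four gateway modes listed under `src/modes/`.
/// (Hypothetical enum; the real code is described as a trait.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    EdgeMcp,
    Sidecar,
    Proxy,
    Shadow,
}

impl Mode {
    /// Short descriptions taken from the comments in the directory tree.
    fn description(self) -> &'static str {
        match self {
            Mode::EdgeMcp => "MCP protocol",
            Mode::Sidecar => "behind existing gateway",
            Mode::Proxy => "inline enforcement",
            Mode::Shadow => "traffic observation",
        }
    }
}

impl FromStr for Mode {
    type Err = String;

    // Assumed flag spellings; the source only shows the flag name `--mode`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "edge-mcp" => Ok(Mode::EdgeMcp),
            "sidecar" => Ok(Mode::Sidecar),
            "proxy" => Ok(Mode::Proxy),
            "shadow" => Ok(Mode::Shadow),
            other => Err(format!("unknown mode: {other}")),
        }
    }
}

/// Finds `--mode <value>` or `--mode=<value>` in an argument list.
fn parse_mode<I: IntoIterator<Item = String>>(args: I) -> Result<Mode, String> {
    let mut args = args.into_iter();
    while let Some(arg) = args.next() {
        if arg == "--mode" {
            let value = args
                .next()
                .ok_or_else(|| String::from("--mode requires a value"))?;
            return value.parse();
        }
        if let Some(value) = arg.strip_prefix("--mode=") {
            return value.parse();
        }
    }
    Err(String::from("missing --mode"))
}

fn main() {
    // Skip the program name, then look for the flag.
    match parse_mode(std::env::args().skip(1)) {
        Ok(mode) => println!("running in {:?} mode: {}", mode, mode.description()),
        Err(e) => eprintln!("{e}"),
    }
}
```

In practice an argument-parsing crate would likely replace `parse_mode`. The hand-rolled version only keeps the sketch dependency-free.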

Rollback Plan

If the Rust implementation shows issues:

  1. Immediate: route 100% of traffic to Python via the load balancer
  2. Investigation: compare shadow logs
  3. Fix: patch Rust, redeploy as shadow
  4. Validate: 48h shadow validation
  5. Resume: gradual traffic shift
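
The five rollback steps can be read as a small state machine. The sketch below is one way to encode them, assuming that "48h shadow validation" means 48 consecutive hours of clean shadow results before traffic shifting resumes. The `RollbackStep` enum, `next_step`, and `rust_traffic_percent` are hypothetical names, and the ADR does not say how "clean" is measured.

```rust
/// The five steps of the rollback plan, in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RollbackStep {
    RouteAllToPython,
    InvestigateShadowLogs,
    PatchAndRedeployShadow,
    Revalidate,
    ResumeTrafficShift,
}

/// Re-validation window from step 4 of the plan.
const REVALIDATION_HOURS: u32 = 48;

/// Advances the plan by one step. Re-validation only completes once the
/// shadow has run cleanly for the full window (assumed interpretation).
fn next_step(step: RollbackStep, clean_shadow_hours: u32) -> RollbackStep {
    use RollbackStep::*;
    match step {
        RouteAllToPython => InvestigateShadowLogs,
        InvestigateShadowLogs => PatchAndRedeployShadow,
        PatchAndRedeployShadow => Revalidate,
        Revalidate if clean_shadow_hours >= REVALIDATION_HOURS => ResumeTrafficShift,
        Revalidate => Revalidate,
        ResumeTrafficShift => ResumeTrafficShift,
    }
}

/// Share of live traffic Rust may serve: zero until the plan reaches the
/// resume step, then whatever the migration schedule had planned.
fn rust_traffic_percent(step: RollbackStep, planned_percent: u8) -> u8 {
    match step {
        RollbackStep::ResumeTrafficShift => planned_percent.min(100),
        _ => 0,
    }
}

fn main() {
    let mut step = RollbackStep::RouteAllToPython;
    // Hours of clean shadow operation observed at each check (sample values).
    for hours in [0, 0, 0, 24, 48] {
        println!(
            "{:?}: Rust serves {}% (clean shadow hours: {})",
            step,
            rust_traffic_percent(step, 10),
            hours
        );
        step = next_step(step, hours);
    }
    println!("{:?}: Rust serves {}%", step, rust_traffic_percent(step, 10));
}
```

Keeping Rust at 0% of live traffic through every step before the resume step mirrors the plan's first step, in which Python takes all traffic immediately.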

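Performance Targets

| Metric | Python | Rust Target | Improvement |
|---|---|---|---|
| P50 latency | 15 ms | 5 ms | 3x |
| P99 latency | 80 ms | 20 ms | 4x |
| RPS per pod | 1,000 | 5,000 | 5x |
| Memory | 512 MB | 64 MB | 8x |
| Cold start | 3 s | 100 ms | 30x |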
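
The improvement column follows directly from the two value columns. The sketch below recomputes it, assuming that latency, memory, and cold start are "lower is better" (factor = Python / Rust) while RPS is "higher is better" (factor = Rust / Python). The `Target` struct and function names are hypothetical; the figures are the ones from the table, with cold start converted to milliseconds.

```rust
/// One row of the performance targets table.
struct Target {
    metric: &'static str,
    python: f64,
    rust: f64,
    /// True for throughput-style metrics, false for latency/size-style ones.
    higher_is_better: bool,
}

/// Improvement factor, always expressed so that a value above 1 is a gain.
fn improvement(t: &Target) -> f64 {
    if t.higher_is_better {
        t.rust / t.python
    } else {
        t.python / t.rust
    }
}

/// The five rows of the table, in order.
fn targets() -> Vec<Target> {
    vec![
        Target { metric: "P50 latency (ms)", python: 15.0, rust: 5.0, higher_is_better: false },
        Target { metric: "P99 latency (ms)", python: 80.0, rust: 20.0, higher_is_better: false },
        Target { metric: "RPS per pod", python: 1000.0, rust: 5000.0, higher_is_better: true },
        Target { metric: "Memory (MB)", python: 512.0, rust: 64.0, higher_is_better: false },
        Target { metric: "Cold start (ms)", python: 3000.0, rust: 100.0, higher_is_better: false },
    ]
}

fn main() {
    for t in targets() {
        println!("{:<18} {}x", t.metric, improvement(&t));
    }
}
```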

Consequences

Positive

  • Performance: significant latency reduction (targets: P50 from 15 ms to 5 ms, P99 from 80 ms to 20 ms)
  • Cost: fewer pods needed for the same throughput (target: 5x RPS per pod)
  • Reliability: memory safety, no GC pauses
  • Native OPA: regorus eliminates the OPA sidecar
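
The cost point can be made concrete with a little arithmetic. The sketch below estimates pod counts from the per-pod RPS targets, assuming pods scale linearly with load and that the 512 MB and 64 MB memory figures are per pod. The 20,000 RPS load is a made-up example, not a figure from this ADR.

```rust
/// Smallest number of pods whose combined capacity covers `required_rps`
/// (integer ceiling division).
fn pods_needed(required_rps: u32, rps_per_pod: u32) -> u32 {
    assert!(rps_per_pod > 0, "rps_per_pod must be positive");
    (required_rps + rps_per_pod - 1) / rps_per_pod
}

fn main() {
    // Hypothetical peak load, chosen only for illustration.
    let required_rps = 20_000;

    // Per-pod targets from the performance table.
    let python_pods = pods_needed(required_rps, 1_000);
    let rust_pods = pods_needed(required_rps, 5_000);

    // Assumption: the memory targets (512 MB, 64 MB) apply per pod.
    println!("Python: {} pods, {} MB total", python_pods, python_pods * 512);
    println!("Rust:   {} pods, {} MB total", rust_pods, rust_pods * 64);
}
```

Real capacity planning would add headroom and a minimum replica count for availability, so the actual saving is likely smaller than the raw ratio suggests.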

Negative

  • Development Velocity: Rust is slower to write than Python
  • Dual Maintenance: two codebases during the migration
  • Team Skills: Rust learning curve

Mitigations

| Challenge | Mitigation |
|---|---|
| Velocity | Python for experiments, Rust for stable features |
| Dual maintenance | Feature freeze on Python post-Phase 2 |
| Skills | Internal Rust training, code reviews |

