ADR-034: Python to Rust Migration Strategy

Metadata

| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-02-06 |
| Linear | N/A (Strategic) |
| Related | ADR-024 (Gateway Unified Modes) |

Context

The MCP Gateway is currently implemented in Python (FastAPI). While Python excels at rapid development, Rust offers:

  • Performance — Lower latency, higher throughput
  • Memory Safety — No GC pauses, predictable performance
  • Native OPA — regorus library for embedded policy evaluation
  • Single Binary — Simplified deployment

The Problem

"How do we migrate from Python to Rust without disrupting production?"

Decision

Implement a phased migration with shadow validation, targeting Q4 2026 for full Rust deployment.

Migration Phases

[Interactive figure: Python to Rust Migration Timeline. Phased migration with shadow validation and no disruption to production. Tabs: timeline (Q1 2026: 100% Python; Q2 2026: Rust edge-mcp canary; Q3 2026: Rust majority; Q4 2026: 100% Rust), Shadow Mirror Validation, and Feature Parity & Performance Targets. The phase, feature parity, rollback, and performance details are listed in the sections below.]

Phase Details

| Phase | Timeline | Scope | Validation |
|---|---|---|---|
| 1 | Q1 2026 | Python production | Baseline metrics |
| 2 | Q2 2026 | Rust edge-mcp mode | Shadow mirror |
| 3 | Q3 2026 | Rust proxy + sidecar | Canary deployment |
| 4 | Q4 2026 | Rust shadow mode | Full migration |

Shadow Mirror Validation

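During Phase 2, both implementations run in parallel. The load balancer sends each incoming request to Python, which is the primary and returns its response to the client. The same request is mirrored to Rust, which runs as the shadow: its response is compared with Python's but never returned. Three signals are compared: response body diff, latency delta, and error rate.

The Shadow Validation tab of the interactive figure above shows the same request flow and comparison metrics.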
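
The comparison step could look roughly like the sketch below. It covers the three signals named for shadow validation (response body diff, latency delta, error rate). The type and function names (`Observed`, `ShadowDiff`, `compare`, `mismatch_rate`) are hypothetical, and treating a 5xx status as an error is an assumption; the ADR does not define either.

```rust
/// One backend's observed response to a mirrored request.
/// Field names are illustrative, not taken from the gateway code.
#[derive(Debug, Clone)]
pub struct Observed {
    pub status: u16,
    pub body: String,
    pub latency_ms: i64,
}

/// Result of comparing the Python (primary) and Rust (shadow) responses.
#[derive(Debug, PartialEq)]
pub struct ShadowDiff {
    pub body_matches: bool,
    pub status_matches: bool,
    /// Shadow latency minus primary latency; negative means Rust was faster.
    pub latency_delta_ms: i64,
    /// True when only the shadow returned a 5xx status (assumed error definition).
    pub shadow_only_error: bool,
}

/// Compares a single mirrored request. The shadow response is only inspected,
/// never returned to the client.
pub fn compare(primary: &Observed, shadow: &Observed) -> ShadowDiff {
    ShadowDiff {
        body_matches: primary.body == shadow.body,
        status_matches: primary.status == shadow.status,
        latency_delta_ms: shadow.latency_ms - primary.latency_ms,
        shadow_only_error: shadow.status >= 500 && primary.status < 500,
    }
}

/// Share of mirrored requests where the body or the status differed.
pub fn mismatch_rate(diffs: &[ShadowDiff]) -> f64 {
    if diffs.is_empty() {
        return 0.0;
    }
    let mismatches = diffs
        .iter()
        .filter(|d| !(d.body_matches && d.status_matches))
        .count();
    mismatches as f64 / diffs.len() as f64
}
```

A byte-for-byte body comparison is the strictest possible choice. A real harness would probably need to ignore fields that legitimately differ between implementations, such as timestamps or request IDs.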

Feature Parity Checklist

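| Feature | Python | Rust | Status |
|---|---|---|---|
| MCP SSE Transport | | 🔄 | In progress |
| Tool Registry | | 📋 | Planned |
| OPA Policy Evaluation | ✅ (sidecar) | ✅ (regorus) | Complete |
| Semantic Cache | | 📋 | Planned |
| Error Snapshots | | 📋 | Planned |
| K8s CRD Watcher | | 📋 | Planned |
| OpenTelemetry | | 🔄 | API stabilizing |
| Keycloak JWT | | 📋 | Planned |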

Rust Implementation Structure

stoa-gateway/
├── src/
│   ├── main.rs              # Entry point, --mode flag
│   ├── modes/
│   │   ├── mod.rs           # Mode trait
│   │   ├── edge_mcp.rs      # MCP protocol
│   │   ├── sidecar.rs       # Behind existing gateway
│   │   ├── proxy.rs         # Inline enforcement
│   │   └── shadow.rs        # Traffic observation
│   ├── auth/
│   │   └── oidc.rs          # Keycloak JWT validation
│   ├── policy/
│   │   └── opa.rs           # regorus integration
│   └── observability/
│       └── metrics.rs       # Prometheus
├── Cargo.toml
└── Dockerfile
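
As an illustration of how the `--mode` flag in `main.rs` might select one of the four modes, here is a minimal sketch using only the standard library. It uses a plain enum rather than the `Mode` trait mentioned in the tree, to keep the example short. The flag values (`edge-mcp`, `sidecar`, `proxy`, `shadow`) and the helper `mode_from_args` are assumptions; the ADR does not specify them.

```rust
use std::str::FromStr;

/// The four gateway modes named in the source tree.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    EdgeMcp,
    Sidecar,
    Proxy,
    Shadow,
}

impl Mode {
    /// Short description, copied from the comments in the tree.
    pub fn describe(self) -> &'static str {
        match self {
            Mode::EdgeMcp => "MCP protocol",
            Mode::Sidecar => "Behind existing gateway",
            Mode::Proxy => "Inline enforcement",
            Mode::Shadow => "Traffic observation",
        }
    }
}

impl FromStr for Mode {
    type Err = String;

    // The accepted spellings are an assumption; the ADR does not list flag values.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "edge-mcp" => Ok(Mode::EdgeMcp),
            "sidecar" => Ok(Mode::Sidecar),
            "proxy" => Ok(Mode::Proxy),
            "shadow" => Ok(Mode::Shadow),
            other => Err(format!("unknown mode: {}", other)),
        }
    }
}

/// Finds the value following `--mode` in an argument list and parses it.
pub fn mode_from_args<I, S>(args: I) -> Result<Mode, String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        if arg.as_ref() == "--mode" {
            return match iter.next() {
                Some(value) => Mode::from_str(value.as_ref()),
                None => Err("--mode requires a value".to_string()),
            };
        }
    }
    Err("missing --mode flag".to_string())
}
```

In a binary, this could be called as `mode_from_args(std::env::args())`. A production gateway would more likely use an argument-parsing crate, which this sketch avoids so that it has no dependencies.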

Rollback Plan

If the Rust implementation shows issues:

  1. Immediate — Route 100% to Python via load balancer
  2. Investigation — Compare shadow logs
  3. Fix — Patch Rust, redeploy shadow
  4. Validate — 48h shadow validation
  5. Resume — Gradual traffic shift
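
The five steps above can be read as a linear sequence with a gate before the last one. The sketch below models that under stated assumptions: the names `RollbackStep` and `may_resume` are hypothetical, and using "zero shadow mismatches over the full 48 hours" as the pass criterion is an assumption, since the ADR only gives the window length.

```rust
use std::time::Duration;

/// Re-validation window from the rollback plan (48 hours).
pub const REVALIDATION_WINDOW: Duration = Duration::from_secs(48 * 60 * 60);

/// Steps of the rollback plan, in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RollbackStep {
    RouteToPython,
    Investigate,
    PatchAndRedeploy,
    Revalidate,
    Resume,
}

impl RollbackStep {
    /// Returns the step that follows this one, or `None` after `Resume`.
    pub fn next(self) -> Option<RollbackStep> {
        match self {
            RollbackStep::RouteToPython => Some(RollbackStep::Investigate),
            RollbackStep::Investigate => Some(RollbackStep::PatchAndRedeploy),
            RollbackStep::PatchAndRedeploy => Some(RollbackStep::Revalidate),
            RollbackStep::Revalidate => Some(RollbackStep::Resume),
            RollbackStep::Resume => None,
        }
    }
}

/// Decides whether the gradual traffic shift may resume.
/// Assumption: the shadow must run for the whole window with no mismatches.
pub fn may_resume(shadow_elapsed: Duration, shadow_mismatches: u64) -> bool {
    shadow_elapsed >= REVALIDATION_WINDOW && shadow_mismatches == 0
}
```

A team might prefer a small tolerated mismatch rate to a strict zero. In that case the second parameter would become a rate compared against a threshold.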

Performance Targets

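| Metric | Python | Rust Target | Improvement |
|---|---|---|---|
| P50 latency | 15 ms | 5 ms | 3x |
| P99 latency | 80 ms | 20 ms | 4x |
| RPS per pod | 1,000 | 5,000 | 5x |
| Memory | 512 MB | 64 MB | 8x |
| Cold start | 3 s | 100 ms | 30x |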
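
To make the arithmetic behind these targets concrete, the sketch below recomputes the improvement factors and estimates pod counts for a given load. The 20,000 RPS workload in the usage note is a hypothetical figure chosen for illustration, and the per-pod numbers are the targets from the table, not measured results.

```rust
/// Improvement factor between a baseline and a target value,
/// for metrics where lower is better (e.g. 15 ms / 5 ms = 3x).
pub fn improvement(baseline: f64, target: f64) -> f64 {
    baseline / target
}

/// Pods needed to serve `total_rps`, rounding up.
/// Assumes `rps_per_pod` is greater than zero.
pub fn pods_needed(total_rps: u64, rps_per_pod: u64) -> u64 {
    (total_rps + rps_per_pod - 1) / rps_per_pod
}

/// Total memory for a fleet, in MB, given a per-pod footprint.
pub fn fleet_memory_mb(pods: u64, memory_per_pod_mb: u64) -> u64 {
    pods * memory_per_pod_mb
}
```

For a hypothetical 20,000 RPS workload, `pods_needed(20_000, 1_000)` gives 20 Python pods and `pods_needed(20_000, 5_000)` gives 4 Rust pods. At 512 MB and 64 MB per pod, that is 10,240 MB versus 256 MB of fleet memory, provided the targets hold in production.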

Consequences

Positive

  • Performance — Significant latency reduction
  • Cost — Fewer pods needed for same throughput
  • Reliability — Memory safety, no GC pauses
  • Native OPA — regorus eliminates sidecar

Negative

  • Development Velocity — Rust slower to write
  • Dual Maintenance — Two codebases during migration
  • Team Skills — Rust learning curve

Mitigations

| Challenge | Mitigation |
|---|---|
| Velocity | Python for experiments, Rust for stable features |
| Dual maintenance | Feature freeze on Python post-Phase 2 |
| Skills | Internal Rust training, code reviews |

Standard Marchemalo: A 40-year veteran architect understands in 30 seconds