ADR-034: Python to Rust Migration Strategy
Metadata
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-02-06 |
| Linear | N/A (Strategic) |
| Related | ADR-024 (Gateway Unified Modes) |
Context
The MCP Gateway is currently implemented in Python (FastAPI). While Python excels at rapid development, Rust offers:
- Performance: lower latency, higher throughput
- Memory Safety: no GC pauses, predictable performance
- Native OPA: the regorus library for embedded policy evaluation
- Single Binary: simplified deployment
The Problem
"How do we migrate from Python to Rust without disrupting production?"
Decision
Implement a phased migration with shadow validation, targeting Q4 2026 for full Rust deployment.
Migration Phases
Phase Details
| Phase | Timeline | Scope | Validation |
|---|---|---|---|
| 1 | Q1 2026 | Python production | Baseline metrics |
| 2 | Q2 2026 | Rust edge-mcp mode | Shadow mirror |
| 3 | Q3 2026 | Rust proxy + sidecar | Canary deployment |
| 4 | Q4 2026 | Rust shadow mode | Full migration |
Shadow Mirror Validation
During Phase 2, both implementations run in parallel.
See the interactive Shadow Validation tab above for the full request flow and comparison metrics.
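As a rough illustration of what shadow mirroring could involve, the sketch below compares the response from the Python gateway with the one from the Rust gateway and keeps a mismatch tally. It assumes Python stays the primary and Rust is the shadow. The `GatewayResponse`, `ShadowOutcome`, and `ShadowStats` types are hypothetical names introduced for this example and are not taken from the actual codebase.

```rust
/// Minimal response shape used for comparison (hypothetical; the real
/// gateway response type is not described in this ADR).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GatewayResponse {
    pub status: u16,
    pub body: String,
}

/// Outcome of comparing one mirrored request.
#[derive(Debug, PartialEq, Eq)]
pub enum ShadowOutcome {
    Match,
    StatusMismatch { python: u16, rust: u16 },
    BodyMismatch,
}

/// Compare the Python (primary) response with the Rust (shadow) response.
/// Status codes are checked first, then bodies.
pub fn compare(python: &GatewayResponse, rust: &GatewayResponse) -> ShadowOutcome {
    if python.status != rust.status {
        ShadowOutcome::StatusMismatch {
            python: python.status,
            rust: rust.status,
        }
    } else if python.body != rust.body {
        ShadowOutcome::BodyMismatch
    } else {
        ShadowOutcome::Match
    }
}

/// Running tally of shadow comparisons.
#[derive(Debug, Default)]
pub struct ShadowStats {
    pub total: u64,
    pub mismatches: u64,
}

impl ShadowStats {
    /// Record one comparison and return the response served to the client.
    /// Assumption: while Rust runs as a shadow, the Python response is
    /// always the one returned, so a Rust bug cannot reach production.
    pub fn record<'a>(
        &mut self,
        python: &'a GatewayResponse,
        rust: &GatewayResponse,
    ) -> &'a GatewayResponse {
        self.total += 1;
        if compare(python, rust) != ShadowOutcome::Match {
            self.mismatches += 1;
        }
        python
    }

    /// Fraction of mirrored requests whose responses differed (0.0 when empty).
    pub fn mismatch_rate(&self) -> f64 {
        if self.total == 0 {
            0.0
        } else {
            self.mismatches as f64 / self.total as f64
        }
    }
}
```

A real comparison would probably need to normalize non-deterministic fields (timestamps, request IDs) before checking bodies for equality; that step is left out here.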
Feature Parity Checklist
| Feature | Python | Rust | Status |
|---|---|---|---|
| MCP SSE Transport | Yes | Not yet | In progress |
| Tool Registry | Yes | Not yet | Planned |
| OPA Policy Evaluation | Yes (sidecar) | Yes (regorus) | Complete |
| Semantic Cache | Yes | Not yet | Planned |
| Error Snapshots | Yes | Not yet | Planned |
| K8s CRD Watcher | Yes | Not yet | Planned |
| OpenTelemetry | Yes | Not yet | API stabilizing |
| Keycloak JWT | Yes | Not yet | Planned |
Rust Implementation Structure
stoa-gateway/
├── src/
│   ├── main.rs              # Entry point, --mode flag
│   ├── modes/
│   │   ├── mod.rs           # Mode trait
│   │   ├── edge_mcp.rs      # MCP protocol
│   │   ├── sidecar.rs       # Behind existing gateway
│   │   ├── proxy.rs         # Inline enforcement
│   │   └── shadow.rs        # Traffic observation
│   ├── auth/
│   │   └── oidc.rs          # Keycloak JWT validation
│   ├── policy/
│   │   └── opa.rs           # regorus integration
│   └── observability/
│       └── metrics.rs       # Prometheus
├── Cargo.toml
└── Dockerfile
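To make the `--mode` flag in `main.rs` more concrete, here is one possible way to map its value onto the four modes listed under `src/modes/`. This is a sketch using only the standard library. The flag values (`edge-mcp`, `sidecar`, `proxy`, `shadow`) are assumed from the file names, and the example uses a plain enum rather than the Mode trait mentioned for `mod.rs`, whose definition this ADR does not show.

```rust
use std::fmt;
use std::str::FromStr;

/// The four gateway modes from the layout above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    EdgeMcp,
    Sidecar,
    Proxy,
    Shadow,
}

/// Error returned for an unrecognized `--mode` value (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct UnknownMode(pub String);

impl fmt::Display for UnknownMode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unknown gateway mode: {}", self.0)
    }
}

impl std::error::Error for UnknownMode {}

impl FromStr for Mode {
    type Err = UnknownMode;

    // Assumption: flag values mirror the module names, with a hyphen
    // in place of the underscore for edge_mcp.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "edge-mcp" => Ok(Mode::EdgeMcp),
            "sidecar" => Ok(Mode::Sidecar),
            "proxy" => Ok(Mode::Proxy),
            "shadow" => Ok(Mode::Shadow),
            other => Err(UnknownMode(other.to_string())),
        }
    }
}

impl Mode {
    /// File under src/modes/ that implements this mode, per the layout above.
    pub fn module_file(self) -> &'static str {
        match self {
            Mode::EdgeMcp => "edge_mcp.rs",
            Mode::Sidecar => "sidecar.rs",
            Mode::Proxy => "proxy.rs",
            Mode::Shadow => "shadow.rs",
        }
    }
}

/// Find `--mode <value>` in an argument list and parse the value.
/// Returns `None` when the flag is absent or has no value after it.
pub fn parse_mode_flag<I, S>(args: I) -> Option<Result<Mode, UnknownMode>>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        if arg.as_ref() == "--mode" {
            return iter.next().map(|value| value.as_ref().parse::<Mode>());
        }
    }
    None
}
```

In a binary, `parse_mode_flag(std::env::args())` would read the process arguments. A production entry point would more likely use an argument-parsing crate, which is outside the scope of this sketch.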
Rollback Plan
If the Rust implementation shows issues:
- Immediate: route 100% of traffic to Python via the load balancer
- Investigation: compare shadow logs
- Fix: patch Rust, redeploy as shadow
- Validate: 48h shadow validation
- Resume: gradual traffic shift
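The rollback steps above can be sketched as a small traffic-split model: rollback sends everything to Python, and traffic only moves back toward Rust after 48 hours of clean shadow validation. The `TrafficSplit` type and the ramp percentages (5, 25, 50, 100) are assumptions for illustration. The ADR only specifies "gradual traffic shift", not the actual steps or the load balancer API.

```rust
/// Hours of clean shadow validation required before resuming (from the plan above).
pub const REQUIRED_SHADOW_HOURS: u32 = 48;

/// Hypothetical ramp for the gradual shift; the ADR does not define the steps.
pub const RAMP_STEPS: [u8; 4] = [5, 25, 50, 100];

/// Share of traffic routed to the Rust gateway, in percent.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TrafficSplit {
    pub rust_percent: u8,
}

impl TrafficSplit {
    /// Immediate rollback: route 100% of traffic to Python.
    pub fn rollback() -> Self {
        TrafficSplit { rust_percent: 0 }
    }

    /// Remaining share served by Python.
    pub fn python_percent(self) -> u8 {
        100u8.saturating_sub(self.rust_percent)
    }

    /// Advance to the next ramp step, but only once shadow validation has
    /// been clean for the required number of hours. Otherwise the split
    /// is returned unchanged.
    pub fn resume(self, clean_shadow_hours: u32) -> Self {
        if clean_shadow_hours < REQUIRED_SHADOW_HOURS {
            return self;
        }
        let next = RAMP_STEPS
            .iter()
            .copied()
            .find(|&step| step > self.rust_percent)
            .unwrap_or(100);
        TrafficSplit { rust_percent: next }
    }
}
```

Keeping the validation gate inside `resume` means a caller cannot skip the 48h check by accident; whether each ramp step should require its own validation window is a policy question this sketch does not settle.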
Performance Targets
| Metric | Python | Rust Target | Improvement |
|---|---|---|---|
| P50 latency | 15ms | 5ms | 3x |
| P99 latency | 80ms | 20ms | 4x |
| RPS per pod | 1000 | 5000 | 5x |
| Memory | 512MB | 64MB | 8x |
| Cold start | 3s | 100ms | 30x |
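The improvement column follows directly from the two value columns, and the RPS-per-pod figures translate into pod counts for a given load. The helpers below are a minimal sketch of that arithmetic; the function names are introduced here for illustration, and any load figure passed to `pods_needed` is hypothetical, not a measured number.

```rust
/// Improvement factor for metrics where lower is better
/// (latency, memory, cold start): baseline divided by target.
pub fn reduction_factor(baseline: f64, target: f64) -> f64 {
    baseline / target
}

/// Improvement factor for metrics where higher is better
/// (requests per second): target divided by baseline.
pub fn gain_factor(baseline: f64, target: f64) -> f64 {
    target / baseline
}

/// Number of pods needed to serve `load_rps`, rounding up.
/// Panics if `rps_per_pod` is zero.
pub fn pods_needed(load_rps: u64, rps_per_pod: u64) -> u64 {
    assert!(rps_per_pod > 0, "rps_per_pod must be positive");
    (load_rps + rps_per_pod - 1) / rps_per_pod
}
```

For example, at a hypothetical load of 20,000 RPS, `pods_needed(20_000, 1000)` gives 20 pods at the Python figure and `pods_needed(20_000, 5000)` gives 4 at the Rust target. Cold start needs both values in the same unit first: 3s is 3000ms, so `reduction_factor(3000.0, 100.0)` gives 30.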
Consequences
Positive
- Performance: lower latency (targets: P50 from 15ms to 5ms, P99 from 80ms to 20ms)
- Cost: fewer pods needed for the same throughput
- Reliability: memory safety, no GC pauses
- Native OPA: regorus eliminates the sidecar
Negative
- Development Velocity: Rust is slower to write
- Dual Maintenance: two codebases during migration
- Team Skills: Rust learning curve
Mitigations
| Challenge | Mitigation |
|---|---|
| Velocity | Python for experiments, Rust for stable features |
| Dual maintenance | Feature freeze on Python post-Phase 2 |
| Skills | Internal Rust training, code reviews |