ADR-036: Gateway Auto-Registration
Metadata
| Field | Value |
|---|---|
| Status | Accepted |
| Date | 2026-02-06 |
| Migrated from | stoa repo ADR-028 (number conflict) |
Context
STOA Control Plane currently requires manual gateway registration via POST /v1/admin/gateways. This creates operational friction:
- Operators must copy/paste UUIDs and URLs into registration requests
- Gateway restarts don't preserve registration state in stateless deployments
- Third-party gateways (Kong, Envoy, Apigee) require custom integration per vendor
- No real-time visibility into gateway health between explicit health checks
Industry comparison:
| Platform | Registration Model | Mechanism |
|---|---|---|
| Apple iCloud | Automatic | Device certificate + Apple ID |
| Kubernetes | Self-registration | Bootstrap tokens, kubelet heartbeat |
| HashiCorp Consul | Auto-join | Gossip protocol, agent heartbeat |
| HashiCorp Vault | Manual | Token-based unsealing |
The "Apple ecosystem" experience, where devices pair seamlessly, is the target UX for STOA gateways.
Decision
Implement Gateway Auto-Registration: gateways self-register with the Control Plane at startup using a shared API key, then maintain presence via periodic heartbeat.
Two-Tier Architecture
Tier 1: STOA Native Gateways (Full Integration)
- Zero-config: 2 environment variables only
  - STOA_CONTROL_PLANE_URL: Control Plane API endpoint
  - STOA_CONTROL_PLANE_API_KEY: Shared secret for authentication
- Gateway derives identity from hostname + mode + environment
- Full bidirectional integration:
  - API sync (CP → Gateway via admin API)
  - Policy push (CP → Gateway)
  - MCP tool registration
  - Metering to Kafka
Tier 2: Third-Party Gateways (Sidecar Pattern)
- Deploy STOA sidecar alongside existing gateway (Kong, Envoy, Apigee, NGINX, AWS)
- Sidecar auto-registers as stoa_sidecar type with reduced capability set
- Main gateway uses ext_authz filter → sidecar for policy/metering
- Sidecar provides: policy enforcement, rate limiting, metering, observability
Registration Protocol
+-------------+                          +-----------------+
|   Gateway   |                          |  Control Plane  |
+------+------+                          +--------+--------+
       |                                          |
       | 1. POST /v1/internal/gateways/register   |
       |    { hostname, mode, version,            |
       |      environment, capabilities,          |
       |      admin_url }                         |
       |    Header: X-Gateway-Key: gw_xxx         |
       |----------------------------------------->|
       |                                          |
       | 2. 201 Created                           |
       |    { id, name, status: "online" }        |
       |<-----------------------------------------|
       |                                          |
       | 3. Every 30s: POST /{id}/heartbeat       |
       |    { uptime, routes, policies,           |
       |      requests_total, error_rate }        |
       |----------------------------------------->|
       |                                          |
       | 4. 204 No Content                        |
       |<-----------------------------------------|
       |                                          |
       |    [If no heartbeat for 90s]             |
       |                                          | Gateway marked OFFLINE
       |                                          |
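As a rough illustration of step 1, the request could be assembled as shown here. This is a sketch, not the gateway's actual client: the function name `build_registration_request`, the returned dictionary layout, and the placeholder defaults are hypothetical, while the path, the `X-Gateway-Key` header, and the body fields are taken from the protocol diagram. Nothing is sent over the network.

```python
import json
import os
import socket


def build_registration_request(cp_url, api_key, mode, environment,
                               version, capabilities, admin_url,
                               hostname=None):
    """Assemble (without sending) the step 1 registration request."""
    body = {
        "hostname": hostname or socket.gethostname(),
        "mode": mode,
        "version": version,
        "environment": environment,
        "capabilities": list(capabilities),
        "admin_url": admin_url,
    }
    return {
        "method": "POST",
        "url": cp_url.rstrip("/") + "/v1/internal/gateways/register",
        "headers": {
            "Content-Type": "application/json",
            "X-Gateway-Key": api_key,
        },
        "body": json.dumps(body),
    }


# The fallback values below are illustrative placeholders only.
request = build_registration_request(
    cp_url=os.environ.get("STOA_CONTROL_PLANE_URL", "https://cp.example.internal"),
    api_key=os.environ.get("STOA_CONTROL_PLANE_API_KEY", "gw_xxx"),
    mode="edge-mcp",
    environment="dev",
    version="0.0.0",
    capabilities=["rest", "mcp"],
    admin_url="http://localhost:8081",
)
print(request["method"], request["url"])
```

Keeping request construction separate from transport makes the payload easy to unit-test without a running Control Plane.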
Identity Derivation
Instance name is deterministic: {hostname}-{mode}-{environment}
Examples:
- stoa-gateway-7f8b9c-edgemcp-prod: Production edge-mcp gateway
- stoa-sidecar-kong-abc123-sidecar-staging: Staging sidecar alongside Kong
This enables:
- Idempotent registration (same gateway re-registers on restart)
- Clear naming in Console UI
- Environment-based filtering
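The derivation rule and its effect on idempotency can be sketched in a few lines. The in-memory `registry` dictionary and the `register` helper are hypothetical stand-ins for the Control Plane's storage, used here only to show that a deterministic name turns a re-registration into an update.

```python
def derive_instance_name(hostname: str, mode: str, environment: str) -> str:
    """Join the identity parts as {hostname}-{mode}-{environment}."""
    return f"{hostname}-{mode}-{environment}"


# Hypothetical in-memory store keyed by instance name (illustration only).
registry: dict[str, dict] = {}


def register(hostname: str, mode: str, environment: str, version: str):
    """Insert or update a gateway record; returns (name, created)."""
    name = derive_instance_name(hostname, mode, environment)
    created = name not in registry
    registry[name] = {"name": name, "version": version, "status": "online"}
    return name, created


# First start creates the record; a restart with a new version updates it.
print(register("stoa-gateway-7f8b9c", "edgemcp", "prod", "1.0.0"))
print(register("stoa-gateway-7f8b9c", "edgemcp", "prod", "1.0.1"))
print(len(registry))
```

One caveat of this scheme, as sketched: two gateways that share hostname, mode, and environment would map to the same record, so uniqueness depends on hostnames being unique within an environment.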
Security Model
| Aspect | Design | Rationale |
|---|---|---|
| Authentication | Shared API key per environment | Simple, low barrier to adoption |
| Key transmission | X-Gateway-Key header over HTTPS | Standard header pattern |
| Key storage | K8s Secret, injected as env var | No secrets in code or config files |
| Key rotation | Comma-separated list supported | Rolling updates without downtime |
| Endpoint exposure | /v1/internal/* not on public ingress | Internal traffic only |
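A minimal sketch of how the comma-separated key list could be checked on the Control Plane side. The helper names `parse_valid_keys` and `is_authorized` are hypothetical, and the constant-time comparison via `hmac.compare_digest` is an added assumption rather than something this ADR specifies.

```python
import hmac


def parse_valid_keys(raw: str) -> list[str]:
    """Split a comma-separated key list, ignoring blanks and whitespace."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_authorized(presented_key: str, valid_keys: list[str]) -> bool:
    """Return True if the presented key matches any currently valid key."""
    presented = presented_key.encode()
    # Check every key (no early exit) with a constant-time comparison.
    matches = [hmac.compare_digest(presented, key.encode()) for key in valid_keys]
    return any(matches)


# During rotation both the old and the new key are listed, so gateways
# holding either one keep working until the old key is removed.
valid = parse_valid_keys("gw_old, gw_new")
print(is_authorized("gw_old", valid), is_authorized("gw_new", valid))
print(is_authorized("gw_unknown", valid))
```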
Capabilities Declaration
Gateways declare capabilities at registration:
{
  "capabilities": [
    "rest",
    "mcp",
    "sse",
    "oidc",
    "rate_limiting",
    "ext_authz",
    "metering"
  ]
}
Control Plane uses capabilities to:
- Filter which gateways can receive specific deployments
- Display accurate feature badges in Console UI
- Route MCP requests to capable gateways
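Capability-based filtering amounts to a subset check. The sketch below assumes gateways are represented as dictionaries with a `capabilities` list; the function name `capable_gateways` and the sample gateway names are hypothetical.

```python
def capable_gateways(gateways, required):
    """Return gateways whose declared capabilities include every required one."""
    needed = set(required)
    return [gw for gw in gateways if needed <= set(gw["capabilities"])]


# Sample data for illustration only.
gateways = [
    {"name": "edge-1", "capabilities": ["rest", "mcp", "sse", "metering"]},
    {"name": "sidecar-1", "capabilities": ["ext_authz", "rate_limiting", "metering"]},
]

# Only gateways declaring both "mcp" and "sse" are eligible for MCP traffic.
print([gw["name"] for gw in capable_gateways(gateways, ["mcp", "sse"])])
```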
Consequences
Positive
- Zero friction onboarding: Start gateway with 2 env vars → appears in Console within 5 seconds
- Real-time status: Heartbeat provides sub-minute visibility into gateway health
- Unified experience: Same registration pattern for native and sidecar gateways
- Self-healing: Gateway restart automatically re-registers, no manual intervention
- Capability-aware routing: CP knows what each gateway can handle
- Idempotent: Multiple registrations with same identity update rather than duplicate
Negative
- Shared secret model: Compromised key allows rogue gateway registration
- Network dependency: Gateway requires CP connectivity at startup
- Heartbeat traffic: N gateways x 2 requests/min (negligible but non-zero)
- Startup delay: Gateway waits for registration response before serving traffic
Mitigations
| Risk | Mitigation |
|---|---|
| Rogue registration | Rate limit endpoint, log all registrations with source IP |
| CP unavailable at startup | Graceful degradation: gateway runs in standalone mode |
| Heartbeat traffic | Lightweight payload, exponential backoff on errors |
| Key compromise | Key rotation without restart, audit logging |
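The "exponential backoff on errors" mitigation could look roughly like this on the gateway side. The doubling factor, the 300-second cap, and the function name `next_heartbeat_delay` are assumptions for illustration; only the 30-second base interval comes from this ADR.

```python
def next_heartbeat_delay(consecutive_failures: int,
                         base_seconds: float = 30.0,
                         max_seconds: float = 300.0) -> float:
    """Delay before the next heartbeat attempt.

    Zero failures gives the normal interval; each consecutive failure
    doubles the delay, up to the cap (cap value is an assumption).
    """
    delay = base_seconds * (2 ** consecutive_failures)
    return min(delay, max_seconds)


for failures in range(5):
    print(failures, next_heartbeat_delay(failures))
```

Note that with these assumed values, two or more consecutive failures push the delay past the 90-second offline threshold, so the Control Plane would show the gateway as OFFLINE until a heartbeat succeeds again.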
Alternatives Considered
1. mTLS with Per-Gateway Certificates
- Pros: Cryptographically strong identity, no shared secrets
- Cons: Complex PKI setup, certificate rotation overhead
- Verdict: Deferred to Phase 2 for high-security deployments
2. Push-Based Discovery (CP → Gateways)
- Pros: CP controls timing, no registration endpoint needed
- Cons: Requires network path from CP to all gateways, blocked by NAT/firewalls
- Verdict: Rejected; the pull model works across all network topologies
3. Kubernetes Service Discovery Only
- Pros: Native K8s integration, no custom registration
- Cons: K8s-only, doesn't work for VM/bare-metal deployments
- Verdict: Rejected; STOA must support non-K8s environments
4. Manual Registration (Status Quo)
- Pros: Explicit control, no new code
- Cons: Operational friction, doesn't scale
- Verdict: Remains available for brownfield integrations
Implementation Notes
Control Plane (Python)
- New router: src/routers/gateway_internal.py
  - POST /v1/internal/gateways/register: Self-registration
  - POST /v1/internal/gateways/{id}/heartbeat: Heartbeat
- New worker: src/workers/gateway_health_worker.py
  - Runs every 30s
  - Marks gateways OFFLINE if last_health_check < now() - 90s
- Config:
  - STOA_GATEWAY_API_KEYS: Comma-separated list of valid keys
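The health worker's staleness rule can be sketched as a pure function over gateway records. This is not the contents of gateway_health_worker.py: the record shape, the lowercase status strings, and the name `mark_stale_gateways` are assumptions, and a real worker would read and write the database instead of a list.

```python
from datetime import datetime, timedelta, timezone

# Threshold from this ADR: no heartbeat for 90 seconds means offline.
OFFLINE_AFTER = timedelta(seconds=90)


def mark_stale_gateways(gateways, now=None):
    """Mark gateways offline if last_health_check < now - 90s.

    Returns the names of the gateways whose status changed.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - OFFLINE_AFTER
    changed = []
    for gw in gateways:
        if gw["status"] != "offline" and gw["last_health_check"] < cutoff:
            gw["status"] = "offline"
            changed.append(gw["name"])
    return changed


now = datetime.now(timezone.utc)
fleet = [
    {"name": "fresh", "status": "online", "last_health_check": now - timedelta(seconds=20)},
    {"name": "stale", "status": "online", "last_health_check": now - timedelta(seconds=200)},
]
print(mark_stale_gateways(fleet, now=now))
```

Because the worker runs every 30s and the threshold is 90s, a gateway is flagged somewhere between 90 and roughly 120 seconds after its last heartbeat.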
STOA Gateway (Rust)
- New module: src/control_plane/registration.rs
  - GatewayRegistrar::register(): Called at startup
  - GatewayRegistrar::start_heartbeat(): Background tokio task
- Config additions:
  - environment: For identity derivation (default: "dev")
New Gateway Type
Add STOA_SIDECAR to GatewayType enum for sidecars with reduced capabilities.
Implementation Status
| Component | Status | PR |
|---|---|---|
| Control Plane API: registration, heartbeat, health worker | Complete | - |
| Control Plane API: config fetch (real deployments/policies) | Complete | #170 |
| STOA Gateway (Rust): registration, heartbeat with real metrics | Complete | #170 |
| STOA Gateway (Rust): deep readiness probe (CP + OIDC checks) | Complete | #170 |
| Console UI: auto-refresh, detail panel, live indicator | Complete | #170 |
| Helm chart: control-plane-api-key in deployment + ExternalSecret | Complete | #170 |
| K8s deployment.yaml: secret documentation | Complete | #170 |
| Sidecar mode (Tier 2) | Planned | Q2 2026 |
| mTLS per-gateway certificates | Planned | Phase 2 |
Related ADRs
- ADR-035: Gateway Adapter Pattern - Defines the interface used by CP to orchestrate gateways
- ADR-024: Unified Gateway Architecture - Defines edge-mcp, sidecar, proxy, shadow modes
- ADR-037: Deployment Modes, Sovereign First - Deployment strategy