ADR-036: Gateway Auto-Registration

Metadata

| Field | Value |
|---|---|
| Status | ✅ Accepted |
| Date | 2026-02-06 |
| Migrated from | stoa repo ADR-028 (number conflict) |

Context

STOA Control Plane currently requires manual gateway registration via POST /v1/admin/gateways. This creates operational friction:

  • Operators must copy/paste UUIDs and URLs into registration requests
  • Gateway restarts don't preserve registration state in stateless deployments
  • Third-party gateways (Kong, Envoy, Apigee) require custom integration per vendor
  • No real-time visibility into gateway health between explicit health checks

Industry comparison:

| Platform | Registration Model | Mechanism |
|---|---|---|
| Apple iCloud | Automatic | Device certificate + Apple ID |
| Kubernetes | Self-registration | Bootstrap tokens, kubelet heartbeat |
| HashiCorp Consul | Auto-join | Gossip protocol, agent heartbeat |
| HashiCorp Vault | Manual | Token-based unsealing |

The "Apple ecosystem" experience, where devices pair seamlessly, is the target UX for STOA gateways.

Decision

Implement Gateway Auto-Registration: gateways self-register with the Control Plane at startup using a shared API key, then maintain presence via periodic heartbeat.

Two-Tier Architecture

Tier 1: STOA Native Gateways (Full Integration)

  • Zero-config: 2 environment variables only
    • STOA_CONTROL_PLANE_URL: Control Plane API endpoint
    • STOA_CONTROL_PLANE_API_KEY: Shared secret for authentication
  • Gateway derives identity from hostname + mode + environment
  • Full bidirectional integration:
    • API sync (CP → Gateway via admin API)
    • Policy push (CP → Gateway)
    • MCP tool registration
    • Metering to Kafka

Tier 2: Third-Party Gateways (Sidecar Pattern)

  • Deploy STOA sidecar alongside existing gateway (Kong, Envoy, Apigee, NGINX, AWS)
  • Sidecar auto-registers as stoa_sidecar type with reduced capability set
  • Main gateway uses ext_authz filter → sidecar for policy/metering
  • Sidecar provides: policy enforcement, rate limiting, metering, observability

Registration Protocol

┌─────────────┐                    ┌─────────────────┐
│   Gateway   │                    │  Control Plane  │
└──────┬──────┘                    └────────┬────────┘
       │                                    │
       │ 1. POST /v1/internal/gateways/register
       │    { hostname, mode, version,      │
       │      environment, capabilities,    │
       │      admin_url }                   │
       │    Header: X-Gateway-Key: gw_xxx   │
       ├───────────────────────────────────►│
       │                                    │
       │ 2. 201 Created                     │
       │    { id, name, status: "online" }  │
       │◄───────────────────────────────────┤
       │                                    │
       │ 3. Every 30s: POST /{id}/heartbeat │
       │    { uptime, routes, policies,     │
       │      requests_total, error_rate }  │
       ├───────────────────────────────────►│
       │                                    │
       │ 4. 204 No Content                  │
       │◄───────────────────────────────────┤
       │                                    │
       │    [If no heartbeat for 90s]       │
       │                                    │
       │                     Gateway marked │
       │                            OFFLINE │
       │                                    │
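
The two gateway-initiated calls in this exchange can be sketched as plain HTTP requests. The paths, the `X-Gateway-Key` header, and the payload field names are taken from this ADR; the function names are hypothetical, and sending the key on the heartbeat as well as on registration is an assumption. The production gateway is written in Rust, so this Python version only illustrates the wire format.

```python
import json
import urllib.request

INTERNAL_PREFIX = "/v1/internal/gateways"


def build_register_request(base_url: str, api_key: str, info: dict) -> urllib.request.Request:
    """Step 1: the registration POST, authenticated with X-Gateway-Key."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}{INTERNAL_PREFIX}/register",
        data=json.dumps(info).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Gateway-Key": api_key},
        method="POST",
    )


def build_heartbeat_request(base_url: str, api_key: str, gateway_id: str,
                            metrics: dict) -> urllib.request.Request:
    """Step 3: the heartbeat POST for the id returned in step 2."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}{INTERNAL_PREFIX}/{gateway_id}/heartbeat",
        data=json.dumps(metrics).encode("utf-8"),
        # Assumption: the heartbeat carries the same key as registration.
        headers={"Content-Type": "application/json", "X-Gateway-Key": api_key},
        method="POST",
    )
```

Either request can be sent with `urllib.request.urlopen(request, timeout=...)`; per the diagram, registration should answer 201 and the heartbeat 204.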

Identity Derivation

Instance name is deterministic: {hostname}-{mode}-{environment}

Examples:

  • stoa-gateway-7f8b9c-edgemcp-prod: Production edge-mcp gateway
  • stoa-sidecar-kong-abc123-sidecar-staging: Staging sidecar alongside Kong

This enables:

  • Idempotent registration (same gateway re-registers on restart)
  • Clear naming in Console UI
  • Environment-based filtering
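
The naming rule and the idempotency it enables can be illustrated with a small sketch. The `{hostname}-{mode}-{environment}` pattern is from this ADR; the in-memory `registry` dict, the `upsert_gateway` helper, and the record fields are hypothetical stand-ins for the Control Plane's real storage. The mode is assumed to arrive already in its name form (for example `edgemcp`).

```python
from typing import Dict


def derive_instance_name(hostname: str, mode: str, environment: str) -> str:
    """Build the deterministic name {hostname}-{mode}-{environment}."""
    return f"{hostname}-{mode}-{environment}"


def upsert_gateway(registry: Dict[str, dict], hostname: str, mode: str,
                   environment: str, version: str) -> dict:
    """Register a gateway, or refresh its record if the name already exists."""
    name = derive_instance_name(hostname, mode, environment)
    # setdefault creates the record once; later calls reuse the same entry.
    record = registry.setdefault(name, {"name": name})
    record.update(version=version, status="online")
    return record
```

Because the key is derived rather than generated, a restarted gateway lands on its existing record instead of creating a duplicate.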

Security Model

| Aspect | Design | Rationale |
|---|---|---|
| Authentication | Shared API key per environment | Simple, low barrier to adoption |
| Key transmission | X-Gateway-Key header over HTTPS | Standard header pattern |
| Key storage | K8s Secret, injected as env var | No secrets in code or config files |
| Key rotation | Comma-separated list supported | Rolling updates without downtime |
| Endpoint exposure | /v1/internal/* not on public ingress | Internal traffic only |
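
Key rotation through a comma-separated list might be validated as follows. This is a sketch, not the Control Plane's actual code: the function names are hypothetical, and the constant-time comparison via `hmac.compare_digest` is a suggested hardening rather than something this ADR specifies.

```python
import hmac
from typing import List


def parse_gateway_keys(raw: str) -> List[str]:
    """Split a comma-separated key list, ignoring blanks and whitespace."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_valid_gateway_key(presented: str, valid_keys: List[str]) -> bool:
    """Check the X-Gateway-Key value against every configured key."""
    presented_bytes = presented.encode("utf-8")
    matched = False
    for key in valid_keys:
        # compare_digest avoids leaking key contents through timing;
        # the loop does not exit early for the same reason.
        if hmac.compare_digest(presented_bytes, key.encode("utf-8")):
            matched = True
    return matched
```

During a rolling update the list would hold both the old and the new key, so gateways on either key keep registering until the old one is removed.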

Capabilities Declaration

Gateways declare capabilities at registration:

{
  "capabilities": [
    "rest",
    "mcp",
    "sse",
    "oidc",
    "rate_limiting",
    "ext_authz",
    "metering"
  ]
}

Control Plane uses capabilities to:

  • Filter which gateways can receive specific deployments
  • Display accurate feature badges in Console UI
  • Route MCP requests to capable gateways
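
Capability-based filtering reduces to a subset check. In the sketch below the capability strings come from the declaration above, while `eligible_gateways` and the gateway names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def eligible_gateways(gateways: Dict[str, Set[str]], required: Iterable[str]) -> List[str]:
    """Return the names of gateways whose capabilities cover `required`."""
    needed = set(required)
    # A gateway qualifies only if every required capability was declared.
    return sorted(name for name, caps in gateways.items() if needed <= caps)
```

For example, a deployment that requires `mcp` would skip a sidecar that declared only `rate_limiting`, `ext_authz`, and `metering`.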

Consequences

Positive

  • Zero-friction onboarding: Start gateway with 2 env vars → appears in Console within 5 seconds
  • Real-time status: Heartbeat provides sub-minute visibility into gateway health
  • Unified experience: Same registration pattern for native and sidecar gateways
  • Self-healing: Gateway restart automatically re-registers, no manual intervention
  • Capability-aware routing: CP knows what each gateway can handle
  • Idempotent: Multiple registrations with same identity update rather than duplicate

Negative

  • Shared secret model: Compromised key allows rogue gateway registration
  • Network dependency: Gateway requires CP connectivity at startup
  • Heartbeat traffic: N gateways × 2 requests/min (negligible but non-zero)
  • Startup delay: Gateway waits for registration response before serving traffic

Mitigations

| Risk | Mitigation |
|---|---|
| Rogue registration | Rate limit endpoint, log all registrations with source IP |
| CP unavailable at startup | Graceful degradation: gateway runs in standalone mode |
| Heartbeat traffic | Lightweight payload, exponential backoff on errors |
| Key compromise | Key rotation without restart, audit logging |
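
The exponential backoff on heartbeat errors could look like this. Only the 30-second base interval comes from this ADR; the doubling factor, the 300-second cap, and the function name are assumptions chosen for illustration.

```python
def heartbeat_delay(consecutive_failures: int, base: float = 30.0, cap: float = 300.0) -> float:
    """Seconds to wait before the next heartbeat attempt."""
    if consecutive_failures <= 0:
        # Healthy path: keep the regular 30s cadence.
        return base
    # Bound the exponent so the arithmetic stays small for long outages.
    exponent = min(consecutive_failures, 16)
    return min(cap, base * 2 ** exponent)
```

With these values the delay grows 60s, 120s, 240s and then stays at the cap, so a gateway that keeps failing will be marked OFFLINE by the 90-second rule until a heartbeat succeeds again.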

Alternatives Considered

1. mTLS with Per-Gateway Certificates

  • Pros: Cryptographically strong identity, no shared secrets
  • Cons: Complex PKI setup, certificate rotation overhead
  • Verdict: Deferred to Phase 2 for high-security deployments

2. Push-Based Discovery (CP → Gateways)

  • Pros: CP controls timing, no registration endpoint needed
  • Cons: Requires network path from CP to all gateways, blocked by NAT/firewalls
  • Verdict: Rejected. The pull model works across all network topologies

3. Kubernetes Service Discovery Only

  • Pros: Native K8s integration, no custom registration
  • Cons: K8s-only, doesn't work for VM/bare-metal deployments
  • Verdict: Rejected. STOA must support non-K8s environments

4. Manual Registration (Status Quo)

  • Pros: Explicit control, no new code
  • Cons: Operational friction, doesn't scale
  • Verdict: Remains available for brownfield integrations

Implementation Notes

Control Plane (Python)

  1. New router: src/routers/gateway_internal.py

    • POST /v1/internal/gateways/register: Self-registration
    • POST /v1/internal/gateways/{id}/heartbeat: Heartbeat
  2. New worker: src/workers/gateway_health_worker.py

    • Runs every 30s
    • Marks gateways OFFLINE if last_health_check < now() - 90s
  3. Config: STOA_GATEWAY_API_KEYS: Comma-separated list of valid keys
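
The health worker's staleness rule can be sketched as a single pass over the known gateways. The 90-second threshold and the `last_health_check` field name come from this ADR; the dict-based records, the lowercase status strings, and the `mark_stale_gateways` function are hypothetical, and the real worker presumably operates on database rows.

```python
from datetime import datetime, timedelta
from typing import Dict, List

OFFLINE_AFTER = timedelta(seconds=90)


def mark_stale_gateways(gateways: Dict[str, dict], now: datetime) -> List[str]:
    """Mark gateways offline when their last heartbeat is older than 90s."""
    cutoff = now - OFFLINE_AFTER
    newly_offline = []
    for name, record in gateways.items():
        # Strict comparison mirrors: last_health_check < now() - 90s.
        if record["status"] != "offline" and record["last_health_check"] < cutoff:
            record["status"] = "offline"
            newly_offline.append(name)
    return newly_offline
```

Taking `now` as a parameter keeps the rule deterministic under test; a scheduled worker would call it every 30 seconds with the current time, for instance `datetime.now(timezone.utc)` if the stored timestamps are timezone-aware.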

STOA Gateway (Rust)

  1. New module: src/control_plane/registration.rs

    • GatewayRegistrar::register(): Called at startup
    • GatewayRegistrar::start_heartbeat(): Background tokio task
  2. Config additions:

    • environment: For identity derivation (default: "dev")

New Gateway Type

Add STOA_SIDECAR to GatewayType enum for sidecars with reduced capabilities.

Implementation Status

| Component | Status | PR / Target |
|---|---|---|
| Control Plane API: registration, heartbeat, health worker | ✅ Complete | - |
| Control Plane API: config fetch (real deployments/policies) | ✅ Complete | #170 |
| STOA Gateway (Rust): registration, heartbeat with real metrics | ✅ Complete | #170 |
| STOA Gateway (Rust): deep readiness probe (CP + OIDC checks) | ✅ Complete | #170 |
| Console UI: auto-refresh, detail panel, live indicator | ✅ Complete | #170 |
| Helm chart: control-plane-api-key in deployment + ExternalSecret | ✅ Complete | #170 |
| K8s deployment.yaml: secret documentation | ✅ Complete | #170 |
| Sidecar mode (Tier 2) | Planned | Q2 2026 |
| mTLS per-gateway certificates | Planned | Phase 2 |

References