
Week 1 Operations Runbook: Install to Production-Ready

10 min read
STOA Team
The STOA Platform Team

You've installed STOA. The health check returns 200. Now what?

The gap between "it runs" and "it's production-ready" is where most setups fail. This runbook covers your first 7 days with STOA: the operational habits that prevent 3am surprises, the monitoring that catches issues before your users do, and the hardening steps that separate a demo from a real deployment.

Who This Is For

This guide is for developers, freelancers, and small teams who have STOA running (via Docker Compose or Kubernetes) and want to move from "installed" to "operating with confidence."

Each day builds on the previous one. By Day 7, you'll have monitoring, alerting, consumers, policies, and a production checklist completed.

Configure your environment
export STOA_API_URL="http://localhost:8000"     # Control Plane API
export STOA_GATEWAY_URL="http://localhost:3001" # Gateway endpoint
export STOA_AUTH_URL="http://localhost:8080"    # Keycloak
export TENANT_ID="default"                      # Your tenant ID

Replace with your production URLs if deploying remotely. See the full operations guide for environment-specific details.


Day 1: Verify Everything Works

Before adding anything new, confirm your baseline is solid.

Health Check All Services

# Gateway
curl -s ${STOA_GATEWAY_URL}/health | jq .
# Expected: {"status":"ok","version":"..."}

# Control Plane API
curl -s ${STOA_API_URL}/health | jq .
# Expected: {"status":"ok"}

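# Keycloak (requires KC_HEALTH_ENABLED=true; Keycloak 25+ serves health
# endpoints on its management port, 9000 by default, rather than 8080)
curl -s ${STOA_AUTH_URL}/health/ready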
# Expected: {"status":"UP"}

If any service fails, check logs before proceeding:

docker compose logs <service-name> --tail=50
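To run all three checks in one go, a small script helps. This is a minimal sketch, not part of STOA: the function names check_all_health and http_status are hypothetical, and it assumes the STOA_* variables from "Configure your environment" plus the same health paths as the commands above (point the Keycloak entry at port 9000 if your Keycloak version serves health checks there).

```shell
# Hypothetical helpers: probe each STOA service once and report per service.
# Assumes STOA_GATEWAY_URL, STOA_API_URL and STOA_AUTH_URL are exported.

# Print the HTTP status code for a URL ("000" if nothing answered).
http_status() {
  # curl exits non-zero on connection errors but still prints "000"
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$1" || true
}

check_all_health() {
  local failures=0 entry name url code
  for entry in \
    "gateway|${STOA_GATEWAY_URL}/health" \
    "control-plane|${STOA_API_URL}/health" \
    "keycloak|${STOA_AUTH_URL}/health/ready"
  do
    name=${entry%%|*}   # text before the first "|"
    url=${entry#*|}     # text after the first "|"
    code=$(http_status "$url")
    if [ "$code" = "200" ]; then
      echo "OK    $name"
    else
      echo "FAIL  $name ($url) returned HTTP $code"
      failures=$((failures + 1))
    fi
  done
  # Exit status is 0 only when every service answered 200
  [ "$failures" -eq 0 ]
}

# Usage: check_all_health || docker compose ps
```

Because the function returns non-zero when any service fails, you can chain it in scripts or CI before running later steps.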

Get Your Admin Token

You'll need this for every API call in this runbook:

TOKEN=$(curl -s -X POST ${STOA_AUTH_URL}/realms/stoa/protocol/openid-connect/token \
-d "client_id=control-plane-api" \
-d "client_secret=your-client-secret" \
-d "grant_type=client_credentials" | jq -r .access_token)
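Keycloak access tokens are short-lived (the realm's Access Token Lifespan defaults to 5 minutes), so a later call that suddenly returns 401 may simply mean TOKEN has expired. The sketch below decodes the token's exp claim so you can see how long it has left. The function names are hypothetical, and it assumes GNU coreutils base64 (on older macOS, base64 may need -D instead of -d).

```shell
# Decode the payload (middle segment) of a JWT. base64url uses "-" and "_"
# and drops "=" padding, so convert and re-pad before decoding.
jwt_payload() {
  local seg
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
}

# Print the seconds until the token's "exp" claim (negative once expired).
token_seconds_left() {
  local exp
  exp=$(jwt_payload "$1" | grep -o '"exp": *[0-9]*' | tr -dc '0-9')
  [ -n "$exp" ] || return 1
  echo $(( exp - $(date +%s) ))
}

# Usage: token_seconds_left "$TOKEN"
```

If the number is small or negative, re-run the token request above before continuing.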

Explore the Console

Open http://localhost:3000 (or your Console URL) and log in with your admin credentials. Walk through:

  • Dashboard: overview of APIs, consumers, and request volume
  • APIs: your registered API catalog
  • Policies: rate limits, CORS, and security rules
  • Consumers: who has access to what

This is your command center. Bookmark it.

Day 1 checkpoint: All services healthy, Console accessible, admin token working.


Day 2: Register Your First API

If you followed the Quick Start, you already have a test API. Now let's register a real one.

Create a UAC Contract

The Universal API Contract (UAC) is how STOA manages APIs, with one contract and multiple protocol bindings:

CONTRACT_ID=$(curl -s -X POST "${STOA_API_URL}/v1/contracts" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"apiName": "my-service-api",
"apiVersion": "1.0.0",
"tenant": "'$TENANT_ID'",
"displayName": "My Service API",
"description": "Backend service for my application",
"endpoint": {
"url": "http://my-backend:8080/api",
"method": "REST",
"timeout": "15s"
},
"auth": {
"type": "api_key"
},
"portal": {
"visible": true,
"categories": ["production"]
}
}' | jq -r .id)

echo "Contract ID: $CONTRACT_ID"
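If the request fails, jq -r .id prints the literal string null (or nothing at all, when the response is not JSON), and every later step silently reuses that bad value. A small guard stops early instead. This is a sketch with a hypothetical require_value helper; the same check applies to TOKEN, CONSUMER_ID, and API_KEY.

```shell
# Hypothetical guard: fail when a value captured with jq -r is unusable.
# jq -r prints "null" for a missing field and nothing for non-JSON input.
require_value() {
  local label=$1 value=$2
  if [ -z "$value" ] || [ "$value" = "null" ]; then
    echo "ERROR: $label is empty or null; re-run the request without '| jq' to see the raw response" >&2
    return 1
  fi
}

# Usage: require_value CONTRACT_ID "$CONTRACT_ID" || exit 1
```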

Verify It's Reachable

curl -s -o /dev/null -w "HTTP %{http_code} in %{time_total}s\n" \
"${STOA_GATEWAY_URL}/my-service-api/v1/health" \
-H "X-API-Key: your-test-key"

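You should see HTTP 200. A 502 means the gateway cannot reach your backend URL from inside its container. If the backend is another Compose service, use its service name (as with my-backend above); if it runs on your host machine, use host.docker.internal instead of localhost. Docker Desktop on Mac and Windows provides that name automatically; on Linux, map it on the gateway service with extra_hosts: ["host.docker.internal:host-gateway"].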
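When the status is not 200, a lookup of first troubleshooting steps speeds up triage. This hypothetical explain_status helper is a sketch: the 200, 401, 429, and 502 hints restate this runbook, while the 403, 404, 504, and 000 hints are general HTTP and curl conventions rather than documented STOA behaviour.

```shell
# Hypothetical helper: map a gateway status code to a first troubleshooting step.
explain_status() {
  case "$1" in
    200) echo "OK: the request reached your backend" ;;
    401|403) echo "Auth: check the X-API-Key value and the contract's auth type" ;;
    404) echo "Routing: check the path matches the registered apiName and version" ;;
    429) echo "Rate limited: a rate-limit policy rejected the request" ;;
    502) echo "Backend unreachable from the gateway container (check the endpoint url)" ;;
    504) echo "Backend too slow: it may be exceeding the contract's endpoint timeout" ;;
    000) echo "No response: is the gateway running, and is STOA_GATEWAY_URL correct?" ;;
    *) echo "Unexpected HTTP $1: check the gateway logs" ;;
  esac
}

# Usage:
#   explain_status "$(curl -s -o /dev/null -w '%{http_code}' \
#     -H "X-API-Key: your-test-key" "${STOA_GATEWAY_URL}/my-service-api/v1/health")"
```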

Day 2 checkpoint: At least one real API registered and callable through the gateway.


Day 3: Set Up Policies

Policies are what turn a proxy into a gateway. Start with the two most important: rate limiting and CORS.

Add Rate Limiting

Protect your backend from traffic spikes:

curl -s -X POST "${STOA_API_URL}/v1/admin/policies" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "standard-rate-limit",
"policy_type": "rate_limit",
"tenant_id": "'$TENANT_ID'",
"scope": "api",
"config": {
"requests_per_minute": 100,
"burst": 10
}
}' | jq .

Add CORS (If You Have Browser Clients)

Without CORS headers, browsers block cross-origin API calls:

curl -s -X POST "${STOA_API_URL}/v1/admin/policies" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "browser-cors",
"policy_type": "cors",
"tenant_id": "'$TENANT_ID'",
"scope": "api",
"config": {
"origins": ["https://yourapp.com", "http://localhost:3000"],
"methods": ["GET", "POST", "PUT", "DELETE"],
"headers": ["Content-Type", "Authorization", "X-API-Key"],
"max_age": 3600
}
}' | jq .
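Browsers send a preflight OPTIONS request before cross-origin calls that carry custom headers such as X-API-Key, and that preflight lists the header names without their values. You can imitate one with curl to confirm the policy above is active. A minimal sketch, assuming the gateway answers preflights on the API's route; cors_allowed_origin is a hypothetical name.

```shell
# Hypothetical helper: send a CORS preflight and print the
# Access-Control-Allow-Origin value returned (empty = origin not allowed).
cors_allowed_origin() {
  local url=$1 origin=$2
  curl -s -o /dev/null -D - -X OPTIONS "$url" \
    -H "Origin: $origin" \
    -H "Access-Control-Request-Method: GET" \
    -H "Access-Control-Request-Headers: X-API-Key" \
    | tr -d '\r' \
    | awk -F': ' 'tolower($1) == "access-control-allow-origin" { print $2 }'
}

# Usage:
#   cors_allowed_origin "${STOA_GATEWAY_URL}/my-service-api/v1/health" https://yourapp.com
```

An origin from the policy's origins list should be echoed back; an unlisted origin should produce no output.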

Test Rate Limiting

Send a burst of requests to verify the limit kicks in:

for i in $(seq 1 15); do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
    "${STOA_GATEWAY_URL}/my-service-api/v1/health" \
    -H "X-API-Key: your-test-key")
  echo "Request $i: HTTP $STATUS"
done

After exceeding the burst allowance, you should see HTTP 429 (Too Many Requests). That means rate limiting is working.
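One common reading of requests_per_minute plus burst is a token bucket: the bucket holds up to burst tokens and refills at requests_per_minute / 60 tokens per second. The simulation below models that assumption to show why a rapid burst flips to 429 after about 10 requests; it is not STOA's actual algorithm, which this runbook does not specify, and simulate_bucket is a hypothetical name.

```shell
# Hypothetical token-bucket model: N requests spaced GAP_MS apart.
# Tokens are tracked in thousandths so integer arithmetic is enough.
simulate_bucket() {
  local rpm=$1 burst=$2 n=$3 gap_ms=$4
  local cap=$((burst * 1000)) tokens i=1 allowed=0 rejected=0
  tokens=$cap                       # the bucket starts full
  while [ "$i" -le "$n" ]; do
    if [ "$i" -gt 1 ]; then
      # Refill: rpm tokens per 60000 ms = rpm * gap_ms / 60 milli-tokens
      tokens=$((tokens + rpm * gap_ms / 60))
      if [ "$tokens" -gt "$cap" ]; then tokens=$cap; fi
    fi
    if [ "$tokens" -ge 1000 ]; then
      tokens=$((tokens - 1000))     # spend one token: request allowed
      allowed=$((allowed + 1))
    else
      rejected=$((rejected + 1))    # empty bucket: would be HTTP 429
    fi
    i=$((i + 1))
  done
  echo "allowed=$allowed rejected=$rejected"
}
```

With no delay between requests, simulate_bucket 100 10 15 0 prints allowed=10 rejected=5. Spacing requests 600 ms apart (exactly one refill per request at 100 per minute) lets all 15 through, which is why a slow loop may never hit the limit.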

Day 3 checkpoint: Rate limiting and CORS policies active and verified.


Day 4: Onboard Your First Consumer

APIs are useless without consumers. Let's set one up.

Create a Consumer

CONSUMER_ID=$(curl -s -X POST "${STOA_API_URL}/v1/consumers/${TENANT_ID}" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"external_id": "client-001",
"name": "Acme Corp Integration",
"email": "dev@acme.example.com",
"consumer_metadata": {
"plan": "starter",
"contact": "Alice"
}
}' | jq -r .id)

echo "Consumer ID: $CONSUMER_ID"

Create a Subscription (API Key)

API_KEY=$(curl -s -X POST "${STOA_API_URL}/v1/subscriptions" \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"consumer_id": "'$CONSUMER_ID'",
"api_id": "'$CONTRACT_ID'",
"tenant_id": "'$TENANT_ID'"
}' | jq -r .api_key)

echo "API Key for Acme: $API_KEY"

Verify the Consumer Can Access the API

curl -s "${STOA_GATEWAY_URL}/my-service-api/v1/health" \
-H "X-API-Key: $API_KEY" \
-w "\nHTTP %{http_code}\n"
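A 200 with the key proves access works, but it matters just as much that calls without a key are rejected. This hypothetical helper checks both in one step; it assumes a missing key produces HTTP 401, as the Day 5 log table describes (some gateways answer 403 instead).

```shell
# Hypothetical helper: the key must be accepted AND a keyless call rejected.
verify_key_enforced() {
  local url=$1 key=$2 with without
  with=$(curl -s -o /dev/null -w '%{http_code}' -H "X-API-Key: $key" "$url" || true)
  without=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
  echo "with key: HTTP $with, without key: HTTP $without"
  [ "$with" = "200" ] && [ "$without" = "401" ]
}

# Usage:
#   verify_key_enforced "${STOA_GATEWAY_URL}/my-service-api/v1/health" "$API_KEY"
```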

Send this API key to your consumer. They can also self-register through the Developer Portal at http://localhost:3002.

Day 4 checkpoint: At least one consumer created with a working API key.


Day 5: Enable Monitoring

You can't fix what you can't see. STOA exposes Prometheus metrics and structured logs out of the box.

Check Metrics Are Exposed

curl -s ${STOA_GATEWAY_URL}/metrics | head -20

You should see Prometheus-format metrics: stoa_requests_total, stoa_request_duration_seconds, stoa_rate_limit_exceeded_total, and others.
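To read a single number out of that output, for example total requests across all status labels, you can sum every sample of one metric. A minimal awk-based sketch (metric_sum is a hypothetical name); it assumes label values contain no spaces, which keeps the field splitting simple.

```shell
# Hypothetical helper: sum every sample of one metric from Prometheus
# text-format input on stdin (all label combinations added together).
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                  # skip HELP and TYPE comment lines
    NF >= 2 {
      split($1, parts, "{")        # "name{labels}" -> parts[1] = "name"
      if (parts[1] == name) total += $2
    }
    END { printf "%.15g\n", total + 0 }
  '
}

# Usage:
#   curl -s "${STOA_GATEWAY_URL}/metrics" | metric_sum stoa_requests_total
```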

Set Up a Basic Monitoring Script

If you don't have Prometheus/Grafana yet, start with a simple check script:

#!/bin/bash
# check-stoa.sh: run via cron every 5 minutes

GATEWAY="${STOA_GATEWAY_URL:-http://localhost:3001}"

# One request captures both the status code and the total time;
# --max-time keeps a hung gateway from stalling the cron job.
read -r status latency < <(curl -s -o /dev/null --max-time 10 \
  -w "%{http_code} %{time_total}\n" "$GATEWAY/health")

if [ "$status" != "200" ]; then
  echo "[ALERT] Gateway returned $status at $(date)"
fi

# bash arithmetic is integer-only, so compare the float latency in awk
if awk -v l="$latency" 'BEGIN { exit !(l + 0 > 2.0) }'; then
  echo "[WARN] Gateway latency ${latency}s at $(date)"
fi

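Make the script executable and add it to cron:

chmod +x /path/to/check-stoa.sh
crontab -e
# Then add this line (without the leading #). cron does not inherit your
# shell's STOA_GATEWAY_URL, so set it inline if it is not localhost:3001:
# */5 * * * * STOA_GATEWAY_URL=https://your-gateway-domain /path/to/check-stoa.sh >> /var/log/stoa-monitor.log 2>&1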

Review Gateway Logs

STOA logs every request in structured format. The important patterns to watch:

| Log pattern | Meaning | Action |
|---|---|---|
| status=200 | Normal request | None |
| status=429 | Rate limit hit | Check whether the consumer needs a higher limit |
| status=502 | Backend unreachable | Check backend health immediately |
| status=401 | Auth failure | Verify the API key or token is valid |
# Find 5xx responses in the last hour (matches status=5xx and JSON "status":5xx)
docker compose logs stoa-gateway --since=1h 2>/dev/null | grep -E 'status=5[0-9]{2}|"status": ?5[0-9]{2}'
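To see the mix of responses rather than only the errors, you can tally status codes. This sketch accepts both the status=502 style shown in the table and a JSON "status":502 style, because the exact STOA log format is an assumption here; status_counts is a hypothetical name.

```shell
# Hypothetical helper: count log lines per HTTP status, most frequent first.
status_counts() {
  grep -oE 'status=[0-9]{3}|"status": ?[0-9]{3}' \
    | grep -oE '[0-9]{3}$' \
    | sort | uniq -c | sort -rn
}

# Usage:
#   docker compose logs stoa-gateway --since=1h 2>/dev/null | status_counts
```

A sudden jump in 429 or 502 counts compared with the previous hour is usually the first thing worth investigating.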

For a full observability setup with Grafana dashboards, see the Observability Guide.

Day 5 checkpoint: Monitoring script running, you know where to find logs and metrics.


Day 6: Set Up Alerting

Monitoring without alerting means you only discover problems when users complain. Set up the minimum viable alerts.

Three Alerts You Need from the Start

| Alert | Condition | Why |
|---|---|---|
| Gateway down | Health check returns non-200 | Your API is offline |
| High error rate | 5xx responses > 5% of requests for 5 minutes | Backend is failing |
| Rate limit storm | 429 responses > 50% of requests for 10 minutes | Possible abuse or misconfigured limits |

Free External Monitoring

For a quick setup without infrastructure, use UptimeRobot (free tier: 50 monitors, 5-minute checks):

  1. Add monitor: https://your-gateway-domain/health
  2. Alert via email or Slack webhook
  3. Set check interval to 5 minutes

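Because the check runs outside your infrastructure, it catches the case where your server is up but STOA is down, and also the failures an on-host cron script cannot report, such as the whole server or its network going offline.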
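If you would rather have the Day 5 cron script notify you directly, Slack incoming webhooks accept a JSON body with a text field. A minimal sketch: notify_slack and SLACK_WEBHOOK_URL are hypothetical names, and the webhook URL is one you create in Slack, not something STOA provides.

```shell
# Hypothetical helper: post a one-line message to a Slack incoming webhook.
notify_slack() {
  local text
  # Escape backslashes first, then double quotes, so the JSON stays valid.
  # Newlines are not escaped, so keep messages to a single line.
  text=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  curl -s -X POST -H 'Content-Type: application/json' \
    --data "{\"text\": \"$text\"}" "$SLACK_WEBHOOK_URL"
}

# Usage, e.g. in check-stoa.sh:
#   [ "$status" = "200" ] || notify_slack "[ALERT] Gateway returned $status at $(date)"
```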

Prometheus Alerts (If You Have Prometheus)

# prometheus-alerts.yml
groups:
  - name: stoa
    rules:
      - alert: StoaGatewayDown
        expr: up{job="stoa-gateway"} == 0
        for: 2m
        labels:
          severity: critical

      - alert: StoaHighErrorRate
        # sum() aggregates away labels such as status; without it Prometheus
        # matches series by identical labels, so each 5xx series is divided by itself
        expr: sum(rate(stoa_requests_total{status=~"5.."}[5m])) / sum(rate(stoa_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: warning

      - alert: StoaRateLimitStorm
        # sum() on both sides so the two different metrics' label sets match
        expr: sum(rate(stoa_rate_limit_exceeded_total[10m])) / sum(rate(stoa_requests_total[10m])) > 0.5
        for: 10m
        labels:
          severity: warning

Day 6 checkpoint: At least one external health check is configured, so you'll know within 5 minutes if the gateway goes down.


Day 7: Production Checklist

Before telling anyone "it's ready," run through this checklist.

Security

| Check | Command | Expected |
|---|---|---|
| TLS enabled | curl -I https://your-gateway/health | HTTP/2 200 (or HTTP/1.1 200), no certificate errors |
| Admin API not public | curl --max-time 5 http://your-domain:8000/health, run from outside your network | Connection refused or timeout |
| Default credentials changed | Check the Keycloak admin password | Not admin/admin |
| No two subscriptions share an API key | SELECT count(DISTINCT api_key) FROM subscriptions | Equals SELECT count(*) FROM subscriptions |

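Reliability

| Check | Command | Expected |
|---|---|---|
| Gateway restarts cleanly | docker compose restart stoa-gateway && curl ${STOA_GATEWAY_URL}/health | 200 within 10s |
| Rate limiting works | Burst test from Day 3 | 429 after the limit is exceeded |
| Logs rotating | docker inspect --format '{{json .HostConfig.LogConfig}}' $(docker compose ps -q stoa-gateway) | max-size and max-file set |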
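For the restart check, curling immediately after docker compose restart can fail simply because the gateway is still starting. A polling helper makes the "200 within 10s" expectation checkable. This is a sketch with a hypothetical wait_for_health function.

```shell
# Hypothetical helper: poll a health URL once per second until it returns
# HTTP 200 or about TIMEOUT seconds of sleeping have passed (time spent in
# curl is not counted, so the real wait can be slightly longer).
wait_for_health() {
  local url=$1 timeout=${2:-10} waited=0 code=""
  while [ "$waited" -lt "$timeout" ]; do
    code=$(curl -s -o /dev/null --max-time 2 -w '%{http_code}' "$url" || true)
    if [ "$code" = "200" ]; then
      echo "healthy after ~${waited}s"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "still HTTP ${code:-none} after ${timeout}s" >&2
  return 1
}

# Usage:
#   docker compose restart stoa-gateway && wait_for_health "${STOA_GATEWAY_URL}/health" 10
```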

Operations

| Check | How | Expected |
|---|---|---|
| Monitoring active | Check your monitoring tool | Green/UP status |
| Alerting tested | Trigger a test alert | Alert received in < 5 min |
| Backup procedure documented | Write it down | You know how to restore |
| Incident runbook exists | Create one (template below) | Covers top 3 failure scenarios |

Minimal Incident Runbook

Save this somewhere your team can find it at 3am:

# STOA Incident Runbook

## Gateway returns 5xx
1. docker compose logs stoa-gateway --tail=50
2. docker compose restart stoa-gateway
3. If still down: docker compose down && docker compose up -d

## Consumer locked out (invalid key)
1. Look up consumer: GET /v1/consumers/$TENANT_ID?email=their@email.com
2. Rotate key: POST /v1/subscriptions/$SUB_ID/rotate-key
3. Send new key to consumer

## Rate limit too aggressive
1. GET /v1/admin/policies (find the policy)
2. PATCH /v1/admin/policies/$POLICY_ID with higher limits
3. Changes take effect immediately; no restart needed
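If you prefer ready-made commands over hand-written HTTP calls during an incident, the two API steps above can be wrapped as shell functions. These are hypothetical wrappers: the paths come from the runbook, while the PATCH body shape (a config object like the Day 3 policy, with burst resent in case PATCH replaces the whole config) is an assumption to check against your Control Plane API.

```shell
# Hypothetical wrappers for the incident runbook's API steps.
# Assumes STOA_API_URL and TOKEN are set as earlier in this runbook.

# Succeeds only for a non-empty string of digits.
is_uint() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
}

# "Consumer locked out", step 2: rotate the key for subscription $1.
rotate_key() {
  curl -s -X POST "${STOA_API_URL}/v1/subscriptions/$1/rotate-key" \
    -H "Authorization: Bearer $TOKEN"
}

# "Rate limit too aggressive", step 2: set new limits on policy $1.
set_rate_limit() {
  local policy_id=$1 rpm=$2 burst=$3
  if ! is_uint "$rpm" || ! is_uint "$burst"; then
    echo "usage: set_rate_limit POLICY_ID REQUESTS_PER_MINUTE BURST" >&2
    return 2
  fi
  curl -s -X PATCH "${STOA_API_URL}/v1/admin/policies/${policy_id}" \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"config\": {\"requests_per_minute\": $rpm, \"burst\": $burst}}"
}

# Usage: set_rate_limit "$POLICY_ID" 200 20 | jq .
```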

Day 7 checkpoint: All checklist items green. You're production-ready.


What You've Built in One Week

| Day | What You Did | Why It Matters |
|---|---|---|
| 1 | Verified baseline | Confirmed nothing is broken before building on top |
| 2 | Registered a real API | Your first production workload |
| 3 | Added policies | Protection against traffic spikes and browser issues |
| 4 | Onboarded a consumer | Someone can actually use your API |
| 5 | Enabled monitoring | You'll see problems before users do |
| 6 | Set up alerting | You'll know about problems even when you're not looking |
| 7 | Production checklist | Confidence that it's ready for real traffic |

This is the foundation. Everything else (multi-gateway setups, GitOps deployments, MCP for AI agents) builds on these basics.


FAQ

What if I'm stuck on a specific step?

Check the full operations guide for detailed troubleshooting for each day. Common issues like 502 errors, Keycloak token expiry, and Docker networking are covered there.

How do I add more gateways (Kong, Gravitee)?

STOA's adapter pattern supports 7 gateway backends. Once your first gateway is stable, see the Multi-Gateway Setup Guide to add Kong, Gravitee, Apigee, or others alongside STOA Gateway.

Where's the community?

  • GitHub Issues: github.com/stoa-platform/stoa/issues for bugs and feature requests
  • GitHub Discussions: for questions and architecture conversations
  • Developer Portal: your consumers can self-register and browse APIs

Can I skip days?

The days are sequential because each builds context for the next. If you already have monitoring and alerting in place (Days 5 and 6), you can jump straight to Day 7 for the production checklist.


Next Steps