MCP Protocol Deep Dive: Message Flow and Transports
The Model Context Protocol (MCP) is a JSON-RPC 2.0 based protocol that standardizes how AI agents discover, authenticate with, and invoke external tools. It defines four phases (initialization, discovery, invocation, and streaming) over pluggable transports including SSE, WebSocket, and stdio. This article covers the protocol internals that matter for production deployments.
This is a technical deep dive for engineers building on MCP. For a higher-level introduction, start with What is an MCP Gateway?. For a hands-on deployment, see MCP Gateway Quickstart with Docker.
Protocol Architecture Overview
MCP is built on three architectural principles:
- Client-server model: MCP clients (AI agents) connect to MCP servers (tool providers). A single client can connect to multiple servers.
- JSON-RPC 2.0 foundation: All messages follow the JSON-RPC 2.0 specification: request/response pairs with `method`, `params`, and `result`/`error` fields.
- Pluggable transport: The protocol is transport-agnostic. The same JSON-RPC messages can flow over HTTP+SSE, WebSocket, or stdio pipes.
Protocol Stack
┌──────────────────────────────────────────────┐
│              Application Layer               │
│    (Tool definitions, Resources, Prompts)    │
├──────────────────────────────────────────────┤
│               Protocol Layer                 │
│  (JSON-RPC 2.0: methods, params, results)    │
├──────────────────────────────────────────────┤
│              Transport Layer                 │
│ (HTTP+SSE | WebSocket | stdio | Streamable)  │
├──────────────────────────────────────────────┤
│               Security Layer                 │
│        (TLS, OAuth2, API keys, mTLS)         │
└──────────────────────────────────────────────┘
Each layer is independently replaceable. You can swap transports without changing tool definitions. You can add security layers without modifying the protocol messages. This separation is what makes MCP suitable for both local development (stdio) and production enterprise deployments (HTTP+SSE with mTLS).
The Four Phases of an MCP Session
Every MCP client-server interaction follows four phases:
Phase 1: Initialization
The client establishes a connection and negotiates capabilities:
Client                                          Server
   │                                               │
   ├── initialize ──────────────────────────────►  │
   │     {
   │       "method": "initialize",
   │       "params": {
   │         "protocolVersion": "2025-03",
   │         "capabilities": {
   │           "tools": {},
   │           "resources": {},
   │           "prompts": {}
   │         },
   │         "clientInfo": {
   │           "name": "claude-desktop",
   │           "version": "1.0.0"
   │         }
   │       }
   │     }
   │                                               │
   │  ◄───────────────────── initialize result ────┤
   │     {
   │       "protocolVersion": "2025-03",
   │       "capabilities": {
   │         "tools": {"listChanged": true}
   │       },
   │       "serverInfo": {
   │         "name": "stoa-gateway",
   │         "version": "0.6.0"
   │       }
   │     }
   │                                               │
   ├── initialized (notification) ──────────────►  │
   │                                               │
Key points:
- `protocolVersion` ensures client and server agree on the MCP spec version
- `capabilities` negotiation tells each side what features the other supports
- The `initialized` notification signals that the client is ready to begin discovery
- If capabilities don't match, the client can downgrade or disconnect
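To make Phase 1 concrete, here is a minimal sketch of the client side of the handshake over stdio, assuming a hypothetical `demo_server.py` that speaks newline-delimited JSON-RPC (the minimal server at the end of this article would work):

```python
import json
import subprocess

# Spawn a hypothetical stdio MCP server as a child process.
proc = subprocess.Popen(
    ["python", "demo_server.py"],  # assumed server entry point
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def send(message: dict) -> None:
    # stdio transport: one newline-delimited JSON message per line
    proc.stdin.write(json.dumps(message) + "\n")
    proc.stdin.flush()

# Phase 1: negotiate protocol version and capabilities
send({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03",
        "capabilities": {"tools": {}, "resources": {}, "prompts": {}},
        "clientInfo": {"name": "demo-client", "version": "0.1.0"},
    },
})
reply = json.loads(proc.stdout.readline())
print("negotiated version:", reply["result"]["protocolVersion"])

# Notifications carry no "id" and expect no response
send({"jsonrpc": "2.0", "method": "notifications/initialized"})
```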
Phase 2: Discovery
The client enumerates available tools, resources, and prompts:
Client                                          Server
   │                                               │
   ├── tools/list ──────────────────────────────►  │
   │     {"method": "tools/list"}
   │                                               │
   │  ◄───────────────────── tools/list result ────┤
   │     {
   │       "tools": [
   │         {
   │           "name": "search-contacts",
   │           "description": "Search...",
   │           "inputSchema": {
   │             "type": "object",
   │             "properties": {
   │               "query": {"type": "string"}
   │             },
   │             "required": ["query"]
   │           }
   │         }
   │       ]
   │     }
   │                                               │
Discovery is dynamic: the agent calls `tools/list` at runtime, not at build time. This is fundamentally different from static API documentation. The server can return different tool lists based on the client's identity, tenant, or environment.
An MCP gateway adds a policy layer here: the tools/list response is filtered per-tenant. Tenant A sees only CRM tools. Tenant B sees only billing tools. Both connect to the same gateway endpoint.
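A sketch of that filtering step, with hypothetical tenant allow-lists (tool names and tenant IDs are illustrative; a real gateway would source these from policy rather than code):

```python
# Hypothetical per-tenant allow-lists; a production gateway would load
# these from a policy engine (e.g. OPA) rather than hard-coding them.
TENANT_TOOLS = {
    "tenant-a": {"search-contacts", "create-contact"},  # CRM tools only
    "tenant-b": {"list-invoices", "refund-payment"},    # billing tools only
}

def filter_tools_list(tenant_id: str, upstream_result: dict) -> dict:
    """Remove tools the tenant may not see from a tools/list result."""
    allowed = TENANT_TOOLS.get(tenant_id, set())  # default-deny unknown tenants
    return {"tools": [t for t in upstream_result["tools"] if t["name"] in allowed]}
```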
Phase 3: Invocation
The client calls a tool with typed parameters:
Client                                          Server
   │                                               │
   ├── tools/call ──────────────────────────────►  │
   │     {
   │       "method": "tools/call",
   │       "params": {
   │         "name": "search-contacts",
   │         "arguments": {"query": "Leanne"}
   │       }
   │     }
   │                                               │
   │  ◄───────────────────── tools/call result ────┤
   │     {
   │       "content": [
   │         {"type": "text", "text": "{\"contacts\": ...}"}
   │       ],
   │       "isError": false
   │     }
   │                                               │
Key points:
- `arguments` are validated against the tool's `inputSchema` before execution (see the sketch after this list)
- Results are returned as `content` arrays, supporting multiple content types (text, image, resource)
- `isError: true` signals a tool execution error (not a protocol error; protocol errors use JSON-RPC error responses)
- The gateway proxies `tools/call` to the backend REST API, translating MCP format to HTTP and back
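Schema validation does not require anything MCP-specific; a sketch using the off-the-shelf `jsonschema` package against the `search-contacts` schema from the diagram:

```python
from jsonschema import ValidationError, validate  # pip install jsonschema

INPUT_SCHEMA = {  # the search-contacts schema from the diagram above
    "type": "object",
    "properties": {"query": {"type": "string"}},
    "required": ["query"],
}

def validate_arguments(arguments: dict) -> str | None:
    """Return an error message if the arguments violate the schema, else None."""
    try:
        validate(instance=arguments, schema=INPUT_SCHEMA)
        return None
    except ValidationError as exc:
        return exc.message

print(validate_arguments({"query": "Leanne"}))  # None -> valid
print(validate_arguments({}))                   # "'query' is a required property"
```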
Phase 4: Streaming (Server-Sent Events)
For long-running operations, MCP supports streaming responses via notifications:
Client                                          Server
   │                                               │
   ├── tools/call ──────────────────────────────►  │
   │     (long-running operation)
   │                                               │
   │  ◄────────────────── progress notification ───┤
   │     {"method": "notifications/progress",
   │      "params": {"progressToken": "abc",
   │                 "progress": 0.25,
   │                 "total": 1.0}}
   │                                               │
   │  ◄────────────────── progress notification ───┤
   │     {..., "progress": 0.75}
   │                                               │
   │  ◄───────────────────── tools/call result ────┤
   │     {"content": [...]}
   │                                               │
Streaming is essential for enterprise workloads: batch data processing, report generation, and multi-step workflows all benefit from progress updates rather than blocking until completion.
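A server-side sketch of emitting those progress notifications over stdio; the work loop stands in for a real batch job:

```python
import json
import sys
import time

def notify_progress(progress_token: str, progress: float, total: float) -> None:
    # Notifications are JSON-RPC messages without an "id": no response expected.
    sys.stdout.write(json.dumps({
        "jsonrpc": "2.0",
        "method": "notifications/progress",
        "params": {"progressToken": progress_token,
                   "progress": progress, "total": total},
    }) + "\n")
    sys.stdout.flush()

def long_running_tool(progress_token: str) -> dict:
    for step in range(1, 5):
        time.sleep(0.5)  # stand-in for real batch work
        notify_progress(progress_token, step / 4, 1.0)
    return {"content": [{"type": "text", "text": "done"}], "isError": False}
```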
Transport Layer Options
MCP is transport-agnostic. The same JSON-RPC messages can flow over different transport mechanisms depending on the deployment context.
HTTP + Server-Sent Events (SSE)
The most common transport for production MCP deployments:
Client ──HTTP POST──►  Server   (request)
Client ◄────SSE──────  Server   (response stream)
How it works:
- Client sends JSON-RPC requests as HTTP POST to the server's message endpoint
- Server streams responses back over a persistent SSE connection
- Multiple requests can be in-flight simultaneously on the same SSE connection
Advantages:
- Works through HTTP proxies, load balancers, CDNs, and firewalls
- Uni-directional streaming (server to client) is well-supported by all infrastructure
- Easy to add authentication headers (Bearer tokens, API keys)
- Compatible with existing HTTP monitoring and logging tools
Limitations:
- Server-to-client streaming only (client uses POST for requests)
- SSE connections can be dropped by aggressive proxies (configure timeouts)
- No binary frame support (everything is UTF-8 text)
Best for: Production deployments behind API gateways, cloud environments, enterprise networks.
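For a feel of the wire format, here is a minimal, standard-library-only sketch of consuming the SSE leg (the endpoint URL and token are placeholders, and real SSE events can span multiple `data:` lines, which this simplified parser ignores):

```python
import json
from urllib.request import Request, urlopen

# Placeholder endpoint; real servers advertise their own SSE path.
SSE_URL = "https://gateway.example.com/mcp/sse"

def sse_messages(url: str):
    """Yield JSON-RPC messages from an SSE stream (single-line data: frames)."""
    req = Request(url, headers={
        "Accept": "text/event-stream",
        "Authorization": "Bearer <token>",  # placeholder credential
    })
    with urlopen(req) as stream:
        for raw in stream:
            line = raw.decode("utf-8").rstrip("\r\n")
            if line.startswith(":"):        # comment frame, e.g. ":ping" keepalive
                continue
            if line.startswith("data: "):
                yield json.loads(line[len("data: "):])

# for message in sse_messages(SSE_URL):
#     handle(message)
```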
Streamable HTTP (2025-03 spec)
The latest MCP specification introduces Streamable HTTP as a simplified transport:
Client ──HTTP POST──►  Server
(request in body, response streamed back on same connection)
How it works:
- Client sends a JSON-RPC request as HTTP POST
- Server responds with `Content-Type: text/event-stream` for streaming, or `application/json` for single responses
- The server can optionally include an `Mcp-Session-Id` header for session affinity
Advantages:
- Simpler than SSE (no separate event stream endpoint)
- Session management via headers (not URL paths)
- Supports both streaming and non-streaming responses
- Better alignment with standard HTTP semantics
Best for: New implementations targeting the 2025-03 spec.
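A client-side sketch of this transport using the `requests` library; the endpoint URL is a placeholder, and the branch mirrors the content-type rule above:

```python
import json
import requests  # pip install requests

MCP_URL = "https://gateway.example.com/mcp"  # hypothetical endpoint

def call(message: dict, session_id: str | None = None) -> None:
    headers = {
        "Content-Type": "application/json",
        # Accept both shapes: the server picks streaming or single-response
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        headers["Mcp-Session-Id"] = session_id
    resp = requests.post(MCP_URL, json=message, headers=headers, stream=True)
    if resp.headers.get("Content-Type", "").startswith("text/event-stream"):
        for line in resp.iter_lines(decode_unicode=True):
            if line and line.startswith("data: "):
                print("event:", json.loads(line[len("data: "):]))
    else:
        print("single response:", resp.json())
```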
WebSocket
Bi-directional, full-duplex communication:
Client ◄──WebSocket──►  Server   (bidirectional)
How it works:
- Client establishes a WebSocket connection (HTTP upgrade)
- Both sides can send JSON-RPC messages at any time
- Server can push notifications without client polling
Advantages:
- True bi-directional communication
- Lower latency for high-frequency interactions
- Server-initiated notifications without polling
Limitations:
- WebSocket connections are harder to load-balance (sticky sessions needed)
- Some enterprise firewalls and proxies block WebSocket upgrades
- More complex to debug than HTTP (no standard request/response logging)
Best for: Real-time applications, low-latency tool invocations, bi-directional notification patterns.
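A minimal sketch using the `websockets` library (the endpoint is a placeholder; auth headers would normally be attached during the upgrade handshake, and the keyword argument for that varies across library versions, so it is omitted here):

```python
import asyncio
import json
import websockets  # pip install websockets

async def main() -> None:
    # Hypothetical endpoint; both sides can send at any time once connected
    async with websockets.connect("wss://gateway.example.com/mcp") as ws:
        # Issue a discovery request and await the server's reply
        await ws.send(json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"}))
        print(json.loads(await ws.recv()))

asyncio.run(main())
```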
stdio (Standard I/O)
Process-level communication via stdin/stdout pipes:
Client ──stdin──►   Server Process
Client ◄──stdout──  Server Process
How it works:
- Client spawns the MCP server as a child process
- JSON-RPC messages are written to the process's stdin
- Responses are read from the process's stdout
- One message per line (newline-delimited JSON)
Advantages:
- Zero network configuration β works offline
- Process-level isolation and lifecycle management
- Simplest transport to implement
Limitations:
- Single-machine only (no network access)
- One client per server process
- No built-in authentication (process-level trust)
Best for: Local development, IDE integrations (VS Code, Claude Desktop), CLI tools.
Transport Comparison
| Feature | HTTP+SSE | Streamable HTTP | WebSocket | stdio |
|---|---|---|---|---|
| Direction | Client→POST, Server→SSE | Bidirectional via HTTP | Full duplex | Bidirectional via pipes |
| Infrastructure | Standard HTTP stack | Standard HTTP stack | WebSocket-aware LB | Local process |
| Auth | HTTP headers | HTTP headers | Initial handshake | Process-level trust |
| Streaming | Server→Client | Both | Both | Both |
| Firewall-friendly | Yes | Yes | Sometimes blocked | N/A (local) |
| Load balancing | Standard HTTP | Standard HTTP | Sticky sessions | N/A |
| Best for | Production APIs | New implementations | Real-time apps | Local dev/IDEs |
Security Model
MCP's security model operates at multiple layers:
Transport Security
All production MCP deployments should use TLS. The protocol itself does not mandate TLS, but without it, JSON-RPC messages (including tool arguments and results) travel in plaintext.
Client ──TLS 1.3──►  MCP Gateway ──mTLS──►  Backend Service
Authentication
MCP does not define its own authentication mechanism. Instead, it relies on the transport layer:
- HTTP transports: Bearer tokens (JWT), API keys, or client certificates in HTTP headers
- WebSocket: Authentication during the HTTP upgrade handshake
- stdio: Process-level trust (the client controls which server it spawns)
An MCP gateway centralizes authentication. Instead of each MCP server implementing its own auth, the gateway validates credentials once and forwards authenticated requests to backend servers.
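A sketch of that single validation point, assuming JWT bearer tokens and the PyJWT library; the key, algorithm, and audience are placeholders:

```python
import jwt  # pip install PyJWT

VERIFY_KEY = "replace-with-your-verification-key"  # placeholder shared secret

def authenticate(headers: dict) -> dict:
    """Validate the bearer token once at the gateway; raises on failure."""
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        raise PermissionError("missing bearer token")
    # jwt.decode verifies signature, expiry, and audience in one call
    return jwt.decode(
        auth[len("Bearer "):],
        VERIFY_KEY,
        algorithms=["HS256"],    # production: RS256 with a JWKS endpoint
        audience="mcp-gateway",  # placeholder audience claim
    )
```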
Authorization
MCP's capabilities negotiation provides coarse-grained feature control, but fine-grained authorization (which tools can this tenant call?) is the gateway's responsibility.
STOA implements this with OPA policies evaluated at two points:
- Discovery time: `tools/list` responses are filtered per-tenant. Unauthorized tools are never shown.
- Invocation time: `tools/call` requests are evaluated against per-tenant, per-tool policies before proxying.
This prevents both direct tool access and enumeration attacks (where an agent discovers tools it shouldn't know about).
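For illustration, the invocation-time check might call OPA's REST data API like this (the policy path and input shape are hypothetical, not STOA's actual layout):

```python
import requests  # pip install requests

# Hypothetical policy path; a real deployment defines its own package layout.
OPA_URL = "http://localhost:8181/v1/data/mcp/authz/allow"

def authorize_call(tenant_id: str, tool_name: str) -> bool:
    """Ask OPA whether this tenant may invoke this tool; default-deny."""
    resp = requests.post(OPA_URL, json={
        "input": {"tenant": tenant_id, "tool": tool_name, "action": "tools/call"},
    })
    resp.raise_for_status()
    # OPA returns {} when the decision is undefined, {"result": <bool>} otherwise
    return resp.json().get("result", False) is True
```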
Audit
Every MCP interaction should be logged for compliance:
| Event | What to Log | Why |
|---|---|---|
| `initialize` | Client identity, protocol version | Track which agents connect |
| `tools/list` | Tenant ID, tools returned | Audit tool discovery |
| `tools/call` | Tenant, tool, arguments, result status | Full invocation audit trail |
| Error | Error type, tenant, context | Incident investigation |
MCP gateways produce these audit events automatically. Raw MCP servers require custom instrumentation.
MCP Compared to Other Protocols
MCP vs gRPC
| Aspect | MCP | gRPC |
|---|---|---|
| Purpose | AI agent ↔ tool communication | Service-to-service RPC |
| Schema | JSON Schema (runtime discovery) | Protobuf (compile-time code generation) |
| Discovery | Dynamic (tools/list at runtime) | Static (proto files, service reflection) |
| Transport | HTTP+SSE, WebSocket, stdio | HTTP/2 |
| Streaming | SSE or WebSocket | Bidirectional HTTP/2 streams |
| Ecosystem | AI agents, LLM frameworks | Microservices, cloud infrastructure |
| Binary support | Text-based (JSON) | Native binary (Protobuf) |
When to use MCP: AI agent integration, dynamic tool discovery, multi-tenant tool access. When to use gRPC: High-performance service-to-service communication, strict schemas, binary payloads.
MCP and gRPC are complementary. An MCP gateway can proxy tool invocations to gRPC backends: the agent sees MCP tools, the backend serves gRPC.
MCP vs GraphQL
| Aspect | MCP | GraphQL |
|---|---|---|
| Purpose | AI agent tool invocation | Client-driven data querying |
| Schema | Per-tool JSON Schema | Unified type system |
| Query model | Tool call (function invocation) | Declarative query (ask for fields) |
| Discovery | tools/list enumeration | Schema introspection |
| Streaming | Progress notifications | Subscriptions |
| Auth model | Per-tool policies | Per-field resolvers |
When to use MCP: AI agents that need to call functions (search, create, update). When to use GraphQL: Clients that need flexible data querying with field-level control.
Again, these are complementary. An MCP tool can internally execute a GraphQL query against a backend.
MCP vs OpenAI Function Calling
| Aspect | MCP | OpenAI Function Calling |
|---|---|---|
| Standard | Open (Anthropic + community) | Proprietary (OpenAI) |
| Discovery | Runtime (tools/list) | Compile-time (function schemas in API call) |
| Transport | Multiple (SSE, WS, stdio) | OpenAI API only |
| Multi-tenant | Built into protocol | Application-level |
| Vendor lock-in | None | OpenAI ecosystem |
For a detailed comparison, see MCP vs OpenAI Function Calling vs LangChain Tools.
Building an MCP Server: Minimal Example
To understand the protocol concretely, here's a minimal MCP server in Python (stdio transport):
import json
import sys

def handle_request(request):
    method = request.get("method")
    if method == "initialize":
        return {
            "protocolVersion": "2025-03",
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "demo-server", "version": "1.0.0"}
        }
    elif method == "tools/list":
        return {
            "tools": [{
                "name": "greet",
                "description": "Generate a greeting message",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string", "description": "Name to greet"}
                    },
                    "required": ["name"]
                }
            }]
        }
    elif method == "tools/call":
        tool_name = request["params"]["name"]
        args = request["params"]["arguments"]
        if tool_name == "greet":
            return {
                "content": [{"type": "text", "text": f"Hello, {args['name']}!"}],
                "isError": False
            }
    return {"error": {"code": -32601, "message": f"Unknown method: {method}"}}

# stdio transport: read newline-delimited JSON-RPC from stdin, write to stdout
for line in sys.stdin:
    request = json.loads(line.strip())
    if "id" not in request:
        continue  # notifications (e.g. notifications/initialized) expect no response
    result = handle_request(request)
    response = {"jsonrpc": "2.0", "id": request["id"]}
    if "error" in result:
        response["error"] = result["error"]  # protocol error per JSON-RPC
    else:
        response["result"] = result
    sys.stdout.write(json.dumps(response) + "\n")
    sys.stdout.flush()
This short server implements the full MCP lifecycle: initialization, discovery, and tool invocation. In production, you would use an MCP SDK and deploy behind a gateway, but the protocol is simple enough to implement from scratch.
Production Considerations
Connection Lifecycle
MCP sessions are long-lived. An AI agent may maintain a connection for hours or days. Plan for:
- Reconnection: Clients should handle dropped connections gracefully and re-initialize
- Session state: Avoid server-side session state if possible (stateless tool invocations scale better)
- Heartbeats: Use SSE comments (`:ping`) or WebSocket ping frames to detect dead connections (sketched below)
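A framework-agnostic sketch of the SSE keepalive pattern mentioned above; the 15-second interval is an arbitrary choice:

```python
import queue

def sse_frames(outbox: queue.Queue, keepalive_seconds: float = 15.0):
    """Yield SSE frames from a queue, emitting ':ping' comments while idle.

    Comment frames (lines starting with ':') are ignored by SSE clients but
    keep proxies and load balancers from closing an idle connection.
    """
    while True:
        try:
            message = outbox.get(timeout=keepalive_seconds)  # serialized JSON-RPC
            yield f"data: {message}\n\n"
        except queue.Empty:
            yield ": ping\n\n"  # keepalive comment, carries no payload
```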
Scalability
MCP gateways handle tool invocations as proxied HTTP requests; they scale the same way any reverse proxy scales:
- Horizontal scaling with Kubernetes replicas
- Connection pooling to backend services
- Stateless request handling (no session affinity needed for HTTP+SSE)
Error Handling
MCP distinguishes between protocol errors and tool errors:
- Protocol errors: Invalid JSON-RPC, unknown method, malformed params → JSON-RPC error response
- Tool errors: Backend returned 500, timeout, validation failure → `tools/call` result with `isError: true`
The gateway should never expose backend error details (stack traces, internal URLs) to the client. Log them server-side and return sanitized error messages.
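A sketch of that sanitization at the gateway boundary; the incident-ID scheme and wording are illustrative:

```python
import logging
import uuid

logger = logging.getLogger("gateway")

def to_tool_error(exc: Exception) -> dict:
    """Log full backend detail server-side, return a sanitized tools/call result."""
    incident_id = str(uuid.uuid4())
    logger.error("backend failure [%s]: %r", incident_id, exc)  # detail stays here
    return {
        "content": [{
            "type": "text",
            "text": f"Tool execution failed (incident {incident_id}).",
        }],
        "isError": True,  # tool error, not a JSON-RPC protocol error
    }
```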
Frequently Asked Questions
What version of MCP should I target?
Target the 2025-03 protocol version, which includes Streamable HTTP transport and improved capability negotiation. The protocol is backward-compatible: a server supporting 2025-03 can negotiate down to 2024-11 with older clients. Check the official MCP specification for the latest version.
Can MCP handle binary data (files, images)?
MCP content types include text and image (base64-encoded). For large binary payloads, the recommended pattern is to return a URL or resource reference that the client can fetch separately, rather than embedding binary data in the JSON-RPC response. The `resources/read` method supports this pattern natively.
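A sketch of the URL-reference pattern for a hypothetical report-generating tool: the result carries a fetchable link instead of megabytes of base64:

```python
def report_result(report_url: str) -> dict:
    """Return a pointer to the artifact rather than embedding the bytes."""
    return {
        "content": [{
            "type": "text",
            "text": f"Report ready; download within 1 hour: {report_url}",
        }],
        "isError": False,
    }

# e.g. report_result("https://files.example.com/reports/q3.pdf?sig=...")
```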
How does MCP handle authentication across multiple servers?
Each MCP server (or gateway) handles its own authentication independently. A client connecting to multiple MCP servers manages separate credentials per connection. An MCP gateway simplifies this by providing a single authenticated endpoint that routes to multiple backend servers: the client authenticates once with the gateway.
Is MCP suitable for high-throughput workloads?
MCP adds minimal overhead to tool invocations; the JSON-RPC envelope is a few hundred bytes. The gateway's proxying latency depends on the transport and backend. STOA's Rust-based gateway adds sub-millisecond latency per invocation. For bulk operations, consider batch tool invocations or streaming responses rather than high-frequency individual calls. See our gateway performance benchmarks for measured latencies.
Further Reading
- What is an MCP Gateway? – Why AI agents need a gateway layer
- OAuth 2.1 + PKCE for MCP Gateways – Complete OAuth flow for MCP clients
- Convert REST APIs to MCP Tools – Practical guide to exposing your APIs
- Connecting AI Agents to Enterprise APIs – Enterprise integration patterns
- ESB is Dead, Long Live MCP – How MCP replaces traditional integration middleware
- MCP Gateway Concepts – STOA's gateway architecture
- Official MCP Specification – The protocol source of truth
Building on MCP? Start with the quickstart guide to deploy a working gateway, or explore the MCP gateway documentation for architecture details.