Most MCP tutorials show you hello-world examples. This guide covers what you actually need for production: authentication strategies, rate limiting that prevents abuse, structured logging for compliance, and deployment patterns that scale.
Building an MCP server that works in a demo is easy. Building one that you'd trust with live business data, can survive real usage patterns, and gives you visibility when something goes wrong — that's a different challenge.
This guide covers the production engineering concerns that most MCP tutorials skip entirely.
What "Production" Actually Means for MCP
Demo MCP servers are single-user, trust everything that connects, crash when something goes wrong, and leave no trace of what they did. That's fine for a local proof-of-concept. It's not fine when business data is flowing through.
Production means multiple agents connecting simultaneously — each with different permissions, different rate limits, different access to your tools. It means the server handles auth failures, malformed inputs, and backend downtime without dying. It means every tool call is logged with enough detail to reconstruct what happened if something goes wrong a week later. And it means the server keeps running when you're not watching it.
None of this is complicated. But you have to actually build it.
Hello-world MCP examples skip all of it. Let's fix that.
Transport Choice: stdio vs HTTP
For production deployments serving multiple clients, an HTTP-based transport is the only viable option: stdio works for local development and single-user Claude Desktop integrations, but doesn't support concurrent connections or authentication. The example below uses the SDK's SSE transport; note that newer versions of the MCP spec and SDK introduce Streamable HTTP as its successor, so check which transport your clients support.
import express from "express";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";

const app = express();

// Authentication middleware (see below)
app.use(authenticateRequest);

// Rate limiting (see below)
app.use(rateLimiter);

// Track one transport per session so POSTed messages can be routed
// back to the correct SSE connection
const transports: Record<string, SSEServerTransport> = {};

// MCP SSE endpoint
app.get("/sse", async (req, res) => {
  const transport = new SSEServerTransport("/messages", res);
  transports[transport.sessionId] = transport;
  res.on("close", () => delete transports[transport.sessionId]);
  await mcpServer.connect(transport);
});

app.post("/messages", express.json(), async (req, res) => {
  // The client echoes the sessionId back as a query parameter
  const transport = transports[req.query.sessionId as string];
  if (!transport) {
    return res.status(400).json({ error: "Unknown session" });
  }
  await transport.handlePostMessage(req, res, req.body);
});

app.listen(3000);
Authentication
Strategy 1: API Keys (Simplest — Good for Internal Use)
For internal tools where clients are known services (your own Lambda functions, your own Claude agents), API key authentication is simple and effective.
import type { Request, Response, NextFunction } from "express";

interface AuthenticatedRequest extends Request {
  clientId?: string;
  clientPermissions?: string[];
}

const API_KEYS: Record<string, { clientId: string; permissions: string[] }> = {
  // Stored in Secrets Manager, loaded at startup
};

async function authenticateRequest(
  req: AuthenticatedRequest,
  res: Response,
  next: NextFunction
) {
  const apiKey = req.headers["x-api-key"] as string;
  if (!apiKey) {
    return res.status(401).json({ error: "API key required" });
  }

  const client = API_KEYS[apiKey];
  if (!client) {
    // Log failed auth attempts
    logger.warn({ event: "auth_failed", ip: req.ip, path: req.path });
    return res.status(401).json({ error: "Invalid API key" });
  }

  // Attach client context for downstream middleware
  req.clientId = client.clientId;
  req.clientPermissions = client.permissions;
  logger.info({ event: "auth_success", clientId: client.clientId });
  next();
}
Store API keys in AWS Secrets Manager and load them at server startup. Rotate keys by adding new ones before removing old ones — zero downtime rotation.
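Loading those keys at startup can be split into a fetch step (GetSecretValueCommand from @aws-sdk/client-secrets-manager) and a pure parsing step, so the parser is testable without AWS access. The secret's JSON shape and the parseApiKeySecret name below are assumptions for illustration:

```typescript
// Sketch: turn a Secrets Manager secret string into the API_KEYS map.
// In production you'd fetch the string with GetSecretValueCommand at
// startup, then call this parser. The JSON shape is an assumption.
type ApiKeyMap = Record<string, { clientId: string; permissions: string[] }>;

function parseApiKeySecret(secretString: string): ApiKeyMap {
  // Expected shape: [{ "key": "...", "clientId": "...", "permissions": [...] }]
  const entries: Array<{ key: string; clientId: string; permissions: string[] }> =
    JSON.parse(secretString);
  const map: ApiKeyMap = {};
  for (const { key, clientId, permissions } of entries) {
    map[key] = { clientId, permissions };
  }
  return map;
}

const secret = JSON.stringify([
  { key: "sk-live-abc", clientId: "reporting-agent", permissions: ["search_contacts"] },
]);
const API_KEYS = parseApiKeySecret(secret);
console.log(API_KEYS["sk-live-abc"].clientId); // "reporting-agent"
```

Keeping the parser pure also makes rotation easy to verify: load the new secret version, parse it, and swap the map atomically.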
Strategy 2: JWT Tokens (For User-Scoped Access)
If your MCP server needs to act on behalf of specific users (each user should only see their own data), use JWTs:
import jwt from "jsonwebtoken";

interface JWTPayload {
  sub: string;
  permissions: string[];
}

async function authenticateJWT(req: AuthenticatedRequest, res: Response, next: NextFunction) {
  const token = req.headers.authorization?.replace("Bearer ", "");
  if (!token) {
    return res.status(401).json({ error: "Bearer token required" });
  }

  try {
    const payload = jwt.verify(token, process.env.JWT_SECRET!) as JWTPayload;
    req.clientId = payload.sub;
    req.clientPermissions = payload.permissions;
    next();
  } catch (error) {
    if (error instanceof jwt.TokenExpiredError) {
      return res.status(401).json({ error: "Token expired" });
    }
    return res.status(401).json({ error: "Invalid token" });
  }
}
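To make the token shape concrete, here is what the middleware above expects spelled out. This hand-rolls an HS256 token with node:crypto purely as an illustration; in production you would sign tokens with a library such as jsonwebtoken or jose, and the issueToken helper and claim values here are our own:

```typescript
import { createHmac } from "node:crypto";

// Illustration only: the claims are what matter. `sub` becomes clientId,
// `permissions` drives per-tool authorisation, `exp` bounds token lifetime.
function issueToken(sub: string, permissions: string[], secret: string, ttlSeconds = 900): string {
  const b64 = (obj: object) => Buffer.from(JSON.stringify(obj)).toString("base64url");
  const header = b64({ alg: "HS256", typ: "JWT" });
  const payload = b64({ sub, permissions, exp: Math.floor(Date.now() / 1000) + ttlSeconds });
  const signature = createHmac("sha256", secret).update(`${header}.${payload}`).digest("base64url");
  return `${header}.${payload}.${signature}`;
}

const token = issueToken("reporting-agent", ["search_contacts"], "dev-secret");
const claims = JSON.parse(Buffer.from(token.split(".")[1], "base64url").toString());
console.log(claims.sub); // "reporting-agent"
```

Short TTLs (minutes, not days) limit the blast radius of a leaked token; the agent re-authenticates to get a fresh one.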
Strategy 3: mTLS (For High-Security Enterprise)
Mutual TLS provides the strongest authentication — clients must present a valid certificate signed by your CA:
import https from "https";
import fs from "fs";
import type { TLSSocket } from "tls";

const server = https.createServer({
  key: fs.readFileSync("server.key"),
  cert: fs.readFileSync("server.cert"),
  ca: fs.readFileSync("client-ca.cert"),
  requestCert: true,
  rejectUnauthorized: true, // Reject clients without valid cert
}, app);

// Extract client identity from certificate
app.use((req: AuthenticatedRequest, res, next) => {
  const cert = (req.socket as TLSSocket).getPeerCertificate();
  if (!cert || !cert.subject) {
    return res.status(401).json({ error: "Client certificate required" });
  }
  req.clientId = cert.subject.CN;
  next();
});
Authorisation: Per-Tool Permissions
After authentication, enforce authorisation at the tool level:
interface Permission {
  tools: string[];
  maxRecordsPerQuery?: number;
  allowWrite?: boolean;
}

const CLIENT_PERMISSIONS: Record<string, Permission> = {
  "customer-service-agent": {
    tools: ["search_contacts", "get_contact_history", "create_note"],
    maxRecordsPerQuery: 10,
    allowWrite: true,
  },
  "reporting-agent": {
    tools: ["search_contacts", "get_deal_summary", "get_revenue_report"],
    maxRecordsPerQuery: 100,
    allowWrite: false,
  },
  "readonly-dashboard": {
    tools: ["get_dashboard_metrics"],
    maxRecordsPerQuery: 50,
    allowWrite: false,
  },
};

// In your tool execution handler
server.setRequestHandler(CallToolRequestSchema, async (request, extra) => {
  const clientId = extra.authInfo?.clientId as string;
  const permissions = CLIENT_PERMISSIONS[clientId];

  if (!permissions) {
    throw new Error("Client not recognised");
  }

  if (!permissions.tools.includes(request.params.name)) {
    logger.warn({
      event: "tool_access_denied",
      clientId,
      tool: request.params.name,
    });
    throw new Error(`Tool '${request.params.name}' is not permitted for this client`);
  }

  // Execute tool with permission context
  return executeTool(request.params.name, request.params.arguments, permissions);
});
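The maxRecordsPerQuery field only has teeth if tool execution enforces it. A minimal sketch (the clampLimit helper and its defaults are ours, and Permission is re-declared so the snippet stands alone):

```typescript
interface Permission {
  tools: string[];
  maxRecordsPerQuery?: number;
  allowWrite?: boolean;
}

// Clamp a client-requested page size to the ceiling its permissions allow.
// Returning the clamped value (rather than throwing) keeps agents working
// while still capping how much data any one call can pull.
function clampLimit(requested: number | undefined, permissions: Permission, fallback = 25): number {
  const ceiling = permissions.maxRecordsPerQuery ?? fallback;
  if (requested === undefined || !Number.isFinite(requested) || requested < 1) {
    return Math.min(fallback, ceiling);
  }
  return Math.min(Math.floor(requested), ceiling);
}

const perms: Permission = { tools: ["search_contacts"], maxRecordsPerQuery: 10 };
console.log(clampLimit(500, perms)); // 10
console.log(clampLimit(3, perms));   // 3
```

The same pattern applies to allowWrite: check it inside each write-capable tool before touching the backend, not just at the routing layer.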
Rate Limiting
AI agents can generate far more requests than human users. Without rate limiting, a misbehaving agent can saturate your backend APIs, exhaust quotas, and run up unexpected costs.
Multi-Dimensional Rate Limiting
Apply limits at multiple levels:
import { RateLimiterMemory, RateLimiterRedis } from "rate-limiter-flexible";

// Tools whose backend calls are costly enough to warrant stricter limits
const EXPENSIVE_TOOLS = ["get_revenue_report", "get_deal_summary"];

// Global rate limiter (all clients combined)
const globalLimiter = new RateLimiterMemory({
  points: 500, // 500 requests
  duration: 60, // per minute
});

// Per-client rate limiter (prevents one client monopolising)
const clientLimiter = new RateLimiterRedis({
  storeClient: redisClient,
  keyPrefix: "mcp_client",
  points: 60, // 60 requests
  duration: 60, // per minute per client
});

// Per-tool rate limiter (expensive tools get stricter limits)
const toolLimiter = new RateLimiterMemory({
  points: 10, // 10 calls
  duration: 60, // per minute
});

async function checkRateLimits(clientId: string, toolName: string) {
  try {
    await globalLimiter.consume("global");
    await clientLimiter.consume(clientId);
    if (EXPENSIVE_TOOLS.includes(toolName)) {
      await toolLimiter.consume(`${clientId}:${toolName}`);
    }
  } catch (rateLimitError) {
    logger.warn({
      event: "rate_limit_exceeded",
      clientId,
      toolName,
    });
    throw new Error("Rate limit exceeded. Retry after 60 seconds.");
  }
}
For production, use Redis-backed rate limiting (RateLimiterRedis) so limits are shared across multiple server instances. Memory-based limits only work for single-instance deployments.
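Under the hood, these limiters are essentially counters scoped to a time window. A dependency-free sketch of the fixed-window variant shows the mechanics, and also why in-memory state breaks down across instances: each process keeps its own counters.

```typescript
// Minimal fixed-window limiter: `points` requests per `durationMs` window.
// Per-process state only; a shared store (Redis) is needed once you run
// more than one server instance.
class FixedWindowLimiter {
  private windows = new Map<string, { count: number; resetAt: number }>();
  constructor(private points: number, private durationMs: number) {}

  tryConsume(key: string, now = Date.now()): boolean {
    const w = this.windows.get(key);
    if (!w || now >= w.resetAt) {
      // New window: start counting from 1
      this.windows.set(key, { count: 1, resetAt: now + this.durationMs });
      return true;
    }
    if (w.count >= this.points) return false; // limit hit
    w.count += 1;
    return true;
  }
}

const limiter = new FixedWindowLimiter(3, 60_000);
const results = [1, 2, 3, 4].map(() => limiter.tryConsume("client-a"));
console.log(results); // [ true, true, true, false ]
```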
Returning Retry-After Headers
When rate limiting kicks in, tell clients how long to wait:
app.use(async (req: AuthenticatedRequest, res, next) => {
  try {
    await checkRateLimits(req.clientId!, req.body?.params?.name);
    next();
  } catch (error) {
    res.set("Retry-After", "60");
    res.status(429).json({ error: (error as Error).message });
  }
});
If you surface rate-limit failures as structured tool errors rather than letting the connection die, well-behaved clients, Claude included, can back off and retry instead of failing the whole task.
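On the client side, honouring Retry-After means parsing it correctly: per the HTTP spec, the header can carry either delta-seconds or an HTTP-date. A small helper (the retryAfterMs name and 60-second default are ours):

```typescript
// Retry-After can be delta-seconds ("60") or an HTTP-date
// ("Wed, 21 Oct 2026 07:28:00 GMT"). Normalise both to a wait in ms.
function retryAfterMs(header: string | null, now = Date.now()): number {
  if (!header) return 60_000; // default wait when the header is missing
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  return Number.isNaN(date) ? 60_000 : Math.max(0, date - now);
}

console.log(retryAfterMs("60")); // 60000
console.log(retryAfterMs(null)); // 60000
```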
Structured Logging
Every production MCP server needs complete, queryable logs. Structure them as JSON from the start:
import pino from "pino";
import { randomUUID } from "node:crypto";

const logger = pino({
  level: process.env.LOG_LEVEL ?? "info",
  formatters: {
    level: (label) => ({ level: label }),
  },
  timestamp: pino.stdTimeFunctions.isoTime,
});

interface ExecutionContext {
  clientId: string;
  permissions: Permission;
}

// Log every tool call
async function executeTool(
  toolName: string,
  args: unknown,
  context: ExecutionContext
) {
  const startTime = Date.now();
  const executionId = randomUUID();

  logger.info({
    event: "tool_call_start",
    executionId,
    clientId: context.clientId,
    toolName,
    // Sanitise args: remove PII before logging
    args: sanitiseForLogging(args),
  });

  try {
    const result = await executeToolInternal(toolName, args, context);
    logger.info({
      event: "tool_call_success",
      executionId,
      clientId: context.clientId,
      toolName,
      durationMs: Date.now() - startTime,
      recordsReturned: countRecords(result),
    });
    return result;
  } catch (error) {
    logger.error({
      event: "tool_call_error",
      executionId,
      clientId: context.clientId,
      toolName,
      durationMs: Date.now() - startTime,
      error: error instanceof Error ? error.message : String(error),
    });
    throw error;
  }
}
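The sanitiseForLogging and countRecords helpers referenced above are deliberately left abstract, since what counts as PII depends on your data. A redaction-by-key-name sketch as a starting point (the key list and the assumed records result shape are illustrative):

```typescript
// Redact values for keys that commonly hold PII before logging.
// The key list is a starting point; extend it for your domain.
const SENSITIVE_KEYS = new Set(["email", "phone", "name", "address", "ssn"]);

function sanitiseForLogging(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sanitiseForLogging);
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) =>
        SENSITIVE_KEYS.has(k.toLowerCase()) ? [k, "[REDACTED]"] : [k, sanitiseForLogging(v)]
      )
    );
  }
  return value;
}

// Count records in a tool result for the success log line; assumes results
// are either arrays or objects with an array `records` field.
function countRecords(result: unknown): number {
  if (Array.isArray(result)) return result.length;
  if (result && typeof result === "object" && Array.isArray((result as any).records)) {
    return (result as any).records.length;
  }
  return 1;
}

console.log(sanitiseForLogging({ query: "smith", email: "a@b.com" }));
// { query: 'smith', email: '[REDACTED]' }
```

Key-name redaction is cheap but coarse; for compliance-grade logging, pair it with allowlisting of specific fields per tool.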
Log to CloudWatch (AWS) or Structured Destinations
On ECS, App Runner, and Lambda you don't need a special pino transport: pino already writes one JSON object per line to stdout, and the platform ships stdout to CloudWatch Logs for you (the awslogs log driver on ECS, built-in capture on App Runner and Lambda). Keep pretty-printing disabled in production so each line stays machine-parseable JSON.
With structured JSON logs in CloudWatch, you can query them with CloudWatch Logs Insights (the time range is set in the console):

# Find all tool calls by a specific client
fields @timestamp, toolName, durationMs, recordsReturned
| filter clientId = "customer-service-agent"
| filter event = "tool_call_success"
| sort @timestamp desc
| limit 50

# Find slowest tool calls (performance monitoring)
filter event = "tool_call_success"
| stats avg(durationMs) as avgMs, count(*) as callCount by toolName
| sort avgMs desc

# Find all failed tool calls with error messages
filter event = "tool_call_error"
| fields @timestamp, clientId, toolName, error
| sort @timestamp desc
Health Checks and Observability
Add a health endpoint your load balancer and monitoring can probe:
interface HealthStatus {
  status: "healthy" | "degraded" | "unhealthy";
  checks: Record<string, { status: string; latencyMs?: number }>;
  uptime: number;
}

app.get("/health", async (req, res) => {
  const checks: HealthStatus["checks"] = {};

  // Check Zoho API connectivity
  const zohoStart = Date.now();
  try {
    await zohoClient.ping();
    checks.zoho_api = { status: "ok", latencyMs: Date.now() - zohoStart };
  } catch {
    checks.zoho_api = { status: "error" };
  }

  // Check Redis connectivity (if using for rate limiting)
  try {
    await redisClient.ping();
    checks.redis = { status: "ok" };
  } catch {
    checks.redis = { status: "error" };
  }

  const allHealthy = Object.values(checks).every(c => c.status === "ok");
  const anyFailed = Object.values(checks).some(c => c.status === "error");

  const health: HealthStatus = {
    status: allHealthy ? "healthy" : anyFailed ? "unhealthy" : "degraded",
    checks,
    uptime: process.uptime(),
  };

  res.status(allHealthy ? 200 : 503).json(health);
});
Deployment: Container-Based on ECS or App Runner
For a production MCP server with HTTP transport, run it as a container:
FROM node:20-alpine

WORKDIR /app

COPY package*.json ./
RUN npm ci --omit=dev

COPY dist/ ./dist/

# Don't run as root
USER node

EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

CMD ["node", "dist/server.js"]
Deploy on AWS App Runner (simplest — handles scaling automatically) or ECS Fargate (more control). Both support VPC placement, secrets injection from Secrets Manager, and CloudWatch logging.
Testing Your Production Server
Before deploying:
Load test: Use k6 or Artillery to simulate concurrent AI agents. Verify rate limiting kicks in correctly and the server stays stable.
Auth testing: Try connecting with invalid API keys, expired tokens, missing certificates. Verify all return 401, not 500.
Injection testing: Pass malicious content through your tools. Verify input validation catches it.
Chaos testing: Simulate backend failures (Zoho API down, Redis unavailable). Verify graceful degradation.
Audit log review: After testing, review the logs. Can you reconstruct exactly what happened? Is PII properly excluded?
Building production-grade infrastructure takes more upfront time than a quick demo — but it's the difference between a tool your business can trust and one that fails when you need it most.
If you're building an MCP server for a real business integration and want the architecture reviewed before you ship, let's talk.