
Building Production MCP Servers: Authentication, Rate Limiting, and Logging

Mahesh Ramala · 9 min read

Most MCP tutorials show you hello-world examples. This guide covers what you actually need for production: authentication strategies, rate limiting that prevents abuse, structured logging for compliance, and deployment patterns that scale.


Building an MCP server that works in a demo is easy. Building one that you'd trust with live business data, can survive real usage patterns, and gives you visibility when something goes wrong — that's a different challenge.

This guide covers the production engineering concerns that most MCP tutorials skip entirely.

What "Production" Actually Means for MCP

Demo MCP servers are single-user, trust everything that connects, crash when something goes wrong, and leave no trace of what they did. That's fine for a local proof-of-concept. It's not fine when business data is flowing through.

Production means multiple agents connecting simultaneously — each with different permissions, different rate limits, different access to your tools. It means the server handles auth failures, malformed inputs, and backend downtime without dying. It means every tool call is logged with enough detail to reconstruct what happened if something goes wrong a week later. And it means the server keeps running when you're not watching it.

None of this is complicated. But you have to actually build it.

Hello-world MCP examples skip all of it. Let's fix that.

Transport Choice: stdio vs HTTP

For production deployments serving multiple clients, an HTTP transport is the only viable choice. stdio works for local development and single-user Claude Desktop integrations, but it doesn't support concurrent connections or authentication. The example below uses the SDK's SSE transport; newer SDK releases also offer a streamable HTTP transport, and the middleware patterns in this guide apply to either.

import express from "express";
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";

const app = express();

// Authentication middleware (see below)
app.use(authenticateRequest);

// Rate limiting (see below)
app.use(rateLimiter);

// One transport per connected client, keyed by session ID.
// A single shared variable would break with concurrent clients.
const transports = new Map<string, SSEServerTransport>();

// MCP SSE endpoint
app.get("/sse", async (req, res) => {
  const transport = new SSEServerTransport("/messages", res);
  transports.set(transport.sessionId, transport);
  res.on("close", () => transports.delete(transport.sessionId));
  await mcpServer.connect(transport);
});

app.post("/messages", express.json(), async (req, res) => {
  const transport = transports.get(req.query.sessionId as string);
  if (!transport) {
    return res.status(404).json({ error: "Unknown session" });
  }
  await transport.handlePostMessage(req, res, req.body);
});

app.listen(3000);

Authentication

Strategy 1: API Keys (Simplest — Good for Internal Use)

For internal tools where clients are known services (your own Lambda functions, your own Claude agents), API key authentication is simple and effective.

import { Request, Response, NextFunction } from "express";

interface AuthenticatedRequest extends Request {
  clientId?: string;
  clientPermissions?: string[];
}

const API_KEYS: Record<string, { clientId: string; permissions: string[] }> = {
  // Stored in Secrets Manager, loaded at startup
};

async function authenticateRequest(
  req: AuthenticatedRequest,
  res: Response,
  next: NextFunction
) {
  const apiKey = req.headers['x-api-key'] as string;

  if (!apiKey) {
    return res.status(401).json({ error: "API key required" });
  }

  const client = API_KEYS[apiKey];
  if (!client) {
    // Log failed auth attempts
    logger.warn({ event: 'auth_failed', ip: req.ip, path: req.path });
    return res.status(401).json({ error: "Invalid API key" });
  }

  // Attach client context for downstream middleware
  req.clientId = client.clientId;
  req.clientPermissions = client.permissions;

  logger.info({ event: 'auth_success', clientId: client.clientId });
  next();
}

Store API keys in AWS Secrets Manager and load them at server startup. Rotate keys by adding new ones before removing old ones — zero downtime rotation.
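The zero-downtime rotation described above amounts to serving the union of the old and new key sets while both are valid. A minimal sketch, where mergeKeySets is a hypothetical helper and the key-store shape matches the API_KEYS record above:

```typescript
type ClientRecord = { clientId: string; permissions: string[] };
type KeyStore = Record<string, ClientRecord>;

// During rotation, serve both generations of keys so clients
// authenticating with either key keep working until the old
// set is retired.
function mergeKeySets(current: KeyStore, incoming: KeyStore): KeyStore {
  return { ...current, ...incoming };
}
```

Reload the merged set on a timer or on a Secrets Manager rotation event, then drop the old keys once every client has switched over.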

Strategy 2: JWT Tokens (For User-Scoped Access)

If your MCP server needs to act on behalf of specific users (each user should only see their own data), use JWTs:

import jwt from "jsonwebtoken";

interface JWTPayload {
  sub: string;            // user ID
  permissions: string[];
}

async function authenticateJWT(req: AuthenticatedRequest, res: Response, next: NextFunction) {
  const token = req.headers.authorization?.replace("Bearer ", "");

  if (!token) {
    return res.status(401).json({ error: "Bearer token required" });
  }

  try {
    const payload = jwt.verify(token, process.env.JWT_SECRET!) as JWTPayload;

    req.clientId = payload.sub;
    req.clientPermissions = payload.permissions;

    next();
  } catch (error) {
    if (error instanceof jwt.TokenExpiredError) {
      return res.status(401).json({ error: "Token expired" });
    }
    return res.status(401).json({ error: "Invalid token" });
  }
}
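Once the token is verified, the sub claim should constrain every backend query so that a prompt-injected filter can't widen the result set. A sketch, where scopeToUser and the filter shape are illustrative:

```typescript
interface ToolQuery {
  filters: Record<string, string>;
}

// Force the owner filter to the authenticated user, overriding
// whatever the agent put in the request. The filter field name
// (ownerId) is a hypothetical example.
function scopeToUser(query: ToolQuery, userId: string): ToolQuery {
  return { ...query, filters: { ...query.filters, ownerId: userId } };
}
```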

Strategy 3: mTLS (For High-Security Enterprise)

Mutual TLS provides the strongest authentication — clients must present a valid certificate signed by your CA:

import https from "https";
import fs from "fs";

const server = https.createServer({
  key: fs.readFileSync("server.key"),
  cert: fs.readFileSync("server.cert"),
  ca: fs.readFileSync("client-ca.cert"),
  requestCert: true,
  rejectUnauthorized: true, // Reject clients without valid cert
}, app);

// Extract client identity from certificate
app.use((req: AuthenticatedRequest, res, next) => {
  const cert = (req.socket as import("tls").TLSSocket).getPeerCertificate();
  if (!cert || !cert.subject) {
    return res.status(401).json({ error: "Client certificate required" });
  }
  req.clientId = cert.subject.CN;
  next();
});

Authorisation: Per-Tool Permissions

After authentication, enforce authorisation at the tool level:

interface Permission {
  tools: string[];
  maxRecordsPerQuery?: number;
  allowWrite?: boolean;
}

const CLIENT_PERMISSIONS: Record<string, Permission> = {
  "customer-service-agent": {
    tools: ["search_contacts", "get_contact_history", "create_note"],
    maxRecordsPerQuery: 10,
    allowWrite: true,
  },
  "reporting-agent": {
    tools: ["search_contacts", "get_deal_summary", "get_revenue_report"],
    maxRecordsPerQuery: 100,
    allowWrite: false,
  },
  "readonly-dashboard": {
    tools: ["get_dashboard_metrics"],
    maxRecordsPerQuery: 50,
    allowWrite: false,
  },
};

// In your tool execution handler
server.setRequestHandler(CallToolRequestSchema, async (request, extra) => {
  const clientId = extra.authInfo?.clientId as string;
  const permissions = CLIENT_PERMISSIONS[clientId];

  if (!permissions) {
    throw new Error("Client not recognised");
  }

  if (!permissions.tools.includes(request.params.name)) {
    logger.warn({
      event: "tool_access_denied",
      clientId,
      tool: request.params.name,
    });
    throw new Error(`Tool '${request.params.name}' is not permitted for this client`);
  }

  // Execute tool with permission context
  return executeTool(request.params.name, request.params.arguments, permissions);
});
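The maxRecordsPerQuery field needs enforcing too, by clamping whatever limit the agent requests to the client's cap. A small sketch; clampLimit and the fallback cap of 50 are assumptions:

```typescript
interface Permission {
  tools: string[];
  maxRecordsPerQuery?: number;
  allowWrite?: boolean;
}

// Never trust the limit the agent asked for — clamp it to the
// client's configured cap before it reaches the backend.
function clampLimit(
  requested: number | undefined,
  permission: Permission
): number {
  const cap = permission.maxRecordsPerQuery ?? 50; // assumed default
  if (!requested || requested <= 0) return cap;
  return Math.min(requested, cap);
}
```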

Rate Limiting

AI agents can generate far more requests than human users. Without rate limiting, a misbehaving agent can saturate your backend APIs, exhaust quotas, and run up unexpected costs.

Multi-Dimensional Rate Limiting

Apply limits at multiple levels:

import { RateLimiterMemory, RateLimiterRedis } from "rate-limiter-flexible";

// Global rate limiter (all clients combined)
const globalLimiter = new RateLimiterMemory({
  points: 500,   // 500 requests
  duration: 60,  // per minute
});

// Per-client rate limiter (prevents one client monopolising)
const clientLimiter = new RateLimiterRedis({
  storeClient: redisClient,
  keyPrefix: "mcp_client",
  points: 60,    // 60 requests
  duration: 60,  // per minute per client
});

// Per-tool rate limiter (expensive tools get stricter limits)
const toolLimiter = new RateLimiterMemory({
  points: 10,    // 10 calls
  duration: 60,  // per minute
});

// Tools that hit costly backend endpoints get the stricter per-tool limit
const EXPENSIVE_TOOLS = ["get_revenue_report", "get_deal_summary"];

async function checkRateLimits(clientId: string, toolName: string) {
  try {
    await globalLimiter.consume("global");
    await clientLimiter.consume(clientId);

    if (EXPENSIVE_TOOLS.includes(toolName)) {
      await toolLimiter.consume(`${clientId}:${toolName}`);
    }
  } catch (rateLimitError) {
    logger.warn({
      event: "rate_limit_exceeded",
      clientId,
      toolName,
    });
    throw new Error("Rate limit exceeded. Retry after 60 seconds.");
  }
}

For production, use Redis-backed rate limiting (RateLimiterRedis) so limits are shared across multiple server instances. Memory-based limits only work for single-instance deployments.

Returning Retry-After Headers

When rate limiting kicks in, tell clients how long to wait:

app.use(async (req: AuthenticatedRequest, res, next) => {
  try {
    await checkRateLimits(req.clientId!, req.body?.params?.name);
    next();
  } catch (error) {
    res.set("Retry-After", "60");
    res.status(429).json({
      error: error instanceof Error ? error.message : "Rate limit exceeded",
    });
  }
});

Claude handles 429 responses gracefully if you implement tool error handling correctly — it will wait and retry.
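Rather than a fixed 60 seconds, rate-limiter-flexible's rejection value exposes msBeforeNext, which can drive a precise Retry-After. A sketch, where retryAfterSeconds is a hypothetical helper:

```typescript
// Convert the limiter's msBeforeNext (milliseconds until the next
// point becomes available) into a whole-second Retry-After value.
// Round up, and never return less than 1 second.
function retryAfterSeconds(msBeforeNext: number): number {
  return Math.max(1, Math.ceil(msBeforeNext / 1000));
}
```

In the catch block, read msBeforeNext from the rejection and pass the result to `res.set("Retry-After", ...)` instead of the hard-coded "60".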

Structured Logging

Every production MCP server needs complete, queryable logs. Structure them as JSON from the start:

import pino from "pino";

const logger = pino({
  level: process.env.LOG_LEVEL ?? "info",
  formatters: {
    level: (label) => ({ level: label }),
  },
  timestamp: pino.stdTimeFunctions.isoTime,
});

// Log every tool call
async function executeTool(
  toolName: string,
  args: unknown,
  context: ExecutionContext
) {
  const startTime = Date.now();
  const executionId = crypto.randomUUID();

  logger.info({
    event: "tool_call_start",
    executionId,
    clientId: context.clientId,
    toolName,
    // Sanitise args — remove PII before logging
    args: sanitiseForLogging(args),
  });

  try {
    const result = await executeToolInternal(toolName, args, context);

    logger.info({
      event: "tool_call_success",
      executionId,
      clientId: context.clientId,
      toolName,
      durationMs: Date.now() - startTime,
      recordsReturned: countRecords(result),
    });

    return result;

  } catch (error) {
    logger.error({
      event: "tool_call_error",
      executionId,
      clientId: context.clientId,
      toolName,
      durationMs: Date.now() - startTime,
      error: error instanceof Error ? error.message : String(error),
    });

    throw error;
  }
}
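The sanitiseForLogging call above is doing critical compliance work. One possible implementation recursively redacts a field deny-list; the field names here are illustrative, so extend the set for your own schema:

```typescript
// Field names (case-insensitive) that must never reach the logs.
const REDACTED_FIELDS = new Set(["email", "phone", "password", "token"]);

// Walk the value recursively, replacing sensitive fields with a
// placeholder while leaving structure and non-PII values intact.
function sanitiseForLogging(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sanitiseForLogging);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([k, v]) =>
        REDACTED_FIELDS.has(k.toLowerCase())
          ? [k, "[REDACTED]"]
          : [k, sanitiseForLogging(v)]
      )
    );
  }
  return value;
}
```

A deny-list is the simplest approach; for stricter compliance an allow-list (log only known-safe fields) is safer, since new PII fields are excluded by default.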

Log to CloudWatch (AWS) or Structured Destinations

// On ECS or App Runner, stdout is forwarded to CloudWatch Logs
// automatically, so structured JSON on stdout is all you need:
const logger = pino({
  level: process.env.LOG_LEVEL ?? "info",
  timestamp: pino.stdTimeFunctions.isoTime,
});

// Outside a managed container platform, a transport such as
// pino-cloudwatch can ship logs to CloudWatch directly.

With structured JSON logs in CloudWatch, you can query them with CloudWatch Insights:

# Find all tool calls by a specific client today
fields @timestamp, toolName, durationMs, recordsReturned
| filter clientId = "customer-service-agent"
| filter event = "tool_call_success"
| sort @timestamp desc
| limit 50

# Find slowest tool calls (performance monitoring)
filter event = "tool_call_success"
| stats avg(durationMs) as avgMs, count(*) as callCount by toolName
| sort avgMs desc

# Find all failed tool calls with error messages
filter event = "tool_call_error"
| fields @timestamp, clientId, toolName, error
| sort @timestamp desc

Health Checks and Observability

Add a health endpoint your load balancer and monitoring can probe:

interface HealthStatus {
  status: "healthy" | "degraded" | "unhealthy";
  checks: Record<string, { status: string; latencyMs?: number }>;
  uptime: number;
}

app.get("/health", async (req, res) => {
  const checks: HealthStatus["checks"] = {};

  // Check Zoho API connectivity
  const zohoStart = Date.now();
  try {
    await zohoClient.ping();
    checks.zoho_api = { status: "ok", latencyMs: Date.now() - zohoStart };
  } catch {
    checks.zoho_api = { status: "error" };
  }

  // Check Redis connectivity (if using for rate limiting)
  try {
    await redisClient.ping();
    checks.redis = { status: "ok" };
  } catch {
    checks.redis = { status: "error" };
  }

  const allHealthy = Object.values(checks).every(c => c.status === "ok");
  const anyFailed = Object.values(checks).some(c => c.status === "error");

  const health: HealthStatus = {
    status: allHealthy ? "healthy" : anyFailed ? "unhealthy" : "degraded",
    checks,
    uptime: process.uptime(),
  };

  res.status(allHealthy ? 200 : 503).json(health);
});

Deployment: Container-Based on ECS or App Runner

For a production MCP server with HTTP transport, run it as a container:

FROM node:20-alpine

WORKDIR /app

COPY package*.json ./
RUN npm ci --omit=dev

COPY dist/ ./dist/

# Don't run as root
USER node

EXPOSE 3000

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

CMD ["node", "dist/server.js"]

Deploy on AWS App Runner (simplest — handles scaling automatically) or ECS Fargate (more control). Both support VPC placement, secrets injection from Secrets Manager, and CloudWatch logging.

Testing Your Production Server

Before deploying:

Load test: Use k6 or Artillery to simulate concurrent AI agents. Verify rate limiting kicks in correctly and the server stays stable.

Auth testing: Try connecting with invalid API keys, expired tokens, missing certificates. Verify all return 401, not 500.

Injection testing: Pass malicious content through your tools. Verify input validation catches it.

Chaos testing: Simulate backend failures (Zoho API down, Redis unavailable). Verify graceful degradation.

Audit log review: After testing, review the logs. Can you reconstruct exactly what happened? Is PII properly excluded?
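For injection testing, even a crude pattern check catches the obvious cases, though real validation should be schema-based with per-field allow-lists. A sketch; the pattern list is illustrative, not exhaustive:

```typescript
// Naive deny-list of injection markers to probe your validators with.
// Treat this as a smoke test, not a defence — schema validation with
// allow-lists is the actual control.
const SUSPICIOUS_PATTERNS = [
  /ignore (all )?previous instructions/i,
  /<script\b/i,
];

function looksSuspicious(input: string): boolean {
  return SUSPICIOUS_PATTERNS.some((p) => p.test(input));
}
```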


Building production-grade infrastructure takes more upfront time than a quick demo — but it's the difference between a tool your business can trust and one that fails when you need it most.

If you're building an MCP server for a real business integration and want the architecture reviewed before you ship, let's talk.

Mahesh Ramala

AI Specialist · Zoho Authorized Partner · Upwork Top Rated Plus

I build custom AI agents, MCP server integrations, and Zoho automation for businesses across industries. If you found this article useful, let’s connect.
