Lambda is an ideal host for Claude AI agents — zero server management, automatic scaling, and pay-per-use pricing. Here's how to build secure, production-ready serverless AI agents on AWS.
Lambda is my default deployment target for Claude AI agents. No servers to patch, automatic scaling from zero to thousands of concurrent requests, and you pay only for the compute time you actually use. For most business AI workloads — which tend to be bursty and event-driven rather than continuous — it's hard to beat.
But Lambda has quirks, and AI workloads amplify some of them. This guide covers the architecture, security, and gotchas you need to know.
Why Lambda Works Well for AI Agents
The AI agent execution model maps naturally onto Lambda's stateless function model:
- A user sends a request
- The agent runs (may take 5–30 seconds for complex multi-tool workflows)
- The agent returns a response
- The function terminates
There's no persistent process to manage, no idle server consuming resources between requests. For a business with 50–500 AI agent interactions per day, Lambda's economics are excellent.
Practical example: An AI agent that processes new CRM leads runs for 8 seconds per invocation. At 100 leads per day, that's 800 seconds of compute — under $0.02/day on Lambda. Compare that to an EC2 instance running 24/7.
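The arithmetic behind that figure, under the assumption of 1GB memory (the memory size is my assumption; the article's example doesn't state it):

```typescript
// Back-of-envelope Lambda compute cost for the CRM-lead example.
// Pricing assumption: ~$0.0000166667 per GB-second (check current AWS pricing).
const pricePerGbSecond = 0.0000166667;
const memoryGb = 1;
const secondsPerInvocation = 8;
const invocationsPerDay = 100;

const dailyCost =
  pricePerGbSecond * memoryGb * secondsPerInvocation * invocationsPerDay;
// ≈ $0.0133/day, comfortably under the $0.02/day figure above
```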
Architecture Overview
Event Source         Lambda Function           Downstream
────────────         ───────────────           ──────────
API Gateway  ──▶   ┌─────────────────┐   ──▶   Zoho CRM
EventBridge  ──▶   │   Claude Agent  │   ──▶   Anthropic API
SQS Queue    ──▶   │   (Node.js/     │   ──▶   DynamoDB
Webhook      ──▶   │    Python)      │   ──▶   S3
                   └─────────────────┘   ──▶   External APIs
The Lambda function receives an event, initialises the agent, runs the agent loop (potentially multiple tool calls), and returns the result. The event source determines whether this is synchronous (API Gateway) or asynchronous (SQS, EventBridge).
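The event-source distinction shows up in code as event parsing. A hypothetical `parseEvent` helper (the handler code that follows relies on one) might normalise the three common shapes like this, a sketch rather than a complete implementation:

```typescript
// Normalise API Gateway, SQS and EventBridge events into one agent request.
// Field names (userMessage, sessionId, agentType) match the handler below;
// the event shapes are simplified for illustration.
interface AgentRequest {
  userMessage: string;
  sessionId: string;
  agentType: string;
}

function parseEvent(event: any): AgentRequest {
  // API Gateway proxy events carry a JSON string in `body`
  if (typeof event.body === "string") {
    const { userMessage, sessionId, agentType } = JSON.parse(event.body);
    return { userMessage, sessionId, agentType };
  }
  // SQS events carry one or more records; take the first
  if (Array.isArray(event.Records) && event.Records[0]?.body) {
    return JSON.parse(event.Records[0].body);
  }
  // EventBridge events carry the payload in `detail`
  if (event.detail) {
    return event.detail;
  }
  throw new Error("Unrecognised event shape");
}
```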
Function Structure
A well-structured Lambda function for a Claude agent:
import { Handler } from "aws-lambda";
import Anthropic from "@anthropic-ai/sdk";
import { getSecret } from "./secrets";
import { executeTool } from "./tools";
import { auditLog } from "./logging";
// Project helpers (module path illustrative): event normalisation,
// per-agent tool lists and system prompts
import { parseEvent, getToolsForAgent, getSystemPrompt } from "./agent-config";

// Initialise outside handler — reused across warm invocations
let anthropicClient: Anthropic | null = null;

async function getAnthropicClient(): Promise<Anthropic> {
  if (!anthropicClient) {
    const apiKey = await getSecret("production/ai-agent/anthropic-api-key");
    anthropicClient = new Anthropic({ apiKey });
  }
  return anthropicClient;
}

export const handler: Handler = async (event, context) => {
  const startTime = Date.now();
  const requestId = context.awsRequestId;

  try {
    const { userMessage, sessionId, agentType } = parseEvent(event);
    const client = await getAnthropicClient();

    // Run the agent loop
    const result = await runAgentLoop({
      client,
      userMessage,
      sessionId,
      agentType,
      requestId,
    });

    await auditLog({
      requestId,
      sessionId,
      agentType,
      outcome: "success",
      durationMs: Date.now() - startTime,
      toolCallCount: result.toolCallCount,
    });

    return { statusCode: 200, body: JSON.stringify({ response: result.response }) };
  } catch (error) {
    await auditLog({
      requestId,
      outcome: "error",
      error: error instanceof Error ? error.message : String(error),
      durationMs: Date.now() - startTime,
    });
    return { statusCode: 500, body: JSON.stringify({ error: "Agent error" }) };
  }
};

async function runAgentLoop({ client, userMessage, sessionId, agentType, requestId }) {
  const messages: Anthropic.MessageParam[] = [{ role: "user", content: userMessage }];
  const tools = getToolsForAgent(agentType);
  let toolCallCount = 0;

  while (true) {
    // Safety limit: fail before issuing another API call, not after
    if (toolCallCount > 20) {
      throw new Error("Tool call limit exceeded");
    }

    const response = await client.messages.create({
      model: "claude-3-5-sonnet-20241022",
      max_tokens: 4096,
      system: getSystemPrompt(agentType),
      tools,
      messages,
    });

    if (response.stop_reason === "tool_use") {
      messages.push({ role: "assistant", content: response.content });

      const toolResults: Anthropic.ToolResultBlockParam[] = [];
      for (const block of response.content) {
        if (block.type !== "tool_use") continue;
        toolCallCount++;
        const result = await executeTool(block.name, block.input, { requestId, sessionId });
        toolResults.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: JSON.stringify(result),
        });
      }

      messages.push({ role: "user", content: toolResults });
      continue;
    }

    if (response.stop_reason === "end_turn") {
      const textContent = response.content.find(b => b.type === "text");
      return { response: textContent?.text ?? "", toolCallCount };
    }

    // Any other stop reason (e.g. max_tokens) would loop forever, so bail out
    throw new Error(`Unexpected stop_reason: ${response.stop_reason}`);
  }
}
Lambda Configuration for AI Workloads
Memory and Timeout
AI agent invocations are longer than typical Lambda functions. Configure accordingly:
# serverless.yml or CDK
functions:
  aiAgent:
    handler: dist/handler.handler
    memorySize: 1024          # 1GB — AI workloads benefit from more memory
    timeout: 120              # 2 minutes — enough for complex multi-tool workflows
    reservedConcurrency: 50   # Prevent runaway costs from misbehaving clients
Lambda's CPU allocation scales with memory. 1024MB gives you roughly twice the CPU of 512MB — important for JSON parsing and processing tool results.
Why a 2-minute timeout? A complex agent workflow might call 8–10 tools, each taking 1–3 seconds. Add Anthropic API latency (1–5 seconds per message) and you can hit 30–60 seconds for sophisticated workflows. Set the timeout with headroom.
Environment Variables vs Secrets
Never put secrets in Lambda environment variables — they're visible in the AWS console and can be read by anyone with lambda:GetFunctionConfiguration permission.
// Wrong — API key visible in console
process.env.ANTHROPIC_API_KEY

// Right — fetched from Secrets Manager at runtime
import { SecretsManagerClient, GetSecretValueCommand } from "@aws-sdk/client-secrets-manager";

const secretsClient = new SecretsManagerClient({});
const { SecretString: apiKey } = await secretsClient.send(
  new GetSecretValueCommand({ SecretId: "production/ai-agent/anthropic-api-key" })
);
The performance cost of fetching from Secrets Manager is paid once per Lambda container lifetime (typically 15+ minutes). On warm invocations, the cached value is used.
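The caching logic inside the `getSecret` helper imported earlier can be sketched like this. It's illustrative, not the article's exact implementation: `fetchSecret` stands in for the real Secrets Manager call so the caching behaviour is visible on its own.

```typescript
// Per-container secret cache: the fetch happens once per container lifetime.
// Caching the promise (not the resolved value) means concurrent cold-start
// callers share a single in-flight fetch instead of each hitting the API.
const secretCache = new Map<string, Promise<string>>();

function getSecret(
  secretId: string,
  fetchSecret: (id: string) => Promise<string>
): Promise<string> {
  let pending = secretCache.get(secretId);
  if (!pending) {
    pending = fetchSecret(secretId);
    secretCache.set(secretId, pending);
  }
  return pending;
}
```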
Security: Lambda-Specific Considerations
IAM Role — Lambda Execution Role
The Lambda function's execution role is its identity. Apply the strictest possible permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "arn:aws:secretsmanager:ap-southeast-2:ACCOUNT:secret:production/ai-agent/*"
    },
    {
      "Effect": "Allow",
      "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
      "Resource": "arn:aws:dynamodb:ap-southeast-2:ACCOUNT:table/AgentSessions"
    },
    {
      "Effect": "Allow",
      "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
      "Resource": "arn:aws:logs:ap-southeast-2:ACCOUNT:log-group:/aws/lambda/ai-agent:*"
    }
  ]
}
No wildcards. No lambda:*. No access to services the agent doesn't use.
VPC Placement
Put your Lambda function in a private VPC subnet when it accesses internal resources:
// CDK
const agentFunction = new lambda.Function(this, "AIAgentFunction", {
  vpc: myVpc,
  vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
  securityGroups: [agentSecurityGroup],
  // ...
});
With VPC placement, use VPC endpoints for AWS services (Secrets Manager, DynamoDB, CloudWatch Logs) so traffic stays within the AWS network.
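In CDK, that looks roughly like the following sketch (construct IDs are illustrative; `myVpc` is the VPC from the function definition above):

```typescript
// Interface endpoints for Secrets Manager and CloudWatch Logs
myVpc.addInterfaceEndpoint("SecretsManagerEndpoint", {
  service: ec2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
});
myVpc.addInterfaceEndpoint("CloudWatchLogsEndpoint", {
  service: ec2.InterfaceVpcEndpointAwsService.CLOUDWATCH_LOGS,
});

// DynamoDB (like S3) uses a gateway endpoint, which has no hourly charge
myVpc.addGatewayEndpoint("DynamoDbEndpoint", {
  service: ec2.GatewayVpcEndpointAwsService.DYNAMODB,
});
```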
Resource-Based Policy on Lambda
Control which services can invoke your Lambda function:
{
  "Effect": "Allow",
  "Principal": { "Service": "apigateway.amazonaws.com" },
  "Action": "lambda:InvokeFunction",
  "Resource": "arn:aws:lambda:ap-southeast-2:ACCOUNT:function:ai-agent",
  "Condition": {
    "ArnLike": {
      "AWS:SourceArn": "arn:aws:execute-api:ap-southeast-2:ACCOUNT:API_ID/*"
    }
  }
}
Only your specific API Gateway can invoke this function — not any other service or account.
Handling Asynchronous AI Workflows
Some AI workflows shouldn't be synchronous. If processing takes 30+ seconds, use an async pattern:
Client API Gateway Lambda SQS
│ │ │ │
│──POST /process────────▶│ │ │
│ │──Invoke sync─────▶│ │
│ │ │──Send to SQS──▶│
│◀──202 Accepted (jobId)─│◀─Return jobId────│ │
│ │ │ │
│ │ ┌───────────────────▶│
│ │ │ Lambda (async) │
│ │ │ processes job │
│ │ │ writes to DDB │
│ │ └────────────────────│
│ │ │ │
│──GET /status/{jobId}──▶│ │ │
│◀──200 {status: done}───│ │ │
The initial request queues the job and returns a job ID. The client polls for status. This avoids Lambda timeout limits and gives you better visibility into long-running workflows.
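The "queue and return a job ID" step can be sketched as follows. This is a minimal illustration, with `sendToQueue` standing in for an SQS `SendMessage` call so the handler shape is clear without AWS wiring:

```typescript
import { randomUUID } from "crypto";

// Accept the job, enqueue it, and return 202 with a job ID the client can poll.
async function enqueueJob(
  payload: unknown,
  sendToQueue: (message: string) => Promise<void>
): Promise<{ statusCode: number; body: string }> {
  const jobId = randomUUID();
  await sendToQueue(JSON.stringify({ jobId, payload }));
  // 202 means "accepted, not complete"; the worker Lambda writes the result
  // to DynamoDB, and GET /status/{jobId} reads it back
  return { statusCode: 202, body: JSON.stringify({ jobId }) };
}
```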
Cost Management
Lambda pricing is predictable but can surprise you with AI workloads:
Compute: ~$0.00001667 per GB-second. At 1GB memory and 15 seconds per invocation: $0.00025 per invocation. At 1000 invocations/day: $0.25/day.
Anthropic API: This is usually the dominant cost. Claude Sonnet is ~$3/M input tokens + $15/M output tokens. A complex 8-tool workflow might use 8,000 tokens — about $0.04 per invocation. At 500 invocations/day: $20/day.
Cost controls:
- Set reservedConcurrency to cap simultaneous invocations
- Use CloudWatch alarms for anomalous invocation counts
- Implement per-user or per-session rate limiting at the application layer
- Consider Claude Haiku for simple routing tasks, Sonnet for complex reasoning
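Application-layer rate limiting can be as simple as a fixed-window counter. This sketch is in-memory, so each Lambda container only sees its own traffic; a production version would back the counter with DynamoDB or ElastiCache, but the shape of the check is the same:

```typescript
// Fixed-window rate limiter: at most `maxPerWindow` requests per session
// per window. Window size and limit are illustrative values.
const windowMs = 60_000;
const maxPerWindow = 10;
const counters = new Map<string, { windowStart: number; count: number }>();

function allowRequest(sessionId: string, now: number = Date.now()): boolean {
  const entry = counters.get(sessionId);
  // No entry yet, or the window has rolled over: start a fresh window
  if (!entry || now - entry.windowStart >= windowMs) {
    counters.set(sessionId, { windowStart: now, count: 1 });
    return true;
  }
  if (entry.count >= maxPerWindow) return false;
  entry.count++;
  return true;
}
```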
Observability
Enable Lambda Powertools for structured logging and tracing:
import middy from "@middy/core";
import { Logger } from "@aws-lambda-powertools/logger";
import { Tracer } from "@aws-lambda-powertools/tracer";
import { captureLambdaHandler } from "@aws-lambda-powertools/tracer/middleware";

const logger = new Logger({ serviceName: "ai-agent" });
const tracer = new Tracer({ serviceName: "ai-agent" });

export const handler = middy(async (event, context) => {
  logger.addContext(context);
  logger.info("Agent invocation started");
  // ...
}).use(captureLambdaHandler(tracer));
X-Ray tracing gives you end-to-end visibility: how long each tool call took, where latency is, which tools are called most.
Lambda + Claude is a powerful combination for event-driven AI workloads. If you're designing a serverless AI architecture and want to get the security model right from the start, get in touch.