Flowgenie — Excellence In Technology
Claude AI · AWS · Security · Cloud Architecture

Claude AI on AWS: A Security-First Deployment Guide (IAM, VPC, Secrets Manager)

Mahesh Ramala · 8 min read

Running Claude AI workloads on AWS introduces real security considerations. This guide covers IAM least-privilege design, VPC isolation, secrets management, and the architecture patterns that keep production AI safe.


AWS is the default choice for most serious AI deployments — and for good reason. The managed services, global reach, and security tooling are unmatched. But deploying Claude AI workloads on AWS introduces a distinct set of security concerns that most tutorials gloss over.

This guide is about getting it right from day one, not retrofitting security after something goes wrong.

Why AWS Security Matters More for AI Workloads

Traditional application security focuses on protecting the application itself. AI workloads add a new attack surface: the AI's access to your business data and systems.

If your Claude agent is compromised through prompt injection, it might try to:

  • Exfiltrate data through API calls
  • Modify records it shouldn't touch
  • Escalate its own permissions
  • Call services outside its intended scope

Your AWS security controls are the last line of defence when the AI behaves unexpectedly. This isn't theoretical — prompt injection attacks against production AI systems are already documented.

IAM: The Foundation of Everything

Identity and Access Management is where most AWS security failures originate, and it's even more critical for AI agents.

The Least-Privilege Principle for AI

Your Claude agent (running as a Lambda function, ECS task, or EC2 instance) needs an IAM role. That role should have the minimum permissions required for the agent's specific function — nothing more.

Wrong approach (common mistake):

{
  "Effect": "Allow",
  "Action": "*",
  "Resource": "*"
}

Never use wildcard permissions for AI workloads. If the agent is compromised, it has access to everything.

Right approach:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadCustomerData",
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:Query"
      ],
      "Resource": "arn:aws:dynamodb:ap-southeast-2:123456789012:table/Customers"
    },
    {
      "Sid": "InvokeAnthropicAPI",
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": "arn:aws:bedrock:ap-southeast-2::foundation-model/anthropic.claude-3-5-sonnet*"
    },
    {
      "Sid": "WriteAuditLogs",
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:ap-southeast-2:123456789012:log-group:/ai-agent/audit:*"
    }
  ]
}

Each statement has a specific purpose. The agent can read customer data, call Claude via Bedrock, and write audit logs — and nothing else.
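You can enforce the no-wildcards rule mechanically before any policy is deployed. The following is a minimal sketch of a CI-style linter for agent role policies — the types and the `findWildcardStatements` name are mine, not an AWS API; it mirrors the IAM policy JSON shape shown above:

```typescript
// Minimal linter for agent role policies: flag any Allow statement whose
// Action or Resource is the bare wildcard "*".
type Statement = {
  Sid?: string;
  Effect: "Allow" | "Deny";
  Action: string | string[];
  Resource: string | string[];
};

type PolicyDocument = { Version: string; Statement: Statement[] };

function findWildcardStatements(policy: PolicyDocument): string[] {
  const offenders: string[] = [];
  for (const stmt of policy.Statement) {
    if (stmt.Effect !== "Allow") continue;
    const actions = Array.isArray(stmt.Action) ? stmt.Action : [stmt.Action];
    const resources = Array.isArray(stmt.Resource) ? stmt.Resource : [stmt.Resource];
    if (actions.includes("*") || resources.includes("*")) {
      offenders.push(stmt.Sid ?? "<no Sid>");
    }
  }
  return offenders;
}
```

Run it against every agent role policy in your repo and fail the build if it returns anything. (It's deliberately strict; a permission-boundary policy that legitimately uses `Resource: "*"` would need to be exempted.)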

Separate Roles for Separate Agents

If you're running multiple AI agents (e.g., a customer service agent and an operations agent), give each its own IAM role with permissions appropriate to its function. Don't share roles between agents with different access requirements.

IAM Permission Boundaries

For added safety, use IAM permission boundaries — a maximum permissions policy attached to a role that constrains what the role can ever be granted, even by an administrator.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:*",
        "s3:GetObject",
        "bedrock:InvokeModel",
        "logs:*"
      ],
      "Resource": "*"
    }
  ]
}

This boundary means even if a misconfiguration grants the agent a broader policy, the boundary prevents any action outside these services.
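In practice, the boundary is attached when the role is created. A sketch of the `CreateRole` parameters follows — the role name, boundary ARN, and `agentRoleParams` helper are illustrative, but `PermissionsBoundary` is a real parameter on IAM's CreateRole call (e.g. via `CreateRoleCommand` in `@aws-sdk/client-iam`):

```typescript
// Build CreateRole parameters that pin an agent role to a permission boundary.
// The boundary ARN caps what the role can ever be granted, even by an admin.
interface CreateRoleParams {
  RoleName: string;
  AssumeRolePolicyDocument: string; // trust policy, JSON-encoded
  PermissionsBoundary: string;      // ARN of the boundary policy above
}

function agentRoleParams(agentName: string, boundaryArn: string): CreateRoleParams {
  return {
    RoleName: `ai-agent-${agentName}`,
    AssumeRolePolicyDocument: JSON.stringify({
      Version: "2012-10-17",
      Statement: [{
        Effect: "Allow",
        Principal: { Service: "lambda.amazonaws.com" },
        Action: "sts:AssumeRole",
      }],
    }),
    PermissionsBoundary: boundaryArn,
  };
}
```

Creating every agent role through a helper like this makes it hard to forget the boundary on the next agent you add.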

VPC Architecture: Network Isolation

Your AI agent should never have direct internet access if it doesn't need it. Use VPC architecture to enforce network boundaries.

Recommended VPC Layout

VPC (10.0.0.0/16)
├── Public Subnet (10.0.1.0/24)
│   └── Application Load Balancer (internet-facing)
│
├── Private Subnet - App Tier (10.0.2.0/24)
│   └── Lambda / ECS Tasks (AI Agent)
│       ↓ reaches internet via NAT Gateway
│
├── Private Subnet - Data Tier (10.0.3.0/24)
│   └── RDS / ElastiCache / OpenSearch
│       (no internet access, ever)
│
└── VPC Endpoints (private connectivity to AWS services)
    ├── com.amazonaws.region.bedrock-runtime
    ├── com.amazonaws.region.secretsmanager
    ├── com.amazonaws.region.dynamodb
    └── com.amazonaws.region.logs

Key points:

  • AI agent runs in private subnet — no direct internet exposure
  • VPC endpoints mean API calls to Bedrock, Secrets Manager, and DynamoDB never leave the AWS network
  • Data tier has no internet route at all
  • NAT Gateway provides outbound-only internet access when the agent needs to call external APIs

Security Groups

Define explicit security groups for each tier:

AI Agent Security Group:
  Inbound:  Allow HTTPS from ALB Security Group only
  Outbound: Allow HTTPS to AWS services (via VPC endpoints)
            Allow HTTPS outbound to approved external API endpoints (egress via NAT Gateway)
            Deny everything else

Data Tier Security Group:
  Inbound:  Allow port 5432/3306 from AI Agent Security Group only
  Outbound: Deny all

Never allow 0.0.0.0/0 inbound to the AI agent or data tier.
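That last rule is another good candidate for an automated check. Here's a minimal sketch — the `IngressRule` shape is mine; adapt it to whatever your IaC tool emits:

```typescript
// CI guard: flag any inbound rule open to the world (IPv4 or IPv6).
type IngressRule = {
  port: number;
  sourceCidr?: string;          // e.g. "10.0.1.0/24"
  sourceSecurityGroup?: string; // e.g. the ALB's security group ID
};

function openToWorld(rules: IngressRule[]): IngressRule[] {
  return rules.filter(
    (r) => r.sourceCidr === "0.0.0.0/0" || r.sourceCidr === "::/0"
  );
}
```

Fail the pipeline if `openToWorld` returns anything for the AI agent or data tier security groups; only the internet-facing ALB should ever pass with a world-open rule.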

Secrets Manager: No Credentials in Code

The single most common security mistake I see is credentials in plain sight: API keys in plaintext environment variables, .env files committed to git, or values hardcoded in application code.

For a Claude AI deployment, you're managing:

  • Anthropic API key (or Bedrock credentials)
  • CRM API tokens (Zoho, Salesforce, etc.)
  • Database credentials
  • Third-party webhook secrets

All of these belong in AWS Secrets Manager.

Storing Secrets

# Store the Anthropic API key
aws secretsmanager create-secret \
  --name "production/ai-agent/anthropic-api-key" \
  --secret-string '{"api_key": "sk-ant-..."}' \
  --region ap-southeast-2

Enable automatic rotation for secrets that support it. For database passwords, Secrets Manager integrates with RDS to rotate credentials automatically without application downtime.

Retrieving Secrets in Your Application

import {
  SecretsManagerClient,
  GetSecretValueCommand,
} from "@aws-sdk/client-secrets-manager";

const client = new SecretsManagerClient({ region: "ap-southeast-2" });

async function getAnthropicApiKey(): Promise<string> {
  const response = await client.send(
    new GetSecretValueCommand({
      SecretId: "production/ai-agent/anthropic-api-key",
    })
  );
  const secret = JSON.parse(response.SecretString!);
  return secret.api_key;
}

Cache the retrieved secret in memory for the lifetime of the Lambda function instance. Don't call Secrets Manager on every request — that's expensive and adds latency.

IAM Access to Secrets

Your AI agent's IAM role needs specific permission to retrieve its secrets:

{
  "Effect": "Allow",
  "Action": "secretsmanager:GetSecretValue",
  "Resource": "arn:aws:secretsmanager:ap-southeast-2:123456789012:secret:production/ai-agent/*"
}

The wildcard at the end of the resource ARN covers all secrets with that prefix — giving the agent access to its own secrets while keeping other application secrets inaccessible.

CloudTrail and Monitoring

Every API call made by your AI agent should be auditable. AWS CloudTrail records all API calls — enable it for all regions with a dedicated S3 bucket and log integrity validation.

What to Alert On

Set up CloudWatch alarms or GuardDuty findings for:

  • Unusual Secrets Manager access patterns: Agent accessing secrets it doesn't normally need
  • DynamoDB scan operations: Full table scans suggest the agent is not using queries correctly (or is fishing for data)
  • Calls to unexpected services: If your agent suddenly calls S3 or SQS when it shouldn't, that's a flag
  • High error rates on Bedrock calls: Could indicate prompt injection causing malformed requests
  • IAM role assumption from unexpected source IPs: Especially important if using IAM roles for cross-account access
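A first cut at those alerts can be a simple classifier over CloudTrail records — `eventSource` and `eventName` are real CloudTrail fields, but the allow-list of expected services is an assumption you'd tailor per agent, and DynamoDB item-level calls only appear in CloudTrail if you've enabled data event logging:

```typescript
// Classify a CloudTrail event against the alert conditions above.
// Returns a reason string if the event should alert, or null if it's expected.
type TrailEvent = { eventSource: string; eventName: string };

// Assumed allow-list: the only services this agent's role should ever call.
const EXPECTED_SERVICES = new Set([
  "dynamodb.amazonaws.com",
  "bedrock.amazonaws.com",
  "secretsmanager.amazonaws.com",
  "logs.amazonaws.com",
]);

function alertReason(event: TrailEvent): string | null {
  if (!EXPECTED_SERVICES.has(event.eventSource)) {
    return `unexpected service: ${event.eventSource}`;
  }
  if (event.eventSource === "dynamodb.amazonaws.com" && event.eventName === "Scan") {
    return "full table scan by agent";
  }
  return null;
}
```

Wired to an EventBridge rule or a Lambda subscribed to the trail, a non-null return becomes an SNS alert or a GuardDuty-style finding.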

Structured Logging for AI Decisions

Beyond AWS CloudTrail, log the AI agent's decisions in a structured way:

const auditLog = {
  timestamp: new Date().toISOString(),
  request_id: context.awsRequestId,
  user_session: sessionId,
  tool_calls: [],  // All tools the agent invoked this turn
  tokens_used: response.usage,
  latency_ms: Date.now() - startTime,
  outcome,  // "success" | "error" | "safety_refusal"
};

// console.log in Lambda lands in CloudWatch Logs automatically
console.log(JSON.stringify(auditLog));

This gives you a complete picture: what the agent did, what data it accessed, how it responded, and how long it took.

Amazon Bedrock vs Direct Anthropic API

If your AWS deployment is in a region where Amazon Bedrock supports Claude models, Bedrock has security advantages over calling the Anthropic API directly:

  • No API key management: Bedrock uses your existing AWS credentials (IAM)
  • VPC endpoint support: Bedrock calls stay within AWS network
  • AWS CloudTrail integration: All model invocations are logged automatically
  • Data residency: Data doesn't leave AWS infrastructure (important for compliance)
  • AWS PrivateLink: Private connectivity with no internet exposure

For Australian businesses, Bedrock in ap-southeast-2 (Sydney) means your AI workloads and data stay in-country.

Before You Ship

I've seen production AI deployments go wrong in predictable ways — usually because security was an afterthought, not part of the initial design. Wildcarded IAM roles get exploited. API keys in environment variables leak through log groups. Agents with unnecessary S3 access get used for data exfiltration during prompt injection attacks.

The patterns above aren't theoretical hardening. They're the baseline I use for every Claude deployment I build for clients. The two that matter most: get IAM right (specific actions, specific resources, no wildcards) and put everything in Secrets Manager. Those two changes eliminate the most common failure modes.

If you're using Amazon Bedrock rather than the Anthropic API directly, you're already ahead — no API key to manage, VPC endpoint support built in, and every model invocation logged automatically to CloudTrail. For Australian businesses with data residency requirements, ap-southeast-2 keeps everything onshore.

One last thing worth saying: don't retrofit this. Every project I've been called into to "add security" after launch takes three times longer and costs more than building it in from the start. The IAM policies, VPC layout, and secrets management take maybe a day to set up properly. The alternative is a production incident.


If you're planning a Claude AI deployment on AWS and want to make sure the architecture is solid before you launch, let's talk.

Mahesh Ramala

AI Specialist · Zoho Authorized Partner · Upwork Top Rated Plus

I build custom AI agents, MCP server integrations, and Zoho automation for businesses across industries. If you found this article useful, let’s connect.