Symptoms
You run an AWS CLI command, trigger a Lambda, or deploy via a CI pipeline and instead of the expected output you get something like this:
An error occurred (AccessDenied) when calling the GetObject operation:
User: arn:aws:iam::123456789012:user/infrarunbook-admin is not authorized
to perform: s3:GetObject on resource:
arn:aws:s3:::infrarunbook-backups/configs/prod.tar.gz
Or maybe you're assuming a role and hit this:
An error occurred (AccessDenied) when calling the AssumeRole operation:
User: arn:aws:iam::123456789012:user/infrarunbook-admin is not authorized
to perform: sts:AssumeRole on resource:
arn:aws:iam::987654321098:role/InfraDeployRole
Sometimes the error is clear-cut. More often it isn't. You've attached a policy, you've double-checked the ARN, and it still fails. The maddening part is that IAM evaluates permissions through multiple layers simultaneously — identity policies, resource policies, permission boundaries, service control policies, and session policies all play a role. Any one of them can silently veto an action. This guide walks through every major cause I've encountered and exactly how to diagnose each one.
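The layered evaluation can be sketched in a few lines of Python. This is a deliberately simplified model of the documented decision order (actions only, ignoring resources and conditions); `statement_matches`, `layer_decision`, and `evaluate` are hypothetical names for illustration, not AWS APIs:

```python
# Simplified model of IAM's decision logic (illustrative, not the real evaluator).
# Each "layer" is a list of statements; a statement is (effect, action_patterns).

from fnmatch import fnmatch

def statement_matches(stmt, action):
    """True if any action pattern in the statement covers the requested action."""
    effect, actions = stmt
    return any(fnmatch(action, pattern) for pattern in actions)

def layer_decision(statements, action):
    """Return 'explicitDeny', 'allow', or 'implicitDeny' for one policy layer."""
    if any(s[0] == "Deny" and statement_matches(s, action) for s in statements):
        return "explicitDeny"          # an explicit deny always wins within a layer
    if any(s[0] == "Allow" and statement_matches(s, action) for s in statements):
        return "allow"
    return "implicitDeny"              # nothing matched: blocked by default

def evaluate(action, identity_policy, scp=None, boundary=None):
    """Combine layers: a deny anywhere is final; every gate present must also allow."""
    gates = [g for g in (scp, boundary) if g is not None]
    decisions = [layer_decision(identity_policy, action)]
    decisions += [layer_decision(g, action) for g in gates]
    if "explicitDeny" in decisions:
        return "explicitDeny"
    if decisions[0] == "allow" and all(d == "allow" for d in decisions[1:]):
        return "allowed"
    return "implicitDeny"

identity = [("Allow", ["s3:*"])]
scp      = [("Allow", ["*"]), ("Deny", ["s3:DeleteObject"])]
print(evaluate("s3:GetObject", identity, scp=scp))     # allowed
print(evaluate("s3:DeleteObject", identity, scp=scp))  # explicitDeny
```

The rest of this guide is essentially about figuring out which layer in that chain is producing the deny.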
Root Cause 1: Missing IAM Policy
This is the most common cause and the one people check first — and yet it still catches people out, usually because the policy was attached to the wrong entity. The permission isn't on the user, role, or group you think it is.
Why it happens: AWS IAM defaults to an implicit deny on everything. Unless a policy explicitly allows an action, it's blocked. If you attached a policy to a group but the user was removed from that group, or you created a new role and forgot to attach the execution policy, or you're testing with a role that only has a trust policy and no permissions policy — you'll hit this.
How to identify it: Start by simulating the exact action that's failing using aws iam simulate-principal-policy. This is the cleanest way to confirm a missing allow without needing to reproduce the error in a live environment.
aws iam simulate-principal-policy \
--policy-source-arn arn:aws:iam::123456789012:user/infrarunbook-admin \
--action-names s3:GetObject \
--resource-arns arn:aws:s3:::infrarunbook-backups/configs/prod.tar.gz
If the permission is missing, the response will show:
{
"EvaluationResults": [
{
"EvalActionName": "s3:GetObject",
"EvalResourceName": "arn:aws:s3:::infrarunbook-backups/configs/prod.tar.gz",
"EvalDecision": "implicitDeny",
"MatchedStatements": []
}
]
}
implicitDeny with empty MatchedStatements is the tell. Nothing matched — no allow, no explicit deny. The action simply isn't permitted.
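If you simulate many actions at once, this check is worth scripting. The sketch below parses a simulator response (the JSON here is a hypothetical sample shaped like the output above) and flags anything that isn't allowed:

```python
import json

# Hypothetical simulate-principal-policy response, as piped from the AWS CLI.
response_json = '''{
  "EvaluationResults": [
    {"EvalActionName": "s3:GetObject",
     "EvalDecision": "implicitDeny",
     "MatchedStatements": []},
    {"EvalActionName": "s3:ListBucket",
     "EvalDecision": "allowed",
     "MatchedStatements": [{"SourcePolicyId": "InfraS3ReadPolicy"}]}
  ]
}'''

def failing_actions(response):
    """Return (action, decision) for every result that is not 'allowed'."""
    return [(r["EvalActionName"], r["EvalDecision"])
            for r in response["EvaluationResults"]
            if r["EvalDecision"] != "allowed"]

results = failing_actions(json.loads(response_json))
print(results)  # [('s3:GetObject', 'implicitDeny')]
```

Anything this prints is an action your deployment would fail on before you ever hit the live API.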
You can also list attached policies directly:
aws iam list-attached-user-policies --user-name infrarunbook-admin
aws iam list-user-policies --user-name infrarunbook-admin
aws iam list-groups-for-user --user-name infrarunbook-admin
How to fix it: Create or attach a policy that explicitly allows the action. If the principal is a role, use attach-role-policy. For a user, use attach-user-policy. Here's a minimal inline fix for the S3 case:
aws iam put-user-policy \
--user-name infrarunbook-admin \
--policy-name AllowS3BackupRead \
--policy-document '{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::infrarunbook-backups/configs/*"
}]
}'
In production, prefer managed policies over inline ones so you can reuse and audit them more easily. But for fast diagnosis and fixing in a pinch, inline works.
Root Cause 2: Explicit Deny Overriding Allow
This one is sneaky. You have a perfectly valid allow statement, everything looks right, but the call still fails. The reason: AWS IAM evaluates explicit denies before allows, and an explicit deny always wins. Always. There's no way to override an explicit deny with an allow in the same evaluation chain.
Why it happens: Explicit denies are commonly added as guardrails — to prevent access to specific S3 buckets, to block certain regions, to restrict destructive API calls. Over time these denies can end up in unexpected places: SCPs, permission boundaries, inline policies on groups, or resource-based policies. Someone adds a deny to lock down prod, and six months later a legitimate automation role starts hitting it.
How to identify it: Run simulate-principal-policy again, but this time look for explicitDeny in the evaluation result:
{
"EvaluationResults": [
{
"EvalActionName": "s3:DeleteObject",
"EvalDecision": "explicitDeny",
"MatchedStatements": [
{
"SourcePolicyId": "InlinePolicy-infrarunbook-admin",
"StartPosition": { "Line": 7, "Column": 10 },
"EndPosition": { "Line": 14, "Column": 10 }
}
]
}
]
}
The MatchedStatements block tells you exactly which policy and which lines contain the deny. That's your smoking gun. Go pull the policy and read those lines.
You can also enumerate inline and managed policies manually and inspect them for "Effect": "Deny" blocks:
aws iam get-user-policy \
--user-name infrarunbook-admin \
--policy-name InlinePolicy-infrarunbook-admin
How to fix it: Either remove the explicit deny if it's no longer appropriate, narrow its scope using a condition (so it only applies to specific users or specific contexts), or exclude the affected principal using a NotPrincipal block in a resource policy. Don't try to override it with an allow — that will never work. The deny wins.
{
"Effect": "Deny",
"Action": "s3:DeleteObject",
"Resource": "arn:aws:s3:::infrarunbook-backups/*",
"Condition": {
"ArnNotEquals": {
"aws:PrincipalArn": "arn:aws:iam::123456789012:role/InfraBackupCleanupRole"
}
}
}
This pattern lets the cleanup role do its job while keeping the deny active for everyone else.
Root Cause 3: Resource Policy Conflict
Some AWS services support resource-based policies — S3 bucket policies, KMS key policies, SQS queue policies, SNS topic policies, Secrets Manager secret policies. When both an identity policy and a resource policy exist, AWS evaluates them together. For cross-account access, the resource policy must explicitly allow the other account. For same-account access, an allow in either the identity policy or the resource policy is generally enough — but an explicit deny in either one will still block you, and KMS is a notable exception: the key policy must grant access, or explicitly delegate to IAM.
Why it happens: I've seen this most often with S3 and KMS. Someone locks down a bucket with a policy requiring aws:SecureTransport (HTTPS only) or restricts access to specific VPC endpoints, then an automation role running outside that VPC starts failing. Or a KMS key policy only lists specific roles, and a new role that needs to use that key was never added.
How to identify it: For S3, pull the bucket policy and look for deny statements or for allow statements that don't include your principal:
aws s3api get-bucket-policy \
--bucket infrarunbook-backups \
--query Policy \
--output text | python3 -m json.tool
A typical misconfiguration looks like this — a VPC endpoint restriction that blocks access from IAM users or roles running outside that endpoint:
{
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::infrarunbook-backups",
"arn:aws:s3:::infrarunbook-backups/*"
],
"Condition": {
"StringNotEquals": {
"aws:SourceVpce": "vpce-0a1b2c3d4e5f67890"
}
}
}
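The subtle part of that statement is the negated operator: StringNotEquals evaluates to true when the key is absent, so the deny fires for any request that doesn't come through the listed endpoint, including requests made from outside any VPC endpoint at all. A minimal model (`string_not_equals` and `deny_fires` are hypothetical helper names):

```python
def string_not_equals(request_context, key, expected):
    """Model IAM's StringNotEquals: true when the key is absent or differs."""
    value = request_context.get(key)
    return value is None or value != expected

def deny_fires(request_context):
    """Does the bucket policy's Deny statement apply to this request?"""
    return string_not_equals(request_context, "aws:SourceVpce",
                             "vpce-0a1b2c3d4e5f67890")

print(deny_fires({"aws:SourceVpce": "vpce-0a1b2c3d4e5f67890"}))  # False: request passes
print(deny_fires({"aws:SourceVpce": "vpce-other"}))              # True: denied
print(deny_fires({}))                                            # True: no endpoint at all
```

That last case is the one that bites automation roles running from a laptop or a CI runner.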
For KMS, check the key policy:
aws kms get-key-policy \
--key-id alias/infrarunbook-prod-key \
--policy-name default \
--output text | python3 -m json.tool
If your role ARN isn't in the Principal block of a statement that allows kms:Decrypt or kms:GenerateDataKey, that's your problem.
How to fix it: Add your principal to the resource policy. For S3:
aws s3api put-bucket-policy \
--bucket infrarunbook-backups \
--policy file://updated-bucket-policy.json
For KMS, update the key policy to include the role that needs access. Never add "Principal": "*" to a KMS key policy without a tight condition — that's a security hole. Be specific with role ARNs.
Root Cause 4: SCP Blocking Action
Service Control Policies are applied at the AWS Organizations level — to OUs or individual accounts — and they act as a ceiling on what any identity in that account can do, regardless of what IAM policies say. Even the root user in an account can be blocked by an SCP. This is powerful, and it means that troubleshooting sometimes requires escalating beyond the account itself.
Why it happens: Organizations use SCPs to enforce guardrails at scale: block specific regions, prevent disabling CloudTrail, restrict which services can be used in sandbox accounts. The problem is that SCPs are managed at the organization or OU level and developers or account admins often don't have visibility into them. You can have a perfectly configured IAM policy, and an SCP that was put in place six months ago will silently block you.
How to identify it: The error message from an SCP-blocked call usually looks slightly different. You'll often see a reference to the organization or a less specific message:
An error occurred (AccessDenied) when calling the CreateVpc operation:
User: arn:aws:iam::123456789012:user/infrarunbook-admin is not authorized
to perform: ec2:CreateVpc with an explicit deny in a service control policy
Modern AWS error messages say "explicit deny in a service control policy" — that's a direct indicator. For older APIs that don't surface this, run simulate-principal-policy and check the OrganizationsDecisionDetail field in each evaluation result: "AllowedByOrganizations": false means an SCP is blocking the action.
To view the SCPs attached to your account, you need Organizations-level access:
# Find your account's OU
aws organizations list-parents \
--child-id 123456789012
# List SCPs attached to that OU
aws organizations list-policies-for-target \
--target-id ou-xxxx-yyyyyyyy \
--filter SERVICE_CONTROL_POLICY
# Get the policy content
aws organizations describe-policy \
--policy-id p-xxxxxxxx \
--query Policy.Content \
--output text | python3 -m json.tool
Look for deny statements that match the action you're trying to perform.
How to fix it: You can't fix an SCP from within the account it's applied to — you need Organizations admin access. The fix is either to modify the SCP to create an exception, move the account to a different OU with less restrictive policies, or add a condition to the SCP that exempts specific roles. A common pattern is to exempt a breakglass or automation role:
{
"Effect": "Deny",
"Action": "ec2:CreateVpc",
"Resource": "*",
"Condition": {
"ArnNotLike": {
"aws:PrincipalArn": [
"arn:aws:iam::*:role/InfraNetworkAutomationRole"
]
}
}
}
If you don't have Organizations access yourself, you'll need to raise this with whoever manages your AWS organization. Bring the exact policy ID and the action being blocked — it makes the conversation much faster.
Root Cause 5: Condition Key Mismatch
Condition keys are how you make IAM policies dynamic and context-aware. They're also a common source of permission denials that are genuinely difficult to debug because the policy looks correct — the action and resource match — but the condition block silently fails to evaluate to true, resulting in an implicit or explicit deny.
Why it happens: A policy might require aws:RequestedRegion to be eu-west-1, but you're calling from eu-central-1. Or a policy uses the s3:prefix condition to restrict access to a folder, but the prefix you're requesting doesn't match the pattern. Multi-factor authentication conditions are another classic: a policy requires aws:MultiFactorAuthPresent to be true, but you're calling via an access key that wasn't obtained through an MFA-authenticated session.
How to identify it: The simulate-principal-policy command supports context entries, which lets you pass simulated condition key values and see how they affect the evaluation. First, run without context to see the baseline:
aws iam simulate-principal-policy \
--policy-source-arn arn:aws:iam::123456789012:user/infrarunbook-admin \
--action-names s3:GetObject \
--resource-arns arn:aws:s3:::infrarunbook-backups/configs/prod.tar.gz
Then run with context values that match what your request would actually send:
aws iam simulate-principal-policy \
--policy-source-arn arn:aws:iam::123456789012:user/infrarunbook-admin \
--action-names s3:GetObject \
--resource-arns arn:aws:s3:::infrarunbook-backups/configs/prod.tar.gz \
--context-entries \
ContextKeyName=aws:MultiFactorAuthPresent,ContextKeyValues=true,ContextKeyType=boolean \
ContextKeyName=aws:RequestedRegion,ContextKeyValues=eu-west-1,ContextKeyType=string
If the first call returns implicitDeny and the second returns allowed, the condition key is the cause. The policy is fine — the calling context isn't meeting the condition.
To check which conditions are in the policy directly:
aws iam get-policy-version \
--policy-arn arn:aws:iam::123456789012:policy/InfraS3ReadPolicy \
--version-id v3 \
--query PolicyVersion.Document \
--output text | python3 -m json.tool
Look for the Condition block in each statement. Common gotchas include using StringEquals where you need StringLike (for wildcard matching), or using the wrong condition key entirely — for example, using s3:prefix with an action that doesn't support it (it applies to s3:ListBucket, not to s3:GetObject).
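The StringEquals vs StringLike gotcha is worth seeing concretely: StringEquals treats * as a literal character, while StringLike expands it as a wildcard. A rough model in Python (the prefix value is a hypothetical sample):

```python
from fnmatch import fnmatchcase

def string_equals(value, pattern):
    """IAM StringEquals: exact match; '*' is just a literal character."""
    return value == pattern

def string_like(value, pattern):
    """IAM StringLike: '*' and '?' act as wildcards (modeled with fnmatch)."""
    return fnmatchcase(value, pattern)

prefix = "configs/prod/app.tar.gz"          # hypothetical s3:prefix from a request
print(string_equals(prefix, "configs/*"))   # False: the '*' is never expanded
print(string_like(prefix, "configs/*"))     # True: wildcard matches the rest
```

A policy author who writes "StringEquals": {"s3:prefix": "configs/*"} has created a condition that can essentially never be satisfied, which presents as an unexplained implicit deny.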
How to fix it: The fix depends on whether the condition is being evaluated incorrectly or whether the caller isn't meeting a legitimate requirement. If the condition is too strict, uses the wrong operator, or is attached to an action that doesn't support it, update the policy. Here the prefix restriction is placed on s3:ListBucket, the action that actually supports s3:prefix:
{
  "Effect": "Allow",
  "Action": "s3:ListBucket",
  "Resource": "arn:aws:s3:::infrarunbook-backups",
  "Condition": {
    "StringLike": {
      "s3:prefix": ["configs/*", "logs/*"]
    }
  }
}
If the condition is intentional — for example MFA is required — the caller needs to obtain credentials properly. For a CLI session that requires MFA, use sts get-session-token:
aws sts get-session-token \
--serial-number arn:aws:iam::123456789012:mfa/infrarunbook-admin \
--token-code 123456
# Export the returned credentials
export AWS_ACCESS_KEY_ID=ASIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_SESSION_TOKEN=...
Now your session includes MFA context and the condition will evaluate to true.
Root Cause 6: Permission Boundary Blocking
Permission boundaries are an IAM feature that lets administrators set a hard ceiling on what permissions a role or user can have, regardless of what policies are attached. They're commonly used when delegating IAM management to developers — you let them create roles, but only roles that can't exceed the boundary you've defined. In my experience, they're the least understood IAM concept and the source of some genuinely confusing permission errors.
Why it happens: A permission boundary limits the effective permissions to the intersection of the identity policy and the boundary. If the boundary doesn't include s3:PutObject, then no matter what the identity policy says, s3:PutObject is blocked. Developers who have the ability to create roles sometimes set the wrong boundary, or the boundary was created conservatively and nobody updated it when requirements expanded.
How to identify it: Check whether the role has a permission boundary attached:
aws iam get-role \
--role-name InfraDeployRole \
--query Role.PermissionsBoundary
{
"PermissionsBoundaryArn": "arn:aws:iam::123456789012:policy/DeveloperBoundary",
"PermissionsBoundaryType": "Policy"
}
If there's a boundary, fetch it and check whether it allows the action that's failing:
aws iam get-policy-version \
--policy-arn arn:aws:iam::123456789012:policy/DeveloperBoundary \
--version-id v1 \
--query PolicyVersion.Document \
--output text | python3 -m json.tool
The simulate-principal-policy tool does account for boundaries in its evaluation, so if the result is implicitDeny and the identity policy looks correct, the boundary is a strong candidate.
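The intersection behavior can be sketched directly: an action is effective only if both the identity policy and the boundary allow it. Illustrative only — the policy contents below are hypothetical, and real evaluation also involves resources, conditions, and explicit denies:

```python
from fnmatch import fnmatch

def allows(allowed_patterns, action):
    """True if any allow pattern covers the action."""
    return any(fnmatch(action, p) for p in allowed_patterns)

def effective(identity_allows, boundary_allows, action):
    """With a permission boundary, the effective grant is the intersection."""
    return allows(identity_allows, action) and allows(boundary_allows, action)

identity = ["s3:*", "ec2:Describe*"]                   # what the role's policies grant
boundary = ["s3:GetObject", "s3:ListBucket", "ec2:*"]  # hypothetical DeveloperBoundary

print(effective(identity, boundary, "s3:GetObject"))      # True: both allow it
print(effective(identity, boundary, "s3:PutObject"))      # False: boundary omits it
print(effective(identity, boundary, "ec2:RunInstances"))  # False: identity omits it
```

Note the s3:PutObject case: the identity policy grants s3:*, which is exactly the situation where engineers stare at a correct-looking policy while the boundary silently vetoes the call.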
How to fix it: Update the permission boundary to include the missing action. This typically requires an IAM admin, not the developer who created the role. Treat permission boundaries like a policy for policies — they need to be maintained as application permissions evolve.
Root Cause 7: Assuming the Wrong Role or Using Stale Credentials
This is less of an IAM misconfiguration and more of an operational mistake, but it causes enough AccessDenied errors in practice that it belongs here. The credentials in your environment — whether from ~/.aws/credentials, environment variables, or the EC2 instance metadata service — may be for a different principal than you think, or they may have expired mid-session.
Why it happens: Engineers working across multiple AWS accounts or with multiple profiles regularly hit this. A pipeline assumes a role during initialization but uses cached credentials that have since expired. An EC2 instance has an instance profile attached, but the wrong profile — and the instance metadata credentials are for a restricted role, not the deployment role. Or a developer has AWS_ACCESS_KEY_ID set in their shell environment, which overrides their intended profile.
How to identify it: Always verify the actual identity in use before digging into policies:
aws sts get-caller-identity
{
"UserId": "AROAXXXXXXXXXXXXXXXXX:session-name",
"Account": "123456789012",
"Arn": "arn:aws:sts::123456789012:assumed-role/WrongRole/session-name"
}
If the ARN doesn't match what you expected, that's your answer. Check your environment variables, your AWS config profile, and any credential helper scripts that might be setting credentials automatically.
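You can make this check fail loudly in scripts: parse the get-caller-identity output and compare the role name in the ARN against what the pipeline expects. A sketch using the sample response above (`role_name_from_arn` is a hypothetical helper; assumed-role ARNs have the form arn:aws:sts::ACCOUNT:assumed-role/RoleName/session-name):

```python
import json

def role_name_from_arn(arn):
    """Extract the role name from an assumed-role STS ARN, else None."""
    resource = arn.split(":", 5)[5]     # "assumed-role/RoleName/session-name"
    parts = resource.split("/")
    if parts[0] != "assumed-role":
        return None                     # an IAM user or other principal type
    return parts[1]

# Sample output, as if from: aws sts get-caller-identity
caller = json.loads('''{
  "UserId": "AROAXXXXXXXXXXXXXXXXX:session-name",
  "Account": "123456789012",
  "Arn": "arn:aws:sts::123456789012:assumed-role/WrongRole/session-name"
}''')

expected = "InfraDeployRole"
actual = role_name_from_arn(caller["Arn"])
print(actual == expected)  # False: running as WrongRole, stop before reading policies
```

Dropping a guard like this at the top of a deployment script turns a confusing mid-run AccessDenied into an immediate, self-explanatory failure.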
How to fix it: Clear stale credentials from the environment, switch to the correct profile, or re-run your role assumption flow. For EC2 and ECS, confirm the correct instance profile or task role is attached via the console or CLI:
aws ec2 describe-instances \
--instance-ids i-0abc1234def567890 \
--query 'Reservations[0].Instances[0].IamInstanceProfile'
Prevention
Most IAM permission errors are preventable with a few habits that don't add much overhead but save a lot of debugging time.
Use aws iam simulate-principal-policy as a standard part of your deployment testing. Before you push a new role or policy to production, simulate the exact actions it needs to perform and confirm they return allowed. This catches issues before they cause outages rather than after.
Enable AWS CloudTrail so that when a permission error happens in production you can look up the denied call after the fact. Each event records the exact principal, action, resource, and error code — all the information the policy simulator needs. Without CloudTrail you're guessing; with it you're diagnosing. Note that errorCode isn't a lookup attribute, so you filter on it client-side; each event's CloudTrailEvent field is a JSON string, which jq can unpack with fromjson:
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventName,AttributeValue=GetObject \
--query 'Events[].CloudTrailEvent' \
--output json \
| jq -r '.[] | fromjson | select(.errorCode == "AccessDenied") | [.eventTime, .userIdentity.arn] | @tsv'
Apply least privilege from the start, but use AWS Access Analyzer to identify what permissions are actually being used versus what's been granted. Access Analyzer's unused access findings will show you policies that are overly permissive, which makes future auditing and debugging much faster because there's less noise.
Document your SCP structure and make it accessible to account owners. One of the most time-consuming permission errors I've seen in large organizations is developers spending hours debugging IAM policies when the actual block is an SCP they didn't know existed. A simple wiki page listing which SCPs apply to which OUs prevents a lot of that wasted time.
Finally, use IAM tags and naming conventions consistently. When a role name is InfraDeployRole-prod versus InfraDeployRole-staging, it's immediately obvious when you're operating with the wrong one. aws sts get-caller-identity becomes a reflex check at the start of any troubleshooting session — make it a habit.
