EMR Cluster Creation Fails Intermittently Due to Missing depends_on Dependency for Security Group Rules #34

Open
kedar9696 opened this issue Jan 12, 2025 · 1 comment

Description

EMR cluster creation intermittently fails because the cluster does not explicitly depend on the module's security group rules, so it can be created before those rules exist. When that happens, terraform apply fails with the following error:

Error: waiting for EMR Cluster (j-IGN70IGMZ1W8) create: unexpected state 'TERMINATED_WITH_ERRORS', wanted target 'RUNNING, WAITING'. last error: VALIDATION_ERROR: ServiceAccessSecurityGroup is missing ingress rule from EmrManagedMasterSecurityGroup on port 9443

  with module.emr[0].aws_emr_cluster.this[0],
  on .terraform/modules/emr/main.tf line 26, in resource "aws_emr_cluster" "this":
  26: resource "aws_emr_cluster" "this" {

Explicitly adding the security group rules to the cluster's depends_on meta-argument resolves the issue, because the EMR cluster then waits until the rules have been created. The module's current configuration does not declare this dependency.

  • ✋ I have searched the open/closed issues, and my issue is not listed.

⚠️ Note

Before submitting this issue, I performed the following:

  1. Removed the local .terraform directory: rm -rf .terraform/
  2. Re-initialized the project root to pull down modules: terraform init
  3. Re-attempted terraform apply and confirmed the issue persists without the suggested change.

Versions

  • Module version [Required]: 2.3.0
  • Terraform version: 1.10.0
  • Provider version(s): 5.83.0

Reproduction Code [Required]

The issue occurs under the following conditions:

  1. Use the EMR module with a setup similar to the example below.
  2. Attempt to create an EMR cluster.

Reproduction Configuration:

module "emr" {
  source  = "terraform-aws-modules/emr/aws"
  version = "2.3.0"

  name          = "example-emr-cluster"
  release_label = "emr-6.10.0"
  applications  = ["Spark", "Hadoop"]
}

Expected behavior

The EMR cluster should successfully reach the RUNNING or WAITING state without errors.

Actual behavior

Cluster creation intermittently fails with the error: VALIDATION_ERROR: ServiceAccessSecurityGroup is missing ingress rule from EmrManagedMasterSecurityGroup on port 9443.

Terminal Output Screenshot(s)

N/A

Additional context

The issue is resolved by adding a dependency to the aws_emr_cluster resource block in the module's main.tf file. The following change fixes the issue:

  depends_on = [
    aws_iam_role_policy_attachment.service,
    aws_iam_role_policy_attachment.service_pass_role,
    aws_iam_role_policy_attachment.instance_profile,
    aws_iam_role_policy_attachment.autoscaling,
    aws_security_group_rule.service
  ]

This ensures that EMR cluster creation waits until the aws_security_group_rule.service resources have been created, which avoids the race condition.
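
To illustrate why the explicit dependency matters, here is a minimal, simplified sketch. It is not the module's actual code: the resource names, variables, and instance types are hypothetical. It assumes the cluster references only the security group IDs, so Terraform orders the cluster after the groups but not after the rule attached to them.

```hcl
# Hypothetical inputs; not part of the module's interface.
variable "vpc_id" { type = string }
variable "subnet_id" { type = string }
variable "service_role_arn" { type = string }
variable "instance_profile_arn" { type = string }

resource "aws_security_group" "master" {
  name_prefix = "emr-master-"
  vpc_id      = var.vpc_id
}

resource "aws_security_group" "core" {
  name_prefix = "emr-core-"
  vpc_id      = var.vpc_id
}

resource "aws_security_group" "service" {
  name_prefix = "emr-service-"
  vpc_id      = var.vpc_id
}

# The ingress rule EMR validates: port 9443 from the master group
# to the service access group.
resource "aws_security_group_rule" "service_from_master" {
  type                     = "ingress"
  from_port                = 9443
  to_port                  = 9443
  protocol                 = "tcp"
  security_group_id        = aws_security_group.service.id
  source_security_group_id = aws_security_group.master.id
}

resource "aws_emr_cluster" "example" {
  name          = "example-emr-cluster"
  release_label = "emr-6.10.0"
  service_role  = var.service_role_arn

  ec2_attributes {
    subnet_id        = var.subnet_id
    instance_profile = var.instance_profile_arn

    # These references point at the groups only, so they create no
    # implicit dependency on aws_security_group_rule.service_from_master.
    emr_managed_master_security_group = aws_security_group.master.id
    emr_managed_slave_security_group  = aws_security_group.core.id
    service_access_security_group     = aws_security_group.service.id
  }

  master_instance_group {
    instance_type = "m5.xlarge"
  }

  core_instance_group {
    instance_type = "m5.xlarge"
  }

  # Without this, Terraform may create the cluster and the rule in
  # parallel, and EMR validation can run before the rule exists.
  depends_on = [aws_security_group_rule.service_from_master]
}
```

Without the depends_on line in this sketch, the outcome depends on which of the two resources is created first, which would be consistent with the intermittent failures described above.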

Note: Please incorporate this fix into the module and publish a new version to the registry so that users no longer encounter this issue.

@kedar9696
Author

@bryantbiggs, could you please take a look at this issue?
