Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add alarm for database migration failure #1728

Merged
merged 5 commits into from
Jan 21, 2025
Merged

Conversation

sastels
Copy link
Contributor

@sastels sastels commented Jan 20, 2025

Summary | Résumé

Add a critical alarm for a database migration failure. The required logging is added in this PR.

Related Issues | Cartes liées

Before merging this PR

Read code suggestions left by the
cds-ai-codereviewer bot. Address
valid suggestions and shortly write down reasons to not address others. To help
with the classification of the comments, please use these reactions on each of the
comments made by the AI review:

Classification Reaction Emoticon
Useful +1 👍
Noisy eyes 👀
Hallucination confused 😕
Wrong but teachable rocket 🚀
Wrong and incorrect -1 👎

The classifications will be extracted and summarized into an analysis of how helpful
or not the AI code review really is.

Test instructions | Instructions pour tester la modification

None

Release Instructions | Instructions pour le déploiement

None.

Reviewer checklist | Liste de vérification du réviseur

  • This PR does not break existing functionality.
  • This PR does not violate GCNotify's privacy policies.
  • This PR does not raise new security concerns. Refer to our GC Notify Risk Register document on our Google drive.
  • This PR does not significantly alter performance.
  • Additional required documentation resulting of these changes is covered (such as the README, setup instructions, a related ADR or the technical documentation).

⚠ If boxes cannot be checked off before merging the PR, they should be moved to the "Release Instructions" section with appropriate steps required to verify before release. For example, changes to celery code may require tests on staging to verify that performance has not been affected.

@sastels sastels requested a review from jimleroyer as a code owner January 20, 2025 20:15
Copy link

Updating alarms ⏰? Great! Please update the Google Sheet and add a 👍 to this message after 🙏

alarm_name = "db-migration-failure-warning"
alarm_description = "The database migration in the api k8s pods has failed"
comparison_operator = "GreaterThanOrEqualToThreshold"
evaluation_periods = "1"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The evaluation_periods should be an integer, not a string. Change "1" to 1.

@sastels sastels changed the title add alarm for database migration failure Add alarm for database migration failure Jan 20, 2025
@sastels sastels requested a review from a team January 20, 2025 21:28
Copy link

Updating alarms ⏰? Great! Please update the Google Sheet and add a 👍 to this message after 🙏

Copy link
Contributor

@ben851 ben851 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

staging: eks

✅   Terraform Init: success
✅   Terraform Validate: success
✅   Terraform Format: success
✅   Terraform Plan: success
✅   Conftest: success

Plan: 2 to add, 0 to change, 0 to destroy
Show summary
CHANGE NAME
add aws_cloudwatch_log_metric_filter.db-migration-failure[0]
aws_cloudwatch_metric_alarm.db-migration-failure-critical[0]
Show plan
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_cloudwatch_log_metric_filter.db-migration-failure[0] will be created
  + resource "aws_cloudwatch_log_metric_filter" "db-migration-failure" {
      + id             = (known after apply)
      + log_group_name = "/aws/containerinsights/notification-canada-ca-staging-eks-cluster/application"
      + name           = "db-migration-failure"
      + pattern        = "database migration failed"

      + metric_transformation {
          + name      = "db-migration-failure"
          + namespace = "LogMetrics"
          + unit      = "None"
          + value     = "1"
        }
    }

  # aws_cloudwatch_metric_alarm.db-migration-failure-critical[0] will be created
  + resource "aws_cloudwatch_metric_alarm" "db-migration-failure-critical" {
      + actions_enabled                       = true
      + alarm_actions                         = [
          + "arn:aws:sns:ca-central-1:239043911459:alert-critical",
        ]
      + alarm_description                     = "The database migration running in the api k8s pods has failed"
      + alarm_name                            = "db-migration-failure-critical"
      + arn                                   = (known after apply)
      + comparison_operator                   = "GreaterThanOrEqualToThreshold"
      + evaluate_low_sample_count_percentiles = (known after apply)
      + evaluation_periods                    = 1
      + id                                    = (known after apply)
      + metric_name                           = "db-migration-failure"
      + namespace                             = "LogMetrics"
      + period                                = 60
      + statistic                             = "Sum"
      + tags_all                              = (known after apply)
      + threshold                             = 1
      + treat_missing_data                    = "notBreaching"
    }

Plan: 2 to add, 0 to change, 0 to destroy.

─────────────────────────────────────────────────────────────────────────────

Saved the plan to: plan.tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "plan.tfplan"
Show Conftest results
WARN - plan.json - main - Cloudwatch log metric pattern is invalid: ["aws_cloudwatch_log_metric_filter.celery-error[0]"]
WARN - plan.json - main - Cloudwatch log metric pattern is invalid: ["aws_cloudwatch_log_metric_filter.scanfiles-timeout[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_acm_certificate.client_vpn"]
WARN - plan.json - main - Missing Common Tags: ["aws_acm_certificate.notification-canada-ca"]
WARN - plan.json - main - Missing Common Tags: ["aws_acm_certificate.notification-canada-ca-alt[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb.notification-canada-ca"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_listener.internal_alb_tls"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_listener.notification-canada-ca"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.internal_nginx_http"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-admin"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-api"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-document"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-document-api"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-documentation"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.blazer[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.notification-canada-ca-eks-application-logs[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.notification-canada-ca-eks-cluster-logs[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.notification-canada-ca-eks-prometheus-logs[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.admin-evicted-pods[0]"]
WARN - plan.json - main - Missing Common Tags:...

@sastels sastels merged commit c14535a into main Jan 21, 2025
29 checks passed
@sastels sastels deleted the db-migration-failure-alarm branch January 21, 2025 19:37
@ben851 ben851 mentioned this pull request Jan 22, 2025
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants