-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
19530 move copy local path based pvs outside of hwn01 or make basemap stateless #1409
base: main
Are you sure you want to change the base?
19530 move copy local path based pvs outside of hwn01 or make basemap stateless #1409
Conversation
Also, new instances config is added. For details about new format see https://kontur.fibery.io/favorites/Tasks/document/1258#Tasks/document/Managing-PGO-instances-with-kustomize-1388 Purpose: test new instance management schema with event-api database Desired effect: new instance launched on hwn04, replicating from master
…01.02) Prerequisites: * hwn04 must catch up to current master * instance ID and trigger-switchover must be actualized in config See more: https://kontur.fibery.io/Tasks/document/Managing-PGO-instances-with-kustomize-1388/anchor=Switchover--4153b0a2-c4ff-441a-8913-158a3caac361
…tances config to final format (#19530 step 01.04) Prerequisites: wait until hwn03 will catch up on master (so we'll have healthy replicated setup after this patch is applied) Desired effect: instance set '01' is eliminated, cluster runs with two named instances created on previous steps
Also, new instances config is added. For details about new format see https://kontur.fibery.io/favorites/Tasks/document/1258#Tasks/document/Managing-PGO-instances-with-kustomize-1388
….02) * For the sake of dev/test/parity, and also least-disrputive test of new config
…rom hwn01 (#19530 step 03.03) Desired effect: * Deployment rollout is triggered due to spec.template change * New POD is provisioned outside of hwn01
…rom hwn01 (#19530 step 03.04) Desired effect: * Deployment rollout is triggered due to spec.template change * New POD is provisioned outside of hwn01
Modifying Helm chart in backwards-compatible manner, without immediate effect (therefore version not updated)
…01 (#19530, step 04.02) Desired effect: * Deployment rollout triggered by changes in spec.template * New pod launched outside of hwn01
… hwn01 (#19530, step 04.03) Desired effect: * Deployment rollout triggered by changes in spec.template * New pod launched outside of hwn01
* This step serves as least-disruptive test of basemap-generator node affinity, and is also required for dev/test/prod parity
….02) * One more least-disruptive testing change of basemap-generator config update, also needed for dev/test/prod parity
… 07.01) Prerequisites: * hwn04 must catch up to current master * instance ID and trigger-switchover must be actualized in config Documentation: https://kontur.fibery.io/Tasks/document/Managing-PGO-instances-with-kustomize-1388/anchor=Switchover--4153b0a2-c4ff-441a-8913-158a3caac361 Desired effect: hwn04 becomes new master
… 07.02) Prerequisites: ensure previously suspended heavy cronjob (basemap-generator-dev) is not running on target instance hwn03 (kill it if still around) Purpose: migrating to multi-instances configuration Desired effect: new slave starts on hwn03, replicating from hwn04 Tradeoff: legacy instance set '01' is still running on hwn01/hwn02 Watch out: replication lag and WALs accumulation on master (hwn04)
…stances config to final format (#19530 step 07.03) Purpose: migrating to multi-instances configuration Prerequisites: wait until hwn03 will catch up on master (so we'll have healthy replicated setup after this patch is applied) Desired effect: instance set '01' is eliminated, cluster runs with two named instances on hwn04 (master) and hwn03 (replica)
…>hwn02, unsuspend (#19530 step 08.01) * Context: hwn01 is being prepared for draining, repacking workloads on 3 remaining nodes * Temporary solution: all basemap-generators are isolated on hwn02 to avoid interference with other workloads * Prerequisites: event-api-db-prod is eliminated from hwn02 (see previous steps)
…n01->hwn02 (#19530 step 08.02) * Context: hwn01 is being prepared for draining, repacking workloads on 3 remaining nodes * Temporary solution: all basemap-generators are isolated on hwn02 to avoid interference with other workloads * Prerequisites: event-api-db-prod is eliminated from hwn02 (see previous steps) * Desired results: * cronjob is scheduled to run on hwn02, no more bound to pvc * hwn01 could be safely drained!
WalkthroughThe changes involve updates to various Kubernetes configurations, primarily focusing on the introduction of node affinity specifications for Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
🧹 Outside diff range and nitpick comments (25)
helm/raster-tiler/Chart.yaml (1)
10-10
: Version increment looks good.The chart version has been appropriately incremented from 0.0.77 to 0.0.79, which is a good practice to reflect changes in the chart or its templates.
Note: The version jumped from 0.0.77 to 0.0.79, skipping 0.0.78. While this is not an issue, it might be worth clarifying if this was intentional or if there was an intermediate version that wasn't committed.
helm/basemap/values/values-prod.yaml (1)
Line range hint
1-26
: Clarify the overall strategy for stateless architectureThe changes to both the tileserver and basemap components to use
emptyDir
storage in the production environment represent a significant shift towards a stateless architecture. While this aligns with the PR objective, it's important to address the following points:
Data Persistence: Ensure that all critical data is either stored in S3 or can be regenerated as needed. Any data that was previously persisted locally will now be lost on pod restarts.
Performance Implications: Consider if there are any performance impacts from potentially increased network calls to S3, especially for frequently accessed data.
Disaster Recovery: Verify that the disaster recovery strategy has been updated to account for the lack of local persistent storage.
Scaling and High Availability: This change could potentially improve the ability to scale and provide high availability, as pods become more interchangeable.
Could you provide more context on the overall strategy behind this move to a stateless architecture? Specifically:
- Are there plans to implement caching mechanisms to mitigate potential performance impacts?
- How will this change affect the application's startup time and behavior during S3 outages?
- Are there any components that still require persistent storage, and if so, how will they be handled?
To get a broader view of the storage usage across the cluster, you can run:
#!/bin/bash # List all PersistentVolumeClaims in the production namespace kubectl get pvc -n $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | grep prod) # List all pods with their volume information kubectl get pods -n $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | grep prod) -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{range .spec.volumes[*]}{.name}{": "}{.emptyDir.medium}{"\n"}{end}{"\n"}{end}'This information will help ensure that the move to a stateless architecture is comprehensive and well-planned across all components.
helm/basemap/values/values-test.yaml (1)
Line range hint
1-26
: Review overall system architecture and data flowThe changes to both tileserver and basemap components to use emptyDir storage represent a significant shift in the system's architecture. While this aligns with the PR objective of making the basemap stateless, it necessitates a comprehensive review of the entire system's data flow and storage strategy.
Consider the following architectural aspects:
- Data Flow: Review and document how data now flows through the system, from initial ingestion to final storage in S3.
- Caching Strategy: Evaluate if any caching mechanisms are needed to maintain performance, given the lack of persistent local storage.
- Failure Recovery: Ensure the system can recover gracefully if a pod restarts, potentially needing to re-download or regenerate data.
- S3 Integration: Verify that the S3 integration is robust enough to handle potentially increased read/write operations.
- Database Usage: Clarify the role of the database in this new architecture, ensuring it's being used effectively alongside S3 and emptyDir storage.
- Performance Implications: Conduct thorough performance testing to ensure the new architecture meets required performance standards.
I recommend creating or updating system architecture diagrams to reflect these changes, which will help the team understand and maintain the new stateless design.
helm/raster-tiler/values.yaml (1)
20-20
: LGTM! Consider adding a comment for clarity.The addition of
storageType: pvc
is appropriate and aligns with the PR objective. It's correctly placed within the storage configuration section and properly indented.Consider adding a brief comment above this line to explain the purpose and possible values of
storageType
. For example:# Storage type for the application (e.g., 'pvc' for PersistentVolumeClaim) storageType: pvcThis would enhance the configuration's self-documentation and make it easier for other developers to understand and modify if needed.
helm/raster-tiler/templates/pvc.yaml (2)
Line range hint
4-15
: LGTM: Comprehensive metadata with room for minor improvement.The metadata section is well-structured and follows Kubernetes best practices:
- Annotations for Helm release information are correctly set.
- Labels provide comprehensive information for resource management.
- Dynamic configuration using Helm variables is implemented effectively.
Consider adding a
helm.sh/chart
label to identify the specific chart version:labels: helm.sh/chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}This can be helpful for tracking which version of the chart created this resource.
🧰 Tools
🪛 yamllint
[error] 2-2: syntax error: expected '', but found ''
(syntax)
Line range hint
1-25
: Summary: Well-implemented PVC template aligning with PR objectives.This new PVC template effectively addresses the PR objective of improving storage management:
- The conditional creation based on
.Values.storageType
allows for flexible deployment configurations.- The use of Helm variables throughout the template enables easy customization across different environments.
- The structure follows Kubernetes best practices for PVC definitions.
These changes contribute to making the basemap more configurable and potentially stateless, depending on how the PVC is utilized in the broader context of the application.
To fully achieve the PR objectives:
- Ensure that other components in the helm chart are updated to use this PVC when appropriate.
- Update the documentation to reflect the new storage configuration options.
- Consider adding a section in the
values.yaml
file to clearly define the available storage options and their implications.As you continue to work on making the basemap stateless, consider:
- Evaluating if any data currently stored in the PVC can be moved to a separate stateful service.
- Implementing caching mechanisms to reduce reliance on persistent storage.
- Designing the application to function with ephemeral storage where possible, using the PVC only for critical, non-reproducible data.
🧰 Tools
🪛 yamllint
[error] 2-2: syntax error: expected '', but found ''
(syntax)
helm/basemap/Chart.yaml (1)
20-21
: Reminder: Importance of image tag usage for Flux automation.The comment about not using
appVersion
and instead using{{ .Values.images...tag's }}
is crucial for Flux automation. This approach allows different stages to have different versions within the same branch watched by Flux. Ensure that this practice is consistently followed across all relevant files and deployments.flux/clusters/k8s-01/raster-tiler/dev/kustomization.yaml (1)
Line range hint
1-34
: Summary: Node affinity patch added to dev environment configurationThe changes to this Kustomization file are focused and align with the PR objectives. The addition of the node affinity patch for the 'dev-raster-tiler' Deployment ensures that pods are not scheduled on the 'hwn01.k8s-01.kontur.io' node, effectively moving resources outside of hwn01 as intended.
The existing configuration, including the HelmRelease patch, remains unchanged, which helps maintain stability in other aspects of the deployment. This targeted approach to implementing the required changes is commendable.
Consider documenting this node affinity rule in your project's architecture or operations guide. This will help maintain consistency across environments and inform future scaling decisions.
flux/clusters/k8s-01/raster-tiler/geocint/kustomization.yaml (1)
Line range hint
1-34
: Overall, the changes look good and align with the PR objectives.The modifications to the HelmRelease and the addition of the nodeAffinity configuration for the Deployment are well-structured and follow Kubernetes best practices. These changes appear to support the goal of moving copy local path based PVs outside of hwn01 or making the basemap stateless.
However, to ensure these changes don't introduce any unforeseen issues:
- Verify the existence and content of the new Helm values files.
- Check the impact of the new nodeAffinity rules on service availability and performance.
- Review the overall node utilization after these changes are applied.
Consider documenting the rationale behind these changes in the project documentation or README. This will help future maintainers understand the design decisions made here.
helm/basemap/templates/pvc.yaml (4)
Line range hint
1-15
: LGTM! Consider adding storage class specification.The tileserver PVC definition looks good. The conditional creation based on storage type, environment-specific naming, and consistent resource allocation are all well implemented.
Consider adding a
storageClassName
field to the PVC specification. This allows for more control over the type of storage provisioned. For example:spec: storageClassName: {{ .Values.tileserver.storageClass | default "standard" }} accessModes: - ReadWriteOnce # ... rest of the specThis change would require adding a
storageClass
field to thetileserver
section in your values file, with a default value to maintain backwards compatibility.🧰 Tools
🪛 yamllint
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
Line range hint
17-31
: LGTM! Consider adding storage class specification.The basemap processor PVC definition is well-structured and consistent with the tileserver PVC. The conditional creation and environment-specific naming are appropriately implemented.
As with the tileserver PVC, consider adding a
storageClassName
field:spec: storageClassName: {{ .Values.basemap.storageClass | default "standard" }} accessModes: - ReadWriteOnce # ... rest of the specThis change would require adding a
storageClass
field to thebasemap
section in your values file, with a default value to maintain backwards compatibility.
Line range hint
32-45
: LGTM! Consider separate conditional and storage class specification.The Postgres PVC definition is well-structured and consistent with the other PVCs. However, there are two suggestions for improvement:
Consider creating a separate conditional block for the Postgres PVC. This allows for more flexibility in configuring storage types independently for each component.
As with the other PVCs, consider adding a
storageClassName
field:Here's a suggested implementation incorporating both improvements:
{{- if eq .Values.postgres.storageType "pvc" }} apiVersion: v1 kind: PersistentVolumeClaim metadata: name: {{ .Values.envName }}-basemap-postgres-pvc namespace: {{ .Release.Namespace }} spec: storageClassName: {{ .Values.postgres.storageClass | default "standard" }} accessModes: - ReadWriteOnce resources: requests: storage: {{ .Values.pvc.postgresStorageSize }} limits: storage: {{ .Values.pvc.postgresStorageSize }} {{- end }}This change would require adding
storageType
andstorageClass
fields to thepostgres
section in your values file, with default values to maintain backwards compatibility.
Line range hint
1-46
: Overall structure is good, but consider consistent conditionals and ignore YAML lint error.The overall structure of the file is consistent and well-organized. However, there are two points to consider:
For consistency and flexibility, consider using separate conditional blocks for each PVC. This allows for independent configuration of storage types for each component (tileserver, processor, and Postgres).
The YAML syntax error reported by yamllint at the beginning of the file is likely a false positive due to the Helm template syntax. You can safely ignore this error, as it's common for YAML linters to misinterpret Helm template directives.
To improve consistency, consider restructuring the file like this:
{{- if eq .Values.tileserver.storageType "pvc" }} # Tileserver PVC definition {{- end }} {{- if eq .Values.basemap.storageType "pvc" }} # Basemap processor PVC definition {{- end }} {{- if eq .Values.postgres.storageType "pvc" }} # Postgres PVC definition {{- end }}This structure allows for more granular control over each PVC's creation conditions.
flux/clusters/k8s-01/basemap/prod/kustomization.yaml (1)
Line range hint
1-51
: Summary: Kustomization changes align with PR objectives, but consider overall cluster strategy.The modifications to this Kustomization file successfully implement node affinity rules that align with the PR objective of moving copy local path based PVs outside of hwn01 or making basemap stateless. The changes include:
- Retaining the existing HelmRelease configuration.
- Adding a node anti-affinity rule for the 'prod-basemap-tileserver' Deployment to avoid 'hwn01'.
- Adding a node affinity rule for the 'prod-basemap' CronJob to require 'hwn02'.
These changes will affect how these resources are scheduled in the cluster. While they achieve the immediate goal, it's important to consider the following:
- The overall load distribution across the cluster nodes.
- The fault tolerance of the system, especially for the CronJob that's tied to a specific node.
- The long-term strategy for resource allocation and whether this approach scales well as the cluster grows or changes.
Consider documenting the reasons for these specific node affinity rules and how they fit into the broader cluster management strategy. This will help maintain clarity as the system evolves and new team members onboard.
Additionally, it might be beneficial to set up monitoring or alerts to ensure that these scheduling constraints don't lead to resource contention issues in the future.
flux/clusters/k8s-01/basemap/test/kustomization.yaml (1)
19-50
: Address the temporary nature of the solution and consider long-term architectural improvements.The changes implemented in this PR effectively move workloads away from hwn01, which aligns with one part of the PR objective. However, there are several architectural considerations to address:
- Both patches are marked as temporary solutions, indicating a need for a more permanent fix.
- The alternative objective of making basemap stateless is not addressed by these changes.
- The current approach of using node-specific affinity rules may not be the most scalable or maintainable solution in the long run.
Consider the following recommendations for improving the overall architecture:
- Develop a plan to transition from these temporary node affinity rules to a more robust, scalable solution.
- Investigate the possibility of making basemap stateless, as mentioned in the PR objective. This could potentially eliminate the need for node-specific scheduling altogether.
- If persistent volumes are the root cause of node-specific requirements, consider implementing a storage solution that allows for more flexible PV management across nodes.
- Document the current limitations and the reasoning behind these temporary changes to facilitate future improvements.
Would you like assistance in drafting a plan for transitioning to a more permanent solution or exploring the option of making basemap stateless?
helm/basemap/values.yaml (1)
26-30
: Summary: Storage improvements and job control enhancementsThe changes in this file collectively improve the storage configuration for both the tileserver and basemap components, aligning with the PR objective of enhancing persistent volume handling. The explicit setting of job suspension for the basemap adds clarity to its operational behavior.
Key points to consider:
- Ensure that the PVC configurations in the templates match these new settings.
- Verify that the storage sizes defined in the
pvc
section are appropriate for both components.- Consider documenting these changes in the chart's README or CHANGELOG.
- Test the deployment in a non-production environment to ensure smooth transitions, especially regarding data persistence and job scheduling.
Given these changes, it might be beneficial to review the overall storage strategy for the application. Consider:
- Implementing a consistent approach to storage type configuration across all components.
- Evaluating the impact on backup and restore procedures, if any.
- Assessing whether these changes affect the application's scalability or cloud provider portability.
flux/clusters/k8s-01/event-api-db/overlays/DEV/postgrescluster-instances.yaml (3)
2-2
: Remove trailing spaces to improve code cleanliness.There are trailing spaces on lines 2, 12, 13, and 19. While these don't affect functionality, removing them improves code cleanliness and adheres to best practices.
You can remove these trailing spaces to improve code cleanliness. Most modern text editors have settings to automatically trim trailing whitespace.
Also applies to: 12-13, 19-19
🧰 Tools
🪛 yamllint
[error] 2-2: trailing spaces
(trailing-spaces)
53-71
: Approve switchover configuration with suggestions for improvement.The addition of the switchover section is a good practice for enabling high availability in your PostgreSQL cluster. A few suggestions for improvement:
Add a clear warning about the importance of updating the placeholder values (OVERRIDEME) before applying this configuration in any environment.
Consider documenting the complete switchover process in a separate document and add a link to it in these comments. This would include steps like:
- How to determine the fully-qualified instance id of the new master
- How to check if the new master is in sync with the current master
- The process of updating the annotation to trigger the switchover
Add a comment explaining the significance of the
postgres-operator.crunchydata.com/trigger-switchover
annotation and how it should be used in practice.Here's a suggested addition to the comments:
# WARNING: Ensure all OVERRIDEME placeholders are replaced with appropriate values before applying this configuration. # For a detailed guide on performing a switchover, see: [Add link to your switchover documentation here]Would you like assistance in creating a detailed switchover guide document?
1-71
: Summary of review: Improvements with areas needing attentionThis configuration file introduces significant changes to the PostgreSQL cluster setup, including detailed instance configurations and switchover functionality. While these changes generally improve the setup, there are several areas that require attention:
Resource Allocation: The current CPU and memory limits seem exceptionally high, especially for a DEV environment. Consider reviewing and adjusting these based on actual usage patterns.
Storage Configuration: There's a large discrepancy between the allocated storage (100Gi) and the commented empirical recommendation (2500Gi). This needs to be addressed to ensure appropriate storage provisioning.
Node Affinity: The strict node affinity might limit flexibility. Consider using less strict affinity rules for better resilience and resource utilization.
Switchover Functionality: The addition of switchover capability is positive, but ensure all placeholder values are replaced and the process is well-documented.
Code Cleanliness: Minor issues like trailing spaces should be addressed for better maintainability.
Overall, these changes provide a good foundation for a robust PostgreSQL setup, but careful consideration of the highlighted areas will further improve the configuration's efficiency and reliability.
To ensure this configuration scales well across different environments (DEV, TEST, PROD), consider:
- Using environment variables or helm chart values for resource allocations and storage sizes.
- Implementing a strategy for gradually increasing resources and storage as you move from DEV to PROD.
- Developing a comprehensive guide for managing and maintaining this PostgreSQL setup across different environments.
🧰 Tools
🪛 yamllint
[error] 2-2: trailing spaces
(trailing-spaces)
[error] 12-12: trailing spaces
(trailing-spaces)
[error] 13-13: trailing spaces
(trailing-spaces)
[error] 19-19: trailing spaces
(trailing-spaces)
flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-instances.yaml (2)
8-19
: Review resource allocations and consider removing outdated comments.The resource allocations, particularly for memory (512Gi limit), seem quite high. While this may be necessary for your use case, it's important to monitor actual usage to ensure efficient resource utilization.
Consider removing or updating the commented out "empirical recommendations" if they are no longer relevant. This will improve code readability. If they are still relevant, consider adding a comment explaining their purpose and when they might be used.
The use of YAML anchors for resources and storage specifications is a good practice for maintaining consistency and reducing duplication.
🧰 Tools
🪛 yamllint
[error] 12-12: trailing spaces
(trailing-spaces)
[error] 13-13: trailing spaces
(trailing-spaces)
[error] 19-19: trailing spaces
(trailing-spaces)
2-2
: Minor: Remove trailing spaces.The static analysis tool detected trailing spaces on lines 2, 12, 13, and 19. While this doesn't affect functionality, removing them improves code cleanliness.
You can remove these trailing spaces to improve the overall code quality. Many text editors have options to automatically trim trailing whitespace, which can help prevent this issue in the future.
Also applies to: 12-13, 19-19
🧰 Tools
🪛 yamllint
[error] 2-2: trailing spaces
(trailing-spaces)
helm/basemap/templates/deployment.yaml (1)
46-53
: Flexible volume configuration looks good!The changes to the volume definition improve flexibility by allowing different storage types. The use of
storageType
in the name is a good practice for clarity and configuration management.A minor suggestion for improvement:
Consider adding a comment explaining the purpose of the
storageType
variable and its possible values for better maintainability. For example:volumes: # storageType can be either "pvc" for persistent volume claim or any other value for emptyDir - name: {{ .Values.envName }}-basemap-tileserver-{{ .Values.tileserver.storageType }} {{- if eq .Values.tileserver.storageType "pvc" }} ...helm/raster-tiler/templates/deployment.yaml (1)
128-134
: Flexible storage configuration looks good!The introduction of conditional storage type configuration is a good improvement. It allows for more flexibility in different environments and use cases. The use of a PersistentVolumeClaim when
storageType
is "pvc" provides a persistent storage option, while the emptyDir with a size limit offers a controlled temporary storage solution.Consider adding a comment explaining the purpose and implications of each storage type option. This can help other developers understand the reasoning behind this configuration. For example:
{{- if eq .Values.storageType "pvc" }} # Use PersistentVolumeClaim for persistent storage across pod restarts persistentVolumeClaim: claimName: {{ .Values.envName }}-raster-tiler {{- else }} # Use emptyDir for temporary storage, limited to specified size emptyDir: sizeLimit: {{ .Values.storage_size }} {{- end }}helm/basemap/templates/cronjobs.yaml (2)
Line range hint
24-51
: LGTM: Init container for PVC usage check addedThe addition of the init container for checking PVC usage is a good practice. It ensures that the PVC has enough space before starting the main containers. A few observations:
- The condition for adding the init container is correct (only when storageType is "pvc").
- Environment variables are properly set, including the use of fieldRef for the current pod name.
- Resource limits and requests are correctly defined using values from the Helm chart.
Suggestion for improvement:
Consider adding afailureThreshold
andsuccessThreshold
to the init container to control how many times it should retry before considering the job as failed. This can help in cases of temporary issues with PVC checks.Here's an example of how you could add these fields:
initContainers: - name: check-pvc-usage # ... (existing configuration) ... failureThreshold: 3 successThreshold: 1🧰 Tools
🪛 yamllint
[warning] 20-20: wrong indentation: expected 4 but found 6
(indentation)
133-148
: LGTM: Flexible volume definitions addedThe changes to the volume definitions are well-implemented and provide good flexibility:
- The configuration now supports both PVC and emptyDir storage types based on the
storageType
value.- PersistentVolumeClaim is correctly used when
storageType
is "pvc", otherwise emptyDir is used.- The emptyDir configuration includes a sizeLimit, which is a good practice to prevent unbounded growth.
- The changes are applied consistently for both processor and postgres volumes.
Suggestion for improvement:
Consider adding a validation for thestorageType
value in the Helm chart's values.yaml or in the templates to ensure it only accepts valid options (e.g., "pvc" or "emptyDir"). This can prevent potential issues if an invalid storage type is specified.You could add a validation in the templates like this:
{{- if not (or (eq .Values.basemap.storageType "pvc") (eq .Values.basemap.storageType "emptyDir")) }} {{- fail "storageType must be either 'pvc' or 'emptyDir'" }} {{- end }}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (25)
- flux/clusters/k8s-01/basemap/dev/kustomization.yaml (1 hunks)
- flux/clusters/k8s-01/basemap/prod/kustomization.yaml (1 hunks)
- flux/clusters/k8s-01/basemap/test/kustomization.yaml (1 hunks)
- flux/clusters/k8s-01/event-api-db/overlays/DEV/kustomization.yaml (1 hunks)
- flux/clusters/k8s-01/event-api-db/overlays/DEV/postgrescluster-assigning.yaml (0 hunks)
- flux/clusters/k8s-01/event-api-db/overlays/DEV/postgrescluster-instances.yaml (1 hunks)
- flux/clusters/k8s-01/event-api-db/overlays/PROD/kustomization.yaml (1 hunks)
- flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-instances.yaml (1 hunks)
- flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-resource-adjustment.yaml (0 hunks)
- flux/clusters/k8s-01/raster-tiler/dev/kustomization.yaml (1 hunks)
- flux/clusters/k8s-01/raster-tiler/geocint/kustomization.yaml (1 hunks)
- helm/basemap/Chart.yaml (1 hunks)
- helm/basemap/templates/cronjobs.yaml (5 hunks)
- helm/basemap/templates/deployment.yaml (3 hunks)
- helm/basemap/templates/pvc.yaml (3 hunks)
- helm/basemap/values.yaml (1 hunks)
- helm/basemap/values/values-dev.yaml (2 hunks)
- helm/basemap/values/values-prod.yaml (2 hunks)
- helm/basemap/values/values-test.yaml (2 hunks)
- helm/raster-tiler/Chart.yaml (1 hunks)
- helm/raster-tiler/templates/deployment.yaml (1 hunks)
- helm/raster-tiler/templates/pvc.yaml (2 hunks)
- helm/raster-tiler/values.yaml (1 hunks)
- helm/raster-tiler/values/values-dev.yaml (1 hunks)
- helm/raster-tiler/values/values-geocint.yaml (1 hunks)
💤 Files with no reviewable changes (2)
- flux/clusters/k8s-01/event-api-db/overlays/DEV/postgrescluster-assigning.yaml
- flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-resource-adjustment.yaml
🧰 Additional context used
🪛 yamllint
flux/clusters/k8s-01/event-api-db/overlays/DEV/postgrescluster-instances.yaml
[error] 2-2: trailing spaces
(trailing-spaces)
[error] 12-12: trailing spaces
(trailing-spaces)
[error] 13-13: trailing spaces
(trailing-spaces)
[error] 19-19: trailing spaces
(trailing-spaces)
flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-instances.yaml
[error] 2-2: trailing spaces
(trailing-spaces)
[error] 12-12: trailing spaces
(trailing-spaces)
[error] 13-13: trailing spaces
(trailing-spaces)
[error] 19-19: trailing spaces
(trailing-spaces)
helm/basemap/templates/cronjobs.yaml
[warning] 20-20: wrong indentation: expected 4 but found 6
(indentation)
helm/basemap/templates/pvc.yaml
[error] 1-1: syntax error: expected the node content, but found '-'
(syntax)
🔇 Additional comments (42)
helm/basemap/values/values-dev.yaml (4)
Line range hint
1-26
: Summary of review and next stepsThe changes to
values-dev.yaml
represent a significant shift towards a stateless architecture for both the tileserver and basemap components. While this aligns with the PR objectives, several critical areas require further clarification and documentation:
- Data persistence strategies for both components
- Performance and scalability implications
- Integration between ephemeral storage and S3
- Impact on the overall system architecture and data flow
Next steps:
- Provide responses to the questions raised in the review comments
- Update or create architectural documentation to reflect these changes
- Consider conducting performance tests to validate the impact of these changes
- Ensure that operational procedures are updated to account for the new architecture
The changes are approved pending the clarifications and documentation updates requested above.
26-26
: Evaluate the impact of emptyDir storage on basemap processing and data flowThe addition of
storageType: emptydir
for the basemap component raises several important considerations:
- Processing persistence: Given the processing schedule (
"0 0 */2 * *"
), how will the use of ephemeral storage affect the continuity of processing between runs?- Data flow: With S3 storage configured (
s3: path: s3://basemap-locker01/DEV
), clarify the data flow between the ephemeral local storage and S3.- Database implications: How does this change impact the PostgreSQL database configuration (
db: image: "nexus.kontur.io:8084/postgis/postgis"
)?Let's verify the related configurations to understand the full context:
#!/bin/bash # Search for basemap-related configurations and S3 usage rg --type yaml 'basemap:' -A 15 rg --type yaml 's3:'Consider documenting the new data flow and processing strategy to ensure all team members understand the implications of this architectural change.
13-13
: Document the new stateless architecture and data flowThe changes to both
tileserver
andbasemap
components to useemptydir
storage represent a significant shift towards a stateless architecture, aligning with the PR objectives. However, this architectural change has far-reaching implications that should be carefully considered and documented:
- Overall system resilience: How does this change affect the system's ability to handle component failures or restarts?
- Data consistency: What mechanisms are in place to ensure data consistency between ephemeral storage, S3, and the PostgreSQL database?
- Scalability: How does this new architecture support horizontal scaling of the tileserver and basemap components?
- Performance: What is the expected impact on system performance, particularly for frequently accessed tiles or during basemap processing?
- Disaster recovery: How has the disaster recovery strategy been adjusted to account for the increased reliance on ephemeral storage?
I recommend creating or updating architectural documentation to reflect these changes. This should include:
- A diagram illustrating the new data flow between components
- Clear explanations of how data persistence is maintained despite the use of ephemeral storage
- Updated operational procedures for managing and monitoring the system
Would you like assistance in creating a template for this architectural documentation?
Let's check for existing documentation that might need updating:
#!/bin/bash # Search for documentation files related to architecture or data flow fd -e md -e txt | xargs rg -i 'architecture|data flow|storage'Also applies to: 26-26
13-13
: Clarify the implications of using emptyDir for tileserver storageThe addition of
storageType: emptydir
for the tileserver implies that you're using ephemeral storage. This aligns with the PR objective of making components stateless, but it's important to consider the following:
- Data persistence: emptyDir volumes are deleted when a Pod is removed. How will this affect the tileserver's data between restarts or redeployments?
- Performance: Depending on the size and frequency of tile requests, how might this impact the tileserver's performance compared to persistent storage?
- Scalability: With ephemeral storage, how will you handle scaling the tileserver across multiple replicas?
Could you provide more context on the data persistence strategy for the tileserver? Let's verify if there are any related configurations:
✅ Verification successful
Verified the use of
storageType: emptydir
inhelm/basemap/values/values-dev.yaml
aligns with other environment configurations. No issues detected regarding the data persistence strategy.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for tileserver-related configurations rg --type yaml 'tileserver:' -A 10Length of output: 220
Script:
#!/bin/bash # Search for storage or persistence configurations related to tileserver rg --type yaml 'tileserver' -A 10 | rg 'storageType|persistence|volume'Length of output: 1116
helm/basemap/values/values-prod.yaml (2)
26-26
: Evaluate the use of emptyDir for basemap storageThe addition of
storageType: emptydir
for the basemap component raises some concerns:
- EmptyDir storage is ephemeral, which means any local data will be lost when the pod is terminated or rescheduled.
- The basemap configuration includes an S3 path and URL, suggesting that some data is stored externally. However, there might be local data or caches that could be affected by this change.
- The
processingSchedule
indicates that periodic tasks are run. The output or intermediate data from these tasks might need persistent storage.Please confirm if:
- All necessary data for the basemap component is stored in S3, and local storage is only used for temporary processing.
- The periodic processing tasks do not require persistent local storage between runs.
To verify the current storage usage and configuration, you can run:
#!/bin/bash # Check for any existing PersistentVolumeClaims or PersistentVolumes related to basemap kubectl get pvc,pv -l app=basemap -n $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | grep prod) # Check if there are any volumes mounted to the basemap pods kubectl get pods -l app=basemap -n $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}' | tr ' ' '\n' | grep prod) -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{range .spec.volumes[*]}{.name}{": "}{.emptyDir.medium}{"\n"}{end}{"\n"}{end}'This will help us understand the current storage configuration and ensure that the change to emptyDir doesn't negatively impact the basemap component's functionality.
13-13
: Verify the implications of using emptyDir for tileserver storageThe addition of
storageType: emptydir
for the tileserver component means that the storage will be ephemeral and tied to the pod's lifecycle. This change could have significant implications:
- Any data stored by the tileserver will be lost when the pod is terminated or rescheduled.
- This might affect the persistence of cached tiles or other data that the tileserver might need to retain across restarts.
Could you please clarify the intent behind this change? If persistent storage is not required for the tileserver, this change is appropriate. However, if any data needs to persist across pod restarts, we might need to reconsider this approach.
To verify the current storage configuration, you can run:
This will help us understand if there are any existing persistent storage resources that might be affected by this change.
helm/basemap/values/values-test.yaml (2)
13-13
: Verify the impact of using emptyDir storage for tileserverThe addition of
storageType: emptydir
for the tileserver aligns with the PR objective of making the basemap stateless. However, this change has important implications:
- Data stored by the tileserver will not persist across pod restarts or rescheduling.
- This could potentially impact the performance and functionality of the tileserver, depending on how it uses local storage.
Please confirm the following:
- The tileserver can function correctly without persistent storage.
- Any necessary data is either fetched from external sources or regenerated as needed.
- The performance impact of using emptyDir has been evaluated and is acceptable for the test environment.
You may want to run load tests to ensure the system performs adequately with this configuration.
26-26
: Confirm the implications of using emptyDir storage for basemapThe addition of
storageType: emptydir
for the basemap component is consistent with the similar change made to the tileserver, further supporting the PR objective of making the basemap stateless. However, this change requires careful consideration:
- It ensures that no data persists locally across pod restarts or rescheduling.
- This could significantly impact the basemap processing workflow, especially considering the presence of a processing schedule and S3 storage configuration.
Please address the following points:
- Confirm that the basemap processing can function correctly without persistent local storage.
- Verify that all necessary data can be retrieved from S3 or other external sources as needed.
- Ensure that the processing results are properly stored in S3 or other persistent storage.
- Check if this change affects the processing schedule efficiency, as data might need to be re-downloaded or regenerated more frequently.
Additionally, consider documenting these architectural changes and their implications for the team's reference.
helm/raster-tiler/templates/pvc.yaml (4)
1-1
: LGTM: Conditional PVC creation based on storage type.The use of a conditional block to create the PVC only when
.Values.storageType
is "pvc" is a good practice. It allows for flexible deployment configurations and ensures resources are created only when needed.
2-3
: LGTM: Correct API version and resource kind for PVC.The API version "v1" and resource kind "PersistentVolumeClaim" are correctly specified for creating a PVC in Kubernetes.
🧰 Tools
🪛 yamllint
[error] 2-2: syntax error: expected '', but found ''
(syntax)
25-25
: LGTM: Proper closure of conditional block.The conditional block is correctly closed, ensuring that the entire PVC definition is only included when
.Values.storageType
is "pvc".
Line range hint
16-24
: LGTM: Well-structured PVC spec with flexible configuration.The spec section is correctly defined and uses Helm variables effectively:
- Storage size and class are configurable via Helm values.
- Volume mode is set to Filesystem, which is appropriate for most use cases.
Verify that ReadWriteOnce is the intended access mode for this PVC. If multiple pods need simultaneous access, consider using ReadWriteMany (if supported by the storage class).
To check the available access modes for your storage class, run:
Ensure that the access mode aligns with your application's requirements.
flux/clusters/k8s-01/event-api-db/overlays/DEV/kustomization.yaml (1)
Line range hint
1-33
: Verify removal of previously existing patchesThe AI summary mentions the removal of
postgrescluster-resource-adjustment.yaml
andpostgrescluster-assigning.yaml
patches, but these changes are not visible in the provided code snippet. Let's verify if these patches have indeed been removed.Run the following script to check for the existence and references of these patches:
If the script confirms the removal of these patches and finds no remaining references, then the changes are consistent with the AI summary. Otherwise, we may need to investigate further or update the kustomization file accordingly.
✅ Verification successful
Patch removals in DEV overlay verified successfully.
All specified patches have been removed from
flux/clusters/k8s-01/event-api-db/overlays/DEV/kustomization.yaml
. No remaining references to these patches were found within this file.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the removal of previously existing patches # Test 1: Check if the removed patch files still exist for file in "postgrescluster-resource-adjustment.yaml" "postgrescluster-assigning.yaml"; do if [ -f "flux/clusters/k8s-01/event-api-db/overlays/DEV/$file" ]; then echo "Warning: $file still exists." else echo "$file has been removed." fi done # Test 2: Check for any remaining references to the removed patches rg "postgrescluster-resource-adjustment.yaml|postgrescluster-assigning.yaml" --type yamlLength of output: 3555
helm/basemap/Chart.yaml (2)
18-18
: Version bump approved.The chart version has been incremented from 0.0.37 to 0.0.45, which is a significant minor version increase. This aligns with the expectation of substantial changes related to the PR objectives.
18-18
: Verify completeness of changes for PR objectives.The version bump suggests significant changes, but it's important to ensure all necessary modifications for moving copy local path based PVs or making basemap stateless are included in this release.
Please confirm that all required changes for the PR objectives "move copy local path based pvs outside of hwn01 or make basemap stateless" are implemented and tested in this version. Consider running the following script to check for related changes:
✅ Verification successful
All required changes are present and correctly implemented for version 0.0.45.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for changes related to PV management or stateless modifications # Test: Search for PV-related changes echo "Checking for PV-related changes:" rg --type yaml 'persistentVolume|pv:|storageClass' helm/ # Test: Search for stateless-related modifications echo "Checking for stateless-related changes:" rg --type yaml 'statefulset|deployment' helm/ # Test: Look for any new environment variables or config changes that might indicate stateless modifications echo "Checking for potential stateless config changes:" rg --type yaml 'env:|configMap|secret' helm/Length of output: 17768
flux/clusters/k8s-01/raster-tiler/dev/kustomization.yaml (1)
19-34
: LGTM! The node affinity patch aligns with the PR objectives.The added patch correctly implements node affinity rules for the 'dev-raster-tiler' Deployment, preventing scheduling on the 'hwn01.k8s-01.kontur.io' node. This change aligns well with the PR objective of moving resources outside of hwn01.
To ensure this change doesn't inadvertently affect other environments, let's verify that similar patches aren't present in production configurations:
flux/clusters/k8s-01/raster-tiler/geocint/kustomization.yaml (2)
19-34
: LGTM! Verify impact on service availability.The addition of the nodeAffinity configuration to the Deployment 'geocint-raster-tiler' looks good. This change aligns with the PR objective of moving copy local path based PVs outside of hwn01.
To ensure this change doesn't negatively impact service availability, run the following script:
#!/bin/bash # Description: Verify the impact of nodeAffinity on service availability # Test 1: Check if there are enough nodes available that meet the new affinity requirements echo "Checking available nodes:" kubectl get nodes -o wide | grep -v hwn01.k8s-01.kontur.io # Test 2: Check if the Deployment can be scheduled with the new affinity rules echo "Checking Deployment schedulability:" kubectl get deployment geocint-raster-tiler -o yaml | rg "Progressing|Available" # Test 3: Check for any potential single point of failure echo "Checking for potential single point of failure:" kubectl get pods -l app=geocint-raster-tiler -o wide | rg "NODE"Also, consider if this change might concentrate too many pods on a subset of nodes, potentially impacting performance. You may want to review the overall node utilization after this change.
Line range hint
14-17
: LGTM! Verify the new values files.The update to the
valuesFiles
field in the HelmRelease patch looks good. This change aligns with the PR objective of potentially refactoring the basemap or relocating PVs.To ensure the new values files exist and contain the expected configurations, run the following script:
flux/clusters/k8s-01/event-api-db/overlays/PROD/kustomization.yaml (3)
17-17
: Replaced resource adjustment patch with instances patchThe
postgrescluster-resource-adjustment.yaml
patch has been replaced withpostgrescluster-instances.yaml
. This change suggests a shift from general resource adjustments to more specific instance management.Let's verify the removal of the old patch file and compare the contents of the old and new patch files:
#!/bin/bash # Check if the old patch file still exists if [ -f "flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-resource-adjustment.yaml" ]; then echo "Warning: Old patch file postgrescluster-resource-adjustment.yaml still exists." echo "Contents of old patch file:" cat "flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-resource-adjustment.yaml" else echo "Old patch file postgrescluster-resource-adjustment.yaml has been removed as expected." fi # Compare the old and new patch files if both exist if [ -f "flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-resource-adjustment.yaml" ] && [ -f "flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-instances.yaml" ]; then echo "Diff between old and new patch files:" diff "flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-resource-adjustment.yaml" "flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-instances.yaml" fi
Line range hint
1-41
: Overall impact aligns with PR objectivesThe changes in this file, particularly the introduction of the new instances patch and the replacement of the resource adjustment patch, align well with the PR objective of potentially making the basemap stateless or moving persistent volumes outside of hwn01. These modifications suggest a more granular approach to instance management, which could facilitate better control over stateful components.
However, to fully understand the impact and ensure alignment with the PR objectives, it would be beneficial to review the contents of the new
postgrescluster-instances.yaml
file. This will help verify if the changes indeed contribute to making the basemap stateless or moving PVs as intended.Let's verify the contents of the new patch file and check for any PV-related configurations:
#!/bin/bash # Display contents of the new patch file echo "Contents of postgrescluster-instances.yaml:" cat "flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-instances.yaml" # Check for PV-related configurations echo "Checking for PV-related configurations in the new patch file:" grep -i "persistentvolume" "flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-instances.yaml"
12-17
: New patch added for instance managementA new patch has been introduced for managing PostgresCluster instances. This addition suggests an intent to have more granular control over the database instances.
To ensure this change is consistent with other environments and to understand its contents, let's verify the existence and contents of the new patch file:
✅ Verification successful
Verification Successful: No Issues Found
The new patch for instance management in
postgrescluster-instances.yaml
has been successfully added without removing or replacing any existing configurations. Similar patches are present in other environments, ensuring consistency.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if the new patch file exists and display its contents if [ -f "flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-instances.yaml" ]; then echo "Contents of postgrescluster-instances.yaml:" cat "flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-instances.yaml" else echo "postgrescluster-instances.yaml not found in the expected location." fi # Check if similar patches exist in other environments echo "Checking for similar patches in other environments:" find flux/clusters/k8s-01/event-api-db/overlays -name "postgrescluster-instances.yaml"Length of output: 3322
Script:
#!/bin/bash # Check if the new patch file exists and display its contents PATCH_FILE="flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-instances.yaml" if [ -f "$PATCH_FILE" ]; then echo "Contents of postgrescluster-instances.yaml:" cat "$PATCH_FILE" else echo "postgrescluster-instances.yaml not found in the expected location." fi # Check if similar patches exist in other environments echo "Checking for similar patches in other environments:" find flux/clusters/k8s-01/event-api-db/overlays -name "postgrescluster-instances.yaml" # Check for PV-related configurations in the new patch file echo "Checking for PV-related configurations in the new patch file:" grep -i "persistentvolume" "$PATCH_FILE"Length of output: 3768
flux/clusters/k8s-01/basemap/prod/kustomization.yaml (3)
Line range hint
7-18
: LGTM: HelmRelease patch is correctly configured.The HelmRelease patch is properly set up to use both the general
values.yaml
and the environment-specificvalues-prod.yaml
. This configuration allows for a good separation of common and production-specific settings.
19-34
: LGTM: Deployment patch adds correct node affinity rule.The new patch for the 'prod-basemap-tileserver' Deployment correctly adds a node affinity rule to avoid scheduling on 'hwn01.k8s-01.kontur.io'. This aligns with the PR objective of moving resources outside of hwn01.
To ensure this change doesn't negatively impact the cluster's load distribution, please verify the following:
- The cluster has sufficient resources on other nodes to accommodate this Deployment.
- There are no specific requirements for this Deployment that are only met by the excluded node.
You can use the following command to check the current node utilization:
And to verify the number of available nodes:
35-50
: LGTM: CronJob patch adds correct node affinity rule, but consider the implications.The new patch for the 'prod-basemap' CronJob correctly adds a node affinity rule to require scheduling on 'hwn02.k8s-01.kontur.io'. This aligns with the PR objective of adjusting resource placement.
However, it's worth noting that this affinity rule is more restrictive than the one used for the Deployment. While the Deployment avoids a specific node, this CronJob is required to run on a specific node.
Please confirm the following:
- The 'hwn02.k8s-01.kontur.io' node has the necessary resources and configuration to run this CronJob.
- There's a specific reason for running this CronJob on this particular node (e.g., data locality, specialized hardware).
- There's a plan in place for what happens if the specified node is unavailable when the CronJob needs to run.
You can check the node's status and resources with:
And verify the CronJob's configuration after applying these changes:
flux/clusters/k8s-01/basemap/test/kustomization.yaml (2)
35-50
: Evaluate the impact of excluding hwn01 for the Deployment.The addition of a node affinity rule for the 'test-basemap-tileserver' Deployment to avoid 'hwn01.k8s-01.kontur.io' is in line with the PR objective. This approach offers more flexibility compared to the CronJob patch as it allows scheduling on any node except hwn01. However, there are still some points to consider:
- The comment indicates this is a temporary solution, which is good for addressing immediate concerns but may require a follow-up.
- Excluding a specific node might impact load distribution and resource utilization.
Consider the following suggestions:
- Document the specific reasons for excluding hwn01 to aid in future decision-making.
- Create a plan for a more permanent solution that doesn't rely on excluding specific nodes.
- Ensure that the cluster has sufficient resources to handle this deployment without hwn01.
To validate the impact of this change, run:
#!/bin/bash # Check the current node distribution of pods for this deployment kubectl get pods -l app=test-basemap-tileserver -o wide # Verify if there are any other affinity rules or node selectors for this deployment rg --type yaml 'kind:\s*Deployment' -A 20 | rg 'name:\s*test-basemap-tileserver' -A 20 | rg -e 'nodeSelector:' -e 'affinity:'
19-34
: Consider the implications of node-specific scheduling for the CronJob.The addition of a node affinity rule for the 'test-basemap' CronJob to run specifically on 'hwn02.k8s-01.kontur.io' is noted. While this achieves the immediate goal of moving the job outside of hwn01, there are some considerations:
- This approach might limit scalability and flexibility in the long term.
- The comment indicates this is a temporary solution, which aligns with the PR objective but may need revisiting.
Consider the following suggestions:
- If possible, use more generic labels (e.g., node roles or types) instead of specific hostnames for better scalability.
- Document the rationale behind this temporary change and create a follow-up task to implement a more permanent solution.
- Evaluate if this change addresses the root cause of why the job needs to be moved from hwn01.
To ensure this change doesn't conflict with existing configurations, run:
✅ Verification successful
Node affinity verification successful for 'test-basemap' CronJob.
No existing
nodeAffinity
ornodeSelector
configurations found for thetest-basemap
CronJob. The added node affinity rule to schedule the job onhwn02.k8s-01.kontur.io
is confirmed without conflicts.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for any other affinity rules or node selectors for this CronJob rg --type yaml 'kind:\s*CronJob' -A 20 | rg 'name:\s*test-basemap' -A 20 | rg -e 'nodeSelector:' -e 'affinity:'Length of output: 256
Script:
#!/bin/bash # Search for node affinity or node selectors in the 'test-basemap' CronJob configurations rg --type yaml 'kind:\s*CronJob' -A 20 | rg 'name:\s*test-basemap' -A 20 | rg -e 'nodeAffinity:' -e 'nodeSelector:'Length of output: 113
helm/basemap/values.yaml (3)
29-29
: Approved: Basemap job suspension explicitly disabledThe addition of
suspend: false
for the basemap job ensures that it will run as scheduled and not be suspended.Could you please clarify:
- Was the basemap job previously suspended by default?
- Is this change part of a broader strategy to ensure consistent job execution across environments?
To verify the impact of this change, run the following script:
#!/bin/bash # Description: Check for usage of the 'suspend' field in CronJob templates echo "Checking for 'suspend' field usage in CronJob templates:" rg --type yaml 'kind:\s*CronJob' -A 20 -g 'templates/*.yaml' | rg '^\s*suspend:'This will help us understand how the
suspend
field is used across different CronJob templates in the chart.
30-30
: Approved: Basemap storage type set to PVCThe addition of
storageType: pvc
for the basemap component aligns with the PR objective of improving persistent volume handling. This change should provide more robust and scalable storage for the basemap data.To ensure proper configuration, please verify the following:
- The PVC for the basemap is correctly defined in the templates.
- The basemap deployment or job is updated to use this PVC.
- The
processorStorageSize
in thepvc
section (currently 500Gi) is appropriate for the basemap's needs.Run the following script to check the PVC configuration for the basemap:
#!/bin/bash # Description: Verify PVC configuration for basemap # Check for PVC template echo "Checking for basemap PVC template:" rg --type yaml 'kind:\s*PersistentVolumeClaim' -A 10 -g 'templates/*basemap*.yaml' # Check for PVC usage in basemap deployment or job echo "Checking for PVC usage in basemap deployment or job:" rg --type yaml 'kind:\s*(Deployment|Job|CronJob)' -A 50 -g 'templates/*basemap*.yaml' | rg -A 10 'volumes:' | rg 'persistentVolumeClaim'Additionally, could you please clarify if this change makes the basemap stateless or if it's part of moving the local path-based PVs? Understanding the specific goal will help ensure the implementation aligns with the intended architecture.
26-26
: Approved: Tileserver storage type set to PVCThe addition of
storageType: pvc
for the tileserver aligns with the PR objective of improving persistent volume handling. This change should provide more robust and scalable storage for the tileserver component.To ensure proper configuration, please verify the following:
- The PVC for the tileserver is correctly defined in the templates.
- The tileserver deployment is updated to use this PVC.
- The
tileStorageSize
in thepvc
section (currently 150Gi) is appropriate for the tileserver's needs.Run the following script to check the PVC configuration:
flux/clusters/k8s-01/event-api-db/overlays/DEV/postgrescluster-instances.yaml (3)
28-51
: Consider the trade-offs of strict node affinity.The current configuration uses strict node affinity, binding each instance to a specific node:
- hwn04 instance to hwn04.k8s-01.kontur.io
- hwn03 instance to hwn03.k8s-01.kontur.io
While this ensures predictable scheduling, it may limit flexibility and resilience:
- If a specified node goes down, the pod cannot be rescheduled on another node.
- It may lead to uneven resource utilization across your cluster.
Consider using a less strict affinity rule (e.g., preferredDuringSchedulingIgnoredDuringExecution) or node labels that are less specific, allowing for more flexible scheduling while still maintaining some control over pod placement.
To check the current node status and labels, which can help in designing more flexible affinity rules, run:
#!/bin/bash # Get node status and labels kubectl get nodes -o wide --show-labelsThis will provide an overview of your nodes and their labels, helping you design more flexible affinity rules if needed.
20-27
: Review storage allocation and align with empirical recommendations.The current storage allocation (100Gi) is significantly lower than the commented empirical recommendation (2500Gi). This large discrepancy warrants attention:
- If 100Gi is sufficient for your DEV environment, consider removing the commented recommendation to avoid confusion.
- If your data growth projections align more closely with the 2300Gi to 6200Gi range mentioned in the comment, consider increasing the allocated storage.
Remember that underprovisioning storage can lead to operational issues, while overprovisioning may result in unnecessary costs.
To check the current storage usage and help determine the appropriate size, you can run:
#!/bin/bash # Get the current storage usage of PostgreSQL PVCs kubectl get pvc -l app.kubernetes.io/name=postgres -o custom-columns=NAME:.metadata.name,SIZE:.spec.resources.requests.storage,USED:.status.capacity.storageThis will show you the allocated and used storage for your PostgreSQL volumes.
8-19
: Review resource allocations and consider adjusting them.The current resource allocations seem quite high, especially for a DEV environment:
- Memory limit of 512Gi is extremely high. Even the commented empirical recommendation of 100Gi seems excessive for most development scenarios.
- CPU limit of 64 cores is also very high. The empirical recommendation of 5 cores might be more appropriate.
Consider adjusting these values based on actual usage patterns in your development environment. Over-allocation can lead to inefficient resource utilization.
To help determine appropriate resource allocations, you can run the following command to check the actual resource usage of your PostgreSQL pods:
This will provide insights into the actual CPU and memory usage, helping you set more accurate resource limits and requests.
🧰 Tools
🪛 yamllint
[error] 12-12: trailing spaces
(trailing-spaces)
[error] 13-13: trailing spaces
(trailing-spaces)
[error] 19-19: trailing spaces
(trailing-spaces)
flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-instances.yaml (2)
28-51
: LGTM: Node affinity configuration.The node affinity configuration for both instances (hwn04 and hwn03) is well-defined. This ensures that each instance is scheduled on a specific node, which is good for data locality and performance in a PostgreSQL cluster.
1-71
: Overall: Configuration aligns well with PR objectives.This configuration file for PostgreSQL cluster instances aligns well with the PR objectives of moving or making the basemap stateless. The use of node affinity ensures better control over instance placement, while the switchover configuration provides flexibility for managing the cluster state.
Key points:
- Well-defined resource allocations and storage specifications.
- Proper use of node affinity for instance placement.
- Inclusion of switchover capabilities for improved state management.
These changes contribute to a more robust and manageable PostgreSQL cluster configuration, which should help in achieving the goals of the PR.
🧰 Tools
🪛 yamllint
[error] 2-2: trailing spaces
(trailing-spaces)
[error] 12-12: trailing spaces
(trailing-spaces)
[error] 13-13: trailing spaces
(trailing-spaces)
[error] 19-19: trailing spaces
(trailing-spaces)
helm/basemap/templates/deployment.yaml (2)
71-71
: Consistent volume mount naming in initContainerThe change in the volume mount name for the initContainer is consistent with the volume definition changes. This ensures that the correct volume is used based on the configured storage type.
Line range hint
46-113
: Summary: Improved flexibility in storage configurationThe changes in this deployment template significantly enhance the flexibility of storage configuration for the basemap tileserver. By introducing a dynamic
storageType
variable, the template now supports both persistent volume claims and emptyDir volumes, allowing for easier adaptation to different environments or requirements.Key improvements:
- Dynamic volume naming based on
storageType
- Conditional logic to handle different storage types
- Consistent naming across volume definition and mounts in both initContainer and main container
These changes align well with the PR objective of making the basemap more flexible or stateless, depending on the chosen configuration. The implementation is clean and maintains good Helm practices.
helm/raster-tiler/templates/deployment.yaml (2)
Line range hint
1-134
: Summary: Good improvement in storage flexibilityThe changes to this deployment configuration file are focused and beneficial:
- The introduction of conditional storage type configuration enhances flexibility.
- The rest of the deployment configuration remains intact, minimizing potential side effects.
- This change allows for easier management of different storage requirements across various environments.
Overall, this is a positive change that improves the adaptability of the raster-tiler deployment without introducing unnecessary complexity.
To ensure a smooth transition and proper use of this new feature:
- Make sure to update the Helm chart version to reflect these changes.
- Consider adding a note to the changelog or release notes about this new storage configuration option.
- Ensure that the team is aware of this new configuration option and understands when to use each storage type.
Great job on improving the deployment configuration!
128-134
: Verify Helm values and update documentationThe new storage configuration looks good, but it's important to ensure that the supporting elements are in place:
- Verify that the Helm values file includes the new
storageType
andstorage_size
variables, with appropriate default values.- Update the README or other documentation to explain these new configuration options and their implications.
- Consider adding comments in the values file to guide users on when to use each storage type.
To check the Helm values file, you can run:
This will help ensure that the new configuration options are properly supported and documented.
✅ Verification successful
Helm values and documentation are properly configured
The
storageType
andstorage_size
variables are correctly defined in the Helm values file and appropriately utilized across the deployment templates. Ensure that the documentation has been updated to reflect these configurations.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if storageType and storage_size are defined in values.yaml grep -E "storageType:|storage_size:" helm/raster-tiler/values.yaml # Check if there's any documentation update grep -ri "storageType" helm/raster-tiler/Length of output: 490
helm/basemap/templates/cronjobs.yaml (4)
19-21
: LGTM: Suspend feature addedThe addition of the suspend feature is a good improvement. It allows for easy temporary disabling of the CronJob when needed, which can be useful for maintenance or debugging purposes.
Note: The indentation warning from yamllint can be safely ignored as it's likely due to the Helm templating.
🧰 Tools
🪛 yamllint
[warning] 20-20: wrong indentation: expected 4 but found 6
(indentation)
63-63
: LGTM: Volume mount name updatedThe update to the volume mount name is correct and consistent with the changes in volume definitions. This modification allows for dynamic naming based on the storage type, which increases the flexibility of the configuration.
123-123
: LGTM: Postgres volume mount name updatedThe update to the postgres volume mount name is correct and consistent with the changes in volume definitions and the basemap-rendering container. This modification maintains consistency across the configuration and supports the dynamic storage type feature.
Line range hint
1-148
: Overall assessment: Well-implemented improvements to CronJob flexibilityThe changes in this file significantly enhance the flexibility and configurability of the CronJob:
- The new suspend feature allows for easy temporary disabling of the CronJob.
- The addition of an init container for PVC usage checks improves reliability when using persistent storage.
- The flexible volume definitions support both PVC and emptyDir storage types, allowing for different deployment scenarios.
- The changes are consistently applied throughout the file, maintaining coherence in the configuration.
These improvements will make the basemap CronJob more adaptable to different environments and use cases, while also enhancing its reliability and maintainability.
|
||
storageType: emptydir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Conflict Between storageType
and Persistent Volume Claims
The storageType
is set to emptydir
in the values files, but the deployment template references a persistentVolumeClaim
. This inconsistency may lead to deployment issues.
- File:
helm/raster-tiler/templates/deployment.yaml
containspersistentVolumeClaim:
🔗 Analysis chain
Verify the implications of using emptyDir for storage
The addition of storageType: emptydir
suggests a shift in how storage is managed for this application. While emptyDir can be useful for temporary storage, it has some important considerations:
-
Data persistence: emptyDir is ephemeral and does not persist data across pod restarts or rescheduling. Ensure this aligns with the application's data storage requirements.
-
Performance: Depending on the emptyDir medium (memory or disk), performance characteristics may differ from the previous storage solution.
-
Capacity: There's no explicit size limit for emptyDir (unless using memory medium). Ensure this doesn't conflict with the
storage_size: 100Gi
parameter defined earlier. -
Consistency with PR objectives: This change appears to align with the PR objective of making the basemap stateless, as emptyDir provides temporary, non-persistent storage.
To ensure this change is consistent across environments and doesn't conflict with existing configurations, please run the following script:
Consider documenting this change in the chart's README or values description, explaining the rationale behind using emptyDir and any implications for data persistence or application behavior.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for storageType configuration and potential conflicts
# Check if storageType is consistently set across all environment files
echo "Checking storageType across environments:"
rg "storageType:" helm/raster-tiler/values/
# Check for potential conflicts with persistent volume claims
echo "\nChecking for PVC configurations:"
rg "persistentVolumeClaim" helm/raster-tiler/templates/
Length of output: 478
@@ -32,3 +32,5 @@ resources: | |||
memory: 4Gi | |||
limits: | |||
memory: 512Gi | |||
|
|||
storageType: emptydir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Potential Conflict with Existing Storage Configurations
The addition of storageType: emptydir
in values-dev.yaml
may conflict with the existing storage_size: 100Gi
and storageClass: local-path
parameters. Consider the following:
- Clarify Intent: Determine whether
storageType: emptydir
is meant to replace or complement the existing storage settings. - Adjust Configurations: If
emptydir
is intended to override, remove or adjuststorage_size
andstorageClass
accordingly to prevent conflicts. - Scoped Usage: If
emptydir
is for specific components, ensure that the configuration clearly reflects its scoped application.
🔗 Analysis chain
New storageType
parameter added
The addition of storageType: emptydir
introduces a new configuration option for storage in the dev environment. This change has potential implications:
-
Using
emptydir
means that the storage is ephemeral and tied to the pod's lifecycle. This is suitable for temporary storage needs but may lead to data loss if the pod is rescheduled or restarted. -
This configuration might conflict with or override the existing
storage_size: 100Gi
andstorageClass: local-path
parameters (lines 17-18). It's unclear which storage configuration takes precedence.
To ensure consistency and understand the usage of these storage parameters, let's check other files in the project:
Consider the following recommendations:
- Clarify the relationship between
storageType
,storage_size
, andstorageClass
in comments or documentation. - If
emptydir
is intended for specific components only, consider using a more specific parameter name or structuring the configuration to clearly indicate which components use which storage type. - Evaluate if using
emptydir
in the dev environment aligns with the application's data persistence requirements.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for storage-related configurations in other value files
echo "Searching for storage configurations in other value files:"
rg --type yaml 'storage(Type|Class|_size):' helm/raster-tiler/values/
Length of output: 985
- target: # new config for instances management | ||
group: postgres-operator.crunchydata.com | ||
version: v1beta1 | ||
kind: PostgresCluster | ||
name: db-event-api | ||
path: postgrescluster-assigning.yaml | ||
path: postgrescluster-instances.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Incomplete removal of patch files
The patch files postgrescluster-resource-adjustment.yaml
and postgrescluster-assigning.yaml
are still referenced in several kustomization.yaml
files across different overlays. Please remove these references to ensure consistency and prevent configuration issues.
-
References to
postgrescluster-resource-adjustment.yaml
:- flux/clusters/k8s-01/user-profile-api-db/overlays/TEST/kustomization.yaml
- flux/clusters/k8s-01/user-profile-api-db/overlays/DEV/kustomization.yaml
- flux/clusters/k8s-01/user-profile-api-db/overlays/PROD/kustomization.yaml
- flux/clusters/k8s-01/layers-db/overlays/PROD/kustomization.yaml
- flux/clusters/k8s-01/layers-db/overlays/DEV/kustomization.yaml
- flux/clusters/k8s-01/layers-db/overlays/TEST/kustomization.yaml
- flux/clusters/k8s-01/keycloak-db/overlays/TEST/kustomization.yaml
- flux/clusters/k8s-01/keycloak-db/overlays/PROD/kustomization.yaml
- flux/clusters/k8s-01/keycloak-db/overlays/DEV/kustomization.yaml
- flux/clusters/k8s-01/insights-api-db/overlays/TEST/kustomization.yaml
- flux/clusters/k8s-01/event-api-db/overlays/TEST/kustomization.yaml
- flux/clusters/k8s-01/insights-api-db/overlays/PROD/kustomization.yaml
- flux/clusters/k8s-01/insights-api-db/overlays/DEV/kustomization.yaml
-
References to
postgrescluster-assigning.yaml
:- flux/clusters/k8s-01/user-profile-api-db/overlays/TEST/kustomization.yaml
- flux/clusters/k8s-01/layers-db/overlays/DEV/kustomization.yaml
- flux/clusters/k8s-01/layers-db/overlays/PROD/kustomization.yaml
- flux/clusters/k8s-01/layers-db/overlays/TEST/kustomization.yaml
- flux/clusters/k8s-01/user-profile-api-db/overlays/DEV/kustomization.yaml
- flux/clusters/k8s-01/user-profile-api-db/overlays/PROD/kustomization.yaml
- flux/clusters/k8s-01/insights-api-db/overlays/TEST/kustomization.yaml
- flux/clusters/k8s-01/keycloak-db/overlays/TEST/kustomization.yaml
- flux/clusters/k8s-01/insights-api-db/overlays/PROD/kustomization.yaml
- flux/clusters/k8s-01/keycloak-db/overlays/PROD/kustomization.yaml
- flux/clusters/k8s-01/keycloak-db/overlays/DEV/kustomization.yaml
- flux/clusters/k8s-01/insights-api-db/overlays/DEV/kustomization.yaml
🔗 Analysis chain
New patch added for instances management
A new patch postgrescluster-instances.yaml
has been added for the PostgresCluster resource db-event-api
. This addition aligns with the PR objective of potentially making the basemap stateless or moving persistent volumes.
To ensure this change is consistent with the rest of the configuration, please run the following script:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the contents of the new patch file and its references
# Test 1: Check if the new patch file exists
if [ -f "flux/clusters/k8s-01/event-api-db/overlays/DEV/postgrescluster-instances.yaml" ]; then
echo "postgrescluster-instances.yaml exists."
# Display the contents of the file
cat "flux/clusters/k8s-01/event-api-db/overlays/DEV/postgrescluster-instances.yaml"
else
echo "Error: postgrescluster-instances.yaml not found."
fi
# Test 2: Check for any other references to the new patch file
rg "postgrescluster-instances.yaml" --type yaml
Length of output: 3449
Script:
#!/bin/bash
# Description: Verify the removal of postgrescluster-resource-adjustment.yaml and postgrescluster-assigning.yaml
# Files to check
FILES=("postgrescluster-resource-adjustment.yaml" "postgrescluster-assigning.yaml")
BASE_DIR="flux/clusters/k8s-01/event-api-db/overlays/DEV"
for FILE in "${FILES[@]}"; do
# Test 1: Check if the patch file exists
if [ -f "${BASE_DIR}/${FILE}" ]; then
echo "Error: ${FILE} still exists in ${BASE_DIR}."
else
echo "Success: ${FILE} has been removed from ${BASE_DIR}."
fi
# Test 2: Check for any references to the removed patch file in all kustomization.yaml files
MATCHES=$(rg "${FILE}" --type yaml)
if [ -z "$MATCHES" ]; then
echo "Success: No references to ${FILE} found in any kustomization.yaml files."
else
echo "Error: References to ${FILE} found in the following kustomization.yaml files:"
echo "$MATCHES"
fi
done
Length of output: 12927
kind: CronJob | ||
name: 'dev-basemap' | ||
group: batch | ||
version: v1 | ||
patch: |- | ||
- op: add | ||
path: /spec/jobTemplate/spec/template/spec/affinity/nodeAffinity | ||
value: | ||
requiredDuringSchedulingIgnoredDuringExecution: | ||
nodeSelectorTerms: | ||
- matchExpressions: | ||
- key: kubernetes.io/hostname | ||
operator: In | ||
values: | ||
- hwn02.k8s-01.kontur.io |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Inconsistent Node Affinity Strategy Detected
The addition of a hard-coded hostname (hwn02.k8s-01.kontur.io
) for node affinity in the CronJob introduces an inconsistency with the repository's existing practices, which utilize labels for node selection. This deviation can lead to maintenance challenges and reduced flexibility.
Recommendations:
-
Align with Existing Practices:
- Use node labels instead of hard-coded hostnames for affinity. For example:
nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: node-type operator: In values: - basemap-storage
- Use node labels instead of hard-coded hostnames for affinity. For example:
-
Document Rationale:
- If using a specific hostname is necessary, include comments within the YAML file or in the PR description to explain the reasoning behind this choice.
-
Consider Flexibility:
- Implement a fallback mechanism or use
preferredDuringSchedulingIgnoredDuringExecution
to ensure the CronJob can run even if the specified node is unavailable.
- Implement a fallback mechanism or use
🔗 Analysis chain
Consider using more flexible node selection criteria
The addition of node affinity to the CronJob aligns with the PR objective of moving copy local path based PVs outside of hwn01. However, there are some concerns and suggestions to consider:
- Hard-coding a specific node hostname (
hwn02.k8s-01.kontur.io
) reduces flexibility and can cause issues if the node is unavailable or renamed. - This approach might not scale well in a dynamic cluster environment.
Consider the following recommendations:
-
Use more flexible node selection criteria, such as node labels or zones, instead of a specific hostname. For example:
nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: node-type operator: In values: - basemap-storage
-
Document the reason for this specific node requirement in a comment within the YAML file or in the PR description. This will help future maintainers understand the rationale behind this decision.
-
Consider implementing a fallback mechanism or using
preferredDuringSchedulingIgnoredDuringExecution
instead ofrequiredDuringSchedulingIgnoredDuringExecution
to ensure the job can still run if the preferred node is unavailable.
To ensure this change doesn't conflict with other scheduling policies, let's check for any existing affinity rules or node selectors:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for existing affinity rules or node selectors in the dev environment
rg --type yaml 'affinity:|nodeSelector:' flux/clusters/k8s-01/basemap/dev/
Length of output: 76
Script:
#!/bin/bash
# Comprehensive search for affinity rules or node selectors in the entire repository
rg --type yaml 'affinity:|nodeSelector:'
Length of output: 1176
Script:
#!/bin/bash
# Search for node affinity definitions using labels instead of hard-coded hostnames
rg --type yaml 'nodeSelector' flux/clusters/k8s-01/ | rg -v 'kubernetes\.io/hostname'
Length of output: 2624
# Switchover section | ||
# https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/cluster-management/administrative-tasks#changing-the-primary | ||
# https://access.crunchydata.com/documentation/postgres-operator/latest/tutorials/cluster-management/administrative-tasks#targeting-an-instance | ||
# 1. Requires fully-qualified instance id of new master | ||
# 2. New master must be in sink with current master (check via `patronictl list`) | ||
# 3. Its recommended to retain switchover section, to keep track of last desired master both in code and k8s | ||
- op: add | ||
path: /spec/patroni/switchover | ||
value: | ||
enabled: true | ||
targetInstance: db-event-api-hwn04-OVERRIDEME | ||
#type: Failover | ||
# trigger-switchover annotation triggers actual switchover whenever its updated | ||
# value is arbitrary, but it's recommended to reference the date of switchover | ||
# special syntax is used to escape slash with sequence ~1 | ||
# See https://windsock.io/json-pointer-syntax-in-json-patches-with-kustomize/ | ||
- op: add | ||
path: /metadata/annotations/postgres-operator.crunchydata.com~1trigger-switchover | ||
value: OVERRIDEME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Action Required: Replace Placeholder Values Before Applying Configuration
The switchover configuration contains placeholder values that must be updated before it can be safely applied:
- Line 63:
targetInstance: db-event-api-hwn04-OVERRIDEME
- Line 71:
value: OVERRIDEME
Please replace these placeholders with the appropriate instance identifiers and values to ensure proper functionality.
🔗 Analysis chain
LGTM: Switchover configuration with a note on placeholders.
The switchover configuration is well-documented with helpful comments explaining the process and requirements. The use of an annotation to trigger the switchover is a good practice for Kubernetes-based operations.
However, please note that there are placeholders that need to be replaced before this configuration can be used:
- Line 63:
db-event-api-hwn04-OVERRIDEME
- Line 71:
OVERRIDEME
Ensure these are replaced with appropriate values before applying this configuration.
To verify the presence of these placeholders, you can run:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for OVERRIDEME placeholders in the file
grep -n "OVERRIDEME" flux/clusters/k8s-01/event-api-db/overlays/PROD/postgrescluster-instances.yaml
Length of output: 175
@@ -105,7 +110,7 @@ spec: | |||
protocol: TCP | |||
volumeMounts: | |||
- mountPath: /data | |||
name: {{ .Values.envName }}-basemap-tileserver-pvc | |||
name: {{ .Values.envName }}-basemap-tileserver-{{ .Values.tileserver.storageType }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codebase verification
Inconsistent Volume Mount Naming Detected
The old volume naming convention is still present in helm/basemap/templates/pvc.yaml
. Please update it to match the new naming convention used in the deployment.
🔗 Analysis chain
Consistent volume mount naming in main container
The change in the volume mount name for the main container is consistent with the volume definition and initContainer changes. This ensures that the correct volume is used based on the configured storage type.
To ensure consistency across the entire template, please run the following verification script:
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify consistent use of the new volume naming convention
# Test: Check for any remaining instances of the old naming convention
# Expect: No output, indicating all instances have been updated
rg --type yaml 'name:\s+\{\{\s*\.Values\.envName\s*\}\}-basemap-tileserver-pvc' helm/basemap/templates/
# Test: Verify consistent use of the new naming convention
# Expect: Consistent output showing the new naming pattern
rg --type yaml 'name:\s+\{\{\s*\.Values\.envName\s*\}\}-basemap-tileserver-\{\{\s*\.Values\.tileserver\.storageType\s*\}\}' helm/basemap/templates/
Length of output: 737
No description provided.