
feat: Addition of Istio observability add-ons and enabling cross-zone loadbalancing on Ingress NLB #1735

Closed
wants to merge 3 commits

Conversation

vchintal
Contributor

@vchintal vchintal commented Aug 17, 2023

Description

  1. Added [jaeger, prometheus, grafana, kiali] observability add-ons
  2. Added an annotation on the istio-ingress NLB to enable cross-zone load balancing (see the sketch below)
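
For illustration, a minimal sketch of how such an annotation could be set through the gateway chart's Service values, assuming the NLB is provisioned by the AWS Load Balancer Controller (the resource name and values layout are illustrative, not the exact diff from this PR):

resource "helm_release" "istio_ingress" {
  name       = "istio-ingress"
  namespace  = "istio-ingress"
  repository = "https://istio-release.storage.googleapis.com/charts"
  chart      = "gateway"

  values = [
    yamlencode({
      service = {
        annotations = {
          # NLB attribute understood by the AWS Load Balancer Controller;
          # spreads traffic across targets in all AZs, not just the local one
          "service.beta.kubernetes.io/aws-load-balancer-attributes" = "load_balancing.cross_zone.enabled=true"
        }
      }
    })
  ]
}

With the legacy in-tree controller, the equivalent annotation would be service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true".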

How was this change tested?

  • Yes, I have tested the PR using my local account setup (Provide any test evidence report under Additional Notes)
  • Yes, I ran pre-commit run -a with this PR

@vchintal vchintal requested a review from a team as a code owner August 17, 2023 18:18
@vchintal vchintal temporarily deployed to EKS Blueprints Test August 17, 2023 18:24 — with GitHub Actions Inactive
@dcyoung

dcyoung commented Aug 18, 2023

I tried this PR yesterday.

In the previous code, the "istio-system" namespace is created by terraform with injection enabled, but after the proposed changes I see that neither istio-system nor istio-ingress namespaces contain the injection label. Does this matter/is it intentional?

Additionally, inspecting the target groups in the created load balancer shows multiple unhealthy instances. I'm guessing this is because the ingress is deployed to a single node? This was also the case in the original code, and likely not introduced by this PR. However, while changes to this example are being proposed, is there any expected node selection logic or updates to the target groups that would avoid the unhealthy instances?

@vchintal
Contributor Author

In the previous code, the "istio-system" namespace is created by terraform with injection enabled, but after the proposed changes I see that neither istio-system nor istio-ingress namespaces contain the injection label. Does this matter/is it intentional?

It was intentional. The Istio install documentation never actually suggests putting that label on istio-system. However, once Istio is installed, the istio-injection=enabled label must be present on any namespace where microservices that are meant to be part of the mesh will be deployed.
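
For reference, a minimal sketch (the namespace name is illustrative) of labeling an application namespace for sidecar injection with the hashicorp/kubernetes provider:

resource "kubernetes_namespace" "bookinfo" {
  metadata {
    name = "bookinfo"
    labels = {
      # Istio's mutating webhook injects sidecars into pods created in
      # namespaces that carry this label
      "istio-injection" = "enabled"
    }
  }
}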

Additionally, inspecting the target groups in the created load balancer shows multiple unhealthy instances. I'm guessing this is because the ingress is deployed to a single node? This was also the case in the original code, and likely not introduced by this PR. However, while changes to this example are being proposed, is there any expected node selection logic or updates to the target groups that would avoid the unhealthy instances?

I deployed the full bookinfo application and tested it out, and everything is working just fine. Of all the listeners, the TargetGroup for Protocol:Port TCP:15021 shows all instances healthy. The TargetGroup for Protocol:Port TCP:80 was initially all in an unhealthy state, but once the bookinfo application was deployed, one of the instances started showing up as healthy. I can't offer a fuller explanation yet, as I need to look into this further, but it seems the health status is driven by applications opening the corresponding node port.

@vchintal
Contributor Author

I have a further update. The istio-ingress LB accepts traffic on three ports and forwards it to the istio-ingress pod in the istio-ingress namespace. Which ports (443 or 80) get opened up depends on the type of Gateway one deploys. In my example, bookinfo-gateway opens traffic on port 80, and hence the instance on which the istio-ingress deployment was running ended up healthy. If the istio-ingress deployment is scaled to 3, all of the pods land on unique instances, and hence all of them instantly turn healthy. Hope this explanation helps.
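
To illustrate the scaling step, a hedged sketch of pinning the gateway to three replicas through the chart values (assuming the istio gateway chart's autoscaling knobs; exact value names can differ by chart version):

resource "helm_release" "istio_ingress" {
  name       = "istio-ingress"
  namespace  = "istio-ingress"
  repository = "https://istio-release.storage.googleapis.com/charts"
  chart      = "gateway"

  values = [
    yamlencode({
      autoscaling = {
        enabled     = true
        # with three replicas spread across instances, each NLB target opens
        # the node port and reports healthy
        minReplicas = 3
      }
    })
  ]
}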

@dcyoung

dcyoung commented Aug 18, 2023

Hope this explanation helps.

It does. Thanks for the insight!


exec {
Contributor

the exec method is the one recommended by the HashiCorp providers (Kubernetes, Helm) - let's not change this
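
For context, the recommended pattern looks roughly like this (a sketch, assuming the cluster is exposed through a module named eks; aws eks get-token fetches short-lived credentials instead of caching a static token):

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      # a fresh token is requested on every provider invocation
      args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
    }
  }
}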

@@ -154,21 +157,19 @@ resource "helm_release" "istiod" {
name = "istiod"
namespace = helm_release.istio_base.metadata[0].namespace
version = local.istio_chart_version
wait = false
Contributor

by default, our addons do not wait - instead, we rely on the normal Kubernetes constructs provided by the operators/controllers deployed by the charts to eventually reconcile their state. Let's revert this and remove the explicit depends_on for other Istio resources
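
The reverted shape would be roughly the following sketch; note that the namespace reference already gives Terraform an implicit ordering dependency on the istio_base release, so the explicit depends_on adds nothing:

resource "helm_release" "istiod" {
  name      = "istiod"
  # referencing istio_base here implicitly orders this release after it
  namespace = helm_release.istio_base.metadata[0].namespace
  version   = local.istio_chart_version
  wait      = false # let the chart's controllers reconcile on their own
}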

}
}
}
)
]
depends_on = [helm_release.istiod, helm_release.istio_base]
Contributor

let's remove this

content = each.value.response_body
}

resource "kubectl_manifest" "istio_addon" {
Contributor

@bryantbiggs bryantbiggs Aug 18, 2023

we want to minimize, and eventually remove, the use of this provider since it has not been updated in some time and looks like it might be abandoned (gavinbunney/terraform-provider-kubectl#247)

Can we just add instructions in the document to do something like curl <your file url> | kubectl apply -f -

such as curl https://raw.githubusercontent.com/istio/istio/master/samples/addons/kiali.yaml | kubectl apply -f -

@@ -14,6 +14,14 @@ terraform {
source = "hashicorp/kubernetes"
version = ">= 2.20"
}
kubectl = {
Contributor

with the changes above, we should be able to remove these two

@vchintal
Contributor Author

Thanks @bryantbiggs, the review was clear and extremely helpful. Though I know I could fix this same PR, I would rather open a clean one based on the recommendations.

@vchintal vchintal closed this Aug 18, 2023