Spike - Wazuh kubernetes #907

teddytpc1 · 2024-11-28T19:22:45Z

Objective
https://github.com/wazuh/internal-devel-requests/issues/1319

Description

As part of the DevOps overhaul objective we need to conduct research, analyze alternatives, and design how to implement the following changes.

Repository Scope Clarification:

The Wazuh Kubernetes repository should focus solely on deployments.
Analyze improvements for the Wazuh Kubernetes deployment, including an analysis of the Ingress controller and Load Balancer optimization.
Leverage the Docker images improvements to use health checks for the different pods.

Deployment Simplification:

Utilize out-of-the-box Wazuh configurations for deployments.

Testing Improvements:

Enhance deployment tests with additional checks, including log validation for errors and warnings.
The test workflows must work with development images.
The test workflows must allow using specific image tags for each component for manual workflow executions.

Documentation Updates:

Simplify/improve the Kubernetes installation documentation.

Implementation restrictions

Testing Environment: The tests must be implemented using GitHub Actions (GHA).
Compatibility: The workflow should be compatible with the environments used for PR testing and manual testing.
Logs Validation: The logs checking must identify and report critical issues (e.g., errors, warnings) in a clear and actionable way.
Minimal Maintenance: The implementation should aim for low complexity and minimal maintenance overhead.
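
One way to make the log validation restriction concrete is a small scanner that groups findings by severity, so a workflow step can fail with a readable summary. This is only a sketch: the severity keywords, the sample log lines, and the function names are assumptions for illustration, not the log format of any specific Wazuh component.

```python
import re
from collections import defaultdict

# Assumed severity keywords; adjust to the real log format of each component.
SEVERITY_PATTERN = re.compile(r"\b(CRITICAL|ERROR|WARNING)\b")


def scan_log(lines):
    """Group log lines by the first severity keyword found in each line."""
    findings = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        match = SEVERITY_PATTERN.search(line)
        if match:
            findings[match.group(1)].append((number, line.strip()))
    return dict(findings)


def format_report(component, findings):
    """Build a short, human-readable summary for a workflow log."""
    if not findings:
        return f"{component}: no errors or warnings found"
    counts = ", ".join(
        f"{len(items)} {level}" for level, items in sorted(findings.items())
    )
    report = [f"{component}: {counts}"]
    for level, items in sorted(findings.items()):
        for number, text in items:
            report.append(f"  [{level}] line {number}: {text}")
    return "\n".join(report)


# Made-up sample lines, for illustration only.
sample = [
    "2024/12/11 15:10:39 example-daemon: INFO: started",
    "2024/12/11 15:10:40 example-daemon: WARNING: retrying connection",
    "2024/12/11 15:10:41 example-daemon: ERROR: connection refused",
]
findings = scan_log(sample)
print(format_report("example-component", findings))
```

A workflow step could exit with a non-zero status when the result contains ERROR or CRITICAL entries, and only print the WARNING entries.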

Plan

TBD

vcerenu · 2024-12-10T19:38:15Z

I have been investigating the service options to use to deploy Wazuh, eliminating the use of AWS NLBs and thereby simplifying the use of cloud resources.

vcerenu · 2024-12-11T15:10:39Z

Conclusions

Repository Scope Clarification:

The Wazuh Kubernetes repository should focus solely on deployments.

Analyze improvements for the Wazuh Kubernetes deployment, including an analysis of the Ingress controller and Load Balancer optimization.

Leverage the Docker images improvements to use health checks for the different pods.

The Kubernetes repository is already focused solely on deployments, so it does not need to be split.
The LoadBalancer service type in Kubernetes is vendor agnostic, so whether to use it depends on what we need when deploying the components.
Whether to use a Load Balancer is a design decision. Load balancing can be handled either by a Load Balancer or by an Ingress Controller; both solutions work, and the choice depends on the client's infrastructure. For an example deployment, I find Load Balancers more convenient, because the purpose of each service is easier to understand and a test requires fewer dependencies.
We also need to review which types of connections Wazuh Manager will require in the new version, in order to determine which type of Load Balancer suits them best. In AWS we currently use a CLB, but we should migrate this deployment to an NLB or an ALB.
Regarding pod health checks, we can add liveness and readiness probes: a readiness probe runs small checks before a pod is marked as ready, and a liveness probe determines whether it needs to be restarted. These probes can also help our tests, since the deployment review logic can live in each deployed pod, which lets us trust the pod's status without depending on external checks.
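
If readiness probes are defined, a test can rely on the readiness that Kubernetes itself reports for each container instead of running its own external checks. The sketch below reads the JSON shape produced by `kubectl get pods -o json`. The function name and the pod names are placeholders, and treating a pod without container statuses as not ready is an assumption of this example.

```python
import json


def unready_pods(pod_list):
    """Return names of pods that have any container not reporting ready."""
    not_ready = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        # Assumption: a pod with no container statuses yet counts as not ready.
        if not statuses or not all(s.get("ready", False) for s in statuses):
            not_ready.append(pod["metadata"]["name"])
    return not_ready


# Trimmed, made-up example of the `kubectl get pods -o json` structure.
example = json.loads("""
{"items": [
  {"metadata": {"name": "indexer-0"},
   "status": {"containerStatuses": [{"ready": true}]}},
  {"metadata": {"name": "manager-0"},
   "status": {"containerStatuses": [{"ready": false}]}}
]}
""")
print(unready_pods(example))
```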

Deployment Simplification:

Utilize out-of-the-box Wazuh configurations for deployments.

Regarding the simplification of the deployment, we have to take into account that Kubernetes is responsible for deploying everything that, in a VM deployment, would correspond to both software and hardware, so how far it can be simplified depends mostly on the purpose of the deployment itself. If we want a product that is ready to run, we have to include all the components it needs, such as Deployments, StatefulSets, Services, Secrets and others. In many cases these must ship with default parameters and, depending on the Docker images we have, expose parameters that allow the deployment to be configured. Currently we have a basic scheme that distinguishes between a deployment on a self-installed cluster and a deployment that customizes many parameters for AWS services. We need to decide whether to keep these customizations or to provide only a base that users can adapt to their own environment, and whether the deployment will require dependencies such as network load balancers in AWS or an Ingress Controller.

Testing Improvements:

Enhance deployment tests with additional checks, including log validation for errors and warnings.

The test workflows must work with development images.

The test workflows must allow using specific image tags for each component for manual workflow executions.

We currently have a test workflow that performs several checks but needs more, such as checking the logs for errors. The workflow only uses images published on Docker Hub, so it requires the images to be available there. We must adapt these tests to use images from private repositories; this is possible, and the use of such repositories has already been analyzed successfully. Using specific image tags requires adding the corresponding logic to the test.
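
The per-component tag logic could be as small as a lookup with a fallback. In the sketch below, the registry URL, the default tag, the component names, and the function name are all hypothetical; in a manual run, the overrides would come from the workflow inputs.

```python
DEFAULT_REGISTRY = "registry.example.com/dev"  # hypothetical private registry
DEFAULT_TAG = "latest"                         # hypothetical default tag
# Assumed component image names, for illustration only.
COMPONENTS = ("wazuh-manager", "wazuh-indexer", "wazuh-dashboard")


def resolve_images(overrides=None, registry=DEFAULT_REGISTRY, default_tag=DEFAULT_TAG):
    """Return the image reference for each component.

    `overrides` maps a component name to a tag; components without an
    override (or with an empty one) fall back to `default_tag`.
    """
    overrides = overrides or {}
    unknown = set(overrides) - set(COMPONENTS)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    return {
        name: f"{registry}/{name}:{overrides.get(name) or default_tag}"
        for name in COMPONENTS
    }


# Example: pin only the indexer to a specific (made-up) tag.
images = resolve_images({"wazuh-indexer": "my-test-tag"})
for name, reference in images.items():
    print(f"{name} -> {reference}")
```

Rejecting unknown component names keeps a typo in a manual run from silently falling back to the default tag.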

Documentation Updates:

Simplify/improve the Kubernetes installation documentation.

Regarding the Kubernetes documentation, it currently contains the parameters needed for the deployment itself, but it lacks information about adding integrations and other configurations that require additional software, such as the AWS CLI for AWS integrations, an SMTP client for sending emails, etc. We should plan what information our documentation needs, avoid duplicating data that may already be in each component's documentation and, where necessary, add links to the software and dependencies we require, e.g. kubectl and Kustomize.
We should also provide more information about the communication required between the Wazuh components, to help users build their own deployments on top of the base we provide.

We do not yet have all the information about the communication protocols Wazuh will require for its connections, or about the internal processes that would let us define the tests more precisely, so this issue is blocked until we receive this information.

teddytpc1 · 2024-12-17T18:48:17Z

Update

We need to develop a plan with all the items from the analysis. The plan must be ordered and each task must have an Owner and the teams involved.
After the plan is validated, we should create the corresponding issues to start working on each task.

AurimasNav · 2024-12-20T06:53:38Z

Please consider some community feedback within the scope of this overhaul:
#576
#251

vcerenu · 2024-12-20T16:34:24Z

Hello @AurimasNav

We are analyzing these types of requests.
As a first step, we want to build a better base so that any user can customize and deploy Wazuh as needed. We will therefore focus on providing a basic installation that users can take from our repo and customize with their own manifests, rather than a complete, fully customized deployment. From there, everyone is free to add their own functionality, instead of receiving a closed, ready-to-install package shaped by our decisions.
We have not started this development yet, but any proposal in this regard will be analyzed. I only ask that proposals be made in their respective issues, so that we can keep better track of them.

Tokynet · 2024-12-28T15:53:18Z

Hello all,

From the outside it seems like there are 2 tasks bundled in this request.

1. How to install Wazuh to a k8s cluster.
2. Proposed deployment architecture.

IMO, the project should focus on (#1) creating a Helm chart for installation of the Wazuh platform first, not on what Ingress Controller or Loadbalancer users/customers choose to leverage. Having a Helm chart should allow iteration in the installation method to then add other components like cert-manager etc. To me, having a Helm chart is a maturity component and it helps align the installation of the platform with GitOps workflows, since Helm charts are versioned.

For #2, you can build the best possible Architecture and deployment and label it as "Suggested Architecture" to help users visualize and plan (costs and changes to their environment etc) in preparation for implementation.

I would also mention that as a security tool/application you should focus on including Network policies in your k8s manifests.

vcerenu · 2025-01-07T22:30:10Z

Update

According to the spike performed, I have determined a series of general tasks that we must perform for the correct update of the wazuh-kubernetes repository and its respective documentation:

Steps:

Modify the Kubernetes manifests to eliminate the ConfigMaps used to mount files inside containers. This task depends largely on being able to customize the configuration files of the Wazuh components through environment variables. This is already solved for Wazuh dashboard and Wazuh indexer, but we have to wait for the new Wazuh manager configuration file before we can adapt the image to this form of deployment. In addition, we need to verify whether passwords can be changed when the cluster starts, looking first for a solution that is included in the Docker image.
Modify the EKS deployment to use an NLB, which was previously tested and reduces the number of LBs needed to deploy the full Wazuh stack.
Modify the documentation to reflect the changes made to the deployment.
Modify the Kubernetes test so that it can use development images, by adding the repository that contains them and allowing the test to run with custom images built and uploaded to a repository other than the production one.

These tasks can be modified according to the development needs of each component and the changes made to the images being deployed, so these steps may change before or during the development stage.

Update

Analyze the use of network policies within deployed services

vcerenu · 2025-01-08T14:42:06Z

Hello @Tokynet

We are currently making several changes to our product, so our deployments will undergo a large number of changes.
We are adapting our repository to provide a deployment guide that everyone can use as a basis and develop on it the type of deployment they prefer and that best suits their environment.
First of all, we want to provide a set of manifests that users can take as an example and, if necessary, use for a test deployment. This lets them see how Wazuh works in their cluster and build the solutions they need on top of it. We also want to support these changes with documentation that they can review and that will help them in their development.

As we continue with the development of these new changes, we will analyze the incorporation of more and better deployment solutions, but in the first instance we want to start with a base that allows our users to adapt our product to their deployment needs.

crlsgms · 2025-01-16T20:14:24Z

Hello all,

From the outside it seems like there are 2 tasks bundled in this request.
1. How to install Wazuh to a k8s cluster.

2. Proposed deployment architecture.
IMO, the project should focus on (#1) creating a Helm chart for installation of the Wazuh platform first, not on what Ingress Controller or Loadbalancer users/customers choose to leverage. Having a Helm chart should allow iteration in the installation method to then add other components like cert-manager etc. To me, having a Helm chart is a maturity component and it helps align the installation of the platform with GitOps workflows, since Helm charts are versioned. ....

Hello everyone, just giving my 2 cents as I struggled a lot to set up Wazuh here in our environment :D

I've tried to use the @morgoved Helm setup, but it depends on a third-party reloader and is not very customizable, similar to the wazuh-kubernetes deployment, which assumes that you will install ONLY ONE instance.

Our setup is 100% on premises (Rancher + k8s + Longhorn), and the 'local-env' does not help much: its only suggestion is to change the storage class, which can be done before the deployment and fixed in the overlays.

It also does not account for customization or parallel usage, such as:

changing the cluster name, as there are some hardcoded references
changing the namespace: the manager hostname needs to be adapted to the k8s DNS name using the namespace, not the cluster name. It took some time to get this 👯 (if it does not match the namespace, everything works, but agents won't register)

<nodes>
        <node>wazuh-manager-master-0.wazuh-cluster.MY-NAMESPACE</node> <!-- MY-NAMESPACE is a placeholder for your namespace -->
</nodes>

the dashboard is somewhat heavy, so using a VPA is quite helpful for smooth usage. For both the dashboard and the indexer, the suggested CPU/RAM resources are way below what they need, especially for the indexer. The indexer can run fine with lower resources, but on the first deployment, while it is creating data, shards, etc., it needs a lot of CPU and RAM; without enough, it always ends up in CrashLoopBackOff.
the indexer cannot use a VPA: as a StatefulSet with split data, it always needs to scale and shut down gracefully, and the k8s HPA/VPA features do not mix well with this kind of StatefulSet. The workaround is to over-provision resources on the first deployment and manually patch them down once it is stable.
when running multiple instances with access coming from the internet, it is not good to leave registration protected only by a password, so it is better to use certificates. These are not covered by the creation scripts, nor provided as a volume to be added to the master/workers; I also needed to tailor affinity and volumes for the managers to get this done
the approach to services is the same for the big clouds and local-env, but on premises almost all setups have a border firewall that can handle the load balancing, so it is more useful to have NodePorts available for syslog input (including 514 UDP, which is not in the .yaml). Having them all, or making them customizable, makes it easier to set up direct access from the internet
an Ingress/Service would also be nice to set up for local environments, as we usually have a wildcard (or other) TLS certificate and a domain to expose publicly
about automation, it is quite annoying to create users/rules/rulesets with OpenSearch scripts that in the end just convert a text password to a base64 hash; that can be done with Vault or with a script run before Kustomize applies everything.
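
The namespace-qualified manager hostname from the <nodes> snippet above can be derived rather than hardcoded. This is a minimal sketch, assuming the hostname follows the usual StatefulSet pattern of pod name, governing service name, and namespace; the function name is hypothetical and the values are the ones from the snippet.

```python
def statefulset_pod_hostname(pod, service, namespace, cluster_domain=None):
    """Build the DNS name of a StatefulSet pod behind its governing service.

    Short form:  <pod>.<service>.<namespace>
    Full form:   <pod>.<service>.<namespace>.svc.<cluster_domain>
    """
    parts = [pod, service, namespace]
    if cluster_domain:
        parts += ["svc", cluster_domain]
    return ".".join(parts)


# Values taken from the snippet above; MY-NAMESPACE is a placeholder.
print(statefulset_pod_hostname("wazuh-manager-master-0", "wazuh-cluster", "MY-NAMESPACE"))
```

Generating the value from the namespace in use, for example in a script run before Kustomize, would avoid the mismatch that stops agents from registering.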

With all those changes needed, and considering that for now the deployment is still designed as a single deployment, without documentation on how to properly change the cluster name and namespace, IMHO Helm would not be a good approach.

For instance, if you set up cert-manager, Longhorn, Vault, Jenkins or a similar deployment, you will have only one deployment in the system namespace. But Wazuh, as of the latest versions, is clearly moving toward multiple clusters, centralized management for many deployments, etc. Pinning it down even more as a single setup could make it harder to maintain and upgrade, as there will always be a lot to change over the simpler base deployment.

So... what about going straight to an operator? Even if it only implements the first level, the deployment, as the product evolves it can surely use all the features of an operator to manage itself and make it easy to deploy many instances on the same cluster.

https://operatorframework.io/

https://developers.redhat.com/articles/2024/01/29/developers-guide-kubernetes-operators

vcerenu · 2025-01-20T15:35:43Z

An issue has been created for task control for the Wazuh Kubernetes MVP v5.0.0:

MVP - Kubernetes - Kubernetes Deployment #959

teddytpc1 added type/enhancement level/task Task issue labels Nov 28, 2024

vcerenu self-assigned this Dec 10, 2024

wazuhci moved this to Pending review in XDR+SIEM/Release 5.0.0 Jan 22, 2025

wazuhci added this to XDR+SIEM/Release 5.0.0 Jan 22, 2025

vcerenu mentioned this issue Jan 23, 2025

MVP - Kubernetes - Kubernetes Deployment #959

Open
