
CCXDEV-12875: Enable Insight Operator entitlements for multi arch clusters #1066

Open: wants to merge 10 commits into master

opokornyy

This PR gathers the architectures used by the cluster's nodes and retrieves an entitlement certificate for each architecture in use. These certificates are then stored in secrets.

If only one architecture is present, the secret is named etc-pki-entitlement.
If multiple architectures are present, secrets are created with names like etc-pki-entitlement-ARCH, where ARCH represents the specific architecture.
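A minimal Go sketch of the naming rule described above. The function name entitlementSecretNames is hypothetical, and the sketch assumes the suffix is the architecture string exactly as collected; the PR description does not say whether that is the Kubernetes label or the SCA API name.

```go
package main

// baseSecretName is the secret name given in the PR description.
const baseSecretName = "etc-pki-entitlement"

// entitlementSecretNames (hypothetical) returns, for each architecture in the
// set, the name of the secret holding its entitlement certificate. A single
// architecture uses the plain name; several architectures each get "-ARCH".
func entitlementSecretNames(architectures map[string]struct{}) map[string]string {
	names := make(map[string]string, len(architectures))
	for arch := range architectures {
		if len(architectures) == 1 {
			names[arch] = baseSecretName
		} else {
			names[arch] = baseSecretName + "-" + arch
		}
	}
	return names
}
```

Using a map[string]struct{} as the input mirrors the set of architectures built in the PR's gathering code, so duplicates across nodes are already collapsed before names are assigned.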

Categories

  • Bugfix
  • Data Enhancement
  • Feature
  • Backporting
  • Others (CI, Infrastructure, Documentation)

Sample Archive

The archive won't change with this feature

Documentation

No documentation update

Unit Tests

  • pkg/ocm/sca/architectures_gather_test.go
  • pkg/ocm/sca/sca_test.go

Privacy

Yes. There is no sensitive data in the newly collected information.

Changelog

No

Breaking Changes

Yes/No

References

https://issues.redhat.com/browse/CCXDEV-12875

openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) on Jan 21, 2025
openshift-ci-robot (Contributor) commented Jan 21, 2025

@opokornyy: This pull request references CCXDEV-12875 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.19.0" version, but no target version was set.

openshift-ci bot requested review from ncaak and tremes on January 21, 2025 12:44
openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) on Jan 21, 2025
)

// Mapping of kubernetes architecture labels to the format used by SCA API
var kubernetesArchMapping = map[string]string{
Author

I am not sure if we can use only the map or if we need a function to return a default value when no value is found.

Contributor

Good question. I think it would make sense to use a default value, e.g. the arch of the node the operator is running on.

Contributor

Totally agree. Let's add a small function that returns the arch of the operator's node as the default value. That could even help with testing.

@opokornyy
Author

There is one case I am not sure how to handle: if a secret has already been created and a node with a different architecture is then added to the cluster, we will have one secret named etc-pki-entitlement and another with an architecture suffix, both containing the same information. I am not sure whether we should clean up the old secret or take some other action. WDYT?
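One possible approach to this question, sketched as a hypothetical Go helper: compute the desired secret names for the current architectures and treat any other etc-pki-entitlement* secret as stale. Whether stale secrets should actually be deleted is left open in this thread; the sketch only shows the set difference.

```go
package main

import (
	"sort"
	"strings"
)

const baseSecretName = "etc-pki-entitlement"

// staleEntitlementSecrets (hypothetical) returns the existing entitlement
// secrets whose names are not in the desired set, sorted for stable output.
func staleEntitlementSecrets(existing []string, desired map[string]struct{}) []string {
	stale := []string{}
	for _, name := range existing {
		// Skip secrets that are not entitlement secrets at all.
		if name != baseSecretName && !strings.HasPrefix(name, baseSecretName+"-") {
			continue
		}
		if _, ok := desired[name]; !ok {
			stale = append(stale, name)
		}
	}
	sort.Strings(stale)
	return stale
}
```

In the scenario above, existing secrets [etc-pki-entitlement] and desired names {etc-pki-entitlement-x86_64, etc-pki-entitlement-aarch64} yield [etc-pki-entitlement] as the only stale secret.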

return
}

klog.Infof("%s secret successfully updated", secretName)
c.StatusController.UpdateStatus(controllerstatus.Summary{
klog.Info("sca secret successfully updated")
Contributor

It would be good to keep the secretName in the log message.

Author

I have moved this log into createSecret and updateSecret, where the secret name is easy to log.

// check & update the secret here
err = c.checkSecret(ctx, &ocmRes)
if err != nil {
klog.Errorf("Error when checking the %s secret: %v", secretName, err)
Contributor

Same here. There's no sca secret, right?

if err != nil {
klog.Errorf("Unable to decode response: %v", err)
return true, err
}
Contributor

I think it would be better to move this out of the exponential backoff function call.

architectures, err := c.gatherArchitectures(ctx)
if err != nil {
klog.Warningf("Gathering nodes architectures failed: %s", err.Error())
}
Contributor

Do we want to continue in case of this error? It seems the architecture will be nil so I am wondering what will happen with the request.

Author

It would most likely fail on the API request with an HTTP 400 response, so returning early here would at least save one request.

@opokornyy
Author

/test e2e-gcp-ovn-techpreview

@opokornyy
Author

/test okd-scos-e2e-aws-ovn

Comment on lines +14 to +15
"ppc": "ppc",
"ppc64": "ppc64",

Suggested change
"ppc": "ppc",
"ppc64": "ppc64",
"ppc": "ppc",
"ppc64": "ppc64",

Only ppc64le is supported. There are no plans to support big endian.

Author

I have used all architectures that are supported by the AMS API, but you are right that OCM currently supports only 4 architectures.
It was also discussed here: https://docs.google.com/document/d/1kT8uzjbmTTN2Zyhfo1jhMNR8q3PhkNJ9yGx83FOFL8k/edit?pli=1&disco=AAABZjOhM1A

Comment on lines +38 to +40
"ppc": "ppc",
"ppc64": "ppc64",

Same comment on ppc/ppc64 support
We only support ppc64le
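A sketch of collecting node architectures restricted to the four that OCM is said to support, so that a big-endian label such as ppc64 is reported instead of being sent to the API. Which four architectures those are is an assumption here (amd64, arm64, ppc64le, s390x), as are the right-hand SCA names; collectArchitectures is hypothetical.

```go
package main

import "sort"

// supportedArchMapping is an ASSUMED Kubernetes-label to SCA-name mapping
// limited to four architectures; big-endian ppc/ppc64 are deliberately absent.
var supportedArchMapping = map[string]string{
	"amd64":   "x86_64",
	"arm64":   "aarch64",
	"ppc64le": "ppc64le",
	"s390x":   "s390x",
}

// collectArchitectures (hypothetical) deduplicates node architectures into a
// set keyed by SCA name, like the map[string]struct{} set in the PR, and
// returns unsupported labels separately, sorted, so they can be logged.
func collectArchitectures(nodeArchs []string) (map[string]struct{}, []string) {
	architectures := make(map[string]struct{})
	unsupportedSet := make(map[string]struct{})
	for _, nodeArch := range nodeArchs {
		if arch, ok := supportedArchMapping[nodeArch]; ok {
			architectures[arch] = struct{}{}
		} else {
			unsupportedSet[nodeArch] = struct{}{}
		}
	}
	unsupported := make([]string, 0, len(unsupportedSet))
	for label := range unsupportedSet {
		unsupported = append(unsupported, label)
	}
	sort.Strings(unsupported)
	return architectures, unsupported
}
```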


architectures := make(map[string]struct{})
for i := range nodes.Items {
nodeArch := nodes.Items[i].Status.NodeInfo.Architecture

There are some situations, such as a MachineSet autoscaling from zero, where the secondary architecture may not be present yet when this code runs.

@aleskandro you might be interested in this and have some insights.

Member

aleskandro commented Jan 23, 2025

Autoscale from zero should work and be architecture-aware with AWS, Azure and GCP IPI clusters.

It is not supported on BM infrastructures (IPI included). I've never looked into the IBM Machine API providers, though.

I'll get a better look at this PR asap

opokornyy force-pushed the CCX12875-multi-arch branch from ee43090 to 0e67f85 on January 27, 2025 14:09
@opokornyy
Author

/test e2e-gcp-ovn-techpreview

@opokornyy
Author

/test insights-operator-e2e-tests


openshift-ci bot commented Jan 28, 2025

@opokornyy: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Required | Rerun command
ci/prow/okd-scos-e2e-aws-ovn | 0e67f85 | false | /test okd-scos-e2e-aws-ovn

@tremes
Contributor

tremes commented Jan 29, 2025

I didn't try this, but the changes look good. Thank you!
/approve
/lgtm

openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) on Jan 29, 2025

openshift-ci bot commented Jan 29, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: opokornyy, tremes

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
7 participants