Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Set resource version for CRs #1823

Merged
merged 1 commit into from
Oct 17, 2024

Conversation

rexagod
Copy link
Member

@rexagod rexagod commented Oct 8, 2024

Set resource version for all monitoring.go GVRs. Earlier, this was not done by the machinery responsible for it, which caused manifest applications that had no resource version present to encounter the following error:

metadata.resourceVersion: Invalid value: 0x0: must be specified for an update

This patch addresses that regression, which was introduced originally in #1575.


EDIT: I believe this is not a regression since 4.17 did not include #1575.

@openshift-ci openshift-ci bot requested review from bertinatto and deads2k October 8, 2024 19:15
@openshift-ci openshift-ci bot added the do-not-merge/invalid-owners-file Indicates that a PR should not merge because it has an invalid OWNERS file in it. label Oct 8, 2024
@rexagod rexagod changed the title regression: Set resource version for CRs fix: Set resource version for CRs Oct 8, 2024
"k8s.io/client-go/dynamic"
)

func TestResourceVersionApplication(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when were and how does this run?

Copy link
Member

@dgrisonnet dgrisonnet Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rexagod I added the CI wiring in openshift/release#57664, if you add something similar to the test-e2e-encryption rule in the Makefile https://github.com/openshift/library-go/blob/master/Makefile#L26-L27, the test will be run in CI

@dgrisonnet
Copy link
Member

The code change looks good to me, pending the resolution of #1823 (comment)

@deads2k
Copy link
Contributor

deads2k commented Oct 9, 2024

/approve
/assign @p0lyn0mial

for lgtm

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 9, 2024
Comment on lines 142 to 106
existingCopy.Object["metadata"], err = runtime.DefaultUnstructuredConverter.ToUnstructured(&existingObjectMetaTyped)
if err != nil {
return nil, false, err
}
Copy link
Member Author

@rexagod rexagod Oct 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is a breaking change, as earlier the method was only used to detect if the metadata fields differed, but not actually mutate them (which was intended).

However, in line with @p0lyn0mial's comment above, introducing this means a more consistent behavior (same as what we do for spec), while also offloading the merging behavior here to ensure the same practices OpenShift-wide when updating labels (merging instead of replacing).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously the metadata from the required obj were merged to the existing obj but the required obj was stored in the database. I thin this behaviour was unexpected, am I right ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, name and namespace fields were replaced, while the others were merged, however, instead of using the merged fields, required ones were given precedence. I didn't modify this behavior in my other PR since this was legacy, albeit less preferable, which is why we had to perform the "merge" on our end without breaking things.

@p0lyn0mial
Copy link
Contributor

@rexagod In general, I have a couple of suggestions for you to consider:

  1. It seems that ApplyUnstructuredResourceImproved method could be made unexported/private. Do you agree?

  2. The calling functions of the applyUnstructuredResourceImproved pass nil for defaultingFunc and equalityChecker, so I would like to remove these arguments from applyUnstructuredResourceImproved.

  3. I would like to remove the Fail-fast if the resource versions differ code, since we merge into the existing object, not the required one.

  4. I would like to remove an extra copy from the ensureGenericsSpec method.

  5. I would like to create a new method, EnsureObjectMetaForUnstructured, where EnsureObjectMeta is currently located.

  6. I would like to move the code that merges metadata into this new function (code from line 105 to line 144).

What do you think?

@rexagod rexagod marked this pull request as draft October 11, 2024 09:03
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 11, 2024
@rexagod
Copy link
Member Author

rexagod commented Oct 11, 2024

It seems that ApplyUnstructuredResourceImproved method could be made unexported/private. Do you agree?

The motivation to move to an unstructured.Unstructured approach was to accommodate for resources that do not have their own Apply<Resource> methods exposed. The ones in the library right now are only there for the sake of backwards-compatibility.

The calling functions of the applyUnstructuredResourceImproved pass nil for defaultingFunc and equalityChecker, so I would like to remove these arguments from applyUnstructuredResourceImproved.

To refer to my same reasoning as above, those are nil for the exposed resource-dedicated methods as that was the original behavior. However, when using ApplyUnstructuredResourceImproved for either (a) unknown resources (ones that do not have a dedicated method here), or (b) more control over the caching mechanism, those are still needed to leverage what the machinery has to offer. For instance, the exposed methods do not use a cache as well, to preserve older behavior, but nonetheless, something the user may still need.

I would like to remove the Fail-fast if the resource versions differ code, since we merge into the existing object, not the required one.

Done.

I would like to remove an extra copy from the ensureGenericsSpec method.

Done.

I would like to create a new method, EnsureObjectMetaForUnstructured, where EnsureObjectMeta is currently located. I would like to move the code that merges metadata into this new function (code from line 105 to line 144).

Done.

@rexagod rexagod marked this pull request as ready for review October 11, 2024 09:14
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 11, 2024
@openshift-ci openshift-ci bot requested review from deads2k and jsafrane October 11, 2024 09:14
@rexagod
Copy link
Member Author

rexagod commented Oct 11, 2024

Thank you for the thorough review, @p0lyn0mial. I'd like to point out that I've refactored monitoring.go code to make it more consistent and readable (for instance, the ensureGenericSpec signature changes), in addition to the much-needed refactors you pointed out owing to the older code smells that existed pre-#1575. PLMK if I missed something.

@@ -88,7 +84,8 @@ func ApplyUnstructuredResourceImproved(
}
existing, err := client.Resource(resourceGVR).Namespace(namespace).Get(ctx, name, metav1.GetOptions{})
if errors.IsNotFound(err) {
want, err := client.Resource(resourceGVR).Namespace(namespace).Create(ctx, required, metav1.CreateOptions{})
var want *unstructured.Unstructured
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since we are returning from this branch anyway I would remove this variable and just simply want, err := client

Copy link
Member Author

@rexagod rexagod Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The err here was shadowed, but I've renamed it to avoid ambiguity.

}
err = runtime.DefaultUnstructuredConverter.FromUnstructured(requiredObjectMeta, &requiredObjectMetaTyped)
// Replace and/or merge certain metadata fields.
didMetadataModify := ptr.To(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you change to a non-pointer variable to avoid dereferencing bellow ? (didMetadataModify := false)

// Deep-check the spec objects for equality, and update the cache in either case.
if defaultingFunc == nil {
defaultingFunc = noDefaulting
}
if equalityChecker == nil {
equalityChecker = equality.Semantic
}
existingCopy, didSpecModify, err := ensureGenericSpec(required, existingCopy, defaultingFunc, equalityChecker)
didSpecModify := ptr.To(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

actual, err := client.Resource(resourceGVR).Namespace(namespace).Update(ctx, required, metav1.UpdateOptions{})
resourcehelper.ReportUpdateEvent(recorder, required, err)
cache.UpdateCachedResourceMetadata(required, actual)
var actual *unstructured.Unstructured
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also remove it and simply do actual, err :=

@@ -18,6 +22,47 @@ func EnsureObjectMeta(modified *bool, existing *metav1.ObjectMeta, required meta
MergeOwnerRefs(modified, &existing.OwnerReferences, required.OwnerReferences)
}

func EnsureObjectMetaForUnstructured(didMetadataModify *bool, existingCopy *unstructured.Unstructured, required *unstructured.Unstructured) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for consistency with the EnsureObjectMeta could you rename the params to modified , existing , required?

@p0lyn0mial
Copy link
Contributor

let's try with #1836 - i have run it on my local machine:

❯ make test-e2e-monitoring
go test -mod=vendor -race ./test/e2e-monitoring/...
ok  	github.com/openshift/library-go/test/e2e-monitoring	(cached)

i'm waiting for the ci job to finish, if it works then we need something similar for the e2e-aws-encryption.

@dgrisonnet
Copy link
Member

/test e2e-aws-encryption (and local invocation) returns a make: Nothing to be done for 'test-e2e-encryption' at the moment

Oh I see, I missed that. If you add the following line, it should fix it:

diff --git a/Makefile b/Makefile                                 
index 43276499c..b466e269a 100644
--- a/Makefile
+++ b/Makefile
@@ -24,6 +24,7 @@ verify-podnetworkconnectivitychecks:
 	$(MAKE) -C pkg/operator/connectivitycheckcontroller verify
 
 test-e2e-encryption: GO_TEST_PACKAGES :=./test/e2e-encryption/...
+test-e2e-encryption: test-unit
 .PHONY: test-e2e-encryption
 
 test-e2e-monitoring

It must have been missed in the past

@dgrisonnet
Copy link
Member

Hmm actually it reminded me of something and it looks like I had a PR for it that never merged: #1470

@p0lyn0mial
Copy link
Contributor

@rexagod #1836 worked, we have two options we can merge it or you can pull the changes to your pr.

@rexagod
Copy link
Member Author

rexagod commented Oct 16, 2024

i'm waiting for the ci job to finish, if it works then we need something similar for the e2e-aws-encryption.

Since my contextual test-e2e-encryption-knowledge is limited and the fact that we've identified at-least a couple of issues [1][2] around the test, I'd reckon it'd be better if one of the previous authors with some spare cycles can take a look.

Otherwise, I'll revisit this as soon as I can find some.

@p0lyn0mial
Copy link
Contributor

Since my contextual test-e2e-encryption-knowledge is limited and the fact that we've identified at-least a couple of issues [1][2] around the test, I'd reckon it'd be better if one of the previous authors with some spare cycles can take a look.

@dgrisonnet has a pr for it - #1470

Copy link
Contributor

@p0lyn0mial p0lyn0mial left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one last comment, otherwise LGTM.

t.Fatalf("Failed to update resource: %v", err)
}
require.False(t, didUpdate)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to GET the updated obj and compare it to an expected one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, PTAL.

@@ -26,5 +26,6 @@ verify-podnetworkconnectivitychecks:
test-e2e-encryption: GO_TEST_PACKAGES :=./test/e2e-encryption/...
.PHONY: test-e2e-encryption

test-e2e-monitoring:
test-e2e-monitoring: GO_TEST_PACKAGES :=./test/e2e-monitoring/...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful to add verbosity in case the test starts failing in CI in the future.
You can do that by adding:

test-e2e-encryption: GO_TEST_FLAGS += -v

Copy link
Member Author

@rexagod rexagod Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added similar flags in addition to this, however, one that seemed quite off was the 4 hour timeout limit for the encryption test. Is that taking into account certain build overheads (which only happen in the CI)? I've added a 5 minute timeout for the monitoring test (the test itself takes less than a minute), but LMK if that should be increased to a similar amount.

Makefile Outdated
.PHONY: test-e2e-encryption

test-e2e-monitoring:
test-e2e-monitoring: GO_TEST_PACKAGES :=./test/e2e-monitoring/...
test-e2e-encryption: GO_TEST_FLAGS += -v
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to test-e2e-monitoring

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! 🤦🏼

Makefile Outdated
test-e2e-monitoring:
test-e2e-monitoring: GO_TEST_PACKAGES :=./test/e2e-monitoring/...
test-e2e-encryption: GO_TEST_FLAGS += -v
test-e2e-encryption: GO_TEST_FLAGS += -timeout 5m
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this entry

Makefile Outdated
test-e2e-monitoring: GO_TEST_PACKAGES :=./test/e2e-monitoring/...
test-e2e-encryption: GO_TEST_FLAGS += -v
test-e2e-encryption: GO_TEST_FLAGS += -timeout 5m
test-e2e-encryption: GO_TEST_FLAGS += -p 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this entry

Makefile Outdated
@@ -24,7 +24,15 @@ verify-podnetworkconnectivitychecks:
$(MAKE) -C pkg/operator/connectivitycheckcontroller verify

test-e2e-encryption: GO_TEST_PACKAGES :=./test/e2e-encryption/...
test-e2e-encryption: GO_TEST_FLAGS += -v
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not change flags for test-e2e-encryption in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ping

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove changes to test-e2e-encryption? this part will be updated in #1470

require.False(t, didUpdate)
existingResource, err = dynamicClient.Resource(gvr).Namespace(resource.GetNamespace()).Get(context.TODO(), resource.GetName(), metav1.GetOptions{})
require.NoError(t, err)
require.Equal(t, gotUnstructured.UnstructuredContent(), existingResource.UnstructuredContent())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does require.Equal know how to compare maps reliably ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of comparing to the obj we received from the apply function shouldn't we compare to the expected one ?

Copy link
Contributor

@p0lyn0mial p0lyn0mial Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected object is the one we created at the beginning of the test (resource) which was later updated at line 90, right ? (excluding fields like ResourceVersion)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by comparing against the expected object we know there no other (unexpected) modifications were made.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does require.Equal know how to compare maps reliably ?

ok, it looks like it uses reflect.DeepEqual under the hood.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer comparing against the expected object. Now, I think we should compare both the received object and the stored object against the expected one, since callers might use the received object as well, wdyt?

I don't expect many fields to be changed by the server. We could just overwrite or clear the ones we know will be provided by the server, such as RV.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, can you clarify what the expected, received, and stored objects are in this context?

To me (and I'm probably wrong here), unstructuredResource seems the same as the expected and stored object (as it's what we originally create in the beginning, and what we store?), whereas gotUnstructured is the one we receive?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I believe we are comparing the received object with the expected one here: 17a435d (#1823)? Not sure what the stored object is though (do you mean the one in the cluster?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a patch that compares the expected (original) resource to the existing (cluster) and the received (returned by ApplyUnstructuredResourceImproved).

@rexagod rexagod force-pushed the regression-1575 branch 5 times, most recently from 8c680f3 to 9efe8f5 Compare October 17, 2024 15:01
Set resource version for all `monitoring.go` GVRs. Earlier, this was not
done by the machinery responsible for it, which caused manifest
applications that had no resource version present to encounter the
following error:
```
metadata.resourceVersion: Invalid value: 0x0: must be specified for an update
```
This patch addresses that regression, which was introduced originally in openshift#1575.

Signed-off-by: Pranshu Srivastava <[email protected]>
test-e2e-monitoring:
test-e2e-monitoring: GO_TEST_PACKAGES :=./test/e2e-monitoring/...
test-e2e-monitoring: GO_TEST_FLAGS += -v
test-e2e-monitoring: GO_TEST_FLAGS += -timeout 5m
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in a follow-up pr i would remove the timeout (let's use the default timeout) and the -p flag.

Copy link
Contributor

openshift-ci bot commented Oct 17, 2024

@rexagod: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@p0lyn0mial
Copy link
Contributor

/lgtm

@rexagod many thanks for the PR.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 17, 2024
Copy link
Contributor

openshift-ci bot commented Oct 17, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, p0lyn0mial, rexagod

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit c32b334 into openshift:master Oct 17, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants