Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incompatible change of OCM-Component-Descriptor-Normalisation between OCM-CLI v0.18.0 vs v0.19.0 #1214

Closed
ccwienk opened this issue Dec 27, 2024 · 30 comments · Fixed by #1218
Assignees
Labels
area/ipcei Important Project of Common European Interest kind/bugfix Bug
Milestone

Comments

@ccwienk
Copy link
Contributor

ccwienk commented Dec 27, 2024

Context

Our delivery-setup encompasses the replication of OCM-Component-Descriptors between multiple OCM-Repositories (dev, staging, ..). We sign our root component-descriptors at the "left-most" OCM-Repository, and validate signatures after replication into "downstream" OCM-Repositories to ensure integrity of replicated contents.

After upgrading from OCM-CLI v0.18.0 to v0.19.0, validation failed for downstream OCM-Repositories (while, interestingly, consistently not failing for the "left-most" OCM-Repository). I assume this deviance might stem from normalisation(s) that happened during downstream replication.

At any rate, it can consistently be shown that ocm hash componentversions command outputs different hash-digests for the same componentversion.

Reproducer

Assuming ocm-v0.18.0 and ocm-v0.19.0 are available from PATH, and are copies of OCM-CLI equal to respective version-suffixed, and furthermore assuming OCM-CLI is provided w/ required credentials, the following script will yield different hash-digests:

ocm-v0.18.0 \
 ocm hash componentversions \
 --output yaml \
--lookup OCIRegistry::europe-docker.pkg.dev/sap-se-gcp-k8s-delivery/releases-staging-private \
  github.com/gardener/gardener:v1.110.1

# digest: b40006cb7a995be00fdc9c5f85cc4239ee7387288697a9ce2e29482a579bf66d

ocm-v0.19.0 \
 ocm hash componentversions \
 --output yaml \
--lookup OCIRegistry::europe-docker.pkg.dev/sap-se-gcp-k8s-delivery/releases-staging-private \
  github.com/gardener/gardener:v1.110.1

# digest: a2d421e0ede2f34905a1006872e07017d2cd435a0680f3f378a593a5c7ade097

Wheareas the following calls (same assumptions as before) will yield equal digests (as expected):

ocm-v0.18.0 \
 ocm hash componentversions \
 --output yaml \
--lookup OCIRegistry::europe-docker.pkg.dev/sap-se-gcp-k8s-delivery/releases-private \
  github.com/gardener/gardener:v1.110.1

# digest: b40006cb7a995be00fdc9c5f85cc4239ee7387288697a9ce2e29482a579bf66d

ocm-v0.19.0 \
 ocm hash componentversions \
 --output yaml \
--lookup OCIRegistry::europe-docker.pkg.dev/sap-se-gcp-k8s-delivery/releases-private \
  github.com/gardener/gardener:v1.110.1

# digest: b40006cb7a995be00fdc9c5f85cc4239ee7387288697a9ce2e29482a579bf66d

The OCM-Repository from the second two-tuple of commands is the "left-most" OCM-Repository, whereas the OCM-Repository from the first two-tuple of commands is target of replication (from left-most repository).

Observed behaviour:

OCM-CLI v0.19.0 and v0.18.0 will yield different normalisations / hash-digests for equal component-descriptor-trees (see above)

Expected behaviour:

OCM-CLI v0.19.0 and v0.18.0 should yield equal normalisations / hash-digest for equal component-descriptor-tree.

@ccwienk ccwienk added the kind/bugfix Bug label Dec 27, 2024
@github-actions github-actions bot added the area/ipcei Important Project of Common European Interest label Dec 27, 2024
@jakobmoellerdev
Copy link
Contributor

Im assuming this is due to #1026
However Im struggling what to recommend now.
There should be an output argument for the normalized component descriptor called outfile in ocm hash which you can use to compare the two results. would be great if you could check the diff between them and let me know what youre regressing on.

@ccwienk
Copy link
Contributor Author

ccwienk commented Dec 29, 2024

digests-ocmcli.tar.gz
I uploaded the outputs of mentioned invocations of OCM-CLI

@jakobmoellerdev
Copy link
Contributor

As suspected, the diff (https://www.diffchecker.com/hsZgn13L/ for some clarity on the relevant diff) explains and redirects the issue back to the change mentioned above:

              "extraIdentity": [
                {
                  "version": "v1.11.3"
                }
              ]

in v0.18

now becomes

 "extraIdentity": null

in v0.19

The normalization now doesnt encode the version in by default anymore which is a breaking change that killed the signature. this has to be adjusted again (valid bug). the pr tried to circumvent this break with the following

So far the version was implicutly added to the extraIdentity of a CV if the rest is not
unique. In the future the extraIdentity should be explicitly set to be unique. Therefore,
the version was now implicitly added to the extraIdentity. This has two problems:

it was only done, if there was already an identity map set,
it was always done, when serializing a CD

To be comparable with older signed component versions, such defaulting may only be done
if the content of a component version is changed, otherwise the signature would be brocken.

This is only after some superficial checking from me ... but I believe this behavior regressed because the content check for the component version was only targeting handling of new component versions but accidentally also changed how existing component versions are normalized.

I am currently theorizing that since this behavior of hashing the component version is an implicit expectation on the signing with jsonNormalization/v1 this might prove difficult... basically we can never change the version identity defualting ever because it is implicitly part of the signing contract. meaning that we have to keep the old behavior the default (encode it) and have to explicitly toggle it off if wanted. What a mess... still investigating further

@jakobmoellerdev jakobmoellerdev moved this from 🆕 ToDo to 📋 Next-UP in OCM Backlog Board Dec 30, 2024
@jakobmoellerdev jakobmoellerdev added this to the 2025-Q1 milestone Dec 30, 2024
@jakobmoellerdev
Copy link
Contributor

for confirmation, could you also check the digests when you run with -N jsonNormalisation/v2 and let me know if the digests match up?

@ccwienk
Copy link
Contributor Author

ccwienk commented Dec 30, 2024

digests-ocmcli.tar.gz
I created the same outputs again, setting the v2-normalisation flag (those files I named w/ a v2-infix)

This will yield yet different hash-digests..

@jakobmoellerdev
Copy link
Contributor

I believe the issue is that the logic that was added will only work if the repository is the same. If we do a transfer between repositories, the modification flag on the component version will be true as the component version is not yet present in the staging repo.

Then, when the component version is added into the staging repository, it thinks that we are creating a new version instead of just transferring an old version. This then leads to the new logic kicking in and not writing the extraIdentity by default anymore where it previously did, which then triggers this issue.

Now we have to decide what to do:

  1. We potentially find a way so that the extraIdentity is left untouched during a transfer if a custom flag is set (for adding new component versions, its called preserve-signatures and was recently added)
  2. We implicitly do not modify any version during the transfer (this is currently what Im targeting). For this approach I am not sure how feasible it is. Maybe we can pass the modification false to the transfer, but that might break other logic, still investigating
  3. We revert the change

Still investigating further now.

@ccwienk
Copy link
Contributor Author

ccwienk commented Dec 30, 2024

@jakobmoellerdev : what would happen if we would always (implicitly) add version to artefact-identity + handle the case of version being part of extra-Identity to be equivalent to it not being added to extraID?

i.e. adding version to extraID basically would be ignored (unless version's value differs from version attribute, in which case this would result in an error).

if there is an incompatible change in ocm-cli, we will have to define an upgrade-path (note that our full delivery-chain has a turnaround of roughly two weeks, so ideally such an upgrade-path needs to be automation-friendly).

@jakobmoellerdev
Copy link
Contributor

jakobmoellerdev commented Dec 30, 2024

what would happen if we would always (implicitly) add version to artefact-identity + handle the case of version being part of extra-Identity to be equivalent to it not being added to extraID?

This is the defacto rule now (even though with a slight deviation) for the normalization v1 and v2, but only in case we can not uniquely identify a component version without it. Because that got removed in the recent change, the break appeared. Because this is part of the normalization in both v1 and v2, the only way we can ensure backwards compatibility is by reverting how this normalization is handled and then creating a v3 normalization that doesnt have this logic in its expecation.

Basically we need to Ensure that json normalization v1 & v2 always assume that extraIdentity will have a field called version defaulted if both the extraIdentity was not set before and there is another resource with the same name (but not the same version). This is now (even though it was never intended) a part of the Normalization Contract as it is implicit behavior.

@jakobmoellerdev
Copy link
Contributor

after verifiyng the component descriptors manually it turned out that one of the descriptors had an extraIdentity version field explicitly set. However during signature calculation in 0.19.0 and above (main) this was niled out so this is definetly buggy. I believe this should be fixeable, checking where this breaks now

@jakobmoellerdev
Copy link
Contributor

considering that a simple ocm get cv used to return:

ocm0.18.0 get resources --lookup OCIRegistry::europe-docker.pkg.dev/sap-se-gcp-k8s-delivery/releases-staging-private   github.com/gardener/gardener:v1.110.1 nginx-ingress-controller
NAME                     VERSION IDENTITY            TYPE     RELATION
nginx-ingress-controller v1.9.6                      ociImage external
nginx-ingress-controller v1.11.3 "version"="v1.11.3" ociImage external

and now returns

ocm get resources --lookup OCIRegistry::europe-docker.pkg.dev/sap-se-gcp-k8s-delivery/releases-staging-private   github.com/gardener/gardener:v1.110.1 nginx-ingress-controller
NAME                     VERSION IDENTITY            TYPE     RELATION
nginx-ingress-controller v1.11.3 "version"="v1.11.3" ociImage external
nginx-ingress-controller v1.9.6  "version"="v1.9.6"  ociImage external

is concerning because the v1.11.3 identity attribute does not actually exist in the repository. I believe that get cv should only return what is in the repository, but the logic seems to have touched rendering of all component versions, regardless if it comes from a repository or not.

Current Assumption: Because ocm 0.19.0 now has the version encoded always, it also cannot determine if the identity previously existed or not. Then it will fail to check if it needs to add the version to the identity during the signature check. This will then break.

@jakobmoellerdev
Copy link
Contributor

To fix this issue I now managed to return backwards compatibility to v1+v2 normalizations with a legacy code patch and by introducing a new v3 normalisation algorithm that doesnt need to do this. Additionally I added some test cases to make sure this doesnt happen again.

Apart from the extra identity defaulting v2 and v3 are identical. Now also old component versions will be verifiable if using the correct algorithm.

I will backport the v1/v2 correction to v0.19 for sure. I will discuss with the team if we even will offer the v3 normalization as part of the hotfix but that shouldnt matter for this bug. If not, then v3 will be available with v0.20

@mandelsoft
Copy link
Contributor

The problem is not the normalization, which should be completely fine (at least for this scenario, here).
The problem is the defaulting of the extra identity accidently done during the transport, which has been introduced with #1026.

We have to assure, that this defaulting is only done during component version composition for creation, not during the transport process.

So far, the transport tool checks for a non breaking change during the transport, and then uses the Modification option when composing the target component version. Unfortunately, this option also enables the defaulting. Therefore, the transport changes signature relevant information of the component descriptor, which finally breaks the hash and the signature.

A solution is shown in #1223, which disabled this defaulting by using a new modification option.
Another possibility would be to change the transport behavior by not using the SetResource method anymore. But this would be even more complicated and complex.

Unfortunately, I don't have access to the SAP internal staging repos anymore, so I cannot test this change for the actual issue.
I added a test for transports using CTFs, only.

@jakobmoellerdev
Copy link
Contributor

I checked the normalization explicitly via test case and confirmed it is indeed not fine. The normalized json should not have changed, but it did. The extra identity defaulting in v1/v2 is part of the normalization contract and thus also of the signature. It is much easier for users to migrate to the new normalization algorithm than to have to check for "should modify" everytime they do a transport.

We simply cannot continue increasing usage of "should modify" in every single command that touches the descriptors. Not changing the descriptors should be the default but as you mentioned its already messy enough with SetResource as is. I do not want to change behavior anymore without incrementing versions.

We can also adopt the flag to prohibit modifying signature relevant elements, but it should still not have broken the existing component versions after the default happened in the transport.

@mandelsoft
Copy link
Contributor

mandelsoft commented Jan 6, 2025

The normalized json only changes, if the extra identity field of the CD is changed. This is not a problem of the normalization, this is the intention of the normalization.

In the old version (18) there was no defaulting when calling SetResource, this has been changed with 19. Now the transport indeed changes the extraIdentity field in the CD. This for sure MUST change the normalization version. So, the normalization works perfectly fine.

The problem is the change of the extraIdentity field during the transport, which MUST NOT happen. So, the fix is to prohibit this change of the signature relevant CD fields during the transport instead of changing the normalization.

The defaulting should not be part of the normalization and it is not part of the normalization. The existing call to the Default method of the serialization version just
initializes the slices to prevent different handling of an empty slice and no slice. It does not change the content.

@jakobmoellerdev
Copy link
Contributor

If you compare ocm's output of v0.18.0 and v0.19.0 on the same component version, the hash changes. This is not intended, but now unfortunately part of the contract.

Since between v0.18.0 and v0.19.0 the digest can not have changed on its own, this cannot be due to the transport. (because the transport was only triggered once and that should have changed the digest only once, however I can reliable "switch" between the digests with the same component version). Thus the normalization is also broken. It is NOT fine for the hash to suddenly change from older to newer versions, even if the behavior is potentially incorrect. This is why v3 exists.

@mandelsoft
Copy link
Contributor

mandelsoft commented Jan 6, 2025

The hash changed AFTER the transport when using version 19 as explained in the issue. It did not happen with version 18. Again, this is the problem. It occurs because of an error in #1026. The behavior of version 19 should NOT be preserved, because it is WRONG.

@jakobmoellerdev
Copy link
Contributor

The behavior of v0.18.0 cannot be restored for component versions which already have the defaults set (already transported component versions which were either transported with v0.19.0 like in this example, or that were transported before but with explicit extra identity attributes matching the version defaulting done in v0.18.0), no matter if explicit or not, because that is part of the normalization.

@jakobmoellerdev
Copy link
Contributor

To be clear, Im not arguing against not defaulting in the transport with a flag (the change you introduced) as this will fix it defaulting on the transport in the first place. But still we cannot change the hash calculation for old component versions.

@mandelsoft
Copy link
Contributor

mandelsoft commented Jan 6, 2025

I don't understand what you mean.

If you transport something with version 19, which has been created with v18, is signed AND has been based on the implicit extraIdentity handling for resources with the same name, then the resulting version in the target is corrupted, you cannot verify it anymore without fixing the defaulted version in the extraIdentity of the concerned resources in the CD. Therefore, everything transported with v19 is potentially corrupted and should be re-transported with a fixed version, before it can be used.

This has nothing to do with the normalization.

We have to fix the transport handling, not the normalization.

@jakobmoellerdev
Copy link
Contributor

I don't understand what you mean.

Assume you have a component that already contains an extra identity with the version defaulting in place, as well as a component without it. In this case, v1 and v2 of the jsonNormalization will equalize them and will be equal in v0.18.0 and below, whereas in v0.19.0 they are not. While in itself a good thing, this still is a backwards incompatible change.

You are surely right that we have to fix the transport handling (too) to avoid future component verisons having no extraIdentity set in the component descriptor. Nevertheless, the old hash calculation method must be preserved and versioned off correctly, because this could also have happened intentionally.

For sure, the transport handling adjustment will allow to actually not have the defaulting in place, which was buggy in 0.19.0. Nevertheless, the hash behavior of 0.19.0 must not change for old component versions which were not modified and for which there is no new signature. It does not matter if the signature methodology is actually the correct one, it needs to be consistent. To be correct, we should urge to resign / normalize with v3.

Thus we need:

  1. No Transport should touch the extra identity (or any other signature relevant attribute) without explicitly being allowed to (your fix)
  2. No existing component version should ever have its hash value changed for jsonNormalisation/v1 & jsonNormalisation/v2 even if the method is flawed due to implicit defaulting between old and new CLI versions
  3. jsonNormalisation/v3 must exist to encourage and eventually enforce moving away from v1/v2 and to get rid of the flawed defaulting.

@mandelsoft
Copy link
Contributor

mandelsoft commented Jan 6, 2025

Assume you have a component that already contains an extra identity with the version defaulting in place, as well as a component without it. In this case, v1 and v2 of the jsonNormalization will equalize them and will be equal in v0.18.0 and below, whereas in v0.19.0 they are not. While in itself a good thing, this still is a backwards incompatible change.

No, they don't!

The normalizations as well as the hashes are different for both CVs, but identical each for v18 and v19.

If you have two component versions:

name: github.com/mandelsoft/test1
version: 1.0.0
provider:
  name: mandelsoft.org
resources:
  - name: multi
    type: plainText
    version: v1
    input:
      type: utf8
      text: test data
  - name: multi
    type: plainText
    version: v2
    input:
      type: utf8
      text: extended test data

and

name: github.com/mandelsoft/test1
version: 1.0.0
provider:
  name: mandelsoft.org
resources:
  - name: multi
    type: plainText
    version: v1
    extraIdentity:
      version: v1
    input:
      type: utf8
      text: test data
  - name: multi
    type: plainText
    version: v2
    extraIdentity:
      version: v2
    input:
      type: utf8
      text: extended test data

The CD for the first one does NOT contain the extraIdentty if created with v18:

omponent:
  componentReferences: []
  creationTime: "2025-01-06T17:20:36Z"
  name: github.com/mandelsoft/test1
  provider: mandelsoft.org
  repositoryContexts: []
  resources:
  - access:
      localReference: sha256:916f0027a575074ce72a331777c3478d6513f786a591bd892da1a577bf2335f9
      mediaType: application/octet-stream
      type: localBlob
    digest:
      hashAlgorithm: SHA-256
      normalisationAlgorithm: genericBlobDigest/v1
      value: 916f0027a575074ce72a331777c3478d6513f786a591bd892da1a577bf2335f9
    name: multi
    relation: local
    type: plainText
    version: v1
  - access:
      localReference: sha256:920ce99fb13b43ca0408caee6e61f6335ea5156d79aa98e733e1ed2393e0f649
      mediaType: application/octet-stream
      type: localBlob
    digest:
      hashAlgorithm: SHA-256
      normalisationAlgorithm: genericBlobDigest/v1
      value: 920ce99fb13b43ca0408caee6e61f6335ea5156d79aa98e733e1ed2393e0f649
    name: multi
    relation: local
    type: plainText
    version: v2
  sources: []
  version: 1.0.0
meta:
  schemaVersion: v2

Therefore, the hashes are different. The normalization never changes the content, neither in v18 nor v19.

$ ../../tmp/ocm.v18 hash cv ./ctf-impl
COMPONENT                   VERSION HASH
github.com/mandelsoft/test1 1.0.0   78d71c1c582795de6e26429c47aeeaf7bf3131d4ceb787c17c426bc8f35af55d
$ ../../tmp/ocm.v18 hash cv ./ctf-expl
COMPONENT                   VERSION HASH
github.com/mandelsoft/test1 1.0.0   82f1c93d2b93442be75058d633e77605d4226f8be59d74820d985da3a70b23c3
$ ocm hash cv ./ctf-expl
COMPONENT                   VERSION HASH
github.com/mandelsoft/test1 1.0.0   82f1c93d2b93442be75058d633e77605d4226f8be59d74820d985da3a70b23c3
$ ocm hash cv ./ctf-impl
COMPONENT                   VERSION HASH
github.com/mandelsoft/test1 1.0.0   78d71c1c582795de6e26429c47aeeaf7bf3131d4ceb787c17c426bc8f35af55d

If you create the implicit version with v19, you get indeed the same as for the explicit one.

ocm hash cv ./ctf-expl ./ctf-impl-v19/
COMPONENT                   VERSION HASH
github.com/mandelsoft/test1 1.0.0   82f1c93d2b93442be75058d633e77605d4226f8be59d74820d985da3a70b23c3
github.com/mandelsoft/test1 1.0.0   82f1c93d2b93442be75058d633e77605d4226f8be59d74820d985da3a70b23c3

but this is correct, because the resource meta datas are identical because of the defaulting done by v19 during the COMPOSITION.

So, the normalization is correct and not the problem, the problem is the wrong behavior during the transport, switching from the implicit to the explicit representation.

@jakobmoellerdev
Copy link
Contributor

jakobmoellerdev commented Jan 6, 2025

Im not gonna continue going back and forth with you here. The change of the algo is not about the case you described above. Its about component versions that already contain the extra identity. youre looking at components that do not yet contain the extra identity (which is fine for new components, but not already existing ones).

The issue is that you assume component versions transported with v0.19.0 are "corrupt". However, because its part of a GA delivery, this is now part of the contract. Its GA for a reason.

It is simply not acceptable that there is a component version in repositories now (which btw could have been generated in other ways than through the transport command) that can output different results based on which ocm cli version you use.

Of course we could issue a warning that new component versions after transport might contain the extra identity field or not, but the issue is still that the component descriptor itself DID contain the extra identity field before. you simply cannot remove this behavior.

@mandelsoft
Copy link
Contributor

GA or not GA. I introduced a bug, which must be fixed. The behavior is NOT correct.

If you transport such a CV created and signed with v18 and transport it with v19 the target is definately corrupted.
It cannot be fixed by a new tool version, because you cannot distinguish this case from an explicitly given extraIdentity.

Even your fix, which by the way violated the basic OCM assumption to never change signature relevant content (especially during signing) cannot change this fact, because both cases (implicitly added by the erroneous transport tool or explicitly given) are using the same normalization method.

@jakobmoellerdev
Copy link
Contributor

First of all. Your test case is wrong. you are testing with a component constructor and you need to test with existing component versions with extra Identity set to {}, for example:

component:
  componentReferences: []
  creationTime: "2025-01-07T11:35:56Z"
  name: github.com/jakobmoellerdev/hash
  provider: github.com/jakobmoellerdev
  repositoryContexts: []
  resources:
  - access:
      imageReference: ghcr.io/stefanprodan/podinfo:6.7.1
      type: ociArtifact
    name: podinfo
    extraIdentity: {}
    relation: external
    type: ociImage
    version: 6.7.1
  - access:
      imageReference: ghcr.io/stefanprodan/podinfo:6.3.1
      type: ociArtifact
    name: podinfo
    extraIdentity: {}
    relation: external
    type: ociImage
    version: 6.3.1
  sources: []
  version: 1.0.0
meta:
  schemaVersion: v2

This you can then use to test your transfer and also the hash command. Also you can not depend on get cv because that will also run through the defaulting.

You will observe that not only does #1223 achieve nothing (because v0.19.0 does not actually default the default identity, as intended before), v0.18 and below (we tested down until v0.10) DO the defaulting of the extra Identity during transport.

As such the only behavior that can work, is to introduce a break in the normalization and move on. The defaulting behavior introduced way back was implicitly added to the contract

@github-project-automation github-project-automation bot moved this from 🔍 Review to 🍺 Done in OCM Backlog Board Jan 8, 2025
jakobmoellerdev added a commit to jakobmoellerdev/ocm that referenced this issue Jan 8, 2025
…ion/v1 as well as jsonNormalisation/v2 (open-component-model#1218)

<!-- markdownlint-disable MD041 -->
#### What this PR does / why we need it

Package jsonv3 provides a normalization which is completely based on the
abstract (internal) version of the component descriptor and is therefore
agnostic of the final serialization format. Signatures using this
algorithm can be transferred among different schema versions, as long as
is able to handle the complete information using for the normalization.
jsonv2 is the predecessor of this version but had internal defaulting
logic that is no longer included as part of this normalization. Thus v3
should be preferred over v2. Note that between v2 and v3 differences can
occur mainly if the "extra identity" field is not unique, in which case
the v2 normalization opinionated on how to differentiate these items.
This no longer happens in v3, meaning the component descriptor is
normalized as is.

v2 and v1 were adjusted to accomodate the old(but new because forgotten)
legacy behavior in legacy.go. Without this, old signatures would not
work. This means this should be (at least partially) back-ported to the
last minor versioned released without this correction.

#### Which issue(s) this PR fixes
<!--
Usage: `Fixes #<issue number>`, or `Fixes (paste link of issue)`.
-->
This issue fixes open-component-model#1214
and supercedes open-component-model#1217 as
a better solution longterm (by getting rid of the old normalization) and
shortterm (by achieving full backwards compatibility + introducing a
simple test case)

Note that this changes the default normalization algorithm to be
`jsonNormalisation/v3` instead of `jsonNormalisation/v1` as it is
important for users to migrate as soon as possible.
@jakobmoellerdev
Copy link
Contributor

We will soon release v0.19.1 as a release candidate so that this can be tested. To hash existing component versions, please use

 ocm hash componentversions \
 --output yaml \
 --lookup OCIRegistry::europe-docker.pkg.dev/sap-se-gcp-k8s-delivery/releases-private \
  github.com/gardener/gardener:v1.110.1
 -N jsonNormalisation/v1

this will return the same output as previous CLIs.

please switch to the new normalization algorithm jsonNormalisation/v3 at your earliest convenience for hashing and validating (the new default)

ikhandamirov pushed a commit that referenced this issue Jan 8, 2025
…ion/v1 as well as jsonNormalisation/v2 (#1218) (#1230)

Note that this introduces a json normalization algorithm v3 which
technically will break a minor release. however this is needed as we
currently broke hashing and need to offer a working alternative. thus it
should have been part of v0.19.0 already:
#1214

<!-- markdownlint-disable MD041 -->
#### What this PR does / why we need it

Package jsonv3 provides a normalization which is completely based on the
abstract (internal) version of the component descriptor and is therefore
agnostic of the final serialization format. Signatures using this
algorithm can be transferred among different schema versions, as long as
is able to handle the complete information using for the normalization.
jsonv2 is the predecessor of this version but had internal defaulting
logic that is no longer included as part of this normalization. Thus v3
should be preferred over v2. Note that between v2 and v3 differences can
occur mainly if the "extra identity" field is not unique, in which case
the v2 normalization opinionated on how to differentiate these items.
This no longer happens in v3, meaning the component descriptor is
normalized as is.

v2 and v1 were adjusted to accomodate the old(but new because forgotten)
legacy behavior in legacy.go. Without this, old signatures would not
work. This means this should be (at least partially) back-ported to the
last minor versioned released without this correction.

#### Which issue(s) this PR fixes
<!--
Usage: `Fixes #<issue number>`, or `Fixes (paste link of issue)`. -->
This issue fixes #1214
and supercedes #1217 as
a better solution longterm (by getting rid of the old normalization) and
shortterm (by achieving full backwards compatibility + introducing a
simple test case)

Note that this changes the default normalization algorithm to be
`jsonNormalisation/v3` instead of `jsonNormalisation/v1` as it is
important for users to migrate as soon as possible.

This is a backport of
a93ecca
in #1218
@mandelsoft
Copy link
Contributor

The problem is even much more complicated.

It consists of two problems:

  • the one I observed with the transport logic, which was introduced with v0.19.0. This is hopefully completely fixed with fix: target option to disable extra identity defaulting in transport tool #1223
  • a very very old problem with the conversion from the serialization format to the internal format, which was uncovered with v0.19.0, which I didn't saw. But this is unfortunately not a problem with the normalization.

The problem is caused by a typical error using the Golang range iterator in the default.go sources of the serialization versions.

func DefaultResources(component *ComponentDescriptor) {
	for i, res := range component.Resources {
		if res.Relation == v1.LocalRelation && len(res.Version) == 0 {
			component.Resources[i].Version = component.GetVersion()
		}

		id := res.GetIdentity(component.Resources)
		if v, ok := id[SystemIdentityVersion]; ok {
			if res.ExtraIdentity == nil {
				res.ExtraIdentity = v1.Identity{
					SystemIdentityVersion: v,
				}
			} else {
				if _, ok := res.ExtraIdentity[SystemIdentityVersion]; !ok {
					res.ExtraIdentity[SystemIdentityVersion] = v
				}
			}
		}
	}
}

This method was called in v0.18.0 and removed in v0.19.0.
This was the first problem, because this method changed the CD when it was transformed back to a serialization format.
This may be fine (although is not really nice, but we cannot change it because it is in since the very first beginning).
Because signing always means to rewrite the CD it is basically not a problem, and it is not bothering the transport logic.
The idea was to get rid of this crazy implicit defaulting of the version extraIdentity as soon as possible. It SHOULD adapt the extraIdentity to always contain the version as explicit field if required. So far, so good. If it would do this, everything would be perfect if add the defaulting again as fix.

Unfortunately it does NOT do this. Because the loop variable res is always a copy AND the array fields are structs and not pointers AND the extra identity is a map (which is no value in the sense of Go, but a pointer), the initialization of the extraIdentity field never reached the original CD. Therefore, the defaulting was only really done, if the field originally contained a (potentially) empty map.

Because the ocm lib and the CLI never created an empty map, everything was fine in our tests, but Christian uses a python implementation, which always serializes an empty map. So the defaulting was never done for CV created with the ocm lib, and always when signing a CV created with python. If there are some resources with a nil map and others with an empty map, then the result will be even more confusing, and I think the fix with #1218 would not work correctly.

The default itself would be a good idea to be prepared for getting rid of this implicit version handling in the future. However, the fact that is was rarely done, destroys this effect again.

So, a compatible fix would be just to add this function again, but with the current behavior (fixing the code to make this obvious).

This is the first thing we should do. And then we could clarify how we could improve the messy situation.
But I'm still not convinced, that corrupting the normalization method is really a good idea, because it introduces an error at a second place to fix the original error.

@jakobmoellerdev
Copy link
Contributor

I appreciate your thoughts here but I really have to disagree.

  1. We can and should not change the behavior again to reintroduce defaulting on the descriptor. Your plan was to remove it and it should be kept like that
  2. This change and also the behavior change on old descriptors should just be versioned off and kept in the old versions so that new versions dont do this.
  3. The fact that you want to reintroduce a defaulting to get rid of implicit handling is a nightmare to code and understand against, I rather have a normalization that does not do any of this defaulting, and keeps the code clean.

The decision will ultimately lay with the team here, and I will bring it up for discussion. Expect an outcome tomorrow.

One more thing: If it turns out that we will stay with the new normalization algorithm as intended we can no longer accept #1223 because it now contains an additional commit you added to fix "two things in one PR" again, even after the original was approved. Please refrain from doing this.

@mandelsoft
Copy link
Contributor

mandelsoft commented Jan 8, 2025

Unfortunately, I see only two possibilities, may be you can find better ones:

  1. we stick to the faulty defaulting done in the format conversion, then we don't have to change anything else (for this error flavor).
  2. we follow your approach to error-change the existing normalization methods with changing the component descriptor to compensate the change to a conversion without defaulting and introduce a new default normalization.
    a. we could do this with change of the CV as proposed
    b. we could also do this without a change to the CV, but calling the final norm method on a defaulted copy of the CD (may be this would be better because it does not violate the normalization contract)

I agree with you, that on the long term solution 1 seems to be no good idea. BUT:

  • On the long term we want to get rid of the defaulting at all, this must be the final goal.
  • So far, we have to keep the defaulting in the meta data API querying the effective identity (the most ugly part of the defaulting), to handle the old content. In parallel we want to assure that new CVs will be enforced to always use explicit extraIdentities. As long as we have to keep the defaulting elsewhere, there should be no problem to keep it in the conversion, also.
  • For CVs with always complete extraIdentities, the defaulting in the conversion has no effect anymore, therefore it does not bother if it is still there, or not.
  • To really get rid of the complete defaulting coding we have to live with not being able to handle very old CVs anymore. It is not covered by any available version information. This may be done after a long time of creating complete identities for new CVs in parallel with still using the defaulting (at least everywhere else). Unfortunately we cannot distinguish between the old and the new behavior if CVs are not created with this library (normalization is not necessarily involved). This might be fine, if Christian's python implementation is fixed accordingly, because he is the only one using a non-Go implementation.
  • So, if we finally deprecate the old behavior at all, we could remove both defaultings and the rest of the ugly code at the same time. The very old CVs cannot be handled correctly anymore after removing the defaulting code. (And it cannot be kept as fallback anyway, because the old cases are not covered by any available version information).

The consequence is just, that under the desired goal and the path to reach it, keeping the defaulting in the conversion does not disturb; it is a minor change and makes it not more complicated to reach the goal. The amount of coding is negligible compared with the rest of the still required defaulting-related coding. But we could avoid ugly coding in the existing normalization, the deprecation of the old normalization methods and the need to introduce a new one.

Under these points of view keeping the old conversion until we finally can get rid of the defaulting at all seems to me the very best solution.

If you really want to follow the more disrupting path of changing the complete normalization, then I would at least prefer option 2b without changing the CV. >Anyway, if we follow this path we also have to assure that mixed scenarios are working correctly, so we should fix the defaulting (2a and 2b) to do it for non-nil maps in the original CD, only, as has been done in the conversion logic before (because of this faulty handling of the iterator). I'm not sure, that this is done correctly in the currently merged fix. BTW. solution 2 just moves coding (you call correctly a nightmare) from the conversion to this normalization, is this really much better, here, you don't get rid of it, anyway, until the defaulting can be removed completely.

@jakobmoellerdev
Copy link
Contributor

On the long term we want to get rid of the defaulting at all, this must be the final goal.

On the long term, we do not want to stick with OCM in this library form at all. Thats why my final goal is to stabilize existing behavior, and make breaking changes abundantly clear.

To really get rid of the complete defaulting coding we have to live with not being able to handle very old CVs anymore. It is not covered by any available version information.

The defaulting coding has to stay in versions that are still GA and are deprecated (this is happening now). It should not be removed completely so as to not break anyone.

So, if we finally deprecate the old behavior at all, we could remove both defaultings and the rest of the ugly code at the same time. The very old CVs cannot be handled correctly anymore after removing the defaulting code.

This is not deprecating anything, this is called a breaking change. Please try to stick within the constraint of not introducing breaking changes without explicit version increments.

If you really want to follow the more disrupting path of changing the complete normalization, then I would at least prefer option 2b without changing the CV.

You made this abundantly clear, I still disagree.

BTW. solution 2 just moves coding (you call correctly a nightmare) from the conversion to this normalization, is this really much better, here, you don't get rid of it, anyway, until the defaulting can be removed completely.

Yes it is because it is versioned off. This is what versions are meant for, to break off unintended behavior and introduce a new version without breaking anyone.

so we should fix the defaulting [...] to do it for non-nil maps in the original CD, only, as has been done in the conversion logic before (because of this faulty handling of the iterator). I'm not sure, that this is done correctly in the currently merged fix.

Please take a look at the coding with the weird defaulting (

func (o *ElementMeta) GetIdentity(accessor ElementListAccessor) metav1.Identity {
):


// DefaultResources defaults a list of resources.
// The version of the component is defaulted for local resources that do not contain a version.
// adds the version as identity if the resource identity would clash otherwise.
// The version is added to an extraIdentity, if it is not unique without it.
func DefaultResources(component *ComponentDescriptor) {
	for i, res := range component.Resources {
		if res.Relation == v1.LocalRelation && len(res.Version) == 0 {
			component.Resources[i].Version = component.GetVersion()
		}

		id := res.GetIdentity(component.Resources)
		if v, ok := id[SystemIdentityVersion]; ok {
			if res.ExtraIdentity == nil {
				component.Resources[i].ExtraIdentity = v1.Identity{
					SystemIdentityVersion: v,
				}
			} else {
				if _, ok := res.ExtraIdentity[SystemIdentityVersion]; !ok {
					res.ExtraIdentity[SystemIdentityVersion] = v
				}
			}
		}
	}
}

// Copy copies identity.
func (i Identity) Copy() Identity {
	if i == nil {
		return nil
	}
	n := Identity{}
	for k, v := range i {
		n[k] = v
	}
	return n
}


// GetIdentity returns the identity of the object.
func (o *ElementMeta) GetIdentity(accessor ElementListAccessor) metav1.Identity {
	identity := o.ExtraIdentity.Copy()
	if identity == nil {
		identity = metav1.Identity{}
	}
	identity[SystemIdentityName] = o.Name
	if identity.Get(SystemIdentityVersion) == "" && accessor != nil {
		found := false
		l := accessor.Len()
		for i := 0; i < l; i++ {
			m := accessor.Get(i).GetMeta()
			if m.GetName() == o.Name {
				mid := m.GetExtraIdentity()
				mid.Remove(SystemIdentityVersion)
				if mid.Equals(o.ExtraIdentity) {
					if found {
						identity[SystemIdentityVersion] = o.Version
						break
					}
					found = true
				}
			}
		}
	}
	return identity
}

You can see that it does not matter if the map is nil or not for this specific part of the normalization because there is an if statement that catches this and creates an empty identity anyways.

You can test this with

package compdesc

import (
	"strconv"
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestDefaultResources(t *testing.T) {

	descs := []any{
		[]byte(`
component:
  componentReferences: []
  creationTime: "2025-01-07T11:35:56Z"
  name: github.com/jakobmoellerdev/hash
  provider: github.com/jakobmoellerdev
  repositoryContexts: []
  resources:
  - access:
      imageReference: ghcr.io/stefanprodan/podinfo:1.0.0
      type: ociArtifact
    name: podinfo
    extraIdentity: {}
    relation: external
    type: ociImage
    version: 1.0.0
  - access:
      imageReference: ghcr.io/stefanprodan/podinfo:2.0.0
      type: ociArtifact
    name: podinfo
    extraIdentity: {}
    relation: external
    type: ociImage
    version: 2.0.0
  sources: []
  version: 1.0.0
meta:
  schemaVersion: v2
`),
		[]byte(`
component:
  componentReferences: []
  creationTime: "2025-01-07T11:35:56Z"
  name: github.com/jakobmoellerdev/hash
  provider: github.com/jakobmoellerdev
  repositoryContexts: []
  resources:
  - access:
      imageReference: ghcr.io/stefanprodan/podinfo:1.0.0
      type: ociArtifact
    name: podinfo
    relation: external
    type: ociImage
    version: 1.0.0
  - access:
      imageReference: ghcr.io/stefanprodan/podinfo:2.0.0
      type: ociArtifact
    name: podinfo
    relation: external
    type: ociImage
    version: 2.0.0
  sources: []
  version: 1.0.0
meta:
  schemaVersion: v2
`),
		&ComponentDescriptor{
			ComponentSpec: ComponentSpec{
				Resources: []Resource{
					{
						ResourceMeta: ResourceMeta{
							ElementMeta: ElementMeta{
								Name:          "bla",
								Version:       "1.0.0",
								ExtraIdentity: nil,
							},
						},
					},
					{
						ResourceMeta: ResourceMeta{
							ElementMeta: ElementMeta{
								Name:          "bla",
								Version:       "2.0.0",
								ExtraIdentity: nil,
							},
						},
					},
				},
			},
		},
		&ComponentDescriptor{
			ComponentSpec: ComponentSpec{
				Resources: []Resource{
					{
						ResourceMeta: ResourceMeta{
							ElementMeta: ElementMeta{
								Name:          "bla",
								Version:       "1.0.0",
								ExtraIdentity: Identity{},
							},
						},
					},
					{
						ResourceMeta: ResourceMeta{
							ElementMeta: ElementMeta{
								Name:          "bla",
								Version:       "2.0.0",
								ExtraIdentity: Identity{},
							},
						},
					},
				},
			},
		},
	}

	for i, raw := range descs {
		t.Run(strconv.Itoa(i), func(t *testing.T) {
			a := assert.New(t)
			desc, isParsed := raw.(*ComponentDescriptor)

			if !isParsed {
				var err error
				desc, err = Decode(raw.([]byte))
				a.NoError(err)
			}

			DefaultResources(desc)

			a.Equal("1.0.0", desc.Resources[0].Version)
			a.Equal("1.0.0", desc.Resources[0].ExtraIdentity[SystemIdentityVersion])
		})
	}
}

@ccwienk
Copy link
Contributor Author

ccwienk commented Jan 14, 2025

@jakobmoellerdev : let's have another round of follow-up discussion, please. disregarding the code, I think @mandelsoft has a valid point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment