Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extraction of license text from files. #193

Merged
merged 18 commits into from
Dec 9, 2024

Conversation

AugustusKling
Copy link
Contributor

@AugustusKling AugustusKling commented Oct 29, 2024

Untested code for license text extraction.

fixes #33

@jkowalleck
Copy link
Member

jkowalleck commented Oct 29, 2024

we have a basic implementation here: https://github.com/CycloneDX/cyclonedx-webpack-plugin/blob/72700f06d00eac79fa3b91fe838bd78c583346a2/src/extractor.ts#L135

I was thinking of pulling this one into the CDX library,. so it is available for every downstream user - like here...
Could you review that other implementation and see if it suits your case here?

PS: I found that your implementation is basically a copy/past from the mentioned implementation. So i guess it is any good -- so better pull it over to the library, than copy/pasting it here.

src/evidence.ts Outdated Show resolved Hide resolved
@AugustusKling
Copy link
Contributor Author

Taking a file name + a data blob and converting it to an instance of CDX.Models.License would make sense in the CDX library. Assuming specific loggers or that packages contents would be kept in ordinary files should not be part of the library.

Since you explicitly referred to the other implementation in ticket #33, the implementation here matches it. One would not want different behavior for different libraries under the CycloneDX umbrella.

Before this PR has an chance of moving forward, this repo needs to be fixed. The current ESLint setup does not work with the present TypeScript version. You can do a yarn up 'typescript@<5.4.0' to change to a compatible version by the way.

Also, the tests in the repo depend on yarn install for the test projects as shown by tests/integration/setup.js. This was not necessary in the original code and should not be necessary, now. The plugin needs to retrieve the packages to be able to read their manifests (package.json) and now the license texts. But it must not execute an yarn install ... or similar in order to produce an SBOM as the dependency resolution is retained in the yarn.lock file and installations are risky security-wise.

@jkowalleck
Copy link
Member

jkowalleck commented Oct 30, 2024

Since you explicitly referred to the other implementation in ticket #33, the implementation here matches it. One would not want different behavior for different libraries under the CycloneDX umbrella.

Exactly.

here is the (WIP/draft) PR to bring the functionality to the library: CycloneDX/cyclonedx-javascript-library#1158

@jkowalleck
Copy link
Member

Before this PR has an chance of moving forward, this repo needs to be fixed. The current ESLint setup does not work with the present TypeScript version. You can do a yarn up 'typescript@<5.4.0' to change to a compatible version by the way.

Also, the tests in the repo depend on yarn install for the test projects as shown by tests/integration/setup.js. This was not necessary in the original code and should not be necessary, now. The plugin needs to retrieve the packages to be able to read their manifests (package.json) and now the license texts. But it must not execute an yarn install ... or similar in order to produce an SBOM as the dependency resolution is retained in the yarn.lock file and installations are risky security-wise.

Please discuss these things in individual extra tickets. Thank you in advance.

@jkowalleck
Copy link
Member

CycloneDX/cyclonedx-javascript-library#1158 was postponed and will not ship any soon.

Please continue your work crafting a yarn=specific implementation.
This will help understand opportunities to generalize in the upstream-library later.

@jkowalleck
Copy link
Member

i had to fix one of the github workflows.
Please rebase/merge the latest main branch.

@AugustusKling
Copy link
Contributor Author

Note that despite the build passing in the PR validations, the existing tests fail for me. This is with the yarn.lock from the latest release at https://github.com/CycloneDX/cyclonedx-node-yarn/releases/tag/v1.0.2

Example with similar changes in many test cases:

[test:node    ]       + expected - actual
[test:node    ]
[test:node    ]            </properties>
[test:node    ]          </metadata>
[test:node    ]          <components>
[test:node    ]            <component type="library" bom-ref="in-array@npm:0.1.2">
[test:node    ]       -      <author>Jon Schlinkert (https://github.com/jonschlinkert)</author>
[test:node    ]       +      <author>Jon Schlinkert</author>
[test:node    ]              <name>in-array</name>
[test:node    ]              <version>0.1.2</version>
[test:node    ]              <description>Return true if a value exists in an array. Faster than using indexOf and won't blow up on null values.</description>
[test:node    ]              <licenses>
[test:node    ] --
[test:node    ]                  <url>https://github.com/jonschlinkert/in-array/issues</url>
[test:node    ]                  <comment>as detected from PackageJson property "bugs.url"</comment>
[test:node    ]                </reference>
[test:node    ]                <reference type="vcs">
[test:node    ]       -          <url>jonschlinkert/in-array</url>
[test:node    ]       -          <comment>as detected from PackageJson property "repository"</comment>
[test:node    ]       +          <url>git+https://github.com/jonschlinkert/in-array.git</url>
[test:node    ]       +          <comment>as detected from PackageJson property "repository.url"</comment>
[test:node    ]                </reference>
[test:node    ]                <reference type="website">
[test:node    ]                  <url>https://github.com/jonschlinkert/in-array</url>
[test:node    ]                  <comment>as detected from PackageJson property "homepage"</comment>
[test:node    ] --
[test:node    ]                  <url>https://github.com/dcousens/is-sorted/issues</url>
[test:node    ]                  <comment>as detected from PackageJson property "bugs.url"</comment>
[test:node    ]                </reference>
[test:node    ]                <reference type="vcs">
[test:node    ]       -          <url>https://github.com/dcousens/is-sorted.git</url>
[test:node    ]       +          <url>git+https://github.com/dcousens/is-sorted.git</url>
[test:node    ]                  <comment>as detected from PackageJson property "repository.url"</comment>
[test:node    ]                </reference>
[test:node    ]                <reference type="website">
[test:node    ]                  <url>https://github.com/dcousens/is-sorted</url>
[test:node    ]
[test:node    ]       at runTest (tests/integration/index.test.js:156:12)

@AugustusKling AugustusKling marked this pull request as ready for review November 6, 2024 19:55
@AugustusKling AugustusKling requested a review from a team as a code owner November 6, 2024 19:55
@jkowalleck
Copy link
Member

jkowalleck commented Nov 7, 2024

a lot of dependencies and other things were bumped lately.
I will postpone all dependency bumps until this very feature is merged 👍
please continue, all your efforts are appreciated very much.

Better rebase/merge master, delete your local .pnp.*/.yarn/yarn.lock, and run yarn install again.


regarding your failing tests - #193 (comment)

the underlying test beds ship own lock files, they are not affected by your project lock file.
anyway, the lock file from v1.0.2 is an outdated static one that is not carries over by maintenance. Of course, it makes no sense to pull it in the project when developing new features.

Anyway, do you maybe have old build artifacts, that need to be removed before testing?

  • ./yarn-plugin-cyclonedx.cjs
  • dist/yarn-plugin-cyclonedx.cjs

@AugustusKling
Copy link
Contributor Author

a lot of dependencies and other things were bumped lately. I will postpone all dependency bumps until this very feature is merged 👍 please continue, all your efforts are appreciated very much.

Just for your info the part of the install where node-gyp is used to build libxmljs2 is failing in case you run with Node 22 (current LTS). It's header files are incompatible with the referenced libxml. Updating libxmljs2 to 0.35.0 in CycloneDX/cyclonedx-javascript-library should solve this.

Building on Node 20 is not showing the incompatibility and should work with libxmljs2 version 0.33.0 as well as 0.35.0.

Anyway, do you maybe have old build artifacts, that need to be removed before testing?

* `./yarn-plugin-cyclonedx.cjs`

* `dist/yarn-plugin-cyclonedx.cjs`

Thanks for the hints. It turned out that test errors were the result of a bug in my changed code. It's fixed with correcting the invocation of the normalize-package-data library in my last commit.

Please try out the changes and check if the license evidence is included as you would expect in the resulting SBOMs.

src/builders.ts Outdated Show resolved Hide resolved
@jkowalleck
Copy link
Member

Just for your info the part of the install where node-gyp is used to build libxmljs2 is failing in case you run with Node 22 (current LTS). It's header files are incompatible with the referenced libxml. Updating libxmljs2 to 0.35.0 in CycloneDX/cyclonedx-javascript-library should solve this.

Building on Node 20 is not showing the incompatibility and should work with libxmljs2 version 0.33.0 as well as 0.35.0

this libxmljs2 is a transitive optional dependency. I would love to omit it.
Do you see a way to do so, @AugustusKling ?

@AugustusKling
Copy link
Contributor Author

this libxmljs2 is a transitive optional dependency. I would love to omit it. Do you see a way to do so, @AugustusKling ?

I do not think this is doable from cyclonedx-node-yarn.

Yarn and probably other package managers, too, would download all optional dependencies in the dependency tree, transitively. Then they should compare the current system environment with the compatibility info from the packages' manifests; that is the fields os, cpu, libc. If these are compatible or absent, the package manager needs to invoke the packages build script. If the build succeeds, the package will be used, otherwise it will be excluded from the installation.

In the specific case here the build of libxmljs2 fails and Yarns warns about it. This failure with the libxmljs2 packages does however not abort the installation of cyclonedx-node-yarn. Instead, Yarn excludes libxmljs2 automatically since it's only part of the dependency tree as an optional dependency. You'll lose whatever optional feature or optimization libxmljs2 should have contributed to @cyclonedx/cyclonedx-library.

If you really wanted to exclude libxmljs2, you'd need to remove it from @cyclonedx/cyclonedx-library but I guess it is there for a reason. Instead, update libxmljs2 to 0.35.0 to allow for a successful build with more Node versions.

The following is Yarn's way to notify about the build failure or an optional dependency. It would be nicer if the wording was more explicit and it said something along the lines of "excluding libxmljs2 which is an optional dependency".

➤ YN0000: · Yarn 4.5.1
➤ YN0000: ┌ Resolution step
➤ YN0000: └ Completed
➤ YN0000: ┌ Post-resolution validation
➤ YN0086: │ Some peer dependencies are incorrectly met by dependencies; run yarn explain peer-requirements for details.
➤ YN0000: └ Completed
➤ YN0000: ┌ Fetch step
➤ YN0000: └ Completed in 0s 307ms
➤ YN0000: ┌ Link step
➤ YN0000: │ ESM support for PnP uses the experimental loader API and is therefore experimental
➤ YN0007: │ libxmljs2@npm:0.33.0 must be built because it never has been before or the last one failed
➤ YN0009: │ libxmljs2@npm:0.33.0 couldn't be built successfully (exit code 1, logs can be found here: /tmp/xfs-8914cf2a/build.log)
➤ YN0000: └ Completed in 18s 797ms
➤ YN0000: · Done with warnings in 19s 425ms

@jkowalleck
Copy link
Member

re #193 (comment)

I expected that, since - according to the docs I've read - yarn's concept of optional is a very well-thought one. But since I am not a heavy user of yarn, I thought I should ask, before assuming something.
Thank you for the detailed explanation. 👍

The features provided by this optional dependency are not used in the code at the moment, so they are not bundled in the final product, and no related code is generated in the build artifact.

From developer experience, the warning on setup/install looks ugly, true. But it's fine, it is not a blocker. If it was, please open an issue for that.

@AugustusKling
Copy link
Contributor Author

From developer experience, the warning on setup/install looks ugly, true. But it's fine, it is not a blocker. If it was, please open an issue for that.

The warning is easily misunderstood for a problem but can be safely ignored.

Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
Signed-off-by: Jan Kowalleck <[email protected]>
@jkowalleck
Copy link
Member

jkowalleck commented Dec 4, 2024

followups and concerns that need to be taken into account: AugustusKling#1 (comment)

@AugustusKling ,
how do you see the current state of this PR? Would you be okay with merging it and have it published and tested downstream?
I mean, this software is not meant to be final, it is intended to be improved step-by-step over time, right?

Firstly, sorry for leaving this for many days. I would like to spend more time to improve this tool but time is sadly limited.

We can merge this in my opinion as it is. When I compare with my PR to your repo https://github.com/CycloneDX/cyclonedx-node-yarn/pull/193/files then I think these things should be addressed soon (before or after merge):

  • Add --gather-license-texts to /README.md file, maybe with a warning that it's not reliable yet

  • Take over changes from /tests/README.md file

These improvement could be addressed in follow-up PRs:

Just make sure to call the license extraction experimental until we can trust it does not silently skip over errors.


tracking of tasks that need to ke tackled immediately:

  • Add --gather-license-texts to /README.md file, maybe with a warning that it's not reliable yet
  • Take over changes from /tests/README.md file -- nothing to do, they were part of the original PR already, and they stayed untouched.
  • make sure to call the license extraction "experimental" until we can trust it does not silently skip over errors. -- call out in README and in CLI help page

README.md Show resolved Hide resolved
src/commands.ts Outdated Show resolved Hide resolved
@jkowalleck
Copy link
Member

@AugustusKling,

I've suggested the docs regarding the "experimental" nature of the new feature.
please review and apply the suggestions, if they appeal.

@AugustusKling
Copy link
Contributor Author

@jkowalleck, thanks for the suggestions. I've merged them and also the main branch into this. The build/tests pass locally.

Can you please check why the workflows for the PR validations do not start to run?

@jkowalleck
Copy link
Member

jkowalleck commented Dec 8, 2024

Can you please check why the workflows for the PR validations do not start to run?

the CI does not start automatically for PRs, it needs to be triggered by a maintainer - after reviewing the changes. (This s a security-feature by github, taken for workflows that were not migrated to github's modern permission system.
I might have time to modernize workflows during the course of next year, but for now this is our solution)

Anyway, I had the CI running, and it passed - as expected 🥳

I'd consider this PR as ready for merging.
@AugustusKling, do you agree? If so, I would merge this PR, and you could create all the follow-up tickets needed.

PS: I'd plan a release of this feature for early January 2025. Until then, I will work on some chores and other maintenance tasks.

@jkowalleck jkowalleck changed the title Extraction of license text from files. #33 Extraction of license text from files. Dec 8, 2024
@AugustusKling
Copy link
Contributor Author

Hi @jkowalleck , I created the tickets #228 and #229 for the improvements after merge of this pull request.

I agree that this is ready to be merged since the README file already mentions license text extraction is experimental.

@jkowalleck
Copy link
Member

@AugustusKling ,

i will merge this PR today,
and might come back to you for discussions of #228 and #229

@jkowalleck jkowalleck merged commit 1d4dbd8 into CycloneDX:main Dec 9, 2024
25 checks passed
jkowalleck pushed a commit to CycloneDX/cyclonedx-node-npm that referenced this pull request Dec 17, 2024
fixes #256 

This PR is based on #427, and uses the same implementation as it was
implemented in yarn (See
CycloneDX/cyclonedx-node-yarn#193)
closes #427

---------

Signed-off-by: Matthias Schiebel <[email protected]>
Signed-off-by: Christoph Uhland <[email protected]>
Signed-off-by: Christoph Uhland <[email protected]>
Co-authored-by: Matthias Schiebel <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat: Add complete License-Text to SBOM result
2 participants