Skip to content

Conversation

mnonnenmacher
Copy link
Member

@mnonnenmacher mnonnenmacher commented Jun 14, 2025

A draft for license text providers, created for early feedback. The plan is to convert the existing logic into separate providers that are queried in the order defined by the user, so two more will be added:

  • ScanCode license text provider, to provider license texts from a local ScanCode installation.
  • A provider for a directory containing custom license texts.
    I would also like to add on that downloads license texts on demand from https://scancode-licensedb.aboutcode.org.

@sschuberth I just discovered #9620, so should this API be renamed to LicenseDataProvider already and the metadata model and functions could be added later?

TODO:

  • Take new API into use
  • Remove old API
  • Generate plugin docs
  • Write docs for license fact providers
  • Fix failing tests.

Copy link

codecov bot commented Jun 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 57.40%. Comparing base (d47c9ac) to head (2326dbf).
⚠️ Report is 12 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #10479      +/-   ##
============================================
+ Coverage     57.38%   57.40%   +0.01%     
- Complexity     1673     1674       +1     
============================================
  Files           346      346              
  Lines         12759    12763       +4     
  Branches       1209     1209              
============================================
+ Hits           7322     7326       +4     
  Misses         4972     4972              
  Partials        465      465              
Flag Coverage Δ
funTest-docker 71.11% <ø> (ø)
funTest-non-docker 33.01% <0.00%> (-0.01%) ⬇️
test-ubuntu-24.04 41.92% <100.00%> (+0.02%) ⬆️
test-windows-2025 41.90% <100.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@sschuberth
Copy link
Member

@sschuberth I just discovered #9620, so should this API be renamed to LicenseDataProvider already and the metadata model and functions could be added later?

If we agree that it should be the same interface / plugin type that returns more license (meta)data later (and not a different interface / plugin type), then I believe it would be good to signal that intention by already now generalizing the interface and class names to LicenseDataProvider or e.g. LicenseMetadataProvider.

As a side note, my old license-classification-interface branch contains a separate LicenseClassifications interface that could later be merged into the interface defined here (again, if we decide that texts and classifications should be provided by the same plugin type).

@mnonnenmacher mnonnenmacher force-pushed the license-text-provider-plugins branch from fb1bec9 to aadd798 Compare June 26, 2025 21:04
@sschuberth
Copy link
Member

sschuberth commented Jun 30, 2025

Also interesting in this context: A new API for OSI Approved Licenses.

@mnonnenmacher mnonnenmacher force-pushed the license-text-provider-plugins branch from aadd798 to fb2ae75 Compare July 3, 2025 20:57
@mnonnenmacher mnonnenmacher force-pushed the license-text-provider-plugins branch from fb2ae75 to 2554e5a Compare July 3, 2025 21:06
@mnonnenmacher
Copy link
Member Author

The providers should now be on par with the currently used logic for reading license texts, but the integration is still missing. For ScanCode I have not copied the various fallbacks because I wasn't sure if they are still required.

@sschuberth
Copy link
Member

For ScanCode I have not copied the various fallbacks because I wasn't sure if they are still required.

I believe we anyway should simplify the logic, and instead of looking in the various locations depending on how ScanCode was installed, we should use the scancode-license-data tool. A provider for it could call the tool once at runtime / initialization, and then serve the license texts from the location scancode-license-data was instructed to copy them to.

@mnonnenmacher mnonnenmacher force-pushed the license-text-provider-plugins branch from 7cad831 to 4cbdae2 Compare July 22, 2025 22:25
@mnonnenmacher mnonnenmacher force-pushed the license-text-provider-plugins branch from 4cbdae2 to b7e40cb Compare August 3, 2025 11:05
@mnonnenmacher mnonnenmacher force-pushed the license-text-provider-plugins branch 2 times, most recently from 4ed637c to 86c5219 Compare August 3, 2025 19:57
@mnonnenmacher
Copy link
Member Author

@sschuberth Some tests are still failing but you could already make another review if you want.

@mnonnenmacher mnonnenmacher force-pushed the license-text-provider-plugins branch 2 times, most recently from 0a06094 to a7c4d9d Compare August 5, 2025 22:26
@mnonnenmacher mnonnenmacher marked this pull request as ready for review August 5, 2025 22:26
@mnonnenmacher mnonnenmacher requested review from a team as code owners August 5, 2025 22:26
@mnonnenmacher mnonnenmacher force-pushed the license-text-provider-plugins branch 2 times, most recently from 3b218ba to 605ec33 Compare August 5, 2025 22:30
@mnonnenmacher mnonnenmacher force-pushed the license-text-provider-plugins branch 2 times, most recently from 688e4ef to d2bf9a8 Compare August 30, 2025 15:33
@mnonnenmacher
Copy link
Member Author

mnonnenmacher commented Aug 30, 2025

@sschuberth In 8b19f27 I have changed the old license text provider to ignore the "+" in SPDX licenses. I have not done this in the new implementation which makes the OpossumReporterFunTest fail because:

  • For some reason it includes ALL SPDX license texts in the output file.
  • While the content of the license text resources with and without "+" seems to be identical, it is formatted differently (for example, compare LGPL-3.0 and LGPL-3.0+).

The reason for the commit was to fix #5004, however, in the meantime we have added license text resources for the licenses with "+" (I didn't check when). What's your opinion, should I update the expected results for the Opossum reporter or revert the SPDX provider to the previous behavior and ignore the "+"? Another option is to keep the current implementation and ignore the "+" licenses only in the Opossum Reporter when creating the frequentLicenses list, this would be less tedious than updating the expected output, but maybe that would break something in Opossum.

It's also weird that the formatting of the license texts for the "+" licenses is different to that of the according licenses without "+".

@sschuberth
Copy link
Member

which makes the OpossumReporterFunTest fail because:

* For some reason it includes ALL SPDX license texts in the output file.

This has bothered me in the past in the context of other changes as well.

This formatting difference seems to come from the ScanCode license DB (compare lgpl-3.0.html to lgpl-3.0-plus.html), not from the SPDX list (compare LGPL-3.0-only.txt to LGPL-3.0-or-later.txt), as historically the former had better formatting than the latter. Maybe it's time by now to really only use SPDX license texts in our resources. Or not use any resources / bundled license texts at all anymore, and least not in the core, but exclusively move this data to license fact provider plugins.

What's your opinion, should I update the expected results for the Opossum reporter or revert the SPDX provider to the previous behavior and ignore the "+"?

I'd not ignore the "+" anymore; after all, the license texts for these variants could be different. But I believe we should always fall back to the non-"+" text if there is no dedicated "+" text. The question is whether e should leave this for each provider to implement, or guarantee this fallback somehow via the API.

So regarding OpossumReporterFunTest, I'd either update the expected results, or even reduce the test case to at least not test any "+" variants, or maybe even trim it down more to only check a few selected license texts.

This is a fixup for cd40dd1.

Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.com>
Add the API for license fact provider plugins. The existing license text
provider code will be migrated to this API in the following commits.

The API is called "license fact provider" instead of "license text
provider" in preparation for providing more facts about licenses than
just the text.

Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.com>
Add a license fact provider for SPDX licenses. The provider uses the
license texts from the resources of the `:utils:spdx-utils` module.

While the `getLicenseText` function from `spdx-utils` made the
resolution of license exceptions optional, this is always enabled for
the provider. The reason is that the reporter as the only consumer of
license texts always enables that option.

Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.com>
@mnonnenmacher mnonnenmacher force-pushed the license-text-provider-plugins branch from d2bf9a8 to f0830fb Compare September 1, 2025 13:49
Add a license fact provider that reads license information from a local
ScanCode installation.

Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.com>
Add a license fact provider that reads license texts from a configurable
directory. This includes a provider for the default directory, similar
to the default providers for package configurations and package
curations.

Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.com>
Add a property to `OrtConfiguration` to configure the license fact
providers to use. This will be taken into use in the following commits.

The property is added to the top-level because even though license texts
are currently only used by the reporter, license facts might in future
also be used by other tools.

Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.com>
Change the `ReporterInput` to use a `LicenseFactProvider` instead of a
`LicenseTextProvider`.

The `ReporterCommand` now creates the license fact providers based on
the ORT configuration and support for the `--custom-license-texts-dir`
option is dropped. Instead, a custom license text directory now has to
be configured in the `config.yml`.

Templates for the Freemarker or AsciiDoc reporters must also be updated
to use `licenseFactProvider` instead of `licenseTextProvider`.

The new `SpdxLicenseFactProvider` does now return the license texts for
SPDX licenses with "+" instead of falling back to the texts for the
licenses without "+". This requires updating the expected results for
the `OpossumReporterFunTest`.

Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.com>
Remove the old API for getting license texts in favor of the new
`LicenseFactProvider` plugins.

The `RequirementsCommand` does not check anymore if the ScanCode license
directory can be found. Instead, this check will now only happen if the
`ScanCodeLicenseFactProvider` is used.

Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.com>
@mnonnenmacher mnonnenmacher force-pushed the license-text-provider-plugins branch from f0830fb to b53816a Compare September 1, 2025 17:13
@mnonnenmacher
Copy link
Member Author

which makes the OpossumReporterFunTest fail because:

* For some reason it includes ALL SPDX license texts in the output file.

This has bothered me in the past in the context of other changes as well.

This formatting difference seems to come from the ScanCode license DB (compare lgpl-3.0.html to lgpl-3.0-plus.html), not from the SPDX list (compare LGPL-3.0-only.txt to LGPL-3.0-or-later.txt), as historically the former had better formatting than the latter. Maybe it's time by now to really only use SPDX license texts in our resources. Or not use any resources / bundled license texts at all anymore, and least not in the core, but exclusively move this data to license fact provider plugins.

What's your opinion, should I update the expected results for the Opossum reporter or revert the SPDX provider to the previous behavior and ignore the "+"?

I'd not ignore the "+" anymore; after all, the license texts for these variants could be different. But I believe we should always fall back to the non-"+" text if there is no dedicated "+" text. The question is whether e should leave this for each provider to implement, or guarantee this fallback somehow via the API.

So regarding OpossumReporterFunTest, I'd either update the expected results, or even reduce the test case to at least not test any "+" variants, or maybe even trim it down more to only check a few selected license texts.

I have updated the expected results because that was the easiest solution.

@mnonnenmacher mnonnenmacher enabled auto-merge (rebase) September 1, 2025 19:34
Comment on lines +92 to +93
"SPDX" to PluginConfig.EMPTY,
"ScanCode" to PluginConfig.EMPTY,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a remark, nothing to be solved for this PR already: As the "SPDX" license fact provider uses resources that preferably are taken from ScanCode's LicenseDB, see

// Prefer the texts from ScanCode as these have better formatting than those from SPDX.
val providers = sequenceOf(ScanCodeLicenseTextProvider(), SpdxLicenseListDataProvider())

this now creates a bit of a weird situation of the "SPDX" provider mostly returning "ScanCode" licenses. I'd like to clean that up after the merge of this PR so that the "SPDX" provider really only provides SPDX license texts, and then eventually switch the order of precedence here. That would also simplify the build system code, which currently has its own SpdxLicenseTextProvider interface:

fun interface SpdxLicenseTextProvider {
fun getLicenseUrl(info: LicenseInfo): URI?
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is the argument that the formatting of the ScanCode license texts is better not valid anymore?

Copy link
Member

@sschuberth sschuberth Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to tell precisely without investing more time. Sometimes it's still better, sometimes just different (maybe with SPDX even being a bit nicer, depending on taste), but probably never clearly worse.

Anyway, that's not the right question to ask anymore, IMO. We can still prefer ScanCode over SPDX license text, but that should not be done at the level of creating these (SPDX!) resources, but on the new LicenseFactProvider level.

Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.com>
Update the documentation for the new license fact provider plugins and
add a section about license file archives.

Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.com>
`fail_on_error` is deprecated, see [1].

[1]: https://github.com/UmbrellaDocs/action-linkspector?tab=readme-ov-file#fail_on_error

Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.com>
Exclude the website from linkspector checks because Docusaurus validates
links on its own and linkspector cannot validate some link formats from
Docusaurus correctly.

Signed-off-by: Martin Nonnenmacher <martin.nonnenmacher@bosch.com>
@mnonnenmacher mnonnenmacher force-pushed the license-text-provider-plugins branch from b53816a to 2326dbf Compare September 2, 2025 07:18
@mnonnenmacher mnonnenmacher merged commit d70475b into main Sep 2, 2025
33 of 34 checks passed
@mnonnenmacher mnonnenmacher deleted the license-text-provider-plugins branch September 2, 2025 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants