Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate license values #296

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

shr-project
Copy link

@shr-project shr-project commented Sep 18, 2020

This PR was motivated by this issue in superflore:
ros-infrastructure/superflore#271
superflore tries to unify at least some values to more common license identifier, but because https://www.ros.org/reps/rep-0149.html#license-multiple-but-at-least-one isn't very strict about the values and people got quite creative here and the regular expressions in superflore are in some cases "inventing" extra information like mapping "LGPL" to "LGPL-2" and in worse cases plain wrong mapping "LGPLv3" to "LGPL-2".

Superflore or any other tool, cannot guess what LGPL version developer had in mind when it isn't specified in the package.xml. Because it would need to at least investigate the actual sources for license headers there.

Instead of trying to improve the license value later, we need to warn the developer when the provided value cannot be unambiguously parsed and ask him to improve it in the package.xml in the source repo. To help with that we can show the correct value (of corresponding SPDX identifier) we've assigned from the provided license value.

And to make sure that these assigned SPDX identifiers don't do any guessing or match to too wide range of values (like in superflore) lets map them by exact string from the values currently being used in any component currently listed in rosdistro index. That way these assignments are easy to maintain and not error-prone.

Then we can take the same set of exact assignments from here and add it "temporarily" in superflore, that way the licenses which are currently incorrectly guessed will be fixed while not regressing to completely free form of license values. This is implemented in ros-infrastructure/superflore#279

rosdistro CI validation already checks for license tag existence, we can easily extend it to call this validation as well as implemented in: ros/rosdistro#26601 I've used this validation to easily get a list of all license values from all components currently listed in rosdistro index.

Some commonly used values (and some more creative):

  • 4711x "BSD" unfortunately without a version, (most people who don't care about license in package.xml, probably just left the default value)
  • 741x "Apache 2.0" almost SPDX, just the dash missing, possibly because it's shown as first example in https://www.ros.org/reps/rep-0149.html#license-multiple-but-at-least-one
  • 81x "LGPL" unfortunately without a version
  • 77x "LGPLv3"
  • 75x "GPLv3"
  • 73x "BSD 3-Clause"
  • 63x "GPL" unfortunately without a version
    ...
  • 31x "TODO"
  • "WTF", "Version 2.0", "T.D.B", "See license.txt", "Check author's website", "N/A", ...
  • "free for research or education purpose, all rights maintained by David Applegate, William Cook, Sanjeeb Dash, and Monika Mevenkamp"
  • "TERMS OF USE FOR GUNDAM RESEARCH OPEN SIMULATOR Attribution-NonCommercial-ShareAlike"
  • "United States Government Purpose"

Copy link
Contributor

@130s 130s left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great functionality! I was/am irritated by license values inconsistently named when I'm working on things like ros-infrastructure/rospkg#201. I like the direction of the feature is going. I'm not a maintainer and not so sure about the direction of the implementation though.

The list was created from https://spdx.org/licenses/ with:
cat doc/spdx-3.10-2020-08-03.csv | cut -f 2 | grep -v ^Identifier$
"""
return lic in ['0BSD', 'AAL', 'Abstyles', 'Adobe-2006', 'Adobe-Glyph', 'ADSL', 'AFL-1.1', 'AFL-1.2', 'AFL-2.0', 'AFL-2.1', 'AFL-3.0', 'Afmparse', 'AGPL-1.0-only', 'AGPL-1.0-or-later',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an answer, but I'm afraid embedding a (huge) list of license names can be a maintenance burden (I saw you described how you made it, but still would require manual invocation in the future when updating the list is needed. Also as a disclaimer, I'm not a maintainer on this repo). Wish there's a standalone tool that provides such a list.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, my hope is that this list is only temporary and component owners will start using valid identifiers in package.xml and eventually only the validation function will stay and all the replacements won't be needed. So the only maintenance should be to drop it in some far future.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the list isn't complete (and cannot be), because on of the most common issues is using license like LGPL without a version, but here without checking the actual component source we cannot replace it with anything more accurate, all we can do is to show and warning and hope that the owner will improve the package.xml.

I still plan to extend the list to cover most common cases of listing multiple licenses in one tag instead of using multiple of them - just using & as separator - together with slightly different warning message suggesting to split them into multiple tags.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, my hope is that this list is only temporary and component owners will start using valid identifiers in package.xml and eventually only the validation function will stay and all the replacements won't be needed. So the only maintenance should be to drop it in some far future.

that would be great, but I would say this is not something we can expect (ie, the: "component owners will start using valid identifiers in package.xml and eventually only the validation function will stay and all the replacements won't be needed" part).

As of today, there are 5946 packages in the ros/rosdistro db alone, with many more not registered anywhere (ie: hosted somewhere publicly but not part of a distribution). Many of those packages are old and will not be updated, but even the more recent ones will have a significant nr which will not be updated.

If you add this list, I expect it to be necessary for at least until Noetic EOLs, and probably also for a few ROS 2 versions (if you intend to add this functionality to a ROS 2 tool as well). We're talking 5+ years here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the far future for removal, but in the meantime we shouldn't need to update the list.

The replacements are just to have some immediate improvements where possible, while showing the corresponding SPDX in warning message. And not to regress from what superflore currently does with regexp replacements.

It's not meant to be perfect and it cannot be without parsing the actual source files with something like scancode-toolkit, because in some cases the valid identifier in package.xml doesn't match with the actual license and catkin_pkg nor superflore could detect that (even to show another warning).

@cottsay
Copy link
Member

cottsay commented Sep 29, 2020

This seems like it might be better suited for inclusion in catkin_lint. Any thoughts on that?

@shr-project
Copy link
Author

This seems like it might be better suited for inclusion in catkin_lint. Any thoughts on that?

I've added it to catkin_pkg only because rosdistro CI github action uses it to validate version, so I've used the same mechanism for license validation, but if more people will notice the warnings from catkin_lint then definitely it should be there (as well).

@dirk-thomas
Copy link
Member

@shr-project Atm the PR is in draft state and has no description. Therefore I am not going to review the patch. Please add a very detailed description about what the patch is doing, why it is doing so, what the side effects are, why you think those are justified, etc.

On a first look this adds new warnings to a lot of existing packages. This on its own for already released distros is an almost guaranteed no-go.

* catch ambiguous and invalid values while validating package.xml instead
  of trying to fix the values in superflore
  ros-infrastructure/superflore#271

Signed-off-by: Martin Jansa <martin.jansa@lge.com>
…AGE-LICENSE

Signed-off-by: Martin Jansa <martin.jansa@lge.com>
…statistics

* see the license value statistics in:
  ros-infrastructure/superflore#271 (comment)

* with this applied, there are following statistics across all currently used ROS
  distributions in rosdistro:

* License values which were unambiguously mapped to one of SPDX identifiers:
   1064 WARNING: The license value "Apache License 2.0" is not valid SPDX identifier, please use "Apache-2.0" instead
    741 WARNING: The license value "Apache 2.0" is not valid SPDX identifier, please use "Apache-2.0" instead
     77 WARNING: The license value "LGPLv3" is not valid SPDX identifier, please use "LGPL-3.0-only" instead
     75 WARNING: The license value "GPLv3" is not valid SPDX identifier, please use "GPL-3.0-only" instead
     73 WARNING: The license value "BSD 3-Clause" is not valid SPDX identifier, please use "BSD-3-Clause" instead
     34 WARNING: The license value "GPLv2" is not valid SPDX identifier, please use "GPL-2.0-only" instead
     34 WARNING: The license value "BSD-3" is not valid SPDX identifier, please use "BSD-3-Clause" instead
     26 WARNING: The license value "Apache 2" is not valid SPDX identifier, please use "Apache-2.0" instead
     23 WARNING: The license value "Apache License, Version 2.0" is not valid SPDX identifier, please use "Apache-2.0" instead
     21 WARNING: The license value "Apache2" is not valid SPDX identifier, please use "Apache-2.0" instead
     14 WARNING: The license value "zlib" is not valid SPDX identifier, please use "Zlib" instead
     10 WARNING: The license value "APACHE2.0" is not valid SPDX identifier, please use "Apache-2.0" instead
      8 WARNING: The license value "GNU Lesser Public License 2.1" is not valid SPDX identifier, please use "LGPL-2.1-only" instead
      6 WARNING: The license value "LGPLv2.1" is not valid SPDX identifier, please use "LGPL-2.1-only" instead
      6 WARNING: The license value "CC BY-NC-SA 4.0" is not valid SPDX identifier, please use "CC-BY-NC-SA-4.0" instead
      6 WARNING: The license value "BSD2" is not valid SPDX identifier, please use "BSD-2-Clause" instead
      5 WARNING: The license value "LGPL-2.1" is not valid SPDX identifier, please use "LGPL-2.1-only" instead
      5 WARNING: The license value "Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License" is not valid SPDX identifier, please use "CC-BY-NC-ND-4.0" instead
      4 WARNING: The license value "zlib License" is not valid SPDX identifier, please use "Zlib" instead
      4 WARNING: The license value "LGPL v2.1" is not valid SPDX identifier, please use "LGPL-2.1-only" instead
      4 WARNING: The license value "GNU General Public License v2.0" is not valid SPDX identifier, please use "GPL-2.0-only" instead
      4 WARNING: The license value "Eclipse Public License 2.0" is not valid SPDX identifier, please use "EPL-2.0" instead
      4 WARNING: The license value "Creative Commons BY-NC-ND 3.0" is not valid SPDX identifier, please use "CC-BY-NC-ND-3.0" instead
      4 WARNING: The license value "Boost Software License" is not valid SPDX identifier, please use "BSL-1.0" instead
      3 WARNING: The license value "Mozilla Public License Version 1.1" is not valid SPDX identifier, please use "MPL-1.1" instead
      3 WARNING: The license value "CreativeCommons-by-nc-sa-2.0" is not valid SPDX identifier, please use "CC-BY-NC-SA-2.0" instead
      3 WARNING: The license value "Boost Software License, Version 1.0" is not valid SPDX identifier, please use "BSL-1.0" instead
      2 WARNING: The license value "LGPL3" is not valid SPDX identifier, please use "LGPL-3.0-only" instead
      2 WARNING: The license value "ECL2.0" is not valid SPDX identifier, please use "EPL-2.0" instead
      2 WARNING: The license value "CreativeCommons-by-nc-4.0" is not valid SPDX identifier, please use "CC-BY-NC-4.0" instead
      2 WARNING: The license value "Boost" is not valid SPDX identifier, please use "BSL-1.0" instead
      2 WARNING: The license value "Boost Software License 1.0" is not valid SPDX identifier, please use "BSL-1.0" instead
      2 WARNING: The license value "BSL1.0" is not valid SPDX identifier, please use "BSL-1.0" instead
      2 WARNING: The license value "BSD 2-Clause License" is not valid SPDX identifier, please use "BSD-2-Clause" instead
      2 WARNING: The license value "Apache2.0" is not valid SPDX identifier, please use "Apache-2.0" instead
      2 WARNING: The license value "Apache v2.0" is not valid SPDX identifier, please use "Apache-2.0" instead
      2 WARNING: The license value "Apache v2" is not valid SPDX identifier, please use "Apache-2.0" instead
      2 WARNING: The license value "Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0)" is not valid SPDX identifier, please use "Apache-2.0" instead
      1 WARNING: The license value "MIT License" is not valid SPDX identifier, please use "MIT" instead
      1 WARNING: The license value "LGPL v2.1 or later" is not valid SPDX identifier, please use "LGPL-2.1-or-later" instead
      1 WARNING: The license value "LGPL v2" is not valid SPDX identifier, please use "LGPL-2.0-only" instead
      1 WARNING: The license value "GPL-2.0" is not valid SPDX identifier, please use "GPL-2.0-only" instead
      1 WARNING: The license value "GPL v3" is not valid SPDX identifier, please use "GPL-3.0-only" instead
      1 WARNING: The license value "GNU GPL v3.0" is not valid SPDX identifier, please use "GPL-3.0-only" instead
      1 WARNING: The license value "CreativeCommons-Attribution-NonCommercial-ShareAlike-4.0-International" is not valid SPDX identifier, please use "CC-BY-NC-SA-4.0" instead
      1 WARNING: The license value "CreativeCommons-Attribution-NonCommercial-NoDerivatives-4.0" is not valid SPDX identifier, please use "CC-BY-NC-ND-4.0" instead
      1 WARNING: The license value "BSD 3-clause. See license attached" is not valid SPDX identifier, please use "BSD-2-Clause" instead
      1 WARNING: The license value "BSD 3-clause Clear License" is not valid SPDX identifier, please use "BSD-2-Clause" instead
      1 WARNING: The license value "Apachi 2" is not valid SPDX identifier, please use "Apache-2.0" instead
      1 WARNING: The license value "Apache License Version 2.0" is not valid SPDX identifier, please use "Apache-2.0" instead

* License texts which were replaced with more common version
  Biggest issue is clearly the "TODO" string from catkin package template which people forget to update
     31 WARNING: The license value "TODO" is not valid SPDX identifier, and it is usually used as "TODO-CATKIN-PACKAGE-LICENSE"
      6 WARNING: The license value "proprietary" is not valid SPDX identifier, and it is usually used as "Proprietary"
      6 WARNING: The license value "Public domain" is not valid SPDX identifier, and it is usually used as "PD"
      5 WARNING: The license value "Public Domain" is not valid SPDX identifier, and it is usually used as "PD"

* License texts which weren't mapped to SPDX, usually because the license version wasn't specified
  or when some more creative form of license description was used
  Biggest issue is clearly the "BSD" without Clause specification
  followed by recipes using multiple licenses while not using
  clear separator between them (e.g. OpenEmbedded supports '&' '|' '(' ')':
  http://git.openembedded.org/openembedded-core/tree/meta/lib/oe/license.py?id=8e2d0575e4e7036b5f60e632f377a8ab2b96ead8#n42 )

   4711 WARNING: The license value "BSD" cannot be mapped to valid SPDX identifier
     81 WARNING: The license value "LGPL" cannot be mapped to valid SPDX identifier
     63 WARNING: The license value "GPL" cannot be mapped to valid SPDX identifier
     31 WARNING: The license value "TODO" cannot be mapped to valid SPDX identifier
     20 WARNING: The license value "United States Government Purpose" cannot be mapped to valid SPDX identifier
     20 WARNING: The license value "SwRI Proprietary" cannot be mapped to valid SPDX identifier
     18 WARNING: The license value "Apache" cannot be mapped to valid SPDX identifier
     16 WARNING: The license value "ASL 2.0" cannot be mapped to valid SPDX identifier
     14 WARNING: The license value "EPL" cannot be mapped to valid SPDX identifier
     10 WARNING: The license value "GNU Lesser General Public License (LGPL)" cannot be mapped to valid SPDX identifier
      8 WARNING: The license value "Proprietary" cannot be mapped to valid SPDX identifier
      7 WARNING: The license value "BSD,LGPL,Apache 2.0" cannot be mapped to valid SPDX identifier
      7 WARNING: The license value "BSD, LGPL" cannot be mapped to valid SPDX identifier
      7 WARNING: The license value "BSD, Apache 2.0" cannot be mapped to valid SPDX identifier
      6 WARNING: The license value "proprietary" cannot be mapped to valid SPDX identifier
      6 WARNING: The license value "Public domain" cannot be mapped to valid SPDX identifier
      6 WARNING: The license value "Creative Commons" cannot be mapped to valid SPDX identifier
      6 WARNING: The license value "BSD, GPL" cannot be mapped to valid SPDX identifier
      5 WARNING: The license value "Public Domain" cannot be mapped to valid SPDX identifier
      4 WARNING: The license value "TBD" cannot be mapped to valid SPDX identifier
      4 WARNING: The license value "CC-BY-SA" cannot be mapped to valid SPDX identifier
      4 WARNING: The license value "BSD License 2.0" cannot be mapped to valid SPDX identifier
      3 WARNING: The license value "N/A" cannot be mapped to valid SPDX identifier
      3 WARNING: The license value "HOYA License" cannot be mapped to valid SPDX identifier
      3 WARNING: The license value "HEBI C++ Software License (https://www.hebirobotics.com/softwarelicense)" cannot be mapped to valid SPDX identifier
      3 WARNING: The license value "GPLv2 with linking exception" cannot be mapped to valid SPDX identifier
      3 WARNING: The license value "BSD,LGPL,LGPL (amcl)" cannot be mapped to valid SPDX identifier
      3 WARNING: The license value "BSD, some icons are licensed under the GNU Lesser General Public License (LGPL) or Creative Commons Attribution-Noncommercial 3.0 License" cannot be mapped to valid SPDX identifier
      3 WARNING: The license value "ALv2" cannot be mapped to valid SPDX identifier
      2 WARNING: The license value "Yujin Robot" cannot be mapped to valid SPDX identifier
      2 WARNING: The license value "TERMS OF USE FOR GUNDAM RESEARCH OPEN SIMULATOR Attribution-NonCommercial-ShareAlike" cannot be mapped to valid SPDX identifier
      2 WARNING: The license value "Southwest Research Institute Proprietary" cannot be mapped to valid SPDX identifier
      2 WARNING: The license value "KHI CAD license (mesh data, see readme)" cannot be mapped to valid SPDX identifier
      2 WARNING: The license value "GPL for sigblock" cannot be mapped to valid SPDX identifier
      2 WARNING: The license value "GPL because of list.h; other files released under BSD" cannot be mapped to valid SPDX identifier
      2 WARNING: The license value "Eclipse Distribution License 1.0" cannot be mapped to valid SPDX identifier
      2 WARNING: The license value "Commercial" cannot be mapped to valid SPDX identifier
      2 WARNING: The license value "Check author's website" cannot be mapped to valid SPDX identifier
      2 WARNING: The license value "Binary Only" cannot be mapped to valid SPDX identifier
      2 WARNING: The license value "BSD,GPL because of list.h; other files released under BSD,GPL" cannot be mapped to valid SPDX identifier
      2 WARNING: The license value "APLv2" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "specified in-file" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "see license.txt" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "see License.txt" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "free for research or education purpose, all rights maintained by David Applegate, William Cook, Sanjeeb Dash, and Monika Mevenkamp" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "free for academic research, for further licensing contact Wiliam Cook" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "WTF" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "Version 2.0" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "T.D.B" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "Slightech License" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "See license.txt" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "Lesser GPL and Apache License" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "LGPLv2.1, modified BSD" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "LGPL and Apache2" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "LGPL / BSD" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "GPL v2 with linking exception" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "GPL + runtime exception" cannot be mapped to valid SPDX identifier
      1 WARNING: The license value "FZI all rights reserved" cannot be mapped to valid SPDX identifier

Signed-off-by: Martin Jansa <martin.jansa@lge.com>
…s license

* "Check-author\'s-website" license value used by uwsim_bullet breaks the parsing
  of superflore generated recipe, at least use it without apostrophe

Signed-off-by: Martin Jansa <martin.jansa@lge.com>
@shr-project shr-project marked this pull request as ready for review November 6, 2020 18:38
@shr-project
Copy link
Author

@shr-project Atm the PR is in draft state and has no description. Therefore I am not going to review the patch. Please add a very detailed description about what the patch is doing, why it is doing so, what the side effects are, why you think those are justified, etc.

It was in draft state, because I wanted to add separate warning for license values which include multiple licenses (instead of using multiple separate license tags). I've added another commit for that now and added description.

Please let me know if it makes more sense to you now.

On a first look this adds new warnings to a lot of existing packages. This on its own for already released distros is an almost guaranteed no-go.

Are developers used to use https://github.com/fkie/catkin_lint ? Would showing warning about strange license texts in components from released distros be OK if done in catkin_lint?

@cottsay
Copy link
Member

cottsay commented Nov 6, 2020

Would showing warning about strange license texts in components from released distros be OK if done in catkin_lint?

It's certainly not a hill I'm willing to die on, but in my opinion, the warning and error messages coming out of catkin_pkg should be more along the lines of syntax, structure, or schema-level problems in the manifests - problems that could reasonably prevent accurate processing of the manifest by catkin_pkg itself, or another system that depends on catkin_pkg. It isn't a perfect parallel, but I'll note that many Linux distributions document what precise representation should be used to refer to specific licenses, but few enforce this in the packaging system itself (i.e. deb/rpm as opposed to linters and test automation).

To be clear: I think this is an excellent thing to do, I'm just not convinced catkin_pkg is the right tool for the job. I would love to see it in catkin_lint, and I would love to see widespread adoption of catkin_lint in a GitHub Action.

@shr-project
Copy link
Author

I'm working on very similar PR for catkin_lint as well now :).

I only work with ROS when working on meta-ros, I've never used ROS as a regular developer, so I don't know what's the usual work flow for people who write package.xml. If it's common that they run catkin_lint (or that Github Actions runs it for them), then I agreed that's definitely the place where it should be validated.

* when the license value lists multiple licenses (one of currenly used combinations)
* show a link to REP-0149 format definition which explicitly says:
  "For multiple licenses multiple separate tags must be used."
  but unfortunately it doesn't define how to express e.g.
  dual-license or other more complicated scheme, other than pointing
  to <description> tag.
  "For any explanatory text about licensing caveats, please use the <description> tag."

* e.g. OpenEmbedded supports '&' '|' '(' ')' to express more compilcated scheme:
  http://git.openembedded.org/openembedded-core/tree/meta/lib/oe/license.py?id=8e2d0575e4e7036b5f60e632f377a8ab2b96ead8#n42 )
  but that would require change to Package Manifest Format

Signed-off-by: Martin Jansa <martin.jansa@lge.com>
@gavanderhoorn
Copy link

@cottsay wrote:

I would love to see it in catkin_lint, and I would love to see widespread adoption of catkin_lint in a GitHub Action.

industrial_ci supports catkin_lint runs natively and can be used as a GH Action.

@shr-project wrote:

If it's common that they run catkin_lint

can't speak for everyone, but I have quite a few industrial_ci configurations which have CATKIN_LINT=true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants