Skip to content

Conversation

@arshad-muhammad
Copy link

This commit introduces enhancements to the copyright detection logic within the ScanCode Toolkit. The following changes have been made:

Implemented the normalize_copyright_symbols function to standardize copyright symbols throughout the codebase, converting occurrences of [C] and [c] to (C) to ensure consistent copyright formatting.

Developed unit tests for both the normalize_copyright_symbols function and the detect_copyrights_from_text function to validate their functionality and edge cases.

Created a test suite located in tests/cluecode/test_copyright.py to ensure reliable detection and normalization of copyright statements, contributing to overall code quality and maintainability.

…ormalization to copyrights.py and unit tests passed
…ormalization to copyrights.py and unit tests passed
@arshad-muhammad
Copy link
Author

@pombredanne I have implemented the copyright normalization logic and added unit tests to ensure its reliability. Both tests have passed successfully. The changes are pushed to the develop branch. Please review at your convenience.

@pombredanne
Copy link
Member

You have all the good intentions, but this code is not something we can merge at all, unless integrated in the actual copyright detection flow.

Are you really trying to rewrite copyright detection with regex like in https://github.com/aboutcode-org/scancode-toolkit/pull/3940/files#diff-68f1966d414a58a3a75ae92a79baef800e2085b0b95404b0911483a0b022929dR38 ?

Please take the time to study the current copyright detection.

pass


build_tests(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You realize that this is removing thousands of high value tests?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants