An attempt to build a model dedicated solely to detecting the literal copyright statements present in source code files.
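To make "literal copyright statement" concrete, here is a minimal Python sketch of a naive regex baseline that reports such statements as character spans. It is only an illustration under stated assumptions: the pattern and the `find_copyright_spans` helper are hypothetical and are not how this model works.

```python
import re

# Naive pattern (an assumption, NOT this repository's model): the whole word
# "copyright" (any case) followed by the rest of the line.
COPYRIGHT_RE = re.compile(r"\bcopyright\b[^\n]*", re.IGNORECASE)


def find_copyright_spans(text):
    """Return (start, end, snippet) character spans for copyright-looking lines."""
    spans = []
    for match in COPYRIGHT_RE.finditer(text):
        snippet = match.group(0).rstrip()  # drop trailing spaces or '\r'
        spans.append((match.start(), match.start() + len(snippet), snippet))
    return spans


# Example: prints [(2, 32, 'Copyright (c) 2025 Dinesh Ravi')]
print(find_copyright_spans("# Copyright (c) 2025 Dinesh Ravi\nimport os\n"))
```

A baseline like this also matches license prose such as "THE COPYRIGHT HOLDERS AND CONTRIBUTORS", which is not a copyright statement; separating the two is presumably the kind of distinction a trained NER model is meant to make.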
copyright_detection_ner_model requires Python 3.10+ and ScanCode Toolkit v32.3.2.
Download one or more packages into the input folder, then use extractcode to unpack the archive files:
extractcode --shallow --replace-originals input/your_archive
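When the input folder holds several archives, the command above can be repeated for each one. The Python sketch below builds one such command per archive using the same flags; the suffix list and the helper names (`is_archive`, `build_command`, `extract_all`) are hypothetical, and extractcode itself recognizes many more archive formats than listed here.

```python
import subprocess
from pathlib import Path

# Assumed archive suffixes for this sketch only.
ARCHIVE_SUFFIXES = (".zip", ".jar", ".whl", ".tar", ".tgz", ".tar.gz", ".tar.bz2", ".tar.xz")


def is_archive(name):
    """True if the file name ends with one of the assumed archive suffixes."""
    return name.lower().endswith(ARCHIVE_SUFFIXES)


def build_command(path):
    """Same extractcode invocation as shown above, for a single archive."""
    return ["extractcode", "--shallow", "--replace-originals", str(path)]


def extract_all(input_dir="input", dry_run=True):
    """Print (and, if dry_run is False, run) one command per archive in input_dir."""
    archives = sorted(p for p in Path(input_dir).iterdir() if p.is_file() and is_archive(p.name))
    for path in archives:
        cmd = build_command(path)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling `extract_all()` only prints the commands; `extract_all(dry_run=False)` actually runs them, and each archive is replaced by its extracted contents because of `--replace-originals`.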
python -m venv venv && source venv/bin/activate
git clone git@github.com:dineshr93/copyright_detection_ner_model.git && cd copyright_detection_ner_model && \
pip install -r requirements.txt
make b  # starts the pipeline
Copyright (c) 2025 Dinesh Ravi