copyright_detection_ner_model

Basic pipeline to generate a copyright-text detection model using spaCy NER

An attempt to create a model dedicated exclusively to detecting the literal copyright text present in source code files.
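To make "literal copyright text" concrete, the sketch below marks where a known copyright statement sits inside a source file, using the (text, {"entities": [(start, end, label)]}) layout commonly used for spaCy NER training examples. It is an illustration only, not the repository's code: the helper name `annotate`, the label `COPYRIGHT`, and the sample text are all assumptions.

```python
def annotate(text, phrases, label="COPYRIGHT"):
    """Return (text, {"entities": [(start, end, label), ...]}).

    Hypothetical helper: finds each literal phrase in `text` and records
    its character offsets. The label name is an assumption.
    """
    entities = []
    for phrase in phrases:
        if not phrase:
            continue  # an empty phrase would match everywhere
        start = text.find(phrase)
        while start != -1:
            end = start + len(phrase)
            # Keep the span only if it does not overlap one already recorded.
            if all(end <= s or start >= e for s, e, _ in entities):
                entities.append((start, end, label))
            start = text.find(phrase, end)
    entities.sort()
    return text, {"entities": entities}


# Made-up source file content and copyright statement.
source = "# Copyright (c) 2020 Example Corp.\nimport os\n"
statement = "Copyright (c) 2020 Example Corp."
example = annotate(source, [statement])

start, end, label = example[1]["entities"][0]
print(label, repr(source[start:end]))
```

Character offsets are used rather than line numbers because NER training data generally refers to spans within the text itself.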

Installation

copyright_detection_ner_model requires Python 3.10+ and ScanCode v32.3.2.

Download the packages you want to use into the input folder, then use extractcode to unpack any archive files:

extractcode --shallow --replace-originals input/your_archive

python -m venv venv && source venv/bin/activate
git clone git@github.com:dineshr93/copyright_detection_ner_model.git && cd copyright_detection_ner_model && \
pip install -r requirements.txt
make b #starts the pipeline
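Since ScanCode is a prerequisite, it can help to see what its copyright detections look like as data. The sketch below reads a parsed ScanCode JSON result (for example, one produced with the `--copyright` and `--json-pp` options) and lists each detected statement. It is not the repository's script: the function name `copyrights_from_scan` and the sample document are hypothetical, and the key names (`files`, `path`, `copyrights`, `copyright`, `start_line`, `end_line`) are assumed from ScanCode's JSON output and should be checked against your own scan.

```python
import json


def copyrights_from_scan(scan):
    """Yield (path, copyright_text, start_line, end_line) tuples.

    `scan` is a parsed ScanCode JSON result. Key names are assumptions;
    verify them against the output of your ScanCode version.
    """
    for entry in scan.get("files", []):
        for detection in entry.get("copyrights", []):
            yield (
                entry.get("path"),
                detection.get("copyright"),
                detection.get("start_line"),
                detection.get("end_line"),
            )


# Made-up sample shaped like a ScanCode result, not real scan output.
sample = json.loads("""
{
  "files": [
    {
      "path": "input/pkg/a.c",
      "type": "file",
      "copyrights": [
        {"copyright": "Copyright (c) 2020 Example Corp.",
         "start_line": 1, "end_line": 1}
      ]
    },
    {"path": "input/pkg", "type": "directory", "copyrights": []}
  ]
}
""")

rows = list(copyrights_from_scan(sample))
for path, text, start_line, end_line in rows:
    print(f"{path}:{start_line}-{end_line}: {text}")
```

Using `.get` throughout keeps the sketch tolerant of entries that carry no copyright detections, such as directories.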

NER Model Training Flow

License

AGPL-3.0+

Folders and files

.devcontainer
data
input
misc
model
.gitignore
Dockerfile
Makefile
README.md
get_training_data_from_scancode.py
prepare_ner_data.py
requirements.txt
test_ner_model.py
test_ner_model_UI.py
test_ner_model_UI_openai.py
train_ner_model.py
