Closed
Opened on Mar 18, 2021
While figuring out how the new cell id fields work in format 4.5, I noticed that they are currently generated from a word corpus stored in these two files:
- https://github.com/jupyter/nbformat/blob/master/nbformat/corpus/adjectives.txt
- https://github.com/jupyter/nbformat/blob/master/nbformat/corpus/nouns.txt
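For anyone unfamiliar with the scheme, here is a minimal sketch of how adjective-noun ids behave. It assumes the generator picks one word from each list uniformly at random and joins them with a hyphen; the real nbformat code may differ, and the tiny word lists below are hypothetical stand-ins for the two corpus files. The point is that every adjective can pair with every noun, so the set of possible ids is the full cross product and nobody has reviewed each pairing.

```python
import itertools
import random

# Hypothetical stand-ins for corpus/adjectives.txt and corpus/nouns.txt
ADJECTIVES = ["brave", "quiet", "special"]
NOUNS = ["otter", "librarian", "river"]


def word_pair_id(rng: random.Random) -> str:
    """Pick one adjective and one noun at random and join them with a hyphen."""
    return f"{rng.choice(ADJECTIVES)}-{rng.choice(NOUNS)}"


def all_pair_ids() -> set[str]:
    """Every id this scheme can emit: the full adjective x noun cross product."""
    return {f"{adj}-{noun}" for adj, noun in itertools.product(ADJECTIVES, NOUNS)}


rng = random.Random(2021)
print([word_pair_id(rng) for _ in range(3)])
print(len(all_pair_ids()))  # 3 adjectives x 3 nouns = 9 possible ids
```

With lists of realistic size the cross product is far too large to review pair by pair, so curating the individual words is the only practical control.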
Does anyone know where these word lists came from? Some entries will produce surprising, problematic, or possibly offensive combinations. I don't know how many employers run keyword scans on attachments, but someone could easily email a notebook to a colleague without realizing that its JSON contains strings like:
- special-marijuana
- jewish-holocaust
- naked-librarian
- [I'll stop there]
I would strongly vote for one of:
- Replace these lists with something more curated to avoid these issues
- Switch to UUIDs (also allowed by the spec)
- Copy a curated list (license permitting) from some other source, like Docker: https://github.com/moby/moby/blob/master/pkg/namesgenerator/names-generator.go
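For the UUID option, a minimal sketch follows. It assumes the nbformat 4.5 constraint that a cell id is 1 to 64 characters drawn from letters, digits, `-` and `_`; `uuid_cell_id` and `is_valid_cell_id` are hypothetical helper names, not nbformat API. Both a truncated hex UUID and the full hyphenated UUID string satisfy that constraint.

```python
import re
import uuid

# Assumed nbformat 4.5 cell id constraint: 1-64 chars of letters, digits, '-' or '_'
CELL_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def uuid_cell_id(length: int = 8) -> str:
    """Hypothetical helper: a random cell id cut from a UUID4's 32 hex digits."""
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32")
    return uuid.uuid4().hex[:length]


def is_valid_cell_id(cell_id: str) -> bool:
    """Check a candidate id against the assumed schema pattern."""
    return CELL_ID_RE.fullmatch(cell_id) is not None


print(uuid_cell_id())                       # 8 random hex characters
print(is_valid_cell_id(str(uuid.uuid4())))  # full hyphenated UUID -> True
```

Hex output only uses the letters a to f, so it is far less likely to spell anything than a word list. The trade-off is collision risk when the id is truncated: 8 hex characters give 32 bits, which should be plenty within one notebook, but the full 32-character hex form still fits the length limit if that is a concern.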