Where did the adjective/noun corpus come from?

While figuring out how the new cell `id` fields work in format 4.5, I noticed that they are currently being generated from a word corpus saved in these two files:

- https://github.com/jupyter/nbformat/blob/master/nbformat/corpus/adjectives.txt
- https://github.com/jupyter/nbformat/blob/master/nbformat/corpus/nouns.txt

Does anyone know where these word lists came from? Some entries in there are going to produce surprising, problematic, or possibly offensive combinations. I don't know how many employers run random word scans on attachments, but someone could quite easily email a notebook to a colleague without realizing that the JSON contains strings like:

 - `special-marijuana`
 - `jewish-holocaust`
 - `naked-librarian`
 - [I'll stop there]

I would _strongly_ vote for one of:
  - Replace these lists with something more curated to avoid these issues
  - Switch to UUIDs (also allowed by the spec)
  - Copy a curated list (license permitting) from some other source, like Docker: https://github.com/moby/moby/blob/master/pkg/namesgenerator/names-generator.go
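
To make the UUID option concrete, here is a minimal sketch of what ID generation without a word corpus could look like. The names `make_cell_id` and `is_valid_cell_id` are hypothetical and are not nbformat's API. The validity check reflects my reading of the 4.5 schema (1 to 64 characters drawn from letters, digits, `-`, and `_`), so treat that constraint as an assumption to verify against the schema itself.

```python
import re
import uuid

# Assumption: the 4.5 schema allows cell ids of 1-64 characters
# drawn from letters, digits, hyphen, and underscore.
_CELL_ID_RE = re.compile(r"[a-zA-Z0-9_-]+")


def make_cell_id(length: int = 8) -> str:
    """Return a random hex id taken from a UUID4 (hypothetical helper)."""
    # uuid4().hex is 32 hex characters, so longer prefixes are not available.
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32")
    return uuid.uuid4().hex[:length]


def is_valid_cell_id(cell_id: str) -> bool:
    """Check an id against the assumed 4.5 length and character rules."""
    return 1 <= len(cell_id) <= 64 and _CELL_ID_RE.fullmatch(cell_id) is not None
```

Truncating to 8 hex characters keeps IDs short while leaving collisions within a single notebook very unlikely. The output is less memorable than word pairs, but it cannot spell anything.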

Issue #216, opened by seibert on Mar 18, 2021
