This project, developed by The Pataphysical Society of New York (https://patanyc.org), consists of a Nonword Generator and a Text Analyzer. These tools are designed to aid research into the nature of uselessness. Example datasets and texts for analysis are also included. Findings from this research are published at https://patanyc.org/nonwords.
- Generates nonwords of specified lengths (from 1 to 7 letters by default)
- Multithreaded for maximizing performance
- Progress indicator for real-time status
- Option to generate nonwords of a single length or a range of lengths
- Option to output to separate files for each length or a single combined file
- C++11 compatible compiler
- A dictionary file named
words.txt
in the same directory as the executable
Compile the program using the following command:
g++ -std=c++11 -pthread nonword_generator.cpp -o nonword_generator
Run the program with two or three command-line arguments:
./nonword_generator <min_length> <max_length> [single_file]
Examples:
- To generate 5-letter nonwords in a separate file:
./nonword_generator 5 5
- To generate nonwords from 3 to 6 letters in separate files:
./nonword_generator 3 6
- To generate nonwords from 1 to 7 letters in a single file:
./nonword_generator 1 7 single_file
By default, the program creates separate files for each word length, named nonwords_Xletter.txt
. If the single_file
option is used, all nonwords are written to nonwords_all.txt
.
The text_analyzer.py
script is designed to analyze large text files, including the output of the nonword generator or other examplars.
- Calculates the exact word count in the file
- Counts the number of unique words
- Determines the file size in bytes
- Measures and displays the script's runtime
- Python 3.x
-
Save the script as
text_analyzer.py
. -
Run the script from the command line:
python text_analyzer.py path/to/your/large_file.txt
-
The script will process the file and display the results.
- Sample outputs: Research quality datasets of English language nonwords, listed by length (1 to 5 letters), are included in this repository.
- Analysis findings: A complete analysis of the research findings is available at https://patanyc.org/nonwords.html.
- Classic example texts: Several large English texts referred to in the analysis are included for reference.
- The number of possible nonword combinations grows exponentially with word length. Be prepared for long processing times and large output files for longer word lengths.
- 7-letter nonwords can result in a file size of approximately 64GB.
The nonword generator can be customized by adjusting parameters in the code, such as maximum word length, number of threads, and dictionary file location.
The English wordlist used as the default is derived from https://github.com/dwyl/english-words/blob/master/words_alpha.txt and is freely licensable for all uses.
Pataphysics researchers, academics, and laypersons are welcome to contact the authors via the website linked below.
This project is open-source and available under the MIT License.
For more information, visit https://patanyc.org
Never not except when so; if thus, not or but and!
&&&