This script analyzes all the words in a file and makes a statistical analysis of it.
Linux:
Only need to download the script, the StopWords folder and then make it executable by writing in the terminal
"chmod u+x wordStats.sh"
Command:
./wordStats.sh <MODE> <INPUT> <ISO3166>
./word_stats.sh Cc|Pp|Tt INPUT [iso3166]
Ps: make sure you are in the same directory of the script and StopWords folder
Modes:
- C : performs the count of occurrences of each word with stop-words, saving the list in a text file
- c : performs the count of occurrences of each word without stop-words, saving the list in a text file
- P : performs the count of occurrences of each word, producing a bar chart with the N words that occur most frequently (stop-words included)
- p : performs the count of occurrences of each word, producing a bar chart with the N words that occur most frequently (without stop-words)
- T : performs the count of occurrences of each word showing the Top words with stop-words
- t : performs the count of occurrences of each word showing the Top words withot stop-words
The last 4 modes need an environment variable called WORD_STATS_TOP so you can define it in your machine with the value you want (digit) or if you dont the script will assume 10 by default.
Input:
- the file you want to be analyzed
ISO:
- pt : portuguese stop-words
- en : english stop-words
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.