Your goal is to implement a tool for lexical analysis of text data.
Here’s an example of the programs usage and output with descriptions:
| java Application text1.txt text2.txt
==text1.txt== **name of the analyzed file
Char count: 761674 **number of characters in the file (without white-spaces)**
Word count: 188912 **number of words in the file**
Dict size: 10762 **dictionary size: number of different words used**
Most used words (>1%): [a,and,had..<more>] **all words that make up for more than 1% of the text**
'love' count: 60 **number of times that the word ‘love’ occurred**
'hate' count: 4 **number of times that the word ‘hate’ occurred**
'music' count: 4 **number of times that the word ‘music’ occurred**
vowels %: 38 **what part of all characters are vowels (a,o,i,e,u)**
'a:e count ratio' : 1.54 **the ratio of ‘a’ to ‘e’ occurrences**
[ G -> 2.16] [ R -> 5.36] ....<and the rest> **% of in whole text of all the letters
==text2.txt== **another file… everything like above**
Char count: 955387
Word count: 219044
Dict size: 16957
Most used words (>1%): [a,and,his,in,it,of,that,the,to]
'love' count: 24
'hate' count: 9
'music' count: 4
vowels %: 37
'a:e count ratio' : 1.49
[ M -> 2.44] [ O -> 7.27] [ B -> 1.77] …. <and the rest>
Benchmark time: 1.419 secs **Time it took the program to run**
-
Private methods are not shown on the diagram. You are free to implement any necessary/helpful private methods in any class.
-
Use the Iterator interface from java.util. You don’t have to create your own.
-
The input files will only consist of alphabetic characters and white spaces.
-
The input file will consist both upper and lower case characters. You should use only one case to ensure that the results of the analysis will be consistent.
-
IterableTextinterface. Create the interface with required methods. -
FileContent- class responsible for reading text files and providing its content through implementing IterableText interface. The content will be provided through methodscharIterator()andwordIterator(). -
CharIteratorsuccessively iterates over alphabetic characters (single letters) of the text. It skips all other characters (like white-spaces). Implements next() and hasNext() fromIteratorinterface. NOTE: Chars are represented as Strings for the sake of simplicity. Remember that characters’ case (UPPER or lower) should be consistent in our analysis. -
WordIteratorsuccessively iterates over words of the text. Words in the text are separated with white-spaces. Implementsnext()andhasNext()fromIteratorinterface. -
StatisticalAnalysisis the dataset. It is initialized withIteratorand provides functions for statistical analysis of the text elements (Strings). NOTE: You may choose any data structure to store the processed data. The choice will affect the amount of code you will have to write.-
countOf(String… elems) function returns sum of all occurrences (in the dataset) of the elements given as arguments ( elems). Note: One string may be given as well. The three-dots operator will help you pass variable number of arguments to the function and access them with ease. https://docs.oracle.com/javase/tutorial/java/javaOO/arguments.html#varargs -
dictionarySize()function returns the number of unique elements of the dataset. It’s the size so-called Author’s Dictionary - the set of different words the author of the text uses. -
size()returns the total number of all elements in the dataset. This will be total number of words or total number of characters. -
occurMoreThan(Integer n )returns all the elements that occur more than n _times _ in the dataset
-
-
Viewthis class should be responsible for presenting the data to the user. You are free to implement any functions you find sensible. NOTE: In case of this class you don’t have to follow the UML diagram. -
Applicationmain class of the program. It should allow the user to pass to text files for analysis as CLI arguments. -
Handle only those exceptions which are required to be handled by the functions and methods you use. Custom error messages are not required (printStackTrace() can be used).
-
The structure of packages tree is not restricted by anything but clean code principles.
