Corpora used, from Project Gutenberg: Speeches & Letters of Abraham Lincoln, 1832-1865 and Mark Twain’s Letters and Speeches, 1901-1906
Step 1 - Preprocessing (Tokenization, Normalization, Stemming, Lemmatization)
Step 2 - Frequency Distribution (selected the top 50 words, bigrams, and trigrams)
Step 3 - Comparison of the two authors' writing styles, based on the results found in Step 2
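
The three steps can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the function names (tokenize, ngrams, top_ngrams, shared_top) are hypothetical, and the sample strings are toy placeholders rather than text from either corpus. Stemming and lemmatization are left out because they need an external library such as NLTK, and comparing the two top-50 lists by set overlap is only one assumed way to carry out Step 3.

```python
import re
from collections import Counter


def tokenize(text):
    """Step 1 (partial): lowercase the text and split it into word tokens.

    Keeps internal ASCII apostrophes ("don't"); stemming and
    lemmatization are intentionally not performed here.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples (n=1 gives 1-tuples)."""
    return list(zip(*(tokens[i:] for i in range(n))))


def top_ngrams(tokens, n, k=50):
    """Step 2: the k most frequent n-grams with their counts."""
    return Counter(ngrams(tokens, n)).most_common(k)


def shared_top(tokens_a, tokens_b, n, k=50):
    """Step 3 (one assumed approach): n-grams in both authors' top-k lists."""
    top_a = {gram for gram, _ in top_ngrams(tokens_a, n, k)}
    top_b = {gram for gram, _ in top_ngrams(tokens_b, n, k)}
    return top_a & top_b


if __name__ == "__main__":
    # Toy placeholder strings; in practice these would be the full texts
    # of the two Gutenberg files, read from disk.
    author_a = tokenize("The cat sat on the mat. The cat slept.")
    author_b = tokenize("The dog sat on the log. The dog barked.")

    for n, label in [(1, "words"), (2, "bigrams"), (3, "trigrams")]:
        print(label, "A:", top_ngrams(author_a, n, k=3))
        print(label, "B:", top_ngrams(author_b, n, k=3))
        print(label, "shared:", shared_top(author_a, author_b, n))
```

Working with tuples for every n (including single words) keeps the word, bigram, and trigram cases on one code path, so the same comparison function serves all three lists.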