Shraddhaduwal/NLP_task

Task1

This repository provides simple operations on a corpus or a text file. It covers word counts, word frequencies, alphabet counts, alphabetic word frequencies, punctuation counts, punctuation frequencies, sentence counts, and the lengths of the longest and shortest sentences as well as the average sentence length, among other operations.
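To make the operations listed above concrete, here is a minimal Python sketch of a few of them. It is not the repository's code: the names `total_words`, `word_frequency`, and `length_of_sentences` are borrowed from the timing lists in the Results section, `punctuation_frequencies` is a hypothetical name, and the tokenization and sentence-splitting rules are assumptions made for illustration.

```python
import re
import string
from collections import Counter


def total_words(text):
    """Count whitespace-separated tokens (assumed definition of a word)."""
    return len(text.split())


def word_frequency(text):
    """Map each lowercased word, with surrounding punctuation stripped, to its count."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return Counter(w for w in words if w)


def punctuation_frequencies(text):
    """Count each ASCII punctuation character (hypothetical helper)."""
    return Counter(ch for ch in text if ch in string.punctuation)


def length_of_sentences(text):
    """Return (longest, shortest, average) sentence length, measured in words."""
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    # Abbreviations such as "e.g." will be split incorrectly.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if not lengths:
        return 0, 0, 0.0
    return max(lengths), min(lengths), sum(lengths) / len(lengths)


sample = "The cat sat. The dog barked loudly! Why?"
print(total_words(sample))
print(word_frequency(sample).most_common(1))
print(punctuation_frequencies(sample))
print(length_of_sentences(sample))
```

`collections.Counter` keeps the frequency logic short and offers `most_common()` for sorted output. A real corpus would likely need a more careful tokenizer and sentence splitter than the ones assumed here.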

Data

The corpus used can be found here.

Results

All results are stored as sorted lists and dictionaries in separate CSV files in the Results folder. Unit tests check each function individually to confirm that it gives accurate results. The execution time of each function is as follows:

  • 'total_words' 0.00 ms
  • 'total_alphabet_and_punctuation' 1162.16 ms
  • 'word_frequency' 165.80 ms
  • 'alphabet_and_punctuation_frequencies' 352.30 ms
  • 'alphabet_and_punctuation_frequencies' 11.15 ms
  • 'alphabetic_word_frequencies' 132.92 ms
  • 'starting_and_ending_with_vowel' 106.04 ms
  • 'total_sentences' 0.01 ms
  • 'length_of_sentences' 9.96 ms
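Per-function timings like those above can be collected with a small decorator. The sketch below is one way to do it and is not taken from the repository: `timeit` is a hypothetical name, and the `total_words` body is a stand-in. The output format is chosen to mirror the list above.

```python
import time
from functools import wraps


def timeit(func):
    """Print how long each call to func takes, in milliseconds."""
    @wraps(func)  # keep the wrapped function's name for the report
    def timed(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed_ms = (time.perf_counter() - start) * 1000
        # !r adds the quotes seen in the timing list, e.g. 'total_words'
        print(f"{func.__name__!r} {elapsed_ms:.2f} ms")
        return result
    return timed


@timeit
def total_words(text):
    # Stand-in implementation, assumed for this example only.
    return len(text.split())


total_words("a small corpus of six words")
```

With two decimal places, any call that finishes in under about 0.005 ms is reported as 0.00 ms, which is one possible explanation for the zero entries in the list.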

For unit testing, the execution time for each method is given as follows:

  • 'total_words' 0.00 ms
  • 'total_alphabet_and_punctuation' 0.06 ms
  • 'word_frequency' 0.01 ms
  • 'alphabet_and_punctuation_frequencies' 0.03 ms
  • 'alphabet_and_punctuation_frequencies' 0.00 ms
  • 'alphabetic_word_frequencies' 0.01 ms
  • 'starting_and_ending_with_vowel' 0.01 ms
  • 'total_sentences' 0.00 ms
  • 'length_of_sentences' 0.01 ms
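The unit-test timings are much smaller than the corpus timings, presumably because the tests run on short, hand-checked inputs. A test of that kind might look like the following sketch, which uses Python's standard `unittest` module. The `word_frequency` body and the test cases are assumptions, not the repository's own tests.

```python
import unittest
from collections import Counter


def word_frequency(text):
    # Hypothetical stand-in for the repository's function of the same name.
    return Counter(text.lower().split())


class WordFrequencyTest(unittest.TestCase):
    def test_counts_repeated_words(self):
        freq = word_frequency("the cat and the hat")
        self.assertEqual(freq["the"], 2)
        self.assertEqual(freq["cat"], 1)

    def test_empty_text_gives_empty_counts(self):
        self.assertEqual(word_frequency(""), Counter())


# Run the test case programmatically so the script does not exit afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordFrequencyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Small inputs with known answers keep each test fast and make a wrong count easy to spot.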
