Analysing Release Notes of Python Libraries

What's inside the ReleaseNotesStudy Repository?

  1. releaseNotesDS.ipynb extracts the GitHub URLs of Python libraries and frameworks, sorted by number of stars and releases, and stores them in a CSV file.
  2. The dataset python_libraries - python_libraries.csv has the columns [github link, Link to Release Notes present in Website, Link to GitHub Release Notes, XPath to the link of individual version's Release Note].
  3. AnalyseReleaseNotes.ipynb takes python_libraries - python_libraries.csv as input and extracts the text from the Release Notes. After preprocessing, embedding, and computing similarity, the initial list of sentences relevant to the query is stored in deprecated_information2.csv.
  4. postprocessing_results.ipynb takes that file as input and separates Deprecation and Replacement related information, storing the results in deprecated_information_post_processed_1.csv and replacement_information_post_processed_1.csv respectively.

Steps to run ReleaseNotesStudy

After downloading the source code of this repository, you can create your own dataset by running the releaseNotesDS.ipynb notebook. However, you must then manually collect the release notes of each library from its official website, which is time consuming. We therefore recommend testing the release note analysis on the existing python_libraries - python_libraries.csv file.
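As a rough illustration of the kind of ranking releaseNotesDS.ipynb is described as performing, the sketch below sorts repository records by star count, then by release count, and writes them as CSV. The field names, the sample records, and the helpers `rank_repositories` and `to_csv_text` are hypothetical and are not taken from the notebook, whose actual logic may differ.

```python
import csv
import io

FIELDS = ["github_link", "stars", "releases"]  # hypothetical column names


def rank_repositories(repos):
    """Sort records by stars, then by releases, both in descending order."""
    return sorted(repos, key=lambda r: (r["stars"], r["releases"]), reverse=True)


def to_csv_text(repos):
    """Serialise the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(repos)
    return buffer.getvalue()


# Placeholder records for illustration only; these are not real measurements.
sample = [
    {"github_link": "https://github.com/example/lib-a", "stars": 120, "releases": 4},
    {"github_link": "https://github.com/example/lib-b", "stars": 950, "releases": 12},
    {"github_link": "https://github.com/example/lib-c", "stars": 950, "releases": 30},
]

ranked = rank_repositories(sample)
csv_text = to_csv_text(ranked)
```

To write to disk instead, the same `csv.DictWriter` can be given a file opened with `newline=""`.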

Using this dataset, run the AnalyseReleaseNotes.ipynb notebook to collect the initial set of relevant data, which is stored in a file named deprecated_information.csv.
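The similarity step can be pictured with a small, self-contained sketch. It assumes a plain bag-of-words vector in place of the embedding used in the notebook, so it only shows the ranking idea (score each sentence against a query by cosine similarity and keep those above a threshold). The function names, the query, the threshold, and the sample sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevant_sentences(sentences, query, threshold=0.2):
    """Return sentences scoring at least `threshold`, best match first."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine_similarity(Counter(tokenize(s)), query_vec), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored if score >= threshold]


# Made-up release note sentences, for illustration only.
notes = [
    "The foo() function is deprecated and will be removed.",
    "Added support for a new file format.",
    "Fixed a crash on startup.",
]
matches = relevant_sentences(notes, "deprecated removed function")
```

With a sentence embedding model, only the vector construction would change; the cosine scoring and thresholding would stay the same.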

The generated CSV file is then taken as input by the postprocessing_results.ipynb notebook, which further separates Deprecation and Replacement related information and stores them in deprecated_information_post_processed.csv and replacement_information_post_processed.csv respectively. These files are then used to obtain insights about the quality of Release Notes.
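One simple way to picture this separation is keyword matching, sketched below. This is an assumption made for illustration and not necessarily how postprocessing_results.ipynb works; the term lists, the function `split_sentences`, and the sample sentences are hypothetical.

```python
# Hypothetical cue phrases; the notebook may rely on different signals.
DEPRECATION_TERMS = ("deprecat", "removed", "no longer supported")
REPLACEMENT_TERMS = ("instead", "replaced by", "in favor of")


def split_sentences(sentences):
    """Split sentences into deprecation-related and replacement-related lists.

    A sentence that matches both term lists appears in both results;
    a sentence that matches neither is dropped.
    """
    deprecation, replacement = [], []
    for sentence in sentences:
        lowered = sentence.lower()
        if any(term in lowered for term in DEPRECATION_TERMS):
            deprecation.append(sentence)
        if any(term in lowered for term in REPLACEMENT_TERMS):
            replacement.append(sentence)
    return deprecation, replacement


# Made-up sentences, for illustration only.
sample_sentences = [
    "foo() is deprecated.",
    "Use bar() instead of foo().",
    "Added a new tutorial.",
    "baz() was removed in favor of qux().",
]
deprecation, replacement = split_sentences(sample_sentences)
```

Each resulting list could then be written to its own CSV file with the standard `csv` module.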

Dependencies

pip install -U selenium
pip install -U pip setuptools wheel
pip install -U spacy
python -m spacy download en_core_web_sm
pip install nltk
pip install tensorflow-hub
pip install -U word_forms

Run the above commands in a terminal to install all the dependencies.
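After installing, a quick check like the sketch below can confirm that the packages are importable before opening the notebooks. The helper `missing_modules` is hypothetical; the module names are the import names of the packages listed above (note that `tensorflow-hub` is imported as `tensorflow_hub`).

```python
import importlib.util

# Import names corresponding to the pip packages listed above.
REQUIRED_MODULES = ["selenium", "spacy", "nltk", "tensorflow_hub", "word_forms"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

This only checks that each module can be located; it does not verify versions or that the spaCy model en_core_web_sm has been downloaded.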