TitleWave

Overview

Stack Overflow is ubiquitous in the programming world as a place where people can ask questions to a community of other programmers. In 2019, over 5,000 questions a day were asked, but only 70% of those were answered. To increase your chance of getting an answer, it’s important to have a compelling title so that people actually click on your question. But this can be tough, especially for new users who aren’t familiar with the conventions of the website. To solve this problem, I built TitleWave, a Chrome extension that integrates directly into the Stack Overflow website and helps you improve your title.

The algorithm uses deep learning and natural language processing (NLP) to summarize the key details of your question and phrase them in a way that is consistent with previously successful questions on similar topics. A better title makes it easier for experts to notice your question, which should lead to faster, higher-quality answers.

Install the Chrome extension

I hope to make this extension available on the Chrome Web Store in the near future. Until then, you can use the development version by following these instructions:

1. Clone this repository (you only need the chrome_extension folder).
2. In Google Chrome, open the Extension Management page by navigating to chrome://extensions.
3. Enable Developer Mode by clicking the toggle switch in the top right corner.
4. Click the LOAD UNPACKED button and select the chrome_extension folder you downloaded in step 1.
5. Navigate to https://stackoverflow.com/questions/ask. If the extension is working, you'll see two new buttons below the title entry box.

Try out the models on Hugging Face

You can also try out the tool without downloading anything by using the Hugging Face Inference API. Click HERE for the classification model (gives the probability of getting an answer), and click HERE for the summarization model (suggests a title, given the body of the question).
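If you would rather call the hosted models from a script than from the web page, the request might look roughly like the sketch below. This is only an illustration under several assumptions: the endpoint URL pattern is the one historically used by the hosted Inference API and may have changed, the model ID passed in is a placeholder (the actual model IDs are behind the links above), and the helper names `build_inference_request` and `query` are made up for this example.

```python
import json
import urllib.request

# Assumption: URL pattern of the hosted Inference API; check the current
# Hugging Face documentation before relying on it.
API_ROOT = "https://api-inference.huggingface.co/models/"


def build_inference_request(model_id, text, token=None):
    """Build (but do not send) a POST request carrying the question text.

    model_id is a placeholder such as "some-user/some-model"; token is an
    optional Hugging Face access token.
    """
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        API_ROOT + model_id, data=payload, headers=headers, method="POST"
    )


def query(model_id, text, token=None):
    """Send the request and return the decoded JSON response."""
    request = build_inference_request(model_id, text, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For the summarization model you would pass the body of the question as `text`; for the classification model, the title you want to score. The shape of the returned JSON depends on the model type, so inspect it before parsing.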

Retrain the models

If you're interested in retraining the models from scratch, see the Python scripts in the model_training folder. Here are the steps I took:

1. Download the dataset of Stack Overflow posts from https://archive.org/download/stackexchange/stackoverflow.com-Posts.7z (currently ~16 GB compressed, ~90 GB uncompressed).
2. Preprocess the text (removing HTML tags and code blocks), and load the dataset into a MongoDB collection (see xml_to_mongo.py).
3. Partition the dataset into train, validation, and test sets for each of the two models (see partition_dataset.py).
4. Fine-tune a classification model, starting from bert-base-uncased (see train_classifier.py).
5. Fine-tune a summarization model, starting from t5-small (see train_summarizer.py).
6. Analyze the performance of these models on a test set (see test_classifier.ipynb and test_summarizer.ipynb).
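To give a feel for the preprocessing and partitioning steps, here is a minimal standard-library sketch. It is not the code from xml_to_mongo.py or partition_dataset.py, which may work differently. It assumes that code blocks in a post body are wrapped in `<pre>` elements, and it uses a hash of the post ID to assign splits. The names `strip_html_and_code` and `assign_split`, and the 10% validation and test fractions, are hypothetical choices for illustration.

```python
import hashlib
from html.parser import HTMLParser

# Tags after which a space is inserted so adjacent blocks do not run together.
_BLOCK_TAGS = {"p", "pre", "div", "br", "li", "blockquote"}


class _TextExtractor(HTMLParser):
    """Collect text outside <pre> elements, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._pre_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._pre_depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._pre_depth > 0:
            self._pre_depth -= 1
        if tag in _BLOCK_TAGS and self._pre_depth == 0:
            self._chunks.append(" ")

    def handle_data(self, data):
        if self._pre_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the pieces and collapse runs of whitespace.
        return " ".join("".join(self._chunks).split())


def strip_html_and_code(body_html):
    """Return the plain text of a post body without tags or <pre> blocks."""
    parser = _TextExtractor()
    parser.feed(body_html)
    parser.close()
    return parser.text()


def assign_split(post_id, val_fraction=0.1, test_fraction=0.1):
    """Deterministically map a post ID to 'train', 'validation', or 'test'."""
    digest = hashlib.sha256(str(post_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "validation"
    return "train"
```

Hashing the ID rather than shuffling means a post always lands in the same split, even if the dataset is reloaded or extended later. Inline `<code>` spans are kept as text in this sketch; only `<pre>` blocks are dropped.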
