OneShot is all you need

Tired of creating text preprocessing pipelines all the time? Don't worry, we got you covered! OneShot is a python textual data preprocessing library made just for you. Just import and feel the power ⚡. Oneshot doesn't only provide you with functionality to clean your data, it also provides with numerical information present in the data. Many of the processes required for cleaning text is accessible. If you have any suggestions to make, you can connect with me by clicking the icons below. PR's are most welcome. ❤️

Dependencies 🚒

---------------------------------------
pip install spacy==2.2.3
python -m spacy download en_core_web_sm
pip install beautifulsoup4==4.9.1
pip install textblob==0.15.3
---------------------------------------

Install ⬇️

https://github.com/nakshatrasinghh/OneShot.git

Uninstall 👋

pip uninstall OneShot

How to use it? 🤔

Use this function to clean your dataset

----------------------------------------------------------
def get_clean(x):
    x = str(x).lower().replace('\\', '').replace('_', ' ')
    x = osx.cont_exp(x)
    x = osx.remove_emails(x)
    x = osx.remove_urls(x)
    x = osx.remove_html_tags(x)
    x = osx.remove_rt(x)
    x = osx.remove_mentions(x)
    x = osx.remove_accented_chars(x)
    x = osx.remove_special_chars(x)
    x = osx.remove_stopwords(x)
    x = osx.remove_dups_char(x)
    x = re.sub("(.)\\1{2,}", "\\1", x)
    return x
----------------------------------------------------------

Use this function to get numeric features, but wait is'nt this cumbersome? We have a surprise for you.

----------------------------------------------------------
def numeric_feats(x):
    x = osx.get_wordcounts(x)
    x = osx.get_charcounts(x)
    x = osx.get_avg_wordlength(x)
    x = osx.get_stopwords_counts(x)
    x = osx.get_hashtag_counts(x)
    x = osx.get_mentions_counts(x)
    x = osx.get_uppercase_counts(x)
    x = osx.get_digit_counts(x)
    x = re.sub("(.)\\1{2,}", "\\1", x)
    return x
----------------------------------------------------------

Surprise 🤗

Use this ONE function to get all the numeric features mentioned above.

def numeric_feats(x):
    x = osx.get_basic_features(x)
    return x

Use it indivitually 🧨

---------------------------------------
import pandas as pd
import OneShot as osx

df = pd.read_csv('imdb_reviews.txt', sep = '\t', header = None)
df.columns = ['reviews', 'sentiment']

#@ you're -> you are; i'm -> i am
df['reviews'] = df['reviews'].apply(lambda x: osx.cont_exp(x))
#@ removes emails
df['reviews'] = df['reviews'].apply(lambda x: osx.remove_emails(x))
#@ removes html tags
df['reviews'] = df['reviews'].apply(lambda x: osx.remove_html_tags(x))
#@ removes urls
df['reviews'] = df['reviews'].apply(lambda x: osx.remove_urls(x))
#@ removes special characters
df['reviews'] = df['reviews'].apply(lambda x: osx.remove_special_chars(x))
#@ removes accented characters
df['reviews'] = df['reviews'].apply(lambda x: osx.remove_accented_chars(x))
#@ converts root word to its base word
df['reviews'] = df['reviews'].apply(lambda x: osx.make_base(x))
#@ corrects the spelling using textblob
df['reviews'] = df['reviews'].apply(lambda x: osx.spelling_correction(x).raw_sentences[0]) 
---------------------------------------

Note: Avoid to use make_base and spelling_correction for very large dataset otherwise it might take hours to process.

Extras 😎

----------------------------------
import re
x = 'lllooooovvveeee youuuu'
x = re.sub("(.)\\1{2,}", "\\1", x)
print(x)
---
love you
----------------------------------

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
.github		.github
OneShot		OneShot
.gitattributes		.gitattributes
LICENSE		LICENSE
README.md		README.md
example.ipynb		example.ipynb
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OneShot is all you need

Dependencies 🚒

Install ⬇️

Uninstall 👋

How to use it? 🤔

Surprise 🤗

Use it indivitually 🧨

Extras 😎

About

Releases

Sponsor this project

Packages

Languages

License

nakshatrasinghh/OneShot

Folders and files

Latest commit

History

Repository files navigation

OneShot is all you need

Dependencies 🚒

Install ⬇️

Uninstall 👋

How to use it? 🤔

Surprise 🤗

Use it indivitually 🧨

Extras 😎

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Sponsor this project

Packages 0

Languages

Packages