Version 0.7.0 significantly slower than 0.6.1 #19

Open
@virtustate

Description

We've been using this great module in a larger analysis for months: thank you for making it available! Here's the relevant code:

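from better_profanity import profanity

# reviews is presumably a pandas DataFrame; other_stop_words is a list of extra words to censor
corpus = reviews['review_text']
profanity.add_censor_words([x.lower() for x in other_stop_words])
corpus = corpus.apply(profanity.censor, args=(' ',))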

The statement corpus.apply(profanity.censor, args=(' ',)) takes two to three orders of magnitude longer with version 0.7.0 than with 0.6.1 (roughly 500 to 600 times longer in the runs below). Here are timings for the first 5 of 122 products, with everything identical except the better_profanity version. "Time to apply profanity" covers only corpus = corpus.apply(profanity.censor, args=(' ',)).
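For anyone trying to reproduce this, here is a minimal sketch of how a "Time to apply profanity" figure could be measured. It is not the code from our analysis: `time_apply`, `standin_censor`, and the sample text are hypothetical names and data, made up so the sketch runs without better_profanity or pandas installed.

```python
import time


def time_apply(func, texts, *args):
    """Call func(text, *args) on every text and return (results, elapsed_seconds)."""
    start = time.perf_counter()
    results = [func(text, *args) for text in texts]
    elapsed = time.perf_counter() - start
    return results, elapsed


def standin_censor(text, censor_char="*"):
    # Hypothetical stand-in for profanity.censor, NOT the library's implementation:
    # replaces each banned word with four copies of censor_char.
    banned = {"darn"}
    words = text.split()
    return " ".join(censor_char * 4 if w.lower() in banned else w for w in words)


# Made-up sample data standing in for reviews['review_text'].
texts = ["These leggings are darn comfortable"] * 1000

censored, seconds = time_apply(standin_censor, texts, " ")
print(f"Time to apply profanity: {seconds}")
```

To compare versions, the stand-in would be swapped for the real function, e.g. `time_apply(profanity.censor, corpus, ' ')`, and the script run once under each installed better_profanity version with the same input.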

better_profanity=0.7.0

This product is named Oasis High-Waisted Pocket Capri
Begin by noting there are 1669 reviews for this product
Time to apply profanity: 95.60132622718811
Time it takes to run this LDA model: 97.7805495262146
{'size', 'material', 'working', 'comfortable', 'waist', 'length', 'fit', 'fabric', 'soft'}
1 of 122 products' review sets remodeled

This product is named Ryan Built In Bra Tank II
Begin by noting there are 427 reviews for this product
Time to apply profanity: 29.865559816360474
Time it takes to run this LDA model: 30.556731939315796
{'cute', 'size', 'top', 'comfortable', 'fit'}
2 of 122 products' review sets remodeled

This product is named Oasis High-Waisted Pocket 7/8
Begin by noting there are 10934 reviews for this product
Time to apply profanity: 710.3287241458893
Time it takes to run this LDA model: 726.3966491222382
{'pocket', 'comfortable', 'feel', 'waist', 'fit', 'see', 'color', 'soft'}
3 of 122 products' review sets remodeled

This product is named High-Waisted Ultracool Side Stripe Crop
Begin by noting there are 168 reviews for this product
Time to apply profanity: 10.014711618423462
Time it takes to run this LDA model: 10.347350835800171
{'comfortable', 'feel', 'waist', 'size'}
4 of 122 products' review sets remodeled

This product is named Oasis High-Waisted Twist 7/8
Begin by noting there are 1750 reviews for this product
Time to apply profanity: 121.31187510490417
Time it takes to run this LDA model: 123.6646056175232
{'cute', 'style', 'size', 'material', 'comfortable', 'detail', 'bit', 'fit', 'color', 'soft'}
5 of 122 products' review sets remodeled

better_profanity=0.6.1

This product is named Oasis High-Waisted Pocket Capri
Begin by noting there are 1669 reviews for this product
Time to apply profanity: 0.19291996955871582
Time it takes to run this LDA model: 4.058649063110352
{'size', 'material', 'working', 'comfortable', 'waist', 'length', 'fit', 'fabric', 'soft'}
1 of 122 products' review sets remodeled

This product is named Ryan Built In Bra Tank II
Begin by noting there are 427 reviews for this product
Time to apply profanity: 0.05718731880187988
Time it takes to run this LDA model: 0.7385601997375488
{'cute', 'size', 'top', 'comfortable', 'fit'}
2 of 122 products' review sets remodeled

This product is named Oasis High-Waisted Pocket 7/8
Begin by noting there are 10934 reviews for this product
Time to apply profanity: 1.264852523803711
Time it takes to run this LDA model: 17.08655619621277
{'pocket', 'size', 'comfortable', 'waist', 'fit', 'around', 'color', 'amazing'}
3 of 122 products' review sets remodeled

This product is named High-Waisted Ultracool Side Stripe Crop
Begin by noting there are 168 reviews for this product
Time to apply profanity: 0.018624067306518555
Time it takes to run this LDA model: 0.34430885314941406
{'comfortable', 'feel', 'waist', 'size'}
4 of 122 products' review sets remodeled

This product is named Oasis High-Waisted Twist 7/8
Begin by noting there are 1750 reviews for this product
Time to apply profanity: 0.2005002498626709
Time it takes to run this LDA model: 2.5792129039764404
{'cute', 'style', 'size', 'material', 'comfortable', 'detail', 'bit', 'fit', 'color', 'soft'}
5 of 122 products' review sets remodeled
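
The slowdown can be worked out directly from the "Time to apply profanity" lines in the two logs above. The following sketch only does arithmetic on those reported numbers; the variable names are illustrative.

```python
# (product, review_count, seconds_with_0_7_0, seconds_with_0_6_1),
# copied from the "Time to apply profanity" lines in the logs above.
timings = [
    ("Oasis High-Waisted Pocket Capri", 1669, 95.60132622718811, 0.19291996955871582),
    ("Ryan Built In Bra Tank II", 427, 29.865559816360474, 0.05718731880187988),
    ("Oasis High-Waisted Pocket 7/8", 10934, 710.3287241458893, 1.264852523803711),
    ("High-Waisted Ultracool Side Stripe Crop", 168, 10.014711618423462, 0.018624067306518555),
    ("Oasis High-Waisted Twist 7/8", 1750, 121.31187510490417, 0.2005002498626709),
]

slowdowns = {}
for name, review_count, slow, fast in timings:
    ratio = slow / fast
    slowdowns[name] = ratio
    # Per-review cost in milliseconds under each version.
    slow_ms = 1000 * slow / review_count
    fast_ms = 1000 * fast / review_count
    print(f"{name}: {ratio:.0f}x slower "
          f"({slow_ms:.1f} ms vs {fast_ms:.3f} ms per review)")
```

For all five products the ratio falls between roughly 500x and 600x, so the slowdown is about the same per review regardless of how many reviews a product has.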
