Update dataset/README.md with documentation on corpus [resolves #235]
dafajon committed Apr 15, 2021
1 parent 32a2dc3 commit 727c0b9
Showing 1 changed file with 25 additions and 0 deletions.
25 changes: 25 additions & 0 deletions sadedegel/dataset/README.md
@@ -315,6 +315,31 @@ test = load_hotel_sentiment_test()
test_label = load_hotel_sentiment_test_label()
```

## `product_sentiment`

This corpus contains 11426 product reviews, each annotated with one sentiment label from the set `['POSITIVE', 'NEGATIVE', 'NEUTRAL']`. Dataset [source](https://www.kaggle.com/burhanbilenn/duygu-analizi-icin-urun-yorumlari/version/1).

### Using corpus
```python
from sadedegel.dataset.product_sentiment import load_product_sentiment_train
from sadedegel.dataset.product_sentiment import CLASS_VALUES

import pandas as pd

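# The loader returns an iterator over dict records.
raw = load_product_sentiment_train()

next(raw)

# Out [0]: {'text': "ses kalitesi ve ergonomisi rezalet, sony olduğu için aldım ama 4'de 1 fiyatına çin replika ürün alsaydım çok çok daha iyiydi, kesinlikle tavsiye etmiyorum.",
#           'sentiment_class': 0}

# next() consumed the first record above, so reload the iterator
# to build the frame from the full corpus.
df = pd.DataFrame.from_records(load_product_sentiment_train())

# Map the integer label of the first review to its class name.
CLASS_VALUES[df.sentiment_class.iloc[0]]

# Out [1]: 'NEGATIVE'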
```
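
To go from integer labels to a per-class count, a minimal sketch follows. It assumes each record is a dict with an integer `sentiment_class` key, as in the output above. The `class_distribution` helper, the `sample_records` list and the ordering of `class_values` are illustrative stand-ins, not part of sadedegel; in practice you would pass `load_product_sentiment_train()` and the imported `CLASS_VALUES` instead.

```python
from collections import Counter


def class_distribution(records, class_values):
    """Count records per sentiment name.

    `records` is any iterable of dicts carrying an integer `sentiment_class`;
    `class_values` maps that integer (by position) to a label name.
    """
    return Counter(class_values[r["sentiment_class"]] for r in records)


# Hypothetical stand-in data; the ordering of class_values is an assumption
# (only index 0 -> 'NEGATIVE' is shown in the example above).
class_values = ["NEGATIVE", "NEUTRAL", "POSITIVE"]
sample_records = [
    {"text": "kesinlikle tavsiye etmiyorum.", "sentiment_class": 0},
    {"text": "idare eder.", "sentiment_class": 1},
    {"text": "çok memnun kaldım.", "sentiment_class": 2},
    {"text": "rezalet.", "sentiment_class": 0},
]

distribution = class_distribution(sample_records, class_values)
print(distribution.most_common())
```

Because the helper consumes any iterable in a single pass, it works directly on the loader's iterator without materializing a DataFrame first.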

## `categorized_product_sentiment`

This corpus contains 5600 customer product reviews from e-commerce sites. Each review carries two class labels. The first, `sentiment_class`, holds the sentiment of the review, one of `["POSITIVE", "NEGATIVE"]`. The second, `product_category`, holds the category of the reviewed product, one of `["Kitchen", "DVD", "Books", "Electronics"]`. Each product category contains 1400 instances. The dataset is the one used in the research [paper](https://sentic.net/wisdom2013pechenizkiy.pdf) by Demirtaş and Pechenizkiy.