Chinese ASTE Dataset

This repository provides an openly available dataset for Aspect Sentiment Triplet Extraction (ASTE) on Chinese restaurant reviews collected from Google Maps, together with an mT5-based ASTE model.

For further information, please refer to our publication:
Automatic Construction of a Chinese Review Dataset for Aspect Sentiment Triplet Extraction via Iterative Weak Supervision

File Structure

data/raw/raw.csv: 104,358 raw restaurant reviews collected from Google Maps.
data/train/train.json, data/valid/valid.json, data/test/test.json: the training, validation, and test splits, containing 64,007, 5,000, and 5,000 restaurant reviews respectively, along with the corresponding labels generated by our models.
data/test/test_gold300.json: a 300-review subset of the test set with manually annotated ground-truth labels.
src/mt5_aste.py: Python code for the mT5 ASTE model.
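The exact schema of these files is not documented here, so the following is only a minimal Python sketch for loading them. It assumes that raw.csv has a header row and that each split file is either a single JSON document (typically an array of reviews) or JSON Lines with one review per line. The helper names load_raw_reviews and load_split are hypothetical and not part of this repository.

```python
import csv
import json
from pathlib import Path


def load_raw_reviews(path):
    """Read raw.csv into a list of dicts (assumes the first row is a header)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_split(path):
    """Read a split file stored either as one JSON document or as JSON Lines."""
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)  # whole file is a single JSON document
    except json.JSONDecodeError:
        # Fall back to JSON Lines: one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A top-level array is returned as-is; any other document is wrapped in a list.
    return data if isinstance(data, list) else [data]
```

If the format assumptions hold, `len(load_split("data/train/train.json"))` should equal the 64,007 training reviews listed above.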

Data Information

Dataset         Size      Label Source Model     Gold Answer Provider
train           64,007    Self-train-C           None
valid           5,000     Rule-Based System      None
test            5,000     Union of models        gpt-3.5-turbo
test_gold300    300       Union of models        Human annotator
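As a quick sanity check after loading, the hypothetical helper below compares per-split review counts against the table above. It is a sketch; how you obtain the observed counts (for example with a loader of your own) is up to you. The three model-labeled splits together contain 64,007 + 5,000 + 5,000 = 74,007 reviews.

```python
# Review counts per split, copied from the table above.
EXPECTED_SIZES = {"train": 64007, "valid": 5000, "test": 5000, "test_gold300": 300}

# Total of the three model-labeled splits (test_gold300 is a subset of test).
LABELED_TOTAL = sum(EXPECTED_SIZES[name] for name in ("train", "valid", "test"))


def size_mismatches(actual):
    """Return {split: (expected, actual)} for each split whose count differs.

    `actual` maps split names to observed review counts; a split that is
    missing from `actual` is reported with an actual count of None.
    """
    return {
        name: (expected, actual.get(name))
        for name, expected in EXPECTED_SIZES.items()
        if actual.get(name) != expected
    }
```

An empty result means every split matches the documented size.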

Model Usage

The steps below train and run an Aspect Sentiment Triplet Extraction (ASTE) model on our dataset.

Install packages

Install Python dependencies

pip3 install -r requirements.txt
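After installation, a small Python check like the following can report which packages listed in requirements.txt are still missing. This is a hypothetical helper, not part of the repository; it only handles simple `name` or `name==version` style lines and skips comments and pip options.

```python
import re
from importlib import metadata


def missing_requirements(path="requirements.txt"):
    """List package names from a pip requirements file that are not installed."""
    missing = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
            if not line or line.startswith("-"):  # skip blank lines and pip options like --extra-index-url
                continue
            # Keep only the package name, i.e. the text before any extras or version specifier.
            name = re.split(r"[\s\[<>=!~;@]", line, maxsplit=1)[0]
            try:
                metadata.version(name)
            except metadata.PackageNotFoundError:
                missing.append(name)
    return missing
```

An empty list means every listed package is installed in the current environment.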

Additionally, Mac users can install the nightly PyTorch build below to accelerate PyTorch training (see the PyTorch installation guide):

pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
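To confirm whether the installed PyTorch build can actually use the Mac GPU through the MPS backend, a check like the one below may help. It is a sketch: the function name pick_device is hypothetical, and whether src/mt5_aste.py selects its device this way is not documented here.

```python
def pick_device():
    """Return "cuda", "mps" or "cpu" depending on what the local PyTorch build supports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed yet, so report the CPU fallback
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in PyTorch 1.12 and later.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

On a Mac where the nightly build was installed successfully, `pick_device()` should return "mps".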

Download the base model

Download the base model, mt5-drcd-qa, from Hugging Face (https://huggingface.co/chiawen0104/mt5-drcd-qa):

python3 download.py
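The contents of download.py are not reproduced in this README. If you prefer to fetch the model yourself, a roughly equivalent sketch uses `snapshot_download` from the `huggingface_hub` package. The target directory name below is an assumption, so make sure it matches the path that train.sh and inference.sh expect.

```python
REPO_ID = "chiawen0104/mt5-drcd-qa"  # base model named in this README


def download_base_model(local_dir="mt5-drcd-qa"):
    """Fetch every file of the base model repository into local_dir and return that path.

    The default local_dir is a hypothetical choice, not necessarily the one download.py uses.
    """
    # Imported lazily so this module loads even before `pip install huggingface_hub`.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_base_model()` requires network access and returns the local directory containing the model files.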

Train

Train the mT5 ASTE model

bash train.sh
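Because mT5 is a sequence-to-sequence model, ASTE triplets (aspect, opinion, sentiment) have to be written as target text for training and parsed back from the generated text. The README does not document the format used by src/mt5_aste.py, so the separators below are purely hypothetical; the sketch only illustrates the general idea of linearizing triplets and decoding them.

```python
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (aspect, opinion, sentiment)

SEP_FIELD = " | "    # hypothetical separator between the three fields
SEP_TRIPLET = " ; "  # hypothetical separator between triplets


def encode_triplets(triplets: List[Triplet]) -> str:
    """Flatten triplets into a single target string for a seq2seq model."""
    return SEP_TRIPLET.join(SEP_FIELD.join(t) for t in triplets)


def decode_triplets(text: str) -> List[Triplet]:
    """Parse generated text back into triplets, skipping malformed chunks."""
    triplets = []
    for chunk in text.split(SEP_TRIPLET.strip()):
        parts = [p.strip() for p in chunk.split(SEP_FIELD.strip())]
        if len(parts) == 3 and all(parts):  # keep only complete, non-empty triplets
            triplets.append(tuple(parts))
    return triplets
```

The decoder drops malformed chunks instead of failing, since generated text is not guaranteed to follow the target format. Aspect or opinion spans that themselves contain the separator characters would need a different scheme.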

Inference

Note: train the model before running inference, and check that your directory paths are set correctly.

bash inference.sh
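To score inference output against the manually annotated test_gold300.json, ASTE is commonly evaluated with exact-match triplet precision, recall, and F1: a predicted triplet counts only if its aspect, opinion, and sentiment all match a gold triplet. The sketch below implements that metric. It is not necessarily the evaluation used in the paper, and extracting the triplets from the JSON files is left to you because their field names are not documented here.

```python
def triplet_prf(predicted, gold):
    """Micro precision, recall and F1 over exact-match triplets, summed across reviews.

    predicted, gold: equal-length lists (one entry per review) of iterables of
    (aspect, opinion, sentiment) tuples.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same reviews")
    tp = n_pred = n_gold = 0
    for pred_triplets, gold_triplets in zip(predicted, gold):
        p, g = set(pred_triplets), set(gold_triplets)
        tp += len(p & g)  # all three elements must match exactly
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Duplicate triplets within one review are counted once, since each review's triplets are compared as sets.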

Citation

@inproceedings{lu-etal-2024-automatic-construction,
    title = "Automatic Construction of a {C}hinese Review Dataset for Aspect Sentiment Triplet Extraction via Iterative Weak Supervision",
    author = "Lu, Chia-Wen  and
      Yang, Ching-Wen  and
      Ma, Wei-Yun",
    editor = "Calzolari, Nicoletta  and
      Kan, Min-Yen  and
      Hoste, Veronique  and
      Lenci, Alessandro  and
      Sakti, Sakriani  and
      Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.167",
    pages = "1871--1882",
    abstract = "Aspect Sentiment Triplet Extraction (ASTE), introduced in 2020, is a task that involves the extraction of three key elements: target aspects, descriptive opinion spans, and their corresponding sentiment polarity. This process, however, faces a significant hurdle, particularly when applied to Chinese languages, due to the lack of sufficient datasets for model training, largely attributable to the arduous manual labeling process. To address this issue, we present an innovative framework that facilitates the automatic construction of ASTE via Iterative Weak Supervision, negating the need for manual labeling, aided by a discriminator to weed out subpar samples. The objective is to successively improve the quality of this raw data and generate supplementary data. The effectiveness of our approach is underscored by our results, which include the creation of a substantial Chinese review dataset. This dataset encompasses over 60,000 Google restaurant reviews in Chinese and features more than 200,000 extracted triplets. Moreover, we have also established a robust baseline model by leveraging a novel method of weak supervision. Both our dataset and model are openly accessible to the public.",
}

Repository: chiawen0104/chn_review_aste
