# Tutorial for Hugging Face users coming from Python

Text classification is a common NLP task that assigns a label or class to text. Some of the largest companies run text classification in production for a wide range of practical applications. One of the most popular forms of text classification is sentiment analysis, which assigns a label like 🙂 positive, 🙁 negative, or 😐 neutral to a sequence of text.

This guide will show you how to:

1. Finetune [DistilBERT](https://huggingface.co/distilbert-base-uncased) on the [IMDb](https://huggingface.co/datasets/imdb) dataset to determine whether a movie review is positive or negative.
2. Use your finetuned model for inference.

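For inference, a classifier ends the same way regardless of the model: it returns one raw score (logit) per class, and the prediction is the class with the highest softmax probability. The plain-Julia sketch below shows only that last step. It assumes the IMDb label convention (`0` = negative, `1` = positive). The logits are made-up numbers, and `IMDB_LABELS`, `stable_softmax`, and `predict_label` are hypothetical names, not part of Transformers.jl.

```julia
# Hypothetical label table, assuming IMDb's convention: label 0 = negative, 1 = positive.
# Julia arrays are 1-based, so class index 1 corresponds to dataset label 0.
const IMDB_LABELS = ["negative", "positive"]

# Numerically stable softmax: subtract the largest logit before exponentiating.
function stable_softmax(logits::AbstractVector{<:Real})
    shifted = exp.(logits .- maximum(logits))
    return shifted ./ sum(shifted)
end

# Hypothetical helper: turn one review's logits into (label, probability).
function predict_label(logits::AbstractVector{<:Real})
    probs = stable_softmax(logits)
    idx = argmax(probs)
    return IMDB_LABELS[idx], probs[idx]
end

# Made-up logits for a single review (not real model output).
label, prob = predict_label([-1.2, 2.3])
println(label, " ", round(prob; digits = 3))  # prints "positive 0.971"
```

Subtracting the maximum logit does not change the resulting probabilities, but it keeps `exp` from overflowing when the scores are large.
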
## Installation

First, install the `Transformers.jl` package by running the following command:

```julia
using Pkg
Pkg.add("Transformers")
```

Next, install the `HuggingFaceDatasets.jl` package, a Julia wrapper around the Python 🤗 Datasets library:

```julia
using Pkg
Pkg.add("HuggingFaceDatasets")
```

With both packages installed, load a DistilBERT tokenizer to preprocess the `text` field of each review:

```julia
using Transformers
using Transformers.TextEncoders
using Transformers.HuggingFace

# Fetch the tokenizer for the uncased DistilBERT checkpoint from the Hugging Face Hub
tokenizer = HuggingFace.load_tokenizer("distilbert-base-uncased")
```

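Preprocessing a batch of reviews means giving every sequence the same length: long reviews are truncated, short ones are padded, and an attention mask records which positions hold real tokens. In Transformers.jl this is expected to be handled for you by `encode(tokenizer, texts)` from `Transformers.TextEncoders`, along with special tokens. The following is only a plain-Julia sketch of the padding and masking idea. The token ids, the pad id `0`, and the helper `pad_and_mask` are all hypothetical.

```julia
# Hypothetical helper: truncate/pad token-id sequences to a common length and
# build the matching attention mask (1 = real token, 0 = padding).
# Both outputs are shaped (length, batch), i.e. one column per sequence,
# following Julia's column-major convention.
function pad_and_mask(seqs::Vector{Vector{Int}}; pad_id::Int = 0, max_len::Int = 512)
    len = min(maximum(length, seqs), max_len)
    ids = fill(pad_id, len, length(seqs))
    mask = zeros(Int, len, length(seqs))
    for (j, seq) in enumerate(seqs)
        n = min(length(seq), len)  # cut anything longer than `len`
        ids[1:n, j] .= seq[1:n]
        mask[1:n, j] .= 1
    end
    return ids, mask
end

# Made-up token ids for three reviews of different lengths.
ids, mask = pad_and_mask([[7, 42, 9], [7, 13, 21, 8, 9], [7, 9]]; max_len = 4)
```

This sketch simply cuts sequences at `max_len`; a real tokenizer typically keeps its closing special token when truncating, so use the library's encoder for actual training data.
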
## Load dataset

Start by loading the IMDb dataset from the 🤗 Datasets library:

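```julia
using HuggingFaceDatasets

# Load both IMDb splits and convert their rows to Julia types
train_data = load_dataset("imdb", split="train").with_format("julia")
test_data = load_dataset("imdb", split="test").with_format("julia")

# Inspect the first training example (a review `text` and its `label`)
train_data[1]
```
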
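Each IMDb example carries a `text` field and a `label` field. For finetuning, the training rows are usually shuffled and split into minibatches at the start of each epoch. The sketch below uses only Julia's `Random` standard library and works on row indices, which you could then use to look up the corresponding examples. `minibatch_indices` is a hypothetical helper, and the usage comment assumes `length(train_data)` returns the number of rows.

```julia
using Random

# Hypothetical helper: shuffle the row indices 1:n and split them into batches.
# The last batch is smaller when n is not a multiple of batchsize.
function minibatch_indices(n::Integer, batchsize::Integer; rng = Random.default_rng())
    perm = randperm(rng, n)
    return [collect(b) for b in Iterators.partition(perm, batchsize)]
end

# Example: 10 rows in batches of 4, with a fixed seed for reproducibility.
batches = minibatch_indices(10, 4; rng = MersenneTwister(42))

# Hypothetical usage with the real split (assumes `length(train_data)` works):
# epoch_batches = minibatch_indices(length(train_data), 16)
```

Shuffling indices rather than the examples themselves avoids copying the review texts, and every row appears exactly once per epoch.
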
Source: https://huggingface.co/docs/transformers/en/tasks/sequence_classification