Updated dataset in Sentiment Analysis #11114
Nice work, @JRAlexander. Minor changes needed.
Thanks, @luisquintanilla! I believe I've addressed your concerns.
Thanks for making the changes. One last change: update the name of the data file to match the one used in the code sample, `yelp_labelled.txt`.
1. Download the [Wikipedia detox-250-line-data.tsv](https://github.com/dotnet/machinelearning/blob/master/test/data/wikipedia-detox-250-line-data.tsv) and the [wikipedia-detox-250-line-test.tsv](https://github.com/dotnet/machinelearning/blob/master/test/data/wikipedia-detox-250-line-test.tsv) data sets and save them to the *Data* folder previously created. The first dataset trains the machine learning model and the second can be used to evaluate how accurate your model is.
1. Download [The UCI Sentiment Labeled Sentences dataset zip file (see citations in the following note)](https://archive.ics.uci.edu/ml/machine-learning-databases/00331/sentiment%20labelled%20sentences.zip), and unzip.
2. Copy the `yelp_labeled.txt` file into the *Data* directory you created.
Change the file name to match the name in the code sample: `yelp_labelled.txt`.
Good catch.
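The file name matters because the sample loads the data by path, so a one-letter difference (`yelp_labeled.txt` vs. `yelp_labelled.txt`) would only surface as a missing-file error at run time. As a rough illustration, here is a minimal sketch in Python (not the tutorial's C# code) that checks for the file and parses it. It assumes the file holds one `sentence<TAB>label` pair per line with a label of 0 or 1; the `Data` path and both function names are hypothetical and chosen only for this example.

```python
from pathlib import Path

# Hypothetical location: the *Data* directory and the file name used by the code sample.
DATA_FILE = Path("Data") / "yelp_labelled.txt"


def parse_labelled_sentences(lines):
    """Parse `sentence<TAB>label` lines into (text, label) pairs.

    Assumes each non-blank line ends with a tab followed by 0 or 1.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the last tab so tabs inside the sentence are preserved.
        text, _, label = line.rpartition("\t")
        rows.append((text.strip(), int(label)))
    return rows


def load_dataset(path=DATA_FILE):
    """Fail early with a clear message if the data file name does not match."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(
            f"Expected data file at {path}; check the spelling of the file name."
        )
    with path.open(encoding="utf-8") as handle:
        return parse_labelled_sentences(handle)
```

Calling `load_dataset()` from the project root would return the parsed pairs if the file was copied under the expected name, and raise `FileNotFoundError` otherwise.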
Updated the dataset in the Sentiment Analysis tutorial from wikidetox to UCI Sentiment Labeled Sentences to improve model prediction, and updated the tutorial to version 0.11.0.
Related Code Sample
Internal Review Link
Fixes #7024
Fixes #10080
Fixes #10106
Fixes #11054