
Updated dataset in Sentiment Analysis #11114


Merged
merged 5 commits into from
Mar 8, 2019

Conversation


@JRAlexander commented on Mar 6, 2019

Updated the dataset in the Sentiment Analysis tutorial from wikidetox to UCI Sentiment to improve model prediction, and updated the tutorial to version 0.11.0.

Related Code Sample

Internal Review Link

Fixes #7024
Fixes #10080
Fixes #10106
Fixes #11054

@JRAlexander added the doc-update and 🚧 Hold for related PR labels on Mar 6, 2019 (the hold label indicates a PR can only be merged when other related PRs are merged; see comments for links)
@JRAlexander added this to the March 2019 milestone on Mar 6, 2019
@JRAlexander self-assigned this on Mar 6, 2019
@JRAlexander changed the title from "Updated dataset in Sentiment Analysis" to "[WIP] Updated dataset in Sentiment Analysis" on Mar 6, 2019

@luisquintanilla left a comment


Nice work, @JRAlexander. Minor changes needed.

@JRAlexander

Thanks, @luisquintanilla! I believe I've addressed your concerns.

@JRAlexander changed the title from "[WIP] Updated dataset in Sentiment Analysis" to "Updated dataset in Sentiment Analysis" on Mar 7, 2019

@luisquintanilla left a comment


Thanks for making the changes. One last change: update the name of the data file to match the name used in the code sample, `yelp_labelled.txt`.

1. Download the [Wikipedia detox-250-line-data.tsv](https://github.com/dotnet/machinelearning/blob/master/test/data/wikipedia-detox-250-line-data.tsv) and the [wikipedia-detox-250-line-test.tsv](https://github.com/dotnet/machinelearning/blob/master/test/data/wikipedia-detox-250-line-test.tsv) data sets and save them to the *Data* folder previously created. The first dataset trains the machine learning model and the second can be used to evaluate how accurate your model is.
1. Download [The UCI Sentiment Labeled Sentences dataset zip file (see citations in the following note)](https://archive.ics.uci.edu/ml/machine-learning-databases/00331/sentiment%20labelled%20sentences.zip), and unzip.

2. Copy the `yelp_labeled.txt` file into the *Data* directory you created.

Change the file name to match the name in the code sample: `yelp_labelled.txt`.


Good catch.
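
Editor's illustration, not part of the PR: the file-name mismatch raised in this thread (`yelp_labeled.txt` versus `yelp_labelled.txt`) is the kind of error a small script could surface when following the download, unzip, and copy steps. The sketch below is a minimal Python example under stated assumptions: the helper name `copy_data_file` is hypothetical, and the code assumes only that the unzipped archive contains a file with the expected name somewhere in its folder layout.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath

# Spelling used by the code sample, according to the review comment.
EXPECTED_NAME = "yelp_labelled.txt"


def copy_data_file(zip_path, data_dir, expected_name=EXPECTED_NAME):
    """Extract `expected_name` from the archive into `data_dir`.

    Hypothetical helper: raises FileNotFoundError when no archive member
    has exactly that file name, which is how a misspelled name such as
    `yelp_labeled.txt` would show up.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Compare base names only, so the archive's folder layout
            # does not need to be known in advance.
            if PurePosixPath(member).name == expected_name:
                target = data_dir / expected_name
                with archive.open(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                return target
    raise FileNotFoundError(f"{expected_name} not found in {zip_path}")
```

Calling `copy_data_file("sentiment labelled sentences.zip", "Data")` would place the file in the *Data* directory, assuming the zip file was saved locally under that name; passing the single-l spelling as `expected_name` raises `FileNotFoundError` instead of silently copying nothing.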

@JRAlexander merged commit d21e422 into dotnet:master on Mar 8, 2019
@mairaw removed the 🚧 Hold for related PR label on Nov 10, 2019