Update dataset usage and to version 0.11 for Sentiment Analysis #680

Merged on Mar 8, 2019 (5 commits)

JRAlexander (Contributor) commented Feb 28, 2019

Revising to use the UCI Sentiment Dataset for greater model accuracy, and updating to version 0.11.0.

JRAlexander added the "🚧 Hold for related PR" label (indicates a PR can only be merged when other related PRs are merged; see comments for links) Feb 28, 2019
JRAlexander self-assigned this Feb 28, 2019
JRAlexander changed the title from "Update dataset usage for Sentiment Analysis" to "Update dataset usage and to version 0.11 for Sentiment Analysis" Mar 7, 2019
prathyusha12345 (Contributor) commented Mar 8, 2019

@JRAlexander I tested the PR and it works fine, but I have questions about the output.
1. The accuracy metric is 79%. I believe accuracy should be more than 85%. Correct me if I am wrong.
2. See the probabilities in the image below. How can a text with a probability below 50% be shown as predicted correctly?
[image]

I believe the probability should also be above 50% for a prediction to be correct. Correct me if I am wrong.

@CESARDELATORRE Adding Cesar for reference.
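
On the second question above, here is a minimal sketch. It assumes the reported probability is the model's estimate that the text is positive, and that the decision threshold is 0.5; neither is confirmed in this thread. Under those assumptions, a probability below 50% simply means the model predicts the negative class, and that prediction is correct whenever the true label is also negative. All names, texts, and numbers below are hypothetical.

```python
from dataclasses import dataclass

# Assumed decision threshold for the positive class (not taken from the PR).
THRESHOLD = 0.5


@dataclass
class Example:
    text: str              # hypothetical review text
    probability: float     # assumed: estimated probability of positive sentiment
    actual_positive: bool  # ground-truth label


def predict_positive(probability: float, threshold: float = THRESHOLD) -> bool:
    """Predict the positive class only when the probability reaches the threshold."""
    return probability >= threshold


def accuracy(examples: list[Example]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(
        predict_positive(e.probability) == e.actual_positive for e in examples
    )
    return correct / len(examples)


examples = [
    Example("Loved it", 0.91, True),
    Example("Terrible service", 0.12, False),     # low probability, correctly negative
    Example("Not bad at all", 0.38, True),        # low probability, wrongly negative
    Example("Would not recommend", 0.47, False),  # low probability, correctly negative
]

for e in examples:
    predicted = predict_positive(e.probability)
    outcome = "correct" if predicted == e.actual_positive else "wrong"
    print(f"{e.text!r}: p={e.probability:.2f} predicted_positive={predicted} ({outcome})")

print(f"accuracy = {accuracy(examples):.2f}")
```

In this made-up set, three of the four predictions match their labels, so accuracy is 0.75, and two of the correct predictions carry probabilities below 0.5.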

JRAlexander (Contributor, Author) commented

This leaves the door open for improving the model. This is the dataset we are using; it gives better accuracy than the wikidetox dataset.

CESARDELATORRE (Contributor) commented

Well, a rule of thumb is to have an accuracy higher than 80%. However, depending on the business or domain, it might need to be a lot higher, or a lower accuracy might even be acceptable.

When using small datasets, it is normal in most cases that neither the accuracy nor the probabilities will be very high.

Since we're using small datasets for short training runs, I'd suggest not showing the probability in the code or output.

JRAlexander (Contributor, Author) commented

I'm going to leave it in there, with an explanation about the small dataset.

luisquintanilla (Contributor) left a comment

Looks good @JRAlexander. Added a suggestion. I'll leave the decision to implement or not up to you.

JRAlexander (Contributor, Author) commented

Added your suggestion as an issue: dotnet/docs#11221

JRAlexander merged commit 545ca31 into dotnet:master Mar 8, 2019
JRAlexander (Contributor, Author) commented

Thanks, @luisquintanilla!
