Add NRC lexicon to textdata #11

Merged
EmilHvitfeldt merged 1 commit into EmilHvitfeldt:master from juliasilge:master
Jul 15, 2019

Conversation

@juliasilge
Contributor

@juliasilge juliasilge commented Jul 12, 2019

I worked on adding the NRC emotion lexicon to textdata this evening, to address #10 and other issues hanging out in tidytext and the tidytext book.

BAD NEWS 😩

The links on the NRC site are all http, not https.

The CRAN policies say this:

Downloads of additional software or data as part of package installation or startup should only use secure download mechanisms (e.g., ‘https’ or ‘ftps’).

What do you think our options are? Any ideas?

@EmilHvitfeldt
Owner

The pull request looks great!

I think our best option would be to see if it is possible for them to provide https downloads. That might be a tall order, but the rest of their site is https, so I don't know.
Or we could ask them whether some form of redistribution would be okay.

@juliasilge
Contributor Author

Yeah, that would be the best option; I'll write back via email and see if they are open to doing that.

@juliasilge
Contributor Author

Actually, I am reading the CRAN policies again:

Downloads of additional software or data as part of package installation or startup should only use secure download mechanisms (e.g., ‘https’ or ‘ftps’).

(Bolding is mine.) Could it be argued that these downloads are not part of installation or startup? They are prompted by the user instead. Thoughts? I could also loop in some rOpenSci or other experts on this.

@EmilHvitfeldt
Owner

That is a good catch! I think it could be argued that it is not part of installation or startup.
Furthermore, it would be trivial to include http/https/ftps information in the download prompt.

@EmilHvitfeldt
Owner

I'm comfortable with including it under the new reading. If you want to loop someone in, I'll wait to merge.

@juliasilge
Contributor Author

I just posted on the rOpenSci Slack to see if anyone has had relevant experience. I imagine people won't see it until tomorrow. Let's wait to see if anybody has run into something similar or has insight, but from a plain reading, this does seem like it is in line with the policies.

The links currently say whether they are http or https to the user, but it may be good to call out the download method info more explicitly in the prompt.
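As a rough illustration of calling out the download method in the prompt, here is a minimal sketch in base R. The helper `download_prompt_text()` and the example URL are hypothetical and are not part of textdata; the sketch also assumes the URL starts with a scheme such as `http://`.

```r
# Hypothetical helper, not part of textdata: build the text of a download
# prompt that states the transfer protocol explicitly.
download_prompt_text <- function(name, url) {
  # Extract the URL scheme (e.g. "http", "https", "ftps") using base R only.
  # Assumes the URL begins with "<scheme>://".
  scheme <- tolower(sub("^([A-Za-z][A-Za-z0-9+.-]*)://.*$", "\\1", url))
  secure <- scheme %in% c("https", "ftps")
  paste0(
    "Do you want to download the ", name, " lexicon?\n",
    "URL: ", url, "\n",
    "Download mechanism: ", scheme,
    if (secure) " (secure)" else " (not encrypted)"
  )
}

# Example with a placeholder URL
cat(download_prompt_text("example", "http://example.com/lexicon.zip"), "\n")
```

The text returned here could be shown to the user before any download starts, so the http/https distinction is visible without reading the URL closely.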

@juliasilge
Contributor Author

Expert-type folks on the rOpenSci Slack seem fairly unanimous that http should be OK in this situation, a download prompted by the user but not part of package installation. From my perspective, this is good to go (merge).

I do think the change suggested in #12 is still a good idea as well.
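To make the distinction concrete, a download "prompted by the user" could look something like the sketch below: nothing is fetched at install or load time, and the transfer only happens after explicit confirmation. The function `fetch_if_confirmed()` and its `confirm` argument are hypothetical and are not textdata's actual implementation.

```r
# Hypothetical helper, not textdata's actual implementation: download a file
# only after the user has explicitly agreed to it.
fetch_if_confirmed <- function(url, destfile, confirm = NULL) {
  if (is.null(confirm)) {
    # Without an explicit answer, ask interactively; refuse in batch mode.
    if (!interactive()) {
      stop("Run interactively, or pass confirm = TRUE to approve the download.")
    }
    choice <- utils::menu(c("Yes", "No"), title = paste("Download", url, "?"))
    confirm <- identical(choice, 1L)
  }
  if (!isTRUE(confirm)) {
    message("Download cancelled; nothing was fetched.")
    return(invisible(FALSE))
  }
  utils::download.file(url, destfile, mode = "wb")
  invisible(TRUE)
}

# Declining the download touches no network resources (placeholder URL).
declined <- fetch_if_confirmed("http://example.com/lexicon.zip",
                               tempfile(), confirm = FALSE)
```

Because the function is only ever called by the user, the download is not part of package installation or startup in the sense of the policy text quoted above.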

@EmilHvitfeldt EmilHvitfeldt merged commit 2bf7a04 into EmilHvitfeldt:master Jul 15, 2019
@EmilHvitfeldt
Owner

Sounds good! I'll merge and get working on #12.
Thanks! 🎉

@fantasycz

@juliasilge Hi Juliasilge, I am wondering what I should change if I want to use get_sentiments("nrc"). So far, when I run this, it still throws the error: Error in match.arg(lexicon): 'arg' should be one of “afinn”, “bing”, “loughran”. Thanks!

@juliasilge
Contributor Author

Thanks for asking this question @fantasycz! 🙌

As of today, the NRC lexicon is available within tidytext again. You will need to install the development versions of both textdata and tidytext, and then all will work as before.

library(remotes)
install_github("EmilHvitfeldt/textdata")
install_github("juliasilge/tidytext")

After these are installed, you can access the NRC lexicon the same way you did previously:

library(tidytext)
get_sentiments("nrc")

We will get both of these updates on CRAN soon. 🎉

@fantasycz

fantasycz commented Jul 20, 2019 via email

@juliasilge
Contributor Author

Hmmmm, sounds like you don't actually have the updated version of tidytext installed, because "nrc" is in fact one of the arguments again. Want to try installing again, with force = TRUE?

remotes::install_github("juliasilge/tidytext", force = TRUE)
remotes::install_github("EmilHvitfeldt/textdata", force = TRUE)
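For context on why an outdated install produces that error, here is a minimal sketch of how `match.arg()` validates an argument against the choices listed in a function's signature. The two functions below are hypothetical stand-ins for an older and a newer `get_sentiments()`; the real tidytext function does more than validate its argument.

```r
# Hypothetical stand-in for an older version that does not list "nrc".
old_get <- function(lexicon = c("afinn", "bing", "loughran")) {
  match.arg(lexicon)  # errors if `lexicon` is not one of the listed choices
}

# Hypothetical stand-in for a newer version that accepts "nrc".
new_get <- function(lexicon = c("afinn", "bing", "loughran", "nrc")) {
  match.arg(lexicon)
}

# The older signature rejects "nrc"; capture the error message instead of stopping.
old_result <- tryCatch(old_get("nrc"), error = function(e) conditionMessage(e))
# The newer signature returns the matched choice.
new_result <- new_get("nrc")

print(old_result)
print(new_result)
```

If the "should be one of" message still lists only three lexicons after reinstalling, the session is most likely still loading the older package version, so restarting R before calling `library(tidytext)` may help.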

3 participants