Add NRC lexicon to textdata #11
|
The pull request looks great! I think our best option would be to see if it is possible for them to provide https downloads. It might be a tall order, but the rest of their site is https, so I don't know. |
|
Yeah, that would be the best option; I'll write back via email and see if they are open to doing that. |
|
Actually, I am reading the CRAN policies again:
(Bolding is mine.) Could it be argued that these downloads are not part of installation or startup? They are prompted by the user instead. Thoughts? I could also loop in some rOpenSci or other experts on this. |
|
That is a good catch! I think it could be argued that it is not part of installation or startup. |
|
I'm comfortable with including it under the new reading. If you want to loop someone in, I'll wait to merge. |
|
I just posted on the rOpenSci Slack to see if anyone has had relevant experience. I imagine people won't see it until tomorrow. Let's wait to see if anybody has run into something similar or has insight, but from a plain reading, this does seem like it is in line with the policies. The links currently show the user whether they are http or https, but it may be good to call out the download method more explicitly in the prompt. |
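For illustration, calling out the download method in the prompt could look something like the sketch below. This is only a rough idea in base R: `describe_download()` and the example URL are hypothetical and are not part of textdata, and the function assumes the URL contains a scheme followed by `://`.

```r
# Hypothetical helper, not part of textdata: builds the text of a download
# prompt that states the download method (URL scheme) explicitly.
describe_download <- function(name, url) {
  # Take the part before "://" as the scheme and lower-case it.
  scheme <- tolower(sub("://.*$", "", url))
  security <- if (identical(scheme, "https")) "encrypted" else "not encrypted"
  paste0(
    "Do you want to download the ", name, " lexicon?\n",
    "URL: ", url, "\n",
    "Download method: ", scheme, " (", security, ")"
  )
}

# Placeholder name and URL, for illustration only.
cat(describe_download("example", "http://example.com/lexicon.zip"), "\n")
```

A real prompt would then ask the user to confirm before anything is downloaded, so the download stays user-initiated.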
|
Expert-type folks on the rOpenSci Slack seem fairly unanimous that http should be OK in this situation, a download prompted by the user but not part of package installation. From my perspective, this is good to go (merge). I do think the change suggested in #12 is a good idea still as well. |
|
Sounds good! I'll merge and get working on #12. |
|
@juliasilge Hi Juliasilge, I am wondering what I should change if I want to use get_sentiments("nrc"). So far, when I run this, it still throws the error: Error in match.arg(lexicon): 'arg' should be one of “afinn”, “bing”, “loughran”. Thanks! |
|
Thanks for asking this question @fantasycz! 🙌 As of today, the NRC lexicon is available within tidytext again. You will need to install the development versions of both textdata and tidytext, and then all will work as before.
library(remotes)
install_github("EmilHvitfeldt/textdata")
install_github("juliasilge/tidytext")
After these are installed, you can access the NRC lexicon the same way you did previously:
library(tidytext)
get_sentiments("nrc")
We will get both of these updates on CRAN soon. 🎉 |
|
Hi Julia,
Thank you for your reply and your work on the NRC lexicon. I tried what you
suggested. I installed:
install_github("EmilHvitfeldt/textdata")
install_github("juliasilge/tidytext")
Then I ran:
get_sentiments("nrc")
However, it still showed the error:
Error in match.arg(lexicon): 'arg' should be one of “afinn”, “bing”, “loughran”.
Below is my code. Am I missing something?
one_star_word_df <- valence_df_na %>%
  filter(overallRating == '1') %>%
  unnest_tokens(word, headline, token = "words", format = "text") %>%
  inner_join(get_sentiments("nrc")) %>%
  rename(NRC_sentiment = sentiment) %>%
  mutate(NRC_sentiment = as.factor(NRC_sentiment))
The whole error is
Error in match.arg(lexicon): 'arg' should be one of “afinn”, “bing”, “loughran”
Traceback:
1. valence_df_na %>% filter(overallRating == "1") %>% unnest_tokens(word,
. headline, token = "words", format = "text") %>%
inner_join(get_sentiments("nrc")) %>%
. rename(NRC_sentiment = sentiment) %>% mutate(NRC_sentiment =
as.factor(NRC_sentiment))
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(quote(`_fseq`(`_lhs`)), env, env)
5. `_fseq`(`_lhs`)
6. freduce(value, `_function_list`)
7. function_list[[i]](value)
8. inner_join(., get_sentiments("nrc"))
9. inner_join.data.frame(., get_sentiments("nrc"))
10. as.data.frame(inner_join(tbl_df(x), y, by = by, copy = copy,
. ...))
11. inner_join(tbl_df(x), y, by = by, copy = copy, ...)
12. inner_join.tbl_df(tbl_df(x), y, by = by, copy = copy, ...)
13. check_valid_names(tbl_vars(y))
14. tbl_vars(y)
15. new_sel_vars(tbl_vars_dispatch(x), group_vars(x))
16. structure(vars, groups = group_vars, class = c("dplyr_sel_vars",
. "character"))
17. tbl_vars_dispatch(x)
18. get_sentiments("nrc")
19. match.arg(lexicon)
20. stop(gettextf("'arg' should be one of %s", paste(dQuote(choices),
. collapse = ", ")), domain = NA)
Thank you very much.
Best,
Zhen
…On Fri, Jul 19, 2019 at 3:00 PM Julia Silge ***@***.***> wrote:
Thanks for asking this question @fantasycz <https://github.com/fantasycz>!
🙌
As of today, the NRC lexicon is available within tidytext again. You will
need to install the development versions of both textdata and tidytext, and
then all will work as before.
library(remotes)
install_github("EmilHvitfeldt/textdata")
install_github("juliasilge/tidytext")
After these are installed, you can access the NRC lexicon the way as you
did previously:
library(tidytext)
get_sentiments("nrc")
We will get both of these updates on CRAN soon. 🎉
--
Zhen Chen
Electrical Engineering & Computer Science
University of California Irvine
Irvine, CA 92617
|
|
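Hmmmm, sounds like you don't actually have the updated version of tidytext installed, because the error still lists only “afinn”, “bing”, and “loughran” as choices. Try forcing a reinstall of both packages:
remotes::install_github("juliasilge/tidytext", force = TRUE)
remotes::install_github("EmilHvitfeldt/textdata", force = TRUE) |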
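To see why that error points to an outdated install, here is a minimal base R sketch of how `match.arg()` behaves. The traceback above shows `get_sentiments()` calling `match.arg(lexicon)`; the two functions below are hypothetical stand-ins for an older and a newer release, not tidytext's actual source.

```r
# Hypothetical stand-in for an older release: "nrc" is not among the choices.
get_sentiments_old <- function(lexicon = c("afinn", "bing", "loughran")) {
  # match.arg() validates the argument against the default vector above.
  lexicon <- match.arg(lexicon)
  lexicon
}

# Hypothetical stand-in for a newer release: "nrc" has been added.
get_sentiments_new <- function(lexicon = c("afinn", "bing", "loughran", "nrc")) {
  lexicon <- match.arg(lexicon)
  lexicon
}

# The old definition rejects "nrc"; capture the error instead of stopping.
old_result <- tryCatch(get_sentiments_old("nrc"), error = function(e) e)
print(conditionMessage(old_result))

# The new definition accepts it.
new_result <- get_sentiments_new("nrc")
print(new_result)
```

If the list of choices in the error does not include "nrc", the session is most likely still using a version of the function from before the lexicon was added.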
I worked on adding the NRC emotion lexicon to textdata this evening, to address #10 and other issues hanging out in tidytext and the tidytext book.
BAD NEWS 😩
The links on the NRC site are all http, not https.
The CRAN policies say this:
What do you think our options are? Any ideas?