Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] case statements needed for grouped regression #26

Open
alexhallam opened this issue Aug 11, 2018 · 4 comments
Open

[Feature Request] case statements needed for grouped regression #26

alexhallam opened this issue Aug 11, 2018 · 4 comments
Labels
feature a feature request or enhancement

Comments

@alexhallam
Copy link

alexhallam commented Aug 11, 2018

I use grouped regressions often (aka Many Models). Though your package makes it mostly painless to create grouped model sql code, maybe it would be nice to have a convenience function to do this.

Hopefully the code below illustrates what I am thinking.

# load libs
library(tidyverse)
library(tidypredict)
library(gapminder)

# make model function
country_model <- function(df) {
  lm(lifeExp ~ year, data = df)
}

# nest data for grouped regression
by_country <- gapminder %>% 
  group_by(country, continent) %>% 
  nest() %>% 
  mutate(model = map(data, country_model)) %>% 
  # use tidypredict to make sql strings by model
  mutate(sql_txt = map(model,~tidypredict_sql(., dbplyr::simulate_mssql()))) %>% 
  # add some case statements -- more work needed here
  mutate(case_string = paste0("WHEN COUNTRY = ", "'", country, "'", " THEN ")) %>% 
  # paste case statements to model sql
  mutate(full_string = paste0(case_string, sql_txt))
  
# paste all of the models together with case statements and return one big glob
# of text for sql
by_country %>% 
  unnest(full_string) %>% 
  summarise(full_sql_text = toString(full_string)) %>% 
  as.list()

The first few entries are shown below. I am sure there are many ways to implement this in a cleaner fashion than what I did.

WHEN COUNTRY = 'Afghanistan' THEN -507.534271561773 + (`year`) * (0.275328671328672), 
WHEN COUNTRY = 'Albania' THEN -594.072510955711 + (`year`) * (0.334683216783217),
WHEN COUNTRY = 'Algeria' THEN -1067.85903962704 + (`year`) * (0.569279720279723)
...

Would it be possible to add some convenience function to do some lifting for building case statements needed for grouped regression?

@edgararuiz-zz
Copy link
Contributor

Hi Alex, thanks for the suggestion, let me take a look and see what we could do via the package.

@edgararuiz-zz
Copy link
Contributor

edgararuiz-zz commented Aug 16, 2018

Actually, I have another package called modeldb that is able to fit models inside databases, and it also respects grouped data:

library(tidyverse)
library(gapminder)
library(modeldb)
gapminder %>%
  group_by(country) %>%
  select(lifeExp, year) %>%
  linear_regression_db(lifeExp)
#> Adding missing grouping variables: `country`
#> # A tibble: 142 x 3
#>    country     `(Intercept)`  year
#>    <fct>               <dbl> <dbl>
#>  1 Afghanistan         -508. 0.275
#>  2 Albania             -594. 0.335
#>  3 Algeria            -1068. 0.569
#>  4 Angola              -377. 0.209
#>  5 Argentina           -390. 0.232
#>  6 Australia           -376. 0.228
#>  7 Austria             -406. 0.242
#>  8 Bahrain             -860. 0.468
#>  9 Bangladesh          -936. 0.498
#> 10 Belgium             -340. 0.209
#> # ... with 132 more rows

This package works with tidypredict but at this time, only with un-grouped data. If modeldb fits the models that you ultimately need, maybe we should focus on adding grouped model capability in tidypredict but based on the output from modeldb.

Thanks!

@alexhallam
Copy link
Author

Does this translate to sql text?

@edgararuiz-zz
Copy link
Contributor

Not yet, that's the portion I need to code, but wanted first to find out if it would be useful for anyone.

@topepo topepo added the feature a feature request or enhancement label Apr 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants