Description
When a model or data set reaches a certain level of complexity, lgb.dump() fails in R with the following error:
Error: R character strings are limited to 2^31-1 bytes
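To put the limit in concrete terms, here is a rough back-of-the-envelope sketch. It only does arithmetic on the 2^31-1 figure from the error message and the nrounds value used in the example; the variable names are illustrative, and it assumes the dumped JSON grows roughly linearly with the number of trees.

```r
# Maximum size of a single R character string, as stated in the error message.
max_string_bytes <- 2^31 - 1

# The same limit expressed in GiB (just under 2 GiB).
max_string_gib <- max_string_bytes / 1024^3

# With nrounds = 1000000 (as in the example), the average JSON size per tree
# that still fits into one string. Assumes one tree per boosting round.
n_trees <- 1000000
bytes_per_tree_budget <- floor(max_string_bytes / n_trees)

print(max_string_bytes)
print(max_string_gib)
print(bytes_per_tree_budget)
```

Under these assumptions, any model whose trees average more than about 2 KB of JSON each would exceed the limit at one million rounds.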
Reproducible example
library(dplyr)
library(lightgbm)
library(nycflights13)

dt <- nycflights13::flights %>%
  mutate(
    origin = factor(origin),
    dest = factor(dest),
    carrier = factor(carrier)
  ) %>%
  select(-tailnum, -time_hour)

spt1 <- round(nrow(dt) * (3 / 4))
spt2 <- round(nrow(dt) * (1 / 4))
train <- head(dt, spt1)
test <- tail(dt, spt2)

dtrain <- lgb.Dataset(
  as.matrix(train[, colnames(train) != "arr_delay"]),
  categorical_feature = c("origin", "dest", "carrier"),
  label = train[, colnames(train) == "arr_delay"][[1]]
)

params <- list(
  objective = "regression"
  , metric = "l2"
  , min_data = 1L
  , learning_rate = 1.0
  , num_threads = 2L
  , max_cat_threshold = 2L
)

model <- lgb.train(
  params = params
  , data = dtrain
  , nrounds = 1000000L
)

json_model <- lightgbm::lgb.dump(model) # This will cause the error
A potential solution may be to dump the model directly to a temporary file instead of returning it as a single string, then stream the data back in from that file (https://rdrr.io/cran/jsonlite/man/stream_in.html).
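A minimal sketch of the streaming idea, using only jsonlite and a made-up data frame in place of a real model dump. Note that stream_in() reads line-delimited JSON (one record per line), whereas lgb.dump() currently returns one JSON document, so this assumes the model could be written out one tree per line. The per-tree fields below are hypothetical and are not lightgbm's actual dump schema.

```r
library(jsonlite)

# Hypothetical stand-in for per-tree records; NOT lightgbm's real dump format.
trees <- data.frame(
  tree_index = 0:4,
  num_leaves = c(31L, 31L, 28L, 30L, 31L)
)

# Write one JSON record per line to a temporary file.
tmp <- tempfile(fileext = ".ndjson")
stream_out(trees, con = file(tmp), verbose = FALSE)

# Read the file back in small pages, so no single string has to hold
# the whole document. The handler is called once per page.
rows_seen <- 0L
stream_in(
  con = file(tmp),
  handler = function(page) {
    rows_seen <<- rows_seen + nrow(page)
  },
  pagesize = 2L,
  verbose = FALSE
)

print(rows_seen)
unlink(tmp)
```

Because each page is parsed separately, memory use and string length depend on the page size rather than on the total size of the file.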
Environment info
Session info:
─ Session info ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
setting value
version R version 4.3.3 (2024-02-29)
os Ubuntu 22.04.4 LTS
system x86_64, linux-gnu
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Etc/UTC
date 2024-03-22
rstudio 2023.12.1+402 Ocean Storm (server)
pandoc 3.1.1 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
base64enc 0.1-3 2015-07-28 [3] CRAN (R 4.0.2)
bslib 0.6.1 2023-11-28 [2] CRAN (R 4.3.2)
cachem 1.0.8 2023-05-01 [2] CRAN (R 4.3.0)
callr 3.7.5 2024-02-19 [2] CRAN (R 4.3.2)
cellranger 1.1.0 2016-07-27 [3] CRAN (R 4.0.1)
class 7.3-22 2023-05-03 [4] CRAN (R 4.3.1)
classInt 0.4-10 2023-09-05 [2] CRAN (R 4.3.1)
cli 3.6.2 2023-12-11 [2] CRAN (R 4.3.2)
codetools 0.2-19 2023-02-01 [4] CRAN (R 4.2.2)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
cowplot 1.1.3 2024-01-22 [2] CRAN (R 4.3.2)
crosstalk 1.2.1 2023-11-23 [2] CRAN (R 4.3.2)
DALEX 2.4.3 2023-01-15 [2] CRAN (R 4.2.3)
data.table 1.15.2 2024-02-29 [2] CRAN (R 4.3.2)
datamods 1.4.5 2024-02-28 [2] CRAN (R 4.3.2)
DBI 1.2.2 2024-02-16 [2] CRAN (R 4.3.2)
dbplyr 2.5.0 2024-03-19 [2] CRAN (R 4.3.3)
digest 0.6.35 2024-03-11 [2] CRAN (R 4.3.3)
dplyr * 1.1.4 2023-11-17 [2] CRAN (R 4.3.2)
e1071 1.7-14 2023-12-06 [2] CRAN (R 4.3.2)
ellipsis 0.3.2 2021-04-29 [3] CRAN (R 4.1.1)
esquisse 1.2.0 2024-01-10 [2] CRAN (R 4.3.2)
evaluate 0.23 2023-11-01 [2] CRAN (R 4.3.1)
extrafont 0.19 2023-01-18 [1] CRAN (R 4.3.0)
extrafontdb 1.0 2012-06-11 [1] CRAN (R 4.3.0)
fansi 1.0.6 2023-12-08 [2] CRAN (R 4.3.2)
fastmap 1.1.1 2023-02-24 [3] CRAN (R 4.2.2)
forcats * 1.0.0 2023-01-29 [3] CRAN (R 4.2.2)
fs 1.6.3 2023-07-20 [3] CRAN (R 4.3.1)
generics 0.1.3 2022-07-05 [2] CRAN (R 4.2.3)
ggiraph 0.8.9 2024-02-24 [2] CRAN (R 4.3.2)
ggiraphExtra 0.3.0 2020-10-06 [2] CRAN (R 4.3.2)
ggplot2 * 3.5.0 2024-02-23 [2] CRAN (R 4.3.2)
ggrepel 0.9.5 2024-01-10 [2] CRAN (R 4.3.2)
glue 1.7.0 2024-01-09 [2] CRAN (R 4.3.2)
gridExtra 2.3 2017-09-09 [2] CRAN (R 4.2.3)
gtable 0.3.4 2023-08-21 [2] CRAN (R 4.3.1)
hardhat 1.3.1 2024-02-02 [2] CRAN (R 4.3.2)
here 1.0.1 2020-12-13 [2] CRAN (R 4.2.3)
hms 1.1.3 2023-03-21 [3] CRAN (R 4.2.3)
htmltools 0.5.7 2023-11-03 [2] CRAN (R 4.3.1)
htmlwidgets 1.6.4 2023-12-06 [2] CRAN (R 4.3.2)
httpuv 1.6.14 2024-01-26 [2] CRAN (R 4.3.2)
httr 1.4.7 2023-08-15 [2] CRAN (R 4.3.1)
iBreakDown 2.1.2 2023-12-01 [2] CRAN (R 4.3.2)
insight 0.19.9 2024-03-15 [2] CRAN (R 4.3.3)
jquerylib 0.1.4 2021-04-26 [3] CRAN (R 4.1.2)
jsonlite 1.8.8 2023-12-04 [1] CRAN (R 4.3.2)
KernSmooth 2.23-22 2023-07-10 [4] CRAN (R 4.3.1)
knitr 1.45 2023-10-30 [2] CRAN (R 4.3.1)
later 1.3.2 2023-12-06 [2] CRAN (R 4.3.2)
lattice 0.22-5 2023-10-24 [4] CRAN (R 4.3.1)
lazyeval 0.2.2 2019-03-15 [2] CRAN (R 4.2.3)
leafem 0.2.3 2023-09-17 [2] CRAN (R 4.3.2)
leaflet 2.2.1 2023-11-13 [2] CRAN (R 4.3.1)
lifecycle 1.0.4 2023-11-07 [2] CRAN (R 4.3.1)
lightgbm * 4.3.0 2024-01-18 [2] CRAN (R 4.3.2)
lubridate * 1.9.3 2023-09-27 [2] CRAN (R 4.3.1)
magrittr 2.0.3 2022-03-30 [2] CRAN (R 4.2.3)
mapview 2.11.2 2023-10-13 [2] CRAN (R 4.3.1)
MASS 7.3-60.0.1 2024-01-13 [4] CRAN (R 4.3.2)
Matrix 1.6-5 2024-01-11 [2] CRAN (R 4.3.2)
mgcv 1.9-1 2023-12-21 [4] CRAN (R 4.3.2)
mime 0.12 2021-09-28 [3] CRAN (R 4.2.0)
munsell 0.5.0 2018-06-12 [2] CRAN (R 4.2.3)
mycor 0.1.1 2018-04-10 [2] CRAN (R 4.3.2)
NADA 1.6-1.1 2020-03-22 [2] CRAN (R 4.3.1)
nlme 3.1-163 2023-08-09 [4] CRAN (R 4.3.1)
nycflights13 * 1.0.2 2021-04-12 [1] CRAN (R 4.3.3)
openxlsx 4.2.5.2 2023-02-06 [2] CRAN (R 4.2.3)
parsnip 1.2.0 2024-02-16 [2] CRAN (R 4.3.2)
phosphoricons 0.2.0 2023-05-17 [2] CRAN (R 4.3.1)
pillar 1.9.0 2023-03-22 [2] CRAN (R 4.2.3)
pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.2.3)
plotly 4.10.4 2024-01-13 [2] CRAN (R 4.3.2)
plyr 1.8.9 2023-10-02 [2] CRAN (R 4.3.1)
png 0.1-8 2022-11-29 [2] CRAN (R 4.2.3)
pool 1.0.3 2024-02-14 [2] CRAN (R 4.3.2)
ppcor 1.1 2015-12-03 [2] CRAN (R 4.3.2)
processx 3.8.4 2024-03-16 [2] CRAN (R 4.3.3)
promises 1.2.1 2023-08-10 [2] CRAN (R 4.3.1)
proxy 0.4-27 2022-06-09 [2] CRAN (R 4.2.3)
ps 1.7.6 2024-01-18 [2] CRAN (R 4.3.2)
purrr * 1.0.2 2023-08-10 [2] CRAN (R 4.3.1)
R6 2.5.1 2021-08-19 [2] CRAN (R 4.2.3)
raster 3.6-26 2023-10-14 [2] CRAN (R 4.3.1)
RColorBrewer 1.1-3 2022-04-03 [2] CRAN (R 4.2.3)
Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.3.2)
reactable 0.4.4 2023-03-12 [2] CRAN (R 4.3.1)
readr * 2.1.5 2024-01-10 [2] CRAN (R 4.3.2)
readxl 1.4.3 2023-07-06 [1] CRAN (R 4.3.0)
reprex 2.1.0 2024-01-11 [2] CRAN (R 4.3.2)
reshape2 1.4.4 2020-04-09 [2] CRAN (R 4.3.1)
rio 1.0.1 2023-09-19 [2] CRAN (R 4.3.1)
rlang 1.1.3 2024-01-10 [2] CRAN (R 4.3.2)
rmarkdown 2.26 2024-03-05 [2] CRAN (R 4.3.2)
rpivotTable 0.3.0 2018-01-30 [2] CRAN (R 4.3.1)
rprojroot 2.0.4 2023-11-05 [2] CRAN (R 4.3.1)
rstudioapi 0.15.0 2023-07-07 [1] CRAN (R 4.3.0)
Rttf2pt1 1.3.12 2023-01-22 [1] CRAN (R 4.3.0)
sass 0.4.9 2024-03-15 [2] CRAN (R 4.3.3)
satellite 1.0.5 2024-02-10 [2] CRAN (R 4.3.2)
scales 1.3.0 2023-11-28 [2] CRAN (R 4.3.2)
sessioninfo 1.2.2 2021-12-06 [2] CRAN (R 4.2.3)
sf 1.0-15 2023-12-18 [2] CRAN (R 4.3.2)
shiny 1.8.0 2023-11-17 [2] CRAN (R 4.3.2)
shinybusy 0.3.3 2024-03-09 [2] CRAN (R 4.3.3)
shinyWidgets 0.8.3 2024-03-21 [2] CRAN (R 4.3.3)
sjlabelled 1.2.0 2022-04-10 [2] CRAN (R 4.3.2)
sjmisc 2.8.9 2021-12-03 [2] CRAN (R 4.3.2)
sp 2.1-3 2024-01-30 [2] CRAN (R 4.3.2)
stringi 1.8.3 2023-12-11 [2] CRAN (R 4.3.2)
stringr * 1.5.1 2023-11-14 [2] CRAN (R 4.3.2)
survival 3.5-8 2024-02-14 [4] CRAN (R 4.3.3)
systemfonts 1.0.6 2024-03-07 [2] CRAN (R 4.3.3)
tibble * 3.2.1 2023-03-20 [2] CRAN (R 4.2.3)
tidyr * 1.3.1 2024-01-24 [2] CRAN (R 4.3.2)
tidyselect 1.2.1 2024-03-11 [2] CRAN (R 4.3.3)
tidyverse * 2.0.0 2023-02-22 [2] CRAN (R 4.2.3)
timechange 0.3.0 2024-01-18 [2] CRAN (R 4.3.2)
treeshap 0.3.1 2024-01-22 [2] CRAN (R 4.3.2)
tzdb 0.4.0 2023-05-12 [3] CRAN (R 4.3.0)
units 0.8-5 2023-11-28 [2] CRAN (R 4.3.2)
utf8 1.2.4 2023-10-22 [2] CRAN (R 4.3.1)
uuid 1.2-0 2024-01-14 [2] CRAN (R 4.3.2)
vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.3.2)
viridisLite 0.4.2 2023-05-02 [2] CRAN (R 4.3.0)
withr 3.0.0 2024-01-16 [2] CRAN (R 4.3.2)
writexl 1.5.0 2024-02-09 [2] CRAN (R 4.3.2)
xfun 0.42 2024-02-08 [2] CRAN (R 4.3.2)
xgboost 1.7.7.1 2024-01-25 [2] CRAN (R 4.3.2)
xtable 1.8-4 2019-04-21 [2] CRAN (R 4.2.3)
yaml 2.3.8 2023-12-11 [2] CRAN (R 4.3.2)
zip 2.3.1 2024-01-27 [2] CRAN (R 4.3.2)
[1] /home/pschaefer/R/x86_64-pc-linux-gnu-library/4.3
[2] /usr/local/lib/R/site-library
[3] /usr/lib/R/site-library
[4] /usr/lib/R/library
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Additional Comments