Model creation fails with updated XGBoost (≥ v2.1) #1227

Open
therealjpetereit opened this issue Jan 8, 2025 · 4 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@therealjpetereit

Hi and a merry 2025 🎉🙌,

I just updated XGBoost to 2.1.3 and started having problems building models with it.

I assume they made many changes between 2.0.3 and 2.1+, but I tracked down what broke the code for me.

In my case, the xgb.DMatrix function fails when I use the current parsnip version (1.2.1) with tidymodels (v1.2.0).

The reason is the changed function signature in XGBoost:

Old

xgb.DMatrix <- function(data, info = list(), missing = NA, silent = FALSE, nthread = NULL, ...)  

New

xgb.DMatrix <- function(
  data,
  label = NULL,
  weight = NULL,
  base_margin = NULL,
  missing = NA,
  silent = FALSE,
  feature_names = colnames(data),
  feature_types = NULL,
  nthread = NULL,
  group = NULL,
  qid = NULL,
  label_lower_bound = NULL,
  label_upper_bound = NULL,
  feature_weights = NULL,
  data_split_mode = "row"
)

I am not sure what exactly went into the old info = list(), but it was probably these same fields, which are now passed directly as named arguments.
This may all be part of their general R interface overhaul, but I thought I would let you know after spending some time tracking this down.
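
The failure mode can be reproduced without xgboost installed. The sketch below uses two stand-in functions (the names `dmatrix_old` and `dmatrix_new` are hypothetical) whose argument lists are trimmed copies of the signatures quoted above; they are not the real implementations. Because the new signature has neither `info` nor `...`, R rejects a call that still passes `info`.

```r
# Stand-ins that only mimic the argument lists quoted above (trimmed).
# They are illustrative and do not build a real DMatrix.
dmatrix_old <- function(data, info = list(), missing = NA, silent = FALSE,
                        nthread = NULL, ...) {
  names(info)  # report which fields arrived via `info`
}
dmatrix_new <- function(data, label = NULL, weight = NULL, missing = NA,
                        silent = FALSE, nthread = NULL) {
  "ok"
}

x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg

# Old-style call: the label travels inside the `info` list.
old_res <- dmatrix_old(x, info = list(label = y))

# Same call against the new signature: `info` matches no argument,
# so R raises an error, captured here as a message string.
new_res <- tryCatch(
  dmatrix_new(x, info = list(label = y)),
  error = function(e) conditionMessage(e)
)

# New-style call: the label is its own named argument.
fixed_res <- dmatrix_new(x, label = y)
```

Here `new_res` holds the error message instead of `"ok"`, which matches the kind of failure described in this report.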

For now I have downgraded to 2.0.3 and will wait until you have had time to update parsnip to match the newer XGBoost releases.

Cheers
Jakob

@EmilHvitfeldt EmilHvitfeldt added the bug an unexpected problem or unintended behavior label Jan 8, 2025
@EmilHvitfeldt
Member

Yes this is indeed a bug. Thanks for catching it!

{parsnip} is currently compatible with {xgboost} version 1.7.8.1, which is the most recent CRAN version. This is happening because the {xgboost} R package on CRAN doesn't match the release versions.

It appears that xgboost is gearing up for another CRAN release (dmlc/xgboost#9810), so we should get ready for this.

@therealjpetereit is correct: in PR dmlc/xgboost#9862 they switched from passing some arguments to xgb.DMatrix() as a named list in info to spelling all of them out as separate arguments. They removed the info argument instead of deprecating it, which gives us the error, because we pass things to info.

Ideally, {xgboost} would have deprecated info a little more robustly, so I think we need to do some switching on {xgboost} versions or update {parsnip} once {xgboost} is out.
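
One way such switching could look is sketched below. It is only an illustration, not what {parsnip} does or will do: the helper name `dmatrix_args` is hypothetical, and it checks the function's formals for an `info` argument instead of comparing version numbers. The two stand-in functions exist only so the sketch runs without {xgboost} installed.

```r
# Hypothetical helper (not part of parsnip or xgboost): build the argument
# list for a DMatrix constructor based on whether it still accepts `info`.
dmatrix_args <- function(fun, data, info = list()) {
  if ("info" %in% names(formals(fun))) {
    # Old signature: metadata travels as a named list in `info`.
    list(data = data, info = info)
  } else {
    # New signature: each metadata field becomes its own named argument.
    c(list(data = data), info)
  }
}

# Stand-ins with trimmed versions of the two signatures from this issue.
old_sig <- function(data, info = list(), missing = NA, ...) NULL
new_sig <- function(data, label = NULL, weight = NULL, missing = NA) NULL

x <- as.matrix(mtcars[, -1])
y <- mtcars$mpg

old_args <- dmatrix_args(old_sig, x, list(label = y))
new_args <- dmatrix_args(new_sig, x, list(label = y))

names(old_args)  # "data" "info"
names(new_args)  # "data" "label"
```

With a real installation, the call would be along the lines of `do.call(xgboost::xgb.DMatrix, dmatrix_args(xgboost::xgb.DMatrix, x, list(label = y)))`. Whether feature detection or an explicit version check is the better fit is left open in this thread.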

CRAN versions

library(parsnip)

xgb_spec <- boost_tree() |>
  set_mode("regression") |>
  set_engine("xgboost")

xgb_spec |>
  fit(mpg ~ ., data = mtcars)
#> parsnip model object
#> 
#> ##### xgb.Booster
#> raw: 21.6 Kb 
#> call:
#>   xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0, 
#>     colsample_bytree = 1, colsample_bynode = 1, min_child_weight = 1, 
#>     subsample = 1), data = x$data, nrounds = 15, watchlist = x$watchlist, 
#>     verbose = 0, nthread = 1, objective = "reg:squarederror")
#> params (as set within xgb.train):
#>   eta = "0.3", max_depth = "6", gamma = "0", colsample_bytree = "1", colsample_bynode = "1", min_child_weight = "1", subsample = "1", nthread = "1", objective = "reg:squarederror", validate_parameters = "TRUE"
#> xgb.attributes:
#>   niter
#> callbacks:
#>   cb.evaluation.log()
#> # of features: 10 
#> niter: 15
#> nfeatures : 10 
#> evaluation_log:
#>   iter training_rmse
#>  <num>         <num>
#>      1    14.9313149
#>      2    10.9568064
#>    ---           ---
#>     14     0.5628964
#>     15     0.4603055

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.2 (2024-10-31)
#>  os       macOS Sequoia 15.2
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Los_Angeles
#>  date     2025-01-07
#>  pandoc   3.6.1 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.3   2024-06-21 [1] CRAN (R 4.4.0)
#>  colorspace    2.1-1   2024-07-26 [1] CRAN (R 4.4.0)
#>  data.table    1.16.4  2024-12-06 [1] CRAN (R 4.4.1)
#>  digest        0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
#>  dplyr         1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
#>  evaluate      1.0.1   2024-10-10 [1] CRAN (R 4.4.1)
#>  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
#>  fs            1.6.5   2024-10-30 [1] CRAN (R 4.4.1)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.4.0)
#>  ggplot2       3.5.1   2024-04-23 [1] CRAN (R 4.4.0)
#>  glue          1.8.0   2024-09-30 [1] CRAN (R 4.4.1)
#>  gtable        0.3.6   2024-10-25 [1] CRAN (R 4.4.1)
#>  hardhat       1.4.0   2024-06-02 [1] CRAN (R 4.4.0)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#>  jsonlite      1.8.9   2024-09-20 [1] CRAN (R 4.4.1)
#>  knitr         1.49    2024-11-08 [1] CRAN (R 4.4.1)
#>  lattice       0.22-6  2024-03-20 [2] CRAN (R 4.4.2)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
#>  Matrix        1.7-1   2024-10-18 [2] CRAN (R 4.4.2)
#>  munsell       0.5.1   2024-04-01 [1] CRAN (R 4.4.0)
#>  parsnip     * 1.2.1   2024-03-22 [1] CRAN (R 4.4.0)
#>  pillar        1.10.0  2024-12-17 [1] CRAN (R 4.4.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.4.0)
#>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.4.0)
#>  reprex        2.1.1   2024-07-06 [1] CRAN (R 4.4.0)
#>  rlang         1.1.4   2024-06-04 [1] CRAN (R 4.4.0)
#>  rmarkdown     2.29    2024-11-04 [1] CRAN (R 4.4.1)
#>  scales        1.3.0   2023-11-28 [1] CRAN (R 4.4.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
#>  tidyr         1.3.1   2024-01-24 [1] CRAN (R 4.4.0)
#>  tidyselect    1.2.1   2024-03-11 [1] CRAN (R 4.4.0)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
#>  withr         3.0.2   2024-10-28 [1] CRAN (R 4.4.1)
#>  xfun          0.49    2024-10-31 [1] CRAN (R 4.4.1)
#>  xgboost       1.7.8.1 2024-07-24 [1] CRAN (R 4.4.0)
#>  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.4.0)
#> 
#>  [1] /Users/emilhvitfeldt/Library/R/arm64/4.4/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

Created on 2025-01-07 with reprex v2.1.1

@therealjpetereit
Author

Yes,

I have been forced to move to GitHub versions of XGBoost for GPU integrations and even AMD GPU versions lately 😅
And who can resist installing the latest versions 😂😂

Cheers
J

@simonpcouch
Contributor

Noting that this is likely related to / duplicate of #1087. :)

@EmilHvitfeldt
Member

This appears to be a separate issue, but it will hit us at the same time when they release to CRAN :)
