dummyVars ignores sep argument for character variables #992

Open
jakobludewig opened this issue Jan 22, 2019 · 2 comments

Comments

@jakobludewig

Hey,

I spotted a potential issue with dummyVars: when a character variable (rather than a factor variable) appears on the RHS of the formula, the sep argument is ignored for that variable, i.e. no separator is inserted between the variable name and its levels.

It's a minor issue, but it took me some time to figure out the cause. Is this expected behaviour? If not, I'm happy to invest some time in finding a fix.

Cheers

Minimal, reproducible example:

library(earth)
library(tidyverse)
library(magrittr)
library(caret)

data(etitanic)

# this works fine: the separator is inserted between variable name and level
dummies <- dummyVars(survived ~ ., data = etitanic, sep = ".")
head(predict(dummies, newdata = etitanic))
#>   pclass.1st pclass.2nd pclass.3rd sex.female sex.male     age sibsp parch
#> 1          1          0          0          1        0 29.0000     0     0
#> 2          1          0          0          0        1  0.9167     1     2
#> 3          1          0          0          1        0  2.0000     1     2
#> 4          1          0          0          0        1 30.0000     1     2
#> 5          1          0          0          1        0 25.0000     1     2
#> 6          1          0          0          0        1 48.0000     0     0

# after converting a variable to character, dummyVars fails to insert the separator
etitanic %<>% mutate(pclass = as.character(pclass))
dummies <- dummyVars(survived ~ ., data = etitanic, sep = ".")
head(predict(dummies, newdata = etitanic))
#>   pclass1st pclass2nd pclass3rd sex.female sex.male     age sibsp parch
#> 1         1         0         0          1        0 29.0000     0     0
#> 2         1         0         0          0        1  0.9167     1     2
#> 3         1         0         0          1        0  2.0000     1     2
#> 4         1         0         0          0        1 30.0000     1     2
#> 5         1         0         0          1        0 25.0000     1     2
#> 6         1         0         0          0        1 48.0000     0     0
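
Since the factor version above gets the separator and the character version does not, one possible workaround is to convert character columns to factors before calling dummyVars. The following is only a sketch under that assumption: `chr_to_factor` and the toy data frame `toy` are hypothetical names made up for illustration, not part of caret, and the caret call is guarded so the snippet still runs if caret is not installed.

```r
# Hypothetical helper (not part of caret): turn every character column
# of a data frame into a factor and leave all other columns untouched.
chr_to_factor <- function(df) {
  chr_cols <- vapply(df, is.character, logical(1))
  df[chr_cols] <- lapply(df[chr_cols], factor)
  df
}

# Toy data standing in for etitanic; column names are illustrative only.
toy <- data.frame(
  y   = c(1, 0, 1, 0),
  cls = c("1st", "2nd", "3rd", "1st"),
  age = c(29, 31, 2, 40),
  stringsAsFactors = FALSE
)

fixed <- chr_to_factor(toy)
str(fixed)

# Only run the encoding step if caret is available.
if (requireNamespace("caret", quietly = TRUE)) {
  dummies <- caret::dummyVars(y ~ ., data = fixed, sep = ".")
  print(colnames(predict(dummies, newdata = fixed)))
}
```

If the assumption holds, the dummy columns for `cls` should carry the separator in the same way `pclass.1st` does in the first example. This only sidesteps the problem; it does not explain why character columns are handled differently.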

Session Info:

>sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2.2     caret_6.0-81       lattice_0.20-35    magrittr_1.5       forcats_0.3.0      stringr_1.3.1     
 [7] dplyr_0.7.6        purrr_0.2.5        readr_1.1.1        tidyr_0.8.1        tibble_1.4.2       ggplot2_3.0.0     
[13] tidyverse_1.2.1    earth_4.7.0        plotmo_3.5.2       TeachingDemos_2.10 plotrix_3.7-4     

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18       lubridate_1.7.4    class_7.3-14       assertthat_0.2.0   ipred_0.9-7        foreach_1.4.4     
 [7] R6_2.2.2           cellranger_1.1.0   plyr_1.8.4         backports_1.1.2    stats4_3.5.1       httr_1.3.1        
[13] pillar_1.3.0       rlang_0.3.1        lazyeval_0.2.1     readxl_1.1.0       rstudioapi_0.7     data.table_1.11.4 
[19] rpart_4.1-13       Matrix_1.2-14      splines_3.5.1      gower_0.1.2        munsell_0.5.0      broom_0.5.0       
[25] compiler_3.5.1     modelr_0.1.2       pkgconfig_2.0.1    nnet_7.3-12        tidyselect_0.2.4   prodlim_2018.04.18
[31] codetools_0.2-15   crayon_1.3.4       withr_2.1.2        MASS_7.3-50        recipes_0.1.4      ModelMetrics_1.2.0
[37] grid_3.5.1         nlme_3.1-137       jsonlite_1.6       gtable_0.2.0       scales_1.0.0       cli_1.0.0         
[43] stringi_1.2.4      reshape2_1.4.3     timeDate_3043.102  xml2_1.2.0         generics_0.0.2     lava_1.6.3        
[49] iterators_1.0.10   tools_3.5.1        glue_1.3.0         hms_0.4.2          survival_2.42-3    yaml_2.2.0        
[55] colorspace_1.3-2   rvest_0.3.2        bindr_0.1.1        haven_1.1.2 

@jimtheflash

jimtheflash commented Mar 12, 2019

I'll try to add an MRE for this as well, but I'm seeing the same behavior for the fullRank and levelsOnly arguments, i.e. dummyVars seems to ignore them if the RHS has character columns. Thanks to OP for calling this behavior out!
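
A rough sketch of how such a comparison might be set up, for anyone who wants to check this locally. The helper `compare_full_rank` and the toy data are hypothetical and made up for illustration; the snippet just collects the column names dummyVars produces with fullRank = TRUE for a character column versus the same column as a factor, and makes no claim about what the output will be.

```r
# Hypothetical helper: encode the same data twice with fullRank = TRUE,
# once with `cls` as character and once as factor, and return both sets
# of column names. Returns NULL if caret is not installed.
compare_full_rank <- function(df) {
  if (!requireNamespace("caret", quietly = TRUE)) {
    return(NULL)
  }
  as_chr <- df
  as_fct <- df
  as_fct$cls <- factor(as_fct$cls)

  encode <- function(d) {
    enc <- caret::dummyVars(y ~ ., data = d, fullRank = TRUE)
    colnames(predict(enc, newdata = d))
  }
  list(character = encode(as_chr), factor = encode(as_fct))
}

# Toy data; column names are illustrative only.
toy_fr <- data.frame(
  y   = c(1, 0, 1, 0),
  cls = c("a", "b", "c", "a"),
  stringsAsFactors = FALSE
)

res_fr <- compare_full_rank(toy_fr)
print(res_fr)
```

If the two entries differ in the number of columns, that would be consistent with fullRank being ignored for character columns. The same pattern could be repeated with levelsOnly = TRUE.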

@bdrhoa

bdrhoa commented Dec 12, 2023

This is still broken.
