PipeOpTargetTrafo drops missing factor levels #631

Open
be-marc opened this issue Nov 20, 2021 · 2 comments

@be-marc
Member

be-marc commented Nov 20, 2021

PipeOpTargetTrafo drops factor levels that are not present in the training subset of the task, whereas plain mlr3 keeps all factor levels.

library(mlr3)
library(mlr3pipelines)
options(mlr3.debug = TRUE)

task = tsk("boston_housing")
learner = lrn("regr.rpart")
ppl = ppl("targettrafo",
  graph = learner,
  targetmutate.trafo = function(x) log(x),
  targetmutate.inverter = function(x) list(response = exp(x$response)))
graph_learner = as_learner(ppl)

# fails
resample(task, graph_learner, rsmp("holdout"))

# > Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = attr(object,  : 
# >  factor town has new levels Dover, Duxbury, Hamilton, Manchester, Marshfield, Medfield, Millis, Nahant, Weston
# > This happened PipeOp regr.rpart's $predict()

# works
resample(task, learner, rsmp("holdout"))

Using PipeOpFixFactors fixes the issue, but maybe PipeOpTargetTrafo should not drop the factor levels in the first place?
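
For reference, a minimal sketch of the PipeOpFixFactors workaround, assuming the same task and learner as in the example above. Placing `po("fixfactors")` directly in front of the learner inside the graph is an assumption about where the workaround goes; the names `inner`, `graph` and `rr` are only illustrative.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("boston_housing")

# Assumption: fixfactors sits between the target trafo and the learner.
# It stores the factor levels seen during training and sets levels that
# are unseen at predict time to NA, which rpart can handle.
inner = po("fixfactors") %>>% lrn("regr.rpart")

graph = ppl("targettrafo",
  graph = inner,
  targetmutate.trafo = function(x) log(x),
  targetmutate.inverter = function(x) list(response = exp(x$response)))
graph_learner = as_learner(graph)

# The holdout resampling should now run without the "new levels" error
rr = resample(task, graph_learner, rsmp("holdout"))
```

This only helps with learners that tolerate missing values in the features.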

The gallery post on bike sharing fails because of this (mlr-org/mlr3gallery#119).

@be-marc
Member Author

be-marc commented Nov 20, 2021

PipeOpFixFactors is not a solution for the gallery post: it introduces missing values, which regr.kknn does not support.
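
A base R illustration of why missing values appear, assuming only that unseen levels are mapped to `NA` when a factor is restricted to the levels known from training. The town names and variable names are made up for the example, and this is not the internal code of PipeOpFixFactors.

```r
# Levels that were seen during training (hypothetical values)
train_levels = c("Boston", "Cambridge")

# New data at predict time contains a town that was not seen in training
new_town = c("Boston", "Dover", "Cambridge")

# Restricting the factor to the training levels turns "Dover" into NA
fixed = factor(new_town, levels = train_levels)

is.na(fixed)
```

A learner without support for missing values then fails on the second observation.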

@sumny
Member

sumny commented Nov 29, 2021

I think this could either be fixed "manually" in mlr3pipelines, or we fix mlr3::convert_task directly, which seems to cause the problem. During the trafo, a new Task is created from the DataBackend of the task. During resampling, some levels may no longer be present in the data, so they are also no longer present in the backend and are therefore missing in the new Task.
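
The effect described here can be reproduced in base R. This is only an analogy under the assumption that the new Task derives its levels from the values that are actually present; it is not the code of mlr3::convert_task, and all names below are made up.

```r
# A factor column with three levels (hypothetical values)
town = factor(c("Boston", "Dover", "Boston", "Weston"))

# Rows that end up in the training set of a resampling split
train_rows = c(1, 3)

# Plain subsetting keeps the full set of levels
kept = town[train_rows]

# Rebuilding the factor from the values that are present drops the rest
rebuilt = factor(as.character(town[train_rows]))

levels(kept)
levels(rebuilt)
```

A model trained on `rebuilt` has never seen the levels of the remaining rows, which is the situation behind the "factor town has new levels" error.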
