
Single future.control argument rather than multiple, individual future.* arguments #27

Open
HenrikBengtsson opened this issue Sep 13, 2018 · 9 comments

Comments

@HenrikBengtsson
Collaborator

@mllg wrote:

Have you considered introducing a control object (in the fashion of passing an rpart.control object to rpart())? A future.control object could bundle all arguments. Would be good for the overview (also in the documentation) and ...

Yes, I've been having internal (as in lots of inner voices ;)) debates about this, and it's been discussed with others in the past. I'm not opposed to it. The main reason I've stayed away from it is that we have to decide exactly how the control elements should be controlled.

... you would avoid inconsistencies like #26.

You mean in the sense that the future.apply package and other similar high-level packages (e.g. furrr) won't have to know about future-specific arguments and can just pass whatever down to the future package? That's a nice side effect I hadn't thought of before.

However, not all elements in future.control should be passed down to the future package. For instance, scheduling and chunk.size are higher level properties. If so, do they belong to a future.control argument or should they be separate?

So several things to think of. Thanks for bringing this up.

@mllg

mllg commented Sep 14, 2018

However, not all elements in future.control should be passed down to the future package. For instance, scheduling and chunk.size are higher level properties. If so, do they belong to a future.control argument or should they be separate?

I don't see a problem with future.apply introducing some additional arguments (i.e. scheduling and chunk.size) and then passing the object forward to future. I think my first attempt to do this for multiple packages would look like this:

# In future:
future.control <- function(globals = FALSE) {
  list(globals = globals)
}

# In future.apply:
future.apply.control <- function(..., scheduling = 1, chunk.size = NULL) {
  c(future::future.control(...), list(scheduling = scheduling, chunk.size = chunk.size))
}

This way, you could update future and introduce new options w/o having to touch future.apply.
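To make the idea concrete, here is a runnable sketch of the layered constructors described above, with plain function names (the actual names and homes of these functions are part of the proposal, not existing API), plus an example call:

```r
# Sketch of the layered control constructors (names are illustrative):
future.control <- function(globals = FALSE) {
  list(globals = globals)
}

future.apply.control <- function(..., scheduling = 1, chunk.size = NULL) {
  # Extra, higher-level arguments are appended to the base control list;
  # anything in ... is forwarded to the lower-level constructor.
  c(future.control(...), list(scheduling = scheduling, chunk.size = chunk.size))
}

ctrl <- future.apply.control(globals = TRUE, chunk.size = 10)
stopifnot(isTRUE(ctrl$globals), ctrl$scheduling == 1, ctrl$chunk.size == 10)
```

Because future.apply.control() forwards `...` blindly, a new argument added to future.control() becomes available without touching future.apply.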

@HenrikBengtsson
Collaborator Author

I was actually thinking of taking an easier approach, to avoid introducing yet another function to the Future API. I was thinking this could be added to the API layered above the Future API (e.g. the future.apply package).

For instance, instead of doing:

y <- future_lapply(X, FUN = identity, future.seed = 42, future.scheduling = 2.0)

one could do:

y <- future_lapply(X, FUN = identity, future.control = list(seed = 42, scheduling = 2.0))

where the future.control argument would be used to override the defaults. Conceptually, something like

control <- do.call(.update_control, future.control)

can be used internally with:

.update_control <- function(...) {
  args <- list(...)
  ## Hard-coded defaults; user-supplied values take precedence below.
  control <- list(
    globals = TRUE, packages = NULL, lazy = FALSE,
    seed = FALSE, scheduling = 1, chunk.size = NULL
  )
  for (name in names(args)) control[[name]] <- args[[name]]
  control
}

The current arguments would then correspond to:

future.globals <- control$globals
future.packages <- control$packages
future.seed <- control$seed
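Putting the pieces together, here is a runnable sketch of the override semantics: do.call() spreads the user's future.control list into the `...` of .update_control(), so named entries replace the hard-coded defaults and everything else keeps its default value.

```r
# Sketch of the proposed override mechanism (defaults as listed above):
.update_control <- function(...) {
  args <- list(...)
  control <- list(
    globals = TRUE, packages = NULL, lazy = FALSE,
    seed = FALSE, scheduling = 1, chunk.size = NULL
  )
  # User-supplied entries override (or extend) the defaults.
  for (name in names(args)) control[[name]] <- args[[name]]
  control
}

future.control <- list(seed = 42, scheduling = 2.0)
control <- do.call(.update_control, future.control)
stopifnot(control$seed == 42, control$scheduling == 2.0, isTRUE(control$globals))
```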

@mllg

mllg commented Sep 17, 2018

Where does .update_control() live? In the future package? I find it odd that it knows about the defaults of an "extension" package like future.apply. What if you introduce a new argument in future or future.apply?

@HenrikBengtsson
Collaborator Author

In future.apply. There are no plans to introduce a control argument in the future package at this point.

@mllg

mllg commented Sep 17, 2018

Please clarify how this is intended to be used:

The choice of backend can be controlled by modifying the global state via plan() (and tweak?).

Some options of future apparently should be set by the package developer, like globals (at least if the function f to apply is not user-provided), or lazy. This is basically hard-coded in my package.

Other options seem to be relevant for the user, e.g. seed or the output handling via %stdout%/%stderr%. How do I expose these options to the user? If there is a control object in future.apply, I can just pass it down; that works for me. But what am I supposed to do in my packages if I want to run futureCall?

@HenrikBengtsson
Collaborator Author

So, I'm not thinking of a global option here, just wrapping up the existing future.* arguments into a future.control = list(...) argument. The latter will override and/or add to the defaults (which are hard-coded in the package, not set by the user). So the idea is that it works just as it does now, only with a different way of specifying the arguments (to future_lapply() et al.).

@mllg

mllg commented Sep 17, 2018

Hm, okay. If I understand you correctly, most options can/should be hard-coded in the package. But what about scheduling or chunk.size? Do I have to expose these arguments and pass them down? Can they be set via plan()?

Sorry if these are dumb questions, I should really take more time to RTFM...

@HenrikBengtsson
Collaborator Author

No dumb questions. No, you cannot set those via plan(). Can you give me an example where you think it would make more sense for the end user to control the "chunking" (via plan()) rather than you, as the developer of the method/algorithm, controlling it (via future_lapply(), parLapply(), foreach(), what have you)? If I can understand your use case(s), I can probably give you a better answer/explanation.

@mllg

mllg commented Sep 20, 2018

I'm currently working on mlr3 (https://github.com/mlr-org/mlr3), a successor to mlr. The benchmark() function is used to benchmark multiple learning algorithms on multiple machine learning tasks via resampling. Internally, I basically expand.grid() over learners, tasks, and resampling iterations.
The runtimes of the iterations (one iteration = a single learning algorithm on a single task in a single resampling iteration) are often very heterogeneous (linear model takes a few seconds, deep neural net takes many hours).

As the user typically has some expectations about the runtime and which iterations or combinations will be expensive (e.g., a random forest is more expensive than a single tree), they could optimize the parallelization by evenly distributing the heavy jobs among the available workers. So it would be nice to be able to control the chunking, or at least "shuffle" the jobs (as suggested in another issue).

A more general use case for scheduling/chunk.size for homogeneous runtimes:

  • If you have 1e7 very fast jobs you want to chunk to [ncpu] jobs in order to reduce the overhead.
  • If you have 10 very slow jobs you want to have 10 jobs which start in a load-balanced fashion (see parLapply vs. parLapplyLB or mclapply's mc.preschedule).
  • Heuristics like defaulting to min(iters, ncpu * 2) chunks might be helpful here, but do not solve the issue for all setups.
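The two regimes above can be sketched with a toy chunking helper. This is purely illustrative (make_chunks() is a hypothetical function, not future.apply's actual implementation): a fixed chunk.size caps chunk granularity for load balancing, while scheduling controls how many chunks each worker gets.

```r
# Hypothetical helper: split jobs 1..n into chunks, either by a fixed
# chunk.size or into roughly ncpu * scheduling chunks.
make_chunks <- function(n, ncpu, chunk.size = NULL, scheduling = 1) {
  if (!is.null(chunk.size)) {
    nchunks <- ceiling(n / chunk.size)       # fixed-size chunks
  } else {
    nchunks <- min(n, max(1, round(ncpu * scheduling)))  # chunks per worker
  }
  split(seq_len(n), cut(seq_len(n), breaks = nchunks, labels = FALSE))
}

# Many fast jobs on 4 CPUs -> 4 big chunks, minimizing per-job overhead:
stopifnot(length(make_chunks(100, ncpu = 4)) == 4)
# Few slow jobs with chunk.size = 1 -> one chunk per job, load-balanced:
stopifnot(length(make_chunks(10, ncpu = 4, chunk.size = 1)) == 10)
```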

I'm also not saying that I definitely need this. Especially the manual chunking might unnecessarily blow up the interface. I was just curious if this is possible.
