add extract_fit_time #191

EmilHvitfeldt · 2022-12-16T21:20:36Z

This is part of a series of PRs to add a new generic extract_fit_time() to allow users to investigate how long different tidymodels objects takes to fit.

TODO:

make sure names are good.
- right now id clashed with id returned from tune_grid()
Make work with non-recipe preprocessor
add tests
Make sure this change is backwards compatible
remove hardhat from Remotes

Related PRs
tidymodels/hardhat#218
tidymodels/recipes#1071
tidymodels/parsnip#853

# pak::pak(c(
#   "tidymodels/hardhat@extract_fit_time",
#   "tidymodels/recipes@extract_fit_time",
#   "tidymodels/parsnip@extract_fit_time",
#   "tidymodels/workflows@extract_fit_time"
# ))

library(tidymodels)

data(ames, package = "modeldata")

rec_ames <- recipe(Sale_Price ~ ., data = ames) |>
  step_dummy(all_nominal_predictors()) |>
  step_nzv(all_predictors()) |>
  step_normalize(all_predictors())

lm_spec <- linear_reg()

wf_spec <- workflow(rec_ames, lm_spec)

wf_fit <- fit(wf_spec, ames)

# recipes
wf_fit |>
  extract_recipe() |>
  extract_fit_time()
#> # A tibble: 1 × 2
#>   id      time
#>   <chr>  <dbl>
#> 1 recipe 0.259

wf_fit |>
  extract_recipe() |>
  extract_fit_time(summarize = FALSE)
#> # A tibble: 6 × 2
#>   id                      time
#>   <chr>                  <dbl>
#> 1 prep.dummy_edFdo     0.0190 
#> 2 bake.dummy_edFdo     0.0350 
#> 3 prep.nzv_oSTvE       0.197  
#> 4 bake.nzv_oSTvE       0      
#> 5 prep.normalize_1D9e7 0.00500
#> 6 bake.normalize_1D9e7 0.00300

# parsnip
wf_fit |>
  extract_fit_parsnip() |>
  extract_fit_time() # defaults to summarize = FALSE
#> # A tibble: 1 × 2
#>   id           time
#>   <chr>       <dbl>
#> 1 linear_reg 0.0310

wf_fit |>
  extract_fit_parsnip() |>
  extract_fit_time(summarize = TRUE)
#> Error in `extract_fit_time()`:
#> ! `summarize = TRUE` is not supported for `model_fit` objects.

# workflow
wf_fit |>
  extract_fit_time()
#> # A tibble: 1 × 3
#>   stage    id        time
#>   <chr>    <chr>    <dbl>
#> 1 workflow workflow 0.290

wf_fit |>
  extract_fit_time(summarize = FALSE)
#> # A tibble: 7 × 3
#>   stage      id                      time
#>   <chr>      <chr>                  <dbl>
#> 1 preprocess prep.dummy_edFdo     0.0190 
#> 2 preprocess bake.dummy_edFdo     0.0350 
#> 3 preprocess prep.nzv_oSTvE       0.197  
#> 4 preprocess bake.nzv_oSTvE       0      
#> 5 preprocess prep.normalize_1D9e7 0.00500
#> 6 preprocess bake.normalize_1D9e7 0.00300
#> 7 model      linear_reg           0.0310

^{Created on 2023-06-15 with reprex v2.0.2}

Merge commit '882b06bb27592055493b10065b3ee805399fad38' #Conflicts: # DESCRIPTION

simonpcouch

I think the interface is super solid, and the potential uses for this in stacks, workflow sets, and tuning results are many!

Just made a few small docs suggestions—I think, as often as we can, we should use the same term to refer to this value. I think "elapsed fit time" is a good cue, in that it self-documents system.time()$elapsed and aligns with the function names. :)

Pending those TODO's, thumbs up from me!

R/extract.R

Co-authored-by: Simon P. Couch <[email protected]>

simonpcouch · 2023-06-16T15:54:53Z

re:

right now id clashed with id returned from tune_grid()

workflowsets uses wflow_id, so we have some precedent for distinguishing different types of ids. there’s also id2 in iterative searches, unfortunately named with a different pattern (though that may be internal-only?).

I'm not sure I have any silver bullet recommendations. process_id? Probably something with *_id.

This may be a separate issue from naming, but there may also be some natural linkage to .config here. The processing for that will likely happen in tune, but perhaps something worth anticipating.

EmilHvitfeldt · 2023-06-16T16:32:11Z

I like process_id!

hfrick

I'm wondering if stage_id is a good alternative to process_id?

NEWS.md

Merge branch 'main' into extract_fit_time # Conflicts: # DESCRIPTION

simonpcouch · 2024-03-27T20:46:45Z

I think a part of this interface that's been tough to wrap my head around is the degree of summarization. I have a hunch that getting summarization right will make deciding on column names a bit easier.

At the moment, output looks like:

library(tidymodels)

data(ames, package = "modeldata")

rec_ames <- 
  recipe(Sale_Price ~ ., data = ames) |>
  step_dummy(all_nominal_predictors()) |>
  step_nzv(all_predictors()) |>
  step_normalize(all_predictors())

lm_spec <- linear_reg()

wf_fit <- fit(workflow(rec_ames, lm_spec), ames)
wf_fit2 <- fit(workflow(Sale_Price ~ ., lm_spec), ames)

extract_fit_time(wf_fit)
#> # A tibble: 1 × 3
#>   stage    process_id  time
#>   <chr>    <chr>      <dbl>
#> 1 workflow workflow   0.419
extract_fit_time(wf_fit, summarize = FALSE)
#> # A tibble: 7 × 3
#>   stage      process_id              time
#>   <chr>      <chr>                  <dbl>
#> 1 preprocess prep.dummy_wjSqo     0.0290 
#> 2 preprocess bake.dummy_wjSqo     0.156  
#> 3 preprocess prep.nzv_lUZwC       0.199  
#> 4 preprocess bake.nzv_lUZwC       0      
#> 5 preprocess prep.normalize_I9sdp 0.00400
#> 6 preprocess bake.normalize_I9sdp 0.00300
#> 7 model      linear_reg           0.0280

extract_fit_time(wf_fit2)
#> # A tibble: 1 × 3
#>   stage    process_id  time
#>   <chr>    <chr>      <dbl>
#> 1 workflow workflow   0.173
extract_fit_time(wf_fit2, summarize = FALSE)
#> # A tibble: 1 × 3
#>   stage process_id  time
#>   <chr> <chr>      <dbl>
#> 1 model linear_reg 0.173

With a recipe, the summarize argument reduces the number of rows, but with a formula preprocessor, the only change is the values of stage and process_id.

Workflows already have the built-in unit ("stage") pre (preprocess), fit (model), post (postprocess). I imagine, in the tuning context, this is likely the unit of observation I'm interesting in timing:

extract_fit_time(wf_fit, summarize = FALSE) %>%
  group_by(stage) %>%
  summarize(time = sum(time))
#> # A tibble: 2 × 2
#>   stage        time
#>   <chr>       <dbl>
#> 1 model      0.0280
#> 2 preprocess 0.392

It'd be nice if that was the default.

An argument like unit = c("workflow", "stage", "process") in favor of summarize might feel better for workflows. That would make the analogous arguments for the recipe method, though, "stage" and "process", which are totally unnatural compared to summarize = TRUE or FALSE or unit = "recipe" or "step". Hmph.

^{Created on 2024-03-27 with reprex v2.1.0}

simonpcouch · 2024-03-27T21:22:38Z

The other remaining question for me is whether we should be thinking of some functionality that measures predict elapsed time alongside this. extract_predict_time() is definitely a stranger interface, as the outputs of bake() and predict.model_fit() are just tibbles. Also, whatever that function is couldn't be passed as an extract control argument to resampling functions, because those can only be called on a fitted workflow.

My general leaning is that since 1) the current methods can point out when bakeing, at least on data, take a notable amount of time and 2) predict.model_fit() tends to take much less time than fit.model_spec(), we can probably live with just measuring fit time and not providing an analogous interface for prediction.

simonpcouch

Looking solid. :) Just one docs clarification.

R/extract.R

Co-authored-by: Simon P. Couch <[email protected]>

github-actions · 2024-04-20T01:10:56Z

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

EmilHvitfeldt added 2 commits December 16, 2022 12:29

add extract_fit_time

8b23a48

use dev hardhat

3a9142d

This was referenced Dec 16, 2022

add extract_fit_time tidymodels/hardhat#218

Merged

add extract_fit_time tidymodels/recipes#1071

Merged

add extract_fit_time tidymodels/parsnip#853

Merged

EmilHvitfeldt added 4 commits June 14, 2023 16:57

merge main

7bdd567

Merge commit '882b06bb27592055493b10065b3ee805399fad38' #Conflicts: # DESCRIPTION

add summarize argument

0c8f17b

fix DESCRIPTION file

9c378cc

temp remove dev parsnip

ca1c4d2

simonpcouch reviewed Jun 16, 2023

View reviewed changes

R/extract.R Outdated Show resolved Hide resolved

R/extract.R Outdated Show resolved Hide resolved

R/extract.R Outdated Show resolved Hide resolved

EmilHvitfeldt and others added 2 commits June 16, 2023 08:27

Apply suggestions from code review

c6c76d6

Co-authored-by: Simon P. Couch <[email protected]>

redocument

853fd49

EmilHvitfeldt added 5 commits June 16, 2023 12:11

id -> process_id in extract_fit_time()

651342b

add tests for extract_fit_time()

b2017de

add news bullet

47cb0af

update remotes

ece5e2e

add skip if not installed

60a3eed

EmilHvitfeldt marked this pull request as ready for review June 16, 2023 19:31

EmilHvitfeldt requested a review from hfrick June 16, 2023 20:00

hfrick approved these changes Jun 26, 2023

View reviewed changes

NEWS.md Show resolved Hide resolved

simonpcouch added 3 commits March 27, 2024 11:04

merge upstream

13e180a

Merge branch 'main' into extract_fit_time # Conflicts: # DESCRIPTION

merge upstream snap changes

4c7d75d

address error in control_*() with no recipe

ce1aded

simonpcouch mentioned this pull request Mar 27, 2024

add extract_fit_time() for tuning results tidymodels/tune#880

Closed

time -> elapsed

7159974

simonpcouch mentioned this pull request Apr 3, 2024

Feature request: Option to enable timing for each iteration tidymodels/tune#248

Open

EmilHvitfeldt added 2 commits April 5, 2024 13:34

process_id -> stage_id

715d1a6

= -> <-

e233dcb

simonpcouch approved these changes Apr 5, 2024

View reviewed changes

R/extract.R Outdated Show resolved Hide resolved

EmilHvitfeldt and others added 3 commits April 5, 2024 14:00

Update R/extract.R

a853c82

Co-authored-by: Simon P. Couch <[email protected]>

redocument

1c38711

update Remotes

6265b6f

EmilHvitfeldt merged commit 78426ac into main Apr 5, 2024
10 checks passed

EmilHvitfeldt deleted the extract_fit_time branch April 5, 2024 21:52

simonpcouch mentioned this pull request Apr 11, 2024

correct NEWS entry ordering for extract_fit_time() #223

Closed

github-actions bot locked and limited conversation to collaborators Apr 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add extract_fit_time #191

add extract_fit_time #191

EmilHvitfeldt commented Dec 16, 2022 •

edited

Loading

simonpcouch left a comment

simonpcouch commented Jun 16, 2023

EmilHvitfeldt commented Jun 16, 2023

hfrick left a comment

simonpcouch commented Mar 27, 2024

simonpcouch commented Mar 27, 2024

simonpcouch left a comment

github-actions bot commented Apr 20, 2024

add extract_fit_time #191

add extract_fit_time #191

Conversation

EmilHvitfeldt commented Dec 16, 2022 • edited Loading

simonpcouch left a comment

Choose a reason for hiding this comment

simonpcouch commented Jun 16, 2023

EmilHvitfeldt commented Jun 16, 2023

hfrick left a comment

Choose a reason for hiding this comment

simonpcouch commented Mar 27, 2024

simonpcouch commented Mar 27, 2024

simonpcouch left a comment

Choose a reason for hiding this comment

github-actions bot commented Apr 20, 2024

EmilHvitfeldt commented Dec 16, 2022 •

edited

Loading