
v0.8.4

@dcherian released this 30 Nov 21:50
666d45e

What's Changed

Another round of absolutely massive performance improvements for `method="cohorts"`. This should greatly improve many Xarray groupby workloads, which use cohorts by default. Resampling in particular should be much faster.
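Roughly, `method="cohorts"` looks for sets of group labels ("cohorts") that tend to occupy the same chunks, so each cohort can be reduced using only the chunks it touches. The sketch below assumes the simplest possible definition, labels whose chunk footprints are identical, and uses a hypothetical `find_cohorts` helper. flox's own `find_group_cohorts` (timed in the benchmarks below) is more involved and remains the reference.

```python
from collections import defaultdict


def find_cohorts(labels, chunks):
    """Group labels that occur in exactly the same set of chunks.

    Hypothetical, simplified sketch; not flox's find_group_cohorts.
    labels: group label of each element along the grouped dimension.
    chunks: chunk sizes along that dimension (must sum to len(labels)).
    Returns {tuple of chunk indices: sorted labels found in exactly those chunks}.
    """
    labels = list(labels)
    if sum(chunks) != len(labels):
        raise ValueError("chunk sizes must sum to the number of labels")

    # Pass 1: record the set of chunks (the "footprint") each label appears in.
    footprints = defaultdict(set)
    start = 0
    for ichunk, size in enumerate(chunks):
        for label in labels[start : start + size]:
            footprints[label].add(ichunk)
        start += size

    # Pass 2: labels sharing an identical footprint form one cohort.
    cohorts = defaultdict(list)
    for label, chunk_set in footprints.items():
        cohorts[tuple(sorted(chunk_set))].append(label)
    return {key: sorted(members) for key, members in cohorts.items()}


# Two years of monthly data, chunked every 4 months:
months = list(range(1, 13)) * 2
print(find_cohorts(months, chunks=[4] * 6))
# {(0, 3): [1, 2, 3, 4], (1, 4): [5, 6, 7, 8], (2, 5): [9, 10, 11, 12]}
```

With chunks aligned to the seasonal cycle, each cohort needs only two of the six chunks; misaligned chunks produce more, smaller cohorts.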

Benchmark improvements undersell the changes: the core loop is approximately quadratic, so the savings grow with problem size. For example, graph construction time for the example in https://xarray.dev/blog/flox, with `method="cohorts"` specified, now drops from 30s to 3s 😱
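A toy model of why a roughly quadratic loop makes small benchmarks understate the gains: grouping n labels by comparing their chunk footprints pair by pair needs up to n(n-1)/2 comparisons, while bucketing them in a dict touches each label once. Both functions below are hypothetical stand-ins, not flox code, and this does not claim the flox change took exactly this form. It only shows how the gap between a quadratic and a linear approach widens as n grows.

```python
from collections import defaultdict


def cohorts_pairwise(footprints):
    """Hypothetical quadratic grouping: compare footprints pair by pair.

    footprints maps each label to a frozenset of chunk indices.
    Returns (cohorts, number_of_comparisons).
    """
    labels = list(footprints)
    seen = set()
    cohorts = []
    comparisons = 0
    for i, a in enumerate(labels):
        if a in seen:
            continue
        seen.add(a)
        members = [a]
        for b in labels[i + 1 :]:
            if b in seen:
                continue
            comparisons += 1
            if footprints[a] == footprints[b]:
                members.append(b)
                seen.add(b)
        cohorts.append(members)
    return cohorts, comparisons


def cohorts_hashed(footprints):
    """Hypothetical linear grouping: bucket labels by footprint in one pass.

    Returns (cohorts, number_of_labels_visited).
    """
    buckets = defaultdict(list)
    for label, footprint in footprints.items():
        buckets[footprint].append(label)
    return list(buckets.values()), len(footprints)


# Worst case for the pairwise loop: every label has a distinct footprint.
for n in (10, 100, 1000):
    distinct = {label: frozenset({label}) for label in range(n)}
    _, slow = cohorts_pairwise(distinct)
    _, fast = cohorts_hashed(distinct)
    print(n, slow, fast)
# 10 45 10
# 100 4950 100
# 1000 499500 1000
```

At n = 10 the pairwise loop does 4.5x the work; at n = 1000 it does about 500x, which is why a benchmark on a small problem can show a modest ratio while a large workload speeds up far more.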

#### Larger cohorts, fewer tasks

| Before [15324a7a] (v0.8.3) | After [666d45e9] (main) | Ratio | Benchmark (Parameter) |
|----------------------------|-------------------------|-------|-----------------------|
| 5180 | 3600 | 0.69 | cohorts.NWMMidwest.track_num_tasks |
| 4891 | 3385 | 0.69 | cohorts.NWMMidwest.track_num_tasks_optimized |
| 505 | 345 | 0.68 | cohorts.NWMMidwest.track_num_layers |

#### Much faster algorithm for detecting cohorts, which should scale better

| Before [15324a7a] (v0.8.3) | After [666d45e9] (main) | Ratio | Benchmark (Parameter) |
|----------------------------|-------------------------|-------|-----------------------|
| 3.19±0.07ms | 2.42±0.05ms | 0.76 | cohorts.ERA5Google.time_find_group_cohorts |
| 1.04±0.01ms | 782±70μs | 0.75 | cohorts.PerfectMonthly.time_find_group_cohorts |
| 1.06±0.01ms | 781±70μs | 0.73 | cohorts.PerfectMonthlyRechunked.time_find_group_cohorts |
| 29.7±2ms | 12.6±0.9ms | 0.43 | cohorts.ERA5DayOfYear.time_find_group_cohorts |
| 7.76±1ms | 2.90±0.2ms | 0.37 | cohorts.ERA5MonthHour.time_find_group_cohorts |
| 8.17±0.8ms | 2.75±0.2ms | 0.34 | cohorts.ERA5MonthHourRechunked.time_find_group_cohorts |
| 242±5ms | 47.3±2ms | 0.20 | cohorts.NWMMidwest.time_find_group_cohorts |
| 28.8±3ms | 4.11±0.3ms | 0.14 | cohorts.ERA5DayOfYearRechunked.time_find_group_cohorts |

#### Total graph construction time improves less, since other overhead in building the graphs remains

| Before [15324a7a] (v0.8.3) | After [666d45e9] (main) | Ratio | Benchmark (Parameter) |
|----------------------------|-------------------------|-------|-----------------------|
| 162±5ms | 144±9ms | 0.89 | cohorts.ERA5DayOfYearRechunked.time_graph_construct |
| 20.7±0.2ms | 18.3±0.4ms | 0.89 | cohorts.ERA5Google.time_graph_construct |
| 3.21±0.2ms | 2.40±0.04ms | 0.75 | cohorts.PerfectMonthly.time_graph_construct |
| 181±10ms | 129±10ms | 0.71 | cohorts.NWMMidwest.time_graph_construct |
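The Ratio column is After divided by Before, so values below 1 are improvements (0.20 means the new code takes a fifth of the time). The helper below recomputes it from the table cells. It assumes the cells follow the `mean±spread unit` layout shown here (which looks like airspeed velocity comparison output); the parser and its unit table are our own assumptions, not an asv API.

```python
import re

# Unit multipliers to seconds. Both the micro sign (U+00B5) and the Greek
# letter mu (U+03BC) are accepted, since either may be rendered as "μs".
_UNITS = {"ns": 1e-9, "\u00b5s": 1e-6, "\u03bcs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}

# "<mean>[±<spread>][unit]"; the unit is absent for counts such as task numbers.
_PATTERN = re.compile(
    r"\s*([0-9.]+)(?:±[0-9.]+)?\s*(" + "|".join(map(re.escape, _UNITS)) + r")?\s*"
)


def parse_value(text):
    """Return the mean of a table cell, converted to seconds if it has a unit."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised value: {text!r}")
    value, unit = match.groups()
    return float(value) * (_UNITS[unit] if unit else 1.0)


def ratio(before, after):
    """After / Before, as in the Ratio column; below 1 means the new code is better."""
    return parse_value(after) / parse_value(before)


print(round(ratio("5180", "3600"), 2))  # 0.69
print(round(ratio("242±5ms", "47.3±2ms"), 2))  # 0.2
print(round(ratio("29.7±2ms", "12.6±0.9ms"), 2))  # 0.42, while the table lists 0.43
```

Recomputing from the rounded cells can differ from the listed ratio in the last digit, presumably because the published ratios were computed from unrounded measurements.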

Changes

Full Changelog: v0.8.3...v0.8.4