kron-torch 0.2.2
What's Changed
- Trust region clipping improved
- Removed max skew triangular and replaced it with memory_save_mode, which can be None to use the default triangular preconditioners, 'one_diag' to use one diagonal preconditioner per layer, or 'all_diag' to use all diagonal preconditioners (fastest and lowest memory, but slower learning)
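
To give a rough sense of the memory trade-off between the three memory_save_mode values, here is a minimal standalone sketch in plain Python. It does not call kron-torch; `diag_flags` and `preconditioner_numel` are hypothetical helper names made up for this illustration. It assumes one preconditioner factor per tensor dimension, that a triangular factor for a dimension of size n is stored as a dense n x n matrix while a diagonal one stores n values, and that 'one_diag' makes the largest dimension diagonal. Those details are assumptions, not statements about the library's internals.

```python
def diag_flags(shape, memory_save_mode=None):
    """Return one bool per dimension; True means 'use a diagonal preconditioner'."""
    if memory_save_mode is None:
        # Default: triangular preconditioners for every dimension.
        return [False] * len(shape)
    if memory_save_mode == "one_diag":
        if not shape:
            return []
        # Assumption for this sketch: the largest dimension is the diagonal one.
        largest = max(range(len(shape)), key=lambda i: shape[i])
        return [i == largest for i in range(len(shape))]
    if memory_save_mode == "all_diag":
        return [True] * len(shape)
    raise ValueError(f"unknown memory_save_mode: {memory_save_mode!r}")


def preconditioner_numel(shape, memory_save_mode=None):
    """Estimate how many values the preconditioner factors store for one tensor."""
    flags = diag_flags(shape, memory_save_mode)
    # Diagonal factor: n values. Triangular factor: assumed dense n x n storage.
    return sum(n if is_diag else n * n for n, is_diag in zip(shape, flags))


if __name__ == "__main__":
    layer_shape = (1024, 4096)  # example weight matrix shape, chosen arbitrarily
    for mode in (None, "one_diag", "all_diag"):
        print(mode, preconditioner_numel(layer_shape, mode))
```

Under these assumptions, a 1024 x 4096 weight matrix needs 17,825,792 stored values with None, 1,052,672 with 'one_diag', and 5,120 with 'all_diag', which is the memory side of the "lowest mem but slower learning" trade-off noted above.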