
Standard Scaler fit-transform interface #179

Merged 9 commits into elixir-nx:main on Dec 14, 2023

Conversation

@santiago-imelio (Contributor) commented Sep 14, 2023

There is already Preprocessing.standard_scale that performs this transformation. However, with that function we cannot standardize a test set using the training set's distribution statistics (mean and standard deviation).

Scaler.StandardScaler.fit/2 will return a struct containing the mean and standard deviation of the observations it was fitted on.
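A minimal sketch of the fit-then-transform flow this PR is proposing. The function names follow the PR description; the exact module path, arities, and struct fields are assumptions for illustration:

```elixir
# Sketch of the proposed interface, assuming a scaler struct that
# stores the training distribution's mean and standard deviation.
alias Scholar.Preprocessing.StandardScaler

x_train = Nx.tensor([[1, -1, 2], [2, 0, 0], [0, 1, -1]])
x_test = Nx.tensor([[2, 1, 0]])

# fit learns the mean and standard deviation from the training set...
scaler = StandardScaler.fit(x_train)

# ...and transform reuses those statistics on unseen data, so the
# test set is scaled with the *training* distribution, not its own.
x_train_std = StandardScaler.transform(scaler, x_train)
x_test_std = StandardScaler.transform(scaler, x_test)
```

This is exactly the case the functional standard_scale cannot express: scaling x_test with statistics learned from x_train.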

@josevalim (Contributor)

@msluszniak should we generally convert our preprocessing steps into modules to mirror scikit-learn?

@msluszniak (Contributor)

I think we shouldn't convert preprocessing into modules. For functions that calculate some statistics and then apply them to data, this interface would be easy to build. However, for functions like one-hot encoding or ordinal encoding, we would need to somehow store the relation key -> calculated_mapping; scikit-learn uses dictionaries for this. One of the functions, normalize, already has an equivalent in Nx.LinAlg.norm. If we decide to support only standard_scaler, max_abs_scaler, min_max_scaler, and binarizer, then the interface would be inconsistent. And if we change the interface to use a struct, users wouldn't be able to pipe the results of standard_scale. On the other hand, it would be cool to support both.
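To illustrate the key -> calculated_mapping point: a fitted ordinal encoder has to carry a per-category mapping learned at fit time, which in Elixir would naturally live in a map inside the fitted struct. A hypothetical sketch (not Scholar code):

```elixir
# Hypothetical sketch: why encoders need a stored, learned mapping.
# A plain one-shot function cannot reuse the categories seen at fit time.
defmodule OrdinalEncoderSketch do
  defstruct [:mapping]

  # Learn category -> integer from the training data.
  def fit(categories) do
    mapping =
      categories
      |> Enum.uniq()
      |> Enum.sort()
      |> Enum.with_index()
      |> Map.new()

    %__MODULE__{mapping: mapping}
  end

  # Apply the *fitted* mapping to new data.
  def transform(%__MODULE__{mapping: mapping}, categories) do
    Enum.map(categories, &Map.fetch!(mapping, &1))
  end
end

encoder = OrdinalEncoderSketch.fit(["cat", "dog", "cat"])
OrdinalEncoderSketch.transform(encoder, ["dog"])
# => [1]
```

For the scalers the fitted state is just two tensors (mean and standard deviation), which is why a module-less design is still feasible for them but awkward for the encoders.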

@msluszniak (Contributor)

What we can actually do is create separate functions like standard_scale(reference_tensor, tensor_to_apply, opts) and support applying the standardization to a different tensor. WDYT?

@msluszniak (Contributor)

And standard_scale/2 would be the simplified version of this, where the reference tensor is the same as tensor_to_apply.
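The purely functional alternative being discussed might look like the following sketch. The function names and shapes are assumptions drawn from the comment above (the opts argument is dropped for brevity):

```elixir
# Sketch of the functional alternative: compute the statistics from a
# reference tensor and apply them to another tensor, with no struct.
defmodule ScaleSketch do
  import Nx.Defn

  # Scale tensor_to_apply using the reference tensor's distribution.
  defn standard_scale(reference_tensor, tensor_to_apply) do
    mean = Nx.mean(reference_tensor)
    std = Nx.standard_deviation(reference_tensor)
    (tensor_to_apply - mean) / std
  end

  # Simplified case: the tensor acts as its own reference.
  defn standard_scale(tensor), do: standard_scale(tensor, tensor)
end
```

The trade-off raised below is that the statistics are recomputed on every call, so there is no way to fit once and reuse the same fitting cheaply.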

@josevalim (Contributor)

@msluszniak the downside of this approach is that we would not be able to fit once and apply the same fitting multiple times. Is that a concern?

@msluszniak (Contributor)

You're right, hmm. In a standard use case, this function won't be called multiple times. But if it is, then the benefit of a modular implementation could be considerable.

@santiago-imelio (Contributor, Author)

@msluszniak From what I understand, it seems too soon to agree on an interface that fits both the language and the needs of developers, so in my opinion keeping the preprocessing API functional is appropriate for now.

Closing in favor of continuing the discussion.

@josevalim (Contributor)

I have reopened this PR in case you would like to address the comments in it. :)

@msluszniak (Contributor) left a review comment

We also need to add docs for fit, transform, and fit_transform.

Review comments on lib/scholar/preprocessing/standard_scaler.ex (outdated, resolved)
@josevalim josevalim merged commit f651fdc into elixir-nx:main Dec 14, 2023
2 checks passed
@santiago-imelio santiago-imelio deleted the feat/scaler-fit-transform branch December 14, 2023 17:20
3 participants