Standard Scaler fit-transform interface #179
@msluszniak should we generally convert our preprocessing steps into modules to mirror scikit-learn?
I think we shouldn't convert preprocessing into modules. For functions that compute some statistics and then apply them to the data, this interface would be easy to provide. However, for functions like one-hot encoding or ordinal encoding, we would need to somehow map the relation
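To illustrate why encoders fit less neatly than scalers, here is a minimal sketch in plain Elixir (no Nx) where the fitted state of an ordinal encoder is a category-to-index map rather than a fixed set of numeric statistics. The module name `OrdinalEncoderSketch`, the sorted index assignment, and raising on unseen categories are all assumptions made for illustration:

```elixir
defmodule OrdinalEncoderSketch do
  # Hypothetical sketch: the learned state is a map from category to index.
  defstruct [:mapping]

  # Assign indices to the distinct categories in sorted order.
  def fit(values) do
    mapping =
      values
      |> Enum.uniq()
      |> Enum.sort()
      |> Enum.with_index()
      |> Map.new()

    %__MODULE__{mapping: mapping}
  end

  # Encode new values with the learned map. Unseen categories raise KeyError
  # here (an assumption; a real encoder would need an explicit policy).
  def transform(%__MODULE__{mapping: mapping}, values) do
    Enum.map(values, &Map.fetch!(mapping, &1))
  end
end
```

The size of this state depends on the data, and new data can contain categories the map has never seen, which is the kind of extra bookkeeping a scaler does not need.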
What we could do instead is create separate functions like `standard_scale(reference_tensor, tensor_to_apply, opts)` that apply the standardization computed from one tensor to a different tensor. WDYT?
The existing `standard_scale/2` would then be a simplified version of this, where the reference tensor is the same as `tensor_to_apply`.
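A minimal sketch of the proposed three-argument form, written in plain Elixir with lists of rows standing in for Nx tensors so it runs without dependencies. The module name `ScaleSketch`, the use of the population standard deviation, and centering-only for zero-deviation columns are assumptions, not Scholar's implementation:

```elixir
defmodule ScaleSketch do
  # Hypothetical sketch of standard_scale(reference, to_apply, opts);
  # lists of rows stand in for Nx tensors.

  # Column-wise {mean, population std} of the reference rows.
  defp stats(rows) do
    rows
    |> Enum.zip()
    |> Enum.map(fn col ->
      col = Tuple.to_list(col)
      n = length(col)
      mean = Enum.sum(col) / n
      var = Enum.reduce(col, 0.0, fn x, acc -> acc + (x - mean) * (x - mean) end) / n
      {mean, :math.sqrt(var)}
    end)
  end

  # Standardize `to_apply` using statistics computed from `reference`.
  # Zero-deviation columns are only centered (assumption, avoids division by zero).
  def standard_scale(reference, to_apply, _opts) do
    s = stats(reference)

    Enum.map(to_apply, fn row ->
      Enum.zip_with(row, s, fn x, {mean, std} ->
        if std == 0.0, do: x - mean, else: (x - mean) / std
      end)
    end)
  end

  # The single-tensor form is the special case reference == to_apply.
  def standard_scale(tensor, opts \\ []), do: standard_scale(tensor, tensor, opts)
end
```

For example, `ScaleSketch.standard_scale([[1, 10], [3, 30]], [[5, 40]], [])` scales the new row with the reference means `[2.0, 20.0]` and deviations `[1.0, 10.0]`. Because the statistics are recomputed from `reference` on every call, applying the same standardization to several tensors repeats that work.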
@msluszniak the downside of this approach is that we would not be able to fit once and apply the same fitting multiple times. Is that a concern?
You're right, hmm. In a standard use case, this function won't be called multiple times. But if it is, the benefit of a module-based implementation could be considerable.
@msluszniak From what I understand, it seems too soon to agree on an interface that fits both the language and the needs of developers, so in my opinion keeping the preprocessing API functional is appropriate for now. Closing in favor of continuing the discussion.
I have reopened this PR in case you would like to address the comments in it. :)
We also need to add docs for `fit`, `transform`, and `fit_transform`.
There is already `Preprocessing.standard_scale`, which performs this transformation. However, that function cannot standardize a test set using the statistics (mean and standard deviation) of the train set. `Scaler.StandardScaler.fit/2` returns a struct containing the mean and standard deviation learned from the observations it was fitted on.
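A minimal sketch of the fit/transform split described here, again using plain lists instead of Nx tensors. Only the idea of `fit/2` returning a struct of learned statistics comes from the description; the module `StandardScalerSketch`, its `mean`/`std` fields, and the `transform/2` and `fit_transform/2` signatures are hypothetical:

```elixir
defmodule StandardScalerSketch do
  # Hypothetical sketch; field names and signatures are assumptions.
  defstruct [:mean, :std]

  # Learn column-wise mean and population standard deviation once.
  def fit(rows, _opts \\ []) do
    {means, stds} =
      rows
      |> Enum.zip()
      |> Enum.map(fn col ->
        col = Tuple.to_list(col)
        n = length(col)
        mean = Enum.sum(col) / n
        var = Enum.reduce(col, 0.0, fn x, acc -> acc + (x - mean) * (x - mean) end) / n
        {mean, :math.sqrt(var)}
      end)
      |> Enum.unzip()

    %__MODULE__{mean: means, std: stds}
  end

  # Apply the stored statistics to any data set, e.g. a test set.
  # Zero-deviation columns are only centered (assumption).
  def transform(%__MODULE__{mean: means, std: stds}, rows) do
    Enum.map(rows, fn row ->
      Enum.zip_with([row, means, stds], fn [x, mean, std] ->
        if std == 0.0, do: x - mean, else: (x - mean) / std
      end)
    end)
  end

  # Convenience: fit on the data, then transform that same data.
  def fit_transform(rows, opts \\ []), do: rows |> fit(opts) |> transform(rows)
end
```

Fitting once on the train set and then calling `transform/2` on the test set reuses the train statistics without recomputing them, which a single stateless `standard_scale` call cannot do.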