Implement StandardScaler; add associated tests #132

Open
wants to merge 11 commits into
base: main

Conversation

cryptodeal
Contributor

Implemented BaseScaler abstract class + StandardScaler, which extends the base class. Also added simple unit tests for StandardScaler.

Fixed a few type errors in shumai/tensor/tensor.ts while working on this.
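For readers unfamiliar with standard scaling, here is a minimal sketch of the idea behind a BaseScaler / StandardScaler pair. It works on plain number[][] data rather than shumai Tensors, and every method name and signature in it (fit, transform, inverseTransform, fitTransform) is an assumption made for illustration, not the API added in this PR. It follows the sklearn convention of using the population standard deviation and treating zero-variance columns as having scale 1.

```typescript
// Hypothetical sketch: names and signatures are illustrative, not shumai's actual API.
abstract class BaseScaler {
  abstract fit(data: number[][]): this;
  abstract transform(data: number[][]): number[][];
  abstract inverseTransform(data: number[][]): number[][];

  // Convenience: learn the statistics and apply them in one call.
  fitTransform(data: number[][]): number[][] {
    return this.fit(data).transform(data);
  }
}

class StandardScaler extends BaseScaler {
  mean: number[] = [];
  scale: number[] = [];

  fit(data: number[][]): this {
    if (data.length === 0) throw new Error("fit requires at least one row");
    const nRows = data.length;
    const nCols = data[0].length;

    // Per-column mean: accumulate the sum first, then divide once.
    const sums: number[] = new Array<number>(nCols).fill(0);
    for (const row of data) {
      for (let j = 0; j < nCols; j++) sums[j] += row[j];
    }
    this.mean = sums.map((s) => s / nRows);

    // Per-column population variance (divide by n, not n - 1).
    const sqDev: number[] = new Array<number>(nCols).fill(0);
    for (const row of data) {
      for (let j = 0; j < nCols; j++) sqDev[j] += (row[j] - this.mean[j]) ** 2;
    }

    // Zero-variance columns get scale 1 so transform never divides by zero.
    this.scale = sqDev.map((s) => (s === 0 ? 1 : Math.sqrt(s / nRows)));
    return this;
  }

  transform(data: number[][]): number[][] {
    return data.map((row) => row.map((x, j) => (x - this.mean[j]) / this.scale[j]));
  }

  inverseTransform(data: number[][]): number[][] {
    return data.map((row) => row.map((z, j) => z * this.scale[j] + this.mean[j]));
  }
}
```

After fitTransform, each non-constant column has mean 0 and unit variance, and inverseTransform recovers the original values up to floating-point rounding.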

@facebook-github-bot added the CLA Signed label Dec 5, 2022
@cryptodeal
Contributor Author

Also, I noticed that Bun's GC is eager enough that, when running bun wiptest several times in a row, the memory test would occasionally fail: Bun had already garbage-collected some tensors before the call to dispose, so the resulting memory usage was lower than at the start of the test.

Accordingly, I've updated the test to check that the memory is less than or equal to memory usage at the start of the test.

Contributor

@asilvas asilvas left a comment


LGTM, just a couple of suggestions.

shumai/util/preprocessing.ts (outdated, resolved)
shumai/util/preprocessing.ts (outdated, resolved)
@cryptodeal
Contributor Author

cryptodeal commented Dec 5, 2022

The biggest issue is the odd CI failures: test runs fail due to a mismatched dtype, with logged warnings about Float16Array usage from this test (even though the tests don't explicitly use Float16Array). I'm going to mark this as a draft pending further investigation. (I'm unable to replicate that specific error locally, but I'm finding further bugs where a test passes when run as Float32Array and fails when run as Float64Array, so this definitely needs to be debugged before it's ready to merge.)

Tweaked implementation to debug a few errors I was catching locally.

@cryptodeal cryptodeal marked this pull request as draft December 5, 2022 02:48
@cryptodeal
Contributor Author

Going to implement remaining class methods so that it's at feature parity with the sklearn.preprocessing.StandardScaler implementation, then will flag as ready for review.

@cryptodeal cryptodeal marked this pull request as ready for review December 10, 2022 06:00
@cryptodeal
Contributor Author

cryptodeal commented Dec 10, 2022

At this point, I think there's something I'm missing in the implementation that causes failures on Linux runs, where transformed.dtype doesn't match the dtype of the original Tensor.

Feedback on the implementation is welcome; I largely ported this Golang sklearn port directly, as I'm more familiar with Go than Python.

@cryptodeal
Contributor Author

Currently, it's failing on GPU when comparing dtype of scaled against dtype of the pre-transformed inputs. The actual values are scaled as expected, but dtype doesn't match.
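To illustrate the property the failing check is after (output dtype equal to input dtype), here is a small sketch using plain JavaScript typed arrays instead of shumai Tensors. The standardize helper is hypothetical, and this is not a diagnosis of the failure above; it only shows one general way to keep the element type stable, by allocating the result with the same constructor as the input so that number arithmetic in between cannot change it.

```typescript
// Hypothetical helper for illustration only; it does not use shumai's API.
type FloatArray = Float32Array | Float64Array;

function standardize<T extends FloatArray>(input: T, mean: number, scale: number): T {
  const src: FloatArray = input;

  // Reuse the input's constructor so the output has the same element type.
  const Ctor = input.constructor as new (length: number) => T;
  const out: FloatArray = new Ctor(src.length);

  // Arithmetic happens on JS numbers (64-bit floats); each store into `out`
  // converts the value back to the array's own element type.
  for (let i = 0; i < src.length; i++) {
    out[i] = (src[i] - mean) / scale;
  }
  return out as T;
}
```

A test along the lines described above could then compare the constructor (or, for a Tensor, the dtype) of the result against that of the input in addition to checking the scaled values.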

4 participants