Appends prediction columns to transform schema #60

Merged: 2 commits, Dec 17, 2024

Marcus-Rosti
Contributor

Pipelines fail when later stages depend on the output of transformSchema, because IsolationForest does not append its prediction columns to the schema.

jverbus self-assigned this Dec 17, 2024
jverbus self-requested a review December 17, 2024 07:41
Contributor

jverbus commented Dec 17, 2024

Thank you for making this suggestion! I already had these checks in the Model class transformSchema method, but you're right that the Estimator should have these too. By adding the new columns at the Estimator stage, downstream pipeline components can anticipate these columns right from the start. This matches the Spark ML design philosophy, where transformSchema at the Estimator level should describe the schema that the resulting Model will produce once fitted.
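
The schema propagation described above can be sketched with a toy model. This is not the library's code or Spark's API: schemas are plain lists of `(name, type)` pairs standing in for Spark's `StructType`, the classes `ToyIsolationForest` and `RequiresColumn` are hypothetical, and the column names `outlierScore` and `predictedLabel` with type `double` are assumptions made for illustration. The sketch only shows why a pipeline that threads the schema through every stage before fitting needs the Estimator to declare its output columns.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (column name, type name); a simplified stand-in for Spark's StructType
Schema = List[Tuple[str, str]]


def append_column(schema: Schema, name: str, dtype: str) -> Schema:
    """Return a new schema with (name, dtype) appended; reject duplicates."""
    if any(col == name for col, _ in schema):
        raise ValueError(f"Column {name} already exists")
    return schema + [(name, dtype)]


@dataclass
class ToyIsolationForest:
    """Hypothetical stand-in for the Estimator, not the library's class."""
    score_col: str = "outlierScore"          # assumed column name
    prediction_col: str = "predictedLabel"   # assumed column name
    append_outputs: bool = True              # False mimics the reported problem

    def transform_schema(self, schema: Schema) -> Schema:
        if not self.append_outputs:
            # Output columns are not declared, so later stages cannot see them.
            return schema
        out = append_column(schema, self.score_col, "double")
        return append_column(out, self.prediction_col, "double")


@dataclass
class RequiresColumn:
    """Hypothetical downstream stage that needs an input column to exist."""
    input_col: str

    def transform_schema(self, schema: Schema) -> Schema:
        if all(col != self.input_col for col, _ in schema):
            raise ValueError(f"Missing column {self.input_col}")
        return schema


def validate_pipeline(stages, schema: Schema) -> Schema:
    """Thread the schema through every stage before any fitting happens."""
    for stage in stages:
        schema = stage.transform_schema(schema)
    return schema
```

With `append_outputs=True`, `validate_pipeline([ToyIsolationForest(), RequiresColumn("outlierScore")], [("features", "vector")])` succeeds. With `append_outputs=False`, the same call raises `ValueError` at the downstream stage, which is the failure mode this pull request addresses.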

jverbus merged commit bc58518 into linkedin:master Dec 17, 2024
13 checks passed