Appends prediction columns to transform schema #60

Marcus-Rosti · 2024-11-27T22:30:39Z

Pipelines fail when a downstream stage depends on the schema returned by transformSchema, because IsolationForest does not append its prediction columns to that schema.

jverbus · 2024-12-17T07:52:10Z

Thank you for making this suggestion! I already had these checks in the Model class transformSchema method, but you're right that the Estimator should have these too. By adding the new columns at the Estimator stage, downstream pipeline components can anticipate these columns right from the start. This matches the Spark ML design philosophy, where transformSchema at the Estimator level should describe the schema that the resulting Model will produce once fitted.
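As a rough illustration of the idea discussed here, the sketch below shows how a transformSchema-style helper might append output columns to an input schema. It is not the code from this PR: the object name `TransformSchemaSketch`, the helper `appendOutputColumns`, and the column names `outlierScore` and `predictedLabel` are assumptions made for the example, and it skips any validation of the features column. It assumes `spark-sql` is on the classpath.

```scala
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Hypothetical sketch, not the library's implementation.
object TransformSchemaSketch {
  // Assumed output column names; a real estimator would read them from its params.
  val scoreCol = "outlierScore"
  val predictionCol = "predictedLabel"

  // Returns the input schema with the two output columns appended,
  // failing fast if either column name is already taken.
  def appendOutputColumns(schema: StructType): StructType = {
    Seq(scoreCol, predictionCol).foreach { name =>
      require(
        !schema.fieldNames.contains(name),
        s"Output column $name already exists."
      )
    }
    schema
      .add(StructField(scoreCol, DoubleType, nullable = false))
      .add(StructField(predictionCol, DoubleType, nullable = false))
  }
}
```

If both the Estimator and the fitted Model delegate to one helper like this, a Pipeline can validate later stages against the appended columns before anything is fitted.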

Marcus-Rosti added 2 commits November 27, 2024 14:29

Appends prediction columns to transform schema

f61607b

fixes the comment

4d49dca

jverbus self-assigned this Dec 17, 2024

jverbus self-requested a review December 17, 2024 07:41

jverbus merged commit bc58518 into linkedin:master Dec 17, 2024
13 checks passed

jverbus mentioned this pull request Dec 17, 2024

Made minor edits to match the proposed Estimator transformSchema method to the existing Model transformSchema method. #61

Merged
