Provide pipeline abstraction (Model, ModelBuilder, Transformation, …) encapsulating a sequence of transformations and a final (optional) predictive model.
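As a rough illustration of what such an abstraction looks like, here is a minimal Python sketch with hypothetical names (not the actual backend classes): a sequence of transformations, optionally ending in a predictive model.

```python
# Minimal conceptual sketch of the pipeline abstraction (hypothetical names,
# not the actual h2o classes): a sequence of transformations, optionally
# followed by a final predictive model.
class Transformer:
    def fit(self, frame):
        """Learn any state from the training frame; return self."""
        return self

    def transform(self, frame):
        """Apply the learned transformation to a frame."""
        raise NotImplementedError


class Pipeline:
    def __init__(self, transformers, estimator=None):
        self.transformers = transformers
        self.estimator = estimator            # the final model is optional

    def fit(self, frame):
        for t in self.transformers:
            frame = t.fit(frame).transform(frame)
        if self.estimator is not None:
            self.estimator.fit(frame)         # estimator only ever sees post-transformation data
        return self

    def predict(self, frame):
        for t in self.transformers:
            frame = t.transform(frame)
        return self.estimator.predict(frame) if self.estimator else frame
```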
**Requirements**
Training of the pipeline should support cross-validation without data leakage.
The trained pipeline model must be able to score/predict, including from the clients.
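To make the no-leakage requirement concrete, here is a minimal sketch built on the hypothetical `Pipeline` above (the `folds` and `metric` arguments are placeholders, not the actual h2o cross-validation machinery): transformers are refitted on each fold's training rows only and merely applied to the held-out rows.

```python
# Sketch of leakage-free cross-validation over a pipeline (hypothetical helpers):
# transformers are re-fitted on each fold's training rows only, so no statistic
# learned from held-out rows can leak into the transformation.
def cross_validate(make_pipeline, folds, metric):
    scores = []
    for train_frame, valid_frame in folds:
        pipeline = make_pipeline()            # fresh, unfitted pipeline for every fold
        pipeline.fit(train_frame)             # transformers and estimator see training rows only
        scores.append(metric(valid_frame, pipeline.predict(valid_frame)))
    return sum(scores) / len(scores)
```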
**Notes**
MOJO not yet supported.
The current work focuses on the backend; client support exists only to manipulate (and in particular predict with) pipeline models that have been built by AutoML, for example.
There are many ways to extend the pipeline logic and make it more customizable:
* in AutoML, the preprocessing param can be used to select various predefined transformers that will make up the training pipeline.
* for single models and grids, the pipeline client can later be extended to allow users to define their own pipeline, for example using a syntax similar to sklearn's Pipeline (see the sketch after this list).
* for even more ad-hoc customization, such as the scenario you're suggesting, we could allow code ingestion (probably Jython scripts); it should not be too difficult to implement a JythonDataTransformer.
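For instance, a user-defined pipeline in the Python client might one day look like the purely illustrative sketch below; H2OPipeline and its train/predict signature are hypothetical, and only the two estimator classes are existing h2o classes.

```python
# Purely illustrative: a possible future client syntax for user-defined pipelines,
# similar to sklearn's Pipeline. H2OPipeline and its train()/predict() signature are
# hypothetical; only the two estimator classes below exist in the h2o Python client.
from h2o.estimators import H2OGradientBoostingEstimator, H2OTargetEncoderEstimator

pipeline = H2OPipeline(steps=[                              # hypothetical class
    ("target_encoding", H2OTargetEncoderEstimator()),       # transformation step
    ("gbm", H2OGradientBoostingEstimator(ntrees=100)),      # final estimator
])
pipeline.train(y="response", training_frame=train)          # train/test are placeholder H2OFrames
predictions = pipeline.predict(test)
```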
Finally, let's keep in mind that MOJO support is not available yet either, although it is likely to be much easier to add with this Pipeline mechanism than with the legacy Target Encoding support embedded in Model/ModelBuilder: every transformation now applies clearly sequentially and the estimator model (e.g. GLM) contains only post-transformation information, whereas with the legacy TE integration the model contained a mix of pre-encoding and post-encoding state, which, combined with the other categorical encodings, made the MOJO extremely difficult to implement.
* core pipeline API
* remove unnecessary explicit casting
* remove some grid integration logic (moving to a dedicated PR)
* fix ref comparison in PipelineHelperTest
* remove Pipeline from sklearn estimators support
* fix dynamic test for py pipeline algo
* fix R CRAN check
* revert changes on Model.scoreMetrics, but extracted suspicious code
* added example of multiplier transformer in PipelineTest to show how column transformations can be easily implemented and applied declaratively (see the sketch after this commit list)
* Apply suggestions from @TomF's code review
Co-authored-by: Tomáš Frýda <[email protected]>
* addressed tomf suggestions
---------
Co-authored-by: Tomáš Frýda <[email protected]>
* Revert "GH-15857: cleanup legacy TE integration in ModelBuilder and AutoML (#16061)"
This reverts commit a8f309b.
* Revert "GH-15857: AutoML pipeline support (#16041)"
This reverts commit 17fa9ee.
* Revert "GH-15856: Grid pipeline support (#16040)"
This reverts commit b7ac670.
* Revert "GH-15855: core pipeline API (#16039)"
This reverts commit c15ea1e.
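One of the commits above adds a multiplier transformer example to PipelineTest (in Java) to illustrate declarative column transformations. As a rough flavor of the same idea, here is a hypothetical Python sketch, where a frame is represented as a plain dict of column name to list of values purely for illustration.

```python
# Conceptual, Python-flavored counterpart of the multiplier transformer added to
# PipelineTest (the real one is Java). Frames are represented here as plain
# dicts of column name -> list of values, purely for illustration.
class MultiplierTransformer:
    def __init__(self, column, factor):
        self.column = column
        self.factor = factor

    def fit(self, frame):
        return self                                   # stateless: nothing to learn

    def transform(self, frame):
        out = dict(frame)                             # shallow copy, keep the input untouched
        out[self.column] = [v * self.factor for v in out[self.column]]
        return out

doubled = MultiplierTransformer("x", 2).transform({"x": [1, 2, 3], "y": [0, 0, 0]})
# doubled == {"x": [2, 4, 6], "y": [0, 0, 0]}
```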
Sub-issue of #15854