AutoML Pipeline – Java API #15855

Closed
Tracked by #15854
sebhrusen opened this issue Oct 23, 2023 · 0 comments · Fixed by #16039

@sebhrusen
Contributor

sebhrusen commented Oct 23, 2023

Sub-issue of #15854

Provide a pipeline abstraction (Model, ModelBuilder, Transformation, …) that encapsulates a sequence of transformations and an optional final predictive model.
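The following is a minimal sketch of such an abstraction. It uses hypothetical `DataTransformer`, `Estimator` and `Pipeline` types, which are not the H2O classes, and represents rows as plain `double[]` arrays instead of H2O Frames (Java 9+ for `List.of`). Each transformer is fitted on the output of the previous one, and the optional estimator only ever sees the fully transformed data:

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {

    /** Hypothetical transformer contract: learn state in fit(), apply it in transform(). */
    interface DataTransformer {
        void fit(List<double[]> rows);
        double[] transform(double[] row);
    }

    /** Hypothetical contract for the final (optional) predictive model. */
    interface Estimator {
        void fit(List<double[]> rows, double[] target);
        double predict(double[] row);
    }

    /** Subtracts the per-column means learned during fit(). */
    static final class MeanCenterer implements DataTransformer {
        private double[] means;

        public void fit(List<double[]> rows) {
            means = new double[rows.get(0).length];
            for (double[] r : rows)
                for (int j = 0; j < r.length; j++) means[j] += r[j] / rows.size();
        }

        public double[] transform(double[] row) {
            double[] out = new double[row.length];
            for (int j = 0; j < row.length; j++) out[j] = row[j] - means[j];
            return out;
        }
    }

    /** Toy estimator: training target mean plus the first transformed feature. */
    static final class OffsetModel implements Estimator {
        private double intercept;

        public void fit(List<double[]> rows, double[] target) {
            double sum = 0;
            for (double y : target) sum += y;
            intercept = sum / target.length;
        }

        public double predict(double[] row) { return intercept + row[0]; }
    }

    static final class Pipeline {
        private final List<DataTransformer> steps;
        private final Estimator estimator; // null means "transformations only"

        Pipeline(List<DataTransformer> steps, Estimator estimator) {
            this.steps = steps;
            this.estimator = estimator;
        }

        /** Fits each step on the previous step's output, then the estimator on the final output. */
        void fit(List<double[]> rows, double[] target) {
            List<double[]> current = rows;
            for (DataTransformer step : steps) {
                step.fit(current);
                List<double[]> next = new ArrayList<>();
                for (double[] r : current) next.add(step.transform(r));
                current = next;
            }
            if (estimator != null) estimator.fit(current, target);
        }

        /** Applies the fitted steps strictly in declaration order. */
        double[] transform(double[] row) {
            double[] current = row;
            for (DataTransformer step : steps) current = step.transform(current);
            return current;
        }

        double predict(double[] row) {
            if (estimator == null) throw new IllegalStateException("pipeline has no estimator");
            return estimator.predict(transform(row));
        }
    }

    public static void main(String[] args) {
        Pipeline p = new Pipeline(List.of(new MeanCenterer()), new OffsetModel());
        p.fit(List.of(new double[]{1.0}, new double[]{3.0}), new double[]{10.0, 20.0});
        System.out.println(p.transform(new double[]{5.0})[0]); // 3.0
        System.out.println(p.predict(new double[]{5.0}));      // 18.0
    }
}
```

Passing `null` as the estimator turns the same object into a pure transformation chain, which is one way to model the optional final model.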

**Requirements**
Training of the pipeline should support cross-validation without data leakage.
The trained pipeline model must be able to score/predict, including from the clients.
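One way to read the no-leakage requirement is that, during cross-validation, every stateful transformation is refitted per fold on the training folds only and then applied to the held-out fold. The sketch below illustrates this with a simplified out-of-fold target encoding (no blending or noise). `OutOfFoldEncoding.encode` is a hypothetical helper, not H2O's TargetEncoder:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class OutOfFoldEncoding {

    /**
     * Encodes each row's category using target statistics from the OTHER folds only,
     * so a row's own target never leaks into its feature. Categories unseen in the
     * training folds fall back to the training-fold global mean.
     */
    static double[] encode(String[] category, double[] target, int[] fold, int nFolds) {
        double[] encoded = new double[category.length];
        for (int k = 0; k < nFolds; k++) {
            // "fit" step: statistics from the training folds (fold != k) only
            Map<String, double[]> stats = new HashMap<>(); // category -> {sum, count}
            double globalSum = 0;
            int globalCount = 0;
            for (int i = 0; i < category.length; i++) {
                if (fold[i] == k) continue;
                double[] s = stats.computeIfAbsent(category[i], c -> new double[2]);
                s[0] += target[i];
                s[1] += 1;
                globalSum += target[i];
                globalCount++;
            }
            double globalMean = globalCount == 0 ? 0.0 : globalSum / globalCount;
            // "transform" step: apply the fold-k statistics to the held-out fold k
            for (int i = 0; i < category.length; i++) {
                if (fold[i] != k) continue;
                double[] s = stats.get(category[i]);
                encoded[i] = (s == null) ? globalMean : s[0] / s[1];
            }
        }
        return encoded;
    }

    public static void main(String[] args) {
        String[] cat = {"a", "a", "b", "b"};
        double[] y = {1.0, 0.0, 1.0, 1.0};
        int[] fold = {0, 1, 0, 1};
        System.out.println(Arrays.toString(encode(cat, y, fold, 2))); // [0.0, 1.0, 1.0, 1.0]
    }
}
```

Row 0 is encoded as 0.0 because its own target (1.0) is excluded. Fitting the encoding on the full data instead would give every `a` row the value 0.5, leaking each row's target into its own feature.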

**Notes**
MOJO not yet supported.

The current implementation focuses on the backend: client support exists only to manipulate pipeline models (in particular, to predict with them) that have been built by AutoML, for example.
There are many ways to extend the pipeline logic and make it more customizable:

  1. in AutoML, the preprocessing param can be used to select various predefined transformers that will make up the training pipeline.
  2. for single models and grids, the pipeline client can later be extended to allow users to define their own pipeline (for example, using a syntax similar to the sklearn Pipeline).
  3. for even more ad-hoc customization, like the scenario you're suggesting, we could allow code ingestion, probably as Jython scripts; it should not be too difficult to implement a JythonDataTransformer.
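For item 2, a user-defined pipeline could be declared as an ordered list of uniquely named steps, in the spirit of sklearn's `Pipeline(steps=[(name, transform), ...])`. The sketch below is hypothetical (`DeclarativePipeline` and `Step` are not H2O classes; Java 16+ for the record) and is restricted to stateless per-column functions to keep it short:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

public class DeclarativePipeline {

    /** A named, declarative column transformation (hypothetical). */
    record Step(String name, String column, DoubleUnaryOperator fn) {}

    private final List<Step> steps = new ArrayList<>();

    /** Appends a step; names must be unique so that steps can be referenced later. */
    DeclarativePipeline add(String name, String column, DoubleUnaryOperator fn) {
        for (Step s : steps)
            if (s.name().equals(name)) throw new IllegalArgumentException("duplicate step: " + name);
        steps.add(new Step(name, column, fn));
        return this;
    }

    /** Applies the steps in declaration order to a copy of the column-oriented frame. */
    Map<String, double[]> apply(Map<String, double[]> frame) {
        Map<String, double[]> out = new LinkedHashMap<>();
        frame.forEach((k, v) -> out.put(k, v.clone())); // never mutate the caller's data
        for (Step s : steps) {
            double[] col = out.get(s.column());
            if (col == null) throw new IllegalArgumentException("unknown column: " + s.column());
            for (int i = 0; i < col.length; i++) col[i] = s.fn().applyAsDouble(col[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, double[]> frame = new LinkedHashMap<>();
        frame.put("x", new double[]{1.0, 2.0});
        frame.put("y", new double[]{10.0, 20.0});

        DeclarativePipeline p = new DeclarativePipeline()
                .add("double_x", "x", v -> v * 2)
                .add("shift_x", "x", v -> v + 1) // sees the output of double_x
                .add("scale_y", "y", v -> v / 10);

        Map<String, double[]> result = p.apply(frame);
        System.out.println(Arrays.toString(result.get("x"))); // [3.0, 5.0]
        System.out.println(Arrays.toString(result.get("y"))); // [1.0, 2.0]
    }
}
```

Stateful transformers (e.g. encoders learned from training data) would additionally need a fit phase before `apply`.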

Finally, let's keep in mind that MOJO export is not supported yet either. It will likely be much easier to support with this Pipeline mechanism than with, for example, the legacy target encoding support embedded in Model/ModelBuilder: every transformation is now applied clearly and sequentially, and the estimator model (e.g. GLM) contains only post-transformation information. With the legacy TE integration, the model contained a mix of pre-encoding and post-encoding information, and because other categorical encodings were mixed into it as well, the MOJO was extremely difficult to implement.

@sebhrusen sebhrusen self-assigned this Oct 23, 2023
@sebhrusen sebhrusen added this to the 3.46.0.1 milestone Oct 23, 2023
@sebhrusen sebhrusen linked a pull request Jan 29, 2024 that will close this issue
sebhrusen added a commit that referenced this issue Feb 12, 2024
* core pipeline API

* remove unnecessary explicit casting

* remove grid some integration logic (moving to dedicated PR)

* fix ref comparison in PipelineHelperTest

* remove Pipeline from sklearn estimators support

* fix dynamic test for py pipeline algo

* fix R CRAN check

* revert changes on Model.scoreMetrics, but extracted suspicious code

* added example of multiplier transformer in PipelineTest to show how columns transformations can be easily implemented and applied declaratively

* Apply suggestions from @TomF 's code review

Co-authored-by: Tomáš Frýda <[email protected]>

* addressed tomf suggestions

---------

Co-authored-by: Tomáš Frýda <[email protected]>
mn-mikke added a commit that referenced this issue Feb 27, 2024
valenad1 added a commit that referenced this issue Mar 8, 2024
valenad1 added a commit that referenced this issue Mar 11, 2024
* Revert "GH-15857: cleanup legacy TE integration in ModelBuilder and AutoML (#16061)"

This reverts commit a8f309b.

* Revert "GH-15857: AutoML pipeline support (#16041)"

This reverts commit 17fa9ee.

* Revert "GH-15856: Grid pipeline support (#16040)"

This reverts commit b7ac670.

* Revert "GH-15855: core pipeline API (#16039)"

This reverts commit c15ea1e.