Showing 2 changed files with 141 additions and 0 deletions.
######################################################
Migration Guide: How to migrate to XGBoost Spark 3.x
######################################################

XGBoost Spark underwent significant modifications beginning with version 3.0,
which may cause compatibility issues with existing user code.

This guide walks you through updating your code so that it is compatible
with XGBoost Spark 3.0 and later versions.

**********************
XGBoost Spark Packages
**********************

XGBoost Spark 3.0 introduced a single uber package named ``xgboost-spark_2.12-3.0.0.jar``, which bundles
both ``xgboost4j`` and ``xgboost4j-spark``. This means your application only needs to depend on ``xgboost-spark``.

.. code-block:: xml

  <dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost-spark_${scala.binary.version}</artifactId>
    <version>3.0.0</version>
  </dependency>

When submitting the XGBoost application to the Spark cluster, you only need to specify the single ``xgboost-spark`` package.

.. code-block:: bash

  spark-submit \
    --jars xgboost-spark_2.12-3.0.0.jar \
    ... \

***************
XGBoost Ranking
***************

Support for ranking problems through ``XGBoostRegressor`` has been removed.
As an alternative, we have introduced ``XGBoostRanker``, which is specifically designed
to support ranking algorithms.

.. code-block:: scala

  // before 3.0
  val regressor = new XGBoostRegressor().setObjective("rank:ndcg")

  // after 3.0
  val ranker = new XGBoostRanker()

******************************
XGBoost Constructor Parameters
******************************

XGBoost Spark now categorizes parameters into two groups: XGBoost-Spark parameters and XGBoost parameters.
When constructing an XGBoost estimator, only XGBoost parameters are permitted. XGBoost-Spark
parameters must be configured using the estimator's setter methods.
`XGBoost Parameters <https://xgboost.readthedocs.io/en/stable/parameter.html>`_
can be set either during construction or through the estimator's setter methods.

.. code-block:: scala

  // before 3.0: XGBoost and XGBoost-Spark parameters shared one map
  val xgboost_paras = Map(
    "eta" -> "1",
    "max_depth" -> "6",
    "objective" -> "binary:logistic",
    "num_round" -> 5,
    "num_workers" -> 1,
    "features" -> "feature_column",
    "label" -> "label_column",
  )
  val classifier = new XGBoostClassifier(xgboost_paras)

  // after 3.0: the map carries XGBoost parameters only
  val xgboost_paras = Map(
    "eta" -> "1",
    "max_depth" -> "6",
    "objective" -> "binary:logistic",
  )
  val classifier = new XGBoostClassifier(xgboost_paras)
    .setNumRound(5)
    .setNumWorkers(1)
    .setFeaturesCol("feature_column")
    .setLabelCol("label_column")

  // Or use setters for all parameters
  val classifier = new XGBoostClassifier()
    .setNumRound(5)
    .setNumWorkers(1)
    .setFeaturesCol("feature_column")
    .setLabelCol("label_column")
    .setEta(1)
    .setMaxDepth(6)
    .setObjective("binary:logistic")

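When porting a legacy parameter map, it can help to separate the keys mechanically before rewriting the constructor call. The following is a minimal sketch in plain Scala and is not part of the XGBoost API: ``ParamSplit`` and ``sparkSideKeys`` are hypothetical names, and the key set contains only the four keys that the example above moves to setters, so it would need extending for real code.

```scala
object ParamSplit {
  // Assumption: only the keys that the "before 3.0" example passes in the
  // map and the "after 3.0" example moves to setters. Not an exhaustive list.
  val sparkSideKeys: Set[String] =
    Set("num_round", "num_workers", "features", "label")

  // Returns (parameters that may stay in the constructor map,
  //          parameters that must be applied through setters).
  def split(legacy: Map[String, Any]): (Map[String, Any], Map[String, Any]) = {
    val (sparkSide, xgboostSide) =
      legacy.partition { case (key, _) => sparkSideKeys.contains(key) }
    (xgboostSide, sparkSide)
  }
}

val legacy = Map[String, Any](
  "eta" -> "1",
  "max_depth" -> "6",
  "objective" -> "binary:logistic",
  "num_round" -> 5,
  "num_workers" -> 1,
  "features" -> "feature_column",
  "label" -> "label_column"
)

val (constructorParams, setterParams) = ParamSplit.split(legacy)
```

The first map can then be passed to the estimator's constructor, while each entry of the second map needs a corresponding setter call written by hand.
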
*****************
Unused Parameters
*****************

Starting from 3.0, the following parameters are no longer used.

- cacheTrainingSet

  If you wish to cache the training dataset, cache it in your own code
  before fitting the estimator.

  .. code-block:: scala

    val df = input.cache()
    val model = new XGBoostClassifier().fit(df)

- trainTestRatio

  To evaluate on held-out data, split the dataset yourself and pass the
  evaluation split to the estimator.

  .. code-block:: scala

    val Array(train, eval) = trainDf.randomSplit(Array(0.7, 0.3))
    val classifier = new XGBoostClassifier().setEvalDataset(eval)
    val model = classifier.fit(train)

- tracker_conf

  Configure RabitTracker through the estimator's setter methods instead.

  .. code-block:: scala

    val classifier = new XGBoostClassifier()
      .setRabitTrackerTimeout(100)
      .setRabitTrackerHostIp("192.168.0.2")
      .setRabitTrackerPort(19203)

- rabitRingReduceThreshold
- rabitTimeout
- rabitConnectRetry
- singlePrecisionHistogram
- lambdaBias
- interactionConstraints
- objectiveType

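Before upgrading, it may be useful to scan existing parameter maps for names from the list above. The following plain-Scala sketch is not part of the XGBoost API: ``RemovedParams`` and ``findRemoved`` are hypothetical names, and the check matches only the exact spellings given in this guide, so differently spelled variants of the same parameters would not be detected.

```scala
object RemovedParams {
  // Names copied verbatim from the "Unused Parameters" list in this guide.
  val removed: Set[String] = Set(
    "cacheTrainingSet",
    "trainTestRatio",
    "tracker_conf",
    "rabitRingReduceThreshold",
    "rabitTimeout",
    "rabitConnectRetry",
    "singlePrecisionHistogram",
    "lambdaBias",
    "interactionConstraints",
    "objectiveType"
  )

  // Returns the removed parameter names present in `params`, sorted
  // alphabetically so the result is stable across runs.
  def findRemoved(params: Map[String, Any]): Seq[String] =
    params.keys.filter(removed.contains).toSeq.sorted
}

val legacyParams = Map[String, Any](
  "eta" -> "1",
  "trainTestRatio" -> 0.8,
  "cacheTrainingSet" -> true
)

val offending = RemovedParams.findRemoved(legacyParams)
```

An empty result means none of the listed names appear in the map; any returned name should be dropped or replaced as described in the entries above.
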