page_type | languages | products | description | experimental | |||
---|---|---|---|---|---|---|---|
sample |
|
|
Learn how to use [XGBoost](https://github.com/dmlc/xgboost) with Azure ML. |
issues with multinode xgboost |
This tutorial demonstrates how to run XGBoost on Azure through a series of Python notebooks to demonstrate how a project might develop. This tutorial leverages the Microsoft Kaggle Malware, repartitioned and hosted in Azure Blob.
This tutorial consists of two notebooks:
The 1.local-eda.ipynb
notebook uses the notebook's local compute to find, read, explore, process, and train on the data. This notebook will fail as-is if your machine is not powerful enough - you can try working on a sample of the data (i.e. a single partition).
The code from this notebook is modified into src/run.py and the required packages in environment.yml for operationalization.
The 2.distributed-cpu.ipynb
notebook uses an Azure ML CPU cluster to distributed the data processing and XGBoost training steps remotely, resulting in significant speedup over standard local machines.