This repository has been archived by the owner on Oct 16, 2023. It is now read-only.

Run GPT With Colossal-AI

How to Prepare the Webtext Dataset

You can download the preprocessed sample dataset for this demo via our Google Drive sharing link.
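After downloading, a quick check that the file is in place can catch a wrong path before a multi-GPU launch fails. The sketch below uses a hypothetical helper, `check_dataset` (not part of this repository). It only confirms that the file exists and is non-empty, and assumes nothing about the JSON layout train_gpt.py expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): sanity-check the
# downloaded dataset file before exporting DATA for train_gpt.py.
check_dataset() {
  local path="${1:-}"
  if [ -z "$path" ]; then
    echo "usage: check_dataset /path/to/small-gpt-dataset.json" >&2
    return 2
  fi
  if [ ! -f "$path" ]; then
    echo "dataset not found: $path" >&2
    return 1
  fi
  # -s is true only for files larger than zero bytes.
  if [ ! -s "$path" ]; then
    echo "dataset is empty: $path" >&2
    return 1
  fi
  echo "ok: $path"
}
```

For example, `check_dataset /path/to/small-gpt-dataset.json && export DATA=/path/to/small-gpt-dataset.json` only sets `DATA` if the file passes the check.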

Run this Demo

Use the following commands to install the prerequisites.

```shell
# assuming CUDA 11.3
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
pip install colossalai==0.1.9+torch1.11cu11.3 -f https://release.colossalai.org
```
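The install commands pin PyTorch 1.11 built against CUDA 11.3, and the Colossal-AI wheel's version tag (`+torch1.11cu11.3`) indicates the same combination. A minimal sketch for confirming the environment matches, assuming `python` resolves to the interpreter of the environment you installed into; `check_torch` is a hypothetical helper that reads `torch.__version__` and `torch.version.cuda`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): check that the active
# Python environment has PyTorch 1.11.x built for CUDA 11.3, as pinned above.
check_torch() {
  local versions torch_version cuda_version
  # torch.version.cuda is None on CPU-only builds, which prints as "None".
  if ! versions=$(python -c 'import torch; print(torch.__version__, torch.version.cuda)'); then
    echo "could not import torch with $(command -v python)" >&2
    return 1
  fi
  read -r torch_version cuda_version <<< "$versions"
  # conda builds report e.g. "1.11.0"; pip wheels may add a suffix like "+cu113".
  case "$torch_version" in
    1.11.*) ;;
    *) echo "expected torch 1.11.x, found $torch_version" >&2; return 1 ;;
  esac
  if [ "$cuda_version" != "11.3" ]; then
    echo "expected a CUDA 11.3 build, found $cuda_version" >&2
    return 1
  fi
  echo "ok: torch $torch_version (CUDA $cuda_version)"
}
```

Running `python -c 'import colossalai'` afterwards is a quick way to confirm that the Colossal-AI wheel itself imports.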

Use the following commands to launch training.

```shell
#!/usr/bin/env sh
export DATA=/path/to/small-gpt-dataset.json

# run on a single node
colossalai run --nproc_per_node=<num_gpus> train_gpt.py --config configs/<config_file> --from_torch

# run on multiple nodes without slurm
colossalai run --nproc_per_node=<num_gpus> \
   --master_addr <hostname> \
   --master_port <port-number> \
   --hosts <list-of-hostname-separated-by-comma> \
   train_gpt.py \
   --config configs/<config_file> \
   --from_torch

# run on multiple nodes with slurm
srun python \
   train_gpt.py \
   --config configs/<config_file> \
   --host <master_node>
```
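The multi-node commands leave the host list and the master node as placeholders. The helpers below are a hypothetical sketch for filling them in: `hosts_from_file` assumes a plain text file with one hostname per line, and `slurm_master_node` assumes it runs inside a Slurm allocation, where `scontrol show hostnames` expands `SLURM_JOB_NODELIST` into one hostname per line. Choosing the first node as master is a convention assumed here, not something this README specifies.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository) for the multi-node
# placeholders in the training commands.

# <list-of-hostname-separated-by-comma>: read a hostfile with one hostname per
# line, drop '#' comments, whitespace and blank lines, and join with commas.
hosts_from_file() {
  sed -e 's/#.*//' -e 's/[[:space:]]//g' "$1" | grep -v '^$' | paste -s -d, -
}

# <master_node> under Slurm: expand the job's compressed node list
# (e.g. "node[01-02]") and take the first hostname.
slurm_master_node() {
  if [ -z "${SLURM_JOB_NODELIST:-}" ]; then
    echo "SLURM_JOB_NODELIST is not set; run this inside a Slurm allocation" >&2
    return 1
  fi
  scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1
}
```

For example, pass `--hosts "$(hosts_from_file hostfile)"` to `colossalai run`, or `--host "$(slurm_master_node)"` to the `srun` command. Since `srun` starts one process per task, the allocation presumably needs one task per GPU (for example `--ntasks-per-node=<num_gpus>`); that is an assumption about how train_gpt.py maps processes to GPUs, not something stated here.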
   

Set <config_file> to any file in the configs folder. For a first run, start with gpt_small_zero3_pp1d.py on a single node. Each config file contains comments explaining how to change the parallel settings.
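To tie the single-node command to the config advice, here is a hypothetical wrapper, `launch_single_node` (not part of this repository). It assumes it is run from the directory containing train_gpt.py and configs/, takes a config file name such as gpt_small_zero3_pp1d.py, counts GPUs with `nvidia-smi -L` when no count is given, refuses to start if `DATA` is unset, and with `DRY_RUN=1` only prints the command it would run.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of this repository) around the single-node
# command. Run it from the directory that contains train_gpt.py and configs/.

# Count visible GPUs: `nvidia-smi -L` prints one "GPU <index>: ..." line each.
count_gpus() {
  nvidia-smi -L 2>/dev/null | grep -c '^GPU '
}

# Usage: launch_single_node <config_file> [num_gpus]
# Set DRY_RUN=1 to print the command instead of running it.
launch_single_node() {
  local config="${1:-}" num_gpus="${2:-}"
  if [ -z "${DATA:-}" ]; then
    echo "DATA is not set; export DATA=/path/to/small-gpt-dataset.json" >&2
    return 1
  fi
  if [ -z "$config" ] || [ ! -f "configs/$config" ]; then
    echo "no such config: configs/$config" >&2
    return 1
  fi
  if [ -z "$num_gpus" ]; then
    # grep -c prints 0 (and exits non-zero) when no GPU lines are found.
    num_gpus=$(count_gpus) || true
  fi
  case "$num_gpus" in
    '' | *[!0-9]* | 0)
      echo "invalid GPU count '$num_gpus'; pass it explicitly" >&2
      return 1 ;;
  esac
  local cmd=(colossalai run --nproc_per_node="$num_gpus" train_gpt.py
             --config "configs/$config" --from_torch)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `DRY_RUN=1 launch_single_node gpt_small_zero3_pp1d.py 4` prints the `colossalai run` command for four GPUs without starting training; dropping `DRY_RUN=1` runs it.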