Thank you for your interest in Grape 🍇.

Please note that since part of Grape 🍇 involves communicating with the NVIDIA GPU kernel module via the `/proc` filesystem (which, to the best of our knowledge, cannot be handled easily with Docker), all of the following steps are performed natively (i.e., outside the Docker environment). We also assume that the OS is either Ubuntu 20.04 or 22.04.
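A quick way to confirm the OS version before starting (using the standard `lsb_release` utility):

```bash
lsb_release -ds   # should report Ubuntu 20.04.x or 22.04.x
```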
- Check out Grape's 🍇 source code:

  ```bash
  git clone https://github.com/UofT-EcoSystem/Grape-MICRO56-Artifact
  ```
- Make sure that common software dependencies are installed properly:

  ```bash
  ./scripts/Installation/0-install_build_essentials.sh
  ```
- Install our customized NVIDIA GPU driver and then reboot the machine:

  ```bash
  ./scripts/Installation/1-install_NVIDIA_GPU_driver.sh
  sudo reboot
  ```

  After the machine reboots, make sure that the message

  ```
  NVRM: loading customized kernel module from Grape
  ```

  appears in the output of `sudo dmesg`. If it does not, reinstall the GPU driver and then reboot again:

  ```bash
  # Note the `--reinstall` option.
  ./scripts/Installation/1-install_NVIDIA_GPU_driver.sh --reinstall
  sudo reboot
  ```
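  The check above can also be scripted; a minimal sketch that greps `sudo dmesg` for the exact message:

  ```bash
  # Verify that Grape's customized kernel module was loaded.
  if sudo dmesg | grep -q "NVRM: loading customized kernel module from Grape"; then
      echo "Customized GPU driver loaded."
  else
      echo "Customized GPU driver NOT detected; reinstall with --reinstall and reboot." >&2
      exit 1
  fi
  ```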
- You will be asked some questions when installing/uninstalling the GPU driver. Most of the questions concern the X server, which, to the best of our knowledge, does not affect the experiment results:

  - Questions when installing:

    - *An alternate method of installing the NVIDIA driver was detected ... Please review the message provided by the maintainer of this alternate installation method and decide how to proceed:*

      Answer: "Continue installation"

    - *You specified the '--no-kernel-modules' command line option ... Please ensure that NVIDIA kernel modules matching this driver version are installed separately.*

      Answer: "Ok"

    - *Install NVIDIA's 32-bit compatibility libraries?*

      Answer: "No"

    - *Unable to determine the path to install the libglvnd EGL vendor library config files.*

      Answer: "Ok"

    - *Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up.*

      Answer: "No"

  - One question when uninstalling:

    - *Would you like to run `nvidia-xconfig --restore-original-backup` to attempt restoration of the original X configuration file?*

      Answer: "No"
- Install CUDA:

  ```bash
  ./scripts/Installation/2-install_CUDA.sh
  ```
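  To confirm the toolchain is visible afterwards, a quick sanity check (this assumes the installer puts `nvcc` on your `PATH`; if it installs to a custom prefix, the check may only pass after sourcing the `activate` script in the final step):

  ```bash
  nvcc --version   # prints the installed CUDA compiler version
  nvidia-smi       # confirms the driver and GPU are reachable
  ```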
- Build PyTorch:

  ```bash
  ./scripts/Installation/3-build_PyTorch.sh
  ```
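  Once the build finishes, a minimal smoke test (assuming the freshly built PyTorch is importable from the current Python environment):

  ```bash
  python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
  ```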
- Check out the HuggingFace Transformers submodule (no building or installation is required):

  ```bash
  git submodule update --init submodules/transformers
  ```
- Finally, use the `activate` script to modify the environment variables accordingly:

  ```bash
  source scripts/Installation/activate
  ```
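  A quick way to confirm the environment took effect (a sketch; exactly which variables `activate` exports is an assumption here, so adapt the checks to whatever the script actually sets):

  ```bash
  # Assumes `activate` prepends the custom CUDA/PyTorch locations
  # to PATH and LD_LIBRARY_PATH (hypothetical behavior).
  which nvcc
  python -c "import torch; print(torch.__file__)"
  ```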
The script

```bash
./scripts/Experiment_Workflow/1-test_metadata_compression.sh
```

runs the experiments that compress the memory regions of CUDA graphs and calculate the compression ratios for different models. At the end of the experiments, the results are dumped into a CSV file named `metadata_compression.csv` and visualized as follows:
| Model    | Original Size | Compressed Size |
| -------- | ------------- | --------------- |
| GPT-2    | ___           | ___             |
| GPT-J    | ___           | ___             |
| Wav2Vec2 | ___           | ___             |
| Wav2Vec2 | ___           | ___             |
The last two rows both correspond to the Wav2Vec2 model: the former is the forward pass and the latter is the backward pass. Hence, the two rows should be summed together to obtain the total GPU memory consumption of Wav2Vec2.
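To eyeball the raw numbers, the CSV can be pretty-printed in the terminal; the ratio computation below is a sketch that assumes `metadata_compression.csv` has the same column layout as the table above (model, original size, compressed size):

```bash
# Render the CSV as an aligned table.
column -s, -t < metadata_compression.csv

# Per-model compression ratio, assuming columns 2 and 3 hold the
# original and compressed sizes (hypothetical layout).
awk -F, 'NR > 1 { printf "%s: %.2fx\n", $1, $2 / $3 }' metadata_compression.csv
```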
The script

```bash
./scripts/Experiment_Workflow/2-test_runtime_performance.sh --model=gpt2
./scripts/Experiment_Workflow/2-test_runtime_performance.sh --model=gptj
./scripts/Experiment_Workflow/2-test_runtime_performance.sh --model=wav2vec2
```

(invoked once per model) runs the experiments that measure the runtime performance of different models under 3 different settings, namely Baseline, PtGraph, and Grape, as described in the paper. At the end of the experiments, the results are dumped into a CSV file named `speedometer.csv` and visualized as follows:
| Name     | Attrs                 | Avg | Std | Min | Median | Max |
| -------- | --------------------- | --- | --- | --- | ------ | --- |
| Baseline | {"Model": "GPT-2"}    | ___ | ___ | ___ | ___    | ___ |
| PtGraph  | {"Model": "GPT-2"}    | ___ | ___ | ___ | ___    | ___ |
| Grape    | {"Model": "GPT-2"}    | ___ | ___ | ___ | ___    | ___ |
| Baseline | {"Model": "GPT-J"}    | ___ | ___ | ___ | ___    | ___ |
| PtGraph  | {"Model": "GPT-J"}    | ___ | ___ | ___ | ___    | ___ |
| Grape    | {"Model": "GPT-J"}    | ___ | ___ | ___ | ___    | ___ |
| Baseline | {"Model": "Wav2Vec2"} | ___ | ___ | ___ | ___    | ___ |
| PtGraph  | {"Model": "Wav2Vec2"} | ___ | ___ | ___ | ___    | ___ |
| Grape    | {"Model": "Wav2Vec2"} | ___ | ___ | ___ | ___    | ___ |
The script

```bash
./scripts/Experiment_Workflow/3-test_runtime_breakdown.sh
```

runs the experiments that compare the runtime breakdown of the GPT-2 model at 3 different stages (input preparation, model, and beam search) between PtGraph and Grape. At the end of the experiments, the results are dumped into a CSV file named `gpt2_generate_profile.csv` and visualized as follows:
| Name       | Attrs                   | Avg | Std | Min | Median | Max |
| ---------- | ----------------------- | --- | --- | --- | ------ | --- |
| InputPrep  | {"Executor": "PtGraph"} | ___ | ___ | ___ | ___    | ___ |
| Model      | {"Executor": "PtGraph"} | ___ | ___ | ___ | ___    | ___ |
| BeamSearch | {"Executor": "PtGraph"} | ___ | ___ | ___ | ___    | ___ |
| InputPrep  | {"Executor": "Grape"}   | ___ | ___ | ___ | ___    | ___ |
| Model      | {"Executor": "Grape"}   | ___ | ___ | ___ | ___    | ___ |
| BeamSearch | {"Executor": "Grape"}   | ___ | ___ | ___ | ___    | ___ |