
Thank you for your interest in Grape 🍇.

Installation

Please note that, because part of Grape 🍇 communicates with the NVIDIA GPU kernel module via the /proc filesystem (which, to the best of our knowledge, cannot be handled easily from within Docker), all of the following steps are performed natively (i.e., outside the Docker environment). We also assume that the OS is either Ubuntu 20.04 or 22.04.

  1. Check out Grape's 🍇 source code:

    git clone https://github.com/UofT-EcoSystem/Grape-MICRO56-Artifact
  2. Make sure that common software dependencies are installed properly:

    ./scripts/Installation/0-install_build_essentials.sh
  3. Install our customized NVIDIA GPU driver and then reboot the machine:

    ./scripts/Installation/1-install_NVIDIA_GPU_driver.sh
    sudo reboot

    After the machine reboots, make sure that the message NVRM: loading customized kernel module from Grape appears in the output of sudo dmesg. If it does not, reinstall the GPU driver and reboot again:

    # Note the `--reinstall` option.
    ./scripts/Installation/1-install_NVIDIA_GPU_driver.sh --reinstall
    sudo reboot
    • You will be asked several questions when installing/uninstalling the GPU driver. Most of them concern the X server, which, to the best of our knowledge, does not affect the experiment results:
      • Questions when installing:

        An alternate method of installing the NVIDIA driver was detected ...

        Please review the message provided by the maintainer of this alternate installation method and decide how to proceed:

        "Continue installation"

        You specified the '--no-kernel-modules' command line option ... Please ensure that NVIDIA kernel modules matching this driver version are installed separately.

        "Ok"

        Install NVIDIA's 32-bit compatibility libraries?

        "No"

        Unable to determine the path to install the libglvnd EGL vendor library config files.

        "Ok"

        Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up.

        "No"

      • One question when uninstalling:

        Would you like to run nvidia-xconfig --restore-original-backup to attempt restoration of the original X configuration file?

        "No"

  4. Install CUDA:

    ./scripts/Installation/2-install_CUDA.sh
  5. Build PyTorch:

    ./scripts/Installation/3-build_PyTorch.sh
  6. Check out the HuggingFace Transformers submodule (no building or installation is required):

    git submodule update --init submodules/transformers
  7. Finally, source the activate script to set the required environment variables (a quick sanity check of the overall installation is sketched after this list):

    source scripts/Installation/activate
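
As an optional sanity check (this is not part of the artifact's scripts), the following Python sketch verifies that the customized kernel module is loaded and that the PyTorch build can see the GPU. It assumes the activate script above has already been sourced in the current shell:

    # Sanity-check sketch (not part of the artifact): verify that the customized
    # kernel module is loaded and that the PyTorch build can see the GPU.
    import subprocess

    import torch

    kernel_log = subprocess.run(["sudo", "dmesg"], capture_output=True, text=True).stdout
    print("Customized driver loaded:",
          "NVRM: loading customized kernel module from Grape" in kernel_log)
    print("PyTorch version:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())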

Experiment Workflow

Metadata Compression (Figure 13)

The script

./scripts/Experiment_Workflow/1-test_metadata_compression.sh

runs the experiments that compress CUDA graphs' memory regions and calculate the compression ratios for different models. At the end of the experiments, the results are dumped into a CSV file named "metadata_compression.csv" and visualized as follows:

Model    Original Size Compressed Size
GPT-2    ___           ___
GPT-J    ___           ___
Wav2Vec2 ___           ___
Wav2Vec2 ___           ___

The last two rows both correspond to the Wav2Vec2 model: the former is the forward pass and the latter is the backward pass. Hence, the two rows should be summed to obtain the total GPU memory consumption of Wav2Vec2.
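
If you prefer to compute the compression ratios directly from the CSV file, a Python sketch along the following lines can be used. The column names ("Model", "Original Size", and "Compressed Size") are assumptions based on the table above and may need to be adjusted to match the actual CSV header; the sketch also sums the two Wav2Vec2 rows, as described above:

    # Post-processing sketch for metadata_compression.csv (column names are
    # assumptions taken from the table above).
    import csv
    from collections import defaultdict

    original, compressed = defaultdict(float), defaultdict(float)
    with open("metadata_compression.csv") as f:
        for row in csv.DictReader(f):
            # The two Wav2Vec2 rows (forward and backward passes) are summed here.
            original[row["Model"]] += float(row["Original Size"])
            compressed[row["Model"]] += float(row["Compressed Size"])

    for model in original:
        print(f"{model}: compression ratio = {original[model] / compressed[model]:.2f}x")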

Runtime Performance (Figure 11)

The script, invoked once per model,

./scripts/Experiment_Workflow/2-test_runtime_performance.sh --model=gpt2
./scripts/Experiment_Workflow/2-test_runtime_performance.sh --model=gptj
./scripts/Experiment_Workflow/2-test_runtime_performance.sh --model=wav2vec2

runs the experiments that measure the runtime performance of the different models (under 3 different settings, namely Baseline, PtGraph, and Grape, as described in the paper). At the end of the experiments, the results are dumped into a CSV file named "speedometer.csv" and visualized as follows:

Name     Attrs                 Avg Std min Median MAX
Baseline {"Model": "GPT-2"}    ___ ___ ___ ___    ___
PtGraph  {"Model": "GPT-2"}    ___ ___ ___ ___    ___
Grape    {"Model": "GPT-2"}    ___ ___ ___ ___    ___
Baseline {"Model": "GPT-J"}    ___ ___ ___ ___    ___
PtGraph  {"Model": "GPT-J"}    ___ ___ ___ ___    ___
Grape    {"Model": "GPT-J"}    ___ ___ ___ ___    ___
Baseline {"Model": "Wav2Vec2"} ___ ___ ___ ___    ___
PtGraph  {"Model": "Wav2Vec2"} ___ ___ ___ ___    ___
Grape    {"Model": "Wav2Vec2"} ___ ___ ___ ___    ___
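
To summarize the speedups of PtGraph and Grape over the Baseline from the CSV file, a Python sketch such as the following may help. The column names ("Name", "Attrs", "Avg") and the JSON-encoded Attrs field are assumptions based on the table above:

    # Summary sketch for speedometer.csv (column names are assumptions).
    import csv
    import json
    from collections import defaultdict

    avg_runtime = defaultdict(dict)  # model -> {setting: average runtime}
    with open("speedometer.csv") as f:
        for row in csv.DictReader(f):
            model = json.loads(row["Attrs"])["Model"]
            avg_runtime[model][row["Name"]] = float(row["Avg"])

    for model, settings in avg_runtime.items():
        for name in ("PtGraph", "Grape"):
            speedup = settings["Baseline"] / settings[name]
            print(f"{model}: {name} speedup over Baseline = {speedup:.2f}x")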

Runtime Breakdown (Figure 12)

The script

./scripts/Experiment_Workflow/3-test_runtime_breakdown.sh

runs the experiments that compare the runtime breakdown of the GPT-2 model between PtGraph and Grape across 3 stages (input preparation, model, and beam search). At the end of the experiments, the results are dumped into a CSV file named "gpt2_generate_profile.csv" and visualized as follows:

Name       Attrs                   Avg Std min Median MAX
InputPrep  {"Executor": "PtGraph"} ___ ___ ___ ___    ___
Model      {"Executor": "PtGraph"} ___ ___ ___ ___    ___
BeamSearch {"Executor": "PtGraph"} ___ ___ ___ ___    ___
InputPrep  {"Executor": "Grape"}   ___ ___ ___ ___    ___
Model      {"Executor": "Grape"}   ___ ___ ___ ___    ___
BeamSearch {"Executor": "Grape"}   ___ ___ ___ ___    ___
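
Similarly, the per-stage share of the runtime can be computed from the CSV file with a Python sketch like the one below; again, the column names ("Name", "Attrs", "Avg") are assumptions based on the table above:

    # Breakdown sketch for gpt2_generate_profile.csv (column names are assumptions).
    import csv
    import json
    from collections import defaultdict

    stage_avg = defaultdict(dict)  # executor -> {stage: average time}
    with open("gpt2_generate_profile.csv") as f:
        for row in csv.DictReader(f):
            executor = json.loads(row["Attrs"])["Executor"]
            stage_avg[executor][row["Name"]] = float(row["Avg"])

    for executor, stages in stage_avg.items():
        total = sum(stages.values())
        for stage, avg in stages.items():
            # Report each stage as a fraction of the summed stage averages.
            print(f"{executor}/{stage}: {avg / total:.1%} of the summed stage averages")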