
CUDA Matrix-Vector Multiplication

This repository contains CUDA code for performing matrix-vector multiplication using row-wise decomposition. The CUDA kernel launches multiple threads to efficiently compute the result in parallel on a GPU.
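As a rough sketch, row-wise decomposition assigns one thread to each row of the matrix, so each thread produces one element of the output vector. The kernel below illustrates the idea; the name matVecRowWise and the exact signature are illustrative, not necessarily the code in script.cu:

```cuda
// Illustrative row-wise matrix-vector multiplication kernel: each thread
// computes one output element y[row] = sum_j A[row][j] * x[j].
// Names and signature are a sketch, not necessarily what script.cu uses.
__global__ void matVecRowWise(const float *A, const float *x, float *y,
                              int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows) {
        float sum = 0.0f;
        for (int j = 0; j < cols; ++j) {
            sum += A[row * cols + j] * x[j];
        }
        y[row] = sum;
    }
}
```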

The code measures the speedup of the matrix-vector multiplication for a range of thread configurations and prints the speedup for each one. The tests cover threads_per_block values from 32 up to 32*20 (640).
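For reference, a speedup sweep of this kind can be sketched as follows, reusing the kernel sketched above, CUDA events for GPU timing, and a simple CPU loop as the serial baseline. The matrix size, the step of 32 between thread counts, and the timing details are assumptions, not necessarily what script.cu does:

```cuda
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <cuda_runtime.h>

// Assumes the matVecRowWise kernel sketched above is defined in this file.
int main(void)
{
    const int N = 1 << 12;                       // assumed square matrix size
    float *A = (float *)malloc((size_t)N * N * sizeof(float));
    float *x = (float *)malloc(N * sizeof(float));
    float *y = (float *)malloc(N * sizeof(float));
    for (int i = 0; i < N * N; ++i) A[i] = 1.0f;
    for (int i = 0; i < N; ++i) x[i] = 1.0f;

    // Serial CPU baseline.
    clock_t t0 = clock();
    for (int i = 0; i < N; ++i) {
        float sum = 0.0f;
        for (int j = 0; j < N; ++j) sum += A[i * N + j] * x[j];
        y[i] = sum;
    }
    float serial_ms = 1000.0f * (float)(clock() - t0) / CLOCKS_PER_SEC;

    float *d_A, *d_x, *d_y;
    cudaMalloc(&d_A, (size_t)N * N * sizeof(float));
    cudaMalloc(&d_x, N * sizeof(float));
    cudaMalloc(&d_y, N * sizeof(float));
    cudaMemcpy(d_A, A, (size_t)N * N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x, N * sizeof(float), cudaMemcpyHostToDevice);

    // Sweep threads_per_block from 32 to 32*20 and report the speedup.
    for (int tpb = 32; tpb <= 32 * 20; tpb += 32) {
        int blocks = (N + tpb - 1) / tpb;

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        matVecRowWise<<<blocks, tpb>>>(d_A, d_x, d_y, N, N);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float parallel_ms = 0.0f;
        cudaEventElapsedTime(&parallel_ms, start, stop);
        printf("threads_per_block = %d  speedup = %.2f\n",
               tpb, serial_ms / parallel_ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }

    cudaFree(d_A); cudaFree(d_x); cudaFree(d_y);
    free(A); free(x); free(y);
    return 0;
}
```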

Results

Speedup vs. number of threads per block:

[Plot: speedup as a function of threads per block]

The speedup clearly decreases as the number of threads per block increases.

Execution Environment

The code was run on the wes-00-00 GPU node of Wesley.

Prerequisites

  • NVIDIA GPU with CUDA support
  • CUDA Toolkit installed
  • C compiler (e.g., GCC) for compiling host code
  • sshpass utility for password-based SSH authentication (install using sudo apt-get install sshpass)

Usage

Compilation

Compile the code using the provided Makefile, or directly with nvcc:

nvcc -g -G script.cu -o script

This will generate an executable named script.

Execution

Run the executable, providing the number of threads per block as an argument:

./script <threads_per_block>

Replace <threads_per_block> with the desired number of threads per block. The program measures the parallel execution time and prints the resulting speedup.

Example

To run the program with 32 threads per block:

./script 32

Submit a job

To submit an interactive job on the GPU node:

qsub -I -l host=wes-00-00

Compile and Deploy

A bash script is provided that automates compiling the CUDA code and executing it on the specified remote server, given the source file and an input value 'n'. Adjust the file paths and server credentials in the script to match your environment.
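A minimal sketch of what such a deployment script can look like is shown below, assuming sshpass-based authentication as listed in the prerequisites. The user name, password, paths, and the idea of passing 'n' as the threads-per-block argument are placeholders, not the exact contents of the provided script:

```bash
#!/bin/bash
# Hypothetical compile-and-deploy sketch: copy the CUDA source to the remote
# GPU node, compile it there with nvcc, and run it with the given input value.
# USER, PASSWORD, and the paths are placeholders; edit them for your environment.
USER="your_user"
HOST="wes-00-00"
PASSWORD="your_password"
SRC="script.cu"
N="$1"   # input value 'n' (e.g., threads per block)

sshpass -p "$PASSWORD" scp "$SRC" "$USER@$HOST:~/"
sshpass -p "$PASSWORD" ssh "$USER@$HOST" "nvcc -g -G $SRC -o script && ./script $N"
```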

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request for any improvements or bug fixes.

