voxel-tracer/raytracinginoneweekendincuda

Ray Tracing in One Weekend in CUDA -Optimized-

This code is an optimization exercise based on the excellent Ray Tracing in One Weekend in CUDA.

I've made several changes to make the code run faster on my hardware:

  • switched from the default cuRAND random generator to curandStatePhilox4_32_10, because the default generator was taking too long to initialize
  • used structs instead of classes for the scene objects
  • simplified the kernel logic in various places to reduce the register count
  • initialized the scene on the host and copied it to constant memory
  • used my own random number generator
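The README does not say which generator replaced cuRAND, so the following is only a minimal sketch of what a lightweight per-thread generator could look like. It assumes a xorshift32 step seeded through a Wang hash; the names `rand_state`, `make_rand_state`, `next_float`, and the `HD` macro are hypothetical and do not come from the repository.

```cpp
#include <cstdint>

// Lets the same code compile as plain C++ or as CUDA host/device code.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Hypothetical per-thread state: a single 32-bit word, cheap to keep in a register.
struct rand_state { std::uint32_t s; };

// Assumption: seed each pixel by hashing its index (Wang hash).
// xorshift maps 0 to 0 forever, so a zero state is replaced by 1.
HD inline rand_state make_rand_state(std::uint32_t pixel_index, std::uint32_t seed) {
    std::uint32_t h = pixel_index ^ seed;
    h = (h ^ 61u) ^ (h >> 16);
    h *= 9u;
    h ^= h >> 4;
    h *= 0x27d4eb2du;
    h ^= h >> 15;
    return rand_state{h == 0u ? 1u : h};
}

// One xorshift32 step, returning a float in [0, 1).
HD inline float next_float(rand_state& st) {
    std::uint32_t x = st.s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    st.s = x;
    // Keep the top 24 bits so the value is exactly representable and below 1.
    return static_cast<float>(x >> 8) * (1.0f / 16777216.0f);
}
```

In a kernel, each thread would build its state once from its pixel index and then call `next_float` for every sample. Unlike a cuRAND state, this needs no separate initialization kernel, at the cost of weaker statistical quality.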

The code includes a Visual Studio solution and was developed with CUDA 9.2.

If you run the renderer from the command line, you can optionally pass a number to indicate how many samples it should render per pixel. The default is 10. The argument parsing logic is rudimentary, and the program may crash if you don't pass a positive value.

On my hardware (GTX 1050), rendering a 1200x800 image using 10 samples per pixel takes ~0.6s.

I wrote a blog post explaining how to install Visual Studio 2017 and CUDA 9.2, and how to get your first CUDA project up and running.

UPDATE 2/22/2019

I did some more optimizations to the code, mostly trying to reduce the kernel's register usage to improve occupancy. Many of these optimizations made the code less readable, so they should only be considered as a last resort. Here is a list of all the changes I made:

  • use rsqrtf() to compute 1/sqrtf() when normalizing vectors
  • use sincosf() to compute sin() and cos() in a single call
  • changed scatter() to update passed ray and attenuation directly instead of computing them in separate variables
  • normalize ray.direction at creation time and remove all unnecessary normalizations later on

This set of changes helped reduce register usage from 56 registers to 46. Performance was slightly better.

On my hardware (GTX 1050), here are a few median rendering times over 10 consecutive runs:

1200x800x1 0.059s (was 0.063s before the update)

1200x800x10 0.549s (was 0.588s before the update)

1200x800x100 5.416s (was 5.846s before the update)

I updated the renderer to accept 2 optional arguments from the command line: num_samples (defaults to 1) and num_runs (defaults to 1). If you pass num_runs > 1 the renderer will compute the median rendering time and print it at the end.
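As a rough sketch of the behavior described above, the host-side helpers below parse the two optional arguments and compute a median over the recorded times. The names `options`, `parse_options`, and `median` are hypothetical, and falling back to 1 on a missing or non-positive value is an assumption, not necessarily what the renderer does.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct options {
    int num_samples;  // samples per pixel
    int num_runs;     // how many times to render the image
};

// Parse "[num_samples] [num_runs]". Both are optional and default to 1.
// Assumption: non-numeric or non-positive input keeps the default.
inline options parse_options(int argc, char** argv) {
    options opt{1, 1};
    if (argc > 1) {
        int v = std::atoi(argv[1]);
        if (v > 0) opt.num_samples = v;
    }
    if (argc > 2) {
        int v = std::atoi(argv[2]);
        if (v > 0) opt.num_runs = v;
    }
    return opt;
}

// Median of the recorded render times, in the same unit as the input.
// Takes the vector by value so the caller's ordering is left untouched.
inline double median(std::vector<double> times) {
    if (times.empty()) return 0.0;
    std::sort(times.begin(), times.end());
    std::size_t n = times.size();
    return (n % 2 == 1) ? times[n / 2]
                        : 0.5 * (times[n / 2 - 1] + times[n / 2]);
}
```

A `main` function would call `parse_options(argc, argv)`, render `num_runs` times while pushing each elapsed time into a vector, and print `median(times)` at the end when `num_runs > 1`.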
