diff --git a/README.md b/README.md index ae0896a..32cce04 100644 --- a/README.md +++ b/README.md @@ -1,184 +1,114 @@ -------------------------------------------------------------------------------- -CIS565: Project 4: CUDA Rasterizer -------------------------------------------------------------------------------- -Fall 2014 -------------------------------------------------------------------------------- -Due Monday 10/27/2014 @ 12 PM -------------------------------------------------------------------------------- - -------------------------------------------------------------------------------- -NOTE: -------------------------------------------------------------------------------- -This project requires an NVIDIA graphics card with CUDA capability! Any card with CUDA compute capability 1.1 or higher will work fine for this project. For a full list of CUDA capable cards and their compute capability, please consult: http://developer.nvidia.com/cuda/cuda-gpus. If you do not have an NVIDIA graphics card in the machine you are working on, feel free to use any machine in the SIG Lab or in Moore100 labs. All machines in the SIG Lab and Moore100 are equipped with CUDA capable NVIDIA graphics cards. If this too proves to be a problem, please contact Patrick or Karl as soon as possible. - -------------------------------------------------------------------------------- -INTRODUCTION: -------------------------------------------------------------------------------- -In this project, you will implement a simplified CUDA based implementation of a standard rasterized graphics pipeline, similar to the OpenGL pipeline. In this project, you will implement vertex shading, primitive assembly, perspective transformation, rasterization, fragment shading, and write the resulting fragments to a framebuffer. More information about the rasterized graphics pipeline can be found in the class slides and in your notes from CIS560. 
- -The basecode provided includes an OBJ loader and much of the mundane I/O and bookkeeping code. The basecode also includes some functions that you may find useful, described below. The core rasterization pipeline is left for you to implement. - -You MAY NOT use ANY raycasting/raytracing AT ALL in this project, EXCEPT in the fragment shader step. One of the purposes of this project is to see how a rasterization pipeline can generate graphics WITHOUT the need for raycasting! Raycasting may only be used in the fragment shader effect for interesting shading results, but is absolutely not allowed in any other stages of the pipeline. - -Also, you MAY NOT use OpenGL ANYWHERE in this project, aside from the given OpenGL code for drawing Pixel Buffer Objects to the screen. Use of OpenGL for any pipeline stage instead of your own custom implementation will result in an incomplete project. - -Finally, note that while this basecode is meant to serve as a strong starting point for a CUDA rasterizer, you are not required to use this basecode if you wish, and you may also change any part of the basecode specification as you please, so long as the final rendered result is correct. - -------------------------------------------------------------------------------- -CONTENTS: -------------------------------------------------------------------------------- -The Project4 root directory contains the following subdirectories: - -* src/ contains the source code for the project. Both the Windows Visual Studio solution and the OSX makefile reference this folder for all source; the base source code compiles on OSX and Windows without modification. -* objs/ contains example obj test files: cow.obj, cube.obj, tri.obj. -* renders/ contains an example render of the given example cow.obj file with a z-depth fragment shader. -* windows/ contains a Windows Visual Studio 2010 project and all dependencies needed for building and running on Windows 7. 
- -The Windows and OSX versions of the project build and run exactly the same way as in Project0, Project1, and Project2. - -------------------------------------------------------------------------------- -REQUIREMENTS: -------------------------------------------------------------------------------- -In this project, you are given code for: - -* A library for loading/reading standard Alias/Wavefront .obj format mesh files and converting them to OpenGL style VBOs/IBOs -* A suggested order of kernels with which to implement the graphics pipeline -* Working code for CUDA-GL interop - -You will need to implement the following stages of the graphics pipeline and features: - -* Vertex Shading -* Primitive Assembly with support for triangle VBOs/IBOs -* Perspective Transformation -* Rasterization through either a scanline or a tiled approach -* Fragment Shading -* A depth buffer for storing and depth testing fragments -* Fragment to framebuffer writing -* A simple lighting/shading scheme, such as Lambert or Blinn-Phong, implemented in the fragment shader - -You are also required to implement at least 3 of the following features: - -* Additional pipeline stages. Each one of these stages can count as 1 feature: - * Geometry shader - * Transformation feedback - * Back-face culling - * Scissor test - * Stencil test - * Blending - -IMPORTANT: For each of these stages implemented, you must also add a section to your README stating what the expected performance impact of that pipeline stage is, and real performance comparisons between your rasterizer with that stage and without. - -* Correct color interpolation between points on a primitive -* Texture mapping WITH texture filtering and perspective correct texture coordinates -* Support for additional primitices. Each one of these can count as HALF of a feature. 
- * Lines - * Line strips - * Triangle fans - * Triangle strips - * Points -* Anti-aliasing -* Order-independent translucency using a k-buffer -* MOUSE BASED interactive camera support. Interactive camera support based only on the keyboard is not acceptable for this feature. - -------------------------------------------------------------------------------- -BASE CODE TOUR: -------------------------------------------------------------------------------- -You will be working primarily in two files: rasterizeKernel.cu, and rasterizerTools.h. Within these files, areas that you need to complete are marked with a TODO comment. Areas that are useful to and serve as hints for optional features are marked with TODO (Optional). Functions that are useful for reference are marked with the comment LOOK. - -* rasterizeKernels.cu contains the core rasterization pipeline. - * A suggested sequence of kernels exists in this file, but you may choose to alter the order of this sequence or merge entire kernels if you see fit. For example, if you decide that doing has benefits, you can choose to merge the vertex shader and primitive assembly kernels, or merge the perspective transform into another kernel. There is not necessarily a right sequence of kernels (although there are wrong sequences, such as placing fragment shading before vertex shading), and you may choose any sequence you want. Please document in your README what sequence you choose and why. - * The provided kernels have had their input parameters removed beyond basic inputs such as the framebuffer. You will have to decide what inputs should go into each stage of the pipeline, and what outputs there should be. - -* rasterizeTools.h contains various useful tools, including a number of barycentric coordinate related functions that you may find useful in implementing scanline based rasterization... - * A few pre-made structs are included for you to use, such as fragment and triangle. 
A simple rasterizer can be implemented with these structs as is. However, as with any part of the basecode, you may choose to modify, add to, use as-is, or outright ignore them as you see fit. - * If you do choose to add to the fragment struct, be sure to include in your README a rationale for why. - -You will also want to familiarize yourself with: - -* main.cpp, which contains code that transfers VBOs/CBOs/IBOs to the rasterization pipeline. Interactive camera work will also have to be implemented in this file if you choose that feature. -* utilities.h, which serves as a kitchen-sink of useful functions - -------------------------------------------------------------------------------- -SOME RESOURCES: -------------------------------------------------------------------------------- -The following resources may be useful for this project: - -* High-Performance Software Rasterization on GPUs - * Paper (HPG 2011): http://www.tml.tkk.fi/~samuli/publications/laine2011hpg_paper.pdf - * Code: http://code.google.com/p/cudaraster/ Note that looking over this code for reference with regard to the paper is fine, but we most likely will not grant any requests to actually incorporate any of this code into your project. - * Slides: http://bps11.idav.ucdavis.edu/talks/08-gpuSoftwareRasterLaineAndPantaleoni-BPS2011.pdf -* The Direct3D 10 System (SIGGRAPH 2006) - for those interested in doing geometry shaders and transform feedback. 
- * http://133.11.9.3/~takeo/course/2006/media/papers/Direct3D10_siggraph2006.pdf -* Multi-Fragment Effects on the GPU using the k-Buffer - for those who want to do a k-buffer - * http://www.inf.ufrgs.br/~comba/papers/2007/kbuffer_preprint.pdf -* FreePipe: A Programmable, Parallel Rendering Architecture for Efficient Multi-Fragment Effects (I3D 2010) - * https://sites.google.com/site/hmcen0921/cudarasterizer -* Writing A Software Rasterizer In Javascript: - * Part 1: http://simonstechblog.blogspot.com/2012/04/software-rasterizer-part-1.html - * Part 2: http://simonstechblog.blogspot.com/2012/04/software-rasterizer-part-2.html - -------------------------------------------------------------------------------- -NOTES ON GLM: -------------------------------------------------------------------------------- -This project uses GLM, the GL Math library, for linear algebra. You need to know two important points on how GLM is used in this project: - -* In this project, indices in GLM vectors (such as vec3, vec4), are accessed via swizzling. So, instead of v[0], v.x is used, and instead of v[1], v.y is used, and so on and so forth. -* GLM Matrix operations work fine on NVIDIA Fermi cards and later, but pre-Fermi cards do not play nice with GLM matrices. As such, in this project, GLM matrices are replaced with a custom matrix struct, called a cudaMat4, found in cudaMat4.h. A custom function for multiplying glm::vec4s and cudaMat4s is provided as multiplyMV() in intersections.h. - -------------------------------------------------------------------------------- -README -------------------------------------------------------------------------------- -All students must replace or augment the contents of this Readme.md in a clear -manner with the following: - -* A brief description of the project and the specific features you implemented. -* At least one screenshot of your project running. -* A 30 second or longer video of your project running. 
To create the video you - can use http://www.microsoft.com/expression/products/Encoder4_Overview.aspx -* A performance evaluation (described in detail below). - -------------------------------------------------------------------------------- -PERFORMANCE EVALUATION -------------------------------------------------------------------------------- -The performance evaluation is where you will investigate how to make your CUDA -programs more efficient using the skills you've learned in class. You must have -performed at least one experiment on your code to investigate the positive or -negative effects on performance. - -We encourage you to get creative with your tweaks. Consider places in your code -that could be considered bottlenecks and try to improve them. - -Each student should provide no more than a one page summary of their -optimizations along with tables and or graphs to visually explain any -performance differences. - -------------------------------------------------------------------------------- -THIRD PARTY CODE POLICY -------------------------------------------------------------------------------- -* Use of any third-party code must be approved by asking on Piazza. If it is approved, all students are welcome to use it. Generally, we approve use of third-party code that is not a core part of the project. For example, for the ray tracer, we would approve using a third-party library for loading models, but would not approve copying and pasting a CUDA function for doing refraction. -* Third-party code must be credited in README.md. -* Using third-party code without its approval, including using another student's code, is an academic integrity violation, and will result in you receiving an F for the semester. 
- -------------------------------------------------------------------------------- -SELF-GRADING -------------------------------------------------------------------------------- -* On the submission date, email your grade, on a scale of 0 to 100, to Liam, harmoli+cis565@seas.upenn.edu, with a one paragraph explanation. Be concise and realistic. Recall that we reserve 30 points as a sanity check to adjust your grade. Your actual grade will be (0.7 * your grade) + (0.3 * our grade). We hope to only use this in extreme cases when your grade does not realistically reflect your work - it is either too high or too low. In most cases, we plan to give you the exact grade you suggest. -* Projects are not weighted evenly, e.g., Project 0 doesn't count as much as the path tracer. We will determine the weighting at the end of the semester based on the size of each project. - ---- -SUBMISSION ---- -As with the previous project, you should fork this project and work inside of -your fork. Upon completion, commit your finished project back to your fork, and -make a pull request to the master repository. You should include a README.md -file in the root directory detailing the following - -* A brief description of the project and specific features you implemented -* At least one screenshot of your project running. -* A link to a video of your raytracer running. -* Instructions for building and running your project if they differ from the - base code. -* A performance writeup as detailed above. -* A list of all third-party code used. -* This Readme file edited as described above in the README section. +CIS 565 project 04 : CUDA software rasterizer +=================== +![alt tag](https://raw.githubusercontent.com/drerucha/Project4-Rasterizer/master/renders/cow_flat_colored.jpg) + +## INTRODUCTION + +This project is an implementation of a simplified, CUDA-based rasterized graphics pipeline, similar to OpenGL's pipeline. 
I implemented vertex shading, primitive assembly, perspective transformation, rasterization, and fragment shading, and wrote the resulting fragments to a framebuffer for display. + +Aside from the lighting calculations in my fragment shader, no raycasting/raytracing is used anywhere in this project to generate graphics. Likewise, OpenGL is used only to draw the finished pixels to the screen after every stage of my rasterization pipeline has completed. The purpose of this project is to show how a rasterization pipeline can generate graphics without raycasting or OpenGL. + +## BASECODE + +The basecode provided to me included an OBJ loader and most of the I/O and bookkeeping code. It also included a few helper functions, such as routines for computing barycentric coordinates and axis-aligned bounding boxes (AABBs) for triangles. I implemented the core rasterization pipeline. + +## VERTEX SHADING + +My vertex shader is responsible for transforming vertices from object-space into screen-space. This transformation is built from three matrices. First, a model matrix transforms vertices from object-space to world-space. Second, a view matrix transforms vertices from world-space to camera-space. Finally, a projection matrix transforms vertices from camera-space to clip-space. + +The wrapper function that calls my vertex shader kernel computes the model, view, and projection matrices, multiplies them together, and passes the composite matrix on to my vertex shader. (I used glm's built-in lookAt() and perspective() functions to compute my view and projection matrices.) My vertex shader then multiplies every vertex by this composite matrix to convert the vertex from object-space to clip-space.
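As a rough sketch, the per-vertex work described above amounts to one matrix-vector multiply with the composite model-view-projection matrix. The types and names below are hypothetical stand-ins (the project itself uses glm types and custom device-side structs), not the actual kernel code:

```cpp
#include <cassert>

// Hypothetical stand-ins for the vec4/mat4 types used in the pipeline.
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };  // row-major

// Transform a vertex from object-space to clip-space by multiplying it
// with the composite model-view-projection matrix.
Vec4 toClipSpace(const Mat4 &mvp, const Vec4 &v) {
    const float in[4] = { v.x, v.y, v.z, v.w };
    float out[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += mvp.m[r][c] * in[c];
    return { out[0], out[1], out[2], out[3] };
}
```

In the real pipeline this runs once per vertex inside a CUDA kernel; the composite matrix is computed once on the host and passed in.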
+ +Once in clip-space, the perspective division is performed: the x-, y-, and z-components of the vertex are divided by its w-component (the transformed vertex is homogeneous, and thus has four components instead of three). This division puts the vertex in NDC-space (normalized device coordinates), where each component lies in the range [-1, 1]. Next, I remap the x- and y-ranges from [-1, 1] to [0, 1] by adding 1 and dividing by 2. I want x and y in the [0, 1] range to facilitate the conversion to pixels (window coordinates), which is performed by multiplying x by the rendered image's width and y by its height. + +I keep the world-space vertices around for lighting and depth computations later in the pipeline, so I store my newly computed screen-space vertices in a second vertex buffer object. + +## PRIMITIVE ASSEMBLY + +My primitive assembly kernel is responsible for creating triangle primitives from the passed-in index, vertex, color, and normal buffers. Each triangle maintains its world-space vertex positions, screen-space vertex positions, vertex colors, vertex normals, and a flag that marks whether or not the triangle will be rendered in the rasterization stage. + +Indices are extracted from the index buffer based on the index of the current triangle primitive, and are then used to index into the vertex, color, and normal buffers to assemble the three points that make up the triangle. This pipeline stage is quite simple; a quick glance at the code explains it better than a paragraph in a README can. + +Inside the primitive assembly stage, I also perform a simple computation to check whether the triangle faces toward or away from the camera. The result sets the triangle's visibility flag, which the rasterization stage uses for backface culling.
My method for backface culling is described in greater detail below in the section labeled "Backface culling". + +## RASTERIZATION + +The rasterization stage is responsible for determining which fragments (pixel-sized pieces of a triangle primitive) are visible to the camera. This is done with a per-primitive scanline conversion algorithm. + +First, my method checks the visibility flag of the current triangle (set in the primitive assembly stage). If it is false, no further processing is done for that triangle. + +My method then computes an axis-aligned bounding box for the current triangle in screen-space. I iterate through all the pixels inside this bounding box and compute the barycentric coordinates of the current pixel with respect to the screen-space triangle. If any barycentric coordinate is negative, the pixel lies outside the triangle, and the fragment can be discarded. + +If no barycentric coordinate is negative, the world-space depth of the current fragment is compared against the fragment depth already stored in the depth buffer at the current pixel location. It is worth noting that a fragment is essentially a candidate pixel: each fragment stored in the depth buffer maps to one pixel in the final output image. + +If the current fragment is closer to the camera than the fragment already stored in the depth buffer, the current fragment replaces it. The world-space position, color, and normal of the new fragment are computed by using the barycentric coordinates computed in screen-space to interpolate over the triangle in world-space. + +This method is parallelized per primitive rather than per scanline, so each thread processes all fragments for a single triangle and updates the depth buffer accordingly.
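A serial sketch of this per-triangle loop is below. The names are hypothetical and the code is host-side C++ rather than the actual CUDA kernel; in the real pipeline one thread handles each primitive, so concurrent writes to a shared depth buffer need synchronization that this serial sketch sidesteps:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

// Barycentric coordinates of point p with respect to triangle (a, b, c).
Vec3 barycentric(Vec2 a, Vec2 b, Vec2 c, Vec2 p) {
    float det = (b.y - c.y) * (a.x - c.x) + (c.x - b.x) * (a.y - c.y);
    float u = ((b.y - c.y) * (p.x - c.x) + (c.x - b.x) * (p.y - c.y)) / det;
    float v = ((c.y - a.y) * (p.x - c.x) + (a.x - c.x) * (p.y - c.y)) / det;
    return { u, v, 1.0f - u - v };
}

// Rasterize one screen-space triangle into a w*h depth buffer.
// depth[] holds the closest depth seen so far at each pixel (smaller = closer).
void rasterizeTriangle(Vec2 a, Vec2 b, Vec2 c, Vec3 vertexDepths,
                       std::vector<float> &depth, int w, int h) {
    // Axis-aligned bounding box of the triangle, clamped to the screen.
    int minX = std::max(0, (int)std::floor(std::min({ a.x, b.x, c.x })));
    int maxX = std::min(w - 1, (int)std::ceil(std::max({ a.x, b.x, c.x })));
    int minY = std::max(0, (int)std::floor(std::min({ a.y, b.y, c.y })));
    int maxY = std::min(h - 1, (int)std::ceil(std::max({ a.y, b.y, c.y })));
    for (int y = minY; y <= maxY; ++y)
        for (int x = minX; x <= maxX; ++x) {
            Vec3 bc = barycentric(a, b, c, { x + 0.5f, y + 0.5f });
            if (bc.x < 0 || bc.y < 0 || bc.z < 0) continue;  // outside triangle
            // Interpolate depth with the barycentric weights and depth-test.
            float z = bc.x * vertexDepths.x + bc.y * vertexDepths.y + bc.z * vertexDepths.z;
            if (z < depth[y * w + x]) depth[y * w + x] = z;
        }
}
```

The same barycentric weights that decide pixel coverage are reused to interpolate the fragment's depth (and, in the full pipeline, its position, color, and normal).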
+ +## FRAGMENT SHADING + +My pipeline applies simple diffuse Lambertian shading to each fragment stored in my depth buffer. Each fragment knows its world-space position, color, and normal. Those three pieces of information, along with a light position and light intensity, are all that is needed to compute a diffuse lighting coefficient and perform Lambertian shading. + +Below, you can see the result of adding the light's contribution to the rendered image using Lambertian shading. Without any lighting, the image lacks all sense of depth. + +![alt tag](https://raw.githubusercontent.com/drerucha/Project4-Rasterizer/master/renders/cow_flat_with_aa.jpg) + +![alt tag](https://raw.githubusercontent.com/drerucha/Project4-Rasterizer/master/renders/cow_diffuse_with_aa.jpg) + +## BARYCENTRIC COLOR INTERPOLATION + +![alt tag](https://raw.githubusercontent.com/drerucha/Project4-Rasterizer/master/renders/tri_gray_smaller.jpg) ![alt tag](https://raw.githubusercontent.com/drerucha/Project4-Rasterizer/master/renders/tri_colored_smaller.jpg) + +## BACKFACE CULLING + +Backface culling is a process in which triangles facing away from the camera are not rasterized. For closed 3D objects, culling backfaces can discard as much as 50% of a mesh's triangles, so for complex scenes it can yield a significant performance boost. + +My backface culling method is very simple. The direction a triangle faces is determined by the order of its vertices. If the vertices are ordered counter-clockwise, the triangle faces toward the camera; if they are ordered clockwise, it faces away. I determine the ordering by taking the cross product of the two edge vectors that originate from the first vertex. So, if the triangle has vertices p1, p2, and p3, I compute ( p2 - p1 ) X ( p3 - p1 ), where 'X' is the cross product.
Once I have the result of this cross product, I use the right-hand rule to determine the vertex ordering. + +To apply the right-hand rule, open your right hand, line your fingers up with the first vector, and curl them toward the second vector, noting the direction of your thumb. If your thumb points toward you, the z-component of the cross product is positive; if it points away from you, the z-component is negative. After performing the cross product, I check this z-value to determine triangle visibility. + +A performance analysis comparing frame rates with and without backface culling is included below in the section named "Performance analysis". + +## ANTI-ALIASING AS A POST-PROCESS + +I implemented a simple post-process anti-aliasing scheme. After the fragment shader stage (i.e., after the lighting computations), I detect edge pixels by computing the color distances between neighboring pixels. If the difference in color between adjacent pixels exceeds a predefined threshold, the pixel is marked as an edge. Then, for every edge pixel, I apply a simple uniform blur by averaging the edge pixel with its eight neighbors. + +In the first image below, the red pixels indicate the edge pixels to be blurred. The next two images show a close-up comparison of the results: no anti-aliasing in the first, anti-aliasing applied in the second. + +This was an interesting exercise, and it was very simple to implement and understand, but I am not impressed with the results and suspect there are better anti-aliasing methods worth exploring. A performance analysis comparing frame rates with and without anti-aliasing is included below in the section named "Performance analysis".
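The edge-detect-and-blur pass above can be sketched as follows. This is a serial, host-side sketch with hypothetical names, and the Manhattan color distance is an assumption on my part (the README does not pin down a particular metric):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Color { float r, g, b; };

// Assumed metric: Manhattan distance between two RGB colors.
float colorDistance(Color a, Color b) {
    return std::fabs(a.r - b.r) + std::fabs(a.g - b.g) + std::fabs(a.b - b.b);
}

// Post-process AA: mark both pixels of any horizontally or vertically adjacent
// pair whose color distance exceeds `threshold`, then replace each marked
// (edge) pixel with the average of itself and its in-bounds 8-neighborhood.
std::vector<Color> antialias(const std::vector<Color> &img, int w, int h, float threshold) {
    std::vector<bool> edge(w * h, false);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            Color c = img[y * w + x];
            if (x + 1 < w && colorDistance(c, img[y * w + x + 1]) > threshold)
                edge[y * w + x] = edge[y * w + x + 1] = true;
            if (y + 1 < h && colorDistance(c, img[(y + 1) * w + x]) > threshold)
                edge[y * w + x] = edge[(y + 1) * w + x] = true;
        }
    std::vector<Color> out = img;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            if (!edge[y * w + x]) continue;
            Color sum = { 0.0f, 0.0f, 0.0f };
            int n = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
                    sum.r += img[ny * w + nx].r;
                    sum.g += img[ny * w + nx].g;
                    sum.b += img[ny * w + nx].b;
                    ++n;
                }
            out[y * w + x] = { sum.r / n, sum.g / n, sum.b / n };
        }
    return out;
}
```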
+ +![alt tag](https://raw.githubusercontent.com/drerucha/Project4-Rasterizer/master/renders/cow_diffuse_outlined.jpg) + +![alt tag](https://raw.githubusercontent.com/drerucha/Project4-Rasterizer/master/renders/cow_diffuse_no_aa_zoomed_02.jpg) ![alt tag](https://raw.githubusercontent.com/drerucha/Project4-Rasterizer/master/renders/cow_diffuse_with_aa_zoomed_02.jpg) + +## VIDEO DEMO + +[Video demo.](https://vimeo.com/110144028) + +## PERFORMANCE ANALYSIS + +The first chart visualizes the impact backface culling has on performance when rendering the cow object. Without backface culling, my rasterization pipeline runs at 38 frames per second. With backface culling turned on, performance improves to 42 frames per second. As scenes grow more complex, I suspect the performance benefit of backface culling will increase. + +![alt tag](https://raw.githubusercontent.com/drerucha/Project4-Rasterizer/master/renders/rasterizer_bar_01.png) + +The second chart visualizes the impact anti-aliasing has on performance when rendering the cow object. Without anti-aliasing, my rasterization pipeline runs at 42 frames per second. With anti-aliasing turned on, performance decreases to 36 frames per second. This is a steep performance hit for only a slight improvement in visual fidelity. + +![alt tag](https://raw.githubusercontent.com/drerucha/Project4-Rasterizer/master/renders/rasterizer_bar_02.png) + +The third and final chart visualizes the approximate percentage of execution time required by each stage of my pipeline per rasterization cycle. Each cycle takes approximately 20 milliseconds to complete. As the chart shows, the anti-aliasing stage takes the most time, which corroborates the chart above. + +It is interesting to note the differences in execution time between the vertex shader and primitive assembly stage on one hand, and the rasterization stage and fragment shader on the other.
+ +The vertex shader and primitive assembly stage perform computations on the vertices in the scene. Since these two stages operate on the same inputs (vertices), it makes sense that their execution times are similar. The rasterization stage and fragment shader, by contrast, operate on the fragments in the scene rather than the vertices; since they, too, share their inputs (fragments), it makes sense that their execution times are similar as well. + +The different inputs of the two pairs of pipeline stages (vertices vs. fragments) explain why their execution times differ so drastically. If the resolution of the output image were decreased and the complexity of the scene geometry were increased, I suspect the relationship between the two pairs' execution times would invert: more vertices would presumably increase the execution times of the vertex shader and primitive assembly stage, while fewer fragments would presumably decrease the execution times of the rasterization stage and fragment shader. + +![alt tag](https://raw.githubusercontent.com/drerucha/Project4-Rasterizer/master/renders/rasterizer_pie_chart.png) + +## ROOM FOR IMPROVEMENT + +Currently, I have a bug in my rasterization stage when computing fragment depths, so occluded geometry is sometimes rendered in front of the geometry occluding it. My immediate next step is to locate and correct that error. After that, I would like to interpolate normals across triangle faces to give the illusion of smooth geometry even on low-polygon objects. I also think it would be interesting to try a per-scanline parallelization scheme in place of my per-primitive scheme. There is a lot of room for performance improvement in my rasterization stage that I think is worth exploring.
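For the normal interpolation mentioned above, the barycentric weights already computed per fragment could be reused to blend the three vertex normals. This is a sketch with hypothetical names of what that future work might look like, not code from this project:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Blend three vertex normals with barycentric weights (u, v, w), then
// renormalize, since a weighted sum of unit vectors is generally not unit length.
Vec3 interpolateNormal(Vec3 n0, Vec3 n1, Vec3 n2, float u, float v, float w) {
    Vec3 n = { u * n0.x + v * n1.x + w * n2.x,
               u * n0.y + v * n1.y + w * n2.y,
               u * n0.z + v * n1.z + w * n2.z };
    float len = std::sqrt(n.x * n.x + n.y * n.y + n.z * n.z);
    return { n.x / len, n.y / len, n.z / len };
}
```

Shading with these smoothly varying normals (instead of one normal per face) is what gives low-polygon meshes the appearance of curved surfaces.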
+ +## SPECIAL THANKS + +I want to give a quick shout-out to Patrick Cozzi who led the fall 2014 CIS 565 course at Penn and Harmony Li who was the TA for the same course. Thanks guys! \ No newline at end of file diff --git a/objs/tri.obj b/objs/tri.obj index fb38e35..7794f67 100644 --- a/objs/tri.obj +++ b/objs/tri.obj @@ -1,7 +1,10 @@ +# Geometric vertices. v 0 0 0 v 0.1 0 0 v 0 0.1 0 +# Vertex normals. vn 0 0 1 +# Face with vertex normals. f 1//1 2//1 3//1 \ No newline at end of file diff --git a/renders/cow_diffuse_colored.jpg b/renders/cow_diffuse_colored.jpg new file mode 100644 index 0000000..2c301dc Binary files /dev/null and b/renders/cow_diffuse_colored.jpg differ diff --git a/renders/cow_diffuse_no_aa.jpg b/renders/cow_diffuse_no_aa.jpg new file mode 100644 index 0000000..e66b6a1 Binary files /dev/null and b/renders/cow_diffuse_no_aa.jpg differ diff --git a/renders/cow_diffuse_no_aa_zoomed.jpg b/renders/cow_diffuse_no_aa_zoomed.jpg new file mode 100644 index 0000000..bf60701 Binary files /dev/null and b/renders/cow_diffuse_no_aa_zoomed.jpg differ diff --git a/renders/cow_diffuse_no_aa_zoomed_02.jpg b/renders/cow_diffuse_no_aa_zoomed_02.jpg new file mode 100644 index 0000000..f325619 Binary files /dev/null and b/renders/cow_diffuse_no_aa_zoomed_02.jpg differ diff --git a/renders/cow_diffuse_outlined.jpg b/renders/cow_diffuse_outlined.jpg new file mode 100644 index 0000000..acc87e7 Binary files /dev/null and b/renders/cow_diffuse_outlined.jpg differ diff --git a/renders/cow_diffuse_with_aa.jpg b/renders/cow_diffuse_with_aa.jpg new file mode 100644 index 0000000..8903c51 Binary files /dev/null and b/renders/cow_diffuse_with_aa.jpg differ diff --git a/renders/cow_diffuse_with_aa_zoomed.jpg b/renders/cow_diffuse_with_aa_zoomed.jpg new file mode 100644 index 0000000..5089ef5 Binary files /dev/null and b/renders/cow_diffuse_with_aa_zoomed.jpg differ diff --git a/renders/cow_diffuse_with_aa_zoomed_02.jpg b/renders/cow_diffuse_with_aa_zoomed_02.jpg new file mode 
100644 index 0000000..990fbcb Binary files /dev/null and b/renders/cow_diffuse_with_aa_zoomed_02.jpg differ diff --git a/renders/cow_flat_colored.jpg b/renders/cow_flat_colored.jpg new file mode 100644 index 0000000..7cba235 Binary files /dev/null and b/renders/cow_flat_colored.jpg differ diff --git a/renders/cow_flat_no_aa.jpg b/renders/cow_flat_no_aa.jpg new file mode 100644 index 0000000..bdc41bc Binary files /dev/null and b/renders/cow_flat_no_aa.jpg differ diff --git a/renders/cow_flat_outlined.jpg b/renders/cow_flat_outlined.jpg new file mode 100644 index 0000000..06dcf17 Binary files /dev/null and b/renders/cow_flat_outlined.jpg differ diff --git a/renders/cow_flat_with_aa.jpg b/renders/cow_flat_with_aa.jpg new file mode 100644 index 0000000..996dfe9 Binary files /dev/null and b/renders/cow_flat_with_aa.jpg differ diff --git a/renders/rasterizer_bar_01.png b/renders/rasterizer_bar_01.png new file mode 100644 index 0000000..82bd587 Binary files /dev/null and b/renders/rasterizer_bar_01.png differ diff --git a/renders/rasterizer_bar_02.png b/renders/rasterizer_bar_02.png new file mode 100644 index 0000000..7d73c6e Binary files /dev/null and b/renders/rasterizer_bar_02.png differ diff --git a/renders/rasterizer_pie_chart.png b/renders/rasterizer_pie_chart.png new file mode 100644 index 0000000..f817d17 Binary files /dev/null and b/renders/rasterizer_pie_chart.png differ diff --git a/renders/tri_colored.jpg b/renders/tri_colored.jpg new file mode 100644 index 0000000..8bd942a Binary files /dev/null and b/renders/tri_colored.jpg differ diff --git a/renders/tri_colored_smaller.jpg b/renders/tri_colored_smaller.jpg new file mode 100644 index 0000000..d61277e Binary files /dev/null and b/renders/tri_colored_smaller.jpg differ diff --git a/renders/tri_gray.jpg b/renders/tri_gray.jpg new file mode 100644 index 0000000..ed8d169 Binary files /dev/null and b/renders/tri_gray.jpg differ diff --git a/renders/tri_gray_no_aa.jpg b/renders/tri_gray_no_aa.jpg new file mode 
100644 index 0000000..237884f Binary files /dev/null and b/renders/tri_gray_no_aa.jpg differ diff --git a/renders/tri_gray_smaller.jpg b/renders/tri_gray_smaller.jpg new file mode 100644 index 0000000..06f9477 Binary files /dev/null and b/renders/tri_gray_smaller.jpg differ diff --git a/renders/video_demo.mp4 b/renders/video_demo.mp4 new file mode 100644 index 0000000..876e47e Binary files /dev/null and b/renders/video_demo.mp4 differ diff --git a/src/SimpleTimer.cpp b/src/SimpleTimer.cpp new file mode 100644 index 0000000..03b07ae --- /dev/null +++ b/src/SimpleTimer.cpp @@ -0,0 +1,44 @@ +#include "SimpleTimer.h" + + +SimpleTimer::SimpleTimer() +{ +} + + +SimpleTimer::~SimpleTimer() +{ +} + + +void SimpleTimer::start() +{ + start_time = GetTimeMs64(); +} + + +float SimpleTimer::stop() +{ + __int64 end_time = GetTimeMs64(); + return ( float )( end_time - start_time ); +} + + +__int64 SimpleTimer::GetTimeMs64() +{ + /* Windows */ + FILETIME ft; + LARGE_INTEGER li; + + /* Get the amount of 100 nano seconds intervals elapsed since January 1, 1601 (UTC) and copy it + * to a LARGE_INTEGER structure. */ + GetSystemTimeAsFileTime(&ft); + li.LowPart = ft.dwLowDateTime; + li.HighPart = ft.dwHighDateTime; + + __int64 ret = li.QuadPart; + ret -= 116444736000000000LL; /* Convert from file time to UNIX epoch time. 
*/ + ret /= 10000; /* From 100-nanosecond (10^-7 s) intervals to 1-millisecond (10^-3 s) intervals. */ + + return ret; +} \ No newline at end of file diff --git a/src/SimpleTimer.h b/src/SimpleTimer.h new file mode 100644 index 0000000..a2a51b0 --- /dev/null +++ b/src/SimpleTimer.h @@ -0,0 +1,23 @@ +#pragma once + +#ifndef _SIMPLE_TIMER +#define _SIMPLE_TIMER + +#include <windows.h> + +class SimpleTimer +{ +public: + SimpleTimer( void ); + ~SimpleTimer( void ); + + void start( void ); + float stop( void ); + +private: + __int64 GetTimeMs64( void ); + + __int64 start_time; +}; + +#endif \ No newline at end of file diff --git a/src/main.cpp b/src/main.cpp index 13d8e67..21f2f68 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -7,39 +7,48 @@ //-------------MAIN-------------- //------------------------------- -int main(int argc, char** argv){ - - bool loadedScene = false; - for(int i=1; i<argc; i++){ - string header; string data; - istringstream liness(argv[i]); - getline(liness, header, '='); getline(liness, data, '='); - if(strcmp(header.c_str(), "mesh")==0){ - mesh = new obj(); - objLoader* loader = new objLoader(data, mesh); - mesh->buildVBOs(); - delete loader; - loadedScene = true; - } - } - - if(!loadedScene){ - cout << "Usage: mesh=[obj file]" << endl; - return 0; - } - - frame = 0; - seconds = time (NULL); - fpstracker = 0; - - // Launch CUDA/GL - if (init(argc, argv)) { - // GLFW main loop - mainLoop(); - } - - return 0; +int main( int argc, char **argv ) +{ + bool loadedScene = false; + for(int i=1; i<argc; i++){ + string header; string data; + istringstream liness(argv[i]); + getline(liness, header, '='); getline(liness, data, '='); + if(strcmp(header.c_str(), "mesh")==0){ + mesh = new obj(); + objLoader* loader = new objLoader(data, mesh); + mesh->buildVBOs(); + delete loader; + loadedScene = true; + } + } + + if(!loadedScene){ + cout << "Usage: mesh=[obj file]" << endl; + return 0; + } + + frame = 0; + seconds = time (NULL); + fpstracker = 0; + + // Hardcoded camera definition.
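The FILETIME-based SimpleTimer above is Windows-only (`__int64`, `GetSystemTimeAsFileTime`). For reference, the same millisecond-timer interface can be sketched portably with `std::chrono`; this `PortableTimer` class is a hypothetical alternative, not part of the project:

```cpp
#include <chrono>

// Portable sketch of the SimpleTimer interface using std::chrono.
// PortableTimer is a hypothetical stand-in; the project itself uses the
// Windows FILETIME-based implementation shown above.
class PortableTimer {
public:
    // Record the current time as the measurement start point.
    void start() { start_time = std::chrono::steady_clock::now(); }

    // Return elapsed milliseconds since start(), like SimpleTimer::stop().
    float stop() const {
        auto end_time = std::chrono::steady_clock::now();
        return std::chrono::duration<float, std::milli>( end_time - start_time ).count();
    }

private:
    std::chrono::steady_clock::time_point start_time;
};
```

Using `steady_clock` also sidesteps the FILETIME-to-Unix-epoch arithmetic entirely, and is immune to system clock adjustments mid-measurement.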
+ camera.position = glm::vec3( 0.0f, 0.0f, 4.0f ); + camera.target = glm::vec3( 0.0f, 0.0f, 0.0f ); + camera.up = glm::vec3( 0.0f, -1.0f, 0.0f ); + camera.fov_y = 25.0f; + camera.resolution = glm::vec2( width, height ); + camera.near_clip = 0.01f; + camera.far_clip = 1000.0f; + + // Launch CUDA/GL + if (init(argc, argv)) { + // GLFW main loop + mainLoop(); + } + + return 0; } void mainLoop() { @@ -87,14 +96,24 @@ void runCuda(){ float newcbo[] = {0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0}; + cbo = newcbo; cbosize = 9; ibo = mesh->getIBO(); ibosize = mesh->getIBOsize(); + nbo = mesh->getNBO(); + nbosize = mesh->getNBOsize(); + cudaGLMapBufferObject((void**)&dptr, pbo); - cudaRasterizeCore(dptr, glm::vec2(width, height), frame, vbo, vbosize, cbo, cbosize, ibo, ibosize); + cudaRasterizeCore( dptr, + frame, + vbo, vbosize, + cbo, cbosize, + ibo, ibosize, + nbo, nbosize, + camera ); cudaGLUnmapBufferObject(pbo); vbo = NULL; diff --git a/src/main.h b/src/main.h index 8999110..1cf400a 100644 --- a/src/main.h +++ b/src/main.h @@ -19,15 +19,22 @@ #include #include - #include "rasterizeKernels.h" #include "utilities.h" +#include "sceneStructs.h" using namespace std; +//------------------------------- +//----------CAMERA STUFF--------- +//------------------------------- + +simpleCamera camera; + //------------------------------- //------------GL STUFF----------- //------------------------------- + int frame; int fpstracker; double seconds; @@ -43,13 +50,16 @@ GLFWwindow *window; obj* mesh; -float* vbo; +float *vbo; int vbosize; -float* cbo; +float *cbo; int cbosize; -int* ibo; +int *ibo; int ibosize; +float *nbo; +int nbosize; + //------------------------------- //----------CUDA STUFF----------- //------------------------------- diff --git a/src/rasterizeKernels.cu b/src/rasterizeKernels.cu index 10b0000..7e69384 100644 --- a/src/rasterizeKernels.cu +++ b/src/rasterizeKernels.cu @@ -7,13 +7,19 @@ #include #include "rasterizeKernels.h" #include "rasterizeTools.h" - 
-glm::vec3* framebuffer; -fragment* depthbuffer; -float* device_vbo; -float* device_cbo; -int* device_ibo; +#include "SimpleTimer.h" + +glm::vec3 *framebuffer; +fragment *depthbuffer; +float *device_vbo; +float *device_cbo; +int *device_ibo; +float *device_nbo; triangle* primitives; +float *device_vbo_window_coords; +int *device_lock_buffer; + +const float EMPTY_BUFFER_DEPTH = 10000.0f; void checkCUDAError(const char *msg) { cudaError_t err = cudaGetLastError(); @@ -94,6 +100,17 @@ __global__ void clearDepthBuffer(glm::vec2 resolution, fragment* buffer, fragmen } } +__global__ +void clearLockBuffer( glm::vec2 resolution, int *lock_buffer ) +{ + int x = ( blockIdx.x * blockDim.x ) + threadIdx.x; + int y = ( blockIdx.y * blockDim.y ) + threadIdx.y; + int index = x + ( y * resolution.x ); + if( x <= resolution.x && y <= resolution.y ) { + lock_buffer[index] = 0; + } +} + //Kernel that writes the image to the OpenGL PBO directly. __global__ void sendImageToPBO(uchar4* PBOpos, glm::vec2 resolution, glm::vec3* image){ @@ -128,131 +145,536 @@ __global__ void sendImageToPBO(uchar4* PBOpos, glm::vec2 resolution, glm::vec3* } } -//TODO: Implement a vertex shader -__global__ void vertexShadeKernel(float* vbo, int vbosize){ - int index = (blockIdx.x * blockDim.x) + threadIdx.x; - if(index +__host__ +__device__ +void simpleSwap( T &f1, T &f2 ) +{ + T tmp = f1; + f1 = f2; + f2 = tmp; } -//TODO: Implement a rasterization method, such as scanline. -__global__ void rasterizationKernel(triangle* primitives, int primitivesCount, fragment* depthbuffer, glm::vec2 resolution){ - int index = (blockIdx.x * blockDim.x) + threadIdx.x; - if(index 0 && i <= resolution.x && j > 0 && j <= resolution.y ) { + glm::vec3 p2 = depthbuffer[i + ( j * ( int )resolution.x )].color; + if ( computeDistanceBetweenTwoColors( p1, p2 ) > threshold ) { + return true; + } + } + + // Top. 
+ i = x; + j = y - 1; + if ( i > 0 && i <= resolution.x && j > 0 && j <= resolution.y ) { + glm::vec3 p2 = depthbuffer[i + ( j * ( int )resolution.x )].color; + if ( computeDistanceBetweenTwoColors( p1, p2 ) > threshold ) { + return true; + } + } + + // Right. + i = x + 1; + j = y; + if ( i > 0 && i <= resolution.x && j > 0 && j <= resolution.y ) { + glm::vec3 p2 = depthbuffer[i + ( j * ( int )resolution.x )].color; + if ( computeDistanceBetweenTwoColors( p1, p2 ) > threshold ) { + return true; + } + } + + // Bottom. + i = x; + j = y + 1; + if ( i > 0 && i <= resolution.x && j > 0 && j <= resolution.y ) { + glm::vec3 p2 = depthbuffer[i + ( j * ( int )resolution.x )].color; + if ( computeDistanceBetweenTwoColors( p1, p2 ) > threshold ) { + return true; + } + } + } + + return false; +} - //set up framebuffer - framebuffer = NULL; - cudaMalloc((void**)&framebuffer, (int)resolution.x*(int)resolution.y*sizeof(glm::vec3)); - - //set up depthbuffer - depthbuffer = NULL; - cudaMalloc((void**)&depthbuffer, (int)resolution.x*(int)resolution.y*sizeof(fragment)); +__global__ +void antiAliasingPostProcess( fragment *depthbuffer, + glm::vec2 resolution ) +{ + int x = ( blockIdx.x * blockDim.x ) + threadIdx.x; + int y = ( blockIdx.y * blockDim.y ) + threadIdx.y; + int index = x + ( y * resolution.x ); + if ( x <= resolution.x && y <= resolution.y ) { + if ( shouldBlurPixel( x, y, depthbuffer, resolution ) ) { + int pixel_count = 0; + glm::vec3 sum( 0.0f, 0.0f, 0.0f ); + // Average the colors in the 3x3 neighborhood centered on this pixel. + for ( int i = x - 1; i <= x + 1; ++i ) { + for ( int j = y - 1; j <= y + 1; ++j ) { + if ( i > 0 && i <= resolution.x && j > 0 && j <= resolution.y ) { + sum += depthbuffer[i + ( j * ( int )resolution.x )].color; + ++pixel_count; + } + } + } + depthbuffer[index].color = glm::vec3( sum.x / pixel_count, sum.y / pixel_count, sum.z / pixel_count ); + //depthbuffer[index].color = glm::vec3( 1.0f, 0.0f, 0.0f ); + } + } +} + +/*********** DANNY'S PRIMARY CONTRIBUTION - END ***********/ + +// Write fragment colors to the
framebuffer. +__global__ +void render( glm::vec2 resolution, fragment *depthbuffer, glm::vec3 *framebuffer ) +{ + int x = ( blockIdx.x * blockDim.x ) + threadIdx.x; + int y = ( blockIdx.y * blockDim.y ) + threadIdx.y; + int index = x + ( y * resolution.x ); + + if ( x <= resolution.x && y <= resolution.y ) { + framebuffer[index] = depthbuffer[index].color; + } +} - //kernel launches to black out accumulated/unaccumlated pixel buffers and clear our scattering states - clearImage<<<fullBlocksPerGrid, threadsPerBlock>>>(resolution, framebuffer, glm::vec3(0,0,0)); +// Wrapper for the __global__ call that sets up the kernel calls and does a ton of memory management +void cudaRasterizeCore( uchar4 *PBOpos, + float frame, + float *vbo, int vbosize, + float *cbo, int cbosize, + int *ibo, int ibosize, + float *nbo, int nbosize, + simpleCamera camera ) +{ + SimpleTimer timer; + float time_elapsed; + + // set up crucial magic + int tileSize = 8; + dim3 threadsPerBlock( tileSize, + tileSize ); + dim3 fullBlocksPerGrid( ( int )ceil( ( float )camera.resolution.x / ( float )tileSize ), + ( int )ceil( ( float )camera.resolution.y / ( float )tileSize ) ); + + // set up framebuffer + framebuffer = NULL; + cudaMalloc( ( void** )&framebuffer, + ( int )camera.resolution.x * ( int )camera.resolution.y * sizeof( glm::vec3 ) ); + + // set up depthbuffer + depthbuffer = NULL; + cudaMalloc( ( void** )&depthbuffer, + ( int )camera.resolution.x * ( int )camera.resolution.y * sizeof( fragment ) ); + + // kernel launches to black out accumulated/unaccumulated pixel buffers and clear our scattering states + clearImage<<< fullBlocksPerGrid, threadsPerBlock >>>( camera.resolution, + framebuffer, + glm::vec3( 0.0f, 0.0f, 0.0f ) ); - fragment frag; - frag.color = glm::vec3(0,0,0); - frag.normal = glm::vec3(0,0,0); - frag.position = glm::vec3(0,0,-10000); - clearDepthBuffer<<<fullBlocksPerGrid, threadsPerBlock>>>(resolution, depthbuffer,frag); - - //------------------------------ - //memory stuff - //------------------------------ - primitives = NULL; -
cudaMalloc((void**)&primitives, (ibosize/3)*sizeof(triangle)); - - device_ibo = NULL; - cudaMalloc((void**)&device_ibo, ibosize*sizeof(int)); - cudaMemcpy( device_ibo, ibo, ibosize*sizeof(int), cudaMemcpyHostToDevice); - - device_vbo = NULL; - cudaMalloc((void**)&device_vbo, vbosize*sizeof(float)); - cudaMemcpy( device_vbo, vbo, vbosize*sizeof(float), cudaMemcpyHostToDevice); - - device_cbo = NULL; - cudaMalloc((void**)&device_cbo, cbosize*sizeof(float)); - cudaMemcpy( device_cbo, cbo, cbosize*sizeof(float), cudaMemcpyHostToDevice); - - tileSize = 32; - int primitiveBlocks = ceil(((float)vbosize/3)/((float)tileSize)); - - //------------------------------ - //vertex shader - //------------------------------ - vertexShadeKernel<<<primitiveBlocks, tileSize>>>(device_vbo, vbosize); - - cudaDeviceSynchronize(); - //------------------------------ - //primitive assembly - //------------------------------ - primitiveBlocks = ceil(((float)ibosize/3)/((float)tileSize)); - primitiveAssemblyKernel<<<primitiveBlocks, tileSize>>>(device_vbo, vbosize, device_cbo, cbosize, device_ibo, ibosize, primitives); - - cudaDeviceSynchronize(); - //------------------------------ - //rasterization - //------------------------------ - rasterizationKernel<<<primitiveBlocks, tileSize>>>(primitives, ibosize/3, depthbuffer, resolution); - - cudaDeviceSynchronize(); - //------------------------------ - //fragment shader - //------------------------------ - fragmentShadeKernel<<<fullBlocksPerGrid, threadsPerBlock>>>(depthbuffer, resolution); - - cudaDeviceSynchronize(); - //------------------------------ - //write fragments to framebuffer - //------------------------------ - render<<<fullBlocksPerGrid, threadsPerBlock>>>(resolution, depthbuffer, framebuffer); - sendImageToPBO<<<fullBlocksPerGrid, threadsPerBlock>>>(PBOpos, resolution, framebuffer); - - cudaDeviceSynchronize(); - - kernelCleanup(); - - checkCUDAError("Kernel failed!"); + fragment frag; + frag.color = glm::vec3( 0.0f, 0.0f, 0.0f ); + frag.normal = glm::vec3( 0.0f, 0.0f, 0.0f ); + frag.position = glm::vec3( 0.0f, 0.0f, EMPTY_BUFFER_DEPTH ); + clearDepthBuffer<<< fullBlocksPerGrid, threadsPerBlock >>>( camera.resolution,
+ depthbuffer, + frag ); + + //------------------------------ + // memory stuff + //------------------------------ + primitives = NULL; + cudaMalloc( ( void** )&primitives, + ( ibosize / 3 ) * sizeof( triangle ) ); + + device_ibo = NULL; + cudaMalloc( ( void** )&device_ibo, + ibosize * sizeof( int ) ); + cudaMemcpy( device_ibo, + ibo, + ibosize * sizeof( int ), + cudaMemcpyHostToDevice ); + + device_vbo = NULL; + cudaMalloc( ( void** )&device_vbo, + vbosize * sizeof( float ) ); + cudaMemcpy( device_vbo, + vbo, + vbosize * sizeof( float ), + cudaMemcpyHostToDevice ); + + device_vbo_window_coords = NULL; + cudaMalloc( ( void** )&device_vbo_window_coords, + vbosize * sizeof( float ) ); + + device_lock_buffer = NULL; + cudaMalloc( ( void** )&device_lock_buffer, + ( int )camera.resolution.x * ( int )camera.resolution.y * sizeof( int ) ); + + device_cbo = NULL; + cudaMalloc( ( void** )&device_cbo, + cbosize * sizeof( float ) ); + cudaMemcpy( device_cbo, + cbo, + cbosize * sizeof( float ), + cudaMemcpyHostToDevice ); + + device_nbo = NULL; + cudaMalloc( ( void** )&device_nbo, + nbosize * sizeof( float ) ); + cudaMemcpy( device_nbo, + nbo, + nbosize * sizeof( float ), + cudaMemcpyHostToDevice ); + + tileSize = 32; + int primitiveBlocks = ceil( ( ( float )vbosize / 3 ) / ( ( float )tileSize ) ); + + //------------------------------ + // initialize lock buffer + //------------------------------ + + //timer.start(); + + clearLockBuffer<<< fullBlocksPerGrid, threadsPerBlock >>>( camera.resolution, + device_lock_buffer ); + cudaDeviceSynchronize(); + + //time_elapsed = timer.stop(); + //std::cout << "initialize lock buffer: " << time_elapsed << std::endl; + + //------------------------------ + // vertex shader + //------------------------------ + + //timer.start(); + + // Define model matrix. + // Transforms from object-space to world-space. + glm::mat4 model_matrix( 1.0f ); // Identity matrix. 
+ //glm::mat4 model_matrix = glm::rotate( glm::mat4( 1.0f ), frame * 2, glm::vec3( 0.0f, 1.0f, 0.0f )); + + // Define view matrix. + // Transforms from world-space to camera-space. + glm::mat4 view_matrix = glm::lookAt( camera.position, + camera.target, + camera.up ); + + // Define projection matrix. + // Transforms from camera-space to clip-space. + glm::mat4 projection_matrix = glm::perspective( camera.fov_y, + camera.resolution.x / camera.resolution.y, + camera.near_clip, + camera.far_clip ); + + vertexShadeKernel<<< primitiveBlocks, tileSize >>>( device_vbo, vbosize, + projection_matrix * view_matrix * model_matrix, + camera.resolution, + device_vbo_window_coords ); + cudaDeviceSynchronize(); + + //time_elapsed = timer.stop(); + //std::cout << "vertex shader: " << time_elapsed << std::endl; + + //------------------------------ + // primitive assembly + //------------------------------ + + //timer.start(); + + primitiveBlocks = ceil( ( ( float )ibosize / 3 ) / ( ( float )tileSize ) ); + primitiveAssemblyKernel<<< primitiveBlocks, tileSize >>>( device_vbo, vbosize, + device_cbo, cbosize, + device_ibo, ibosize, + device_nbo, nbosize, + device_vbo_window_coords, + primitives ); + cudaDeviceSynchronize(); + + //time_elapsed = timer.stop(); + //std::cout << "primitive assembly: " << time_elapsed << std::endl; + + //------------------------------ + // rasterization + //------------------------------ + + //timer.start(); + + rasterizationKernel<<< primitiveBlocks, tileSize >>>( primitives, + ibosize / 3, + depthbuffer, + camera.resolution, + device_lock_buffer ); + cudaDeviceSynchronize(); + + //time_elapsed = timer.stop(); + //std::cout << "rasterization: " << time_elapsed << std::endl; + + //------------------------------ + // fragment shader + //------------------------------ + + //timer.start(); + + fragmentShadeKernel<<< fullBlocksPerGrid, threadsPerBlock >>>( depthbuffer, + camera.resolution ); + cudaDeviceSynchronize(); + + //time_elapsed = timer.stop(); + 
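The vertex shader launched above hands window-space positions to the rasterizer via device_vbo_window_coords. The divide-and-viewport step at the heart of that hand-off can be sketched in plain host C++ (a simplified sketch; the kernel's actual buffer layout and conventions are assumptions):

```cpp
// Minimal vector types so the sketch is self-contained (no glm dependency).
struct Vec4 { float x, y, z, w; };
struct Vec3 { float x, y, z; };

// Clip space -> normalized device coordinates -> window coordinates.
// Assumes an NDC cube of [-1, 1]^3 and a viewport covering the full resolution.
Vec3 clipToWindow( Vec4 clip, float res_x, float res_y ) {
    // Perspective divide: clip space -> NDC.
    Vec3 ndc = { clip.x / clip.w, clip.y / clip.w, clip.z / clip.w };

    // Viewport transform: NDC [-1, 1] -> pixel coordinates [0, res).
    Vec3 window;
    window.x = ( ndc.x + 1.0f ) * 0.5f * res_x;
    window.y = ( ndc.y + 1.0f ) * 0.5f * res_y;
    window.z = ndc.z;  // Depth is kept for the depth test.
    return window;
}
```

A point at the center of the view (clip (0, 0, 0.5, 1)) lands at the center pixel of an 800x800 viewport, which is a quick sanity check on the mapping.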
//std::cout << "fragment shader: " << time_elapsed << std::endl; + + //------------------------------ + // anti-aliasing + //------------------------------ + + //timer.start(); + + antiAliasingPostProcess<<< fullBlocksPerGrid, threadsPerBlock >>>( depthbuffer, + camera.resolution ); + cudaDeviceSynchronize(); + + //time_elapsed = timer.stop(); + //std::cout << "anti-aliasing: " << time_elapsed << std::endl; + + //------------------------------ + // write fragments to framebuffer + //------------------------------ + + //timer.start(); + + render<<< fullBlocksPerGrid, threadsPerBlock >>>( camera.resolution, + depthbuffer, + framebuffer ); + sendImageToPBO<<< fullBlocksPerGrid, threadsPerBlock >>>( PBOpos, + camera.resolution, + framebuffer ); + cudaDeviceSynchronize(); + + //time_elapsed = timer.stop(); + //std::cout << "write fragments to framebuffer: " << time_elapsed << std::endl; + + kernelCleanup(); + checkCUDAError("Kernel failed!"); } void kernelCleanup(){ @@ -260,7 +682,9 @@ void kernelCleanup(){ cudaFree( device_vbo ); cudaFree( device_cbo ); cudaFree( device_ibo ); + cudaFree( device_nbo ); cudaFree( framebuffer ); cudaFree( depthbuffer ); -} - + cudaFree( device_vbo_window_coords ); + cudaFree( device_lock_buffer ); +} \ No newline at end of file diff --git a/src/rasterizeKernels.h b/src/rasterizeKernels.h index 784be17..005cf56 100644 --- a/src/rasterizeKernels.h +++ b/src/rasterizeKernels.h @@ -9,8 +9,17 @@ #include #include #include "glm/glm.hpp" +#include "glm/gtc/matrix_transform.hpp" + +#include "sceneStructs.h" void kernelCleanup(); -void cudaRasterizeCore(uchar4* pos, glm::vec2 resolution, float frame, float* vbo, int vbosize, float* cbo, int cbosize, int* ibo, int ibosize); +void cudaRasterizeCore( uchar4 *pos, + float frame, + float *vbo, int vbosize, + float *cbo, int cbosize, + int *ibo, int ibosize, + float *nbo, int nbosize, + simpleCamera camera ); #endif //RASTERIZEKERNEL_H diff --git a/src/rasterizeTools.h b/src/rasterizeTools.h index 
e9b5dcc..6f5201c 100644 --- a/src/rasterizeTools.h +++ b/src/rasterizeTools.h @@ -10,12 +10,47 @@ #include "cudaMat4.h" struct triangle { - glm::vec3 p0; - glm::vec3 p1; - glm::vec3 p2; - glm::vec3 c0; - glm::vec3 c1; - glm::vec3 c2; + // Default constructor. + __host__ __device__ triangle() + { + is_visible = true; + } + + // Constructor. + __host__ __device__ triangle( glm::vec3 p0, glm::vec3 p1, glm::vec3 p2, + glm::vec3 ssp0, glm::vec3 ssp1, glm::vec3 ssp2, + glm::vec3 c0, glm::vec3 c1, glm::vec3 c2, + glm::vec3 n0, glm::vec3 n1, glm::vec3 n2 ) : + p0( p0 ), p1( p1 ), p2( p2 ), + ssp0( ssp0 ), ssp1( ssp1 ), ssp2( ssp2 ), + c0( c0 ), c1( c1 ), c2( c2 ), + n0( n0 ), n1( n1 ), n2( n2 ) + { + is_visible = true; + } + + // Vertex positions. + glm::vec3 p0; + glm::vec3 p1; + glm::vec3 p2; + + // Vertex positions in screen-space. + glm::vec3 ssp0; + glm::vec3 ssp1; + glm::vec3 ssp2; + + // Vertex colors. + glm::vec3 c0; + glm::vec3 c1; + glm::vec3 c2; + + // Vertex normals. + glm::vec3 n0; + glm::vec3 n1; + glm::vec3 n2; + + // Should triangle be rasterized? 
+ bool is_visible; }; struct fragment{ @@ -34,13 +69,30 @@ __host__ __device__ glm::vec3 multiplyMV(cudaMat4 m, glm::vec4 v){ } //LOOK: finds the axis aligned bounding box for a given triangle -__host__ __device__ void getAABBForTriangle(triangle tri, glm::vec3& minpoint, glm::vec3& maxpoint){ - minpoint = glm::vec3(min(min(tri.p0.x, tri.p1.x),tri.p2.x), - min(min(tri.p0.y, tri.p1.y),tri.p2.y), - min(min(tri.p0.z, tri.p1.z),tri.p2.z)); - maxpoint = glm::vec3(max(max(tri.p0.x, tri.p1.x),tri.p2.x), - max(max(tri.p0.y, tri.p1.y),tri.p2.y), - max(max(tri.p0.z, tri.p1.z),tri.p2.z)); +//__host__ __device__ void getAABBForTriangle(triangle tri, glm::vec3& minpoint, glm::vec3& maxpoint){ +// minpoint = glm::vec3(min(min(tri.p0.x, tri.p1.x),tri.p2.x), +// min(min(tri.p0.y, tri.p1.y),tri.p2.y), +// min(min(tri.p0.z, tri.p1.z),tri.p2.z)); +// maxpoint = glm::vec3(max(max(tri.p0.x, tri.p1.x),tri.p2.x), +// max(max(tri.p0.y, tri.p1.y),tri.p2.y), +// max(max(tri.p0.z, tri.p1.z),tri.p2.z)); +//} + +__host__ +__device__ +void getAABBForTriangle( glm::vec3 tri_p0, glm::vec3 tri_p1, glm::vec3 tri_p2, glm::vec3& minpoint, glm::vec3& maxpoint ) +{ + triangle tri; + tri.p0 = tri_p0; + tri.p1 = tri_p1; + tri.p2 = tri_p2; + + minpoint = glm::vec3(min(min(tri.p0.x, tri.p1.x),tri.p2.x), + min(min(tri.p0.y, tri.p1.y),tri.p2.y), + min(min(tri.p0.z, tri.p1.z),tri.p2.z)); + maxpoint = glm::vec3(max(max(tri.p0.x, tri.p1.x),tri.p2.x), + max(max(tri.p0.y, tri.p1.y),tri.p2.y), + max(max(tri.p0.z, tri.p1.z),tri.p2.z)); } //LOOK: calculates the signed area of a given triangle @@ -56,23 +108,50 @@ __host__ __device__ float calculateBarycentricCoordinateValue(glm::vec2 a, glm:: } //LOOK: calculates barycentric coordinates -__host__ __device__ glm::vec3 calculateBarycentricCoordinate(triangle tri, glm::vec2 point){ - float beta = calculateBarycentricCoordinateValue(glm::vec2(tri.p0.x,tri.p0.y), point, glm::vec2(tri.p2.x,tri.p2.y), tri); - float gamma = 
calculateBarycentricCoordinateValue(glm::vec2(tri.p0.x,tri.p0.y), glm::vec2(tri.p1.x,tri.p1.y), point, tri); - float alpha = 1.0-beta-gamma; - return glm::vec3(alpha,beta,gamma); +//__host__ __device__ glm::vec3 calculateBarycentricCoordinate(triangle tri, glm::vec2 point){ +// float beta = calculateBarycentricCoordinateValue(glm::vec2(tri.p0.x,tri.p0.y), point, glm::vec2(tri.p2.x,tri.p2.y), tri); +// float gamma = calculateBarycentricCoordinateValue(glm::vec2(tri.p0.x,tri.p0.y), glm::vec2(tri.p1.x,tri.p1.y), point, tri); +// float alpha = 1.0-beta-gamma; +// return glm::vec3(alpha,beta,gamma); +//} + +// Compute barycentric coordinates. +__host__ +__device__ +glm::vec3 calculateBarycentricCoordinate( glm::vec3 tri_p0, glm::vec3 tri_p1, glm::vec3 tri_p2, glm::vec2 p ) +{ + triangle tri; + tri.p0 = tri_p0; + tri.p1 = tri_p1; + tri.p2 = tri_p2; + float beta = calculateBarycentricCoordinateValue( glm::vec2( tri.p0.x, tri.p0.y ), p, glm::vec2( tri.p2.x, tri.p2.y ), tri ); + float gamma = calculateBarycentricCoordinateValue( glm::vec2( tri.p0.x, tri.p0.y ), glm::vec2( tri.p1.x, tri.p1.y ), p, tri ); + float alpha = 1.0-beta-gamma; + return glm::vec3(alpha,beta,gamma); } //LOOK: checks if a barycentric coordinate is within the boundaries of a triangle -__host__ __device__ bool isBarycentricCoordInBounds(glm::vec3 barycentricCoord){ +__host__ __device__ bool isBarycentricCoordInBounds( glm::vec3 barycentricCoord ) +{ return barycentricCoord.x >= 0.0 && barycentricCoord.x <= 1.0 && barycentricCoord.y >= 0.0 && barycentricCoord.y <= 1.0 && barycentricCoord.z >= 0.0 && barycentricCoord.z <= 1.0; } //LOOK: for a given barycentric coordinate, return the corresponding z position on the triangle -__host__ __device__ float getZAtCoordinate(glm::vec3 barycentricCoord, triangle tri){ - return -(barycentricCoord.x*tri.p0.z + barycentricCoord.y*tri.p1.z + barycentricCoord.z*tri.p2.z); +//__host__ __device__ float getZAtCoordinate(glm::vec3 barycentricCoord, triangle tri){ +// return
-(barycentricCoord.x*tri.p0.z + barycentricCoord.y*tri.p1.z + barycentricCoord.z*tri.p2.z); +//} + +__host__ +__device__ +float getZAtCoordinate( glm::vec3 barycentricCoord, glm::vec3 tri_p0, glm::vec3 tri_p1, glm::vec3 tri_p2 ) +{ + triangle tri; + tri.p0 = tri_p0; + tri.p1 = tri_p1; + tri.p2 = tri_p2; + return -( barycentricCoord.x * tri.p0.z + barycentricCoord.y * tri.p1.z + barycentricCoord.z * tri.p2.z ); } #endif \ No newline at end of file diff --git a/src/sceneStructs.h b/src/sceneStructs.h new file mode 100644 index 0000000..232eb3f --- /dev/null +++ b/src/sceneStructs.h @@ -0,0 +1,35 @@ +#pragma once + +#ifndef _SCENE_STRUCTS +#define _SCENE_STRUCTS + +#include "glm/glm.hpp" +#include "cudaMat4.h" + +struct simpleCamera +{ + glm::vec3 position; + glm::vec3 target; + glm::vec3 up; + float fov_y; + glm::vec2 resolution; + float near_clip; + float far_clip; + //glm::vec3 translation; + //glm::vec3 rotation; + //cudaMat4 transform; +}; + +struct edge +{ + __host__ __device__ void setEdge( glm::vec3 vb, glm::vec3 vt, float y ) + { + dxdy = ( vt.x - vb.x ) / ( vt.y - vb.y ); + x = vb.x + ( y - vb.y ) * dxdy; + } + + float dxdy; + float x; +}; + +#endif \ No newline at end of file diff --git a/windows/PROJ4_Rasterizer/PROJ4_Rasterizer/PROJ4_Rasterizer.vcxproj b/windows/PROJ4_Rasterizer/PROJ4_Rasterizer/PROJ4_Rasterizer.vcxproj index f640485..3b2dd80 100644 --- a/windows/PROJ4_Rasterizer/PROJ4_Rasterizer/PROJ4_Rasterizer.vcxproj +++ b/windows/PROJ4_Rasterizer/PROJ4_Rasterizer/PROJ4_Rasterizer.vcxproj @@ -28,7 +28,7 @@ - + @@ -71,6 +71,8 @@ + + @@ -78,6 +80,7 @@ + @@ -87,6 +90,6 @@ - + \ No newline at end of file diff --git a/windows/PROJ4_Rasterizer/PROJ4_Rasterizer/PROJ4_Rasterizer.vcxproj.filters b/windows/PROJ4_Rasterizer/PROJ4_Rasterizer/PROJ4_Rasterizer.vcxproj.filters index 6a1d8cf..1134a14 100644 --- a/windows/PROJ4_Rasterizer/PROJ4_Rasterizer/PROJ4_Rasterizer.vcxproj.filters +++ b/windows/PROJ4_Rasterizer/PROJ4_Rasterizer/PROJ4_Rasterizer.vcxproj.filters 
@@ -30,6 +30,12 @@ Header Files + + Header Files + + + Header Files + @@ -47,6 +53,9 @@ Source Files + + Source Files +
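The barycentric helpers in rasterizeTools.h above follow the standard signed-area construction: each coordinate is the signed area of the sub-triangle opposite a vertex, divided by the area of the whole triangle. A self-contained host-side sketch of the same idea (names are illustrative, not the project's):

```cpp
struct Vec2 { float x, y; };

// Twice the signed area of triangle (a, b, c); the factor of two cancels
// in the ratios below, so it never needs to be divided out.
float signedArea2( Vec2 a, Vec2 b, Vec2 c ) {
    return ( b.x - a.x ) * ( c.y - a.y ) - ( c.x - a.x ) * ( b.y - a.y );
}

struct Bary { float alpha, beta, gamma; };

// Barycentric coordinates of p with respect to triangle (p0, p1, p2).
// beta weights p1, gamma weights p2, and alpha = 1 - beta - gamma weights p0.
Bary barycentric( Vec2 p0, Vec2 p1, Vec2 p2, Vec2 p ) {
    float area  = signedArea2( p0, p1, p2 );
    float beta  = signedArea2( p0, p, p2 ) / area;
    float gamma = signedArea2( p0, p1, p ) / area;
    return { 1.0f - beta - gamma, beta, gamma };
}

// A point lies inside (or on) the triangle iff all three coordinates are in [0, 1],
// mirroring the isBarycentricCoordInBounds check above.
bool inBounds( Bary b ) {
    return b.alpha >= 0.0f && b.alpha <= 1.0f &&
           b.beta  >= 0.0f && b.beta  <= 1.0f &&
           b.gamma >= 0.0f && b.gamma <= 1.0f;
}
```

For the unit right triangle (0,0), (1,0), (0,1), the point (0.25, 0.25) yields (alpha, beta, gamma) = (0.5, 0.25, 0.25), and the rasterizer's inside test is exactly the bounds check on those three values; interpolating per-vertex depth, color, or normals is then a weighted sum with the same three weights.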