README: pipeline; code: small cleanup

kainino0x · kainino0x · commit 17ea1fb4cba6 · 2015-09-27T15:57:42.000-04:00
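
The README hunk below describes the minimal pipeline only through pseudo-type-signatures. As a rough illustration of how those stages could chain together, here is a CPU-only C++ sketch. It is not the base code: `Vec3`, `Tri`, `Frag`, `vertexShade`, `assemble`, `rasterize`, and `resolve` are hypothetical names. The rasterizer tests every pixel against every triangle rather than using a scanline or tiled approach, and fragment shading is fused into rasterization with one constant color. There is no depth test and there are no atomics, in line with the README's advice to defer race handling until the basics work.

```cpp
// CPU-only sketch of the minimal pipeline; all names are hypothetical
// and do not correspond to the structs or kernels in the base code.
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Tri  { Vec3 v[3]; };                          // one assembled primitive
struct Frag { Vec3 color; float depth; bool covered; };

// Vertex shading: vertexIn[n] -> vertexOut[n].
// Pass-through, so the input is assumed to already be in NDC.
std::vector<Vec3> vertexShade(const std::vector<Vec3>& in) {
    return in;
}

// Primitive assembly: vertexOut[n] -> triangle[n/3] (no index buffer here).
std::vector<Tri> assemble(const std::vector<Vec3>& verts) {
    assert(verts.size() % 3 == 0);
    std::vector<Tri> tris(verts.size() / 3);
    for (std::size_t i = 0; i < tris.size(); ++i)
        for (int k = 0; k < 3; ++k)
            tris[i].v[k] = verts[3 * i + k];
    return tris;
}

// Edge function: its sign says which side of edge a->b the point (px, py) is on.
float edge(const Vec3& a, const Vec3& b, float px, float py) {
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

// Rasterization, constant-color fragment shading, and the depth buffer write,
// all fused. Brute force: every pixel center is tested against every triangle.
void rasterize(const std::vector<Tri>& tris, int w, int h,
               std::vector<Frag>& depthbuffer) {
    assert(depthbuffer.size() == static_cast<std::size_t>(w) * h);
    for (const Tri& t : tris) {
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                // Pixel center mapped from [0, w) x [0, h) to NDC [-1, 1].
                float px = (x + 0.5f) / w * 2.0f - 1.0f;
                float py = (y + 0.5f) / h * 2.0f - 1.0f;
                float e0 = edge(t.v[0], t.v[1], px, py);
                float e1 = edge(t.v[1], t.v[2], px, py);
                float e2 = edge(t.v[2], t.v[0], px, py);
                // Accept either winding order: all edge values share a sign.
                bool inside = (e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                              (e0 <= 0 && e1 <= 0 && e2 <= 0);
                if (inside)
                    depthbuffer[y * w + x] = Frag{{1.0f, 1.0f, 1.0f}, 0.0f, true};
            }
        }
    }
}

// Fragment to framebuffer writing: copy colors out of the depth buffer.
std::vector<Vec3> resolve(const std::vector<Frag>& depthbuffer) {
    std::vector<Vec3> framebuffer(depthbuffer.size());
    for (std::size_t i = 0; i < depthbuffer.size(); ++i)
        framebuffer[i] = depthbuffer[i].color;
    return framebuffer;
}
```

For a single triangle with NDC vertices (-1, -1), (1, -1), (-1, 1) on a 4x4 target, `rasterize` marks pixel (0, 0) as covered and leaves pixel (3, 3) untouched.
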
diff --git a/README.md b/README.md
@@ -35,7 +35,7 @@ A rasterizer is **NOT**:
   (... unless you do some fancy raytraced effects in your fragment shader).
   This project will let you generate graphics WITHOUT the need for ray casting!
 * An OpenGL rendering engine. You shouldn't write any new OpenGL code - think
-  of your project as a reimplementation of a few of OpenGL's features.
+  of your project as a reimplementation of OpenGL's core pipeline.
 
 Finally, note that, while this base code is meant to serve as a strong starting
 point for a CUDA path tracer, you are not required to use it if you don't want
@@ -86,10 +86,10 @@ You will need to implement the following features/pipeline stages:
 * Vertex shading.
 * (Vertex shader) perspective transformation.
 * Primitive assembly with support for triangle VBOs/IBOs.
-* Rasterization through either a scanline or a tiled approach.
+* Rasterization: **either** a scanline or a tiled approach.
 * Fragment shading.
 * A depth buffer for storing and depth testing fragments.
-* Fragment to framebuffer writing (**with** atomics for race avoidance).
+* Fragment to depth buffer writing (**with** atomics for race avoidance).
 * (Fragment shader) simple lighting scheme, such as Lambert or Blinn-Phong.
 
 See below for more guidance.
@@ -128,50 +128,83 @@ For each extra feature, please provide the following analysis:
 * How might this feature be optimized beyond your current implementation?
 
 
-## Minimal Rasterization Pipeline
+## Rasterization Pipeline
 
 **INSTRUCTOR TODO**: update README to explain a minimal pipeline to see a
 triangle, e.g., no depth test, draw in NDC, etc.
 
-* Vertex shading.
+Possible pipelines are described below. Pseudo-type-signatures are given.
+Not all of the pseudocode arrays will necessarily actually exist in practice.
+
+### Minimal Pipeline
+
+This describes a minimal version of *one possible* graphics pipeline, similar
+to modern hardware (DX/OpenGL). Yours need not match precisely. To begin, try
+to write a minimal amount of code as described here. This will reduce the
+necessary time spent debugging.
+
+* Vertex shading: 
+  * `vertexIn[n] vs_input -> vertexOut[n] vs_output`
   * A minimal vertex shader will apply no transformations at all - it draws
     directly in normalized device coordinates (NDC).
 * Primitive assembly.
+  * `vertexOut[n] vs_output -> triangle[n/3] primitives`
   * Start by supporting ONLY triangles.
 * Rasterization.
+  * `triangle[n/3] primitives -> fragmentIn[m] fs_input`
   * Scanline: TODO
-    * Optimization: scissor around rasterized triangle
   * Tiled: TODO
 * Fragment shading.
-  * A test fragment shader can produce the same color for every fragment.
-  * Try displaying various debug views (normals, etc.)
+  * `fragmentIn[m] fs_input -> fragmentOut[m] fs_output`
+  * A super-simple test fragment shader: output same color for every fragment.
+    * Also try displaying various debug views (normals, etc.)
+* Fragments to depth buffer.
+  * `fragmentOut[m] -> fragmentOut[resolution]`
+  * Can really be done inside the fragment shader.
+  * Results in race conditions - don't bother to fix these until it works!
 * A depth buffer for storing and depth testing fragments.
+  * `fragmentOut[resolution] depthbuffer`
   * An array of `fragment` objects.
   * At the end of a frame, it should contain the fragments drawn to the screen.
 * Fragment to framebuffer writing.
-  * You will need to use atomics for race avoidance, to prevent different
-    primitives from overwriting each other in the wrong order.
-  * You can ignore this when starting! The race conditions will only cause
-    visual artifacts.
-  * TODO
+  * `fragmentOut[resolution] depthbuffer -> vec3[resolution] framebuffer`
+  * Simply copies the colors out of the depth buffer into the framebuffer
+    (to be displayed on the screen).
+
+### Better Pipeline
+
+INSTRUCTOR TODO
+
+* Rasterization.
+  * Scanline:
+    * Optimization: scissor around rasterized triangle
+
+* Fragments to depth buffer.
+  * `fragmentOut[m] -> fragmentOut[resolution]`
+  * Can really be done inside the fragment shader.
+    * This allows you to do depth tests before spending execution time in
+      complex fragment shader code.
+  * When writing to the depth buffer, you will need to use atomics for race
+    avoidance, to prevent different primitives from overwriting each other in
+    the wrong order.
 
 
 ## Base Code Tour
 
-**INSTRUCTOR TODO:** update according to any code changes. LOOK -> CHECKITOUT.
+**INSTRUCTOR TODO:** update according to any code changes.
 TODO: simple structs for every part of the pipeline, intended to be changed?
 (e.g. vertexPre, vertexPost, triangle = vertexPre[3], fragment).
 TODO: autoformat code
 TODO: pragma once
 TODO: doxygen
 
-You will be working primarily in two files: `rasterizeKernel.cu`, and
-`rasterizerTools.h`. Within these files, areas that you need to complete are
+You will be working primarily in two files: `rasterize.cu`, and
+`rasterizeTools.h`. Within these files, areas that you need to complete are
 marked with a `TODO` comment. Areas that are useful to and serve as hints for
 optional features are marked with `TODO (Optional)`. Functions that are useful
-for reference are marked with the comment `LOOK`.
+for reference are marked with the comment `CHECKITOUT`.
 
-* `src/rasterizeKernels.cu` contains the core rasterization pipeline. 
+* `src/rasterize.cu` contains the core rasterization pipeline. 
   * A suggested sequence of kernels exists in this file, but you may choose to
     alter the order of this sequence or merge entire kernels if you see fit.
     For example, if you decide that doing has benefits, you can choose to merge
diff --git a/src/checkCUDAError.h b/src/checkCUDAError.h
@@ -0,0 +1,23 @@
+#define ERRORCHECK 1
+
+#define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)
+#define checkCUDAError(msg) checkCUDAErrorFn(msg, FILENAME, __LINE__)
+inline void checkCUDAErrorFn(const char *msg, const char *file, int line) {
+#if ERRORCHECK
+    cudaDeviceSynchronize();
+    cudaError_t err = cudaGetLastError();
+    if (cudaSuccess == err) {
+        return;
+    }
+
+    fprintf(stderr, "CUDA error");
+    if (file) {
+        fprintf(stderr, " (%s:%d)", file, line);
+    }
+    fprintf(stderr, ": %s: %s\n", msg, cudaGetErrorString(err));
+#  ifdef _WIN32
+    getchar();
+#  endif
+    exit(EXIT_FAILURE);
+#endif
+}
diff --git a/src/rasterize.cu b/src/rasterize.cu
@@ -1,11 +1,10 @@
-// CIS565 CUDA Rasterizer: A simple rasterization pipeline for Patrick Cozzi's CIS565: GPU Computing at the University of Pennsylvania
-// Written by Yining Karl Li, Copyright (c) 2012 University of Pennsylvania
+#include "rasterize.h"
 
-#include <stdio.h>
-#include <cuda.h>
 #include <cmath>
+#include <cstdio>
+#include <cuda.h>
 #include <thrust/random.h>
-#include "rasterizeKernels.h"
+#include "checkCUDAError.h"
 #include "rasterizeTools.h"
 
 glm::vec3* framebuffer;
@@ -15,15 +14,7 @@ float* device_cbo;
 int* device_ibo;
 triangle* primitives;
 
-void checkCUDAError(const char *msg) {
-  cudaError_t err = cudaGetLastError();
-  if( cudaSuccess != err) {
-    fprintf(stderr, "Cuda error: %s: %s.\n", msg, cudaGetErrorString( err) ); 
-    exit(EXIT_FAILURE); 
-  }
-} 
-
-//Handy dandy little hashing function that provides seeds for random number generation
+// Handy dandy little hashing function that provides seeds for random number generation
 __host__ __device__ unsigned int hash(unsigned int a){
     a = (a+0x7ed55d16) + (a<<12);
     a = (a^0xc761c23c) ^ (a>>19);
@@ -34,15 +25,15 @@ __host__ __device__ unsigned int hash(unsigned int a){
     return a;
 }
 
-//Writes a given fragment to a fragment buffer at a given location
+// Writes a given fragment to a fragment buffer at a given location
 __host__ __device__ void writeToDepthbuffer(int x, int y, fragment frag, fragment* depthbuffer, glm::vec2 resolution){
   if(x<resolution.x && y<resolution.y){
     int index = (y*resolution.x) + x;
     depthbuffer[index] = frag;
   }
 }
 
-//Reads a fragment from a given location in a fragment buffer
+// Reads a fragment from a given location in a fragment buffer
 __host__ __device__ fragment getFromDepthbuffer(int x, int y, fragment* depthbuffer, glm::vec2 resolution){
   if(x<resolution.x && y<resolution.y){
     int index = (y*resolution.x) + x;
@@ -53,15 +44,15 @@ __host__ __device__ fragment getFromDepthbuffer(int x, int y, fragment* depthbuf
   }
 }
 
-//Writes a given pixel to a pixel buffer at a given location
+// Writes a given pixel to a pixel buffer at a given location
 __host__ __device__ void writeToFramebuffer(int x, int y, glm::vec3 value, glm::vec3* framebuffer, glm::vec2 resolution){
   if(x<resolution.x && y<resolution.y){
     int index = (y*resolution.x) + x;
     framebuffer[index] = value;
   }
 }
 
-//Reads a pixel from a pixel buffer at a given location
+// Reads a pixel from a pixel buffer at a given location
 __host__ __device__ glm::vec3 getFromFramebuffer(int x, int y, glm::vec3* framebuffer, glm::vec2 resolution){
   if(x<resolution.x && y<resolution.y){
     int index = (y*resolution.x) + x;
@@ -71,7 +62,7 @@ __host__ __device__ glm::vec3 getFromFramebuffer(int x, int y, glm::vec3* frameb
   }
 }
 
-//Kernel that clears a given pixel buffer with a given color
+// Kernel that clears a given pixel buffer with a given color
 __global__ void clearImage(glm::vec2 resolution, glm::vec3* image, glm::vec3 color){
     int x = (blockIdx.x * blockDim.x) + threadIdx.x;
     int y = (blockIdx.y * blockDim.y) + threadIdx.y;
@@ -81,7 +72,7 @@ __global__ void clearImage(glm::vec2 resolution, glm::vec3* image, glm::vec3 col
     }
 }
 
-//Kernel that clears a given fragment buffer with a given fragment
+// Kernel that clears a given fragment buffer with a given fragment
 __global__ void clearDepthBuffer(glm::vec2 resolution, fragment* buffer, fragment frag){
     int x = (blockIdx.x * blockDim.x) + threadIdx.x;
     int y = (blockIdx.y * blockDim.y) + threadIdx.y;
@@ -94,7 +85,7 @@ __global__ void clearDepthBuffer(glm::vec2 resolution, fragment* buffer, fragmen
     }
 }
 
-//Kernel that writes the image to the OpenGL PBO directly. 
+// Kernel that writes the image to the OpenGL PBO directly. 
 __global__ void sendImageToPBO(uchar4* PBOpos, glm::vec2 resolution, glm::vec3* image){
   
   int x = (blockIdx.x * blockDim.x) + threadIdx.x;
@@ -128,29 +119,29 @@ __global__ void sendImageToPBO(uchar4* PBOpos, glm::vec2 resolution, glm::vec3*
   }
 }
 
-//TODO: Implement a vertex shader
+// TODO: Implement a vertex shader
 __global__ void vertexShadeKernel(float* vbo, int vbosize){
   int index = (blockIdx.x * blockDim.x) + threadIdx.x;
   if(index<vbosize/3){
   }
 }
 
-//TODO: Implement primative assembly
+// TODO: Implement primitive assembly
 __global__ void primitiveAssemblyKernel(float* vbo, int vbosize, float* cbo, int cbosize, int* ibo, int ibosize, triangle* primitives){
   int index = (blockIdx.x * blockDim.x) + threadIdx.x;
   int primitivesCount = ibosize/3;
   if(index<primitivesCount){
   }
 }
 
-//TODO: Implement a rasterization method, such as scanline.
+// TODO: Implement a rasterization method, such as scanline.
 __global__ void rasterizationKernel(triangle* primitives, int primitivesCount, fragment* depthbuffer, glm::vec2 resolution){
   int index = (blockIdx.x * blockDim.x) + threadIdx.x;
   if(index<primitivesCount){
   }
 }
 
-//TODO: Implement a fragment shader
+// TODO: Implement a fragment shader
 __global__ void fragmentShadeKernel(fragment* depthbuffer, glm::vec2 resolution){
   int x = (blockIdx.x * blockDim.x) + threadIdx.x;
   int y = (blockIdx.y * blockDim.y) + threadIdx.y;
@@ -159,7 +150,7 @@ __global__ void fragmentShadeKernel(fragment* depthbuffer, glm::vec2 resolution)
   }
 }
 
-//Writes fragment colors to the framebuffer
+// Writes fragment colors to the framebuffer
 __global__ void render(glm::vec2 resolution, fragment* depthbuffer, glm::vec3* framebuffer){
 
   int x = (blockIdx.x * blockDim.x) + threadIdx.x;
@@ -179,15 +170,15 @@ void cudaRasterizeCore(uchar4* PBOpos, glm::vec2 resolution, float frame, float*
   dim3 threadsPerBlock(tileSize, tileSize);
   dim3 fullBlocksPerGrid((int)ceil(float(resolution.x)/float(tileSize)), (int)ceil(float(resolution.y)/float(tileSize)));
 
-  //set up framebuffer
+  // set up framebuffer
   framebuffer = NULL;
   cudaMalloc((void**)&framebuffer, (int)resolution.x*(int)resolution.y*sizeof(glm::vec3));
   
-  //set up depthbuffer
+  // set up depthbuffer
   depthbuffer = NULL;
   cudaMalloc((void**)&depthbuffer, (int)resolution.x*(int)resolution.y*sizeof(fragment));
 
-  //kernel launches to black out accumulated/unaccumlated pixel buffers and clear our scattering states
+  // kernel launches to black out accumulated/unaccumulated pixel buffers and clear our scattering states
   clearImage<<<fullBlocksPerGrid, threadsPerBlock>>>(resolution, framebuffer, glm::vec3(0,0,0));
   
   fragment frag;
@@ -196,9 +187,9 @@ void cudaRasterizeCore(uchar4* PBOpos, glm::vec2 resolution, float frame, float*
   frag.position = glm::vec3(0,0,-10000);
   clearDepthBuffer<<<fullBlocksPerGrid, threadsPerBlock>>>(resolution, depthbuffer,frag);
 
-  //------------------------------
-  //memory stuff
-  //------------------------------
+  // ------------------------------
+  // memory stuff
+  // ------------------------------
   primitives = NULL;
   cudaMalloc((void**)&primitives, (ibosize/3)*sizeof(triangle));
 
@@ -217,34 +208,34 @@ void cudaRasterizeCore(uchar4* PBOpos, glm::vec2 resolution, float frame, float*
   tileSize = 32;
   int primitiveBlocks = ceil(((float)vbosize/3)/((float)tileSize));
 
-  //------------------------------
-  //vertex shader
-  //------------------------------
+  // ------------------------------
+  // vertex shader
+  // ------------------------------
   vertexShadeKernel<<<primitiveBlocks, tileSize>>>(device_vbo, vbosize);
 
   cudaDeviceSynchronize();
-  //------------------------------
-  //primitive assembly
-  //------------------------------
+  // ------------------------------
+  // primitive assembly
+  // ------------------------------
   primitiveBlocks = ceil(((float)ibosize/3)/((float)tileSize));
   primitiveAssemblyKernel<<<primitiveBlocks, tileSize>>>(device_vbo, vbosize, device_cbo, cbosize, device_ibo, ibosize, primitives);
 
   cudaDeviceSynchronize();
-  //------------------------------
-  //rasterization
-  //------------------------------
+  // ------------------------------
+  // rasterization
+  // ------------------------------
   rasterizationKernel<<<primitiveBlocks, tileSize>>>(primitives, ibosize/3, depthbuffer, resolution);
 
   cudaDeviceSynchronize();
-  //------------------------------
-  //fragment shader
-  //------------------------------
+  // ------------------------------
+  // fragment shader
+  // ------------------------------
   fragmentShadeKernel<<<fullBlocksPerGrid, threadsPerBlock>>>(depthbuffer, resolution);
 
   cudaDeviceSynchronize();
-  //------------------------------
-  //write fragments to framebuffer
-  //------------------------------
+  // ------------------------------
+  // write fragments to framebuffer
+  // ------------------------------
   render<<<fullBlocksPerGrid, threadsPerBlock>>>(resolution, depthbuffer, framebuffer);
   sendImageToPBO<<<fullBlocksPerGrid, threadsPerBlock>>>(PBOpos, resolution, framebuffer);
 
diff --git a/src/rasterize.h b/src/rasterize.h
@@ -0,0 +1,6 @@
+#pragma once
+
+#include <glm/glm.hpp>
+
+void kernelCleanup();
+void cudaRasterizeCore(uchar4* pos, glm::vec2 resolution, float frame, float* vbo, int vbosize, float* cbo, int cbosize, int* ibo, int ibosize);
diff --git a/src/rasterizeKernels.h b/src/rasterizeKernels.h
diff --git a/src/rasterizeTools.h b/src/rasterizeTools.h