Project 3: Constance Wang #32

Open: wants to merge 58 commits into base: main

58 commits
fdf4697
Update README.md
conswang Sep 21, 2022
f2a4cce
Merge branch 'CIS565-Fall-2022:main' into main
conswang Sep 21, 2022
5c9697f
keeps getting whiter
conswang Oct 20, 2022
f4792a5
fix really awful indexing mistake, move logic into scatterRay
conswang Oct 29, 2022
a22cd30
fix typo, working path tracer
conswang Oct 29, 2022
0bf53e5
some questions
conswang Oct 29, 2022
2f25776
add specular
conswang Oct 29, 2022
b0dbd37
add sort by material, not sure if working
conswang Oct 29, 2022
7a9a411
probably working first bounce caching
conswang Oct 29, 2022
a28eb17
update stb versions for compatability with tinyglft, add some test mo…
conswang Oct 30, 2022
7089e6a
fix namespace conflicts
conswang Oct 30, 2022
aade8e5
probably working binary buffer reading (values look suspicious)
conswang Oct 30, 2022
3f365e7
probably working mesh parsing
conswang Oct 30, 2022
296106d
probably working triangle meshs
conswang Oct 30, 2022
925c5f5
you can see a triangle now, but not the rest of the box for some reason
conswang Oct 31, 2022
159b9da
fix inside/outside test... add new scene for debugging. normals still…
conswang Oct 31, 2022
b5ed505
visible avocado
conswang Oct 31, 2022
ffe7136
test more models, maybe working scene graph traversal
conswang Oct 31, 2022
7dc2444
add image source data structure and loading
conswang Dec 27, 2022
c6a1acd
buggy code - can't use std::vector in cuda, need to find sln
conswang Dec 27, 2022
81bd902
buggy code #2 - can't memcpy into device pointer, need to refactor
conswang Dec 28, 2022
3fb4cb0
fix buffers but textures are wrong
conswang Dec 28, 2022
f47b0ca
add box with spaces for easier debugging
conswang Dec 28, 2022
c543132
working uvs with no interpolation
conswang Dec 28, 2022
363235d
working textures
conswang Dec 28, 2022
15057fb
add support for translation, rotation, and scale
conswang Dec 28, 2022
1135f2d
chess scene
conswang Dec 28, 2022
6cdc532
refactor triangle mesh intersection test
conswang Dec 29, 2022
562aeb3
add bvh preprocessing, code compiles
conswang Dec 29, 2022
f20d86c
add complete bvh construction, not tested but it compiles
conswang Dec 30, 2022
13ecbda
bvh constructed for 1 mesh
conswang Dec 30, 2022
ed1f597
buggy bvh intersection test
conswang Dec 30, 2022
e3b12bc
small changes
conswang Jan 1, 2023
acb9797
fix bug where materials without images have incorrect image id
conswang Jan 1, 2023
d4c1e8b
remove recursion to fix stack overflow, mesh is still inocrrect but n…
conswang Jan 1, 2023
7973768
correct mesh and fix indexing bugs for multiple bvh trees. Can now re…
conswang Jan 1, 2023
d785af7
larger room, have z buffer problem
conswang Jan 1, 2023
3df8b00
add normal maps which are not working
conswang Jan 1, 2023
691bb9e
add combined gltf and txt scene loading, fix transformations
conswang Jan 1, 2023
aa6d887
don't need to transform intersection point whoops
conswang Jan 1, 2023
0d5a758
cleanup comments, replace black with background_color val
conswang Jan 1, 2023
ec8befe
antialiasing
conswang Jan 1, 2023
41800b4
switch to uniform distrib, normal is too ridiculous to calculate
conswang Jan 1, 2023
99bb37a
make avocado cornell box more aesthetic, fix t value and intersesctio…
conswang Jan 1, 2023
43f592a
adjust avocado
conswang Jan 1, 2023
ca65df9
set up tangents
conswang Jan 2, 2023
7e67753
should multiply tangent and normal by invtranspose not transform
conswang Jan 2, 2023
fcbb339
correct backwards normals from intersection inside/outside check
conswang Jan 2, 2023
49fead8
working metallicness, tested on motorcycle
conswang Jan 2, 2023
5d14ec2
add readme and renders, clean up macros
conswang Jan 2, 2023
e400294
add AA to README
conswang Jan 2, 2023
643dc33
metallic shader desc
conswang Jan 2, 2023
56df53c
add normal mapping to read me
conswang Jan 2, 2023
6aeb740
bvh analysis
conswang Jan 3, 2023
677f8b7
sorting
conswang Jan 3, 2023
43e85ee
caching
conswang Jan 3, 2023
6ef3f2b
update debug
conswang Jan 3, 2023
d92376c
update macro list
conswang Jan 3, 2023

README.md: 168 changes (163 additions, 5 deletions)
@@ -3,11 +3,169 @@ CUDA Path Tracer

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3**

![](img/cover-image.png)
`motorcycle.txt, motorcycle.gltf`: 5000 samples, depth 8, 960 x 720 px

Constance Wang
* [LinkedIn](https://www.linkedin.com/in/conswang/)

Tested on AORUS 15P XD laptop with specs:
- Windows 11 22000.856
- 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30 GHz
- NVIDIA GeForce RTX 3070 Laptop GPU

### Features
This is a Monte Carlo path tracer written in CUDA, with GPU-accelerated intersection tests, shading, and path culling.

- Core features
  - Diffuse and perfect specular shaders
  - Performance optimizations
    - Sorting rays by material
    - Path termination using stream compaction
    - Caching first bounce intersections
- Additional features
  - glTF 2.0 loading & rendering
  - Texture mapping & normal mapping
  - Metallic shader
  - Bounding volume hierarchy
  - Stochastic sampled anti-aliasing

### Usage
The base code has been modified to take two arguments: the first is a path to a scene file in the original `.txt` format, and the second (optional) is a path to a glTF file.

```
./pathtracer.exe motorcycle.txt [motorcycle.gltf]
```

#### Dependencies
- Clone [tinygltf](https://github.com/syoyo/tinygltf) and add `tiny_gltf.h` to the external includes

### Feature Toggles
All macros are defined in `sceneStructs.h`.
- Performance
  - `SORT_BY_MATERIALS`
  - `BVH`: toggle bounding volume hierarchy
  - `CACHE_FIRST_BOUNCE`
- Visual
  - `ANTI_ALIAS`
  - `ROUGHNESS_METALLIC`: render metallic shader
- Debugging
  - `SHOW_NORMALS`: render normals as colour
  - `SHOW_METALLIC`: render metallicness as colour
  - `DEBUG_GLTF_TEXTURES`: render Lambert instead of the glTF texture
  - `MEASURE_PERF`: print runtime

### Core

The dragon and ball are perfect specular surfaces; the walls and box are diffuse surfaces.

![](img/specular-dragon.png)
`avocado_cornell.txt, low_res_dragon.gltf`: 2000 samples, depth 8, 800 x 800 px

At each depth, stream compaction removes rays that don't hit anything from the set of active paths.
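The compaction step can be sketched on the host with `std::stable_partition`; Thrust provides `thrust::stable_partition` with the same shape for device arrays. The `PathSegment` below is a hypothetical, trimmed-down struct (the project's real one has more fields), and using `remainingBounces == 0` to mark a terminated path is an assumption. A partition, rather than a removal, keeps the terminated segments at the end of the buffer so their colour can still be written to the image.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical, simplified path segment (not the project's actual struct).
struct PathSegment {
    int pixelIndex;
    int remainingBounces; // assumed to be 0 once the path has terminated (e.g. missed the scene)
};

// Moves live paths to the front (keeping their relative order) and returns how
// many are still alive; only that prefix is processed at the next depth.
int compactPaths(std::vector<PathSegment>& paths) {
    auto firstDead = std::stable_partition(
        paths.begin(), paths.end(),
        [](const PathSegment& p) { return p.remainingBounces > 0; });
    return static_cast<int>(firstDead - paths.begin());
}
```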

### Sorting Rays by Material
I added a toggle to sort the path segments by material before shading them. In theory, this should improve performance because threads shading the same material end up next to each other, and therefore in the same warp. Those threads follow the same code path, so there is no branch divergence; if each warp instead held a random mix of materials, its threads would take different branches that execute one after another, delaying each other. The effect should be more noticeable with more materials (up to the point where every ray hits a different material, in which case there's no reason to sort).
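A host-side sketch of the reordering, assuming hypothetical `ShadeableIntersection` and `PathSegment` structs: both arrays are permuted by the same stable sort on material ID so each path stays paired with its intersection. On the GPU this is commonly done in one call with `thrust::sort_by_key`, which is the extra sort kernel whose cost is measured below.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical, simplified records (not the project's actual structs).
struct ShadeableIntersection {
    float t;
    int materialId;
};

struct PathSegment {
    int pixelIndex;
    int remainingBounces;
};

// Reorders intersections and paths together so equal material IDs are contiguous.
void sortByMaterial(std::vector<ShadeableIntersection>& isects,
                    std::vector<PathSegment>& paths) {
    std::vector<std::size_t> order(isects.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return isects[a].materialId < isects[b].materialId;
    });

    std::vector<ShadeableIntersection> sortedIsects;
    std::vector<PathSegment> sortedPaths;
    sortedIsects.reserve(order.size());
    sortedPaths.reserve(order.size());
    for (std::size_t i : order) {
        sortedIsects.push_back(isects[i]);
        sortedPaths.push_back(paths[i]); // same permutation keeps the pairs aligned
    }
    isects.swap(sortedIsects);
    paths.swap(sortedPaths);
}
```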

![](img/sorting-perf.png)

However, the overhead of sorting seems to outweigh the benefit: with 5 materials in the Cornell box, sorting is twice as slow as not sorting, and with 26 materials in the motorcycle scene, sorting is only marginally faster than not sorting.

I even tried a scene with as many materials as I could find (40 materials in `many_materials.gltf` + 7 materials in `avocado_cornell.txt` = 47 materials), and the runtime is very similar with and without sorting. However, many of these materials don't differ in how they are actually shaded: they are different materials, but they simply index into the image buffer at a different place.

The shading stage may not be complex enough for sorting to pay off. This optimization would also be more effective if rays could be grouped by material without an extra sort kernel on every bounce, which would probably mean switching to wavefront path tracing.

### First Bounce Caching
The first intersection of each camera ray with the scene is the same in every iteration, so it makes sense to cache it (unless the camera rays vary between iterations, e.g. due to depth of field or anti-aliasing).

However, I found that the performance is almost exactly the same, with or without caching, even at depth = 1.

![](img/caching-perf.png)

Looking at the calculations in `generateRayFromCamera`, they are all pretty lightweight, so it seems that this kernel has about the same runtime as a cudaMemcpy of the cache into the path segments buffer.
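The caching logic amounts to "compute once, then copy". A host-side sketch, where `FirstBounceCache` and the `intersect` callback are hypothetical stand-ins for the project's buffers and intersection kernel; on the GPU, the copy in `get` would correspond to a `cudaMemcpy` with `cudaMemcpyDeviceToDevice` into the working intersection buffer, which is the cost being compared above.

```cpp
#include <functional>
#include <vector>

// Hypothetical, simplified intersection record.
struct ShadeableIntersection {
    float t;
    int materialId;
};

// Computes depth-0 intersections on the first iteration and replays them afterwards.
class FirstBounceCache {
public:
    std::vector<ShadeableIntersection> get(
        int numPixels, const std::function<ShadeableIntersection(int)>& intersect) {
        if (!valid_) {
            cache_.resize(numPixels);
            for (int i = 0; i < numPixels; ++i) cache_[i] = intersect(i);
            valid_ = true;
        }
        return cache_; // a copy, mirroring the memcpy into the working buffer
    }

    // Must be called whenever camera rays change (camera moves, AA jitter, DOF).
    void invalidate() { valid_ = false; }

private:
    bool valid_ = false;
    std::vector<ShadeableIntersection> cache_;
};
```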

### GLTF
Most glTF files (a `.gltf` file plus separate textures) exported from Blender can be loaded and rendered without errors. The base code's scene parser loads the lights and camera, while tinygltf loads the meshes.

- Scene graph traversal is supported
  - Both matrix and translation/rotation/scale attributes are supported to describe local transformations of nodes
  - See `motorcycle.gltf` for an example of a complex scene with many nodes in a tree-like structure
- Copies position, normal, tangent, UV, and index buffers into an interleaved array on the GPU
- Texture loading
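Scene graph traversal boils down to accumulating `world = parentWorld * local` down the node tree, where the glTF specification defines a node's local transform either as an explicit matrix or as `T * R * S` built from its translation, rotation (a unit quaternion stored as x, y, z, w), and scale. The sketch below uses a hand-rolled row-major matrix as a stand-in for `glm::mat4`; the `Node` struct is hypothetical, and an explicit matrix is assumed to be already converted from glTF's column-major storage.

```cpp
#include <array>
#include <utility>
#include <vector>

using Mat4 = std::array<double, 16>; // row-major, acts on column vectors

Mat4 identity() {
    Mat4 m{};
    m[0] = m[5] = m[10] = m[15] = 1.0;
    return m;
}

Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k) r[i * 4 + j] += a[i * 4 + k] * b[k * 4 + j];
    return r;
}

// Hypothetical node: either an explicit matrix or T/R/S, plus child indices.
struct Node {
    bool hasMatrix = false;
    Mat4 matrix = identity();            // assumed already converted to row-major
    std::array<double, 3> t{0, 0, 0};
    std::array<double, 4> r{0, 0, 0, 1}; // unit quaternion (x, y, z, w)
    std::array<double, 3> s{1, 1, 1};
    std::vector<int> children;
};

// Local transform = T * R * S, as the glTF specification prescribes.
Mat4 localTransform(const Node& n) {
    if (n.hasMatrix) return n.matrix;
    const double x = n.r[0], y = n.r[1], z = n.r[2], w = n.r[3];
    Mat4 R = {1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w), 0,
              2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w), 0,
              2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y), 0,
              0, 0, 0, 1};
    Mat4 T = identity();
    T[3] = n.t[0]; T[7] = n.t[1]; T[11] = n.t[2];
    Mat4 S = identity();
    S[0] = n.s[0]; S[5] = n.s[1]; S[10] = n.s[2];
    return mul(T, mul(R, S));
}

// Depth-first traversal with an explicit stack of (node, parent world transform).
std::vector<Mat4> worldTransforms(const std::vector<Node>& nodes, const std::vector<int>& roots) {
    std::vector<Mat4> world(nodes.size(), identity());
    std::vector<std::pair<int, Mat4>> stack;
    for (int r : roots) stack.push_back({r, identity()});
    while (!stack.empty()) {
        int idx = stack.back().first;
        Mat4 parent = stack.back().second;
        stack.pop_back();
        world[idx] = mul(parent, localTransform(nodes[idx]));
        for (int c : nodes[idx].children) stack.push_back({c, world[idx]});
    }
    return world;
}
```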

### Texture Mapping
Tinygltf loads the images into arrays, which are then copied to the GPU as one contiguous image array (`dev_imageBuffers`). The dimensions of each image are stored separately and used to index into this buffer. UVs are interpolated and converted to pixel coordinates, which are used to sample the image buffer for the colour.
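Sampling one image out of a concatenated buffer only needs that image's starting offset and dimensions. A sketch with a hypothetical `ImageInfo` record and nearest-neighbour lookup; repeat wrapping is assumed because it is glTF's default sampler mode, and glTF places UV (0, 0) at the image's top-left corner, so `v` maps directly to the row index for images stored top row first.

```cpp
#include <cmath>
#include <vector>

struct Color { float r, g, b; };

// Hypothetical per-image record describing where an image lives in the shared buffer.
struct ImageInfo {
    int offset; // index of the image's first pixel in the concatenated buffer
    int width;
    int height;
};

// Nearest-neighbour lookup with repeat wrapping.
Color sampleTexture(const std::vector<Color>& imageBuffer, const ImageInfo& img,
                    float u, float v) {
    u -= std::floor(u); // wrap into [0, 1)
    v -= std::floor(v);
    int x = static_cast<int>(u * img.width);
    int y = static_cast<int>(v * img.height);
    if (x >= img.width) x = img.width - 1; // guard against rounding up to width
    if (y >= img.height) y = img.height - 1;
    return imageBuffer[img.offset + y * img.width + x];
}
```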

#### Normal Map
Gltf normal textures must be in tangent space. They are transformed into world space using a TBN matrix. Intersection surface normals and tangents are interpolated from the vertex normal and tangent buffers from the file. The resulting normals can be debugged as colours by setting `SHOW_NORMALS` to 1.
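The TBN step can be sketched as follows. The colour is unpacked from [0, 1] to [-1, 1], and the bitangent is `cross(normal, tangent.xyz) * tangent.w` as the glTF specification defines it. Re-orthogonalizing the interpolated tangent against the normal (Gram-Schmidt) is an added assumption that keeps the basis orthonormal after interpolation; the function and helper names are hypothetical.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
Vec3 normalize(Vec3 a) { return scale(a, 1.0f / std::sqrt(dot(a, a))); }

// texel: normal-map colour in [0, 1]; n, t: interpolated world-space normal and
// tangent; tangentW: the glTF tangent's w component (+1 or -1, handedness).
Vec3 applyNormalMap(Vec3 texel, Vec3 n, Vec3 t, float tangentW) {
    Vec3 ts = {2.0f * texel.x - 1.0f, 2.0f * texel.y - 1.0f, 2.0f * texel.z - 1.0f};
    n = normalize(n);
    t = normalize(add(t, scale(n, -dot(n, t)))); // make T orthogonal to N
    Vec3 b = scale(cross(n, t), tangentW);       // bitangent per the glTF spec
    // World-space normal = [T B N] * ts.
    return normalize(add(add(scale(t, ts.x), scale(b, ts.y)), scale(n, ts.z)));
}
```

A texel of (0.5, 0.5, 1) decodes to (0, 0, 1) in tangent space and therefore leaves the surface normal unchanged.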

| Surface normals | Normal texture map | Combined: TBN * normal texture map |
| ----------------| ----------------- | -----------------|
|![](img/surface-normals.png) | ![](img/normal-texture.png) | ![](img/resulting-normals.png) |

With diffuse shaders, a normal map is hard to notice because the light bounces in a random hemisphere around the normal anyway; the resulting bounce won't be too different from before. That's why I also tested the normal map with the metallic shader.

| Metallic shading with surface normals | Metallic shading with normal map |
| ----------------| ----------------- |
| ![](img/metal-no-normal-map.png) | ![](img/metal-with-normal-texture.png) |

`metal.txt, metal.gltf`: 2000 samples, depth 8, 800 x 800 px

The render with the normal map has more highlights and the highlights line up better with the texture, making it look more like a rusty piece of metal than the render using interpolated surface normals.

### Metallic Shader
I partially implemented glTF's microfacet (PBR metallic/roughness) workflow by adding a metallic shader. The metallic value, from 0 to 1, comes either from the glTF material's `pbrMetallicRoughness.metallicFactor` or from a texture whose blue channel holds the metallic value. The metallic value is used to interpolate between the diffuse and metallic shaders. The metallic shader is simplified to a specular shader tinted by the base color.
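One possible way to realize the interpolation in a path tracer is stochastic: choose the metal lobe with probability equal to the metallic value, so the average over many samples blends the two shaders linearly. This is a sketch of that scheme, not necessarily how the project does it; all names are hypothetical. Per the glTF specification, the texture's blue channel is multiplied by `metallicFactor` (default 1.0).

```cpp
struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Mirror reflection of direction d about unit normal n: d - 2 (d . n) n.
Vec3 reflect(Vec3 d, Vec3 n) {
    float k = 2.0f * dot(d, n);
    return {d.x - k * n.x, d.y - k * n.y, d.z - k * n.z};
}

// Metalness comes from metallicFactor, scaled by the texture's blue channel if present.
float metallicValue(bool hasTexture, float metallicFactor, float texelBlue) {
    return hasTexture ? metallicFactor * texelBlue : metallicFactor;
}

enum class Lobe { Diffuse, Metal };

// u is uniform in [0, 1); the metal lobe is picked with probability `metallic`.
Lobe chooseLobe(float metallic, float u) {
    return u < metallic ? Lobe::Metal : Lobe::Diffuse;
}

// Metal lobe: perfect mirror bounce, with the throughput tinted by the base colour.
Vec3 metalBounce(Vec3 incoming, Vec3 normal, Vec3 baseColor, Vec3& throughput) {
    throughput = {throughput.x * baseColor.x, throughput.y * baseColor.y,
                  throughput.z * baseColor.z};
    return reflect(incoming, normal);
}
```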

![](img/metallic_box.png)
`box.txt, Box With Spaces.gltf`: 1000 samples, depth 8
According to the box's texture, only the GLTF letters are metallic, so there is no highlight on the black part. By setting `#define SHOW_METALLIC 1`, we can debug the metallic value. Here is what it looks like for the motorcycle scene. Brighter blue means more metallic. On the motorcycle, the metallic factor is defined for the entire material, whereas on the vending machine, the metallic factor comes from a texture.

![](img/metallic-debug.png)

### Bounding Volume Hierarchy

As scenes grow to hundreds of thousands of triangles, ray intersection tests quickly become a bottleneck. I implemented a bounding volume hierarchy (BVH) data structure and sorted vertices by their bounding boxes. The constructor in `bvh.cpp` creates one BVH for each primitive `Geom`, which is necessary because primitives can have different transformations. One challenge was serializing the tree structure of the BVH, since pointer-based data structures can't simply be copied from the CPU to the GPU.

The code was adapted from [this site](https://jacco.ompf2.com/2022/04/13/how-to-build-a-bvh-part-1-basics/), except that the BVH traversal on the GPU is iterative rather than recursive. Recursive traversal on the GPU quickly overflowed the stack, even on the Avocado mesh, whose BVH has only 83 nodes, which shows how limited the per-thread stack is. Instead, I traverse the BVH with a fixed-size stack that keeps track of the next nodes to search.
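A host-side sketch of stack-based traversal over a flattened BVH. The node layout is an assumption in the spirit of the linked article: an interior node stores its two children at indices `leftFirst` and `leftFirst + 1`, and a leaf stores its first triangle index in `leftFirst` with `triCount > 0`. To stay short, the sketch only collects the leaves whose boxes the ray hits; a real kernel would intersect their triangles right away and shrink `tMax` to the closest hit. With a depth-first order that pushes both children, the stack never holds more than depth + 1 entries.

```cpp
#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };

// Assumed flattened node layout (hypothetical).
struct BVHNode {
    Vec3 aabbMin, aabbMax;
    int leftFirst; // interior: index of left child; leaf: first triangle index
    int triCount;  // 0 for interior nodes
};

// Slab test: does origin + t * dir hit the box for some t in [0, tMax]?
// Zero direction components give infinite invDir; an origin lying exactly on a
// slab plane would then produce NaN, which robust versions handle explicitly.
bool hitAABB(Vec3 o, Vec3 invDir, Vec3 bmin, Vec3 bmax, float tMax) {
    float tx1 = (bmin.x - o.x) * invDir.x, tx2 = (bmax.x - o.x) * invDir.x;
    float tmin = std::min(tx1, tx2), tmax = std::max(tx1, tx2);
    float ty1 = (bmin.y - o.y) * invDir.y, ty2 = (bmax.y - o.y) * invDir.y;
    tmin = std::max(tmin, std::min(ty1, ty2));
    tmax = std::min(tmax, std::max(ty1, ty2));
    float tz1 = (bmin.z - o.z) * invDir.z, tz2 = (bmax.z - o.z) * invDir.z;
    tmin = std::max(tmin, std::min(tz1, tz2));
    tmax = std::min(tmax, std::max(tz1, tz2));
    return tmax >= tmin && tmax >= 0.0f && tmin <= tMax;
}

constexpr int kStackSize = 64; // fixed-size stack instead of recursion

// Returns the first-triangle index of every leaf whose box the ray hits.
std::vector<int> leavesHit(const std::vector<BVHNode>& nodes, Vec3 o, Vec3 dir, float tMax) {
    Vec3 invDir = {1.0f / dir.x, 1.0f / dir.y, 1.0f / dir.z};
    std::vector<int> hits;
    int stack[kStackSize];
    int top = 0;
    stack[top++] = 0; // root
    while (top > 0) {
        const BVHNode& node = nodes[stack[--top]];
        if (!hitAABB(o, invDir, node.aabbMin, node.aabbMax, tMax)) continue;
        if (node.triCount > 0) {
            hits.push_back(node.leftFirst);
            continue;
        }
        if (top + 2 > kStackSize) break; // overflow guard: assumes the tree is shallow enough
        stack[top++] = node.leftFirst;
        stack[top++] = node.leftFirst + 1;
    }
    return hits;
}
```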

![](img/bvh-perf.png)

Bounding volumes are split along their longest axis. This worked better for the motorcycle scene than for others, such as the table.
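The longest-axis rule can be sketched as follows: pick the axis with the largest extent of the node's bounds, split at its midpoint, and partition the triangles in place by centroid. For brevity the sketch partitions centroids directly, whereas a real builder would partition triangle indices; the function names are hypothetical. If either side ends up empty, the node should stay a leaf.

```cpp
#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };

float axisValue(const Vec3& v, int axis) { return axis == 0 ? v.x : (axis == 1 ? v.y : v.z); }

// Splits centroids[first, first + count) at the midpoint of the longest axis of
// [bmin, bmax]. Returns how many went to the left side; the chosen axis goes in axisOut.
int splitLongestAxis(std::vector<Vec3>& centroids, int first, int count,
                     Vec3 bmin, Vec3 bmax, int& axisOut) {
    Vec3 extent = {bmax.x - bmin.x, bmax.y - bmin.y, bmax.z - bmin.z};
    int axis = 0;
    if (extent.y > extent.x) axis = 1;
    if (extent.z > axisValue(extent, axis)) axis = 2;
    float splitPos = axisValue(bmin, axis) + axisValue(extent, axis) * 0.5f;

    auto begin = centroids.begin() + first;
    auto mid = std::partition(begin, begin + count, [&](const Vec3& c) {
        return axisValue(c, axis) < splitPos;
    });
    axisOut = axis;
    return static_cast<int>(mid - begin); // 0 or count means: keep this node a leaf
}
```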

The tree structures can give some insight. Each scene can have multiple meshes, and therefore multiple BVHs with different structures. If a scene has multiple BVHs, each column reports the worst case, i.e. the maximum across all the BVH trees generated for the scene.

| Gltf Scene | Number of BVHs | Total Number of nodes | Depth | Maximum leaf size |
|-----|-----|-----|----| --- |
| Avocado | 1 | 83 | 10 | 191 |
| Low Res Dragon | 1 | 1731 | 18 | 1534 |
| Motorcycle | 98 | 1777 | 21 | 1242 |
| Table | 3 | 3265 | 24 | 10607 |
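Statistics like those in the table can be gathered with one traversal per tree, followed by a per-column maximum across a scene's trees as described above. The node layout below (children at `leftFirst` and `leftFirst + 1`, leaves marked by `triCount > 0`) is an assumption, and depth is counted here in edges from the root, which may differ from the convention behind the table. Counting nodes by traversal rather than by array size also skips any unused slots in the node array.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Minimal assumed node layout for this sketch.
struct StatNode {
    int leftFirst; // interior: left child index; leaf: first triangle index
    int triCount;  // 0 for interior nodes
};

struct BVHStats {
    int nodes = 0;
    int depth = 0;       // edges from the root to the deepest node
    int maxLeafSize = 0; // triangles in the largest leaf
};

BVHStats computeStats(const std::vector<StatNode>& tree) {
    BVHStats s;
    std::vector<std::pair<int, int>> stack; // (node index, depth)
    stack.push_back({0, 0});
    while (!stack.empty()) {
        int idx = stack.back().first;
        int d = stack.back().second;
        stack.pop_back();
        const StatNode& n = tree[idx];
        ++s.nodes;
        s.depth = std::max(s.depth, d);
        if (n.triCount > 0) {
            s.maxLeafSize = std::max(s.maxLeafSize, n.triCount);
        } else {
            stack.push_back({n.leftFirst, d + 1});
            stack.push_back({n.leftFirst + 1, d + 1});
        }
    }
    return s;
}

// Per-column maximum over all BVHs of a scene.
BVHStats worstCase(const std::vector<BVHStats>& perTree) {
    BVHStats w;
    for (const BVHStats& s : perTree) {
        w.nodes = std::max(w.nodes, s.nodes);
        w.depth = std::max(w.depth, s.depth);
        w.maxLeafSize = std::max(w.maxLeafSize, s.maxLeafSize);
    }
    return w;
}
```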

Because the threads need to sync after each iteration of path tracing, as long as one ray passes through the bounding box of a large leaf, the other threads have to wait for its intersection tests to finish. As a result, each iteration's speed is bottlenecked by the maximum leaf size. With more time, it would be ideal to implement a better BVH splitting algorithm, or to convert the BVH into a 4-ary or 8-ary tree to flatten it further.

### Anti-Aliasing
I implemented anti-aliasing by jittering the camera ray in the up and right directions by an amount controlled by `boxSize`, i.e. jitter ~ uniform(-boxSize/2, boxSize/2). This looks visually pleasing enough that it wasn't worth using a Gaussian distribution, since calculating its pdf would be much more expensive.
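A sketch of the jitter, assuming the offset is applied in pixel units before the pixel is mapped to a ray direction. The `jitteredDirection` mapping is a generic pinhole camera (view, right, and up vectors plus a per-pixel length), and its sign conventions may differ from the project's `generateRayFromCamera`; all names are hypothetical.

```cpp
#include <cmath>
#include <random>

struct Vec3 { float x, y, z; };
struct Jitter { float dx, dy; };

// Offset on each axis drawn uniformly from about [-boxSize/2, boxSize/2].
Jitter sampleJitter(float boxSize, std::mt19937& rng) {
    std::uniform_real_distribution<float> u01(0.0f, 1.0f);
    return {(u01(rng) - 0.5f) * boxSize, (u01(rng) - 0.5f) * boxSize};
}

// Generic pinhole mapping of the jittered pixel (x + dx, y + dy) to a direction.
Vec3 jitteredDirection(Vec3 view, Vec3 right, Vec3 up, float pixelLength,
                       int x, int y, int resX, int resY, Jitter j) {
    float px = pixelLength * ((x + j.dx) - resX * 0.5f);
    float py = pixelLength * ((y + j.dy) - resY * 0.5f);
    // Image rows grow downward, so +y in pixels moves against `up`.
    Vec3 d = {view.x + right.x * px - up.x * py,
              view.y + right.y * px - up.y * py,
              view.z + right.z * px - up.z * py};
    float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return {d.x / len, d.y / len, d.z / len};
}
```

Because the offset changes every iteration, the depth-0 rays differ between iterations, which is why the first bounce can no longer be cached.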

When anti-aliasing is ON, first bounce caching must be turned OFF. AA itself has a negligible performance cost; its real cost is that the first bounce must be recomputed every iteration.

| boxSize | Scene | Close-up |
|--------|------|-------|
| 0 (no AA) |![](img/antialias_cornell_avocado_0.png) | ![](img/aa-0-zoom.png) |
|1|![](img/antialias_cornell_avocado_1.png) | ![](img/aa-1-zoom.png)|
| 2 | ![](img/antialias_cornell_avocado_2.png) | ![](img/aa-2-zoom.png) |

`avocado_cornell.txt, avocado.gltf`: 5000 samples, depth 8, 800 x 800 px

### Object sources
I grabbed objects from Sketchfab or the [gltf sample models repo](https://github.com/KhronosGroup/glTF-Sample-Models), re-arranged them in Blender, and re-exported as gltf.
- [Avocado](https://github.com/KhronosGroup/glTF-Sample-Models/tree/master/2.0/Avocado)
- [Soda machines](https://sketchfab.com/3d-models/soda-machines-d0b81fdb4e514859bfcc95165144e8c7)
- [Motorcycle](https://sketchfab.com/3d-models/motorcycle-38404e2077ca4b209cd2f1db30541b94)
- [Rusty metal grate](https://sketchfab.com/3d-models/rusty-metal-grate-d814366c9dd24463bfc753a88f4d3ad0)
- [Gltf cube](https://github.com/KhronosGroup/glTF-Sample-Models/tree/master/2.0/Box%20With%20Spaces)
- [Low res stanford dragon](https://sketchfab.com/3d-models/stanford-dragon-vrip-res4-4c0714c7a68444f4b8a51cb5edda68aa)
- [Many materials](https://sketchfab.com/3d-models/gltf-test-pbr-material-2fe88c82edf24a9f8b608c11a0eb6920)

### Bloopers
They're all [here](https://docs.google.com/document/d/1BJmclri4VJY_IXbsLU8Er_CQihQnfmzTQRi5cz9FthM/edit#heading=h.3ah9h2xfckz8).