SimpleCudaNeuralNet

This project is for studying both neural networks and CUDA.

I focused on simplicity and conciseness while coding, which means there is no error handling beyond assertions. It is a self-study project for better understanding the back-propagation algorithm. I hope this C++ code helps anyone with an interest in deep learning. Stanford's CS231n provides a good starting point for learning deep learning.

Status

Weight layers

  • 2D convolution
  • Fully connected
  • Batch normalization

Non-linearity

  • ReLU (see the kernel sketch below)

Regularization

  • Max pooling
  • Dropout

Loss

  • Mean squared error
  • Cross entropy loss

Optimizer

  • Adam
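To give a flavor of how small these components can be, here is what a ReLU forward/backward kernel pair might look like in CUDA. This is a minimal sketch with hypothetical names (relu_forward, relu_backward), not the exact kernels used in this repository:

    #include <cuda_runtime.h>

    // Forward: out[i] = max(0, in[i]).
    __global__ void relu_forward(const float* in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = in[i] > 0.0f ? in[i] : 0.0f;
    }

    // Backward: the gradient flows through only where the input was positive.
    __global__ void relu_backward(const float* in, const float* grad_out,
                                  float* grad_in, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            grad_in[i] = in[i] > 0.0f ? grad_out[i] : 0.0f;
    }

Both kernels would be launched elementwise, e.g. relu_forward<<<(n + 255) / 256, 256>>>(in, out, n); the backward pass needs the saved forward input to know which units were active.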

Result

Handwritten digit recognition

[Figure: mnist_result]

After implementing the basic components for deep learning, I built a handwritten digit recognizer using the MNIST database. A simple two-layer FCNN (1,000 hidden units) achieves a 1.56% top-1 error rate after 14 epochs, which takes less than 20 seconds of training time on an RTX 2070 graphics card. (See mnist.cpp.)
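The heart of such a fully connected layer is one dot product per output unit. A minimal sketch of the forward pass, assuming row-major weights and a single input vector (hypothetical names; mnist.cpp may organize this differently):

    #include <cuda_runtime.h>

    // y = W x + b, one thread per output unit.
    // W is out_dim x in_dim (row-major), x is in_dim, y is out_dim.
    __global__ void fc_forward(const float* W, const float* b,
                               const float* x, float* y,
                               int in_dim, int out_dim)
    {
        int o = blockIdx.x * blockDim.x + threadIdx.x;
        if (o >= out_dim) return;
        float acc = b[o];
        for (int i = 0; i < in_dim; ++i)
            acc += W[o * in_dim + i] * x[i];
        y[o] = acc;
    }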

CIFAR-10 photo classification

[Figure: top1_err_87_67]

In cifar10.cpp, you can find a VGG-like convolutional network with 8 weight layers, trained on the CIFAR-10 dataset. It achieves a 12.3% top-1 error rate after 31 epochs, taking 26.5 seconds of training time per epoch on my RTX 2070. A larger model, given enough training time, should improve on this.
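Such a network can be trained with nothing more than a naive direct convolution, which is simple but leaves performance on the table. A sketch of what the forward kernel might look like, assuming a CHW layout, stride 1, and zero padding (hypothetical names, not the repository's actual kernel):

    #include <cuda_runtime.h>

    // Direct 2D convolution, single image, stride 1, zero padding of pad pixels.
    // in:  in_ch  x in_h x in_w
    // w:   out_ch x in_ch x k x k
    // out: out_ch x in_h x in_w  (same spatial size when pad == k / 2)
    __global__ void conv2d_forward(const float* in, const float* w, const float* bias,
                                   float* out, int in_ch, int in_h, int in_w,
                                   int out_ch, int k, int pad)
    {
        int x  = blockIdx.x * blockDim.x + threadIdx.x;  // output column
        int y  = blockIdx.y * blockDim.y + threadIdx.y;  // output row
        int oc = blockIdx.z;                             // output channel
        if (x >= in_w || y >= in_h || oc >= out_ch) return;

        float acc = bias[oc];
        for (int ic = 0; ic < in_ch; ++ic)
            for (int ky = 0; ky < k; ++ky)
                for (int kx = 0; kx < k; ++kx) {
                    int iy = y + ky - pad;
                    int ix = x + kx - pad;
                    if (iy >= 0 && iy < in_h && ix >= 0 && ix < in_w)
                        acc += in[(ic * in_h + iy) * in_w + ix]
                             * w[((oc * in_ch + ic) * k + ky) * k + kx];
                }
        out[(oc * in_h + y) * in_w + x] = acc;
    }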

Notes

  • Even a naive CUDA implementation easily achieves a ~700x speedup over the single-core, non-SIMD CPU version.
  • Double-precision floating point in the CUDA kernels was 3~4x slower than single-precision operations.
  • Training performance is not comparable to PyTorch: PyTorch trains the same model roughly 7x faster.
  • Coding this kind of numerical algorithm is tricky, and it is hard even to tell whether a bug exists. If you try, thorough unit testing of every function is strongly recommended; see the gradient-check sketch below.
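A common way to do that testing is a numerical gradient check: approximate the derivative with a central finite difference and compare it against the gradient that back-propagation computes. A self-contained toy example on a scalar model (a real test would loop over every parameter of every layer):

    #include <cassert>
    #include <cmath>
    #include <cstdio>

    // Toy model for demonstration: L(w) = 0.5 * (w * x - t)^2.
    double loss(double w, double x, double t) { return 0.5 * (w * x - t) * (w * x - t); }

    // Analytic gradient dL/dw = (w * x - t) * x, as back-propagation would compute it.
    double analytic_grad(double w, double x, double t) { return (w * x - t) * x; }

    int main()
    {
        const double w = 0.7, x = 1.3, t = 2.0, eps = 1e-6;
        // Central finite difference approximates dL/dw with O(eps^2) error.
        double numeric  = (loss(w + eps, x, t) - loss(w - eps, x, t)) / (2.0 * eps);
        double analytic = analytic_grad(w, x, t);
        assert(std::fabs(numeric - analytic) < 1e-6);
        std::printf("numeric %.9f vs analytic %.9f\n", numeric, analytic);
        return 0;
    }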