
Commit 06c6ee5 ("updated")

1 parent 37d0d06 commit 06c6ee5

File tree: 1 file changed


README.md (+41 -41 lines)
# SimpleCudaNeuralNet

This project is for studying both neural networks and CUDA.

I focused on simplicity and conciseness while coding, which means there is no error handling beyond assertions. This is the result of a self-study effort to better understand the back-propagation algorithm. I hope this C++ code fragment helps anyone who has an interest in deep learning. [CS231n](http://cs231n.stanford.edu/2017/syllabus) from Stanford provides a good starting point for learning deep learning.

## Status
#### Weight layers
* 2D convolutional
* Fully connected
* Batch normalization

#### Non-linearity
* ReLU

#### Regularization
* Max pooling
* Dropout

#### Loss
* Mean squared error
* Cross-entropy loss

#### Optimizer
* Adam (see the update-rule sketch below)
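
For reference, the per-parameter Adam update listed above boils down to a few lines per weight. The kernel below is a generic sketch of the textbook algorithm (Kingma & Ba), not code from this repository; the kernel name, memory layout, and argument list are illustrative.

```cpp
// Generic sketch of the per-parameter Adam update (illustrative only; not the
// kernel used in this repository).
__global__ void AdamUpdate(float* w, const float* grad, float* m, float* v,
                           int n, float lr, float beta1, float beta2,
                           float eps, int t)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Biased first and second moment estimates.
    m[i] = beta1 * m[i] + (1.0f - beta1) * grad[i];
    v[i] = beta2 * v[i] + (1.0f - beta2) * grad[i] * grad[i];

    // Bias correction, where t is the 1-based update step.
    float mHat = m[i] / (1.0f - powf(beta1, (float)t));
    float vHat = v[i] / (1.0f - powf(beta2, (float)t));

    // Parameter update.
    w[i] -= lr * mHat / (sqrtf(vHat) + eps);
}
```

Typical defaults are lr = 0.001, beta1 = 0.9, beta2 = 0.999, and eps = 1e-8.
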
## Result

### Handwritten digit recognition

![mnist_result](https://user-images.githubusercontent.com/670560/93243500-3cdcad00-f7c3-11ea-985b-5af1117dd0f4.png)

After implementing the basic components for deep learning, I built a handwritten digit recognizer using the [MNIST database](http://yann.lecun.com/exdb/mnist/). A simple 2-layer FCNN (1,000 hidden units) achieved a 1.56% top-1 error rate after 14 epochs, which takes less than 20 seconds of training time on an RTX 2070 graphics card. (See [mnist.cpp](mnist.cpp).)
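
The fully connected layers in such a model reduce to a matrix-vector product per sample, and the simplest CUDA formulation assigns one thread per output unit. The sketch below is illustrative only, assuming a row-major weight layout and a single input sample; it is not the kernel from [mnist.cpp](mnist.cpp).

```cpp
// Naive fully connected forward pass, y = W x + b, one thread per output unit.
// Illustrative sketch; the row-major layout of w is an assumption.
__global__ void FullyConnectedForward(const float* x, const float* w,
                                      const float* b, float* y,
                                      int inDim, int outDim)
{
    int o = blockIdx.x * blockDim.x + threadIdx.x;
    if (o >= outDim) return;

    float sum = b[o];
    for (int i = 0; i < inDim; ++i)
    {
        sum += w[o * inDim + i] * x[i];
    }
    y[o] = sum;
}
```

A launch such as `FullyConnectedForward<<<(outDim + 255) / 256, 256>>>(x, w, b, y, inDim, outDim)` covers every output unit with 256-thread blocks.
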
### CIFAR-10 photo classification

![top1_err_87_67](https://user-images.githubusercontent.com/670560/93243007-909ac680-f7c2-11ea-8deb-2abc5704fa29.png)

In [cifar10.cpp](cifar10.cpp), you can find a VGG-like convolutional network with 8 weight layers. The [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset is used to train the model. It achieves a 12.3% top-1 error rate after 31 epochs, taking 26.5 seconds of training time per epoch on my RTX 2070. If you try a larger model and have enough time to train it, you can improve on this result.
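
The convolutional layers can be written just as naively: one thread per output element, looping over the input channels and the filter window. This is a hedged sketch under assumed layouts (channel-major tensors, stride 1, no padding), not the kernel from [cifar10.cpp](cifar10.cpp).

```cpp
// Naive 2D convolution forward pass (stride 1, no padding), one thread per
// output element. Illustrative sketch only; tensor layouts are assumptions.
__global__ void Conv2dForward(const float* input,   // [inCh][h][w]
                              const float* filters, // [outCh][inCh][k][k]
                              const float* bias,    // [outCh]
                              float* output,        // [outCh][outH][outW]
                              int inCh, int h, int w, int outCh, int k)
{
    int outH = h - k + 1;
    int outW = w - k + 1;
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= outCh * outH * outW) return;

    int oc = idx / (outH * outW);
    int oy = (idx / outW) % outH;
    int ox = idx % outW;

    float sum = bias[oc];
    for (int ic = 0; ic < inCh; ++ic)
    {
        for (int ky = 0; ky < k; ++ky)
        {
            for (int kx = 0; kx < k; ++kx)
            {
                sum += input[(ic * h + oy + ky) * w + ox + kx] *
                       filters[((oc * inCh + ic) * k + ky) * k + kx];
            }
        }
    }
    output[(oc * outH + oy) * outW + ox] = sum;
}
```
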
### Notes
- Even a naive CUDA implementation easily achieves a speedup of more than 700x over a single-core, non-SIMD CPU version.
- Double-precision floating point in the CUDA kernels was 3-4x slower than single-precision operations.
- Training performance is not comparable to PyTorch, which trains the same model roughly 7x faster.
- Coding this kind of numerical algorithm is tricky, and it can be hard even to tell whether there is a bug. Thorough unit testing of every function is strongly recommended if you try it yourself; a gradient-check sketch is shown below.
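
A practical way to do that unit testing is a finite-difference gradient check: nudge each weight, re-evaluate the loss, and compare the numerical slope with the analytic gradient produced by back-propagation. The host-side sketch below is generic and assumes a `loss(weights)` callback that re-runs the forward pass; it is not code from this repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

// Compare analytic gradients from back-propagation against central-difference
// numerical gradients. Generic sketch; `loss` is an assumed callback that
// re-runs the forward pass for the given weights.
bool CheckGradients(std::vector<float> w,
                    const std::vector<float>& analyticGrad,
                    const std::function<float(const std::vector<float>&)>& loss,
                    float eps = 1e-3f, float tolerance = 1e-2f)
{
    assert(w.size() == analyticGrad.size());
    for (size_t i = 0; i < w.size(); ++i)
    {
        float original = w[i];
        w[i] = original + eps;
        float lossPlus = loss(w);
        w[i] = original - eps;
        float lossMinus = loss(w);
        w[i] = original;

        float numerical = (lossPlus - lossMinus) / (2.0f * eps);
        float scale = std::max(std::fabs(numerical) + std::fabs(analyticGrad[i]), 1e-8f);
        if (std::fabs(numerical - analyticGrad[i]) / scale > tolerance)
        {
            return false; // Analytic and numerical gradients disagree.
        }
    }
    return true;
}
```

With single precision, a relative error around 1e-2 or below usually means the backward pass is correct; much larger values typically point to a bug.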
