This repository was archived by the owner on Jan 3, 2023. It is now read-only.

Change the structure of notebooks directory. #48

Open
wants to merge 18 commits into
base: master
28 changes: 14 additions & 14 deletions README.md
@@ -3,20 +3,20 @@
Step-by-step Deep Learning Tutorials on Apache Spark using [BigDL](https://github.com/intel-analytics/BigDL/). The tutorials are inspired by [Apache Spark examples](http://spark.apache.org/examples.html), the [Theano Tutorials](https://github.com/Newmu/Theano-Tutorials) and the [Tensorflow tutorials](https://github.com/nlintz/TensorFlow-Tutorials).

### Topics
-1. [RDD](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/spark_basics/RDD.ipynb)
-2. [DataFrame](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/spark_basics/DataFrame.ipynb)
-3. [SparkSQL](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/spark_basics/spark_sql.ipynb)
-4. [StructureStreaming](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/spark_basics/structured_streaming.ipynb)
-5. [Forward and backward](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/forward_and_backward.ipynb)
-6. [Linear Regression](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/linear_regression.ipynb)
-7. [Introduction to MNIST](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/introduction_to_mnist.ipynb)
-8. [Logistic Regression](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/logistic_regression.ipynb)
-9. [Feedforward Neural Network](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/deep_feed_forward_neural_network.ipynb)
-10. [Convolutional Neural Network](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/cnn.ipynb)
-11. [Recurrent Neural Network](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/rnn.ipynb)
-12. [LSTM](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/lstm.ipynb)
-13. [Bi-directional RNN](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/birnn.ipynb)
-14. [Auto-encoder](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/autoencoder.ipynb)
+1. RDD [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/spark_basics/RDD.ipynb)]
+2. DataFrame [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/spark_basics/DataFrame.ipynb)]
+3. SparkSQL [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/spark_basics/spark_sql.ipynb)]
+4. StructureStreaming [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/spark_basics/structured_streaming.ipynb)]
+5. Forward and backward [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/forward_and_backward.ipynb)]
+6. Linear Regression [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/linear_regression.ipynb) | [Scala](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/scala/neural_networks/linear_regression.ipynb)]
+7. Introduction to MNIST [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/introduction_to_mnist.ipynb) | [Scala](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/scala/neural_networks/introduction_to_mnist.ipynb)]
+8. Logistic Regression [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/logistic_regression.ipynb) | [Scala](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/scala/neural_networks/logistic_regression.ipynb)]
+9. Feedforward Neural Network [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/deep_feed_forward_neural_network.ipynb)]
+10. Convolutional Neural Network [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/cnn.ipynb)]
+11. Recurrent Neural Network [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/rnn.ipynb)]
+12. LSTM [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/lstm.ipynb)]
+13. Bi-directional RNN [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/birnn.ipynb)]
+14. Auto-encoder [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/autoencoder.ipynb)]

### Environment
+ Python 2.7
201 changes: 201 additions & 0 deletions notebooks/scala/neural_networks/introduction_to_mnist.ipynb
@@ -0,0 +1,201 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to the MNIST database"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the following tutorials, we are going to use the MNIST database, a simple computer vision dataset of handwritten digits. It has 60,000 training examples and 10,000 test examples. \"It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.\" For more details on this database, please check out the [MNIST](http://yann.lecun.com/exdb/mnist/) website."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
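"When using BigDL from Scala, we need to write a function that reads the MNIST data. The function below reads the image and label files from local paths, so the files must already be downloaded."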
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import java.nio.ByteBuffer\n",
"import java.nio.file.{Files, Paths}\n",
"\n",
"import com.intel.analytics.bigdl.dataset.ByteRecord\n",
"\n",
"// Reads an MNIST image file and its label file (IDX format) into an array of ByteRecord.\n",
"def load(featureFile: String, labelFile: String): Array[ByteRecord] = {\n",
" val featureBuffer = ByteBuffer.wrap(Files.readAllBytes(Paths.get(featureFile)))\n",
" val labelBuffer = ByteBuffer.wrap(Files.readAllBytes(Paths.get(labelFile)))\n",
" \n",
" // Magic numbers defined by the IDX format: 2049 for label files, 2051 for image files\n",
" val labelMagicNumber = labelBuffer.getInt()\n",
" require(labelMagicNumber == 2049)\n",
" val featureMagicNumber = featureBuffer.getInt()\n",
" require(featureMagicNumber == 2051)\n",
"\n",
" val labelCount = labelBuffer.getInt()\n",
" val featureCount = featureBuffer.getInt()\n",
" require(labelCount == featureCount)\n",
"\n",
" val rowNum = featureBuffer.getInt()\n",
" val colNum = featureBuffer.getInt()\n",
"\n",
" val result = new Array[ByteRecord](featureCount)\n",
" var i = 0\n",
" while (i < featureCount) {\n",
" val img = new Array[Byte]((rowNum * colNum))\n",
" var y = 0\n",
" while (y < rowNum) {\n",
" var x = 0\n",
" while (x < colNum) {\n",
" img(x + y * colNum) = featureBuffer.get()\n",
" x += 1\n",
" }\n",
" y += 1\n",
" }\n",
" // BigDL expects 1-based class labels, so digits 0-9 become labels 1-10\n",
" result(i) = ByteRecord(img, labelBuffer.get().toFloat + 1.0f)\n",
" i += 1\n",
" }\n",
"\n",
" result\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we need to import the necessary packages and initialize the engine."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import org.apache.log4j.{Level, Logger}\n",
"import org.apache.spark.SparkContext\n",
"\n",
"import com.intel.analytics.bigdl.utils._\n",
"import com.intel.analytics.bigdl.dataset.DataSet\n",
"import com.intel.analytics.bigdl.dataset.image.{BytesToGreyImg, GreyImgNormalizer, GreyImgToBatch, GreyImgToSample}\n",
"import com.intel.analytics.bigdl.nn.{ClassNLLCriterion, Linear, LogSoftMax, Module, Reshape, Sequential}\n",
"// Provides trainMean, trainStd, testMean and testStd, used for normalization below\n",
"import com.intel.analytics.bigdl.models.lenet.Utils._\n",
"import com.intel.analytics.bigdl.numeric.NumericFloat\n",
"import com.intel.analytics.bigdl.optim.{SGD, Top1Accuracy}\n",
"import com.intel.analytics.bigdl.utils.{Engine, LoggerFilter, T, Table}\n",
"import com.intel.analytics.bigdl.tensor.Tensor\n",
"\n",
"Engine.init"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, set the paths to the training data and the validation data."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"val trainData = \"../datasets/mnist/train-images-idx3-ubyte\"\n",
"val trainLabel = \"../datasets/mnist/train-labels-idx1-ubyte\"\n",
"val validationData = \"../datasets/mnist/t10k-images-idx3-ubyte\"\n",
"val validationLabel = \"../datasets/mnist/t10k-labels-idx1-ubyte\""
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"Then, we need to define some parameters for working with the MNIST data."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"//Parameters\n",
"val batchSize = 2048\n",
"val learningRate = 0.2\n",
"val maxEpochs = 15\n",
"\n",
"//Network Parameters\n",
"val nInput = 784 //MNIST data input (img shape: 28*28)\n",
"val nClasses = 10 //MNIST total classes (0-9 digits)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can use the `load` function defined above to read the MNIST data and turn it into normalized mini-batches. If you want to print the data, the function needs some modifications."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"val trainSet = \n",
" DataSet.array(load(trainData, trainLabel), sc) -> BytesToGreyImg(28, 28) -> GreyImgNormalizer(trainMean, trainStd) -> GreyImgToBatch(batchSize)\n",
"val validationSet = \n",
" DataSet.array(load(validationData, validationLabel), sc) -> BytesToGreyImg(28, 28) -> GreyImgNormalizer(testMean, testStd) -> GreyImgToBatch(batchSize)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"sc.stop()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Apache Toree - Scala",
"language": "scala",
"name": "apache_toree_scala"
},
"language_info": {
"file_extension": ".scala",
"name": "scala",
"version": "2.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
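
Reviewer note on the notebook above: the `load` function relies on the IDX layout of the MNIST files, that is, a header of big-endian 32-bit integers followed by raw pixel bytes. The snippet below is a minimal sketch of that parsing step run against a small synthetic buffer, so it can be checked without the dataset. The names `IdxImageHeader` and `readImageHeader` are hypothetical helpers introduced for illustration and are not part of the notebook or of BigDL.

```scala
import java.nio.ByteBuffer

// Hypothetical helper, not part of the notebook: the header fields of an IDX image file.
final case class IdxImageHeader(magic: Int, count: Int, rows: Int, cols: Int)

// Reads the four big-endian 32-bit header fields and checks the image magic number.
def readImageHeader(buffer: ByteBuffer): IdxImageHeader = {
  val magic = buffer.getInt()
  require(magic == 2051, s"unexpected image magic number: $magic")
  // Constructor arguments are evaluated left to right: count, rows, cols.
  IdxImageHeader(magic, buffer.getInt(), buffer.getInt(), buffer.getInt())
}

// Synthetic data: a header describing two 2x2 images, followed by eight pixel bytes.
val bytes = ByteBuffer.allocate(16 + 8)
bytes.putInt(2051).putInt(2).putInt(2).putInt(2)
bytes.put(Array[Byte](0, 1, 2, 3, 4, 5, 6, 7))
bytes.flip()

val header = readImageHeader(bytes)

// A bulk read copies one image; it stands in for the nested per-pixel loops.
val firstImage = new Array[Byte](header.rows * header.cols)
bytes.get(firstImage)
```

Because the pixels are stored row by row, the bulk `get` fills the array in the same order as the notebook's `img(x + y * colNum)` loops, so either form should produce the same `ByteRecord` payload.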
43 changes: 0 additions & 43 deletions start_toree.sh

This file was deleted.