intel · Calculuser · Jan 17, 2018 · Jan 17, 2018 · Jan 29, 2018 · Feb 1, 2018
diff --git a/README.md b/README.md
@@ -3,20 +3,20 @@
 Step-by-step Deep Learning Tutorials on Apache Spark using [BigDL](https://github.com/intel-analytics/BigDL/). The tutorials are inspired by [Apache Spark examples](http://spark.apache.org/examples.html), the [Theano Tutorials](https://github.com/Newmu/Theano-Tutorials) and the [Tensorflow tutorials](https://github.com/nlintz/TensorFlow-Tutorials).
 
 ### Topics
-1. [RDD](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/spark_basics/RDD.ipynb) 
-2. [DataFrame](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/spark_basics/DataFrame.ipynb)
-3. [SparkSQL](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/spark_basics/spark_sql.ipynb)
-4. [StructureStreaming](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/spark_basics/structured_streaming.ipynb)
-5. [Forward and backward](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/forward_and_backward.ipynb)
-6. [Linear Regression](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/linear_regression.ipynb)
-7. [Introduction to MNIST](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/introduction_to_mnist.ipynb)
-8. [Logistic Regression](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/logistic_regression.ipynb)
-9. [Feedforward Neural Network](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/deep_feed_forward_neural_network.ipynb)
-10. [Convolutional Neural Network](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/cnn.ipynb)
-11. [Recurrent Neural Network](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/rnn.ipynb)
-12. [LSTM](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/lstm.ipynb)
-13. [Bi-directional RNN](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/birnn.ipynb)
-14. [Auto-encoder](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/neural_networks/autoencoder.ipynb)
+1. RDD [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/spark_basics/RDD.ipynb)]
+2. DataFrame [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/spark_basics/DataFrame.ipynb)]
+3. SparkSQL [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/spark_basics/spark_sql.ipynb)]
+4. Structured Streaming [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/spark_basics/structured_streaming.ipynb)]
+5. Forward and backward [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/forward_and_backward.ipynb)]
+6. Linear Regression [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/linear_regression.ipynb) | [Scala](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/scala/neural_networks/linear_regression.ipynb)]
+7. Introduction to MNIST [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/introduction_to_mnist.ipynb) | [Scala](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/scala/neural_networks/introduction_to_mnist.ipynb)]
+8. Logistic Regression [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/logistic_regression.ipynb) | [Scala](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/scala/neural_networks/logistic_regression.ipynb)]
+9. Feedforward Neural Network [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/deep_feed_forward_neural_network.ipynb)]
+10. Convolutional Neural Network [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/cnn.ipynb)]
+11. Recurrent Neural Network [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/rnn.ipynb)]
+12. LSTM [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/lstm.ipynb)]
+13. Bi-directional RNN [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/birnn.ipynb)]
+14. Auto-encoder [[Python](https://github.com/intel-analytics/BigDL-Tutorials/blob/master/notebooks/python/neural_networks/autoencoder.ipynb)]
 
 ### Environment
 + Python 2.7

diff --git a/notebooks/neural_networks/autoencoder.ipynb → .../python/neural_networks/autoencoder.ipynb b/notebooks/neural_networks/autoencoder.ipynb → .../python/neural_networks/autoencoder.ipynb
diff --git a/notebooks/neural_networks/birnn.ipynb → notebooks/python/neural_networks/birnn.ipynb b/notebooks/neural_networks/birnn.ipynb → notebooks/python/neural_networks/birnn.ipynb
diff --git a/notebooks/neural_networks/cnn.ipynb → notebooks/python/neural_networks/cnn.ipynb b/notebooks/neural_networks/cnn.ipynb → notebooks/python/neural_networks/cnn.ipynb
diff --git a/...ks/deep_feed_forward_neural_network.ipynb → ...ks/deep_feed_forward_neural_network.ipynb b/...ks/deep_feed_forward_neural_network.ipynb → ...ks/deep_feed_forward_neural_network.ipynb
diff --git a/...eural_networks/forward_and_backward.ipynb → ...eural_networks/forward_and_backward.ipynb b/...eural_networks/forward_and_backward.ipynb → ...eural_networks/forward_and_backward.ipynb
diff --git a/...ural_networks/introduction_to_mnist.ipynb → ...ural_networks/introduction_to_mnist.ipynb b/...ural_networks/introduction_to_mnist.ipynb → ...ural_networks/introduction_to_mnist.ipynb
diff --git a/...s/neural_networks/linear_regression.ipynb → ...n/neural_networks/linear_regression.ipynb b/...s/neural_networks/linear_regression.ipynb → ...n/neural_networks/linear_regression.ipynb
diff --git a/...neural_networks/logistic_regression.ipynb → ...neural_networks/logistic_regression.ipynb b/...neural_networks/logistic_regression.ipynb → ...neural_networks/logistic_regression.ipynb
diff --git a/notebooks/neural_networks/lstm.ipynb → notebooks/python/neural_networks/lstm.ipynb b/notebooks/neural_networks/lstm.ipynb → notebooks/python/neural_networks/lstm.ipynb
diff --git a/notebooks/neural_networks/rnn.ipynb → notebooks/python/neural_networks/rnn.ipynb b/notebooks/neural_networks/rnn.ipynb → notebooks/python/neural_networks/rnn.ipynb
diff --git a/...Bi-directional_RNN/Bi-directional_RNN.jpg → ...Bi-directional_RNN/Bi-directional_RNN.jpg b/...Bi-directional_RNN/Bi-directional_RNN.jpg → ...Bi-directional_RNN/Bi-directional_RNN.jpg
diff --git a/...images/autoencoder/autoencoder_schema.jpg → ...images/autoencoder/autoencoder_schema.jpg b/...images/autoencoder/autoencoder_schema.jpg → ...images/autoencoder/autoencoder_schema.jpg
diff --git a/...ed_forward_NN/feedforwardNN_structure.png → ...ed_forward_NN/feedforwardNN_structure.png b/...ed_forward_NN/feedforwardNN_structure.png → ...ed_forward_NN/feedforwardNN_structure.png
diff --git a/notebooks/neural_networks/utils.py → notebooks/python/neural_networks/utils.py b/notebooks/neural_networks/utils.py → notebooks/python/neural_networks/utils.py
diff --git a/notebooks/spark_basics/DataFrame.ipynb → ...books/python/spark_basics/DataFrame.ipynb b/notebooks/spark_basics/DataFrame.ipynb → ...books/python/spark_basics/DataFrame.ipynb
diff --git a/notebooks/spark_basics/RDD.ipynb → notebooks/python/spark_basics/RDD.ipynb b/notebooks/spark_basics/RDD.ipynb → notebooks/python/spark_basics/RDD.ipynb
diff --git a/notebooks/spark_basics/spark_sql.ipynb → ...books/python/spark_basics/spark_sql.ipynb b/notebooks/spark_basics/spark_sql.ipynb → ...books/python/spark_basics/spark_sql.ipynb
diff --git a/...s/spark_basics/structured_streaming.ipynb → ...n/spark_basics/structured_streaming.ipynb b/...s/spark_basics/structured_streaming.ipynb → ...n/spark_basics/structured_streaming.ipynb
diff --git a/notebooks/scala/neural_networks/introduction_to_mnist.ipynb b/notebooks/scala/neural_networks/introduction_to_mnist.ipynb
@@ -0,0 +1,201 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Introduction to the MNIST database"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In the following tutorials, we are going to use MNIST, a simple computer vision dataset of handwritten digits. It has 60,000 training examples and 10,000 test examples. \"It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.\" For more details, see the [MNIST website](http://yann.lecun.com/exdb/mnist/)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "When using BigDL from Scala, we need to write our own function to read the MNIST image and label files. The function below parses files that are already on disk; it does not download them."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "import java.nio.ByteBuffer\n",
+    "import java.nio.file.{Files, Path, Paths}\n",
+    "\n",
+    "import com.intel.analytics.bigdl.dataset.ByteRecord\n",
+    "import com.intel.analytics.bigdl.utils.File\n",
+    "import scopt.OptionParser\n",
+    "\n",
+    "def load(featureFile: String, labelFile: String): Array[ByteRecord] = {\n",
+    "    val featureBuffer = ByteBuffer.wrap(Files.readAllBytes(Paths.get(featureFile)))\n",
+    "    val labelBuffer = ByteBuffer.wrap(Files.readAllBytes(Paths.get(labelFile)))\n",
+    "    \n",
+    "    val labelMagicNumber = labelBuffer.getInt()\n",
+    "    require(labelMagicNumber == 2049)\n",
+    "    val featureMagicNumber = featureBuffer.getInt()\n",
+    "    require(featureMagicNumber == 2051)\n",
+    "\n",
+    "    val labelCount = labelBuffer.getInt()\n",
+    "    val featureCount = featureBuffer.getInt()\n",
+    "    require(labelCount == featureCount)\n",
+    "\n",
+    "    val rowNum = featureBuffer.getInt()\n",
+    "    val colNum = featureBuffer.getInt()\n",
+    "\n",
+    "    val result = new Array[ByteRecord](featureCount)\n",
+    "    var i = 0\n",
+    "    while (i < featureCount) {\n",
+    "      val img = new Array[Byte]((rowNum * colNum))\n",
+    "      var y = 0\n",
+    "      while (y < rowNum) {\n",
+    "        var x = 0\n",
+    "        while (x < colNum) {\n",
+    "          img(x + y * colNum) = featureBuffer.get()\n",
+    "          x += 1\n",
+    "        }\n",
+    "        y += 1\n",
+    "      }\n",
+    "      result(i) = ByteRecord(img, labelBuffer.get().toFloat + 1.0f) // shift labels 0-9 to 1-10: BigDL class labels are 1-based\n",
+    "      i += 1\n",
+    "    }\n",
+    "\n",
+    "    result\n",
+    "}"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "First, we need to import the necessary packages and initialize the engine."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "import org.apache.log4j.{Level, Logger}\n",
+    "import org.apache.spark.SparkContext\n",
+    "\n",
+    "import com.intel.analytics.bigdl.utils._\n",
+    "import com.intel.analytics.bigdl.dataset.DataSet\n",
+    "import com.intel.analytics.bigdl.dataset.image.{BytesToGreyImg, GreyImgNormalizer, GreyImgToBatch, GreyImgToSample}\n",
+    "import com.intel.analytics.bigdl.nn.{ClassNLLCriterion, Module}\n",
+    "import com.intel.analytics.bigdl.models.lenet.Utils._\n",
+    "import com.intel.analytics.bigdl.nn.{Linear, LogSoftMax, Sequential, Reshape}\n",
+    "import com.intel.analytics.bigdl.numeric.NumericFloat\n",
+    "import com.intel.analytics.bigdl.optim.{SGD, Top1Accuracy}\n",
+    "import com.intel.analytics.bigdl.utils.{Engine, LoggerFilter, T, Table}\n",
+    "import com.intel.analytics.bigdl.tensor.Tensor\n",
+    "\n",
+    "Engine.init"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Next, set the paths to the training and validation data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "val trainData = \"../datasets/mnist/train-images-idx3-ubyte\"\n",
+    "val trainLabel = \"../datasets/mnist/train-labels-idx1-ubyte\"\n",
+    "val validationData = \"../datasets/mnist/t10k-images-idx3-ubyte\"\n",
+    "val validationLabel = \"../datasets/mnist/t10k-labels-idx1-ubyte\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "Then, we need to define some parameters for loading the MNIST data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "//Parameters\n",
+    "val batchSize = 2048\n",
+    "val learningRate = 0.2\n",
+    "val maxEpochs = 15\n",
+    "\n",
+    "//Network Parameters\n",
+    "val nInput = 784 //MNIST data input (img shape: 28*28)\n",
+    "val nClasses = 10  //MNIST total classes (0-9 digits)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, we can use the function defined above to load the MNIST data and build the training and validation sets. If you want to output the data, the function needs some modifications."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "val trainSet = \n",
+    "    DataSet.array(load(trainData, trainLabel), sc) -> BytesToGreyImg(28, 28) -> GreyImgNormalizer(trainMean, trainStd) -> GreyImgToBatch(batchSize)\n",
+    "val validationSet = \n",
+    "    DataSet.array(load(validationData, validationLabel), sc) -> BytesToGreyImg(28, 28) -> GreyImgNormalizer(testMean, testStd) -> GreyImgToBatch(batchSize)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "sc.stop()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Apache Toree - Scala",
+   "language": "scala",
+   "name": "apache_toree_scala"
+  },
+  "language_info": {
+   "file_extension": ".scala",
+   "name": "scala",
+   "version": "2.11.8"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
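
The `load` function in the notebook above relies on the IDX file layout: a big-endian magic number (2051 for images, 2049 for labels), an item count, and, for images, the row and column sizes, followed by the raw bytes. A minimal sketch of that parsing on tiny in-memory data is shown below, so it can be tried without the MNIST files. The helpers `idxImages`, `idxLabels` and `parse` are hypothetical names introduced only for this illustration; they are not part of BigDL, and `parse` returns plain tuples instead of `ByteRecord`.

```scala
import java.nio.ByteBuffer

// Hypothetical helper: build a tiny IDX image file in memory.
// ByteBuffer is big-endian by default, which matches the IDX format.
def idxImages(images: Array[Array[Byte]], rows: Int, cols: Int): Array[Byte] = {
  val buf = ByteBuffer.allocate(16 + images.length * rows * cols)
  buf.putInt(2051).putInt(images.length).putInt(rows).putInt(cols)
  images.foreach(img => buf.put(img))
  buf.array()
}

// Hypothetical helper: build a tiny IDX label file in memory.
def idxLabels(labels: Array[Byte]): Array[Byte] = {
  val buf = ByteBuffer.allocate(8 + labels.length)
  buf.putInt(2049).putInt(labels.length)
  buf.put(labels)
  buf.array()
}

// Same steps as `load`, but on byte arrays and returning (pixels, label) tuples.
def parse(featureBytes: Array[Byte], labelBytes: Array[Byte]): Array[(Array[Byte], Float)] = {
  val features = ByteBuffer.wrap(featureBytes)
  val labels = ByteBuffer.wrap(labelBytes)

  require(labels.getInt() == 2049, "unexpected label magic number")
  require(features.getInt() == 2051, "unexpected image magic number")

  val labelCount = labels.getInt()
  val featureCount = features.getInt()
  require(labelCount == featureCount, "image and label counts differ")

  val rows = features.getInt()
  val cols = features.getInt()

  Array.fill(featureCount) {
    val img = new Array[Byte](rows * cols)
    features.get(img)                    // read one image in row-major order
    (img, labels.get().toFloat + 1.0f)   // shift the label to be 1-based, as in `load`
  }
}

// Two 2x2 "images" with labels 0 and 9.
val parsed = parse(
  idxImages(Array(Array[Byte](0, 1, 2, 3), Array[Byte](4, 5, 6, 7)), 2, 2),
  idxLabels(Array[Byte](0, 9)))

parsed.foreach { case (img, label) =>
  println(s"label=$label pixels=${img.mkString(",")}")
}
```

The bulk `features.get(img)` call reads the same bytes in the same order as the nested `while` loops in `load`; the notebook's explicit loops just make the row and column indexing visible.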
diff --git a/start_toree.sh b/start_toree.sh