This repository was archived by the owner on Jan 3, 2023. It is now read-only.

add scala notebooks & starting scripts. #39

Open · wants to merge 1 commit into master

4,160 changes: 4,160 additions & 0 deletions notebooks/bigdl-tutorials-scala/autoencoder.ipynb

Large diffs are not rendered by default.

19,308 changes: 19,308 additions & 0 deletions notebooks/bigdl-tutorials-scala/birnn.ipynb

Large diffs are not rendered by default.

3,059 changes: 3,059 additions & 0 deletions notebooks/bigdl-tutorials-scala/cnn.ipynb

Large diffs are not rendered by default.

2,417 changes: 2,417 additions & 0 deletions notebooks/bigdl-tutorials-scala/deep_feed_forward_neural_network.ipynb

Large diffs are not rendered by default.

192 changes: 192 additions & 0 deletions notebooks/bigdl-tutorials-scala/forward_and_backward.ipynb
@@ -0,0 +1,192 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Forward and backward"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this section, we will unveil the basic mechanism of the computational process of BigDL using a simple example. In this example, we show that how to obtain the gradients with a single forward and backward pass for updating."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We first need to import the necessary modules and initialize the engine."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import org.apache.log4j.{Level, Logger}\n",
"import org.apache.spark.SparkContext\n",
"\n",
"import com.intel.analytics.bigdl.nn._\n",
"import com.intel.analytics.bigdl.utils.{Engine, LoggerFilter, T, Table}\n",
"import com.intel.analytics.bigdl.nn.{AbsCriterion}\n",
"import com.intel.analytics.bigdl.tensor._\n",
"import com.intel.analytics.bigdl.numeric.NumericFloat\n",
"\n",
"Engine.init"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we create a simple linear regression which can be formulized as *y = Wx + b*, where *W = [w1,w2]* are weight parameters and *b* is the bias."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Weight and Bias:\n",
"0.3978135\t-0.2532979\t\n",
"[com.intel.analytics.bigdl.tensor.DenseTensor of size 1x2]\n",
"0.66053367\n",
"[com.intel.analytics.bigdl.tensor.DenseTensor of size 1]\n",
"GradWeight and gradBias:\n",
"0.0\t0.0\t\n",
"[com.intel.analytics.bigdl.tensor.DenseTensor of size 1x2]\n",
"0.0\n",
"[com.intel.analytics.bigdl.tensor.DenseTensor of size 1]\n"
]
}
],
"source": [
"// the input data size is 2*1, the output size is 1*1\n",
"val linear = Linear(2, 1)\n",
"// print the randomly initialized parameters\n",
"val (param1, param2) = linear.parameters()\n",
"println(\"Weight and Bias:\")\n",
"param1.foreach(println)\n",
"println(\"GradWeight and gradBias:\")\n",
"param2.foreach(println)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.564943"
]
}
],
"source": [
"val input = Tensor(T(T(1f, -2f)))\n",
"// forward to output\n",
"val output = linear.updateOutput(input)\n",
"print(output.valueAt(1, 1))"
]
},
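{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can verify this by hand using the initialized parameters printed above: *y = w1·x1 + w2·x2 + b* = 0.3978135 * 1 + (-0.2532979) * (-2) + 0.66053367 ≈ 1.564943, which matches the output."
]
},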
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then we backpropagate the error of the predicted output to the input."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"loss: 1.564943\n",
"Weight and Bias:\n",
"0.3978135\t-0.2532979\t\n",
"[com.intel.analytics.bigdl.tensor.DenseTensor of size 1x2]\n",
"0.66053367\n",
"[com.intel.analytics.bigdl.tensor.DenseTensor of size 1]\n",
"GradWeight and gradBias:\n",
"0.0\t0.0\t\n",
"[com.intel.analytics.bigdl.tensor.DenseTensor of size 1x2]\n",
"0.0\n",
"[com.intel.analytics.bigdl.tensor.DenseTensor of size 1]\n"
]
}
],
"source": [
"// mean absolute error\n",
"val mae = AbsCriterion()\n",
"val target = Tensor(1).fill(0)\n",
"\n",
"val loss = mae.updateOutput(output, target)\n",
"printf(\"loss: %s\\n\", loss.toString)\n",
" \n",
"val gradOutput = mae.updateGradInput(output, target)\n",
"linear.updateGradInput(input, gradOutput)\n",
"\n",
"val (param1, param2) = linear.parameters()\n",
"println(\"Weight and Bias:\")\n",
"param1.foreach(println)\n",
"println(\"GradWeight and gradBias:\")\n",
"param2.foreach(println)"
]
},
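{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch (with an arbitrary learning rate of 0.1), the cell below calls *backward*, which runs both *updateGradInput* and *accGradParameters*, and then performs one vanilla gradient-descent step on the parameters in place."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"// backward = updateGradInput + accGradParameters, so it also fills gradWeight and gradBias\n",
"linear.backward(input, gradOutput)\n",
"\n",
"// one vanilla SGD step: param := param - lr * gradParam\n",
"val lr = 0.1f // arbitrary learning rate for this sketch\n",
"val (weights, gradients) = linear.parameters()\n",
"weights.zip(gradients).foreach { case (w, g) =>\n",
"  w.add(-lr, g) // in-place update: w += -lr * g\n",
"}\n",
"\n",
"println(\"Updated weight and bias:\")\n",
"weights.foreach(println)"
]
},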
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, the Spark should be stopped."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"sc.stop()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From above we can see that the backward pass has computed the gradient of the weights in respect to the loss. Therefore we can update the weights with the gradients using algorithms such as *stochastic gradient descent*. However in practice you **should** use *optimizer.optimize()* to circumvent the details."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Apache Toree - Scala",
"language": "scala",
"name": "apache_toree_scala"
},
"language_info": {
"file_extension": ".scala",
"name": "scala",
"version": "2.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
201 changes: 201 additions & 0 deletions notebooks/bigdl-tutorials-scala/introduction_to_mnist.ipynb
@@ -0,0 +1,201 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to the MNIST database"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the following tutorials, we are going to use the MNIST database of handwritten digits. MNIST is a simple computer vision dataset of handwritten digits. It has 60,000 training examles and 10,000 test examples. \"It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.\" For more details of this database, please checkout the website [MNIST](http://yann.lecun.com/exdb/mnist/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In BigDL, we need to write a function to download and read the MNIST data when using Scala."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import java.nio.ByteBuffer\n",
"import java.nio.file.{Files, Path, Paths}\n",
"\n",
"import com.intel.analytics.bigdl.dataset.ByteRecord\n",
"import com.intel.analytics.bigdl.utils.File\n",
"import scopt.OptionParser\n",
"\n",
"def load(featureFile: String, labelFile: String): Array[ByteRecord] = {\n",
" val featureBuffer = ByteBuffer.wrap(Files.readAllBytes(Paths.get(featureFile)))\n",
" val labelBuffer = ByteBuffer.wrap(Files.readAllBytes(Paths.get(labelFile)))\n",
" \n",
" val labelMagicNumber = labelBuffer.getInt()\n",
" require(labelMagicNumber == 2049)\n",
" val featureMagicNumber = featureBuffer.getInt()\n",
" require(featureMagicNumber == 2051)\n",
"\n",
" val labelCount = labelBuffer.getInt()\n",
" val featureCount = featureBuffer.getInt()\n",
" require(labelCount == featureCount)\n",
"\n",
" val rowNum = featureBuffer.getInt()\n",
" val colNum = featureBuffer.getInt()\n",
"\n",
" val result = new Array[ByteRecord](featureCount)\n",
" var i = 0\n",
" while (i < featureCount) {\n",
" val img = new Array[Byte]((rowNum * colNum))\n",
" var y = 0\n",
" while (y < rowNum) {\n",
" var x = 0\n",
" while (x < colNum) {\n",
" img(x + y * colNum) = featureBuffer.get()\n",
" x += 1\n",
" }\n",
" y += 1\n",
" }\n",
" result(i) = ByteRecord(img, labelBuffer.get().toFloat + 1.0f)\n",
" i += 1\n",
" }\n",
"\n",
" result\n",
"}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we need to import the necessary packages and initialize the engine."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import org.apache.log4j.{Level, Logger}\n",
"import org.apache.spark.SparkContext\n",
"\n",
"import com.intel.analytics.bigdl.utils._\n",
"import com.intel.analytics.bigdl.dataset.DataSet\n",
"import com.intel.analytics.bigdl.dataset.image.{BytesToGreyImg, GreyImgNormalizer, GreyImgToBatch, GreyImgToSample}\n",
"import com.intel.analytics.bigdl.nn.{ClassNLLCriterion, Module}\n",
"import com.intel.analytics.bigdl.models.lenet.Utils._\n",
"import com.intel.analytics.bigdl.nn.{ClassNLLCriterion, Linear, LogSoftMax, Sequential, Reshape}\n",
"import com.intel.analytics.bigdl.numeric.NumericFloat\n",
"import com.intel.analytics.bigdl.optim.{SGD, Top1Accuracy}\n",
"import com.intel.analytics.bigdl.utils.{Engine, LoggerFilter, T, Table}\n",
"import com.intel.analytics.bigdl.tensor.Tensor\n",
"\n",
"Engine.init"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, the paths of training data and validation data should be set."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"val trainData = \"../../datasets/mnist/train-images-idx3-ubyte\"\n",
"val trainLabel = \"../../datasets/mnist/train-labels-idx1-ubyte\"\n",
"val validationData = \"../../datasets/mnist/t10k-images-idx3-ubyte\"\n",
"val validationLabel = \"../../datasets/mnist/t10k-labels-idx1-ubyte\""
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"Then, we need to define some parameters for loading the MINST data."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"//Parameters\n",
"val batchSize = 2048\n",
"val learningRate = 0.2\n",
"val maxEpochs = 15\n",
"\n",
"//Network Parameters\n",
"val nInput = 784 //MNIST data input (img shape: 28*28)\n",
"val nClasses = 10 //MNIST total classes (0-9 digits)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we can use predefined function to load and serialize MNIST data. If you want to output the data, some modifications on the funtion should be applied."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"val trainSet = \n",
" DataSet.array(load(trainData, trainLabel), sc) -> BytesToGreyImg(28, 28) -> GreyImgNormalizer(trainMean, trainStd) -> GreyImgToBatch(batchSize)\n",
"val validationSet = \n",
" DataSet.array(load(validationData, validationLabel), sc) -> BytesToGreyImg(28, 28) -> GreyImgNormalizer(testMean, testStd) -> GreyImgToBatch(batchSize)"
]
},
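{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of such a modification, we can end the pipeline with *GreyImgToSample* (imported above) instead of *GreyImgToBatch*, then pull a single sample from the resulting RDD to inspect its feature shape and label."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"// sample-level pipeline over the validation files, for inspection only\n",
"val sampleSet = DataSet.array(load(validationData, validationLabel), sc) -> BytesToGreyImg(28, 28) -> GreyImgNormalizer(testMean, testStd) -> GreyImgToSample()\n",
"// take one Sample and print its feature shape and label\n",
"val one = sampleSet.toDistributed().data(train = false).take(1).head\n",
"println(one.feature().size().mkString(\"x\"))\n",
"println(one.label())"
]
},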
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"sc.stop()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Apache Toree - Scala",
"language": "scala",
"name": "apache_toree_scala"
},
"language_info": {
"file_extension": ".scala",
"name": "scala",
"version": "2.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}