Commit 7434119

Minor changes
1 parent 8b4e767 commit 7434119

16 files changed: +3405 -2642 lines changed

Diff for: DataSets.ipynb

+1-1
@@ -5269,7 +5269,7 @@
 "\n",
 "### http://mldata.org/repository/data/viewslug/well-log/\n",
 "\n",
-"Consists of 4050 nuclear magnetic resonance measurement taken from drill while drilling a well\n",
+"Consists of 4050 nuclear magnetic resonance measurements taken from a drill while drilling a well\n",
 "\n"
 ]
 },

Diff for: DecisionTree.ipynb

+240
Large diffs are not rendered by default.

Diff for: DrawGraphs.ipynb

+20-16
Large diffs are not rendered by default.

Diff for: DynamicalSystems.ipynb

+985
Large diffs are not rendered by default.

Diff for: HiddenMarkovModel.ipynb

+71-131
Large diffs are not rendered by default.

Diff for: KNN.ipynb

+19-2
@@ -14,10 +14,13 @@
 "\n",
 "Just store the dataset and for a new observed point $x$, find its nearest neighbor $i^*$ and report $c_{i^*}$ \n",
 "\n",
-"\n",
 "$$\n",
 "i^* = \\arg\\min_{i=1\\dots N} D(x_i, x)\n",
-"$$\n"
+"$$\n",
+"\n",
+"## KNN: K nearest neighbors\n",
+"\n",
+"Find the $k$ nearest neighbors and take a majority vote.\n"
 ]
 },
 {
@@ -655,6 +658,13 @@
 "print c[:5]"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"The choice of the distance function (divergence) can be important. In practice, a popular choice is the Euclidean distance, but it is by no means the only one. "
+]
+},
 {
 "cell_type": "code",
 "execution_count": 42,
@@ -688,6 +698,13 @@
 "Divergence([0,0],[1,1],p=W)\n"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Equal-distance contours"
+]
+},
 {
 "cell_type": "code",
 "execution_count": 27,

Diff for: KalmanFilter.ipynb

+132-81
@@ -4,32 +4,82 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Derivation of the Kalman Filter\n",
+"# The Linear Dynamical System and the Kalman Filter\n",
 "\n",
+"The Kalman filter is a recursive algorithm for estimating the latent state of a linear dynamical system given the observations.\n",
 "\n",
-"Implement the Kalman filter with parameters $\\mu, P, A, C, Q, R$. \n",
+"The linear dynamical system is a central model, and understanding it as a generative model is essential for understanding the principles and limitations of the Kalman filter.\n",
+"\n"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## The Linear Dynamical System (LDS)\n",
+"\n",
+"A (discrete time) linear dynamical system describes the evolution of the state of a system and\n",
+"the observations that can be obtained from the state.\n",
+"\n",
+"The system is described at times $t=1,2,\\dots$. At each time $t$, the state is denoted by an $N \\times 1$ state vector $x_t$. We don't directly observe the states but obtain observations that are related to them. The observations are denoted by an $M \\times 1$ observation vector $y_t$. The mathematical model is \n",
 "\n",
 "\\begin{eqnarray}\n",
 "x_0 & \\sim & {\\mathcal N}(\\mu, P) \\\\\n",
-"x_{t} & \\sim & {\\mathcal N}(A x_{t-1}, Q) \\\\\n",
-"y_{t} & \\sim & {\\mathcal N}(C x_{t}, R) \n",
+"x_{t} & = & A x_{t-1} + \\epsilon_t \\\\\n",
+"y_{t} & = & C x_{t} + \\nu_t \n",
 "\\end{eqnarray}\n",
 "\n",
+"Here, $\\epsilon_t$ and $\\nu_t$ are assumed to be zero mean Gaussian random variables with covariances $Q$ and $R$, respectively. \n",
+"\n",
+"Equivalently, we can express the system as a hierarchical generative model\n",
+"\\begin{eqnarray}\n",
+"x_0 & \\sim & {\\mathcal N}(\\mu, P) \\\\\n",
+"x_{t}|x_{t-1} & \\sim & {\\mathcal N}(A x_{t-1}, Q) \\\\\n",
+"y_{t}|x_{t} & \\sim & {\\mathcal N}(C x_{t}, R) \n",
+"\\end{eqnarray}\n",
 "for $t=1\\dots T$.\n",
-"The known Parameters:\n",
 "\n",
-"$\\mu$ is $N \\times 1$ \n",
+"Both formulations are equivalent; the former is used more often in engineering and the latter in statistics. Later, we will describe a few small extensions to the model.\n",
+"\n",
+"To understand qualitatively what the LDS is doing, we can look at the meaning of each parameter. The parameters reflect our knowledge about \n",
 "\n",
-"$P$ is $N \\times N$ and diagonal, positive semidefinite\n",
+"* the initial state: $\\mu, P$\n",
+"* the inner workings of the system dynamics, the state transition model: $A, Q$\n",
+"* how the states are observed, the observation model: $C, R$ \n",
 "\n",
-"$A$ is $N \\times N$\n",
+"The initial state mean $\\mu$ is an $N \\times 1$ vector, and the initial state covariance $P$ is an $N \\times N$ positive semidefinite matrix.\n",
+"These two parameters reflect our prior knowledge about the initial state of the process: the expected state and the uncertainty around it, as characterized by the covariance matrix.\n",
 "\n",
-"$C$ is $M \\times N$\n",
+"Think of the pair $(\\mu, P)$ as defining an ellipse; the initial state $x_0$ is most likely located somewhere in this ellipse.\n",
+"\n",
+"The state transition matrix $A$ is $N \\times N$, and the state transition noise covariance matrix $Q$ is $N \\times N$, diagonal and positive semidefinite. The state transition matrix $A$ describes the dynamic behaviour of the linear dynamical system: the system would simply undergo a linear transformation if there were no uncertainty about the dynamics\n",
+"$$\n",
+"x_t = A x_{t-1}\n",
+"$$\n",
+"The linear dynamical system model also includes an additive noise term that enables us to model random deviations from an exactly deterministic linear transformation. The deviations are assumed to be zero mean and have the covariance matrix $Q$.\n",
 "\n",
-"$Q$ is $N \\times N$ and diagonal, positive semidefinite\n",
+"Given the previous state $x_{t-1}$ at time $t-1$, think of the pair $(A x_{t-1}, Q)$ as defining an ellipse; the current state $x_t$ is most likely located somewhere in this ellipse.\n",
 "\n",
-"$R$ is $M \\times M$ and diagonal, positive semidefinite\n",
+"Finally, the observation model:\n",
 "\n",
+"$C$ is $M \\times N$, the observation matrix\n",
+"$R$ is $M \\times M$ and diagonal, positive semidefinite, the observation noise covariance"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Example: A point object moving with (almost) constant velocity.\n",
+"\n",
+"\n",
+"\n"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
 "The inputs to the algorithm are the parameters and a sequence of \n",
 "observations $y_t$ for $t=1 \\dots T$\n",
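The markdown added in the hunk above presents the LDS as a generative model and mentions a point object moving with (almost) constant velocity as an example. A minimal forward simulation of that generative process, under an assumed constant-velocity parameterisation (state = position and velocity, only the position observed), might look as follows; the particular matrices and noise levels are illustrative choices, not taken from the notebook:

import numpy as np

rng = np.random.default_rng(0)

# assumed constant-velocity example: state x_t = [position, velocity]
dt = 1.0
A = np.array([[1.0, dt],
              [0.0, 1.0]])          # state transition matrix
C = np.array([[1.0, 0.0]])          # observe the position only
Q = 0.01 * np.eye(2)                # state transition noise covariance
R = np.array([[0.25]])              # observation noise covariance
mu = np.array([0.0, 1.0])           # initial state mean
P = np.eye(2)                       # initial state covariance

T = 10
x = rng.multivariate_normal(mu, P)                          # x_0 ~ N(mu, P)
ys = []
for t in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)     # x_t = A x_{t-1} + eps_t
    y = C @ x + rng.multivariate_normal(np.zeros(1), R)     # y_t = C x_t + nu_t
    ys.append(y)

print(np.array(ys).ravel())         # noisy positions drifting at roughly unit speed

Running the filter on observations generated this way is a natural sanity check: the filtered means should track the simulated states.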
"\n",
@@ -49,9 +99,10 @@
 "\n",
 "-- Covariances: $\\Sigma_{t|t}$\n",
 "\n",
-"-- A sequence of loglikelihoods $l_k$\n",
+"-- A sequence of loglikelihoods $l_k = \\log p(y_k| y_{1:k-1})$\n",
+"\n",
 "\n",
-"## Kalman Filter\n",
+"## The algorithm\n",
 "\n",
 "The Kalman filtering algorithm is as follows:\n",
 "\n",
@@ -86,6 +137,74 @@
 "End For"
 ]
 },
+{
+"cell_type": "code",
+"execution_count": 24,
+"metadata": {
+"collapsed": false,
+"scrolled": false
+},
+"outputs": [
+{
+"name": "stdout",
+"output_type": "stream",
+"text": [
+"mu=\n",
+" [[ 10.]\n",
+" [ 10.]]\n",
+"P=\n",
+" [[ 100. 0.]\n",
+" [ 0. 100.]]\n",
+"A=\n",
+" [[ 12. 4.]\n",
+" [ 1. -3.]]\n",
+"C=\n",
+" [[-3. 5.]\n",
+" [-4. 2.]\n",
+" [ 4. -6.]]\n",
+"Q=\n",
+" [[ 0.1 0. ]\n",
+" [ 0. 0.1]]\n",
+"R=\n",
+" [[ 2. 0. 0.]\n",
+" [ 0. 2. 0.]\n",
+" [ 0. 0. 2.]]\n",
+"observations=\n",
+" [[-1. -5. 6.]\n",
+" [ 3. -0. -5.]\n",
+" [ 1. -1. -8.]]\n"
+]
+}
+],
+"source": [
+"import numpy as np\n",
+"\n",
+"N = 2\n",
+"M = 3\n",
+"\n",
+"A = np.matrix(np.ceil(5*np.random.randn(N,N)))\n",
+"C = np.matrix(np.ceil(5*np.random.randn(M,N)))\n",
+"R = np.matrix(2*np.eye(M))\n",
+"Q = np.matrix(0.1*np.eye(N))\n",
+"\n",
+"mu = np.matrix(10*np.ones((N,1)))\n",
+"P = np.matrix(100*np.eye(N))\n",
+"\n",
+"print('mu=\\n',mu)\n",
+"print('P=\\n',P)\n",
+"\n",
+"print('A=\\n',A)\n",
+"print('C=\\n',C)\n",
+"print('Q=\\n',Q)\n",
+"print('R=\\n',R)\n",
+"\n",
+"T = 3;\n",
+"\n",
+"y = np.matrix(np.ceil(5*np.random.randn(M,T)))\n",
+"\n",
+"print('observations=\\n',y)"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {},
@@ -200,74 +319,6 @@
 "All the parameters will be specified when constructing the object and observations will be provided to the filter as an $M \times T$ matrix."
 ]
 },
-{
-"cell_type": "code",
-"execution_count": 24,
-"metadata": {
-"collapsed": false,
-"scrolled": false
-},
-"outputs": [
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"mu=\n",
-" [[ 10.]\n",
-" [ 10.]]\n",
-"P=\n",
-" [[ 100. 0.]\n",
-" [ 0. 100.]]\n",
-"A=\n",
-" [[ 12. 4.]\n",
-" [ 1. -3.]]\n",
-"C=\n",
-" [[-3. 5.]\n",
-" [-4. 2.]\n",
-" [ 4. -6.]]\n",
-"Q=\n",
-" [[ 0.1 0. ]\n",
-" [ 0. 0.1]]\n",
-"R=\n",
-" [[ 2. 0. 0.]\n",
-" [ 0. 2. 0.]\n",
-" [ 0. 0. 2.]]\n",
-"observations=\n",
-" [[-1. -5. 6.]\n",
-" [ 3. -0. -5.]\n",
-" [ 1. -1. -8.]]\n"
-]
-}
-],
-"source": [
-"import numpy as np\n",
-"\n",
-"N = 2\n",
-"M = 3\n",
-"\n",
-"A = np.matrix(np.ceil(5*np.random.randn(N,N)))\n",
-"C = np.matrix(np.ceil(5*np.random.randn(M,N)))\n",
-"R = np.matrix(2*np.eye(M))\n",
-"Q = np.matrix(0.1*np.eye(N))\n",
-"\n",
-"mu = np.matrix(10*np.ones((N,1)))\n",
-"P = np.matrix(100*np.eye(N))\n",
-"\n",
-"print('mu=\\n',mu)\n",
-"print('P=\\n',P)\n",
-"\n",
-"print('A=\\n',A)\n",
-"print('C=\\n',C)\n",
-"print('Q=\\n',Q)\n",
-"print('R=\\n',R)\n",
-"\n",
-"T = 3;\n",
-"\n",
-"y = np.matrix(np.ceil(5*np.random.randn(M,T)))\n",
-"\n",
-"print('observations=\\n',y)"
-]
-},
 {
 "cell_type": "code",
 "execution_count": 18,

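The diff above renames the section to "The algorithm" and lists its outputs (filtered means, covariances $\Sigma_{t|t}$ and loglikelihoods $l_k = \log p(y_k|y_{1:k-1})$), but the steps between "The Kalman filtering algorithm is as follows:" and "End For" are not rendered here. As a reminder of the standard predict/update recursion those outputs correspond to (not necessarily the notebook's exact formulation), a sketch is:

import numpy as np

def kalman_filter(y, A, C, Q, R, mu, P):
    # y is an M x T observation matrix, as in the parameter cell shown in the diff
    N = A.shape[0]
    M, T = y.shape
    means, covs, logliks = [], [], []
    m, S = mu.reshape(N, 1), P                        # filtered mean and covariance
    for k in range(T):
        # predict: p(x_k | y_{1:k-1}) = N(m_pred, S_pred)
        m_pred = A @ m
        S_pred = A @ S @ A.T + Q
        # update with the observation y_k
        e = y[:, k:k + 1] - C @ m_pred                # innovation
        G = C @ S_pred @ C.T + R                      # innovation covariance
        K = S_pred @ C.T @ np.linalg.inv(G)           # Kalman gain
        m = m_pred + K @ e
        S = (np.eye(N) - K @ C) @ S_pred
        # l_k = log p(y_k | y_{1:k-1}) from the Gaussian innovation
        _, logdet = np.linalg.slogdet(G)
        l = -0.5 * (M * np.log(2 * np.pi) + logdet + (e.T @ np.linalg.inv(G) @ e).item())
        means.append(m.copy())
        covs.append(S.copy())
        logliks.append(l)
    return means, covs, logliks

With the randomly generated parameters and observations from the code cell in the diff (converted with np.asarray, since they are np.matrix objects), kalman_filter(np.asarray(y), np.asarray(A), np.asarray(C), np.asarray(Q), np.asarray(R), np.asarray(mu), np.asarray(P)) returns the three output sequences listed above.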
Diff for: MixtureModels.ipynb

+163
Large diffs are not rendered by default.
