GitNotebooks
diff --git a/‎pandas-example.ipynb
+273 b/‎pandas-example.ipynb
+273
diff --git a/‎sklearn-example.ipynb
+197 b/‎sklearn-example.ipynb
+197
@@ -0,0 +1,273 @@
+{
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "# Operating on Data in Pandas"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Copied from https://github.com/jakevdp/PythonDataScienceHandbook"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "One of the essential pieces of NumPy is the ability to perform quick element-wise operations, both with basic arithmetic (addition, subtraction, multiplication, etc.) and with more sophisticated operations (trigonometric functions, exponential and logarithmic functions, etc.).\n",
+        "Pandas inherits much of this functionality from NumPy, and the ufuncs that we introduced in [Computation on NumPy Arrays: Universal Functions](02.03-Computation-on-arrays-ufuncs.ipynb) are key to this.\n",
+        "\n",
+        "Pandas includes a couple useful twists, however: for unary operations like negation and trigonometric functions, these ufuncs will *preserve index and column labels* in the output, and for binary operations such as addition and multiplication, Pandas will automatically *align indices* when passing the objects to the ufunc.\n",
+        "This means that keeping the context of data and combining data from different sources–both potentially error-prone tasks with raw NumPy arrays–become essentially foolproof ones with Pandas.\n",
+        "We will additionally see that there are well-defined operations between one-dimensional ``Series`` structures and two-dimensional ``DataFrame`` structures."
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 121,
+      "metadata": {
+        "collapsed": true
+      },
+      "outputs": [],
+      "source": [
+        "import pandas as pd\n",
+        "import numpy as np"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Any item for which one or the other does not have an entry is marked with ``NaN``, or \"Not a Number,\" which is how Pandas marks missing data (see further discussion of missing data in [Handling Missing Data](03.04-Missing-Values.ipynb)).\n",
+        "This index matching is implemented this way for any of Python's built-in arithmetic expressions; any missing values are filled in with NaN by default:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 122,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [],
+      "source": [
+        "A = pd.Series([2, 4, 6], index=[0, 1, 2])\n",
+        "B = pd.Series([1, 3, 5], index=[1, 2, 3])"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "If using NaN values is not the desired behavior, the fill value can be modified using appropriate object methods in place of the operators.\n",
+        "For example, calling ``A.add(B)`` is equivalent to calling ``A + B``, but allows optional explicit specification of the fill value for any elements in ``A`` or ``B`` that might be missing:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 123,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/plain": [
+              "0    2.0\n",
+              "1    5.0\n",
+              "2    9.0\n",
+              "3    5.0\n",
+              "dtype: float64"
+            ]
+          },
+          "execution_count": 123,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "A.add(B, fill_value=0)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "Notice that indices are aligned correctly irrespective of their order in the two objects, and indices in the result are sorted.\n",
+        "As was the case with ``Series``, we can use the associated object's arithmetic method and pass any desired ``fill_value`` to be used in place of missing entries.\n",
+        "Here we'll fill with the mean of all values in ``A`` (computed by first stacking the rows of ``A``):"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 29,
+      "metadata": {},
+      "outputs": [],
+      "source": [
+        "rng = np.random.RandomState(42)\n",
+        "A = pd.DataFrame(rng.randint(0, 20, (2, 2)),\n",
+        "                 columns=list('AB'))\n",
+        "B = pd.DataFrame(rng.randint(0, 10, (3, 3)),\n",
+        "                 columns=list('BAC'))"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 127,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/html": [
+              "<div>\n",
+              "<style scoped>\n",
+              "    .dataframe tbody tr th:only-of-type {\n",
+              "        vertical-align: middle;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe tbody tr th {\n",
+              "        vertical-align: top;\n",
+              "    }\n",
+              "\n",
+              "    .dataframe thead th {\n",
+              "        text-align: right;\n",
+              "    }\n",
+              "</style>\n",
+              "<table border=\"1\" class=\"dataframe\">\n",
+              "  <thead>\n",
+              "    <tr style=\"text-align: right;\">\n",
+              "      <th></th>\n",
+              "      <th>A</th>\n",
+              "      <th>B</th>\n",
+              "      <th>C</th>\n",
+              "    </tr>\n",
+              "  </thead>\n",
+              "  <tbody>\n",
+              "    <tr>\n",
+              "      <th>0</th>\n",
+              "      <td>19.00</td>\n",
+              "      <td>20.00</td>\n",
+              "      <td>16.75</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>1</th>\n",
+              "      <td>8.00</td>\n",
+              "      <td>3.00</td>\n",
+              "      <td>12.75</td>\n",
+              "    </tr>\n",
+              "    <tr>\n",
+              "      <th>2</th>\n",
+              "      <td>16.75</td>\n",
+              "      <td>10.75</td>\n",
+              "      <td>12.75</td>\n",
+              "    </tr>\n",
+              "  </tbody>\n",
+              "</table>\n",
+              "</div>"
+            ],
+            "text/plain": [
+              "       A      B      C\n",
+              "0  19.00  20.00  16.75\n",
+              "1   8.00   3.00  12.75\n",
+              "2  16.75  10.75  12.75"
+            ]
+          },
+          "execution_count": 127,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "fill = A.stack().mean()\n",
+        "A.add(B, fill_value=fill)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## Ufuncs: Operations Between DataFrame and Series\n",
+        "\n",
+        "When performing operations between a ``DataFrame`` and a ``Series``, the index and column alignment is similarly maintained.\n",
+        "Operations between a ``DataFrame`` and a ``Series`` are similar to operations between a two-dimensional and one-dimensional NumPy array.\n",
+        "Consider one common operation, where we find the difference of a two-dimensional array and one of its rows:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 128,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/plain": [
+              "array([[1, 5, 5, 9],\n",
+              "       [3, 5, 1, 9],\n",
+              "       [1, 9, 3, 7]])"
+            ]
+          },
+          "execution_count": 128,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "A = rng.randint(10, size=(3, 4))\n",
+        "A"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": 129,
+      "metadata": {
+        "collapsed": false
+      },
+      "outputs": [
+        {
+          "data": {
+            "text/plain": [
+              "array([[ 0,  0,  0,  0],\n",
+              "       [ 2,  0, -4,  0],\n",
+              "       [ 0,  4, -2, -2]])"
+            ]
+          },
+          "execution_count": 129,
+          "metadata": {},
+          "output_type": "execute_result"
+        }
+      ],
+      "source": [
+        "A - A[0]"
+      ]
+    }
+  ],
+  "metadata": {
+    "anaconda-cloud": {},
+    "kernelspec": {
+      "display_name": "Python 3",
+      "language": "python",
+      "name": "python3"
+    },
+    "language_info": {
+      "codemirror_mode": {
+        "name": "ipython",
+        "version": 3
+      },
+      "file_extension": ".py",
+      "mimetype": "text/x-python",
+      "name": "python",
+      "nbconvert_exporter": "python",
+      "pygments_lexer": "ipython3",
+      "version": "3.10.0"
+    }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 0
+}