|
27 | 27 | "import platform # some of the subsequent code depends on the operating system\n",
|
28 | 28 | "\n",
|
29 | 29 | "import pandas as pd\n",
|
30 |
| - "import matplotlib.pyplot as plt" |
31 |
| - ] |
32 |
| - }, |
33 |
| - { |
34 |
| - "cell_type": "code", |
35 |
| - "execution_count": null, |
36 |
| - "metadata": {}, |
37 |
| - "outputs": [], |
38 |
| - "source": [ |
39 |
| - "%matplotlib inline" |
| 30 | + "import matplotlib.pyplot as plt\n", |
| 31 | + "import seaborn as sns" |
40 | 32 | ]
|
41 | 33 | },
|
42 | 34 | {
|
|
147 | 139 | "df.head()"
|
148 | 140 | ]
|
149 | 141 | },
|
| 142 | + { |
| 143 | + "cell_type": "markdown", |
| 144 | + "metadata": {}, |
| 145 | + "source": [ |
| 146 | + "Now let's explore this data a little. First, how many records do we have?" |
| 147 | + ] |
| 148 | + }, |
150 | 149 | {
|
151 | 150 | "cell_type": "code",
|
152 | 151 | "execution_count": null,
|
153 | 152 | "metadata": {},
|
154 | 153 | "outputs": [],
|
| 154 | + "source": [] |
| 155 | + }, |
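One way to fill in the empty cell above is sketched here. It assumes `df` has one row per (name, sex, year) with columns `year`, `sex`, `name` and `births`, the column names used by the removed `set_index` and `get_births_series` cells. `summarize_records` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd


def summarize_records(df):
    """Hypothetical helper: basic size facts about the baby-names frame.

    Assumes one row per (name, sex, year) with columns
    'year', 'sex', 'name', 'births'.
    """
    return {
        "rows": len(df),                         # total number of records
        "distinct_names": df["name"].nunique(),  # unique names across both sexes
        "first_year": int(df["year"].min()),
        "last_year": int(df["year"].max()),
    }


# Usage in the notebook: summarize_records(df)
```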
| 156 | + { |
| 157 | + "cell_type": "markdown", |
| 158 | + "metadata": {}, |
155 | 159 | "source": [
|
156 |
| - "# How many records do we have?\n", |
157 |
| - "len(df)" |
| 160 | + "Now let's look at a specific name. Let's make a new dataframe that includes only your name and look at the first 5 rows." |
158 | 161 | ]
|
159 | 162 | },
|
| 163 | + { |
| 164 | + "cell_type": "code", |
| 165 | + "execution_count": null, |
| 166 | + "metadata": { |
| 167 | + "scrolled": true |
| 168 | + }, |
| 169 | + "outputs": [], |
| 170 | + "source": [] |
| 171 | + }, |
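A minimal sketch for selecting one name follows. It assumes a `name` column holding capitalized names such as `'Matthew'`, and the match is case-sensitive. `name_frame` is a hypothetical helper.

```python
import pandas as pd


def name_frame(df, name):
    """Hypothetical helper: all rows (both sexes, every year) for one name.

    .copy() returns an independent frame, so adding columns to it later
    does not trigger pandas' SettingWithCopyWarning.
    """
    return df[df["name"] == name].copy()


# Usage in the notebook (replace 'Matthew' with your own name):
# my_name = name_frame(df, 'Matthew')
# my_name.head()   # first 5 rows
```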
160 | 172 | {
|
161 | 173 | "cell_type": "markdown",
|
162 | 174 | "metadata": {},
|
163 | 175 | "source": [
|
164 |
| - "Now that we have the data in a dataframe, we want to move the year and sex columns into the index, leaving only columns for name and birth count. We can use the `set_index` method of the dataframe for this." |
| 176 | + "Let's now look at some stats for your name." |
165 | 177 | ]
|
166 | 178 | },
|
167 | 179 | {
|
168 | 180 | "cell_type": "code",
|
169 | 181 | "execution_count": null,
|
170 | 182 | "metadata": {},
|
171 | 183 | "outputs": [],
|
172 |
| - "source": [ |
173 |
| - "df = df.set_index(keys=['year', 'sex'])\n", |
174 |
| - "df.head()" |
175 |
| - ] |
| 184 | + "source": [] |
176 | 185 | },
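For the stats cell, one option is `describe()` on yearly totals. This sketch assumes a single-name frame with `year` and `births` columns. It sums both sexes per year first, so each year counts once. `name_stats` is a hypothetical helper.

```python
import pandas as pd


def name_stats(name_df):
    """Hypothetical helper: summary statistics of yearly births for one name.

    Rows for both sexes are summed per year first, so each year
    contributes a single number to the statistics.
    """
    per_year = name_df.groupby("year")["births"].sum()
    return per_year.describe()  # count, mean, std, min, quartiles, max
```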
|
177 | 186 | {
|
178 | 187 | "cell_type": "markdown",
|
179 | 188 | "metadata": {},
|
180 | 189 | "source": [
|
181 |
| - "Now we need a function that, given a name and a sex, returns a series containing the number of births by year." |
| 190 | + "When was your name at its peak popularity?" |
182 | 191 | ]
|
183 | 192 | },
|
184 | 193 | {
|
185 | 194 | "cell_type": "code",
|
186 | 195 | "execution_count": null,
|
187 | 196 | "metadata": {},
|
188 | 197 | "outputs": [],
|
| 198 | + "source": [] |
| 199 | + }, |
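Peak popularity can be read off with `idxmax`, which returns the index label (here the year) of the largest value. This sketch assumes the same single-name frame with `year` and `births` columns and counts both sexes together. `peak_year` is a hypothetical helper. On ties, `idxmax` returns the first year.

```python
import pandas as pd


def peak_year(name_df):
    """Hypothetical helper: (year, births) for the year with the most births,
    counting both sexes together."""
    per_year = name_df.groupby("year")["births"].sum()
    year = per_year.idxmax()  # index label (the year) of the maximum
    return int(year), int(per_year.loc[year])
```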
| 200 | + { |
| 201 | + "cell_type": "markdown", |
| 202 | + "metadata": {}, |
189 | 203 | "source": [
|
190 |
| - "def get_births_series(df, name, sex):\n", |
191 |
| - " single_sex_df = df.xs(sex, level='sex')\n", |
192 |
| - " return single_sex_df[single_sex_df.name == name]['births']" |
| 204 | + "How can we convert the raw birth counts into a percentage of all births that year? Let's make a new column for that." |
193 | 205 | ]
|
194 | 206 | },
|
195 | 207 | {
|
196 | 208 | "cell_type": "code",
|
197 | 209 | "execution_count": null,
|
198 | 210 | "metadata": {},
|
199 | 211 | "outputs": [],
|
| 212 | + "source": [] |
| 213 | + }, |
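The percentage needs each year's total over every name, so it has to be computed on the full frame before filtering to one name. This sketch assumes `year` and `births` columns. `groupby(...).transform("sum")` broadcasts each year's total back onto every row of that year. `add_percent_column` is a hypothetical helper.

```python
import pandas as pd


def add_percent_column(df):
    """Hypothetical helper: add 'percent' = births / all births that year * 100.

    Run it on the FULL frame (every name), not a single-name frame,
    because the yearly total is the sum over all names and both sexes.
    """
    out = df.copy()
    # transform('sum') returns the yearly total aligned to every row
    year_totals = out.groupby("year")["births"].transform("sum")
    out["percent"] = out["births"] * 100 / year_totals
    return out
```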
| 214 | + { |
| 215 | + "cell_type": "markdown", |
| 216 | + "metadata": {}, |
200 | 217 | "source": [
|
201 |
| - "matthews = get_births_series(df, 'Matthew', 'M')\n", |
202 |
| - "matthews.head()" |
| 218 | + "Wow, some of these percentages are really small. Why don't we change it to the number of births of a given name per million births that year?" |
203 | 219 | ]
|
204 | 220 | },
|
205 | 221 | {
|
206 | 222 | "cell_type": "code",
|
207 | 223 | "execution_count": null,
|
208 | 224 | "metadata": {},
|
209 | 225 | "outputs": [],
|
| 226 | + "source": [] |
| 227 | + }, |
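Births per million is the yearly fraction times 1,000,000, which equals the percentage times 10,000. This sketch uses the same assumptions as the percentage step: the full frame with `year` and `births` columns. `add_per_million_column` is a hypothetical helper. Multiplying before dividing keeps whole-number results exact in floating point.

```python
import pandas as pd


def add_per_million_column(df):
    """Hypothetical helper: add 'per_million' = births per million births that year.

    Equivalent to percent * 10_000. Run it on the full frame so the
    yearly totals include every name.
    """
    out = df.copy()
    year_totals = out.groupby("year")["births"].transform("sum")
    out["per_million"] = out["births"] * 1_000_000 / year_totals
    return out
```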
| 228 | + { |
| 229 | + "cell_type": "markdown", |
| 230 | + "metadata": {}, |
210 | 231 | "source": [
|
211 |
| - "plt.style.use('seaborn')\n", |
212 |
| - "matthews.plot(title='Annual count of births for name %s' % 'Matthew')" |
| 232 | + "Why don't we make a graph of how common your name is over the years?" |
213 | 233 | ]
|
214 | 234 | },
|
| 235 | + { |
| 236 | + "cell_type": "code", |
| 237 | + "execution_count": null, |
| 238 | + "metadata": {}, |
| 239 | + "outputs": [], |
| 240 | + "source": [] |
| 241 | + }, |
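A plain pandas/matplotlib sketch of the trend plot is below. It assumes `year` and `name` columns plus a value column such as `births` or a `per_million` column added earlier. `plot_name_trend` is a hypothetical helper. It sums both sexes per year, so there is one point per year. By contrast, `sns.lineplot` averages repeated x values by default and shades a confidence band around them.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_name_trend(df, name, value_col="births"):
    """Hypothetical helper: one line of `value_col` per year for `name`.

    Both sexes are summed per year, so there is exactly one point per
    year. Summing is also valid for a per-million column, because both
    sexes share the same yearly denominator.
    """
    per_year = df[df["name"] == name].groupby("year")[value_col].sum()
    fig, ax = plt.subplots()
    per_year.plot(ax=ax, title="{}: {} by year".format(name, value_col))
    ax.set_ylabel(value_col)
    return ax
```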
215 | 242 | {
|
216 | 243 | "cell_type": "markdown",
|
217 | 244 | "metadata": {},
|
218 | 245 | "source": [
|
219 |
| - "Now one last function to output a plot of the series. Just the bare minimum for now." |
| 246 | + "If your name is like mine, there is a band of shading around the line indicating variance. Why would that be?\n", |
| 247 | + "\n", |
| 248 | + "\n", |
| 249 | + "It's because this data is also split by gender, so a name can be listed twice in the same year, once for each gender. The gender split could be interesting, though, so let's look at it graphically." |
220 | 250 | ]
|
221 | 251 | },
|
222 | 252 | {
|
223 | 253 | "cell_type": "code",
|
224 | 254 | "execution_count": null,
|
225 | 255 | "metadata": {},
|
226 | 256 | "outputs": [],
|
| 257 | + "source": [] |
| 258 | + }, |
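To see the gender split, one option is to pivot the sex codes into columns and draw one line per column. This sketch assumes `year`, `sex`, `name` and a value column. `plot_by_sex` is a hypothetical helper. Seaborn's `sns.lineplot(..., hue="sex")` gives a similar picture.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_by_sex(df, name, value_col="births"):
    """Hypothetical helper: one line per sex for `name` over the years."""
    wide = df[df["name"] == name].pivot_table(
        index="year", columns="sex", values=value_col, aggfunc="sum"
    )  # rows = years, columns = sex codes, NaN where a sex has no record
    fig, ax = plt.subplots()
    wide.plot(ax=ax, title="{} by sex".format(name))
    ax.set_ylabel(value_col)
    return ax
```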
| 259 | + { |
| 260 | + "cell_type": "markdown", |
| 261 | + "metadata": {}, |
| 262 | + "source": [ |
| 263 | + "There is actually a really good breakdown of different name trends by Tim Urban at https://waitbutwhy.com/2013/12/how-to-name-baby.html\n", |
| 264 | + "\n", |
| 265 | + "so let's quickly look at a couple of the interesting trends he found, using our code." |
| 266 | + ] |
| 267 | + }, |
| 268 | + { |
| 269 | + "cell_type": "markdown", |
| 270 | + "metadata": {}, |
227 | 271 | "source": [
|
228 |
| - "def create_births_figure(s, sex, name):\n", |
229 |
| - " plt.style.use('seaborn')\n", |
230 |
| - " sex_full = 'female'\n", |
231 |
| - " if sex == 'M':\n", |
232 |
| - " sex_full = 'male'\n", |
233 |
| - " plot = s.plot(title='Annual count of US %s births for name %s' % (sex_full, name))\n", |
234 |
| - " return plot.get_figure()" |
| 272 | + "### Name Fads\n", |
| 273 | + "\n", |
| 274 | + "A name fad is when a specific name becomes really popular for one generation, so that a person's age can reasonably be guessed from their name alone.\n", |
| 275 | + "\n", |
| 276 | + "Check out Jennifer, Ashley, or Shirley for some examples." |
235 | 277 | ]
|
236 | 278 | },
|
237 | 279 | {
|
238 | 280 | "cell_type": "code",
|
239 | 281 | "execution_count": null,
|
240 | 282 | "metadata": {},
|
241 | 283 | "outputs": [],
|
| 284 | + "source": [] |
| 285 | + }, |
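Fads show up best when several names share one set of axes. A per-million column keeps decades with different birth totals comparable. This sketch assumes `year`, `name` and a value column. `plot_names` is a hypothetical helper.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_names(df, names, value_col="births"):
    """Hypothetical helper: one line per name (both sexes summed) on shared axes."""
    subset = df[df["name"].isin(names)]
    wide = subset.pivot_table(
        index="year", columns="name", values=value_col, aggfunc="sum"
    )  # rows = years, one column per requested name
    fig, ax = plt.subplots()
    wide.plot(ax=ax, title="Name fads")
    ax.set_ylabel(value_col)
    return ax


# Usage in the notebook, e.g. on a frame with a per-million column:
# plot_names(df, ['Jennifer', 'Ashley', 'Shirley'], 'per_million')
```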
| 286 | + { |
| 287 | + "cell_type": "markdown", |
| 288 | + "metadata": {}, |
242 | 289 | "source": [
|
243 |
| - "fig = create_births_figure(matthews, 'M', 'Matthew')" |
| 290 | + "### Gender Takeovers\n", |
| 291 | + "\n", |
| 292 | + "Sometimes a name that is uncommon and used by only one gender becomes extremely popular for the other gender, to the point that the original gender stops using it.\n", |
| 293 | + "\n", |
| 294 | + "Check out Lynn or Aubrey." |
244 | 295 | ]
|
245 | 296 | },
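A takeover is easiest to spot as the share of a name's births that go to one sex each year. This sketch assumes the `sex` column uses the codes `'F'` and `'M'`, as the removed `create_births_figure` cell suggests. `female_share` is a hypothetical helper. A share that moves from near 0 to near 1, or the reverse, marks a takeover.

```python
import pandas as pd


def female_share(df, name):
    """Hypothetical helper: fraction of a name's births each year that are 'F'.

    Assumes sex codes 'F' and 'M'; years where one sex has no record
    count as 0 births for that sex.
    """
    counts = df[df["name"] == name].pivot_table(
        index="year", columns="sex", values="births", aggfunc="sum", fill_value=0
    )
    counts = counts.reindex(columns=["F", "M"], fill_value=0)  # ensure both columns exist
    return counts["F"] / (counts["F"] + counts["M"])


# Usage in the notebook: female_share(df, 'Aubrey').plot(ylim=(0, 1))
```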
|
246 | 297 | {
|
|
267 | 318 | "name": "python",
|
268 | 319 | "nbconvert_exporter": "python",
|
269 | 320 | "pygments_lexer": "ipython3",
|
270 |
| - "version": "3.6.5" |
| 321 | + "version": "3.7.3" |
271 | 322 | }
|
272 | 323 | },
|
273 | 324 | "nbformat": 4,
|
|