Commit 265867b

chennesy authored and jt14den committed
fix tidy closing tag
1 parent 06640a5 commit 265867b

1 file changed, +5 -4 lines changed

episodes/tidy.md

Lines changed: 5 additions & 4 deletions
@@ -56,7 +56,6 @@ To address this we can reshape our data in a long format. This is sometimes call
 ## Tidy Data
 
 Tidy data is a standard way of organizing data values within a dataset, making it easier to work with. Here are the key principles of tidy data:
-
 1. Every column holds a single variable, like "month" or "temperature."
 2. Every row represents a single observation, like circulation counts by branch and month.
 3. Every cell contains a single value.
@@ -70,7 +69,6 @@ R for Data Science [12.1](https://r4ds.had.co.nz/tidy-data.html#fig:tidy-structu
 ### Benefits of Tidy Data
 
 Transforming our data into a tidy data format provides several advantages:
-
 - Python operations, such as visualization, filtering, and statistical analysis libraries, work better with data in a tidy format.
 - Tidy data makes transforming, summarizing, and visualizing information easier. For instance, comparing monthly trends or calculating annual averages becomes more straightforward.
 - As datasets grow, tidy data ensures that they remain manageable and analyses remain accurate.
@@ -338,7 +336,6 @@ Let's save `df_long` to use in the next episode.
 ```python
 df.to_pickle('data/df_long.pkl')
 ```
-
 ::::::::::::::::::::::::::::::::::::::: challenge
 
 ## Tidy Data Principles
@@ -399,10 +396,11 @@ low_circ.sort_values(by='circulation', ascending=False)
 :::::::::::::::::::::::::
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
+
 ::::::::::::::::::::::::::::::::::::::: challenge
 
 ## Group and aggregate for circulation by year
-How would you create a DataFrame that sums up the circulation by year across all branches? In other words you want a DataFrame that includes one row for each year, and columns for 'year' and 'sum', the latter of which is the sum of all circulation figures for the entire year.
+How would you create a subset of `df_long` that sums up the circulation by year across all branches? In other words you want a view of the DataFrame that includes one row for each year, and columns for 'year' and 'sum', the latter of which shows the sum of circulation for all branches in each year.
 
 
 ::::::::::::::: solution
@@ -427,12 +425,15 @@ df_long.groupby(['year'])['circulation'].agg(['sum'])
 | 2020 | 2726156 |
 | 2021 | 3184327 |
 | 2022 | 3342472 |
+
 :::::::::::::::::::::::::
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
+
 :::::::::::::::::::::::::::::::::::::::: keypoints
 
 - In tidy data each variable forms a column, each observation forms a row, and each type of observational unit forms a table.
 - Using pandas for data manipulation to reshape data is fundamental for preparing data for analysis.
 
 ::::::::::::::::::::::::::::::::::::::::::::::::::
+
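
For anyone checking the reworded challenge, here is a minimal sketch of the `groupby` call quoted in the last hunk header, run on a small made-up frame. The branch names and circulation counts are invented for illustration and are not the lesson's dataset. The `reset_index()` step is an assumption added here so that 'year' becomes an ordinary column, as the challenge text describes.

```python
import pandas as pd

# Hypothetical long-format data: branch names and counts are invented
# for illustration and do not come from the lesson's dataset.
df_long = pd.DataFrame({
    "branch": ["Branch A", "Branch A", "Branch B", "Branch B"],
    "year": [2020, 2021, 2020, 2021],
    "circulation": [100, 150, 200, 250],
})

# Same call as in the hunk header: one row per year with a single 'sum' column.
by_year = df_long.groupby(["year"])["circulation"].agg(["sum"])

# Assumption: reset_index() is added so 'year' is a regular column rather
# than the index, matching "columns for 'year' and 'sum'" in the challenge.
by_year = by_year.reset_index()

print(by_year)
```

Without `reset_index()`, the result keeps 'year' as the index, which is the shape the solution table in the lesson appears to show.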
