episodes/tidy.md
Lines changed: 5 additions & 4 deletions
@@ -56,7 +56,6 @@ To address this we can reshape our data in a long format. This is sometimes call
## Tidy Data

Tidy data is a standard way of organizing data values within a dataset, making it easier to work with. Here are the key principles of tidy data:
-
1. Every column holds a single variable, like "month" or "temperature."
2. Every row represents a single observation, like circulation counts by branch and month.
3. Every cell contains a single value.
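
The three principles above can be illustrated with a small reshape. This is only a minimal sketch: the `df_wide` table, its branch names, and its numbers are made up for illustration and are not taken from the lesson's dataset. It uses pandas `melt()` to turn one-column-per-month data into one row per branch and month.

```python
import pandas as pd

# Hypothetical wide table (illustrative values only):
# one row per branch, one column per month.
df_wide = pd.DataFrame({
    "branch": ["Branch A", "Branch B"],
    "january": [120, 95],
    "february": [130, 80],
})

# melt() keeps "branch" as the identifier and stacks the month columns,
# so each row becomes one observation: (branch, month, circulation).
df_tidy = df_wide.melt(
    id_vars="branch",
    var_name="month",
    value_name="circulation",
)

print(df_tidy)
```

After the reshape, "month" and "circulation" are each a single column, every row is one branch-month observation, and every cell holds one value.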
@@ -70,7 +69,6 @@ R for Data Science [12.1](https://r4ds.had.co.nz/tidy-data.html#fig:tidy-structu
### Benefits of Tidy Data

Transforming our data into a tidy data format provides several advantages:
-
- Python operations, such as visualization, filtering, and statistical analysis libraries, work better with data in a tidy format.
- Tidy data makes transforming, summarizing, and visualizing information easier. For instance, comparing monthly trends or calculating annual averages becomes more straightforward.
- As datasets grow, tidy data ensures that they remain manageable and analyses remain accurate.
@@ -338,7 +336,6 @@ Let's save `df_long` to use in the next episode.
How would you create a DataFrame that sums up the circulation by year across all branches? In other words you want a DataFrame that includes one row for each year, and columns for 'year' and 'sum', the latter of which is the sum of all circulation figures for the entire year.
+How would you create a subset of `df_long` that sums up the circulation by year across all branches? In other words, you want a view of the DataFrame that includes one row for each year, and columns for 'year' and 'sum', the latter of which shows the sum of circulation for all branches in each year.
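
One possible approach to the reworded challenge is a `groupby` on the year column. This is a hedged sketch, not necessarily the lesson's own solution: the stand-in `df_long` below and its column names (`branch`, `year`, `month`, `circulation`) are assumptions based on the wording of the question, and the values are invented.

```python
import pandas as pd

# Stand-in for df_long (assumed column names, invented values).
df_long = pd.DataFrame({
    "branch": ["Branch A", "Branch A", "Branch B", "Branch B"],
    "year": [2011, 2012, 2011, 2012],
    "month": ["january", "january", "january", "january"],
    "circulation": [100, 150, 200, 250],
})

# Group by year, add up circulation across all branches and months,
# and rename the result column to 'sum' as the challenge describes.
yearly = (
    df_long.groupby("year", as_index=False)["circulation"]
    .sum()
    .rename(columns={"circulation": "sum"})
)

print(yearly)
```

Passing `as_index=False` keeps `year` as an ordinary column, so the result has exactly the two columns the question asks for.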