
Commit 23561fe

tiffanychu90 committed: add exercise 6, 8 feedback

1 parent 1fa20dd


1 file changed, +136 -1 lines changed


shweta-folder/feedback.md

@@ -146,4 +146,139 @@ def make_chart(df, x_col, y_col, colorscale):
chart = styleguide.preset_chart_config(chart)
display(chart)
```
* Take a look [in this notebook](https://github.com/cal-itp/data-analyses/blob/main/bus_service_increase/competitive-routes.ipynb) to see how you could also weave in HTML and Markdown with `display(HTML())` and `display(Markdown())` to programmatically generate captions. This is the Jupyter notebook equivalent of creating RMarkdown docs the way [Urban Institute creates their fact sheets](https://urban-institute.medium.com/iterated-fact-sheets-with-r-markdown-d685eb4eafce).
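A minimal sketch of a programmatically generated caption, assuming a df shaped like `operators_county` (one row per county, with the operator count in `Operators_Name`). `make_caption` and `show_caption` are hypothetical helpers, not taken from the linked notebook, and the example data is made up. The IPython import sits inside `show_caption` so the caption text itself can be built and checked outside Jupyter.

```python
import pandas as pd


def make_caption(df: pd.DataFrame, x_col: str, y_col: str) -> str:
    """Build a Markdown caption naming the group with the largest value."""
    # idxmax returns the index label of the (first) row with the max value
    top_row = df.loc[df[y_col].idxmax()]
    return f"**{top_row[x_col]}** has the most operators ({top_row[y_col]})."


def show_caption(caption: str) -> None:
    # Imported here so make_caption can be used without a notebook
    from IPython.display import Markdown, display
    display(Markdown(caption))


# Hypothetical example data standing in for operators_county
example = pd.DataFrame({
    "county_name": ["Alameda", "Los Angeles", "San Diego"],
    "Operators_Name": [12, 30, 9],
})
caption = make_caption(example, "county_name", "Operators_Name")
```

In a notebook cell, `show_caption(caption)` renders the caption below your chart; `display(HTML(...))` works the same way if you'd rather write the caption as HTML.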
## Exercise 6
* Re-use the same function to make multiple charts. If the charts are the same kind (e.g., bar charts), you can set up the function to take whatever differs between them as arguments. This takes a little practice.
* This function, `make_chart`, takes `Operators_Name` as an argument, but that argument is never used inside the function. What I see is `"Operators_Name"`, a string of that exact phrase, not the variable `Operators_Name` (no quotation marks). Similarly, you pass `operators_district` in as `Operators_Name`, but the chart is always built from `operators_county`.
```
def make_chart(Operators_Name, colorscale):
    chart = (alt.Chart(operators_county)
             .mark_bar()
             .encode(
                 x=alt.X("county_name", title="County"),
                 y=alt.Y("Operators_Name", title="Number of Operators"),
                 color = alt.Color("Operators_Name",
                                   scale = alt.Scale(range=colorscale),
                                  ),
                 tooltip = ["county_name", "Operators_Name"]
             ).properties(title="Operators by County")
             .interactive()
            )
    chart = styleguide.preset_chart_config(chart)
    display(chart)


make_chart(operators_district, cp.CALITP_CATEGORY_BRIGHT_COLORS)
```
* Instead, I would set up a `make_chart` function that takes a df and an x-column (district or county). Since both charts plot the number of operators, the y-column is shared. Look at all the places where `df` and `x_col` are used. Also, note how I define `chart_title` inside the function and use that variable later.
```
import altair as alt
import pandas as pd

# cp (the Cal-ITP color palette) is assumed to be imported already, as elsewhere in your notebook


def make_chart(df: pd.DataFrame, x_col: str):
    # Let's create chart_title as a variable, and we will use it later
    # we want 2 different titles depending on the kind of chart
    if x_col == "county_name":
        chart_title = "Operators by County"
    elif x_col == "Caltrans_District":
        chart_title = "Operators by District"
    else:
        # fallback so chart_title is always defined
        chart_title = "Operators"

    chart = (alt.Chart(df)
             .mark_bar()
             .encode(
                 # call the string methods on x_col itself: "Caltrans_District" -> "Caltrans District"
                 x=alt.X(x_col, title=x_col.replace("_", " ").title()),
                 y=alt.Y("Operators_Name", title="Number of Operators"),
                 color=alt.Color(
                     "Operators_Name",
                     scale=alt.Scale(range=cp.CALITP_CATEGORY_BRIGHT_COLORS),
                 ),
                 tooltip=[x_col, "Operators_Name"]
             ).properties(title=chart_title)
             .interactive()
            )

    return chart


district_chart = make_chart(operators_district, "Caltrans_District")

county_chart = make_chart(operators_county, "county_name")
```
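About the x-axis title: inside an f-string, only the expression inside the braces is evaluated, so `f"{x_col}.replace('_', ' ').title()"` keeps `.replace(...)` and `.title()` as literal text in the title. A quick check in plain Python, using `"Caltrans_District"` as the example column:

```python
x_col = "Caltrans_District"

# Only {x_col} is evaluated; the method calls after the brace stay as literal text
literal_title = f"{x_col}.replace('_', ' ').title()"

# Calling the string methods directly gives the intended axis title
axis_title = x_col.replace("_", " ").title()

# Equivalent f-string form: the whole expression goes inside the braces
# (the inner quotes differ from the outer quotes, which older Pythons require)
fstring_title = f"{x_col.replace('_', ' ').title()}"
```

For `"county_name"`, the same calls give `"County Name"`.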
* Add a couple more things to the `altair` chart: a tooltip and interactivity.
```
# tooltip: hover over a bar to display the values in these columns
# .interactive(): scroll with the mouse to zoom
# (colorscale is the argument passed into make_chart)

chart = (alt.Chart(operators_county)
         .mark_bar()
         .encode(
             x=alt.X("county_name", title="County"),
             y=alt.Y("Operators_Name", title="Number of Operators"),
             color = alt.Color("Operators_Name",
                               scale = alt.Scale(range=colorscale),
                              ),
             tooltip = ["county_name", "Operators_Name"]
         ).properties(title="Operators by County")
         .interactive()
        )
```
* You can make a grouped bar chart in `altair` with your `operators_caltransdistrict` df.
```
# Faceting is one way to get a grouped bar chart:
# https://github.com/altair-viz/altair/issues/1221
# The complicated part is that a faceted chart is a more complex chart,
# so `preset_chart_config(chart)` will error. Use `apply_chart_config(chart)`
# for pared-down chart formatting instead, and add other things
# like sizing yourself at the end.

chart = (alt.Chart(operators_caltransdistrict)
         .mark_bar()
         .encode(
             x=alt.X("county_name:O", title=""),
             y=alt.Y("Operators_Name:Q", title="Number of Operators"),
             color = alt.Color("county_name:O",
                               scale = alt.Scale(
                                   range = cp.CALITP_CATEGORY_BOLD_COLORS)),
             tooltip = ["county_name", "Operators_Name"]
         ).facet(
             column = alt.Column('Caltrans_District', title = "District")
         ).properties(title="Operators by District and County")
         .resolve_scale(x="independent")
         .interactive()
        )

chart = styleguide.apply_chart_config(chart)

display(chart)
```
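The faceted chart needs a long df: one row per district-county pair, with the operator count in `Operators_Name`. I'm guessing at how your notebook built these dfs, so treat this as a sketch: `operators` is a hypothetical raw table (one row per operator per county) and `count_operators` is a hypothetical helper. Counting with `nunique` means an operator that serves two counties in the same district is counted only once for that district.

```python
import pandas as pd

# Hypothetical raw table: one row per operator serving a county
operators = pd.DataFrame({
    "Operators_Name": ["Operator A", "Operator B", "Operator B", "Operator C", "Operator D"],
    "county_name": ["Alameda", "Alameda", "Contra Costa", "Los Angeles", "Los Angeles"],
    "Caltrans_District": [4, 4, 4, 7, 7],
})


def count_operators(df: pd.DataFrame, group_cols: list) -> pd.DataFrame:
    """Count unique operators per group; the count keeps the Operators_Name name."""
    return (
        df.groupby(group_cols)
        .agg({"Operators_Name": "nunique"})
        .reset_index()
    )


operators_county = count_operators(operators, ["county_name"])
operators_district = count_operators(operators, ["Caltrans_District"])
# Long format: one row per district-county pair, which is what the facet needs
operators_caltransdistrict = count_operators(operators, ["Caltrans_District", "county_name"])
```

The same helper produces all three dfs, so the single-column and grouped charts stay consistent with each other.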
## Exercise 8
* Whenever you do a spatial join, you'll get a new column called `index_right`, which holds the index from your right df. I like to drop it, since we tend not to use the index anyway.
```
railstops_ca = gpd.sjoin(rail_stops.to_crs("EPSG:2229"),
                         ca.to_crs("EPSG:2229"),
                         how="inner",
                         predicate="intersects"
                        ).drop(columns="index_right")
```
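To see where `index_right` comes from, here's a tiny hypothetical example with two stops and one polygon (made-up geometries, no CRS set). It assumes geopandas 0.10 or later, where the keyword is `predicate` rather than the older `op`.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical stand-ins for rail_stops and ca
rail_stops = gpd.GeoDataFrame(
    {"stop_name": ["A", "B"]},
    geometry=[Point(0.5, 0.5), Point(5, 5)],
)
ca = gpd.GeoDataFrame(
    {"state": ["CA"]},
    geometry=[box(0, 0, 1, 1)],
)

# Only stop A falls inside the polygon, so the inner join keeps one row
joined = gpd.sjoin(rail_stops, ca, how="inner", predicate="intersects")

# sjoin added index_right (the matching row's index in ca); drop it
cleaned = joined.drop(columns="index_right")
```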
* When you dissolve, you don't need to keep the `Shape__Length` or `Shape__Area` columns from ESRI. Their units are usually not discernible (if the data is in WGS 84, they're in decimal degrees, and we won't use them anyway). Instead, generate your own `length` or `area` columns: you will have projected the CRS first, so you'll know whether the units are meters, feet, etc.
* I prefer this line: `overlay_percentage = overlay_dissolve.assign(percent = (overlay_dissolve.geom_length / overlay_dissolve.rail_length)*100)`, which gets your percent in one line, so you no longer need the more roundabout step of creating `half_rail_length`.
* Since you're just assigning a new column, you can assign the result back to `overlay_dissolve` and keep working with it.
```
overlay_dissolve = overlay_dissolve.assign(
    percent = (overlay_dissolve.geom_length / overlay_dissolve.rail_length) * 100
)

# this column has the same name as the function (railroutescheck)...avoid sharing names like this
overlay_dissolve['railroutescheck'] = overlay_dissolve.apply(railroutescheck, axis=1)

# an alternate way to write your apply function:
overlay_dissolve = overlay_dissolve.assign(
    rail_route_category = overlay_dissolve.apply(
        lambda x: 'less than half' if (x.percent > 0 and x.percent < 50)
        else 'more than half' if x.percent >= 50
        else 'never intersects SHN',
        axis=1)
)
```
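If the row-wise `apply` gets slow on a big df, `np.select` is a vectorized alternative. This is my swap-in, not something from your notebook, and the df below is a hypothetical stand-in for `overlay_dissolve`. The conditions are checked in order and the first match wins, so `>= 50` goes first and anything left over (including a percent of 0) falls through to the default.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for overlay_dissolve
overlay_dissolve = pd.DataFrame({
    "route": ["r1", "r2", "r3"],
    "geom_length": [0.0, 25.0, 75.0],
    "rail_length": [100.0, 100.0, 100.0],
})

# Percent in one line, assigned back onto the same df
overlay_dissolve = overlay_dissolve.assign(
    percent=overlay_dissolve.geom_length / overlay_dissolve.rail_length * 100
)

# First matching condition wins
conditions = [
    overlay_dissolve.percent >= 50,
    overlay_dissolve.percent > 0,
]
choices = ["more than half", "less than half"]

overlay_dissolve = overlay_dissolve.assign(
    rail_route_category=np.select(conditions, choices, default="never intersects SHN")
)
```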
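A minimal sketch of generating your own length instead of relying on `Shape__Length`, using a hypothetical rail line in WGS 84. It projects to EPSG:2229 (the CRS used in the sjoin example, with units in US survey feet) before measuring, and puts the unit in the column name; `length_ft` is a hypothetical name.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical rail line in WGS 84 (longitude, latitude)
rail = gpd.GeoDataFrame(
    {"route": ["r1"]},
    geometry=[LineString([(-118.25, 34.05), (-118.15, 34.05)])],
    crs="EPSG:4326",
)

# Project first, then measure: EPSG:2229 is in US survey feet
rail = rail.to_crs("EPSG:2229").assign(
    length_ft=lambda df: df.geometry.length
)
```

The same pattern works for polygons with `df.geometry.area`, giving square feet in this CRS.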
## Exercise 9
