Commit 58937a0

Merge pull request #11 from projectpythia-mystmd/agoose77/chore-add-metrics
📊 Add metrics workflow
2 parents efe4bda + 64b3de1 commit 58937a0

3 files changed: +382 -1 lines changed

.github/workflows/deploy.yml

+11 -1
@@ -36,8 +36,18 @@ jobs:
           node-version: 18.x
       - name: Install MyST Markdown
         run: npm install -g mystmd
+      - uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+          cache: 'pip'
+      - name: Install Python dependencies
+        run: pip install -r requirements.txt
       - name: Build HTML Assets
-        run: myst build --html
+        run: myst build --html --execute
+        # Only expose secrets here
+        env:
+          PRIVATE_KEY: ${{ secrets.PRIVATE_KEY }}
+          PRIVATE_KEY_ID: ${{ secrets.PRIVATE_KEY_ID }}
       - name: Upload artifact
         uses: actions/upload-pages-artifact@v3
         with:

portal/metrics.md

+366
@@ -1 +1,367 @@
+---
+kernelspec:
+  name: python3
+  display_name: Python 3
+---
+
 # Metrics
+
+```{code-cell} python3
+:tags: [remove-cell]
+
+import datetime
+import json
+import os
+
+import cartopy
+import google
+import matplotlib
+import matplotlib.cm as cm
+import matplotlib.colors as colors
+import matplotlib.pyplot as plt
+import numpy as np
+from google.analytics.data_v1beta import BetaAnalyticsDataClient
+from google.analytics.data_v1beta.types import DateRange, Dimension, Metric, RunReportRequest
+
+# Project ID Numbers
+PORTAL_ID = '266784902'
+FOUNDATIONS_ID = '281776420'
+COOKBOOKS_ID = '324070631'
+
+# Access Secrets
+PRIVATE_KEY_ID = os.environ.get('PRIVATE_KEY_ID')
+# GH secrets can escape newlines as a literal '\' followed by 'n'; turn those back into real newlines
+PRIVATE_KEY = os.environ.get('PRIVATE_KEY', '').replace('\\n', '\n')
+
+credentials_dict = {
+    'type': 'service_account',
+    'project_id': 'cisl-vast-pythia',
+    'private_key_id': PRIVATE_KEY_ID,
+    'private_key': PRIVATE_KEY,
+    'client_email': '[email protected]',
+    'client_id': '113402578114110723940',
+    'auth_uri': 'https://accounts.google.com/o/oauth2/auth',
+    'token_uri': 'https://oauth2.googleapis.com/token',
+    'auth_provider_x509_cert_url': 'https://www.googleapis.com/oauth2/v1/certs',
+    'client_x509_cert_url': 'https://www.googleapis.com/robot/v1/metadata/x509/pythia-metrics-api%40cisl-vast-pythia.iam.gserviceaccount.com',
+    'universe_domain': 'googleapis.com',
+}
+
+try:
+    client = BetaAnalyticsDataClient.from_service_account_info(credentials_dict)
+except google.auth.exceptions.MalformedError as e:
+    print('Malformed Error:', repr(e))
+    # Insight into reason for failure without exposing secret key
+    # 0: Secret not found, else malformed
+    # 706: extra quote, 732: extra '\', 734: both
+    print('Length of PRIVATE_KEY:', len(PRIVATE_KEY))
+
+pre_project_date = '2020-03-31'
+```
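The `.replace('\\n', '\n')` step in this cell matters because a multi-line PEM key stored as a secret can arrive with each newline flattened into the two characters `\` and `n`. A minimal sketch of that normalization follows, using placeholder text rather than a real key. The helper name `normalize_private_key` and the `DEMO_PRIVATE_KEY` variable are assumptions made for illustration and are not part of the commit.

```python
import os


def normalize_private_key(raw):
    """Turn literal backslash-n sequences into real newlines; treat a missing value as empty."""
    return (raw or '').replace('\\n', '\n')


# Placeholder text only; this is not a real key.
os.environ['DEMO_PRIVATE_KEY'] = '-----BEGIN PRIVATE KEY-----\\nabc\\n-----END PRIVATE KEY-----\\n'

key = normalize_private_key(os.environ.get('DEMO_PRIVATE_KEY'))
print(key.count('\n'))  # number of real newlines after normalization
print(len(normalize_private_key(None)))  # length reported for a missing secret
```

Treating a missing value as an empty string keeps the length check in the `except` branch meaningful, since a length of 0 then signals that the secret was not found.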
+
+
+Last Updated: {eval}`str(datetime.datetime.now())`
+
+This metrics page provides an overview of user activity collected by Google Analytics across the three pillars of Project Pythia: our portal, which includes information about the project as well as our resource gallery; our Foundations book; and our Cookbooks gallery. Information is either all-time (from a pre-project start date of March 2020) or year-to-date, as indicated, and is updated nightly to provide automated, up-to-date insights into our engagement, impact, and audience reach. If you would like to request a different metrics analysis, timeframe, or view, please [open a GitHub issue](https://github.com/ProjectPythia/projectpythia.github.io/issues/new/choose).
+
+## Table of Total Active Users by Project
+
+```{code-cell} python3
+:tags: [remove-cell]
+
+def _format_rounding(value):
+    """
+    Helper function for rounding string displays. 1,232 -> 1.2K
+    """
+    return f'{round(value / 1000, 1):.1f}K'
+
+
+# The rest of this file alternates between functions that request information from Google Analytics
+# and functions that use the response to build the tables and plots on this page
+def _run_total_users_report(property_id):
+    """
+    Function for requesting cumulative active users from a project since project start.
+    """
+    request = RunReportRequest(
+        property=f'properties/{property_id}',
+        dimensions=[],
+        metrics=[Metric(name='activeUsers')],
+        date_ranges=[DateRange(start_date=pre_project_date, end_date='today')],
+    )
+    response = client.run_report(request)
+
+    total_users = 0
+    for row in response.rows:
+        total_users += int(row.metric_values[0].value)
+
+    return _format_rounding(total_users)
+```
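The rounding helper always reports values in thousands, so small counts collapse toward `0.0K`. The sketch below reproduces the same expression under the name `format_rounding` and adds a hypothetical variant, `format_count`, that prints counts under 1000 as plain integers. The variant is an assumption offered for comparison, not something the commit contains.

```python
def format_rounding(value):
    """Same expression as the helper in the diff: thousands with one decimal place."""
    return f'{round(value / 1000, 1):.1f}K'


def format_count(value):
    """Hypothetical variant: show counts under 1000 as plain integers."""
    if value < 1000:
        return str(int(value))
    return format_rounding(value)


# Compare both helpers on a small, medium, and large count.
for n in (42, 1232, 15678):
    print(n, format_rounding(n), format_count(n))
```

Whether the plain-integer fallback is desirable depends on how small the reported counts can get; for all-time totals in the thousands the original helper is likely sufficient.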
+
+This table displays the total active users of our 3 Pythia projects over the life of Project Pythia. Google Analytics defines active users as the number of unique people who have visited the site and met certain [engagement requirements](https://support.google.com/analytics/answer/9234069?sjid=8697784525616937194-NC). You can read more in the [GA4 "Understand User Metrics" documentation](https://support.google.com/analytics/answer/12253918?hl=en).
+
+```{code-cell} python3
+:tags: [remove-cell]
+
+portal_users = _run_total_users_report(PORTAL_ID)
+foundations_users = _run_total_users_report(FOUNDATIONS_ID)
+cookbooks_users = _run_total_users_report(COOKBOOKS_ID)
+```
+
+(table-total-users)=
+| Project     | All-Time Users            |
+| ----------- | ------------------------- |
+| Portal      | {eval}`portal_users`      |
+| Foundations | {eval}`foundations_users` |
+| Cookbooks   | {eval}`cookbooks_users`   |
+
+## Chart of Active Users by Project Since Year Start
+
+```{code-cell} python3
+:tags: [remove-cell]
+
+def _run_active_users_this_year(property_id):
+    """
+    Function for requesting active users by day from a project since year start.
+    """
+    current_year = datetime.datetime.now().year
+    start_date = f'{current_year}-01-01'
+
+    request = RunReportRequest(
+        property=f'properties/{property_id}',
+        dimensions=[Dimension(name='date')],
+        metrics=[Metric(name='activeUsers')],
+        date_ranges=[DateRange(start_date=start_date, end_date='today')],
+    )
+    response = client.run_report(request)
+
+    dates = []
+    user_counts = []
+    for row in response.rows:
+        date_str = row.dimension_values[0].value
+        date = datetime.datetime.strptime(date_str, '%Y%m%d')
+        dates.append(date)
+        user_counts.append(int(row.metric_values[0].value))
+
+    # Days need to be sorted chronologically
+    return zip(*sorted(zip(dates, user_counts), key=lambda x: x[0]))
+
+
+def plot_projects_this_year(PORTAL_ID, FOUNDATIONS_ID, COOKBOOKS_ID):
+    """
+    Function for taking year-to-date active users by day and plotting it for each project.
+    """
+    portal_dates, portal_users = _run_active_users_this_year(PORTAL_ID)
+    foundations_dates, foundations_users = _run_active_users_this_year(FOUNDATIONS_ID)
+    cookbooks_dates, cookbooks_users = _run_active_users_this_year(COOKBOOKS_ID)
+
+    # Plotting code
+    plt.figure(figsize=(10, 5.5))
+    plt.title('Year-to-Date Pythia Active Users', fontsize=15)
+
+    plt.plot(portal_dates, portal_users, color='purple', label='Portal')
+    plt.plot(foundations_dates, foundations_users, color='royalblue', label='Foundations')
+    plt.plot(cookbooks_dates, cookbooks_users, color='indianred', label='Cookbooks')
+
+    plt.legend(fontsize=12, loc='upper right')
+
+    plt.xlabel('Date', fontsize=12)
+    plt.show()
+
+```
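The `zip(*sorted(zip(...)))` idiom sorts two parallel lists together, but it yields nothing when the report has no rows, and unpacking that result into two names raises `ValueError`. The sketch below isolates the idiom with made-up rows and adds an explicit empty-data guard. The helper name `sort_by_date` and the sample values are assumptions for illustration.

```python
import datetime


def sort_by_date(dates, counts):
    """Sort two parallel lists by date; return two empty lists when there is no data."""
    if not dates:
        return [], []
    sorted_dates, sorted_counts = zip(*sorted(zip(dates, counts), key=lambda pair: pair[0]))
    return list(sorted_dates), list(sorted_counts)


# Made-up rows in the 'YYYYMMDD' form that the diff parses with strptime.
raw = [('20240103', 7), ('20240101', 5), ('20240102', 9)]
dates = [datetime.datetime.strptime(d, '%Y%m%d') for d, _ in raw]
counts = [c for _, c in raw]

sorted_dates, sorted_counts = sort_by_date(dates, counts)
print([d.strftime('%Y-%m-%d') for d in sorted_dates], sorted_counts)
```

The guard mostly matters early in January or for a newly created property, when a year-to-date query could plausibly return no rows.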
+
+This line plot displays active users for our 3 Pythia projects (Portal in purple, Foundations in blue, and Cookbooks in salmon) since January 1st of the current year.
+
+```{code-cell} python3
+:tags: [remove-input]
+:name: plot-active-users
+:caption: Chart of active users by project since year start.
+
+plot_projects_this_year(PORTAL_ID, FOUNDATIONS_ID, COOKBOOKS_ID)
+```
+
+## Chart of Top 5 Pages by Project
+
+```{code-cell} python3
+:tags: [remove-cell]
+
+def _run_top_pages_report(property_id):
+    """
+    Function for requesting top 5 pages from a project.
+    """
+    request = RunReportRequest(
+        property=f'properties/{property_id}',
+        dimensions=[Dimension(name='pageTitle')],
+        date_ranges=[DateRange(start_date=pre_project_date, end_date='today')],
+        metrics=[Metric(name='screenPageViews')],
+    )
+    response = client.run_report(request)
+
+    views_dict = {}
+    for row in response.rows:
+        page = row.dimension_values[0].value
+        views = int(row.metric_values[0].value)
+        views_dict[page] = views
+
+    # Sort by views and grab the top 5
+    top_pages = sorted(views_dict.items(), key=lambda item: item[1], reverse=True)[:5]
+    # String manipulation on page titles "Cartopy - Pythia Foundations" -> "Cartopy"
+    pages = [page.split('—')[0] for page, _ in top_pages]
+    views = [views for _, views in top_pages]
+
+    # Reverse order of lists, so they'll plot with most visited page on top (i.e. last)
+    return pages[::-1], views[::-1]
+def plot_top_pages(PORTAL_ID, FOUNDATIONS_ID, COOKBOOKS_ID):
+    """
+    Function that takes the top 5 viewed pages for all 3 projects and plots them on a horizontal bar chart.
+    """
+    portal_pages, portal_views = _run_top_pages_report(PORTAL_ID)
+    foundations_pages, foundations_views = _run_top_pages_report(FOUNDATIONS_ID)
+    cookbooks_pages, cookbooks_views = _run_top_pages_report(COOKBOOKS_ID)
+
+    # Plotting code
+    fig, ax = plt.subplots(figsize=(10, 5.5))
+    plt.title('All-Time Top Pages', fontsize=15)
+
+    y = np.arange(5)  # 0-4 for Cookbooks
+    y2 = np.arange(6, 11)  # 6-10 for Foundations
+    y3 = np.arange(12, 17)  # 12-16 for Portal
+
+    bar1 = ax.barh(y3, portal_views, align='center', label='Portal', color='purple')
+    bar2 = ax.barh(y2, foundations_views, align='center', label='Foundations', color='royalblue')
+    bar3 = ax.barh(y, cookbooks_views, align='center', label='Cookbooks', color='indianred')
+
+    y4 = np.append(y, y2)
+    y4 = np.append(y4, y3)  # 0-4, 6-10, 12-16 for page labels to have a gap between projects
+    pages = cookbooks_pages + foundations_pages + portal_pages  # List of all pages
+    ax.set_yticks(y4, labels=pages, fontsize=12)
+
+    # Adds round-formatted views label to end of each bar
+    ax.bar_label(bar1, fmt=_format_rounding, padding=5, fontsize=10)
+    ax.bar_label(bar2, fmt=_format_rounding, padding=5, fontsize=10)
+    ax.bar_label(bar3, fmt=_format_rounding, padding=5, fontsize=10)
+
+    ax.set_xscale('log')
+    ax.set_xlim([10, 10**5])  # set_xlim must be after setting xscale to log
+    ax.set_xlabel('Page Views', fontsize=12)
+
+    plt.legend(fontsize=12, loc='lower right')
+    plt.show()
+```
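The ranking step in this cell can be tried without any Google Analytics access. The sketch below sorts a made-up dictionary of page views, keeps the top `n`, trims the site suffix from each title, and reverses the lists so the most viewed page is plotted last. The function name `top_pages`, the sample titles, and the added `.strip()` call are assumptions for illustration. The separator is written as `'\u2014'`, the same character the diff passes to `split`.

```python
def top_pages(views_by_title, n=5, separator='\u2014'):
    """Return (titles, views) for the n most viewed pages, least viewed first."""
    ranked = sorted(views_by_title.items(), key=lambda item: item[1], reverse=True)[:n]
    # Keep only the text before the separator and drop the trailing space it leaves behind.
    titles = [title.split(separator)[0].strip() for title, _ in ranked]
    views = [count for _, count in ranked]
    # Reverse so a horizontal bar chart shows the most viewed page at the top.
    return titles[::-1], views[::-1]


# Made-up page titles and view counts.
sample = {
    'Cartopy \u2014 Pythia Foundations': 900,
    'Home \u2014 Project Pythia': 1500,
    'NumPy Basics \u2014 Pythia Foundations': 300,
}

titles, views = top_pages(sample, n=2)
print(titles, views)
```

Note that a property with fewer than `n` pages returns shorter lists, which a caller plotting against fixed positions would need to account for.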
+
+This bar chart displays the top 5 pages by project over the life of Project Pythia, as determined by screen page views. Screen page views refers to the number of times users viewed a page, including repeated visits. To learn more, visit the [GA4 "API Dimensions & Metrics" page](https://developers.google.com/analytics/devguides/reporting/data/v1/api-schema).
+
+```{code-cell} python3
+:tags: [remove-input]
+:name: chart-top-five-pages
+:caption: Bar chart of the top five pages by project over the life of Project Pythia
+
+plot_top_pages(PORTAL_ID, FOUNDATIONS_ID, COOKBOOKS_ID)
+```
+
+## Map of Total Foundations Active Users by Country
+
+```{code-cell} python3
+:tags: [remove-cell]
+
+def _run_usersXcountry_report(property_id):
+    """
+    Function for requesting users by country for a project.
+    """
+    request = RunReportRequest(
+        property=f'properties/{property_id}',
+        dimensions=[Dimension(name='country')],
+        metrics=[Metric(name='activeUsers')],
+        date_ranges=[DateRange(start_date=pre_project_date, end_date='today')],
+    )
+    response = client.run_report(request)
+
+    user_by_country = {}
+    for row in response.rows:
+        country = row.dimension_values[0].value
+        users = int(row.metric_values[0].value)
+        user_by_country[country] = user_by_country.get(country, 0) + users
+
+    return user_by_country
+def plot_usersXcountry(FOUNDATIONS_ID):
+    """
+    Function for taking users by country for Pythia Foundations and plotting them on a map.
+    """
+    users_by_country = _run_usersXcountry_report(FOUNDATIONS_ID)
+
+    # Google API Country names do not match Cartopy Country Shapefile names
+    dict_api2cartopy = {
+        'Tanzania': 'United Republic of Tanzania',
+        'United States': 'United States of America',
+        'Congo - Kinshasa': 'Democratic Republic of the Congo',
+        'Bahamas': 'The Bahamas',
+        'Timor-Leste': 'East Timor',
+        'C\u00f4te d\u2019Ivoire': 'Ivory Coast',
+        'Bosnia & Herzegovina': 'Bosnia and Herzegovina',
+        'Serbia': 'Republic of Serbia',
+        'Trinidad & Tobago': 'Trinidad and Tobago',
+    }
+
+    for key in dict_api2cartopy.keys() & users_by_country.keys():  # rename only countries present in the report
+        users_by_country[dict_api2cartopy[key]] = users_by_country.pop(key)
+
+    # Sort by users and grab the top 10 countries for a text box
+    top_10_countries = sorted(users_by_country.items(), key=lambda item: item[1], reverse=True)[:10]
+    top_10_text = '\n'.join(
+        f'{country}: {_format_rounding(value)}' for i, (country, value) in enumerate(top_10_countries)
+    )
+
+    # Plotting code
+    fig = plt.figure(figsize=(10, 4))
+    ax = plt.axes(projection=cartopy.crs.PlateCarree(), frameon=False)
+    ax.set_title('All-Time Pythia Foundations Users by Country', fontsize=15)
+
+    shapefile = cartopy.io.shapereader.natural_earth(category='cultural', resolution='110m', name='admin_0_countries')
+    reader = cartopy.io.shapereader.Reader(shapefile)
+    countries = reader.records()
+
+    colormap = plt.get_cmap('Blues')
+    newcmp = colors.ListedColormap(colormap(np.linspace(0.2, 1, 128)))  # Truncate colormap to remove white hues
+    newcmp.set_extremes(under='grey')
+
+    norm = colors.LogNorm(vmin=1, vmax=max(users_by_country.values()))  # Plot on log scale
+    mappable = cm.ScalarMappable(norm=norm, cmap=newcmp)
+
+    # Loop through countries and plot their color
+    for country in countries:
+        country_name = country.attributes['SOVEREIGNT']
+        if country_name in users_by_country.keys():
+            facecolor = newcmp(norm(users_by_country[country_name]))
+            ax.add_geometries(
+                [country.geometry],
+                cartopy.crs.PlateCarree(),
+                facecolor=facecolor,
+                edgecolor='white',
+                linewidth=0.7,
+                norm=matplotlib.colors.LogNorm(),
+            )
+        else:
+            ax.add_geometries(
+                [country.geometry], cartopy.crs.PlateCarree(), facecolor='grey', edgecolor='white', linewidth=0.7
+            )
+
+    # Add colorbar
+    cax = fig.add_axes([0.05, -0.015, 0.7, 0.03])  # [x0, y0, width, height]
+    cbar = fig.colorbar(mappable=mappable, cax=cax, spacing='uniform', orientation='horizontal', extend='min')
+    cbar.set_label('Unique Users')
+
+    # Add top 10 countries text
+    props = dict(boxstyle='round', facecolor='white', edgecolor='white')
+    ax.text(1.01, 0.5, top_10_text, transform=ax.transAxes, fontsize=12, verticalalignment='center', bbox=props)
+
+    plt.show()
+```
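When renaming report keys to match the shapefile names, it helps to skip names that the report does not contain, since `dict.pop` without a default raises `KeyError` for a missing key. The sketch below shows one way to do the rename on a copy, using made-up counts. The function name `rename_countries` and the sample data are assumptions for illustration.

```python
def rename_countries(users_by_country, name_map):
    """Return a copy with keys renamed per name_map; names absent from the data are skipped."""
    renamed = dict(users_by_country)
    for old_name, new_name in name_map.items():
        if old_name in renamed:
            # Add to any count already stored under the new name instead of overwriting it.
            renamed[new_name] = renamed.get(new_name, 0) + renamed.pop(old_name)
    return renamed


# Made-up report data: 'Bahamas' is in the mapping but not in the report.
name_map = {'United States': 'United States of America', 'Bahamas': 'The Bahamas'}
report = {'United States': 120, 'India': 45}

result = rename_countries(report, name_map)
print(result)
```

Working on a copy leaves the original report untouched, which makes the helper easier to test in isolation.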
+
+This map displays the number of active users per country for Pythia Foundations for the entire life of Project Pythia.
+
+```{code-cell} python3
+:tags: [remove-input]
+:name: map-active-users-country
+:caption: Map of the number of active users per country for Pythia Foundations for the entire life of Project Pythia.
+
+plot_usersXcountry(FOUNDATIONS_ID)
+```

requirements.txt

+5
@@ -0,0 +1,5 @@
+google-analytics-data
+cartopy
+matplotlib
+jupyter-server
+ipykernel
