Commit 61e55f7

What's your ML test score?
1 parent 4be6e3a commit 61e55f7

6 files changed: +71 -4 lines changed

Gemfile

+2-1
@@ -1,4 +1,5 @@
 source 'https://rubygems.org'
 
 gem 'github-pages'
-gem 'jekyll-redirect-from'
+gem 'jekyll-compose', group: [:jekyll_plugins]
+gem 'jekyll-redirect-from', group: [:jekyll_plugins]

Gemfile.lock

+3
@@ -88,6 +88,8 @@ GEM
 jekyll (~> 3.0)
 jekyll-coffeescript (1.0.1)
 coffee-script (~> 2.2)
+jekyll-compose (0.5.0)
+jekyll (>= 3.0.0)
 jekyll-default-layout (0.1.4)
 jekyll (~> 3.0)
 jekyll-feed (0.9.2)
@@ -211,6 +213,7 @@ PLATFORMS
 
 DEPENDENCIES
 github-pages
+jekyll-compose
 jekyll-redirect-from
 
 BUNDLED WITH

_layouts/post.html

+1-1
@@ -9,7 +9,7 @@ <h1>{{ page.title }}</h1>
 {% if page.subtitle %}
 <h2>{{ page.subtitle }}</h2>
 {% endif %}
-<div class="byline">{{ page.date | date: "%B %d, %Y" }}{% if page.author %} · by {{ page.author}}{% endif %} . in {% for category in page.categories %}<a href="/blog/categories.html#{{category}}" class="category">{{ category }}</a>{% endfor %}</div>
+<div class="byline">{{ page.date | date: "%B %d, %Y" }}{% if page.author %} · by {{ page.author}}{% endif %} . {% for category in page.categories %}{% if forloop.index0 != 0 %}, {% endif %}<a href="/blog/categories.html#{{category}}" class="category">{{ category }}</a>{% endfor %}</div>
 </div>
 
 {% include toc.html %}

_posts/2017-03-26-the-startup-of-you.md

+3-1
@@ -1,7 +1,9 @@
 ---
 layout: post
 title: The Start-up of You
-category: Books
+category:
+- Books
+- Entrepreneurship
 excerpt: Nuggets from Reid Hoffman's the Start up of You.
 ---
 

@@ -0,0 +1,61 @@
+---
+layout: post
+title: What's your ML test score?
+category:
+- Machine Learning
+- Papers
+excerpt: "Paper summary: What's your ML test score? A rubric for ML testing and monitoring of production systems."
+---
+
+Using machine-learning systems in production is very different from running offline experiments, as you run into problems such as train/test skew, latency, and resource requirements. The paper [What’s your ML Test Score?](https://research.google.com/pubs/pub45742.html) by Eric Breck et al. provides a rubric for measuring the quality of ML system design.
+
+The rubric has four sections, so we'll go over each of them.
+
+### Tests for Features and Data
+
+- Distributions of each feature
+- Features are the same in both the training and serving stacks
+- Relationship between different features and targets
+- Privacy control in model training
+- Cost of computing each feature
+- Does not contain features determined unsuitable for use
+- Time to add new features to production
+
+The points about expensive and redundant features are important, as such features can affect the system's ability to meet the desired latency and throughput requirements. One option for solving such issues is to pre-cache expensive features and use them at prediction time, but this can lead to a lot of redundant compute.
+
+I liked the point about making sure we're not using features determined to be unsuitable. In the context of ML fairness, we could potentially ban features such as gender.
+
+### Tests for Model Development
+
+- Model code goes through code review
+- Offline proxy metrics are measuring what will be A/B tested
+- Hyperparameter tuning
+- Effect of model staleness
+- Simple models as a baseline
+- Model performs well across different data slices
+- Test for implicit bias in the model or data
+
+This section touches on good design principles such as optimizing for the right metrics, measuring staleness, and updating the model on time. The point about good performance on different data slices is especially valid when the majority of a website's data might come from English-speaking or developed countries.
+
+### Tests for ML Infrastructure
+
+- Reproducibility of model training
+- Integration tests for the ML system
+- Quality tests before deployment of the model
+- Ability to roll back deployed models
+- Testing via a canary process
+
+Most points in this list are easy to follow, but something I have found hard in practice is the quality tests. Recommendation systems are one example: since the output of the system may change from time to time, it is hard to write automated tests that measure quality. I'm interested in learning how others solve this problem.
+
+### Monitoring Tests for ML Systems
+
+- Upstream instability in features, both in training and serving
+- Data invariants hold in training and serving inputs
+- Model staleness
+- Train/Test skew in features and inputs
+- Slow leak regression in latency, throughput etc.
+- Regression in prediction quality
+
+This is a fantastic list, as it covers a lot of hidden problems in model serving. As models get larger, they can get expensive to serve, and features can get more expensive to compute. One useful tool could be monitoring success metrics as a time series and checking that performance stays consistent. Another could be to always have a small A/B test running against the old or baseline model.
+
+The paper touches on basic problems that you run into quite often but that are not talked about much in the ML community. I'm curious to know how the problems around feature engineering and model complexity evolve with the advent of deep learning models.
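
Reviewer note: the post lists train/test skew in features as something to monitor but does not show what such a check might look like. The snippet below is only a minimal sketch, not something taken from the paper or from this commit. It assumes a single numeric feature and uses the population stability index (PSI) as the skew measure; the function name, the bin count, and the `eps` floor are all illustrative choices.

```python
import math
from collections import Counter


def population_stability_index(train_values, serving_values, bins=10, eps=1e-6):
    """Return the PSI between two samples of one numeric feature.

    Hypothetical helper for illustration: bins are equal-width over the
    training range, and serving values outside that range are clamped
    into the first or last bin.
    """
    lo = min(train_values)
    hi = max(train_values)
    # Fall back to a width of 1.0 when every training value is identical.
    width = (hi - lo) / bins or 1.0

    def bucket_fractions(values):
        counts = Counter(
            min(max(int((v - lo) / width), 0), bins - 1) for v in values
        )
        total = len(values)
        # Floor each fraction at eps so the log below is always defined.
        return [max(counts.get(b, 0) / total, eps) for b in range(bins)]

    p = bucket_fractions(train_values)
    q = bucket_fractions(serving_values)
    # Each term is non-negative, so the sum is zero only when p == q.
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


train = [float(v) for v in range(100)]
serving_same = list(train)
serving_shifted = [v + 50.0 for v in train]

psi_same = population_stability_index(train, serving_same)
psi_shifted = population_stability_index(train, serving_shifted)
```

A monitoring job could compute this per feature on a schedule and alert when the value crosses a threshold. A cutoff around 0.2 is a commonly quoted rule of thumb, but the right threshold is an assumption that would need tuning per feature.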

blog/index.md

+1-1
@@ -9,7 +9,7 @@ layout: blog
 <div class="hentry post">
 <div class="sticky-header">
 <h2 class="entry-title"><a class="spec" href="{{ post.url }}" title="{{ post.title }}" rel="bookmark">{{ post.title }}</a></h2>
-<div class="byline">{{ post.date | date: "%B %d, %Y" }}{% if post.author %} · by {{ post.author}}{% endif %} . in {% for category in post.categories %}<a href="/blog/categories.html#{{category}}" class="category">{{ category }}</a>{% endfor %}</div>
+<div class="byline">{{ post.date | date: "%B %d, %Y" }}{% if post.author %} · by {{ post.author}}{% endif %} . {% for category in post.categories %}{% if forloop.index0 != 0 %}, {% endif %}<a href="/blog/categories.html#{{category}}" class="category">{{ category }}</a>{% endfor %}</div>
 </div><!-- .sticky-header -->
 
 <div class="entry-summary">
