What's your ML test score?

sb2nov · sb2nov · commit 61e55f752cab · 2017-09-07T22:43:50.000-07:00
diff --git a/Gemfile b/Gemfile
@@ -1,4 +1,5 @@
 source 'https://rubygems.org'
 
 gem 'github-pages'
-gem 'jekyll-redirect-from'
+gem 'jekyll-compose', group: [:jekyll_plugins]
+gem 'jekyll-redirect-from', group: [:jekyll_plugins]
diff --git a/Gemfile.lock b/Gemfile.lock
@@ -88,6 +88,8 @@ GEM
       jekyll (~> 3.0)
     jekyll-coffeescript (1.0.1)
       coffee-script (~> 2.2)
+    jekyll-compose (0.5.0)
+      jekyll (>= 3.0.0)
     jekyll-default-layout (0.1.4)
       jekyll (~> 3.0)
     jekyll-feed (0.9.2)
@@ -211,6 +213,7 @@ PLATFORMS
 
 DEPENDENCIES
   github-pages
+  jekyll-compose
   jekyll-redirect-from
 
 BUNDLED WITH
diff --git a/_layouts/post.html b/_layouts/post.html
@@ -9,7 +9,7 @@ <h1>{{ page.title }}</h1>
           {% if page.subtitle %}
           <h2>{{ page.subtitle }}</h2>
           {% endif %}
-          <div class="byline">{{ page.date | date: "%B %d, %Y" }}{% if page.author %} · by {{ page.author}}{% endif %} . in {% for category in page.categories %}<a href="/blog/categories.html#{{category}}" class="category">{{ category }}</a>{% endfor %}</div>
+          <div class="byline">{{ page.date | date: "%B %d, %Y" }}{% if page.author %} · by {{ page.author}}{% endif %} . {% for category in page.categories %}{% if forloop.index0 != 0 %}, {% endif %}<a href="/blog/categories.html#{{category}}" class="category">{{ category }}</a>{% endfor %}</div>
         </div>
 
         {% include toc.html %}
diff --git a/_posts/2017-03-26-the-startup-of-you.md b/_posts/2017-03-26-the-startup-of-you.md
@@ -1,7 +1,9 @@
 ---
 layout: post
 title: The Start-up of You
-category: Books
+category:
+- Books
+- Entrepreneurship
 excerpt: Nuggets from Reid Hoffman's the Start up of You.
 ---
 
diff --git a/_posts/2017-09-07-what-s-your-ml-test-score.md b/_posts/2017-09-07-what-s-your-ml-test-score.md
@@ -0,0 +1,61 @@
+---
+layout: post
+title: What's your ML test score?
+category:
+- Machine Learning
+- Papers
+excerpt: "Paper summary: What's your ML test score? A rubric for ML testing and monitoring of production systems."
+---
+
+Running machine-learning systems in production is very different from running offline experiments: you run into problems such as training/serving skew, latency, and resource requirements. The paper [What’s your ML Test Score?](https://research.google.com/pubs/pub45742.html) by Eric Breck et al. provides a rubric for measuring the quality of ML system design.
+
+The rubric has four sections; we'll go over each in turn.
+
+### Tests for Features and Data
+
+- Distributions of each feature
+- Features are the same in both the training and serving stacks
+- Relationship between different features and targets
+- Privacy control in model training
+- Cost of computing each feature
+- Does not contain features determined unsuitable for use
+- Time to add new features to production
+
+The points about expensive and redundant features are important, as such features can affect the system's ability to meet its latency and throughput requirements. One option is to precompute and cache expensive features for use at prediction time, but this can lead to a lot of redundant compute.
+
+I liked the point about making sure we're not using features deemed unsuitable, especially in the context of ML fairness, where we could ban features such as gender.
+
+### Tests for Model Development
+
+- Model code goes through code review
+- Offline proxy metrics are measuring what will be A/B tested
+- Hyperparameter tuning
+- Effect of model staleness
+- Simple models as a baseline
+- Model performs well across different data slices
+- Test for implicit bias in the model or data
+
+This section touches on good design principles such as optimizing for the right metrics, measuring staleness, and updating the model on time. The point about performing well across different data slices is especially relevant when the majority of a website's traffic comes from, say, English-speaking or developed countries.
+
+### Tests for ML Infrastructure
+
+- Reproducibility of model training
+- Integration tests for the ML system
+- Quality tests before deployment of the model
+- Ability to roll back deployed models
+- Testing via a canary process
+
+Most points in this list are easy to follow, but one that I have found hard in practice is quality testing. Recommendation systems are one example: because the output of the system may change from time to time, it is hard to write automated tests that measure quality. I'm interested in learning how others solve this problem.
+
+### Monitoring Tests for ML Systems
+
+- Upstream instability in features, both in training and serving
+- Data invariants hold in training and serving inputs
+- Model staleness
+- Training/serving skew in features and inputs
+- Slow-leak regressions in latency, throughput, etc.
+- Regression in prediction quality
+
+This is a fantastic list, as it covers a lot of hidden problems in model serving. As models get larger, they can become expensive to serve, and features can become more expensive to compute. One useful tool is to monitor success metrics as a time series and check that performance stays consistent. Another is to always keep a small A/B test running against the old or baseline model.
+
+The paper touches on basic problems that you run into quite often but that are not talked about much in the ML community. I'm curious how the problems around feature engineering and model complexity will evolve with the advent of deep learning models.
diff --git a/blog/index.md b/blog/index.md
@@ -9,7 +9,7 @@ layout: blog
     <div class="hentry post">
       <div class="sticky-header">
         <h2 class="entry-title"><a class="spec" href="{{ post.url }}" title="{{ post.title }}" rel="bookmark">{{ post.title }}</a></h2>
-        <div class="byline">{{ post.date | date: "%B %d, %Y" }}{% if post.author %} · by {{ post.author}}{% endif %} . in {%  for category in post.categories %}<a href="/blog/categories.html#{{category}}" class="category">{{ category }}</a>{% endfor %}</div>
+        <div class="byline">{{ post.date | date: "%B %d, %Y" }}{% if post.author %} · by {{ post.author}}{% endif %} . {% for category in post.categories %}{% if forloop.index0 != 0 %}, {% endif %}<a href="/blog/categories.html#{{category}}" class="category">{{ category }}</a>{% endfor %}</div>
       </div><!-- .sticky-header -->
 
       <div class="entry-summary">