The results of offline testing can differ dramatically from those of online testing, which is carried out at runtime with real users [@Said2013]. In particular, the recommender systems research community is reassessing the dominance of offline testing focused on accuracy metrics. It is becoming more common to emphasize online testing and non-accuracy metrics, such as recommendation diversity.
Click-through can be reinterpreted as an implicit positive rating.
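
One way to read this reinterpretation is as a transformation of an impression log into positive-only feedback. The sketch below is illustrative only: the `(user_id, item_id, clicked)` log format and the function name `clicks_to_implicit_ratings` are assumptions, not something taken from a particular system. Note that it deliberately does not treat a missing click as a negative rating, since the user may simply not have noticed the item.

```python
def clicks_to_implicit_ratings(impressions):
    """Turn an impression log into implicit positive ratings.

    `impressions` is an iterable of (user_id, item_id, clicked) tuples;
    this log format is a hypothetical one chosen for the example.
    Returns a dict mapping (user_id, item_id) to 1 for every clicked pair.
    """
    ratings = {}
    for user_id, item_id, clicked in impressions:
        if clicked:
            # A click is recorded as a positive signal. Unclicked
            # impressions are skipped rather than stored as negatives.
            ratings[(user_id, item_id)] = 1
    return ratings


log = [
    ("u1", "i1", True),
    ("u1", "i2", False),
    ("u2", "i1", True),
    ("u2", "i1", True),  # repeated clicks collapse into one positive rating
]
implicit = clicks_to_implicit_ratings(log)
```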
A/B testing
- Click-through rate
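
As a rough illustration of how click-through rate might be compared in an A/B test, the sketch below computes the rate for each variant and applies a pooled two-proportion z-test. The choice of test, the function names, and the counts are all assumptions made for the example; real experiments often need additional care (for instance, per-user rather than per-impression analysis).

```python
import math


def click_through_rate(clicks, impressions):
    """Fraction of impressions that resulted in a click."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Pooled two-proportion z-test for the difference in CTR (B minus A).

    Returns (z, two_sided_p_value). Assumes impressions are independent,
    which is a simplification for real traffic.
    """
    p_a = click_through_rate(clicks_a, n_a)
    p_b = click_through_rate(clicks_b, n_b)
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    variance = pooled * (1 - pooled) * (1 / n_a + 1 / n_b)
    if variance == 0:
        raise ValueError("test is undefined when the pooled rate is 0 or 1")
    z = (p_b - p_a) / math.sqrt(variance)
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: variant A is the baseline, variant B the candidate.
z, p_value = two_proportion_z_test(clicks_a=100, n_a=1000, clicks_b=150, n_b=1000)
```

A small p-value here would suggest the two variants differ in click-through rate, though it says nothing about non-accuracy qualities such as diversity.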