What a long, strange trip it's been. Full write-up here:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/How_Wrong_Would_Using_Out_of_Date_Page_View_Data_Be%3F
Summary:
- We can't reliably catch day-by-day outliers using the page view information that comes along with edits, because edits don't happen often enough.
- Weekly averages (rather than day-by-day counts) don't usually move that much (i.e., by more than a factor of 2). If we can capture daily or weekly page view stats, that should keep us reasonably up-to-date overall, esp. if these moderate swings don't affect scoring much.
- We could gather daily statistics from the page view API and store the high mark over the last 3-7 days for the top 1K to 50K most-viewed articles. The ranking algorithm could then use whichever is higher: the rolling daily average or the high mark.
- For "Trending" topics, checking the top 1K page views every hour would be the best way to catch suddenly trending topics if we want to be more responsive. Unfortunately, hourly data isn't currently available through the Pageview API, and it isn't clear that it's worth it anyway.
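A minimal Python sketch of the daily-stats idea, under assumptions that aren't in the write-up: per-article daily view counts have already been fetched (oldest first), the rolling average uses a hypothetical 28-day window while the high mark uses 7 days, and the top-k cut ranks articles by the combined signal. The longer average window matters: over the same window, the high mark would always win, since a maximum is never below the mean. `weekly_swing` measures how far consecutive weekly averages move, for checking the factor-of-2 observation. All names here are hypothetical, and fetching from the Pageview API isn't shown.

```python
import heapq
from statistics import mean

# Hypothetical input shape: article title -> daily view counts, oldest first.
DailyViews = dict[str, list[int]]


def ranking_signal(daily: list[int], avg_days: int = 28, peak_days: int = 7) -> float:
    """Whichever is higher: the rolling daily average or the recent high mark.

    Assumption: avg_days > peak_days. With equal windows the high mark would
    always win, because a maximum is never below the mean.
    """
    if not daily:
        return 0.0
    rolling_avg = mean(daily[-avg_days:])
    high_mark = max(daily[-peak_days:])
    return float(max(rolling_avg, high_mark))


def weekly_swing(daily: list[int]) -> float:
    """Ratio (larger / smaller) between the last two weekly averages."""
    if len(daily) < 14:
        raise ValueError("need at least 14 days of data")
    prev_week = mean(daily[-14:-7])
    this_week = mean(daily[-7:])
    lo, hi = sorted((prev_week, this_week))
    # A week with zero views makes any nonzero change an unbounded swing.
    return float("inf") if lo == 0 else hi / lo


def top_k_signals(views: DailyViews, k: int = 1000) -> dict[str, float]:
    """Keep signals for only the k articles with the highest combined signal."""
    signals = {title: ranking_signal(days) for title, days in views.items()}
    top = heapq.nlargest(k, signals.items(), key=lambda item: item[1])
    return dict(top)  # highest signal first
```

For example, an article that spiked to 2,000 views yesterday gets a signal of 2000.0 from its high mark, while one that was popular for three weeks and then faded to 10 views a day keeps its 28-day average of 752.5. A `weekly_swing` above 2.0 would flag the less common case where the weekly average moved by more than a factor of 2.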