What a long, strange trip it's been. Full write-up here:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/How_Wrong_Would_Usin…
Summary:
- We can't reliably catch day-by-day outliers by using the page view
information that comes along with edits because not enough edits happen.
- Weekly averages (rather than day-by-day counts) don't usually move
that much (i.e., by more than a factor of 2). If we can capture daily or
weekly page view stats, that should keep us reasonably up-to-date overall,
esp. if these moderate swings don't affect scoring much.
- We could gather daily statistics from the page view API and store the
high mark over the last 3-7 days for the top 1K to 50K most-viewed
articles. The ranking algorithm could use either the rolling daily average
or the high mark (whichever is higher).
- For "Trending" topics, looking at the top 1K page views every hour
(unfortunately not currently available through the Pageview API) would be
the best way to catch suddenly trending topics if we want to be more
responsive, but it isn't clear that it's worth it.
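
To make the "rolling daily average or high mark, whichever is higher" idea concrete, here is a rough Python sketch. The function name, the 30-day averaging window, and the 7-day high-mark window are my assumptions for illustration only (the summary above only specifies 3-7 days for the high mark), and it assumes the daily counts have already been fetched and are ordered oldest to newest.

```python
def popularity_signal(daily_views, avg_days=30, high_days=7):
    """Return the larger of the rolling daily average and the recent high mark.

    daily_views: daily page view counts for one article, oldest first.
    avg_days:    assumed window for the rolling average (not from the write-up).
    high_days:   window for the high mark (3-7 days per the summary).
    """
    if not daily_views:
        return 0.0

    # Rolling daily average over the most recent avg_days entries.
    avg_window = daily_views[-avg_days:]
    rolling_average = sum(avg_window) / len(avg_window)

    # High mark: the single busiest day in the most recent high_days entries.
    high_mark = max(daily_views[-high_days:])

    # Use whichever is higher, so a recent spike lifts the score right away
    # while a quiet week does not immediately sink a steadily popular page.
    return float(max(rolling_average, high_mark))


# A steady article, then the same article with a one-day spike.
steady = [100] * 30
spiking = [100] * 29 + [5000]
print(popularity_signal(steady), popularity_signal(spiking))
```

With separate windows like this, the high mark only wins when there has been a recent spike; otherwise the longer-term average carries the score.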
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation