On 23 March 2018 at 07:02, Ahmed Fasih <wuzzyview@gmail.com> wrote:
Neil, thank you so much for your insightful comments!

No problem. It's always a good feeling when you know the answer to someone else's question :)
 
I was able to use Quarry to get the number of edits on English
Wikipedia yesterday, so I can indeed get recent data from it—hooray!!!

I also used it to cross-check against the REST API for February 28th:

https://quarry.wmflabs.org/query/25783

and I see that Quarry reports 168668 while the REST API reports 169754
edits for the same period (less than 1% error). I'll do some digging
to see if the difference is from the denormalization you mentioned, or
other reasons why they disagree.

The first thing to consider is that when a Wikipedia page is deleted, all of its rows in the revision table are moved to a separate archive table (probably for reasons that made much more sense years ago). In the Data Lake, and therefore in the REST API, there's no such separation, so a count taken from the revision table alone leaves out edits to pages that have since been deleted.

This query is one way to get a combined count: https://quarry.wmflabs.org/query/25794
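
To illustrate the idea of a combined count, here is a minimal sketch in Python using an in-memory SQLite database. It is not necessarily what the linked Quarry query does. The two toy tables and their rows are made up for the example, and only the column names (rev_timestamp, ar_timestamp, ar_rev_id) and the YYYYMMDDHHMMSS timestamp format are meant to mirror MediaWiki; the function name combined_edit_count is hypothetical.

```python
import sqlite3


def combined_edit_count(conn, start, end):
    """Count rows in revision plus archive with timestamps in [start, end).

    Timestamps are MediaWiki-style YYYYMMDDHHMMSS strings, which sort
    correctly when compared as text.
    """
    sql = """
        SELECT
          (SELECT COUNT(*) FROM revision
            WHERE rev_timestamp >= :start AND rev_timestamp < :end)
        + (SELECT COUNT(*) FROM archive
            WHERE ar_timestamp >= :start AND ar_timestamp < :end)
    """
    return conn.execute(sql, {"start": start, "end": end}).fetchone()[0]


# Toy stand-ins for the real tables (invented rows, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE revision (rev_id INTEGER, rev_timestamp TEXT)")
conn.execute("CREATE TABLE archive (ar_rev_id INTEGER, ar_timestamp TEXT)")
conn.executemany(
    "INSERT INTO revision VALUES (?, ?)",
    [(1, "20180227235959"), (2, "20180228000000"), (3, "20180228120000")],
)
conn.executemany(
    "INSERT INTO archive VALUES (?, ?)",
    [(4, "20180228180000"), (5, "20180301000000")],
)

# Edits made on 28 February 2018, whether or not the page was later deleted.
toy_total = combined_edit_count(conn, "20180228000000", "20180301000000")
print(toy_total)
```

Two scalar subqueries are added together rather than joined, since the two tables hold disjoint sets of revisions.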

However, combining the two tables yields 171346 edits, which makes the Data Lake count (169754) about 1% lower than the application database count.
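
As a quick sanity check on the percentages, here is a small Python sketch that uses only the three counts quoted in this thread. The variable names and the percent_diff helper are just for illustration.

```python
# Counts for 28 February 2018, as quoted in this thread.
quarry_revision_only = 168668  # Quarry, revision table only
rest_api = 169754              # REST API (Data Lake)
quarry_combined = 171346       # Quarry, revision + archive tables


def percent_diff(value, reference):
    """Signed difference of value from reference, as a percentage of reference."""
    return 100.0 * (value - reference) / reference


# Negative means the first count is lower than the second.
rev_vs_api = percent_diff(quarry_revision_only, rest_api)
api_vs_combined = percent_diff(rest_api, quarry_combined)

print(f"revision-only vs REST API: {rev_vs_api:.2f}%")
print(f"REST API vs combined:      {api_vs_combined:.2f}%")
```

Both gaps come out under 1% in magnitude, but with opposite signs: the revision-only count is below the REST API figure, while the combined count is above it.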

At the moment, I can't think of a good reason for that, but I'm sure others on this list know.