Hi Ahmed and Neil,
Super interesting project you have, Ahmed :) Thanks, Neil, for the very precise answer you gave to Ahmed's question!
Some comments about number disparity below:
and I see that Quarry reports 168668 while the REST API reports 169754 edits for the same period (less than 1% error).
Those two metrics (Quarry and API) refer to the exact same dataset: revisions from any user type on any page type for the day 2018-02-28, on enwiki.
The first thing to consider is that when a Wikipedia page is deleted, all the corresponding rows from the revision table are moved to a separate archive table https://www.mediawiki.org/wiki/Manual:Archive_table (probably for reasons that made much more sense years ago). However, in the Data Lake and therefore the REST API, there's no such separation.
This query is one way to get a combined count: https://quarry.wmflabs.org/query/25794
However, combining the two tables yields 171346 edits, which makes the Data Lake count about 1% *lower* than the application database count.
When computing revisions including deleted ones on the Data Lake, we end up with exactly the same number found by the Quarry query: 171346.
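To make the size of the gaps concrete, here is a quick sanity check of the three counts quoted in this thread. It is only an illustrative sketch: the variable names and the `pct_diff` helper are mine, not part of any Quarry or REST API tooling.

```python
# Counts quoted in this thread for enwiki on 2018-02-28.
quarry_revision_only = 168668   # Quarry, revision table only
rest_api = 169754               # REST API (Data Lake), without deleted revisions
combined = 171346               # revision + archive tables (also Data Lake with deletes)


def pct_diff(smaller: int, larger: int) -> float:
    """Gap between two counts as a percentage of the larger one (hypothetical helper)."""
    return (larger - smaller) / larger * 100


# Gap between Quarry (revision table only) and the REST API.
gap_quarry_api = rest_api - quarry_revision_only
print(gap_quarry_api, f"{pct_diff(quarry_revision_only, rest_api):.2f}%")

# Gap between the REST API and the combined revision + archive count.
gap_api_combined = combined - rest_api
print(gap_api_combined, f"{pct_diff(rest_api, combined):.2f}%")
```

Both gaps come out below 1%, which matches the "less than 1%" and "about 1% lower" figures mentioned above.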
Now, about the difference between Quarry and the API on revisions without deletes: it is mostly due to recently deleted data (there is still a difference of 126 revisions that I don't understand: https://quarry.wmflabs.org/query/25796).
Cheers!
Joseph