Hi Ahmed,
In my opinion, the 126 discrepancy is due to complex delete/restore patterns. The notion of 'fixed' is not entirely clear to me here :)
Regarding the data being updated monthly because of a full-history scan, you're mostly right. Here is a summary of my view:
- The user and page tables maintain 'states', and we wanted to be able to show historical values (e.g., what the title of a page was at the time of a given revision). This process uses the log table, is quite complex, and was originally designed to run over the whole history.
- Indeed, history is sometimes updated in MediaWiki, and we want to reflect those patches.
- In any case, even if we used more up-to-date data from streams or regular database queries, a full history reconstruction would still have been needed as a starting point, and that's what we have now.
It has always been the plan to use streams to update recent data more reactively, but that hasn't been done yet.
Joseph