It should be updated soon; the jobs have all finished successfully. But for now we do expect this kind of lag, and I'll explain why.

When we started, we were sqooping at the beginning of the month, and the processing takes about 4 days in total, most of it sqooping. But this put too much load on the database servers too close to the beginning of the month, when a bunch of other stuff is running. So we had to move it back to the 5th of the month [1]. Add 4 days to that and we end up finishing around the 9th of the month. We don't like this at all, and we're trying to figure out a better way to import the data incrementally so we can start processing as soon as we have all of it. It's unfortunate, but we couldn't foresee the infrastructure limitation; too much was up in the air when we started this work, even where we would sqoop from. Joseph and I have a weekly meeting to discuss moving towards a more incremental approach, and this is the parent task to watch for now: https://phabricator.wikimedia.org/T193650 (priority is low because we have too many other commitments, but it's something I'd love to see before we call Wikistats 2 "production" quality).

[1] https://github.com/wikimedia/puppet/blob/28b78985d3612a6e19720be1fe8eef5f0dfc2ed7/modules/profile/manifests/analytics/refinery/job/sqoop_mediawiki.pp#L43
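
To make the timing above concrete, here is a rough sketch of the arithmetic in Python. The start day (the 5th) and the roughly 4 days of processing come from the explanation above; the function and constant names are made up for illustration, and the result is only an estimate, not a guaranteed schedule.

```python
from datetime import date, timedelta

# Assumptions taken from the explanation above, not from any official schedule:
SQOOP_START_DAY = 5   # day of the following month on which sqooping starts
PROCESSING_DAYS = 4   # approximate total run time, most of it sqooping


def expected_availability(year, month):
    """Estimate when the data for the given month should be ready."""
    # The jobs for a month's data run in the following month.
    if month == 12:
        start = date(year + 1, 1, SQOOP_START_DAY)
    else:
        start = date(year, month + 1, SQOOP_START_DAY)
    return start + timedelta(days=PROCESSING_DAYS)


# Data for September 2018 would be expected around October 9, 2018.
print(expected_availability(2018, 9))
```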

On Wed, Oct 10, 2018 at 10:00 PM Neil Patel Quinn <nquinn@wikimedia.org> wrote:
Hey there!

I just wrote a script that fetches data from the AQS new pages endpoint in order to prepare our monthly health metrics (T199459).

However, it seems like that endpoint doesn't yet have monthly data for September. For example, a query for Commons with a start of July 1 and an end of October 1 returns only data for July and August. What's the schedule for updating this data?

To be honest, I feel pretty frustrated by this. Wikistats 1 generates data on content pages with a delay of 10-15 days after the end of the month, which has made it difficult for us to provide timely metrics to executives and the board. I had assumed (to a degree that I didn't even check) that by switching to this API, we would instead only have to deal with the delay in generating the mediawiki_history snapshot (5-7 days after the end of the month). But that doesn't seem to be the case.
--
Neil Patel Quinn (he/him/his)
product analyst, Wikimedia Foundation
_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics