I have followed that process, been subscribed to https://phabricator.wikimedia.org/T44259 which I just reread and thus rather surprised by your comment. I have never seen any technical reason mentioned in the bug. It would have been very helpful, because someone might have come up with a fix in the two years when it was "on our roadmap" until you overcame them.
It wasn't my intention to dig up this history, just to point out that the real story is always more complex. That applies to whatever explanation I give here, as well. It's from my perspective, and the nuance is endless. Anyway, I'm more than happy to try and shed light, hopefully this helps for future work we do together.
The technical challenge was, basically, moving off of our udp2log-based logging infrastructure to Kafka. I think it's fair to say that the Analytics team didn't have the full trust and confidence of WMF until Toby started turning that around. We were subjected to some painful agile coaching and were not allowed to implement the correct solution (Kafka) fully, so we were working with a patchwork system that still had single points of failure and data loss. Once we gained that trust, it still took a while to sort out how to tune Kafka so it reliably received traffic logs from all of our caching centers, and let us know when it had loss or duplication of data. This work was in really good shape, if memory serves, by the end of summer 2014. I incorrectly summarized that solely as a technical challenge; it was a pretty tricky technical challenge combined with an organizational one. For the latter, if it helps, Sue and Erik both acknowledged responsibility and things were much smoother after that. (I always had tremendous respect for the two of them, but that acknowledgement was pretty amazing, and unique in my 12 years of experience.)
At that point, October 2014, some of us, myself included, wanted to start work on the pageview API. We didn't get push-back as much as a strong push to focus on Event Logging instead. The Event Logging system, developed by Ori, was also experiencing some pretty serious growing pains. Outages were becoming very frequent due to the increased traffic and lack of automated monitoring and management. Over the next few months we improved performance, upgraded it to use Kafka as well, and solved those problems. Looking back, that's still a bittersweet choice for me. This work on Event Logging was absolutely key to the experiments that led to Visual Editor's successful roll-out in 2015. As one of many examples, this dashboard would not have been possible without a stable Event Logging platform: https://edit-analysis.wmflabs.org/compare/. And perhaps this was Toby's strategic vision that I didn't see at the time, and it was very important for us to keep our newly gained trust and independence within WMF. But, of course, it meant we had to delay the pageview API yet again. That's the 6-month delay I mentioned. And we didn't leave the community hanging: we made the higher quality raw data available with mobile traffic in this new dataset: http://dumps.wikimedia.org/other/pagecounts-all-sites/ and gave Henrik some support with stats.grok.se.
Some of these things are mentioned on the epic T44259 https://phabricator.wikimedia.org/T44259, but some I didn't even truly understand at the time, and some might not have been constructive to mention. I'm personally all ears at this point. What of this should we have noted on the task? Like I said above, there's lots of detail, but at some point it would feel like I'm a news reporter instead of an engineer :) Also, I'm not sure I would have seen it the same way. Even a few months ago when we released the pageview API I was still a bit bitter that the Event Logging work was prioritized, and now I think that was me being short-sighted to some extent.
Instead, I read for example Toby's comment at Magnus's blog
(http://magnusmanske.de/wordpress/?p=173#comment-290):
| […]
| We’ve been prioritizing and working on these projects as our
| resources allow and it’s important to understand that the
| team has not been idle. While we’ve done a less than stellar
| job in communicating our progress to the community,
| information on what we’ve been doing is available via our
| planning pages on mediawiki. In the future, we will be more
| proactive in communicating with the community regarding our
| goals and projects.
as meaning that there were no technical obstacles, but limited resources that were directed to other projects (and apparently none that matched the popularity of a pageviews API).
Both can be true, and are true. The challenge was great: from what I understand, what we accomplished took Twitter orders of magnitude more money and people, a fact which makes me look at my teammates with complete awe (they're amazing). And, as I explained above, we also had to prioritize other work.
My interpretation may have been biased by Magnus's report above that:
| […]
| Like others, I have tried to get the Foundation to provide
| the page view data in a more accessible and local (as in
| toolserver/Labs) way. Like others, I failed. The last
| iteration was a video meeting with the Analytics team (newly
| restarted, as the previous Analytics team didn’t really work
| out for a reason; I didn’t inquire too deeply), which ended
| with a promise to get this done Real Soon Now™, and the
| generous offer to use the page view data from their hadoop
| cluster. Except the cluster turned out to be empty; I then
| was encouraged to import the view data myself. (No, this is
| not a joke. I have the emails to prove it.) As much as I
| enjoy working with and around the Wikiverse, I do have
| neither the time, the bandwidth, nor the inclination to do your
| paid jobs for you, thank you very much.
| […]
which seems to indicate that it was indeed a problem of WMF allocating (human) resources.
No, that video call was my fault. I felt like I was sitting on burning hot coals and I couldn't stand having some of the data in the cluster and not being able to make it available publicly any more. So I tried to offer Magnus access to the cluster and dedicate my volunteer time to help him get to the data. This mainly failed because I lost my volunteer time to a personal crisis that I can't get into here (it had absolutely nothing to do with the foundation, it was just unfortunate timing from Magnus's point of view).
I hope that helps. Above all, getting this project done is my proudest professional moment, and I think in some sense the delays only made it better when it finally came out. The members of the Analytics team that are involved in the pageview API now are ten times smarter and more equipped to handle the project than I would have been by myself in October 2014.
Respectfully,
Dan