Hi Everyone,
Many of you have read Magnus' post on his blog (http://magnusmanske.de/wordpress/?p=173). I've commented on his blog and I wanted to repost here and address Magnus' concerns directly.
First of all, I'm sorry that we let Magnus and other folks down on the page view APIs -- we made some commitments late last year that we weren't able to meet. Not only that, these failures echoed previous points of frustration with the Foundation.
I do want to note that we actively support the infrastructure that feeds data to stats.grok.se. We've fixed a number of issues with that pipeline, most recently last week. We understand the importance of this data to the community.
The page view API project has been challenging for a number of reasons -- the size of the data, the fact that definitions of page views have not kept pace with changing traffic (mobile, bots, API requests, etc.), and the challenges in aggregating various aliases. We've needed to revisit our definitions of page views in order to get this right, as well as design and build a global architecture for collecting these and other metrics. In addition, we've tried to do this with a perspective of privacy and respect for our users.
To this end, we presented an approach to measuring page views in MediaWiki at FOSDEM in January and have made progress towards our new infrastructure by deploying middleware delivering unsampled page view data from mobile devices from our globally distributed datacenters to our compute cluster for analysis.
However, these initiatives are complex and will take several months to complete at the earliest. In the meantime, we're working with Henrik to scale up stats.grok.se.
I also want to call out that the Analytics team has been supporting a wide range of users and stakeholders during the year. We've developed WikiMetrics, a tool for measuring editor productivity that is used by WMF program evaluation and community members; provided dashboards and support for Wikipedia Zero, our program that works with mobile partners to enable mobile Wikipedia access free from data charges; and supported product teams and researchers both inside and outside of the Foundation.
We've been prioritizing and working on these projects as our resources allow, and it's important to understand that the team has not been idle. While we've done a less than stellar job in communicating our progress to the community, information on what we've been doing is available via our planning pages (https://www.mediawiki.org/wiki/Analytics/Prioritization_Planning) on mediawiki.org. In the future, we will be more proactive in communicating with the community regarding our goals and projects.
-Toby
On Wed, Feb 19, 2014 at 3:14 PM, Toby Negrin tnegrin@wikimedia.org wrote:
I also want to call out that the Analytics team has been supporting a wide range of users and stakeholders during the year. We've developed WikiMetrics, a tool for measuring editor productivity that is used by WMF program evaluation and community members; provided dashboards and support for Wikipedia Zero, our program to partner with our mobile partners to enable mobile Wikipedia access free from data charges; and supported product teams and researchers both inside and outside of the foundation.
Also legal - during this period Toby, Dario, and the rest of the team have spent a *lot* of hours working with the legal team on the privacy policy. They have never complained to me about it, but I'm sure it has distracted them. We appreciate that, and we hope it helps put us all on a firmer footing in the future.
Luis
Toby,
I think this is reasonable. It's always been hard that the team's scope has ostensibly been "any analytical need in the Foundation and probably in the Wikimedia community at large".
I think the Analytics team deserves respect for an honest and realistic response in this case, which is that we care very much about the problem of accurately generating pageviews data at Wikimedia scale, but it's not going to happen overnight, and for admitting where we've overestimated our abilities in the past. That's something probably all of us are tempted to do in Wikimedia-land, where we care desperately about overcoming obstacles but aren't always resourced or prepared to tackle a certain problem. That includes me.
Keep on truckin'
Hi Toby, Everyone,
thanks for this update. I think the approach you describe, boosting stats.grok.se with hardware support while developing the "official", scalable WMF solution, is quite sensible in this situation.
I do understand having just too many things on your plate to deal with, and I also know the difficulties inherent to dealing with large amounts of data in a scalable fashion. No one expects you to "press a button" and have view data appear as if by magic; dedicated developer time, steady progress with the occasional update, support of Henrik's current interim solution, and maybe some rough time estimates for milestones would be what I (and people from the GLAM community) would hope for.
Thanks, Magnus
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
Your mileage may vary, but my good friend Hannes Mühleisen over at CWI Amsterdam maintains this tool: http://wikistats.ins.cwi.nl/. You can script it like so: http://wikistats.ins.cwi.nl/export.php?page=Lady_Gaga%7CBritney_Spears&callback=myCallback. I'll let Hannes have the final word, but it does not seem to be updated regularly. Could it be?
Best, Tom
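As a rough illustration of scripting the export endpoint mentioned above, here is a minimal Python sketch that only builds the request URL. The `page` and `callback` parameter names are taken from the example URL as it appears in this thread; the helper name `build_export_url` is hypothetical, and nothing here is confirmed about what other parameters the tool accepts or what the response looks like.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the thread; it may no longer be reachable.
BASE_URL = "http://wikistats.ins.cwi.nl/export.php"


def build_export_url(pages, callback=None):
    """Build an export URL for one or more page titles.

    Titles are joined with "|", which urlencode percent-encodes as %7C,
    matching the example URL in the thread. The optional callback name
    is passed through as the `callback` query parameter.
    """
    params = {"page": "|".join(pages)}
    if callback is not None:
        params["callback"] = callback
    return BASE_URL + "?" + urlencode(params)


url = build_export_url(["Lady_Gaga", "Britney_Spears"], callback="myCallback")
print(url)
```

The sketch deliberately stops at URL construction and makes no network request, since the service's availability and response format are not described in the thread.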
Looks nice, but:
"831.106 pages"
Considering there are ~4.5 million pages in the English article namespace alone, it won't do for GLAM purposes. May well be suitable for others, though.
Cheers, Magnus
Hello Magnus and List,
On 02/20/2014 12:55 PM, Magnus Manske wrote:
"831.106 pages"
Yes, these are the pages that collect a significant number (> 1000/week) of hits. Since the process involves a large amount of data, it is not fully automated yet, but it hopefully will be soon.
Considering there are ~4.5 million pages in the English article namespace alone, it won't do for GLAM purposes. May well be suitable for others, though.
If there is interest, I could add the rest as well.
Best,
Hannes
Hi Magnus -- These are totally reasonable expectations, and thanks for understanding our situation. You can look for more detailed communications from us starting tomorrow.
Thanks,
-Toby
Thanks for the update, Toby! As far as prioritization of a Pageview API, this would be a hugely useful tool for the mobile team. Literally just yesterday I had to tell one of the designers that we couldn't implement the feature they wanted because we didn't have a Pageview API. Such an API would open up a world of new feature possibilities, many of which would be useful for promoting editor engagement. I'm sure you have a lot of other concerns to balance with such requests, but just wanted to throw in 2 cents from the mobile team.
Cheers, Ryan Kaldari
Hi Ryan,
Totally! There are lots of really cool applications we can build with the API. A couple of comments -- first, we need to establish what constitutes a page view. We've done some work on this that we shared at FOSDEM and the next step is to start discussions with the community.
I do need to say that a bulk query interface is probably the first thing we'll build once we have the page views. A low-latency lookup system is a little bit harder technically and will take a little longer. Definitely on the radar though!
-Toby
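To make the "what constitutes a page view" question concrete, here is a small Python sketch of one possible counting rule. Everything in it is an assumption for illustration only: the `Request` fields, the `BOT_MARKERS` list, the rule in `is_page_view`, and the alias map are hypothetical and are not the definition the Analytics team presented or adopted. It touches on the traffic categories raised earlier in the thread (bots, API requests) and on the problem of aggregating aliases.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    title: str       # requested page title
    user_agent: str  # raw User-Agent header
    is_api: bool     # True if the request hit an API endpoint, not a page
    status: int      # HTTP status code


# Hypothetical markers; a real bot list would be far longer and maintained.
BOT_MARKERS = ("bot", "spider", "crawler")


def is_page_view(req: Request) -> bool:
    """One possible rule: successful, non-API, non-bot requests count."""
    if req.status != 200 or req.is_api:
        return False
    ua = req.user_agent.lower()
    return not any(marker in ua for marker in BOT_MARKERS)


def count_page_views(requests, aliases):
    """Count page views per canonical title.

    `aliases` maps alternative titles (e.g. redirects) to the canonical
    title so that views of aliases are aggregated together.
    """
    counts = Counter()
    for req in requests:
        if is_page_view(req):
            counts[aliases.get(req.title, req.title)] += 1
    return counts


sample = [
    Request("Lady_Gaga", "Mozilla/5.0", False, 200),
    Request("Lady_gaga", "Mozilla/5.0 (iPhone)", False, 200),  # alias, mobile
    Request("Lady_Gaga", "Googlebot/2.1", False, 200),         # bot
    Request("Lady_Gaga", "Mozilla/5.0", True, 200),            # API request
]
counts = count_page_views(sample, {"Lady_gaga": "Lady_Gaga"})
print(counts["Lady_Gaga"])
```

Changing any one of these rules (for example, counting API requests) changes the totals, which is why the definition needs to be settled before an API can serve consistent numbers.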