Forwarding comments from Wikimedia-l that may be of interest to a number of subscribers on other lists.
Pine

---------- Forwarded message ----------
From: "Erik Moeller" <erik@wikimedia.org>
Date: Oct 25, 2014 5:59 PM
Subject: Re: [Wikimedia-l] Chapters and GLAM tooling
To: "Wikimedia Mailing List" <wikimedia-l@lists.wikimedia.org>
On Sat, Oct 25, 2014 at 7:16 AM, MZMcBride <z@mzmcbride.com> wrote:
> Labs is a playground and Galleries, Libraries, Archives, and Museums are serious enough to warrant a proper investment of resources, in my view. Magnus and many others develop magnificent tools, but my sense is that they're largely proofs of concept, not final implementations.
Far from being treated as mere proofs of concept, Magnus' GLAM tools [1] have been used to measure and report success in project grant and annual plan proposals and reports, ongoing project performance measurements, blog posts, press releases, and more. To my knowledge, Daniel Mietchen has been the main person doing systematic auditing or verification of the reports generated by these tools; his results can be found in his tool testing reports, the most recent of which is unfortunately more than a year old. [2]
Integration with MediaWiki should IMO not be viewed as a runway toward which all useful development must be pushed. Rather, we should establish clearer criteria for deciding when functionality benefits enough from this level of integration to justify the cost. Functionality that is not integrated in this manner should then not be dismissed as "proofs of concept", but judged on its own merits.
GWToolset [3] is a good example. It was built as a MediaWiki extension to manage GLAM batch uploads, but we should not regard that decision as sacrosanct, or as the only correct way to develop this kind of functionality. The functionality it provides is of highly specialized interest: according to [4], there are 47 potential users to date, most of whom have not yet performed significant uploads. Its user interface is highly specialized, and special permissions and detailed instructions are required to use it. At the same time, it has been used to upload 322,911 files overall, an amazing number even before considering the quality and value of the individual collections.
So, why does it need to be a MediaWiki extension at all? When development began in 2012, MediaWiki did not support OAuth, so an external tool (then running on the Toolserver) could not manage an upload on the user's behalf without asking for the user's password, which would have violated policy. Today, we have other options. Storage requirements or other specific integration needs might still make it impossible to build this as a Tool Labs tool, but if we were creating the same tool today, we should carefully consider that option.
Indeed, highly specialized tools for the cultural and education sector _are_ being developed and hosted inside Tool Labs or externally. Looking at the current OAuth consumer requests [5], there are submissions for a metadata editor developed by librarians at the University of Miami Libraries in Coral Gables, Florida, and an assignment creation wizard developed by the Wiki Education Foundation. There's nothing "improper" about that, as Marc-André pointed out.
As noted before, for tools like the ones used for GLAM reporting to get better, WMF has a role to play in providing more datasets and improved infrastructure. But nothing inherent in the development of those tools forces them to live in production land, or requires large development teams to move them forward. Auditing of numbers, improved scheduling/queuing of database requests, optimization of API calls and DB queries: all of this can be done by individual contributors, which makes it suitable work even for chapters with limited experience managing technical projects.
On the analytics side, we're well aware that many users have asked for better access to pageview data, either through MariaDB or through a dedicated API. We have said for some time that our focus is on modernizing the infrastructure for log collection and analysis, because the numbers collected by the old webstatscollector code were incomplete and the infrastructure was subject to frequent packet loss. In addition, simple pageview aggregation code inherently limited our ability to meet additional requirements.
To this end, we have put into production an infrastructure to collect and analyze site traffic using Kafka, Hadoop and Hive. At our scale, this has been a tremendously complex infrastructure project, including custom development such as varnishkafka [6]. While it has taken longer than we'd wanted, as of this month this new infrastructure is being used to generate a public page count dataset that includes article-level mobile traffic for the first time [7]. Using Hadoop and Hive, we'll be able to compile many more specialized reports, and this is only the beginning.
Giving community developers better access to this data needs to be prioritized relative to other ongoing analytics work, including but not limited to:
- Continued development and maintenance of the above infrastructure foundations;
- Development of "Vital Signs" [8]: public reports on editor activity, content contribution, sign-ups and other metrics. This tool gives us more timely access to key measures than Wikistats [9] (or the reportcard [10], which to date still consumes Wikistats data). Rather than waiting four to six weeks to learn what's happening with editor numbers, we can see updates on a daily basis.
- Development of Wikimetrics [11], which analyzes the editing activity of a group of editors. It is essential for measuring any movement work that aims to increase activity within a targeted group (e.g. an editathon), and it is a key tool for grants evaluation (was a funded program worth the money?). A lot of thought has gone into developing standardized global metrics [12] for program work, much of which uses this technology and depends on its continued development.
- Measurement (instrumentation) of site actions, and development and maintenance of the associated infrastructure. For example, in-depth data collection for features like Media Viewer (see the dashboards at [13]) is only possible because of the EventLogging extension developed by Ori Livneh, and WMF developers' increasing use of it. EventLogging requires significant management, maintenance and teaching effort from the analytics team.
Lila is requesting visibility into all primary funnels on Wikimedia sites (e.g. sign-ups, edits/saves through wikitext, edits/saves through VisualEditor, etc.), and delivering this will require sustained effort from many people. What it will give us is a better sense of where people succeed or fail in completing an action; for an example, see the initial UploadWizard funnel analysis: https://www.mediawiki.org/wiki/UploadWizard/Funnel_analysis
- Improved software and infrastructure support for A/B testing, possibly including adoption of existing open source tooling such as Facebook's PlanOut library/interpreter [14].
- Improved readership metrics, possibly including a privacy-sensitive approach to estimating Unique Visitors, and better geographic breakdowns for readers/editors.
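The funnel idea mentioned above can be illustrated with a minimal Python sketch. It assumes events have already been extracted as hypothetical (user_id, step_name) pairs, for instance from an EventLogging-style log; the function name `funnel` and any step names are illustrative only and not part of any Wikimedia tool. It ignores timestamps and simply requires that a user logged every earlier step in order to count toward a later one.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple


def funnel(
    events: Iterable[Tuple[str, str]], steps: List[str]
) -> List[Tuple[str, int, Optional[float]]]:
    """Strict funnel over hypothetical (user_id, step_name) event pairs.

    Returns (step, users_remaining, conversion_from_previous_step) per step.
    A user counts toward step k only if they also logged steps 0..k-1.
    Event order/timestamps are ignored in this simplified sketch.
    """
    # Map each step name to the set of users who logged it at least once.
    reached = defaultdict(set)
    for user, step in events:
        reached[step].add(user)

    results: List[Tuple[str, int, Optional[float]]] = []
    survivors: Optional[set] = None  # users who completed every step so far
    prev_count = 0
    for step in steps:
        users = reached.get(step, set())
        survivors = set(users) if survivors is None else survivors & users
        count = len(survivors)
        if not results:
            rate: Optional[float] = None  # no previous step to convert from
        else:
            rate = count / prev_count if prev_count else 0.0
        results.append((step, count, rate))
        prev_count = count
    return results
```

A user who logs a later step without an earlier one is dropped from the later counts, which is one simple way to define a strict funnel; a real analysis would more likely order events by timestamp or session before counting.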
These are all complex problems, most of which depend on the small analytics team. Feedback on projects and priorities is very welcome on the analytics mailing list: https://lists.wikimedia.org/mailman/listinfo/analytics
With regard to better embedding of graphs in wikis specifically, Yuri Astrakhan has led the development of a new extension, inspired by work by Dan Andreescu, to visualize data directly in wikis. The extension has already been deployed to Meta and MediaWiki.org, and can be used for dynamic graphs wherever it's acceptable not to have a static-image fallback, for example in grant reports. See:
https://www.mediawiki.org/wiki/Extension:Graph
https://www.mediawiki.org/wiki/Extension:Graph/Demo
https://meta.wikimedia.org/wiki/Graph:User:Yurik_(WMF)/Obama
I agree this is the kind of functionality that should make its way into Wikipedia. Again, we need to weigh putting a full team behind it against the relative priority of other work. In the meantime, Yuri and others will continue to push it along, and may even get it all the way there in due time. From what I can tell, the main blockers are generating static fallback images for users without JavaScript, and finding a better way to manage data sources.
In general, the point of my original message was this: all organizations that seek to improve Wikipedia and the other Wikimedia projects ultimately depend on technology to do so, and viewing WMF as the sole "tech provider" does not scale. Larger, well-funded chapters can take on big, hairy challenges like Wikidata; smaller, less well-funded organizations are better positioned to provide specialized technical support for programmatic work.
I would caution against asking WMF to work on highly specialized solutions for highly specialized problems. If such solutions are needed, I would also caution against building them into MediaWiki unless they can be generalized to benefit a larger number of users; at that point it's appropriate to seek a partnership with WMF, or to ask WMF to weigh the relative priority of such work. But often it's perfectly fine (and much faster) to build such tools and reports independently, and to ask WMF for help with the APIs, services, data and infrastructure needed to get it done.
Cheers,
Erik
[1] http://tools.wmflabs.org/glamtools/
[2] https://outreach.wikimedia.org/wiki/Category:This_Month_in_GLAM_Tool_testing...
[3] https://www.mediawiki.org/wiki/Extension:GWToolset
[4] https://commons.wikimedia.org/w/index.php?title=Special%3AListUsers&user...
[5] https://www.mediawiki.org/wiki/Special:OAuthListConsumers?name=&publishe...
[6] https://github.com/wikimedia/varnishkafka
[7] https://wikitech.wikimedia.org/wiki/Analytics/Pagecounts-all-sites
[8] https://metrics.wmflabs.org/static/public/dash/
[9] http://stats.wikimedia.org/
[10] http://reportcard.wmflabs.org/
[11] https://metrics.wmflabs.org/
[12] https://meta.wikimedia.org/wiki/Grants:Learning_%26_Evaluation/Global_metric...
[13] http://multimedia-metrics.wmflabs.org/dashboards/mmv
[14] https://github.com/facebook/planout

--
Erik Möller
VP of Product & Strategy, Wikimedia Foundation