<p dir="ltr">Forwarding comments from Wikimedia-l that may be of interest to a number of subscribers on other lists.</p>
<p dir="ltr">Pine</p>
<div class="gmail_quote">---------- Forwarded message ----------<br>From: "Erik Moeller" <<a href="mailto:erik@wikimedia.org">erik@wikimedia.org</a>><br>Date: Oct 25, 2014 5:59 PM<br>Subject: Re: [Wikimedia-l] Chapters and GLAM tooling<br>To: "Wikimedia Mailing List" <<a href="mailto:wikimedia-l@lists.wikimedia.org">wikimedia-l@lists.wikimedia.org</a>><br>Cc: <br><br type="attribution">On Sat, Oct 25, 2014 at 7:16 AM, MZMcBride <<a href="mailto:z@mzmcbride.com">z@mzmcbride.com</a>> wrote:<br>
<br>
> Labs is a playground and Galleries, Libraries, Archives, and Museums are<br>
> serious enough to warrant a proper investment of resources, in my view.<br>
> Magnus and many others develop magnificent tools, but my sense is that<br>
> they're largely proofs of concept, not final implementations.<br>
<br>
Far from being treated as mere proofs of concept, Magnus' GLAM tools<br>
[1] have been used to measure and report success in the context of<br>
project grant and annual plan proposals and reports, ongoing project<br>
performance measurements, blog posts and press releases, etc. Daniel<br>
Mietchen has, to my knowledge, been the main person doing any<br>
systematic auditing or verification of the reports generated by these<br>
tools, and results can be found in his tool testing reports, the last<br>
one of which is unfortunately more than a year old. [2]<br>
<br>
Integration with MediaWiki should IMO not be viewed as a runway that<br>
all useful developments must be pushed towards. Rather, we should seek<br>
to establish clearer criteria by which to decide that functionality<br>
benefits from this level of integration, to such an extent that it<br>
justifies the cost. Functionality that is not integrated in this<br>
manner should, then, not be dismissed as "proofs of concept" but<br>
rather judged on its own merits.<br>
<br>
GWToolset [3] is a good example. It was built as a MediaWiki extension<br>
to manage GLAM batch uploads, but we should not regard this decision<br>
as sacrosanct, or the only correct way to develop this kind of<br>
functionality. The functionality it provides is of highly specialized<br>
interest, and indeed, the number of potential users to date is 47<br>
according to [4], most of whom have not performed significant uploads<br>
yet. Its user interface is highly specialized, and special permissions<br>
and detailed instructions are required to use it. At the same time, it<br>
has been used to upload 322,911 files overall, an amazing number even<br>
without going into the quality and value of the individual<br>
collections.<br>
<br>
So, why does it need to be a MediaWiki extension at all? When<br>
development began in 2012, OAuth support in MediaWiki did not exist,<br>
so it was impossible for an external tool (then running on toolserver)<br>
to manage an upload on the user's behalf without asking for the user's<br>
password, which would have been in violation of policy. But today, we<br>
have other options. It's possible that storage requirements or other<br>
specific desired integration points would make it impossible to create<br>
this as a Tool Labs tool -- but if we created the same tool today, we<br>
should carefully consider that option.<br>
<br>
Indeed, highly specialized tools for the cultural and education sector<br>
_are_ being developed and hosted inside Tool Labs or externally.<br>
Looking at the current OAuth consumer requests [5], there are<br>
submissions for a metadata editor developed by librarians at the<br>
University of Miami Libraries in Coral Gables, Florida, and an<br>
assignment creation wizard developed by the Wiki Education Foundation.<br>
There's nothing "improper" about that, as Marc-André pointed out.<br>
<br>
As noted before, for tools like the ones used for GLAM reporting to<br>
get better, WMF has its role to play in providing more datasets and<br>
improved infrastructure. But there's nothing inherent in the<br>
development of those tools that forces them to live in production<br>
land, or that requires large development teams to move them forward.<br>
Auditing of numbers, improved scheduling/queuing of database requests,<br>
optimization of API calls and DB queries: all of this can be done by<br>
individual contributors, making this suitable work for even chapters<br>
with limited experience managing technical projects to take on.<br>
<br>
On the analytics side, we're well aware that many users have asked for<br>
better access to the pageview data, either through MariaDB, or through<br>
a dedicated API. We have now said for some time that our focus is on<br>
modernizing the infrastructure for log analysis and collection,<br>
because the numbers collected by the old webstatscollector code were<br>
incomplete, and the infrastructure subject to frequent packet loss<br>
issues. In addition, our ability to meet additional requirements on<br>
the basis of simple pageview aggregation code was inherently<br>
constrained.<br>
<br>
To this end, we have put into production use infrastructure to collect<br>
and analyze site traffic using Kafka/Hadoop/Hive. At our scale, this<br>
has been a tremendously complex infrastructure project which has<br>
included custom development such as varnishkafka [6]. While it's taken<br>
longer than we've wanted, this new infrastructure is being used to<br>
generate a public page count dataset as of this month, including<br>
article-level mobile traffic for the first time [7]. Using<br>
Hadoop/Hive, we'll be able to compile many more specialized reports,<br>
and this is only just beginning.<br>
<br>
Giving community developers better access to this data needs to be<br>
prioritized relative to other ongoing analytics work, including but<br>
not limited to:<br>
<br>
- Continued development and maintenance of the above infrastructure foundations;<br>
<br>
- Development of "Vital Signs" [8]: public reports on editor activity,<br>
content contribution, sign-ups and other metrics. This tool gives us<br>
more timely access to key measures than Wikistats [9] (or the<br>
reportcard [10], which to date still consumes Wikistats data). Rather<br>
than having to wait 4-6 weeks to know what's happening with regard to<br>
editor numbers, we can see continuous updates on a day-to-day basis.<br>
<br>
- Development of Wikimetrics [11], which analyzes the editing activity of a<br>
group of editors, and which is essential for measuring all movement<br>
work that aims to increase activity by a targeted group (e.g. an<br>
editathon), and is a key tool used for grants evaluation (was a funded<br>
program worth the $$?). A lot of thought has gone into the development<br>
of standardized global metrics [12] for program work, much of it<br>
using this technology and dependent on its continued development.<br>
<br>
- Measurement (instrumentation) of site actions and<br>
development/maintenance of associated infrastructure. As an example,<br>
in-depth data collection for features like Media Viewer (see<br>
dashboards at [13] ) is only possible because of the EventLogging<br>
extension developed by Ori Livneh, and the increasing use of this<br>
technology by WMF developers. EventLogging requires significant<br>
management, maintenance and teaching effort from the analytics team.<br>
<br>
Lila is requesting visibility into all primary funnels on Wikimedia<br>
sites (e.g. sign-ups, edits/saves through wikitext, edits/saves<br>
through VisualEditor, etc.), and this will require lots of sustained<br>
effort from lots of people to get done. What it will give us is a<br>
better sense of where people succeed and fail to complete an action --<br>
by way of example, see the initial UploadWizard funnel analysis here:<br>
<a href="https://www.mediawiki.org/wiki/UploadWizard/Funnel_analysis" target="_blank">https://www.mediawiki.org/wiki/UploadWizard/Funnel_analysis</a><br>
<br>
- Improved software and infrastructure support for A/B testing,<br>
possibly including adoption of existing open source tooling such as<br>
Facebook's PlanOut library/interpreter [14].<br>
<br>
- Improved readership metrics, possibly including a privacy-sensitive<br>
approach to estimating Unique Visitors, and better geographic<br>
breakdowns for readers/editors.<br>
<br>
These are all complex problems, most of which are dependent on the<br>
small analytics team, and feedback on projects and priorities is very<br>
much welcome on the analytics mailing list:<br>
<a href="https://lists.wikimedia.org/mailman/listinfo/analytics" target="_blank">https://lists.wikimedia.org/mailman/listinfo/analytics</a><br>
<br>
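As a rough illustration of the funnel analysis mentioned above (seeing where people succeed and fail to complete an action), here is a minimal sketch in Python. The step names and counts are hypothetical, not real measurements, and the function name funnel_report is made up for this example; it is not taken from any existing WMF tool.<br>
<br>
```python
from typing import List, Tuple


def funnel_report(steps: List[Tuple[str, int]]) -> List[Tuple[str, int, float, float]]:
    """For each funnel step, return (name, count, share of first step, share of previous step).

    `steps` is an ordered list of (step name, number of users reaching that step).
    """
    if not steps:
        return []
    first = steps[0][1]
    previous = first
    report = []
    for name, count in steps:
        # Share of everyone who entered the funnel that reached this step.
        overall = count / first if first else 0.0
        # Share of users from the previous step that continued to this one.
        step_rate = count / previous if previous else 0.0
        report.append((name, count, overall, step_rate))
        previous = count
    return report


# Hypothetical counts for an upload-style funnel (illustrative only).
example_steps = [
    ("opened upload form", 1000),
    ("selected a file", 600),
    ("entered description", 450),
    ("upload published", 300),
]

for name, count, overall, step_rate in funnel_report(example_steps):
    print(f"{name:20} {count:6d}  {overall:6.1%} of start  {step_rate:6.1%} of previous step")
```
<br>
The step with the lowest "of previous step" rate is where the largest share of users drops out, which is the kind of signal such an analysis is meant to surface.<br>
<br>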
With regard to better embedding of graphs in wikis specifically, Yuri<br>
Astrakhan has led the development of a new extension, inspired by work<br>
by Dan Andreescu, to visualize data directly in wikis. This extension<br>
has been deployed already to Meta and MediaWiki.org and can be used<br>
for dynamic graphs where it's appropriate to not have a fallback to a<br>
static image, for example in grant reports. See:<br>
<a href="https://www.mediawiki.org/wiki/Extension:Graph" target="_blank">https://www.mediawiki.org/wiki/Extension:Graph</a><br>
<a href="https://www.mediawiki.org/wiki/Extension:Graph/Demo" target="_blank">https://www.mediawiki.org/wiki/Extension:Graph/Demo</a><br>
<a href="https://meta.wikimedia.org/wiki/Graph:User:Yurik_(WMF)/Obama" target="_blank">https://meta.wikimedia.org/wiki/Graph:User:Yurik_(WMF)/Obama</a><br>
<br>
I agree this is the kind of functionality that should make its way<br>
into Wikipedia. Again, we need to weigh throwing a full team behind<br>
that against the relative priority of other work. In the meantime,<br>
Yuri and others will continue to push it along and may even be able to<br>
get it all the way there in due time. The main blockers, from what I<br>
can tell, are generation of static fallback images for users without<br>
JavaScript, and a better way to manage the data sources.<br>
<br>
In general, the point of my original message was this: All<br>
organizations that seek to improve Wikipedia and the other Wikimedia<br>
projects ultimately depend on technology to do so; to view WMF as the<br>
sole "tech provider" does not scale. Larger, well-funded chapters can<br>
take on big, hairy challenges like Wikidata; smaller, less-funded orgs<br>
are better positioned to work on specialized technical support for<br>
programmatic work.<br>
<br>
I would caution against requesting WMF to work on highly specialized<br>
solutions for highly specialized problems. If such solutions are<br>
needed, I would caution against building them into MediaWiki unless<br>
they can be generalized to benefit a larger number of users, at which<br>
point it's appropriate to seek partnership with WMF, or to ask WMF about<br>
the relative priority of such work. But often, it's perfectly fine<br>
(and much faster) to build such tools and reports independently, and<br>
to ask WMF for help in providing APIs/services/data/infrastructure to<br>
get it done.<br>
<br>
Cheers,<br>
Erik<br>
<br>
[1] <a href="http://tools.wmflabs.org/glamtools/" target="_blank">http://tools.wmflabs.org/glamtools/</a><br>
[2] <a href="https://outreach.wikimedia.org/wiki/Category:This_Month_in_GLAM_Tool_testing_reports" target="_blank">https://outreach.wikimedia.org/wiki/Category:This_Month_in_GLAM_Tool_testing_reports</a><br>
[3] <a href="https://www.mediawiki.org/wiki/Extension:GWToolset" target="_blank">https://www.mediawiki.org/wiki/Extension:GWToolset</a><br>
[4] <a href="https://commons.wikimedia.org/w/index.php?title=Special%3AListUsers&username=&group=gwtoolset&limit=50" target="_blank">https://commons.wikimedia.org/w/index.php?title=Special%3AListUsers&username=&group=gwtoolset&limit=50</a><br>
[5] <a href="https://www.mediawiki.org/wiki/Special:OAuthListConsumers?name=&publisher=&stage=0" target="_blank">https://www.mediawiki.org/wiki/Special:OAuthListConsumers?name=&publisher=&stage=0</a><br>
[6] <a href="https://github.com/wikimedia/varnishkafka" target="_blank">https://github.com/wikimedia/varnishkafka</a><br>
[7] <a href="https://wikitech.wikimedia.org/wiki/Analytics/Pagecounts-all-sites" target="_blank">https://wikitech.wikimedia.org/wiki/Analytics/Pagecounts-all-sites</a><br>
[8] <a href="https://metrics.wmflabs.org/static/public/dash/" target="_blank">https://metrics.wmflabs.org/static/public/dash/</a><br>
[9] <a href="http://stats.wikimedia.org/" target="_blank">http://stats.wikimedia.org/</a><br>
[10] <a href="http://reportcard.wmflabs.org/" target="_blank">http://reportcard.wmflabs.org/</a><br>
[11] <a href="https://metrics.wmflabs.org/" target="_blank">https://metrics.wmflabs.org/</a><br>
[12] <a href="https://meta.wikimedia.org/wiki/Grants:Learning_%26_Evaluation/Global_metrics" target="_blank">https://meta.wikimedia.org/wiki/Grants:Learning_%26_Evaluation/Global_metrics</a><br>
[13] <a href="http://multimedia-metrics.wmflabs.org/dashboards/mmv" target="_blank">http://multimedia-metrics.wmflabs.org/dashboards/mmv</a><br>
[14] <a href="https://github.com/facebook/planout" target="_blank">https://github.com/facebook/planout</a><br>
--<br>
Erik Möller<br>
VP of Product & Strategy, Wikimedia Foundation<br>
<br>
</div>