Hi everyone,
We are delighted to announce that Wiki Workshop 2020 will be held in
Taipei on April 20 or 21, 2020 (the date will be finalized soon) as
part of the Web Conference 2020 [1]. In past years, Wiki Workshop
has traveled to Oxford, Montreal, Cologne, Perth, Lyon, and San
Francisco.
You can read more about the call for papers and the workshop at
http://wikiworkshop.org/2020/#call. Please note that the deadline for
submissions to be considered for the proceedings is January 17. All
other submissions should be received by February 21.
If you have questions about the workshop, please let us know on this
list or at wikiworkshop(a)googlegroups.com.
Looking forward to seeing you in Taipei.
Best,
Miriam Redi, Wikimedia Foundation
Bob West, EPFL
Leila Zia, Wikimedia Foundation
[1] https://www2020.thewebconf.org/
Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on
Meta-Wiki:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health"[1] of various communities that make up the
Wikimedia movement:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. It could be used both
by community members to make decisions on their community's direction and
by Wikimedia Foundation staff to point anti-harassment tool development in
the right direction.
We have a set of metrics we are considering for the kit, including the
ratio of active users to active administrators, administrator confidence
levels, and off-wiki factors such as freedom to participate. It's
ambitious, and our methods of collecting such data will vary.
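As an illustration of the first metric mentioned above, the ratio of active
users to active administrators could be computed along the lines of the
minimal Python sketch below. The function name, the wiki names, and the
counts are all hypothetical, and how "active" is defined is left open here.

```python
def users_per_admin(active_users: int, active_admins: int) -> float:
    """Ratio of active users to active administrators (hypothetical definition)."""
    if active_admins <= 0:
        raise ValueError("ratio is undefined without active administrators")
    return active_users / active_admins


# Made-up counts for two imaginary wikis, for illustration only:
# each value is (active users, active administrators).
example_counts = {"examplewiki": (1200, 30), "samplewiki": (450, 3)}

ratios = {
    wiki: users_per_admin(users, admins)
    for wiki, (users, admins) in example_counts.items()
}
```

A higher value means each active administrator is serving more active users;
what counts as a healthy range is exactly the kind of question raised below.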
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats
<https://stats.wikimedia.org/v2/#/all-projects>. How might these tools
coexist?
Your opinions will help to guide this project going forward. We'll be
reaching out at different stages of this project, so if you would like to
be contacted directly, please indicate your interest by signing up on the
consultation page.
Looking forward to reading your thoughts.
best,
Joe
P.S.: Please feel free to CC me in conversations that might happen on this
list!
[1] What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well, and where it could improve. This
project aims to provide a variety of useful data points to inform
community decisions that would benefit from objective data.
--
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
Wikimedia Foundation
joesutherland.rocks
Hi everybody,
The Analytics team is going to enable Kerberos authentication for Hadoop on
Monday December 2nd. The procedure will start around 10 AM CET and should
take 3 to 4 hours, but since this is an invasive change it may take longer.
If you have anything important that requires Hadoop on this date, please
let us know in advance.
The most visible change from the user's point of view is the introduction
of a new account and password required to use Hadoop services (such as
Hive/HDFS/Spark/Oozie). We wrote a user guide on what will change with
Kerberos:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide.
There is also an open task to track any doubts, questions, or special use
cases during the next two weeks: https://phabricator.wikimedia.org/T238560.
Feel free to reach out on IRC in #wikimedia-analytics on Freenode too!
Thanks!
Luca (on behalf of the Analytics team)
FYI
---------- Forwarded message ---------
From: Ariel Glenn WMF <ariel(a)wikimedia.org>
Date: Wed, Nov 27, 2019 at 5:38 AM
Subject: [Wikitech-l] BREAKING CHANGE: schema update, xml dumps
To: Wikipedia Xmldatadumps-l <Xmldatadumps-l(a)lists.wikimedia.org>,
Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
We plan to move to the new schema for xml dumps for the February 1, 2020
run. Update your scripts and apps accordingly!
The new schema contains an entry for each 'slot' of content. This means
that, for example, the commonswiki dump will contain MediaInfo information
as well as the usual wikitext. See
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/master/docs…
for the schema and
https://www.mediawiki.org/wiki/Requests_for_comment/Schema_update_for_multi…
for further explanation and example outputs.
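For script authors, a minimal Python sketch of reading more than one slot
from a revision might look like the following. The sample XML and the
element names used here (<content>, <role>, <model>, <text>) are
assumptions made for illustration, not taken from this message; check the
published schema before relying on them. Real dump files also declare an
XML namespace, which this sketch omits for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample revision; element names are assumed, see the schema.
SAMPLE = """
<page>
  <title>File:Example.jpg</title>
  <revision>
    <id>1</id>
    <model>wikitext</model>
    <text>== Summary ==</text>
    <content>
      <role>mediainfo</role>
      <model>wikibase-mediainfo</model>
      <text>{"labels": {}}</text>
    </content>
  </revision>
</page>
"""


def slots_for_revision(revision):
    """Return a {role: text} mapping for one <revision> element."""
    slots = {}
    # The usual wikitext is assumed to stay in a direct <text> child.
    main_text = revision.find("text")
    if main_text is not None:
        slots["main"] = main_text.text or ""
    # Each additional slot is assumed to be wrapped in a <content> element.
    for content in revision.findall("content"):
        role = content.findtext("role", default="unknown")
        slots[role] = content.findtext("text", default="")
    return slots


page = ET.fromstring(SAMPLE)
slots = slots_for_revision(page.find("revision"))
```

Scripts that only read the main wikitext would, under these assumptions,
keep working but silently skip the extra slots, so it is worth testing them
against the example outputs linked above.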
Phabricator task for the update: https://phabricator.wikimedia.org/T238972
PLEASE FORWARD to other lists as you deem appropriate. Thanks!
Ariel Glenn
Hi all,
The next Research Showcase will be live-streamed on Wednesday, November 20,
2019, at 9:30 AM PST/17:30 UTC. We’ll have a presentation from Martin
Potthast of Leipzig University on text reuse in Wikipedia and another
presentation from the Wikimedia Foundation’s Isaac Johnson on the
demographics and interests of Wikipedia’s readers.
YouTube stream: https://www.youtube.com/watch?v=tIko_V1k09s
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
Wikipedia Text Reuse: Within and Without
By Martin Potthast, Leipzig University
We study text reuse related to Wikipedia at scale by compiling the first
corpus of text reuse cases within Wikipedia as well as without (i.e., reuse
of Wikipedia text in a sample of the Common Crawl). To discover reuse
beyond verbatim copy and paste, we employ state-of-the-art text reuse
detection technology, scaling it for the first time to process the entire
Wikipedia as part of a distributed retrieval pipeline. We further report on
a pilot analysis of the 100 million reuse cases inside, and the 1.6 million
reuse cases outside Wikipedia that we discovered. Text reuse inside
Wikipedia gives rise to new tasks such as article template induction,
fixing quality flaws, or complementing Wikipedia’s ontology. Text reuse
outside Wikipedia yields a tangible metric for the emerging field of
quantifying Wikipedia’s influence on the web. To foster future research
into these tasks, and for reproducibility’s sake, the Wikipedia text reuse
corpus and the retrieval pipeline are made freely available. Paper
<https://webis.de/publications.html#?q=wikipedia%20ecir%202019>, Demo
<https://demo.webis.de/wikipedia-text-reuse/>
Characterizing Wikipedia Reader Demographics and Interests
By Isaac Johnson, Wikimedia Foundation
Building on two past surveys on the motivation and needs of Wikipedia
readers (Why We Read Wikipedia
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#November_2016>; Why
the World Reads Wikipedia
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#December_2018>),
we examine the relationship between Wikipedia reader demographics and their
interests and needs. Specifically, we run surveys in thirteen different
languages that ask readers three questions about their motivation for
reading Wikipedia (motivation, needs, and familiarity) and five questions
about their demographics (age, gender, education, locale, and native
language). We link these survey results with the respondents' reading
sessions -- i.e. sequence of Wikipedia page views -- to gain a more
fine-grained understanding of how a reader's context relates to their
activity on Wikipedia. We find that readers have a diversity of backgrounds
but that the high-level needs of readers do not correlate strongly with
individual demographics. We also find, however, that there are
relationships between demographics and specific topic interests that are
consistent across many cultures and languages. This work provides insights
into the reach of various Wikipedia language editions and the relationship
between content or contributor gaps and reader gaps. See the meta page
<https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Be…>
for more details.
--
Janna Layton (she, her)
Administrative Assistant - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi all,
Wikimedia Research Showcase [1] is almost six years old and we're
using the birthday opportunity to step back and reflect on the past,
celebrate the contributions by more than 70 speakers and many of you
who participated in the discussions, and plan for its future.
We would like to ask for your input as we're thinking about the future
of the Research Showcases. We want to hear from those of you who
participated in the showcases and/or watched them, as well as those of
you who decided this is not something for you. :) To gather your input,
we have put together a survey, and we would appreciate your
participation.
Link to survey (please note that the link will take you to Google
Forms [2]): https://docs.google.com/forms/d/e/1FAIpQLSecgn8cMu5IfTYRgn93bfOiJVEIL09RRf_…
Your contributions to this survey can help us in our thinking as we
move forward. Please submit your responses by 2019-11-22.
Sincerely,
Jonathan Morgan and Leila Zia
Research, Wikimedia Foundation
[1] https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
[2] If you want to participate but not through Google Forms, ping me
off-list and I'll send you a PDF file you can fill in and send back to me
(it won't be anonymous, though, sorry). I'm not attaching it to this
email as some lists may put my email in the moderation queue with an
attachment. (And I don't /think/ I can upload it to Commons.)
Today we are releasing a new dataset meant to help us understand the impact
of grants and programs on editing. This data was requested several years
ago, and we took a long time to bring in the privacy and security experts
whose help we needed to release it. With that work done, you can download
the data here: https://dumps.wikimedia.org/other/geoeditors/ and read about
it here:
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Geoeditors/Pu…
You can send questions or comments on this thread or on the discussion page.