Pursuant to prior discussions about the need for a research
policy on Wikipedia, WikiProject Research is drafting a
policy regarding the recruitment of Wikipedia users to
participate in studies.
At this time, we have a proposed policy, and an accompanying
group that would facilitate recruitment of subjects in much
the same way that the Bot Approvals Group approves bots.
The policy proposal can be found at:
The Subject Recruitment Approvals Group mentioned in the proposal
is being described at:
Before we move forward with seeking approval from the Wikipedia
community, we would like additional input about the proposal,
and would welcome additional help improving it.
Also, please consider participating in WikiProject Research at:
University of Minnesota
As part of https://phabricator.wikimedia.org/T205846, we are going to ask all
stat1005 users to move to stat1007 during the next two weeks. The
deadline is November 14th, by which time ssh access to stat1005 will be
Background: stat1005 has a GPU (more details in
https://phabricator.wikimedia.org/T148843) that has been sitting idle for
almost two years, and we would like to try to make it work over the
coming months. This effort will require many tests, reboots, etc. that
could impact everyone's ongoing work, so we prefer to move everybody
to another identical machine beforehand.
Please reach out to me or to the Analytics team in T205846 or on IRC
(#wikimedia-analytics on Freenode) if you have any questions, doubts,
blockers, etc. Of course, we are not going to enforce the deadline if
anybody raises concerns or blockers. It would be great to move everybody
by Nov 14th, but we certainly don't want to disrupt any ongoing work.
I am going to update the Wikitech documentation about stat1005 and stat1007
as soon as possible. For the moment, keep in mind that stat1007 will
completely take over everything that stat1005 currently does.
I have already copied all the stat1005 directories over to stat1007, and
I'll periodically sync them over the following days. If you find that
anything important is missing, please add a note in T205846.
Thanks a lot and sorry for the trouble,
Luca (on behalf of the Analytics team)
As part of a larger project, we are running a small think-aloud study to
better understand how editors use current interface features and tools to
identify "suspicious" edits (e.g., suspected vandalism or red flags for
bias).
Just posting for comment at this point, before we submit our IRB
documentation. We'd also be happy to hear about existing research we may have missed.
:: Andrea Forte
:: Associate Professor
:: College of Computing and Informatics, Drexel University
We’re preparing for the October 2018 research newsletter and looking for contributors. Please take a look at https://etherpad.wikimedia.org/p/WRN201810 and add your name next to any paper you are interested in covering. Our target publication date is October 28 UTC, although actual publication might happen several days later. As usual, short notes and one-paragraph reviews are most welcome.
Highlights from this month:
- Deliberation and Resolution on Wikipedia: A Case Study of Requests for Comments
- Indigenizing Wikipedia: Student Accountability to Native American Authors on the World’s Largest Encyclopedia
- Population preferences through Wikipedia edits
- Schema Inference on Wikidata
- Studying the Effect of Network Position on Efficiency: A Case of Affiliation Network Featured Article Promotion
- Volunteer Retention, Burnout and Dropout in Online Voluntary Organizations: Stress, Conflict and Retirement of Wikipedians
- 'Welcome' Changes? Descriptive and Injunctive Norms in a Wikipedia Sub-Community
- Wikidata: A New Paradigm of Human-Bot Collaboration?
- World Influence of Infectious Diseases from Wikipedia Network Analysis
Masssly, Tilman Bayer and Dario Taraborelli
The next Research Showcase will be live-streamed this Wednesday, October
17, 2018 at 11:30 AM PDT (18:30 UTC).
YouTube stream: https://www.youtube.com/watch?v=UJrJLWuNvXo
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here: https://www.mediawiki.or
This month's presentations:
*"Welcome" Changes? Descriptive and Injunctive Norms in a Wikipedia
Sub-Community*
*By Jonathan T. Morgan, Wikimedia Foundation and Anna Filippova, GitHub*
Open online communities rely on social norms for behavior regulation, group
cohesion, and sustainability. Research on the role of social norms online
has mainly focused on one source of influence at a time, making it
difficult to separate different normative influences and understand their
interactions. In this study, we use the Focus Theory to examine
interactions between several sources of normative influence in a Wikipedia
sub-community: local descriptive norms, local injunctive norms, and norms
imported from similar sub-communities. We find that exposure to injunctive
norms has a stronger effect than descriptive norms, that the likelihood of
performing a behavior is higher when both injunctive and descriptive norms
are congruent, and that conflicting social norms may negatively impact
pro-normative behavior. We contextualize these findings through member
interviews, and discuss their implications for both future research on
normative influence in online groups and the design of systems that support
*The pipeline of online participation inequalities: The case of Wikipedia
*By Aaron Shaw, Northwestern University and Eszter Hargittai, University of
Participatory platforms like the Wikimedia projects have unique potential
to facilitate more equitable knowledge production. However, digital
inequalities such as the Wikipedia gender gap undermine this democratizing
potential. In this talk, I present new research in which Eszter Hargittai
and I conceptualize a "pipeline" of online participation and model distinct
levels of awareness and behaviors necessary to become a contributor to the
participatory web. We test the theory in the case of Wikipedia editing,
using new survey data from a diverse, national sample of adult internet
users in the U.S.
The results show that Wikipedia participation consistently reflects
inequalities of education and internet experiences and skills. We find that
the gender gap only emerges later in the pipeline whereas gaps along racial
and socioeconomic lines explain variations earlier in the pipeline. Our
findings underscore the multidimensionality of digital inequalities and
suggest new pathways toward closing knowledge gaps by highlighting the
importance of education and Internet skills.
We conclude that future research and interventions to overcome digital
participation gaps should not focus exclusively on gender or class
differences in content creation, but expand to address multiple aspects of
digital inequality across pipelines of participation. In particular, when
it comes to overcoming gender gaps in the case of Wikipedia, our results
suggest that continued emphasis on recruiting female editors should include
efforts to disseminate the knowledge that Wikipedia can be edited. Our
findings support broader efforts to overcome knowledge- and skill-based
barriers to entry among potential contributors to the open web.
Administrative Assistant - Audiences & Technology
1 Montgomery St. Suite 1600
San Francisco, CA 94104
Forwarding a request for input.
( https://meta.wikimedia.org/wiki/User:Pine )
---------- Forwarded message ---------
From: Joe Sutherland <jsutherland(a)wikimedia.org>
Date: Fri, Oct 5, 2018 at 9:29 PM
Subject: [Analytics] Community health metrics kit: Input needed!
To: A mailing list for the Analytics Team at WMF and everybody who has an
interest in Wikipedia and analytics. <analytics(a)lists.wikimedia.org>
Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health" of various communities that make up the
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. This could be used both by
community members to make decisions about their community's direction and by
Wikimedia Foundation staff to point anti-harassment tool development in the
We have a set of metrics we are thinking about including in the kit,
ranging from the ratio of active users to active administrators and
administrator confidence levels to off-wiki factors such as freedom to
participate. It's ambitious, and our methods of collecting such data will
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats
<https://stats.wikimedia.org/v2/#/all-projects>; how might these tools
Your opinions will help to guide this project going forward. We'll be
reaching out at different stages of this project, so if you're interested
in direct messaging going forward, please feel free to indicate your
interest by signing up on the consultation page.
Looking forward to reading your thoughts.
P.S.: Please feel free to CC me in conversations that might happen on this
What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well and where it could improve. This
project aims to provide a variety of useful data points that can inform
community decisions that would benefit from objective data.
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
Analytics mailing list
Forwarding some good news.
( https://meta.wikimedia.org/wiki/User:Pine )
---------- Forwarded message ---------
From: Markus Kroetzsch <markus.kroetzsch(a)tu-dresden.de>
Date: Sat, Oct 13, 2018 at 12:30 AM
Subject: [Wikidata] Passing on praise/ISWC trip report
To: Discussion list for the Wikidata project. <wikidata(a)lists.wikimedia.org>
I am happy to report that we have just won the Best Paper Award of the
In-Use track of this year's International Semantic Web Conference
(ISWC), for our description of the use of SPARQL/RDF technology on Wikidata.
I keep telling people here that the general awesomeness of Wikidata
is the work of many, and in particular of this great community of editors.
Overall, this year's ISWC here in Monterey, CA has surprised Denny and me
with the huge uptake that Wikidata now enjoys in industry and academia
alike, a huge breakthrough compared to last year. An amazing array of
people are doing great work based on this data, and again I would like
to pass on all the thank-yous I have heard this week to all of you
working hard to make this happen. Users range from individual students
to major tech companies, and I hope there will be many contributions
flowing back to us through these stakeholders. We have also seen an
increasing amount of research using Wikidata for evaluation
and testing, again both in published works and in conversations with
people in small and big organisations.
Let me also congratulate Fariz Darari, who is not a stranger to this
list either, on receiving the Best Dissertation Award of the Semantic
Web Science Association for the research that is also behind some of the
tools he has been creating for Wikidata.
I gave another talk related to Wikidata's ontological modeling, which I
hope did not misrepresent the situation too badly ;-). There will
be videos of this talk, the best paper presentation, and most other ISWC
talks on VideoLectures in the not-too-distant future.
Finally, since our best paper is about the use of BlazeGraph as a
platform for queries, let me also mention that we have had a number of
productive meetings here to discuss the future of this great software
(you may know that there were some organisational changes to the team
developing this so far). There will be opportunities to contribute to this
open source project, either as a developer or in other ways, in the
future. Stay tuned for more information on this.
So, thanks again to everyone working towards the great success of this
project -- amazing work!
Greetings from Asilomar
Stanislav Malyshev, Markus Krötzsch, Larry González, Julius Gonsior,
Adrian Bielefeldt: "Getting the Most out of Wikidata: Semantic
Technology Usage in Wikipedia’s Knowledge Graph"
Talk slides and paper online:
Markus Krötzsch
Ontological Modelling in Wikidata
Invited keynote at the 9th Workshop on Ontology Design and Patterns (WOP'18)
Prof. Dr. Markus Kroetzsch
Knowledge-Based Systems Group
Center for Advancing Electronics Dresden (cfaed)
Faculty of Computer Science
+49 351 463 38486
Wikidata mailing list
The Analytics team is going to move the Oozie and Hive daemons from the
analytics1003 host to an-coord1001 (new host, hardware refresh) on Tuesday
Oct 9th at 10 AM CEST. This will require downtime for Oozie and Hive, so
some jobs might fail or not run at all during the maintenance. We have
allocated two hours for this procedure, but it should take less time.
Tracking task: T205509
As always, please follow up with me or anybody in the analytics team for
clarifications and/or comments (via Phabricator or IRC Freenode
Thanks for the patience!
Luca (on behalf of the Analytics team)