Pursuant to prior discussions about the need for a research
policy on Wikipedia, WikiProject Research is drafting a
policy regarding the recruitment of Wikipedia users to
participate in studies.
At this time, we have a proposed policy and an accompanying group that
would facilitate the recruitment of subjects in much the same way that
the Bot Approvals Group approves bots.
The policy proposal can be found at:
http://en.wikipedia.org/wiki/Wikipedia:Research
The Subject Recruitment Approvals Group mentioned in the proposal is
described at:
http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group
Before we move forward with seeking approval from the Wikipedia
community, we would like additional input on the proposal and would
welcome help improving it.
Also, please consider participating in WikiProject Research at:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research
--
Bryan Song
GroupLens Research
University of Minnesota
Dear WikiFriends,
I'm hoping you & yours are all well, safe and healthy in these
unprecedented times.
I'm writing you today with my 'educator & researcher' hat on, with a
special request to help Piotr Konieczny & me spread the word about a new
global research project we are conducting.
While Wikimedia-related assignments (Wikipedia, Commons, Wikibooks,
Wikisource, Wikidata, Wiktionary, etc.) have been used in classrooms all
over the world for over a decade, very little research has been
conducted on what instructors who have tried them actually think about
the experience.
We are hoping that answering the questions in the survey will help us
better understand:
- Whether this teaching approach is effective (or not)
- What challenges instructors experience
- How the process could be improved
The questions are meant for any instructor running a wiki assignment,
whether in K-12 or higher education, in a formal or informal educational
setting. We are hoping the results will allow us to share experiences
globally and learn from one another, so we can make the process
smoother, easier, and more effective for educators joining these efforts.
It is important to note that this would be the first time (that we know
of!) that academic research of this type has been conducted around the
world, so we really need your help in spreading the word about it in your
local communities. We're hoping that those of you who have supported
such initiatives around the world over the years will forward it to your
local Education contacts and ask them to participate. The more
instructors participating, the better.
We realize that it would have been great to offer the questionnaire in a
variety of languages, but in order to process the data properly (rather
than via third-party translations) and to protect the anonymity and
privacy of participants, we decided to release the survey in English only.
Here is a link to the survey - https://tinyurl.com/yd6dfata
Thank you all in advance, and of course, if there are any questions, Piotr
& I are here.
Stay healthy & safe!
Shani.
-----------------------------------------------
*Shani Evenstein Sigalov*
* Lecturer, Tel Aviv University.
* EdTech Innovation Strategist, NY/American Medical Program, Sackler School
of Medicine, Tel Aviv University.
* PhD Candidate, School of Education, Tel Aviv University.
* Azrieli Foundation Research Fellow.
* OER & Emerging Technologies Coordinator, UNESCO Chair
<https://education.tau.ac.il/node/3495> on Technology, Internationalization
and Education, School of Education, Tel Aviv University
<https://education.tau.ac.il/node/3495>.
* Member of the Board of Trustees
<https://wikimediafoundation.org/profile/shani-evenstein-sigalov/>, Wikimedia
Foundation <https://wikimediafoundation.org/>.
* Chairperson, The Hebrew Literature Digitization Society
<http://www.israelgives.org/amuta/580428621>.
* Chief Editor, Project Ben-Yehuda <http://benyehuda.org>.
+972-525640648
Hello,
With a lot of free time on my hands these days, I started a personal
research project on gender bias among contributors to the
French-language Wikipedia.
My goal is to explore the relation between contributors' genders and the
genders of the people they create articles about. The hypotheses are:
1- contributors predominantly write biographies of people with the same
gender. Simplistically: men write about men; women write about women.
2- there are a lot fewer female contributors than male ones. This has been
studied in the past but AFAIK we don’t have recent numbers and they are
all on the English-language WP.
If these two hypotheses are true, this could explain part of the problem
with gender bias in biographies.
What I'm struggling with (and I guess some people before me have
struggled with as well on the English-language WP) is the very low level
of information we have on contributors' genders: on WP:FR, 60-70% of
contributors have not set their gender in their user preferences.
Does anyone have any pointers on this?
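In case it helps anyone reproduce these counts: the declared gender
preference is exposed by the public API via list=users with
usprop=gender. A minimal Python sketch (the usernames below are
illustrative placeholders):

    import requests

    API = "https://fr.wikipedia.org/w/api.php"

    def declared_genders(usernames):
        """Fetch the self-declared gender preference ('male', 'female',
        or 'unknown' when undeclared) for up to 50 usernames."""
        params = {
            "action": "query",
            "list": "users",
            "ususers": "|".join(usernames),
            "usprop": "gender",
            "format": "json",
        }
        users = requests.get(API, params=params).json()["query"]["users"]
        return {u["name"]: u.get("gender", "unknown")
                for u in users if "name" in u}

    # Illustrative usage (placeholder usernames):
    # print(declared_genders(["ExampleUser1", "ExampleUser2"]))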
More insights below:
Looking at the contributors with ≥500 edits, 2.4% have self-declared as
female, 27.4% as male, and 70.2% are 'unknown' (undeclared). By
definition, there's no apparent way to know the approximate gender
distribution of the undeclared accounts.
The French-language Wikipedia shows male- and unknown-gender user pages
with the 'Utilisateur:' prefix while the female-gender user pages use the
'Utilisatrice:' prefix. Based on this, one would assume that women would
be more inclined toward declaring their gender so that the interface would
stop misgendering them. However, we know that female users tend to
under-declare their gender to protect themselves.
I assumed that older accounts would be more inclined to have a declared
gender, but that's not the case: >60% of accounts of all ages (except
the very oldest, where the sample is very small) have not declared their
gender; see:
https://commons.wikimedia.org/wiki/File:Gender_repartition_of_Le_Bistro_WP-…
Some users have user boxes on their user page with various info, and
some of these boxes declare their gender. Surprisingly, however, most of
the users with these boxes have not declared their gender in their
preferences.
Out of the 434 users with an "I'm a woman" user box on their page, only
32% have self-declared as female. The ratio is similar for the 2,773
"I'm a man" users: only 34% have self-declared as male. It goes up to
36% for the "I'm a lesbian" box (N=14) and 40% for the "I'm gay" one
(N=86).
As I expected, predominantly male professions have a larger male
population among their box users, but a still larger 'unknown'
population: out of the 640 "I'm an engineer" box users, 24% have
self-declared as male and 1% as female. For the 714 "I'm a computers
person" box users, it's 27.7% and 0.6%.
However, some boxes where I wouldn't expect a large bias show one as well.
The Babel Italian users are 18% male and 2% female (N=2885). The Esperanto
ones are 24.5% male and 0.8% female (N=493).
There is certainly a bias in box usage: newer users tend to use them a lot
less than older users, and I would assume users who talk about themselves
with boxes don’t have the same profile as the average contributor.
Thanks,
--
Baptiste Fontaine
This might be of interest to some Research and Education folks too.
Pine
( https://meta.wikimedia.org/wiki/User:Pine )
---------- Forwarded message ---------
From: Amir E. Aharoni <amir.aharoni(a)mail.huji.ac.il>
Date: Mon, May 25, 2020 at 7:22 PM
Subject: [Wikimedia-l] Language Showcase, May 2020
To: wikimedia-l <wikimedia-l(a)lists.wikimedia.org>
Hello,
This is an announcement about a new installment of the Language Showcase, a
series of presentations about various aspects of language diversity and its
connection to Wikimedia Projects.
This new installment will deal with the latest design research about the
upcoming section translation feature for Content Translation.
This session is going to be broadcast over Zoom, and a recording will be
published for later viewing. You can participate in the conversation on
IRC or join us in the Zoom meeting.
Please read below for the event details, including local time and
joining links, and do let us know if you have any questions.
Thank you!
Amir
== Details ==
# Event: Language Showcase #5
# When: May 27, 2020 (Wednesday) at 13:00 UTC (check local time
https://www.timeanddate.com/worldclock/fixedtime.html?iso=20200527T1300 )
# Where:
Join Zoom Meeting
https://wikimedia.zoom.us/j/97081030000
Meeting ID: 970 8103 0000
IRC - #wikimedia-office (on Freenode)
# Agenda:
The latest design research about the upcoming section translation feature
for Content Translation.
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
Hi all,
The Research team at the Wikimedia Foundation has officially started a
new Formal Collaboration
<https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations>
with the *Institute of Basic Science* (IBS) from South Korea to work
collaboratively on *Discovering content inconsistencies between
Wikidata and Wikipedia*
<https://meta.wikimedia.org/wiki/Research:Discovering_content_inconsistencie…>
as part of the *Knowledge Integrity program*
<https://research.wikimedia.org/knowledge-integrity.html>.
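To give a deliberately simplistic flavor of the kind of inconsistency
the project looks for, here is a minimal sketch (not the project's
actual method; the item, property, and string-match heuristic are
illustrative only). It checks whether the birth year stated on Wikidata
(P569) appears in the lead of the linked Wikipedia article:

    import requests

    def birth_year_consistent(qid="Q42", lang="en"):
        """Flag a potential Wikidata/Wikipedia inconsistency: does the
        birth year from the item's P569 claim appear in the lead of
        the linked Wikipedia article? (Assumes the claim exists.)"""
        entity = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={"action": "wbgetentities", "ids": qid,
                    "props": "claims|sitelinks", "format": "json"},
        ).json()["entities"][qid]
        value = entity["claims"]["P569"][0]["mainsnak"]["datavalue"]["value"]
        year = value["time"][1:5]  # time looks like '+1952-03-11T00:00:00Z'
        title = entity["sitelinks"][f"{lang}wiki"]["title"]
        extract = requests.get(
            f"https://{lang}.wikipedia.org/api/rest_v1/page/summary/"
            + title.replace(" ", "_")
        ).json().get("extract", "")
        return year in extract

    # print(birth_year_consistent())  # True if the years agree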
Here are a few pieces of information about this collaboration that we
would like to share with you:
* We aim to keep the research documentation for this project on the
corresponding research page on Meta
<https://meta.wikimedia.org/wiki/Research:Discovering_content_inconsistencie…>.
* Meeyoung Cha from IBS & KAIST and her collaborators Cheng-Te Li and
Yi-Ju Lu from National Cheng Kung University (Taiwan) and Jing Ma from
Hong Kong Baptist University will be contributing to this project. We
are thankful to them for agreeing to lend their time and expertise over
the coming three months, and to those of you who have already worked
with us as we shaped the proposal for this project and who plan to
continue contributing to this program.
* I act as the point of contact for this research at the Wikimedia
Foundation. Please feel free to reach out to me (directly, if it cannot
be shared publicly) if you have comments or questions about the project.
Best,
*Diego Sáez-Trumper*
Research Scientist
User:Diego_(WMF) <https://meta.wikimedia.org/wiki/User:Diego_(WMF)>
Hi everyone,
We’re preparing for the May 2020 research newsletter and looking for contributors. Please take a look at https://etherpad.wikimedia.org/p/WRN202005 and add your name next to any paper you are interested in covering. Our target publication date is May 31, 2020 18:00 UTC. If you can't make this deadline but would like to cover a particular paper in the subsequent issue, leave a note next to the paper's entry below. As usual, short notes and one-paragraph reviews are most welcome.
Highlights from this month:
- A Deeper Investigation of the Importance of Wikipedia Links to the Success of Search Engines
- A Large-scale Study of Wikipedia Users' Quality of Experience
- Adding evidence of the effects of treatments into relevant Wikipedia pages: a randomised trial
- Analyzing Wikipedia Users’ Perceived Quality Of Experience: A Large-Scale Study
- Beyond Performing Arts: Network Composition and Collaboration Patterns
- Citation Detective: a Public Dataset to Improve and Quantify Wikipedia Citation Quality at Scale
- Collaboration of Open Content News in Wikipedia: The Role and Impact of Gatekeepers
- Content Growth and Attention Contagion in Information Networks: Addressing Information Poverty on Wikipedia
- Detecting Undisclosed Paid Editing in Wikipedia
- Diagnosing Incompleteness in Wikidata with The Missing Path
- Domain-Specific Automatic Scholar Profiling Based on Wikipedia
- How Wikipedia disease information evolve over time? An analysis of disease-based articles changes
- Knowledge Graphs on the Web -- an Overview
- Layered Graph Embedding for Entity Recommendation using Wikipedia in the Yahoo! Knowledge Graph
- Lexemes in Wikidata: 2020 status
- Mapping Wikipedia
- Matching Ukrainian Wikipedia Red Links with English Wikipedia’s Articles
- Measuring Social Bias in Knowledge Graph Embeddings
- Multi-class Multilingual Classification of Wikipedia Articles Using Extended Named Entity Tag Set
- Novel version of PageRank, CheiRank and 2DRank for Wikipedia in Multilingual Network using Social Impact
- Situating Wikipedia as a health information resource in various contexts: A scoping review
- The Political Geography of Shoah Knowledge and Awareness, Estimated from the Analysis of Global Library Catalogues and Wikipedia User Statistics
- The Positioning Matters: Estimating Geographical Bias in the Multilingual Record of Biographies on Wikipedia
- The Subversive Potential of Wikipedia: A Resource for Diversifying Political Science Content Online
- Vandalism Detection in Crowdsourced Knowledge Bases
- Visual Narratives and Collective Memory across Peer-Produced Accounts of Contested Sociopolitical Events
- Visualising open communities. Guidelines from three case studies
- WAC: A Corpus of Wikipedia Conversations for Online Abuse Detection
- Wikigender: A Machine Learning Model to Detect Gender Bias in Wikipedia
- WikiHist.html: English Wikipedia's Full Revision History in HTML Format
Masssly and Tilman Bayer
[1] http://meta.wikimedia.org/wiki/Research:Newsletter
[2] WikiResearch (@WikiResearch) | Twitter
*The First Wikidata Workshop*
Co-located with the 19th International Semantic Web Conference (ISWC
2020).
Date: To be announced (late October, early November)
The workshop will be held online, afternoon European time.
Website: https://wikidataworkshop.github.io/
== Important dates ==
Papers due: August 10, 2020
Notification of accepted papers: September 11, 2020
Camera-ready papers due: September 21, 2020
Workshop date: To be announced (late October/early November)
== Overview ==
Wikidata is an openly available knowledge base, hosted by the Wikimedia
Foundation. It can be accessed and edited by both humans and machines and
acts as a common structured-data repository for several Wikimedia projects,
including Wikipedia, Wiktionary, and Wikisource. It is used in a variety of
applications by researchers and practitioners alike.
In recent years, we have seen an increase in the number of publications
around Wikidata. While there are several dedicated venues for the broader
Wikidata community to meet, none of them focuses on publishing original,
peer-reviewed research. This workshop fills that gap: we hope to provide
a forum to build this fledgling scientific community and to promote
novel work and the resources that support it.
The workshop seeks original contributions that address the opportunities
and challenges of creating, contributing to, and using a global,
collaborative, open-domain, multilingual knowledge graph such as Wikidata.
We encourage a range of submissions, including novel research, opinion
pieces, and descriptions of systems and resources, which are naturally
linked to Wikidata and its ecosystem, or enabled by it. We are less
interested in works that use Wikidata alongside or in lieu of other
resources to carry out some computational task, unless the work feeds
back into the Wikidata ecosystem, for instance by improving or
commenting on some aspect of Wikidata, or by suggesting new design
features, tools, and practices.
We welcome interdisciplinary work, as well as interesting applications
which shed light on the benefits of Wikidata and discuss areas of
improvement.
The workshop is planned as an interactive half-day event, in which most
of the time will be dedicated to discussion and exchange rather than
one-way presentations. For this reason, all accepted papers will be
presented as short talks accompanied by a poster. As noted above, the
workshop will be held online in response to ongoing challenges such as
travel restrictions and the Covid-19 pandemic.
== Topics ==
Topics of submissions include, but are not limited to:
- Data quality and vandalism detection in Wikidata
- Referencing in Wikidata
- Anomaly, bias, or novelty detection in Wikidata
- Algorithms for aligning Wikidata with other knowledge graphs
- The Semantic Web and Wikidata
- Community interaction in Wikidata
- Multilingual aspects in Wikidata
- Machine learning approaches to improve data quality in Wikidata
- Tools, bots and datasets for improving or evaluating Wikidata
- Participation, diversity and inclusivity aspects in the Wikidata ecosystem
- Human-bot interaction
- Managing knowledge evolution in Wikidata
== Submission guidelines ==
We welcome the following types of contributions.
- Full research paper: Novel research contributions (7-12 pages)
- Short research paper: Novel research contributions of smaller scope than
full papers (3-6 pages)
- Position paper: Well-argued ideas and opinion pieces, not yet in the
scope of a research contribution (6-8 pages)
- Resource paper: New dataset or other resource directly relevant to
Wikidata, including the publication of that resource (8-12 pages)
- Demo paper: New system critically enabled by Wikidata (6-8 pages)
Submissions must be in PDF or HTML, formatted in the Springer Lecture
Notes in Computer Science (LNCS) style. For details on the LNCS style,
see Springer's Author Instructions.
Each paper will be peer-reviewed by at least two researchers. Accepted
papers will be published as open-access papers on CEUR (we only publish
to CEUR if the authors agree to have their papers published).
Papers must be submitted through EasyChair:
https://easychair.org/conferences/?conf=wikidataworkshop2020
== Proceedings ==
The complete set of papers will be published with the CEUR Workshop
Proceedings (CEUR-WS.org).
== Organizing committee ==
- Lucie-Aimée Kaffee, University of Southampton
- Oana Tifrea-Marciuska, Bloomberg
- Elena Simperl, King’s College London
- Denny Vrandečić, Google AI
== Programme committee ==
- Lydia Pintscher, Wikidata, Wikimedia Deutschland
- Maria-Esther Vidal, TIB Hannover
- Miriam Redi, Wikimedia Foundation
- Edgar Meij, Bloomberg
- Simon Razniewski, Max Planck Institute for Informatics
- Alessandro Piscopo, BBC
- Pavlos Vougiouklis, Huawei Technologies, Edinburgh
- Marco Ponza, University of Pisa
- Markus Krötzsch, Technische Universität Dresden
- Andrew D. Gordon, Microsoft Research & University of Edinburgh
- Cristina Sarasua, University of Zurich
- Aidan Hogan, Universidad de Chile
- Claudia Müller-Birn, FU Berlin
- Finn Årup Nielsen, Technical University of Denmark
--
Lucie-Aimée Kaffee
The WMF Research team has published a new pageview report of inbound
traffic coming from Facebook, Twitter, YouTube, and Reddit.[1]
The report contains a list of all articles that received at least 500 views
from one or more of these platforms (e.g., someone clicked a link on
Twitter that sent them directly to a Wikipedia article). The report is
available
on-wiki and will be updated daily at around 14:00 UTC with traffic counts
from the previous calendar day.
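If you would like to consume the report programmatically, the page can
be fetched like any other wiki page. A minimal sketch (assuming the
report stays at the title given in [1]; the table layout itself may
change, so parsing is left to the reader):

    import requests

    def fetch_report_wikitext():
        """Fetch the current social media traffic report as raw
        wikitext via the standard parse API."""
        resp = requests.get(
            "https://en.wikipedia.org/w/api.php",
            params={"action": "parse",
                    "page": "User:HostBot/Social_media_traffic_report",
                    "prop": "wikitext",
                    "format": "json",
                    "formatversion": 2},
        ).json()
        return resp["parse"]["wikitext"]

    # print(fetch_report_wikitext()[:500])  # peek at the first rows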
We believe this report provides editors with a valuable new source of
information. Daily inbound social media traffic stats can help editors
monitor edits to articles that are going viral on social media sites
and/or are being linked to by the social media platforms themselves in
order to fact-check disinformation and other controversial
content[2][3].
The social media traffic report also contains additional public article
metadata that may be useful in the context of monitoring articles that are
receiving unexpected attention from social media sites, such as...
- the total number of pageviews (from all sources) that article received
in the same period of time
- the number of pageviews the article received from the same platform
(e.g. Facebook) the previous day (two days ago)
- the number of editors who have the page on their watchlist
- the number of editors who have watchlisted the page AND recently
visited it
We want your feedback! We have some ideas of our own for how to improve the
report, but we want to hear yours! If you have feature suggestions, please
add them here.[4] We intend to maintain this daily report for at least the
next two months. If we receive feedback that the report is useful, we are
considering making it available indefinitely.
If you have other questions about the report, please first check out our
(still growing) FAQ [5]. All questions, comments, concerns, ideas, etc. are
welcome on the project talkpage on Meta.[4]
1. https://en.wikipedia.org/wiki/User:HostBot/Social_media_traffic_report
2.
https://www.engadget.com/2018/03/15/wikipedia-unaware-would-be-youtube-fact…
3.
https://mashable.com/2017/10/05/facebook-wikipedia-context-articles-news-fe…
4.
https://meta.wikimedia.org/wiki/Research_talk:Social_media_traffic_report_p…
5.
https://meta.wikimedia.org/wiki/Research:Social_media_traffic_report_pilot/…
Cheers,
Jonathan
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
(Uses He/Him)
*Please note that I do not expect a response from you on evenings or
weekends*
Hi all,
The next Research Showcase will be live-streamed on Wednesday, May 20, at
9:30 AM PDT/16:30 UTC.
This month we will learn about recent research on machine learning
systems that rely on human supervision for their learning and
optimization, a research area commonly referred to as Human-in-the-Loop
ML. In the first talk, Jie Yang will present a computational framework
that relies on crowdsourcing to identify influencers in social networks
(Twitter) by selectively obtaining labeled data. In the second talk,
Estelle Smith will discuss the role of the community in maintaining
ORES, the machine learning system that predicts edit and article quality
and is used in numerous Wikipedia applications.
YouTube stream: https://www.youtube.com/watch?v=8nDiu2ebdOI
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
*OpenCrowd: A Human-AI Collaborative Approach for Finding Social
Influencers via Open-Ended Answers Aggregation*
By: Jie Yang, Amazon (current), Delft University of Technology (starting
soon)
Finding social influencers is a fundamental task in many online
applications ranging from brand marketing to opinion mining. Existing
methods heavily rely on the availability of expert labels, whose collection
is usually a laborious process even for domain experts. Using open-ended
questions, crowdsourcing provides a cost-effective way to find a large
number of social influencers in a short time. Individual crowd workers,
however, only possess fragmented knowledge that is often of low quality. To
tackle those issues, we present OpenCrowd, a unified Bayesian framework
that seamlessly incorporates machine learning and crowdsourcing for
effectively finding social influencers. To infer a set of influencers,
OpenCrowd bootstraps the learning process using a small number of expert
labels and then jointly learns a feature-based answer quality model and the
reliability of the workers. Model parameters and worker reliability are
updated iteratively, allowing their learning processes to benefit from each
other until an agreement on the quality of the answers is reached. We
derive a principled optimization algorithm based on variational inference
with efficient updating rules for learning OpenCrowd parameters.
Experimental results on finding social influencers in different domains
show that our approach substantially improves the state of the art by 11.5%
AUC. Moreover, we empirically show that our approach is particularly useful
in finding micro-influencers, who are very directly engaged with smaller
audiences.
Paper: https://dl.acm.org/doi/fullHtml/10.1145/3366423.3380254
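For intuition only, here is a toy sketch of the joint-learning idea:
answer quality and worker reliability are updated in alternation, seeded
by a few fixed expert labels. This is a simplification for illustration,
not the paper's Bayesian model or its variational inference procedure.

    def joint_estimate(votes, expert_labels, iters=20):
        """Toy alternating estimation. An answer's quality is the
        reliability-weighted mean of its 0/1 votes; a worker's
        reliability is their mean agreement with current estimates.
        votes: dict worker -> {answer: 0 or 1}
        expert_labels: dict answer -> 0 or 1 (seed, kept fixed)."""
        answers = {a for v in votes.values() for a in v}
        quality = {a: float(expert_labels.get(a, 0.5)) for a in answers}
        reliability = {w: 0.5 for w in votes}
        for _ in range(iters):
            for a in answers:
                if a in expert_labels:
                    continue  # expert labels stay fixed
                num = sum(r * votes[w][a]
                          for w, r in reliability.items() if a in votes[w])
                den = sum(r for w, r in reliability.items() if a in votes[w])
                quality[a] = num / den if den else 0.5
            for w, v in votes.items():
                agreement = [1 - abs(quality[a] - v[a]) for a in v]
                reliability[w] = sum(agreement) / len(agreement)
        return quality, reliability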
*Keeping Community in the Machine-Learning Loop*
By: C. Estelle Smith, MS, PhD Candidate, GroupLens Research Lab at the
University of Minnesota
On Wikipedia, sophisticated algorithmic tools are used to assess the
quality of edits and take corrective actions. However, algorithms can fail
to solve the problems they were designed for if they conflict with the
values of communities who use them. In this study, we take a
Value-Sensitive Algorithm Design approach to understanding a
community-created and -maintained machine learning-based algorithm called
the Objective Revision Evaluation System (ORES)—a quality prediction system
used in numerous Wikipedia applications and contexts. Five major values
converged across stakeholder groups that ORES (and its dependent
applications) should: (1) reduce the effort of community maintenance, (2)
maintain human judgement as the final authority, (3) support differing
peoples’ differing workflows, (4) encourage positive engagement with
diverse editor groups, and (5) establish trustworthiness of people and
algorithms within the community. We reveal tensions between these values
and discuss implications for future research to improve algorithms like
ORES.
Paper:
https://commons.wikimedia.org/wiki/File:Keeping_Community_in_the_Loop-_Unde…
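If you want to try ORES yourself before the talk, its public scoring API
can be queried directly. A minimal sketch; the revision ID below is an
arbitrary placeholder:

    import requests

    def ores_article_quality(rev_id, wiki="enwiki"):
        """Ask ORES for an articlequality prediction for one revision;
        returns the predicted class and the class probabilities."""
        resp = requests.get(
            f"https://ores.wikimedia.org/v3/scores/{wiki}",
            params={"models": "articlequality", "revids": rev_id},
        ).json()
        score = resp[wiki]["scores"][str(rev_id)]["articlequality"]["score"]
        return score["prediction"], score["probability"]

    # print(ores_article_quality(123456789))  # placeholder revision ID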
--
Janna Layton (she, her)
Administrative Assistant - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>