Hi all,
Join us for our monthly Analytics/Research Office hours on 2020-02-26 at
17:00-18:00 (UTC). Bring all your research questions and ideas to discuss
projects, data, analysis, etc.
To participate, please join the IRC channel: #wikimedia-research [1].
More detailed information can be found here [2] or on the etherpad [3] if
you would like to add items to the agenda or check notes from previous meetings.
Best,
Martin
[1] irc://chat.freenode.net:6667/wikimedia-research
[2] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
--
Martin Gerlach
Research Scientist
Wikimedia Foundation
Good morning,
The scoping review I have been working on since last June, which
investigates Wikipedia as a health information resource, was published in
PLOS ONE yesterday. You may access it at:
https://doi.org/10.1371/journal.pone.0228786
Denise Smith (Mcbrarian)
*Denise Smith*, MLIS
Librarian
Health Sciences Library
1280 Main St. West.
Hamilton, ON L8S 4L8
Hi all,
The next Research Showcase will be live-streamed on Wednesday, February 19,
at 9:30 AM PST/17:30 UTC. We’ll have presentations from Jeffrey V.
Nickerson on human/machine collaboration on Wikipedia, and Lucie-Aimée
Kaffee on human/machine collaboration on Wikidata. A question-and-answer
session will follow.
YouTube stream: https://www.youtube.com/watch?v=fj0z20PuGIk
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
Autonomous tools and the design of work
By Jeffrey V. Nickerson, Stevens Institute of Technology
Bots and other software tools that exhibit autonomy can appear in an
organization to be more like employees than commodities. As a result,
humans delegate to machines. Sometimes the machines turn and delegate part
of the work back to humans. This talk will discuss how the design of human
work is changing, drawing on a recent study of editors and bots in
Wikipedia, as well as a study of game and chip designers. The Wikipedia bot
ecosystem, and how bots evolve, will be discussed. Humans are working
together with machines in complex configurations; this puts constraints on
not only the machines but also the humans. Both software and human skills
change as a result. Paper
<https://dl.acm.org/doi/pdf/10.1145/3359317?download=true>
When Humans and Machines Collaborate: Cross-lingual Label Editing in
Wikidata
By Lucie-Aimée Kaffee, University of Southampton
The quality and maintainability of any knowledge graph are strongly
influenced by the way it is created. In the case of Wikidata, the knowledge
graph is created and maintained by a hybrid approach of human editing
supported by automated tools. We analyse the editing of natural language
data, i.e. labels. Labels are the entry point for humans to understand the
information, and therefore need to be carefully maintained. Wikidata is a
good example of a hybrid multilingual knowledge graph as it has a large
and active community of humans and bots working together covering over 300
languages. In this work, we analyse the different editor groups and how
they interact with the different language data to understand the provenance
of the current label data. This presentation is based on the paper “When
Humans and Machines Collaborate: Cross-lingual Label Editing in Wikidata”,
published in OpenSym 2019 in collaboration with Kemele M. Endris and Elena
Simperl. Paper
<https://opensym.org/wp-content/uploads/2019/08/os19-paper-A16-kaffee.pdf>
--
Janna Layton (she, her)
Administrative Assistant - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
[You can safely skip this message if you have already seen it on the
Wikidata mailing list, and pardon the spam]
Hi everyone,
---------------------------------------------------------------
TL;DR: soweego 2 is on its way.
Here's the Project Grant proposal:
https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego_2
---------------------------------------------------------------
Does the name *soweego* ring a bell?
It's an artificial intelligence that links Wikidata to large catalogs [1].
It's a close friend of Mix'n'match [2], which mainly caters for small
catalogs.
The next big step is to check Wikidata content against third-party
trusted sources.
In a nutshell, we want to enable feedback loops between Wikidatans and
catalog maintainers.
The ultimate goal is to foster mutual benefits in the open knowledge
landscape.
I'd be really grateful if you could have a look at the proposal page [3].
Can't wait for your feedback.
Best,
Marco
[1] https://soweego.readthedocs.io/
[2] https://tools.wmflabs.org/mix-n-match/
[3] https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego_2
Hi Analytics People,
The Wikimedia Analytics Team is pleased to announce the release of the most
complete dataset we have to date for analyzing content and contributor
metadata: Mediawiki History [1] [2].
Data is in TSV format, usually released monthly around the 3rd of the
month, and every new release contains the full history of metadata.
The dataset contains an enhanced [3] and historified [4] version of user,
page and revision metadata and serves as the basis for the Wikistats API on
edits, users and pages [5] [6].
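As a rough illustration of working with a TSV release like this, here is a
minimal Python sketch that counts how often each value appears in one column.
The function name, the column index and the sample rows are made up for
illustration; the actual field order is documented in [1] and [2].

```python
import csv
import io
from collections import Counter


def count_by_column(lines, column_index):
    """Count how often each value occurs in one column of tab-separated rows.

    `lines` is any iterable of text lines (an open file works too).
    `column_index` is a hypothetical position; check the dataset
    documentation for the real field order before using it.
    """
    # QUOTE_NONE treats quote characters as ordinary data, so only tabs split fields.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        # Skip rows that are too short to contain the requested column.
        if column_index < len(row):
            counts[row[column_index]] += 1
    return counts


# Made-up sample rows, not real dump content.
sample = io.StringIO("revision\tcreate\nrevision\tcreate\npage\tmove\n")
counts = count_by_column(sample, 0)
print(counts.most_common())
```

For a real file you would pass an open file handle instead of the in-memory
sample, which lets the data be streamed row by row rather than loaded at once.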
We hope you will have as much fun playing with the data as we have building
it, and we're eager to hear from you [7], whether for issues, ideas or
usage of the data.
Analytically yours,
--
Joseph Allemandou (joal) (he / him)
Sr Data Engineer
Wikimedia Foundation
[1] https://dumps.wikimedia.org/other/mediawiki_history/readme.html
[2]
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_his…
[3] Many pre-computed fields are present in the dataset, from edit-counts
by user and page to reverts and reverted information, as well as time
between events.
[4] Historical usernames and page titles (as well as user groups and
blocks), as accurate as possible, are available in addition to current
values, and are provided in denormalized form on every event of the dataset.
[5] https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2
[6] https://wikimedia.org/api/rest_v1/
[7]
https://phabricator.wikimedia.org/maniphest/task/edit/?title=Mediawiki%20Hi…
2nd Call for Papers
formal papers - informal papers - doctoral programme
13th Conference on Intelligent Computer Mathematics
- CICM 2020 -
July 26-31, 2020
Bertinoro, Italy
http://www.cicm-conference.org/2020
----------------------------------------------------------------------
Digital and computational solutions are becoming the prevalent means
for the generation, communication, processing, storage and curation of
mathematical information.
CICM brings together the many separate communities that have developed
theoretical and practical solutions for mathematical applications such
as computation, deduction, knowledge management, and user interfaces.
It offers a venue for discussing problems and solutions in each of
these areas and their integration.
CICM 2020 Invited Speakers:
Kevin Buzzard (Imperial College, London, UK)
Catherine Dubois (ENSIIE, CNRS, Evry, France)
Christian Szegedy (Google Research, Mountain View, CA, USA)
CICM 2020 Programme committee:
see https://www.cicm-conference.org/2020/cicm.php?event=&menu=pc
CICM 2020 invites submissions in all topics relating to intelligent
computer mathematics, in particular but not limited to
* theorem proving and computer algebra
* mathematical knowledge management
* digital mathematical libraries
CICM appreciates the varying nature of the relevant research in this
area and invites submissions of different forms:
1) Formal submissions will be reviewed rigorously and accepted papers
will be published in a volume of Springer LNCS:
* regular papers (up to 15 pages including references) present
novel research results
* project and survey papers (up to 15 pages + bibliography)
summarize existing results
* system and dataset descriptions (up to 5 pages including
references) present digital artifacts
* system entry (1 page according to the given LaTeX template)
provides metadata and a quick overview of a new tool or a new
release of an existing tool
2) Informal submissions will be reviewed with a positive bias and
selected for presentation based on their relevance for the
community.
* informal papers may present work-in-progress, project
announcements, position statements, etc.
* posters and system demos will be presented in parallel in special
sessions
3) The doctoral programme provides PhD students with a forum to
present early results and receive constructive feedback and
mentoring.
*** Important Dates ***
Formal submissions
- Abstract deadline: March 01
- Full paper deadline: March 08
- Reviews sent to authors: April 17
- Rebuttals due: April 21
- Notification of acceptance: April 24
- Camera-ready copies due: May 03
- Conference: July 26-31
Informal submissions and doctoral programme
Two separate submission rounds are offered so that some authors can
make early travel plans while other authors submit spontaneously.
- First round submission deadline: April 15
- Notification of acceptance: May 1
- Second round submission deadline: June 15
- Notification of acceptance: July 1
All submissions should be made via EasyChair at
https://easychair.org/conferences/?conf=cicm13
As in previous years, we will publish the CICM 2020 proceedings with
Springer LNCS.
So, this is kinda wild:
http://news.mit.edu/2020/automated-rewrite-wikipedia-articles-0212
"A system created by MIT researchers could be used to automatically update
factual inconsistencies in Wikipedia articles, reducing time and effort
spent by human editors who now do the task manually... In a paper being
presented at the AAAI Conference on Artificial Intelligence, the
researchers describe a text-generating system that pinpoints and replaces
specific information in relevant Wikipedia sentences, while keeping the
language similar to how humans write and edit."
The paper, which I have not read yet, is called "Automatic Fact-guided
Sentence Modification" and the preprint is here:
https://arxiv.org/abs/1909.13838
(Note: I'm the computer science librarian at MIT, but I wasn't aware of
this project before I saw the news story. I may go and talk to them about
it though!)
cheers,
Phoebe
--
* I use this address for lists; send personal messages to phoebe.ayers <at>
gmail.com *
Hi,
My name is Anna Yuan and I am an undergraduate student working under the
supervision of Dr. Haiyi Zhu <https://haiyizhu.com/> in the HCI Department
at Carnegie Mellon University. Our team is currently conducting a research
study on Wikipedia's ORES system. Our research focuses on exploring
opportunities to better communicate the affordance of the ORES system and
thus help people effectively design and use ORES-based applications.
If you have developed or used any ORES-based application, we would love to
invite you to participate in this research. The study consists of an
interview that takes approximately 45 minutes. During the interview, I will
ask you about your background, your current experience of using ORES and
ORES-based applications, and your suggestions on how to improve the
ecosystem.
Each participant will be offered a $20 Amazon gift card. If you are
interested in taking part in this research or would like more information,
please reply to this email and let me know.
I am looking forward to your response.
Best,
Anna
Wiki Account: https://en.wikipedia.org/wiki/User:Bobo.03
Hi everyone,
I’m looking for statistics about the edits that are reverted on the English Wikipedia. This is for purposes of explaining to the public what Wikipedia’s quality control processes are like. If hard numbers aren’t available, I’m also interested in educated guesstimates.
1) An often-quoted statistic is that 7% of edits are reverted. Is this still believed to be true?
2) According to https://blog.wikimedia.org/2017/07/19/scoring-platform-team/, 2.5% of edits are vandalism. There are other common reasons for reverting, and I’m wondering if anyone has studied their frequency. Does anyone know what percentage of all edits are reverted for being:
a) Spam (as perceived by the reverter)
b) Copyright violation
c) Violations of the Biographies of Living Persons policy
3) Do statistics on the number of edits per day on the English Wikipedia (i.e. 164,000 edits per day) include edits that are blocked by the spam blacklists or by edit filters?
4) How many edits per day on the English Wikipedia are prevented (blocked) by the spam blacklists?
5) How many edits per day on the English Wikipedia are prevented by the edit filters?
6) What percentage of all reverts are made by users of Huggle and Stiki?
7) What proportion of vandalism is quickly reverted? A 2007 study (Priedhorsky et al.) found that 42% of vandalistic contributions are repaired within one view and 70% within ten views. Have any newer studies been done on this?
Thanks in advance!
Su-Laine
Vancouver, BC