As you may be aware, over the last three weeks I've been looking into the
accuracy of active user statistics on English Wikipedia.
I haven't had a chance to upload the final results to
https://en.wikipedia.org/wiki/User:RhinosF1/activeuser but I have completed
the gathering of statistics and have attached a .pdf of the results to this
email.
I found it interesting that there is a sudden drop in the number of
active users. I half expected this and set out to find it, but
I want to look deeper.
I'd like to see whether this is down to blocks or to users simply not
continuing to edit, and to assess whether time requirements or edit
requirements have the bigger impact.
I look forward to any feedback and help in the research.
The plan for the next stages is as follows:
1. Allow about 10-14 days for people receiving this email to respond.
2. Run the new list of queries for about 2-3 weeks to gather some data to
3. Show the data to enwiki users and ask for feedback / help collecting
4. Present results in 2-3 months' time.
5. Gather wide feedback on results.
6. Maybe take action to improve it, if we can see what action is needed.
As you will see, most of the data is from around 9pm UTC, so in future stages
I would appreciate data collection from a wider range of times.
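For anyone who wants to help collect data at other times of day, a minimal Python sketch of one possible approach follows. It assumes the figure of interest is the `activeusers` value returned by the MediaWiki API's site statistics (`action=query&meta=siteinfo&siprop=statistics`); that may not be the same as the queries used for the attached results. The function names and the snapshot layout are hypothetical, and the sketch only builds the request URL and parses a response, so it does no network access itself.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumption: the English Wikipedia API endpoint is the data source.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_statistics_url(endpoint=API_ENDPOINT):
    """Return the URL that asks the MediaWiki API for site statistics."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "statistics",
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


def parse_active_users(response_text):
    """Extract the active user count from a siteinfo statistics response."""
    payload = json.loads(response_text)
    return int(payload["query"]["statistics"]["activeusers"])


def make_snapshot(response_text, when):
    """Pair an active user count with the UTC hour at which it was collected."""
    return {
        "utc_hour": when.astimezone(timezone.utc).hour,
        "active_users": parse_active_users(response_text),
    }


def hours_covered(snapshots):
    """List the distinct UTC hours that already have at least one snapshot."""
    return sorted({snapshot["utc_hour"] for snapshot in snapshots})
```

Fetching the URL from `build_statistics_url()` at several different hours and passing each response to `make_snapshot()` would show, via `hours_covered()`, which parts of the day still lack data.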
Thanks in advance,
I wonder which mechanisms or events (in Wikipedia or WikiProjects) may attract editors to improve article quality.
One example is Today's articles for improvement. Within WikiProjects, GA/FA nominations seem useful too.
SEMANTiCS 2019 extends the deadlines of the Research & Innovation Track
and the LegalTech/ Digital Humanities and Cultural Heritage Track as
follows:
* Extended: Abstract Submission Deadline: May 6, 2019 (11:59 pm, Hawaii time)
* Extended: Paper Submission Deadline: May 13, 2019 (11:59 pm, Hawaii time)
* Notification of Acceptance: June 17, 2019 (11:59 pm, Hawaii time)
* Camera-Ready Paper: July 29, 2019 (11:59 pm, Hawaii time)
For details please go to: https://2019.semantics.cc/calls
With kind regards,
The Semantics Organization Team
We’re preparing for the April 2019 research newsletter and looking for contributors. Please take a look at https://etherpad.wikimedia.org/p/WRN201904 and add your name next to any paper you are interested in covering. Our target publication date is on April 30 UTC. If you can't make this deadline but would like to cover a particular paper in the subsequent issue, leave a note next to the paper's entry in the etherpad. As usual, short notes and one-paragraph reviews are most welcome.
Highlights from this month:
- A season for all things: Phenological imprints in Wikipedia usage and their relevance to conservation
- Assessing the quality of information on wikipedia: A deep-learning approach
- Crosslingual Document Embedding As Reduced-Rank Ridge Regression
- Detecting and Gauging Impact on Wikipedia Page Views
- Eliciting New Wikipedia Users' Interests via Automatically Mined Questionnaires: For a Warm Welcome, Not a Cold Start
- Female scholars need to achieve more for equal public recognition
- Finding Prerequisite Relations using the Wikipedia Clickstream
- Interactive Quality Analytics of User-generated Content: An Integrated Toolkit for the Case of Wikipedia
- Participation of New Editors After Times of Shock on Wikipedia
- Searching News Articles Using an Event Knowledge Graph Leveraged by Wikidata
- Tor Users Contributing to Wikipedia: Just Like Everybody Else?
- Uncertainty During New Disease Outbreaks in Wikipedia
- WikiLinkGraphs: A complete, longitudinal and multi-language dataset of the Wikipedia link networks
- Wikipedia Cultural Diversity Dataset: A Complete Cartography for 300 Language Editions
Mohammed Abdulai and Tilman Bayer
The next Research Showcase, “Group Membership and Contributions to Public
Information Goods: The Case of WikiProject” and “Thanks for Stopping By: A
Study of ‘Thanks’ Usage on Wikimedia,” will be live-streamed next
Wednesday, April 17, 2019, at 11:30 AM PDT/19:30 UTC.
YouTube stream: https://www.youtube.com/watch?v=zmb5LoJzOoE
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
This month's presentations:
Group Membership and Contributions to Public Information Goods: The Case of
WikiProject
By Ark Fangzhou Zhang
We investigate the effects of group identity on contribution behavior on
the English Wikipedia, the largest online encyclopedia that gives free
access to the public. Using an instrumental variable approach that exploits
the variations in one’s exposure to WikiProject, we find that joining a
WikiProject has a significant impact on one’s level of contribution, with
an average increase of 79 revisions or 8,672 characters per month. To
uncover the potential mechanism underlying the treatment effect, we use the
size of home page for WikiProject as a proxy for the number of
recommendations from a project. The results show that the users who join a
WikiProject with more recommendations significantly increase their
contribution to articles under the joined project, but not to articles
under other projects.
Thanks for Stopping By: A Study of ‘Thanks’ Usage on Wikimedia
By Swati Goel
The Thanks feature on Wikipedia, also known as "Thanks," is a tool with
which editors can quickly and easily send one another positive feedback. The
aim of this project is to better understand this feature: its scope, the
characteristics of a typical "Thanks" interaction, and the effects of
receiving a thank on individual editors. We study the motivational impacts
of "Thanks" because maintaining editor engagement is a central problem for
crowdsourced repositories of knowledge such as Wikimedia. Our main findings
are that most editors have not been exposed to the Thanks feature (meaning
they have never given nor received a thank), thanks are typically sent
upwards (from less experienced to more experienced editors), and receiving
a thank is correlated with having high levels of editor engagement. Though
the prevalence of "Thanks" usage varies by editor experience, the impact of
receiving a thank seems mostly consistent for all users. We empirically
demonstrate that receiving a thank has a strong positive effect on
short-term editor activity across the board and provide preliminary
evidence that thanks could compound to have long-term effects as well.
Janna Layton (she, her)
Administrative Assistant - Audiences & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
We are happy to announce that the 13th DBpedia Community Meeting will be
held in Leipzig, Germany, on the 23rd of May, 2019. DBpedia will be part of
the Language, Data and Knowledge conference (LDK). This new biennial
conference series aims at bringing together researchers from across
disciplines. Further information on the conference program and its
associated events can be found here: http://2019.ldk-conf.org/.
* Highlights *
- Keynote #1: Making Linked Data Fun with DBpedia by Peter Haase, metaphacts
- Keynote #2: From Wikipedia to Thousands of Wikis – The DBkWik
Knowledge Graph by Heiko Paulheim, University of Mannheim
- DBpedia Association hour
- DBpedia Showcase Session
* Quick Facts *
- Web URL: http://wiki.dbpedia.org/meetings/Leipzig2019
- When: May 23rd, 2019
- Where: Mediencampus Villa Ida, Poetenweg 28, 04155 Leipzig
- Call for Contribution: Submit your proposal in our form
- Registration: You need to buy a ticket via
* Side Event *
The Thinktank and Hackathon “Artificial Intelligence for Smart
Agriculture” is part of the DBpedia meeting. The activity is supported
by the projects DataBio, Bridge2Era as well as CIAOTECH/PNO. The goal of
the thinktank & hackathon is to build new ideas and small tools, which
are able to demonstrate the use of AI within the agricultural domain
and a sustainable bioeconomy. In particular, the use and impact of linked
data for AI components will be one part of the event. Further
discussions about future collaborations and ideas of the future of
AI-based Smart Agriculture will be held. Please submit your ideas and
We are looking forward to meeting you in Leipzig!
Your DBpedia Association
A quick message to point out a recent model for crosslingual document
embedding that we developed. It is called Cr5 (for "Crosslingual Document
Embedding as Reduced-Rank Ridge Regression") and essentially lets you take
any text document in any language and represent it as a vector in a
language-independent way, such that documents can be compared across
languages. For instance, the Finnish Wikipedia article Olut
<https://fi.wikipedia.org/wiki/Olut> will result in a vector
representation similar to that of the English article Beer
<https://en.wikipedia.org/wiki/Beer>, since they are about the same
concept, and despite the fact that they have nearly no surface-level
similarities in terms of vocabulary etc.
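As a rough illustration of what comparing documents across languages means once each document is a vector, here is a minimal Python sketch of cosine similarity, a common way to compare embedding vectors. It does not use the Cr5 API. The three vectors are made-up placeholders and not real Cr5 output, and cosine similarity is an assumption here; this message does not say which similarity measure Cr5 is meant to be used with.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings, invented for illustration only.
olut_fi = [0.9, 0.1, 0.3]      # stands in for the Finnish article "Olut"
beer_en = [0.8, 0.2, 0.35]     # stands in for the English article "Beer"
unrelated = [-0.2, 0.9, -0.1]  # stands in for an article on another topic

same_topic = cosine_similarity(olut_fi, beer_en)
different_topic = cosine_similarity(olut_fi, unrelated)
print(same_topic > different_topic)
```

With a language-independent embedding, two articles about the same concept should score higher than two articles about different concepts, regardless of the languages they are written in.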
We are publishing a pre-trained model with a small API that is very easy to
The model currently supports 28 languages [1], but it can readily be
trained for different sets of languages (code provided on GitHub, see
The provided model was trained on Wikipedia (surprise...) and essentially
sources information on how words in different languages correspond to one
another from the crosslingual article alignments provided by Wikidata [2].
While the resulting model can be applied to any text (not just Wikipedia
articles), it works particularly well on Wikipedia [3] -- which is the
reason I'm writing this email: I really hope that the community will start
using Cr5 to make better sense of Wikipedia across languages. Ideas abound:
crosslingual section alignment, crosslingual plagiarism detection,
comparison of topical foci across languages, crosslingual keyword search,
If there are any questions or comments, do drop us a line!
[1] bg, ca, cs, da, de, el, en, es, et, fi, fr, hr, hu, id, it, mk, nl, no,
pl, pt, ro, ru, sk, sl, sv, tr, uk, vi
[2] For more details on the method, please see the paper:
[3] In fact, it achieves state-of-the-art performance in the context of
Wikipedia, outperforming the previously best method, published by Facebook,
by a wide margin.