We are delighted to announce that Wiki Workshop 2020 will be held in
Taipei on April 20 or 21, 2020 (the date will be finalized soon) as
part of The Web Conference 2020. In past years, Wiki Workshop
has traveled to Oxford, Montreal, Cologne, Perth, Lyon, and San Francisco.
You can read more about the call for papers and the workshop at
http://wikiworkshop.org/2020/#call. Please note that the deadline for
submissions to be considered for the proceedings is January 17. All
other submissions should be received by February 21.
If you have questions about the workshop, please let us know on this
list or at wikiworkshop(a)googlegroups.com.
Looking forward to seeing you in Taipei.
Miriam Redi, Wikimedia Foundation
Bob West, EPFL
Leila Zia, Wikimedia Foundation
Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health" of various communities that make up the
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. This could be used both
by community members, to make decisions about their community's direction, and
by Wikimedia Foundation staff, to point anti-harassment tool development in the
We have a set of metrics we are considering for the kit, including
the ratio of active users to active administrators, administrator
confidence levels, and off-wiki factors such as freedom to
participate. It's ambitious, and our methods of collecting such data will
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats <
https://stats.wikimedia.org/v2/#/all-projects> — how might these tools
Your opinions will help to guide this project going forward. We'll be
reaching out at different stages of this project, so if you're interested
in direct messaging going forward, please feel free to indicate your
interest by signing up on the consultation page.
Looking forward to reading your thoughts.
P.S.: Please feel free to CC me in conversations that might happen on this
What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well and where it could improve. This
project aims to provide a variety of useful data points to inform
community decisions that would benefit from objective data.
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
My name is Emily Chen and I'm a Computer Science Ph.D. student at the
University of Southern California. I tried sending this email earlier
before I had joined the mailer, so apologies if this email was sent out
twice! I'm currently conducting research on collective attention decay in
Wikipedia articles that are more heavily cited by other Wikipedia articles
within the Wikipedia ecosystem. This work builds upon the observations made
in Candia et al.'s paper "The universal decay of collective memory and attention",
and I have been using the number of page views articles receive as a proxy
From what I can find, there is a maintained page view data set on
that spans 2011 to the present, and statistics that Domas Mituzas began
collecting, which cover 2007 to 2016. This data seems to capture the gradual decay in an
individual article's pageviews, but doesn't capture the initial growth of
an article's page views. Would you happen to know if there are article page
view statistics from the earlier years of Wikipedia (2001-2007) or if there
are any general page view statistics from that time frame? Or would you
happen to know who I could contact for such a dataset? It would be really
interesting to study the temporal page view dynamics over Wikipedia's
lifespan alongside my current work in collective attention.
Thank you so much for your time!
Emily Chen (echen920 [at] usc [dot] edu)
Ph.D. Student | Computer Science
Viterbi School of Engineering & Information Sciences Institute
University of Southern California
We, the Research team at the Wikimedia Foundation, have received several requests
over the past months for making ourselves more available to answer some of
the research questions that you as Wikimedia volunteers, affiliates' staff,
and researchers face in your projects and initiatives. Starting January
2020, we will experiment with monthly office hours organized jointly by our
team and the Analytics team where you can join us and direct your questions
to us. We will revisit this experiment in June 2020 to assess whether to
continue it or not.
We encourage you to attend the office hour if you have research-related
questions. These can be questions about our teams or our projects, or, more
importantly, questions about your own projects or ideas that we can support you
with during the office hours. You can also ask us how to use a specific
dataset available to you to answer a question you have, or anything else
along those lines. Note that the purpose of the office hours is to answer
your questions during the dedicated time of the office hour. Questions that
may require many hours of back-and-forth between our team and you are not
suited for this forum. For these bigger questions, however, we are happy to
brainstorm with you in the office hour and point you to some good
directions to explore further on your own (and maybe come back in the next
office hour and ask more questions).
Time and Location
We meet on the 4th Wednesday of every month, 17.00-18.00 (UTC), in the
#wikimedia-research IRC channel on freenode.
The first meeting will be on January 22.
Up-to-date information on mediawiki 
If you miss the office hour, you can read the logs of it at .
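For anyone setting up reminders, the "4th Wednesday of every month" rule can be computed with Python's standard library. This is a minimal sketch; the helper name `nth_weekday` is hypothetical, and the 17.00 UTC start time would still need to be attached to the resulting date separately.

```python
import calendar
import datetime


def nth_weekday(year: int, month: int, weekday: int, n: int) -> datetime.date:
    """Return the n-th (1-based) occurrence of `weekday` in the given month.

    `weekday` uses Python's convention: Monday == 0 ... Sunday == 6,
    so calendar.WEDNESDAY (== 2) can be passed directly.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    first = datetime.date(year, month, 1)
    # Number of days from the 1st to the first matching weekday (0..6).
    offset = (weekday - first.weekday()) % 7
    day = 1 + offset + 7 * (n - 1)
    days_in_month = calendar.monthrange(year, month)[1]
    if day > days_in_month:
        raise ValueError("this month has fewer than n such weekdays")
    return datetime.date(year, month, day)


# Office-hour dates for the first half of 2020 (before the June review).
schedule = [nth_weekday(2020, m, calendar.WEDNESDAY, 4) for m in range(1, 7)]
```

For January 2020 this gives January 22, which agrees with the first meeting date announced above.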
Future announcements about these office hours will go only to the
following lists, so please make sure you're subscribed to them if you would like
to receive a ping:
* wiki-research-l mailing list 
* analytics mailing list 
* wikidata mailing list 
* the Research category in Space 
On behalf of Research and Analytics at WMF,
I'm attempting to do some research into how different cultures consume
information. I'm focusing specifically on how this varies by time of day
and time of year. I had the idea of using the Wikipedia projectviews data
as a proxy for overall information consumption, since Wikipedia is usually the first or
second search result for most interesting bits of information from pop
culture to geopolitics to science. Unfortunately, after looking at WiViVi,
it seems like my naive assumption of separating out Wikipedias by language
doesn't actually resolve that cleanly into countries. Since I'm
particularly interested in the effects of seasonality (e.g. different
academic calendars and holidays across countries, different lunchtimes
between northern and southern European countries in the same timezones), I
can't make the assumption that the % of traffic to a project from each
country is constant.
Is there any way I can get an hourly time series of which countries are
viewing which Wikipedias? Even a (country x project) resolution summary of
average views for the 24 hours of the day would be helpful, if that data exists.
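If hourly records broken down by country and project were available, the (country x project) summary of average views for each hour of the day could be computed along these lines. This is a sketch only: the record layout `(UTC timestamp, country, project, views)` and the function name `hourly_profile` are assumptions for illustration, not the schema of any published Wikimedia dataset.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (UTC hour timestamp, country code, project, views).
Record = Tuple[datetime, str, str, int]
Key = Tuple[str, str, int]  # (country, project, hour of day)


def hourly_profile(records: Iterable[Record]) -> Dict[Key, float]:
    """Average views per (country, project, hour-of-day) across all days seen.

    Hours with no record are absent rather than counted as zero, so sparse
    (country, project) pairs are averaged only over the hours that appear.
    """
    totals: Dict[Key, int] = defaultdict(int)
    counts: Dict[Key, int] = defaultdict(int)
    for ts, country, project, views in records:
        key = (country, project, ts.hour)
        totals[key] += views
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}
```

The hours here are UTC; comparing lunchtimes across countries in the same timezone, as described above, would mean shifting each country's profile by its local offset (and daylight saving time) before comparing.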
That's fascinating, John; thank you. I'm copying this to wiki-research-l and
Fabian Suchanek, who gave the first part of the Research Showcase last month.
What do you like for coding stories? https://quanteda.io/reference/dfm.html ?
Sentiment is hard because errors are often 180 degrees away from correct.
How do you both feel about Soru et al. (June 2018), "Neural Machine Translation
for Query Construction and Composition"?
On Sat, Jan 11, 2020 at 3:46 PM John Urbanik <johnurbanik(a)gmail.com> wrote:
> I used to work as the chief data scientist at Collin's company.
> I'd suggest looking at things like relationships between the views / edits for sets of pages as well as aggregating large sets of page views for different pages in various ways. There isn't a lot of literature that is directly applicable, and I can't disclose the precise methods being used due to NDA.
> In general, much of the pageview data is Weibull or GEV distributed on top of being non-stationary, so I'd suggest looking into papers from the extreme value theory literature as well as literature on Hawkes/Queue-Hawkes processes. Most traditional ML and signal processing is not very effective without some pretty substantial pre-processing, and even then things are pretty messy, depending on what you're trying to predict; most variables are heteroskedastic w.r.t. pageviews, and there are a lot of real-world events that can cause false positives.
> Further, concept drift is pretty rapid in this space and structural breaks happen quite frequently, so the reliability of a given predictor can change extremely rapidly. Understanding how much training data to use for a given prediction problem is itself a super interesting problem, since there may be some horizon after which the predictor loses power, but decreasing the horizon too much means overfitting and loss of statistical significance.
> Good luck!
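John's point about choosing how much training data to use can be illustrated with a walk-forward comparison of training windows. The sketch below uses a trailing mean as a deliberately simple stand-in predictor (the methods John refers to are under NDA and are not reproduced here), and `compare_windows` is a hypothetical helper. Every window is scored on the same test points so that the errors are comparable; the series are synthetic.

```python
from typing import Dict, Sequence


def compare_windows(series: Sequence[float], windows: Sequence[int]) -> Dict[int, float]:
    """Mean absolute error of one-step-ahead trailing-mean forecasts per window.

    All windows are evaluated on the same test points (from max(windows)
    onward), so their scores can be compared directly.
    """
    start = max(windows)
    if min(windows) < 1 or start >= len(series):
        raise ValueError("windows must be >= 1 and shorter than the series")
    scores: Dict[int, float] = {}
    for w in windows:
        errors = []
        for t in range(start, len(series)):
            forecast = sum(series[t - w:t]) / w  # mean of the previous w points
            errors.append(abs(series[t] - forecast))
        scores[w] = sum(errors) / len(errors)
    return scores


# Synthetic series, made up for illustration (not real pageview data).
level_shift = [10.0] * 10 + [50.0] * 10  # structural break at t = 10
alternating = [0.0, 10.0] * 10           # stationary but noisy

shift_scores = compare_windows(level_shift, [1, 5])
noise_scores = compare_windows(alternating, [1, 2])
best_after_break = min(shift_scores, key=shift_scores.get)  # 1 for this series
best_for_noise = min(noise_scores, key=noise_scores.get)    # 2 for this series
```

After the structural break, the short window recovers faster; on the noisy but stationary series, the longer window averages the noise out. That is the horizon trade-off described above, in its simplest form.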
Are hourly pageviews working for you again?
Where can we read more about how your company uses them for predictions?
I've been working on this problem for a long time, for scaling news
signals, along with Google Trends (which you have to de-normalize with
multiple overlapping queries because it scales results to always be in
[0,100].) Can you post a bibliography of your favorite two or three
resources from the last few years, please?
I’ve just seen the replies; thanks to everyone who has replied.
I was trying to work out what percentage of the active Wikimedia
community participates on Meta, and to compare that with another wiki farm. Any
thoughts on that?
On Mon, 6 Jan 2020 at 20:31, Aaron Halfaker <aaron.halfaker(a)gmail.com>
> It doesn't look like Active Editors works for all wikis. I think you'd
> have to merge activity across all wikis to get a stat like that. I'm not
> sure I know of a good data strategy to get that.
> If you were to query it with Quarry, you'd need to write a query for every
> wiki and then write some code to merge the results. Oof.
> If you were to extract it from the XML dumps, you'd need to process each wiki
> separately and then merge the results. Oof.
> The best solution to this is to have a common table/relation across all
> wikis and to aggregate from there. I don't think there's any such
> cross-wiki table/relation available.
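Aaron's suggestion of querying each wiki separately and then merging the results could look like the following once per-wiki edit counts have been collected (for example, with one Quarry query per wiki). This is a minimal sketch: the input shape `{wiki: {user: edits}}`, the `metawiki` key, and the premise that a user name identifies the same person on every wiki are all assumptions. "Active" follows RhinosF1's definition in this thread: more than 5 edits in the last 30 days.

```python
from collections import Counter
from typing import Dict, Mapping


def active_users(
    per_wiki_edits: Mapping[str, Mapping[str, int]],
    min_edits_exclusive: int = 5,
    home_wiki: str = "metawiki",
) -> Dict[str, float]:
    """Merge per-wiki {user: edit count} tables into global activity counts.

    Assumes every table covers the same 30-day window and that user names
    refer to the same global account on every wiki.
    """
    global_edits: Counter = Counter()
    for edits in per_wiki_edits.values():
        global_edits.update(edits)  # adds counts for users seen on several wikis
    global_active = {u for u, n in global_edits.items() if n > min_edits_exclusive}
    home_active = {
        u for u, n in per_wiki_edits.get(home_wiki, {}).items() if n > min_edits_exclusive
    }
    share = len(home_active) / len(global_active) if global_active else 0.0
    return {
        "global_active": len(global_active),
        "home_active": len(home_active),
        "share": share,
    }
```

Merging before applying the threshold matters: a user with a few edits on each of several wikis can be active globally without being active on any single one of them.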
> On Mon, Jan 6, 2020 at 1:38 PM Jonathan Morgan <jmorgan(a)wikimedia.org>
> > Same dashboard, but for "All wikis":
> > https://stats.wikimedia.org/v2/#/all-projects
> > That work?
> > - J
> > On Mon, Jan 6, 2020 at 11:32 AM RhinosF1 - <rhinosf1(a)gmail.com> wrote:
> > > Hi,
> > >
> > > That provides active users for meta but not globally. Anything for global?
> > >
> > > RhinosF1
> > >
> > > On Mon, 6 Jan 2020 at 18:10, Jonathan Morgan <jmorgan(a)wikimedia.org>
> > > wrote:
> > >
> > > > RhinosF1,
> > > >
> > > > Are you looking for information like this
> > > > <https://stats.wikimedia.org/v2/#/meta.wikimedia.org>, or something
> > > > different?
> > > >
> > > > - J
> > > >
> > > > On Mon, Jan 6, 2020 at 8:51 AM RhinosF1 - <rhinosf1(a)gmail.com>
> > > >
> > > > > Hi,
> > > > >
> > > > > Does anyone know a way to find out how many wikimedia users are active
> > > > > globally compared to active on metawiki?
> > > > >
> > > > > By "active" I mean they've made more than 5 edits in the last 30 days.
> > > > >
> > > > > Thanks,
> > > > > RhinosF1
> > > > > _______________________________________________
> > > > > Analytics mailing list
> > > > > Analytics(a)lists.wikimedia.org
> > > > > https://lists.wikimedia.org/mailman/listinfo/analytics
> > > > >
> > > >
> > > >
> > > > --
> > > > Jonathan T. Morgan
> > > > Senior Design Researcher
> > > > Wikimedia Foundation
> > > > User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
> > > > (Uses He/Him)
> > > > _______________________________________________
> > > > Wiki-research-l mailing list
> > > > Wiki-research-l(a)lists.wikimedia.org
> > > > https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
> > > >
> > >
> > --
> > Jonathan T. Morgan
> > Senior Design Researcher
> > Wikimedia Foundation
> > User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>
> > (Uses He/Him)