Hi everyone,
We are delighted to announce that Wiki Workshop 2020 will be held in
Taipei on April 20 or 21, 2020 (the date will be finalized soon) as
part of the Web Conference 2020 [1]. In past years, Wiki Workshop
has traveled to Oxford, Montreal, Cologne, Perth, Lyon, and San
Francisco.
You can read more about the call for papers and the workshop at
http://wikiworkshop.org/2020/#call. Please note that the deadline for
submissions to be considered for the proceedings is January 17. All
other submissions should be received by February 21.
If you have questions about the workshop, please let us know on this
list or at wikiworkshop(a)googlegroups.com.
Looking forward to seeing you in Taipei.
Best,
Miriam Redi, Wikimedia Foundation
Bob West, EPFL
Leila Zia, Wikimedia Foundation
[1] https://www2020.thewebconf.org/
Hello everyone - apologies for cross-posting! *TL;DR*: We would like your
feedback on our Metrics Kit project. Please have a look and comment on
Meta-Wiki:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The Wikimedia Foundation's Trust and Safety team, in collaboration with the
Community Health Initiative, is working on a Metrics Kit designed to
measure the relative "health"[1] of various communities that make up the
Wikimedia movement:
https://meta.wikimedia.org/wiki/Community_health_initiative/Metrics_kit
The ultimate outcome will be a public suite of statistics and data looking
at various aspects of Wikimedia project communities. This could be used
both by community members to make decisions on their community direction
and by Wikimedia Foundation staff to point anti-harassment tool development
in the right direction.
We have a set of metrics we are thinking about including in the kit,
ranging from the ratio of active users to active administrators and
administrator confidence levels to off-wiki factors such as freedom to
participate. It's ambitious, and our methods of collecting such data will
vary.
Right now, we'd like to know:
* Which metrics make sense to collect? Which don't? What are we missing?
* Where would such a tool ideally be hosted? Where would you normally look
for statistics like these?
* We are aware of the overlap in scope between this and Wikistats <
https://stats.wikimedia.org/v2/#/all-projects> — how might these tools
coexist?
Your opinions will help guide this project. We'll be reaching out at
different stages of the project, so if you would like to receive direct
messages about it, please feel free to indicate your interest by signing
up on the consultation page.
Looking forward to reading your thoughts.
best,
Joe
P.S.: Please feel free to CC me in conversations that might happen on this
list!
[1] What do we mean by "health"? There is no standard definition of what
makes a Wikimedia community "healthy", but there are many indicators that
highlight where a wiki is doing well, and where it could improve. This
project aims to provide a variety of useful data points that will inform
community decisions that will benefit from objective data.
--
*Joe Sutherland* (he/him or they/them)
Trust and Safety Specialist
Wikimedia Foundation
joesutherland.rocks
Hi all,
We, the Research team at the Wikimedia Foundation, have received some
requests over the past months to make ourselves more available to answer
the research questions that you, as Wikimedia volunteers, affiliates' staff,
and researchers, face in your projects and initiatives. Starting January
2020, we will experiment with monthly office hours organized jointly by our
team and the Analytics team where you can join us and direct your questions
to us. We will revisit this experiment in June 2020 to assess whether to
continue it or not.
The scope
We encourage you to attend the office hour if you have research-related
questions. These can be questions about our teams, our projects, or, more
importantly, questions about your projects or ideas that we can support you
with during the office hours. You can also ask us how to use a specific
dataset available to you to answer a question you have, or bring some
other question. Note that the purpose of the office hours is to answer
your questions during the dedicated time of the office hour. Questions that
may require many hours of back-and-forth between our team and you are not
suited for this forum. For these bigger questions, however, we are happy to
brainstorm with you in the office hour and point you to some good
directions to explore further on your own (and maybe come back in the next
office hour and ask more questions).
Time and Location
We meet on the 4th Wednesday of every month from 17:00 to 18:00 (UTC) in
the #wikimedia-research IRC channel on freenode [1].
The first meeting will be on January 22.
Up-to-date information is available on mediawiki.org [2].
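For anyone who wants to put the dates in a calendar, here is a minimal
Python sketch of the "4th Wednesday of every month" rule. The function name
fourth_wednesday is only illustrative, and the January to June range is
taken from the experiment window mentioned above; the mediawiki.org page
remains the source of truth if a session moves.

```python
import calendar
from datetime import date


def fourth_wednesday(year: int, month: int) -> date:
    """Return the date of the 4th Wednesday of the given month."""
    # monthcalendar() returns the month as a list of weeks (Monday first);
    # days that fall outside the month are represented by 0.
    wednesdays = [
        week[calendar.WEDNESDAY]
        for week in calendar.monthcalendar(year, month)
        if week[calendar.WEDNESDAY] != 0
    ]
    return date(year, month, wednesdays[3])


# Office hours during the experiment window (January to June 2020).
office_hours = [fourth_wednesday(2020, month) for month in range(1, 7)]
for day in office_hours:
    print(day.isoformat(), "17:00-18:00 UTC")
```

The first date this produces is 2020-01-22, which matches the first meeting
announced above.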
Archiving
If you miss the office hour, you can read its logs at [3].
Future announcements about these office hours will only go to the
following lists, so please make sure you're subscribed to them if you would
like to receive a ping:
* wiki-research-l mailing list [4]
* analytics mailing list [5]
* wikidata mailing list [6]
* the Research category in Space [7]
on behalf of Research and Analytics at WMF,
Martin
[1] irc://irc.freenode.net/wikimedia-research
[2] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[3] https://wm-bot.wmflabs.org/logs/%23wikimedia-research/
[4] https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
[5] https://lists.wikimedia.org/mailman/listinfo/analytics
[6] https://lists.wikimedia.org/mailman/listinfo/wikidata
[7] https://discuss-space.wmflabs.org/tags/research
--
Martin Gerlach
Research Scientist
Wikimedia Foundation
Hi everybody,
Just in time for the holidays, we're announcing the addition of Media
Requests to our metrics catalog. Over the last few months we've been
working on a dataset offering request numbers for every single image,
audio, video and document in the Wiki universe, since 2015.
This means we have 3 new metrics available in the Analytics Query Service:
- Media requests per referrer: e.g. how many images, audio, videos...
have been accessed from English Wikipedia in the last month? *73 billion
for November
<http://stats.wikimedia.org/v2/#/en.wikipedia.org/content/total-mediarequest…>.*
- Media requests per file: e.g. how many hits did this cool painting
<https://en.wikipedia.org/wiki/Christmas_tree#/media/File:Yggdrasil.jpg>
get in November? The answer is 483,791 hits
<https://wikimedia.org/api/rest_v1/metrics/mediarequests/per-file/all-refere…>
.
- Top files by media requests: e.g. what was the most popular video
yesterday, December 22nd? Fred Rogers testifying before the Senate
Subcommittee on Communications
<http://stats.wikimedia.org/v2/#/en.wikipedia.org/content/top-mediarequests/…>.
Fun! You can check out the top 1000 media files for any month or day, for
any media type.
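If you would like to script a per-file query, here is a rough Python sketch
that only builds the request URL. Only the base path up to per-file/ comes
from the link above. The order of the remaining path segments (referer,
agent, file path, granularity, start, end), the values all-referers,
all-agents and monthly, the YYYYMMDD timestamps, and the example file path
are all assumptions for illustration, so please check them against the
Wikitech documentation before relying on them.

```python
from urllib.parse import quote

# Base path as it appears in the per-file link above.
BASE = "https://wikimedia.org/api/rest_v1/metrics/mediarequests"


def per_file_url(file_path, start, end,
                 referer="all-referers", agent="all-agents",
                 granularity="monthly"):
    """Build a per-file media requests URL (segment layout is assumed)."""
    # The file path contains slashes, so it has to be percent-encoded
    # to travel as a single URL path segment.
    encoded = quote(file_path, safe="")
    return f"{BASE}/per-file/{referer}/{agent}/{encoded}/{granularity}/{start}/{end}"


# Hypothetical storage path, not a real file.
print(per_file_url("/wikipedia/commons/a/a0/Example.jpg", "20191101", "20191201"))
```

Fetching the resulting URL with your HTTP client of choice should return
JSON, but the exact response fields are not covered here.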
Media requests is, in terms of absolute numbers, a huge dataset, so the
per-file and top metrics are still being loaded with data going all the way
back to 2015. We expect this loading to finish in mid-January.
You can read more about this on Wikitech
<https://wikitech.wikimedia.org/wiki/Analytics/AQS/Mediarequests>. As usual,
if you have any questions about the dataset or the new metrics, please send
them our way here on the list or via Phabricator.
Happy holidays!
Francisco + the A team
--
*Francisco Dans (él, he, 彼)*
Software Engineer, Analytics Team
Wikimedia Foundation
Dear all,
It's almost Christmas and the new year is coming around. At the end of each
year we publish a list of the most viewed Hebrew Wikipedia articles in the
past year.
We have a data point that appears to be anomalous: the article caffeine
<https://tools.wmflabs.org/pageviews/?project=he.wikipedia.org&platform=all-…>
received more than 450K views on one day: 26th of September 2019. We can't
see any reason for such a surge and it is completely disproportionate. Even
on English Wikipedia, caffeine
<https://tools.wmflabs.org/pageviews/?project=en.wikipedia.org&platform=all-…>
hasn't received so many views on one day - not even on the 8th of February,
when Friedlieb Ferdinand Runge, who identified caffeine, was featured on the
daily Google Doodle.
It seems this data point is erroneous. Is there any way to verify that, or
inquire where the error stems from?
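As a side note for anyone who wants to scan a yearly list for similar
outliers before publishing it, here is a minimal Python sketch that flags
days far above an article's typical daily views, using the median and the
median absolute deviation. The function name flag_spikes, the threshold of
10, and all the daily numbers except the 450K value are made up for
illustration. A check like this can only say that a day is statistically
unusual; it cannot tell whether the traffic came from people, bots, or a
counting error.

```python
from statistics import median


def flag_spikes(daily_views, threshold=10.0):
    """Return indices of days whose views are far above the typical level."""
    med = median(daily_views)
    # Median absolute deviation: a spread estimate that one huge spike
    # cannot inflate the way it inflates a standard deviation.
    mad = median(abs(v - med) for v in daily_views)
    scale = mad if mad > 0 else 1.0
    return [i for i, v in enumerate(daily_views)
            if (v - med) / scale > threshold]


# Invented daily view counts with one 450K day in the middle.
views = [310, 295, 330, 305, 450000, 320, 300]
print(flag_spikes(views))
```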
Kind regards and season's greetings,
Dr. Keren Shatzman
Senior Coordinator, Academia & Projects
Wikimedia Israel
Hello everyone,
The first version of the Wikimedia Cloud Services Edits dashboard
(wmcs-edits) is available for use at https://wmcs-edits.wmflabs.org. Using
this dashboard you will be able to see three types of visualization (tabular,
time-series, and hierarchical) of the percentage of edits coming to ~870
Wikimedia wikis from Wikimedia Cloud Services
<https://wikitech.wikimedia.org/wiki/Help:Cloud_Services_Introduction>
every month. Here are the top two highlights from November:
- 38.7% of edits to Wikimedia wikis came via Wikimedia Cloud Services.
- The three wikis with the highest percentage of Wikimedia Cloud
Services edits are Wikidata (83%), Arabic Wikipedia (4.7%), and Wikimedia
Commons (3.5%).
This tool has been set up by the Developer Advocacy team with help from the
Analytics team and their infrastructure for setting up dashboards. We hope
that the WMCS edits dashboard will help us learn a lot about how and where
the Wikimedia Cloud Services infrastructure contributes to Wikimedia
projects, the use of bots and tools by Wikimedia communities, their overall
health, and a lot more.
If you have any questions, if anything is unclear, or if you have feedback
to share, please drop a comment on the Phabricator task:
https://phabricator.wikimedia.org/T226663. We will take into account all
your feedback in the next iteration of this tool, for which the work will
begin in January!
Cheers,
Srishti
*Srishti Sethi*
Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello everyone,
The next Research Showcase will be live-streamed on Wednesday, December 18,
at 9:30 AM PST/17:30 UTC. We’ll have a presentation from Fabian Suchanek on
incomplete knowledge bases and one from Brian Keegan about Wikipedia and
the 2016 US Presidential election.
YouTube stream: https://www.youtube.com/watch?v=b4VrphM_TTA
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
Making Knowledge Bases More Complete
By Fabian Suchanek, Télécom Paris, Institut Polytechnique de Paris
A Knowledge Base (KB) is a computer-readable collection of facts about the
world (examples are Wikidata, DBpedia, and YAGO). The problem is that these
KBs are often missing entities or facts. In this talk, I present some new
methods to combat this incompleteness. I will also quickly talk about some
other research projects we are currently pursuing, including a new version
of YAGO. Publications <https://suchanek.name/work/publications/>
The Dynamics of Peer-Produced Political Information During the 2016 U.S.
Presidential Campaign
By Brian Keegan, Ph.D., Assistant Professor, Department of Information
Science, University of Colorado Boulder
Wikipedia plays a crucial role for online information seeking and its
editors have a remarkable capacity to rapidly revise its content in
response to current events. How did the production and consumption of
political information on Wikipedia mirror the dynamics of the 2016 U.S.
Presidential campaign? Drawing on systems justification theory and methods
for measuring the enthusiasm gap among voters, this paper quantitatively
analyzes the candidates' biographical and related articles and their
editors. Information production and consumption patterns match major events
over the course of the campaign, but Trump-related articles show
consistently higher levels of engagement than Clinton-related articles.
Analysis of the editors' participation and backgrounds shows analogous
shifts in the composition and durability of the collaborations around each
candidate. The implications for using Wikipedia to monitor political
engagement are discussed. Paper
<http://www.brianckeegan.com/papers/CSCW_2019_Elections.pdf>
--
Janna Layton (she, her)
Administrative Assistant - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi everybody,
the Analytics team is going to shut down stat1007 for a few minutes on Thu
Dec 12th at around 15:30 CET to check if there is space for a GPU in the
server's chassis. Please let us know if this will impact your work (so we
can arrange a different maintenance window).
Thanks!
Luca
Hi everybody,
the Analytics team is going to enable Kerberos authentication for Hadoop on
Monday December 2nd. The procedure will start around 10 AM CET and should
take 3 to 4 hours, but since this is an invasive change it may take
longer. If you have anything important that requires Hadoop on this date,
please let us know in advance.
The most visible change from the user's point of view is the introduction
of a new account/password to be able to use the Hadoop services (like
Hive/HDFS/Spark/Oozie). We wrote a user guide about what will change with
Kerberos at
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide.
There is also an open task to track any doubts, questions, or special use
cases over the next two weeks: https://phabricator.wikimedia.org/T238560.
Feel free to reach out on IRC in #wikimedia-analytics on Freenode too!
Thanks!
Luca (on behalf of the Analytics team)
Good news everyone, we enabled Kerberos!
If you use the Hadoop cluster in any way, you'll need to kinit to get a
Kerberos token so you can authenticate yourself or your job. Here's the
guide again:
https://wikitech.wikimedia.org/wiki/Analytics/Systems/Kerberos/UserGuide.
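If you have scripts that launch Hadoop jobs, a small pre-flight check can
save a confusing authentication failure. The Python sketch below assumes the
MIT Kerberos klist command is on your PATH; its -s flag runs silently and
exits with status 0 only when a readable, unexpired ticket cache exists. The
function name has_kerberos_ticket is only illustrative, and the user guide
remains the reference for how accounts and automated jobs are meant to
authenticate.

```python
import subprocess


def has_kerberos_ticket(run=subprocess.run) -> bool:
    """Return True if `klist -s` reports a valid, unexpired ticket cache."""
    try:
        # -s: silent mode; the exit status alone tells us the answer.
        result = run(["klist", "-s"], check=False)
    except FileNotFoundError:
        # klist is not installed on this host (assumption: MIT Kerberos).
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if has_kerberos_ticket():
        print("Kerberos ticket found; Hadoop commands should authenticate.")
    else:
        print("No valid ticket; run kinit and enter your Kerberos password.")
```

The run parameter is there only so the check can be exercised without a
real Kerberos setup.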
If you use our APIs or datasets, these are coming back online and updating
slowly since we paused them earlier today. Expect a bit more lag, but that
should clear within a few days as jobs catch up.
Let us know if you need any help on Phabricator, at
https://phabricator.wikimedia.org/T238560. And as always find us on IRC,
#wikimedia-analytics.
Your friendly neighborhood Analytics team