Hi all,
tl;dr: we'd like to remove the rev_is_revert field from the
mediawiki.revision-create stream to solve a missing event problem.
For years now, we've known that the mediawiki.revision-create stream
<https://stream.wikimedia.org/?doc#/streams/get_v2_stream_mediawiki_revision…>
has been missing many real revision create events
<https://phabricator.wikimedia.org/T215001> when compared with
MediaWiki's MySQL databases. This makes the stream almost useless for
those who want to use it as a notification mechanism about all MediaWiki
page changes.
The large number of missing events is caused by the code that emits the
event subscribing to the wrong MediaWiki hook. This patch
<https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventBus/+/679353/> will
fix this. However, the correct hook does not give us the information we need
to set the rev_is_revert and rev_revert_details fields. These fields are
relatively new (added only in August 2020
<https://github.com/wikimedia/schemas-event-primary/commit/53b6480cb1045316c…>).
We think that including the missing revisions is more important than
capturing the revert information, which really only captures whether or not
a user used the MediaWiki UI to issue a revert.
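For consumers wondering what this means in practice, here is a minimal sketch (not the EventBus code) of reading events defensively, so that client code keeps working once rev_is_revert and rev_revert_details are no longer set. It assumes events arrive as Server-Sent Events "data:" lines carrying one JSON object each; the helper names and the sample payloads are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE 'data:' line that carries a JSON object.

    Simplification (assumption): each event fits on a single 'data:' line.
    """
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip 'event:', 'id:', comments and blank lines
        try:
            payload = json.loads(line[len("data:"):].strip())
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if isinstance(payload, dict):
            yield payload


def revert_status(event: dict) -> Optional[bool]:
    """Return True/False when rev_is_revert is present, None when it is absent."""
    value = event.get("rev_is_revert")
    return value if isinstance(value, bool) else None


# Hypothetical sample lines, not real stream output.
sample = [
    "event: message",
    'data: {"rev_id": 1, "rev_is_revert": true}',
    "",
    'data: {"rev_id": 2}',
]
statuses = [revert_status(e) for e in iter_events(sample)]
```

Returning None for a missing field keeps "not a revert" and "unknown" distinct, which matters once the field is dropped from new events.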
We plan on moving forward with this, but would like feedback before we do.
If you have objections, or other ideas on how we can provide this data
(like maybe including it in mediawiki/revision-tags-change
<https://schema.wikimedia.org/repositories//primary/jsonschema/mediawiki/rev…>
and making that public?), let us know by replying to this email or in this
ticket: https://phabricator.wikimedia.org/T215001
Thanks!
-Andrew Otto
SRE, Data Engineering, WMF
Hi all,
Join the Research Team at the Wikimedia Foundation [1] for their monthly
Office hours on 2021-05-04 at 16:00-17:00 UTC (9am PT/6pm CET).
To participate, join the video-call via this link [2]. There is no set
agenda; feel free to add your item to the list of topics in the etherpad
[3] (you can also do this after you join the meeting). Otherwise, you are
welcome to just hang out. More detailed information (e.g. about how to
attend) can be found here [4].
Through these office hours, we aim to make ourselves more available to
answer some of the research-related questions that you as Wikimedia
volunteer editors, organizers, affiliates, staff, and researchers face in
your projects and initiatives. Some example cases we hope to be able to
support you in:
- You have a specific research-related question that you suspect you
  should be able to answer with the publicly available data and you don’t
  know how to find an answer for it, or you just need some more help with it.
  For example, how can I compute the ratio of anonymous to registered editors
  in my wiki?
- You run into repetitive or very manual work as part of your Wikimedia
  contributions and you wish to find out if there are ways to use machines to
  improve your workflows. These types of questions can sometimes be
  harder to answer during an office hour. However, discussing
  them can help us understand your challenges better, and we may find ways to
  work with each other to support you in addressing them in the future.
- You want to learn what the Research team at the Wikimedia Foundation
  does and how we can potentially support you. Specifically for affiliates:
  if you are interested in building relationships with the academic
  institutions in your country, we would love to talk with you and learn
  more. We have a series of programs that aim to expand the network of
  Wikimedia researchers globally, and we would love to collaborate more
  closely with those of you interested in this space.
- You want to talk with us about one of our existing programs [5].
Hope to see many of you,
Martin on behalf of the WMF Research Team
[1] https://research.wikimedia.org/team.html
[2] https://meet.jit.si/WMF-Research-Office-Hours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
[4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[5] https://research.wikimedia.org/projects.html
--
Martin Gerlach
Research Scientist
Wikimedia Foundation
In her recent announcement of her upcoming departure as the Wikimedia
Foundation's CEO, Katherine highlighted a growth in "reader
engagement" by 30% during her tenure (i.e. since 2016).[1] A WMF
board member has since reported in somewhat more detail that this refers
to "~1 billion interactions up 32% in six years".[2]
Are the underlying numbers published somewhere?
Regards, Tilman
[1] https://twitter.com/krmaher/status/1357390962410987520
[2] https://twitter.com/raju/status/1371100758343614471
PS: As some may be aware, a widely read German blogger linked to
Katherine's tweet while singling out the "reader engagement" bit for
some outspoken criticism. Just to clarify, that's not why I'm asking
(in fact I disagree with most of that criticism).
Hi everyone,
We are delighted to announce that Wiki Workshop 2021 will be held
virtually in April 2021 as part of the Web Conference 2021 [1].
The exact day is still to be finalized, but it will be between April
19 and 23.
In past years, Wiki Workshop has traveled to Oxford, Montreal,
Cologne, Perth, Lyon, and San Francisco, and (virtually) to Taipei.
Last year, we had more than 120 participants in the workshop, and we
are particularly excited about this year's edition as we will celebrate
the 20th birthday of Wikipedia.
We encourage contributions by all researchers who study the Wikimedia
projects. We specifically encourage 1-2 page submissions of
preliminary research. You will have the option to publish your work as
part of the proceedings of The Web Conference 2021.
You can read more about the call for papers and the workshop at
http://wikiworkshop.org/2021/#call. Please note that the deadline for
the submissions to be considered for proceedings is January 29. All
other submissions should be received by March 1.
If you have questions about the workshop, please let us know on this
list or at wikiworkshop(a)googlegroups.com.
Looking forward to seeing many of you in this year's edition.
Best,
Miriam Redi, Wikimedia Foundation
Bob West, EPFL
Leila Zia, Wikimedia Foundation
[1] https://www2021.thewebconf.org/
TLDR:
- PAWS can now connect to the new replicas, see News/Wiki Replicas 2020
Redesign#How should I connect to databases in PAWS?
<https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign#How_sh…>
for more info.
- Report issues here: T276284 Establish a working setup for PAWS with
multi-instance wikireplicas <https://phabricator.wikimedia.org/T276284>
Hi!
I'm forwarding this message from the cloud lists, in case you use PAWS and
didn't see the message.
PAWS is now capable of connecting and using the new replicas. For
background on the new replicas, please see News/Wiki_Replicas_2020_Redesign
<https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign>
Here are some resources you can check:
- News/Wiki Replicas 2020 Redesign#How should I connect to databases in
PAWS?
<https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign#How_sh…>
- Accessing the new replicas, changes from the previous cluster
<https://public.paws.wmcloud.org/User:JHernandez_(WMF)/Accessing%20the%20new…>
- Using Wikireplicas from PAWS with Python
<https://public.paws.wmcloud.org/User:JHernandez_(WMF)/Accessing%20Wikirepli…>
In summary, due to issues with mysql-proxy and the new architecture,
connecting to the replicas will be more in line with the Toolforge approach.
There is a credentials file in $HOME/.my.cnf that you can use when
connecting, instead of the environment variables. For the host name, you
can use the same ones you would use when connecting from Toolforge
("{wiki}.{analytics,web}.db.svc.wikimedia.cloud").
To update a notebook, here is an example of the couple of changes needed
when connecting:
- import os
  import pymysql
  conn = pymysql.connect(
-     host = os.environ['MYSQL_HOST'],
+     host = "eswiki.analytics.db.svc.wikimedia.cloud",
-     user = os.environ['MYSQL_USERNAME'],
-     password = os.environ['MYSQL_PASSWORD'],
+     read_default_file = ".my.cnf",
      database = "eswiki_p"
  )
Note that you have to connect to the host name of the specific DB you are
going to query.
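As a rough sketch, the same change can be wrapped in a small reusable helper. The function names below (replica_host, connect_to_replica) are made up for illustration, and the code assumes the credentials file sits at $HOME/.my.cnf as described above. It builds the host name from the wiki database name and expands the path to the credentials file, so it does not depend on the notebook's working directory.

```python
import os


def replica_host(wiki: str, cluster: str = "analytics") -> str:
    """Build the replica host name for a wiki database name such as 'eswiki'."""
    if cluster not in ("analytics", "web"):
        raise ValueError("cluster must be 'analytics' or 'web'")
    return f"{wiki}.{cluster}.db.svc.wikimedia.cloud"


def connect_to_replica(wiki: str, cluster: str = "analytics"):
    """Open a connection to <wiki>_p on the host that serves that wiki."""
    # Imported here so replica_host() can be used without pymysql installed.
    import pymysql

    return pymysql.connect(
        host=replica_host(wiki, cluster),
        # Credentials come from the file in the home directory, not from
        # the MYSQL_* environment variables.
        read_default_file=os.path.expanduser("~/.my.cnf"),
        database=f"{wiki}_p",
    )
```

For example, connect_to_replica("eswiki") would connect to eswiki_p on eswiki.analytics.db.svc.wikimedia.cloud. To query a different wiki, open a separate connection for it.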
Existing notebooks remain readable with the output cached, and we are
working on updating the documentation.
In two weeks (April 15), the old cluster will be migrated to
utilize new replication hosts, at which point replication may stop and
running PAWS notebooks connecting to the old cluster may get stale results.
In about four weeks (April 28), the old hostnames will be redirected to the
new cluster. Running notebooks that connect to MYSQL_HOST will then stop
working and will need their credentials and DB host name updated.
If you find any issues or problems or need help, please reach out via IRC
on #wikimedia-cloud, the mailing list (cloud(a)lists.wikimedia.org), or the
Phabricator task T276284 Establish a working setup for PAWS with
multi-instance wikireplicas <https://phabricator.wikimedia.org/T276284>.
Feel free to forward this as needed to spread the word to PAWS users.
Thank you!
--
Joaquin Oltra Hernandez
Developer Advocate - Wikimedia Foundation