Dear users of stat100{4,6,7},
we are planning on upgrading stat1004 to Debian Buster this Thursday
(2020-09-17) after 12:00 CEST (10:00 UTC). We will reinstall the machine,
preserving user data (home directories, /srv), but to be on the safe side,
we will back up that data. After the reinstall and a few tests, we will send
an all-clear to this list.
A few things of note:
- It would be greatly appreciated if you cleaned out unneeded data before the
backup time mentioned above, thus speeding up backup (and restore if we need
it).
- Any changes made to the file system contents after the time mentioned above
may be lost.
- Around the time of the backup, both cron and systemd timers will be
disabled, and still-running processes may be ungracefully terminated.
If this process works well, the remaining stat100x machines in need of update
(6, 7) will be processed in a similar manner.
As always, if there are questions, do not hesitate to contact us.
Best,
Tobias
--
Tobias Klausmann, SRE, Wikimedia Foundation
Hi all,
The next Research Showcase will be live-streamed on Wednesday, September
23, at 9:30 AM PDT/16:30 UTC, and will be on the theme of knowledge gaps.
Miriam Redi will give an overview of the first draft of the taxonomy of
knowledge gaps in Wikimedia projects. The taxonomy is a first milestone
towards developing a framework to understand and measure knowledge gaps
with the goal of capturing the multi-dimensional aspect of knowledge gaps
and informing long-term decision making.
YouTube stream: https://www.youtube.com/watch?v=GJDsKPsz64o
As usual, you can join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentation:
A first draft of the knowledge gaps taxonomy for Wikimedia projects
By the Wikimedia Foundation Research Team <https://research.wikimedia.org/>
In response to the Wikimedia Movement’s 2030 strategic direction
<https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2018-20>,
the Research team <https://research.wikimedia.org/team.html> at the Wikimedia Foundation
is developing a framework to understand and measure knowledge gaps. The
goal is to capture the multi-dimensional aspect of knowledge gaps and
inform long-term decision making. The first milestone was to develop a
taxonomy of knowledge gaps which offers a grouping and descriptions of the
different Wikimedia knowledge gaps. The first draft of the taxonomy is now
published <https://arxiv.org/abs/2008.12314> and we seek your feedback to
improve it. In this talk, we will give an overview of the first draft of
the taxonomy of knowledge gaps in Wikimedia projects. Following that, we
will host an extended Q&A in which we would like to get your feedback and
discuss the taxonomy and knowledge gaps more generally with you.
- More information:
https://meta.wikimedia.org/wiki/Research:Knowledge_Gaps_Index/Taxonomy
--
Janna Layton (she/her)
Administrative Associate - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
As mentioned at T251464 <https://phabricator.wikimedia.org/T251464>,
EventLogging <https://www.mediawiki.org/wiki/Extension:EventLogging> is
currently blocked by EasyPrivacy <https://easylist.to/> (a popular add-on
for ad blocking software) due to EventLogging sending its data to a URL
that includes the blacklisted string "beacon/event". In some cases, this
makes it difficult or impossible for us to get the analytics data we need
to make product decisions, e.g. T240697
<https://phabricator.wikimedia.org/T240697>. Two questions:
1. Is it reasonable to say that ad blockers should not be blocking
EventLogging (since it's just an internal logging system)?
2. If the answer to #1 is "yes", could we change the URL that
EventLogging uses so that it is no longer blacklisted by ad blockers?
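To make the blocking mechanism concrete, here is a minimal sketch of how a substring-based rule would catch the current URL but not a renamed one. It assumes plain substring matching, which is a simplification of real filter-list syntax; both example URLs and the function name are made up for illustration, and only the string "beacon/event" comes from the description above.

```python
# Simplified model of a blocklist: a request is blocked if any listed
# substring occurs anywhere in its URL. Real filter lists have richer
# syntax; this is only an illustration.
BLOCKED_SUBSTRINGS = ["beacon/event"]


def is_blocked(url: str, blocked=BLOCKED_SUBSTRINGS) -> bool:
    """Return True if any blocked substring occurs in the URL."""
    return any(fragment in url for fragment in blocked)


# Both URLs are hypothetical and not actual endpoints.
current_url = "https://example.org/beacon/event?payload=123"
renamed_url = "https://example.org/intake/log?payload=123"

print(is_blocked(current_url))
print(is_blocked(renamed_url))
```

Under this simplified model, changing the path is enough to avoid the match, although a list maintainer could of course add the new path later.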
--
*Ryan Kaldari* (they/them)
Director of Engineering, Product Department
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi everybody,
In the course of maintenance, I'll reboot stat1008 within the next 1-2
hours. The reboot is necessary to properly update the alternative kernel
module for GPU access[1], so GPU functionality might not be available
immediately after reboot. I will send a follow-up mail once things are back
to normal.
Best,
Tobias
[1] https://phabricator.wikimedia.org/T260442
--
Tobias Klausmann, SRE, Wikimedia Foundation
Hello,
I have a question about the "User edits" metric presented on Wikistats,
and would be very grateful for advice regarding an issue we encountered.
We are currently computing some edit metrics for multiple Wikipedia
language versions. However, we realized there is some discrepancy
between our edit count results and the ones reported on Wikistats. It
seems that total edit counts are higher for our data, while trends for
daily edits are also different. As an example, the French Wikipedia:
Wikistats:
https://stats.wikimedia.org/#/fr.wikipedia.org/contributing/user-edits/norm…
Our results (see attachment):
We removed all users marked as bots in the database, and excluded edits
to talk pages, as it is done with the Wikistats edit count metric. I
just now found this note [1]: "The original Wikistats did not count
edits if the page they were made on was deleted. We are doing the same
thing in Wikistats 2 for now, which means you may see metric totals
shifting over time (as pages are deleted)."
Could this be what is causing this rift, or are there other processing
details which we have to consider to reproduce the Wikistats numbers as
closely as possible? On a separate note - are the daily edit counts for
all pages (including deleted articles) accessible somewhere?
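To illustrate the filtering steps described above, here is a minimal sketch of the counting logic with and without edits to deleted pages. The sample rows and the function name are made up, and this is not the actual Wikistats pipeline; the only MediaWiki fact relied on is that talk namespaces have odd numbers.

```python
from collections import Counter

# Made-up sample rows: (day, user_is_bot, page_namespace, page_is_deleted)
edits = [
    ("2020-08-01", False, 0, False),
    ("2020-08-01", True, 0, False),   # edit by a bot
    ("2020-08-01", False, 1, False),  # edit to a talk page
    ("2020-08-01", False, 0, True),   # page was deleted later
    ("2020-08-02", False, 0, False),
]


def daily_edit_counts(rows, include_deleted_pages):
    """Count edits per day, skipping bots and talk namespaces."""
    counts = Counter()
    for day, is_bot, namespace, is_deleted in rows:
        if is_bot:
            continue
        if namespace % 2 == 1:  # odd namespace numbers are talk pages
            continue
        if is_deleted and not include_deleted_pages:
            continue
        counts[day] += 1
    return dict(counts)


with_deleted = daily_edit_counts(edits, include_deleted_pages=True)
without_deleted = daily_edit_counts(edits, include_deleted_pages=False)
print(with_deleted)
print(without_deleted)
```

If the two variants differ in the same direction as the gap between our numbers and Wikistats, that would support the deleted-pages explanation.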
thanks, thorsten
[1] https://meta.wikimedia.org/wiki/Research:Wikistats_metrics/Edits
--
Thorsten Ruprechter
Institute of Interactive Systems and Data Science (ISDS)
Graz University of Technology, Austria
Hi everybody,
In the course of maintenance, I'll reboot stat1005 in ~5m. The reboot
is needed to clear some stuck state on the GPU, as well as to test an
alternative kernel module for GPU access[1], so GPU functionality might not
be available immediately after reboot. I will send a follow-up mail once
things are back to what we would call normal.
Best,
Tobias
[1] https://phabricator.wikimedia.org/T260442
--
Tobias Klausmann, SRE, Wikimedia Foundation
Hi everybody,
We have just created
https://lists.wikimedia.org/mailman/listinfo/analytics-announce as a way to
reach all people interested in software/hardware/maintenance changes in the
Analytics infrastructure. I'd ask you to subscribe and share with others if
possible :)
Examples of future emails:
- Superset upgraded to 0.37 (I know I know the new version is out and we
are still on 0.36, but we'll test the new one soon :)
- Reboot of stat1005 for day X-X-X
- Upgrade of the Hadoop cluster on day X
- etc..
We are tracking the changes in https://phabricator.wikimedia.org/T260849.
If you have any doubts/concerns/suggestions, please chime in on the task!
Luca (on behalf of the Analytics team)
Hi all,
Join the Research Team at the Wikimedia Foundation [1] for their monthly
Office hours on 2020-09-01 at 16:00-17:00 (UTC).
Through these office hours, we aim to make ourselves more available to
answer some of the research-related questions that you as Wikimedia
volunteer editors, organizers, affiliates, staff, and researchers face in
your projects and initiatives (*).
To participate, join the video call via this Wikimedia-meet link [2]. There
is no set agenda - feel free to add your item to the list of topics in the
etherpad [3] (you can do this after you join the meeting, too); otherwise,
you are also welcome to just hang out. More detailed information (e.g.
about how to attend) can be found here [4].
The Research office hours started at the beginning of 2020 as an
experiment [5]. After the first 6 editions, we evaluated their scope and
format. To lower barriers to access and to facilitate more direct
interaction, we have switched the format from IRC to video call. We will
re-evaluate the current format at the end of the year. We would also be
glad to hear your feedback and/or comments.
(*) Some example cases we hope to be able to support you in:
- You have a specific research-related question that you suspect you
should be able to answer with the publicly available data and you don’t
know how to find an answer for it, or you just need some more help with it.
For example, how can I compute the ratio of anonymous to registered editors
in my wiki?
- You run into repetitive or very manual work as part of your Wikimedia
contributions and you wish to find out if there are ways to use machines to
improve your workflows. These types of questions can sometimes be
harder to answer during an office hour. However, discussing
them can help us understand your challenges better, and we may find ways to
work with each other to support you in addressing them in the future.
- You want to learn what the Research team at the Wikimedia Foundation
does and how we can potentially support you. Specifically for affiliates:
if you are interested in building relationships with the academic
institutions in your country, we would love to talk with you and learn
more. We have a series of programs that aim to expand the network of
Wikimedia researchers globally, and we would love to collaborate more
closely with those of you interested in this space.
- You want to talk with us about one of our existing programs [6].
Hope to see many of you,
Martin (WMF Research Team)
[1] https://research.wikimedia.org/team.html
[2] https://meet.wmcloud.org/ResearchOfficeHours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
[4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[5] https://lists.wikimedia.org/pipermail/wiki-research-l/2019-December/007039.…
[6] https://research.wikimedia.org/projects.html
--
Martin Gerlach
Research Scientist
Wikimedia Foundation