Hi all,
Join the Research Team at the Wikimedia Foundation [1] for their monthly
Office hours this Tuesday, 2022-04-05. Find your local time here
<https://zonestamp.toolforge.org/1649199600>.
To participate, join the video-call via this link [2]. There is no set
agenda - feel free to add your item to the list of topics in the etherpad
[3]. You are welcome to add questions / items to the etherpad in advance,
or when you arrive at the session. Even if you are unable to attend the
session, you can leave a question that we can address asynchronously. If
you do not have a specific agenda item, you are welcome to hang out and
enjoy the conversation. More detailed information (e.g., about how to
attend) can be found here [4].
Through these office hours, we aim to make ourselves available to answer
research related questions that you as Wikimedia volunteer editors,
organizers, affiliates, staff, and researchers face in your projects and
initiatives. Here are some example cases we hope to be able to support you
with:
- You have a specific research related question that you suspect you
  should be able to answer with the publicly available data and you don’t
  know how to find an answer for it, or you just need some more help with it.
  For example, how can I compute the ratio of anonymous to registered editors
  in my wiki?
- You run into repetitive or very manual work as part of your Wikimedia
  contributions and you wish to find out if there are ways to use machines to
  improve your workflows. These types of conversations can sometimes be
  harder to find an answer for during an office hour. However, discussing
  them can help us understand your challenges better and we may find ways to
  work with each other to support you in addressing it in the future.
- You want to learn what the Research team at the Wikimedia Foundation
  does and how we can potentially support you. Specifically for affiliates:
  if you are interested in building relationships with the academic
  institutions in your country, we would love to talk with you and learn
  more. We have a series of programs that aim to expand the network of
  Wikimedia researchers globally and we would love to collaborate with those
  of you interested more closely in this space.
- You want to talk with us about one of our existing programs [5].
To improve the impact and accessibility of our sessions, we invite you to
share your feedback in a brief optional survey [6]. We estimate that it
will take about 5-10 minutes to complete. We welcome your input even if you
have not attended Office Hours. If you prefer to not respond via Google
form, you can provide your feedback via email. We will accept responses
until April 15, 2022.
Hope to see many of you,
Emily on behalf of the WMF Research Team
[1] https://research.wikimedia.org
[2] https://meet.jit.si/WMF-Research-Office-Hours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
[4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[5] https://research.wikimedia.org/projects.html
[6] https://forms.gle/Y5zJ7gunk4RvqvJX8
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
Hello!
tl;dr: all publicly available event streams at stream.wikimedia.org will
have their retention time set to 7 days.
Many of the streams available at stream.wikimedia.org have retention times
of 31 days. This means that at any given time, the past 31 days of these
streams are consumable.
Sometimes, within these streams, certain data may accidentally contain
personally identifiable information. For example, someone might
accidentally enter their personal email into a revision comment field. On
the wikis, this information can be quickly suppressed so that it is not
viewable externally. However, because streams are historical and immutable,
it is difficult to remove this information from the stream history.
To help mitigate the risk of PII exposure, we are reducing the retention of
these streams to 7 days. We plan to make this change on *Monday April 4th
2022*.
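For consumers that replay stream history, the effect of a shorter retention window can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the function names are hypothetical, no real stream API is called, and retention is assumed to behave as a simple sliding window measured back from the current time.

```python
from datetime import datetime, timedelta, timezone


def earliest_consumable(now: datetime, retention_days: int) -> datetime:
    """Oldest event time still inside the retention window (assumed sliding window)."""
    return now - timedelta(days=retention_days)


def is_consumable(event_time: datetime, now: datetime, retention_days: int) -> bool:
    """True if an event with this timestamp would still be within the window."""
    return event_time >= earliest_consumable(now, retention_days)


if __name__ == "__main__":
    # Hypothetical timestamps chosen only for illustration.
    now = datetime(2022, 4, 5, tzinfo=timezone.utc)
    event_time = datetime(2022, 3, 20, tzinfo=timezone.utc)  # 16 days before `now`

    print(is_consumable(event_time, now, retention_days=31))  # True
    print(is_consumable(event_time, now, retention_days=7))   # False
```

Under these assumptions, a consumer that currently resumes from a position more than 7 days old would no longer find that history available after the change.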
In the future, we would like to intentionally remove this data from
streams. Doing so requires us to maintain new services that produce new
streams with PII redacted. Such services are not trivial to stand up,
hence this mitigation effort for now.
-Andrew Otto
Wikimedia Foundation
Hi all,
The next Research Showcase will be live-streamed Wednesday, March 16 at
6:30AM PT / 13:30 UTC. Find your local time here:
https://zonestamp.toolforge.org/1647437436.
The theme is: Patterns and dynamics of article quality.
YouTube stream: https://www.youtube.com/watch?v=o5e6S7ac4q4
You can join the conversation on IRC at #wikimedia-research. You can also
watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase.
The Showcase will feature the following talks:
Quality monitoring in Wikipedia - A computational perspective
By *Animesh Mukherjee <https://cse.iitkgp.ac.in/~animeshm/> (Indian
Institute of Technology, Kharagpur)*
In this talk, I shall summarize our five-year long
research highlights concerning Wikipedia. In particular, I shall deep dive
into two of our recent works; while the first one attempts to understand
the early indications of which editors would soon go "missing" (aka missing
editors) [1], the second one investigates how the quality of a Wikipedia
article transitions over time and whether computational models could be
built to understand the characteristics of future transitions [2]. In each
case, I will present a suite of key results and the main insights that we
obtained thereof.
[1] When expertise gone missing: Uncovering the loss of
prolific contributors in Wikipedia
<https://link.springer.com/chapter/10.1007/978-3-030-91669-5_23>, ICADL
2021 (pdf <https://arxiv.org/pdf/2109.09979>)
[2] Quality Change: norm or
exception? Measurement, Analysis and Detection of Quality Change in
Wikipedia <https://arxiv.org/abs/2111.01496>, CSCW 2022 (pdf
<https://arxiv.org/pdf/2111.01496>)
Automatically Labeling Low Quality Content on Wikipedia by Leveraging
Editing Behaviors
By *Sumit Asthana <http://sumitasthana.xyz/> (University
of Michigan, Ann Arbor)*
Wikipedia articles aim to be definitive sources of
encyclopedic content. Yet, only 0.6% of Wikipedia articles have high
quality according to its quality scale due to an insufficient number of
Wikipedia editors and an enormous number of articles. Supervised Machine
Learning (ML) quality improvement approaches that can automatically
identify and fix content issues rely on manual labels of individual
Wikipedia sentence quality. However, current labeling approaches are
tedious and produce noisy labels. In this talk, I will discuss an automated
labeling approach that identifies the semantic category (e.g., adding
citations, clarifications) of historic Wikipedia edits and uses the
modified sentences prior to the edit as examples that require that semantic
improvement. Highest-rated article sentences are examples that no longer
need semantic improvements. I will discuss the performance of models
trained with this labeling approach over models trained with existing
labeling approaches, and also the implications of such a large-scale
semi-supervised labeling approach in capturing the editing practices of
Wikipedia editors and helping them improve articles faster.
Related paper: Automatically Labeling Low Quality Content on Wikipedia By
Leveraging Patterns in Editing Behaviors
<https://dl.acm.org/doi/10.1145/3479503>, CSCW 2021 (pdf
<https://arxiv.org/pdf/2108.02252>)
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
Hello,
The Research team [0] at the Wikimedia Foundation hosts monthly Office
Hours [1] to connect with researchers, answer questions, and share updates.
To improve the impact and accessibility of our sessions, we invite you to
share your feedback in a brief optional survey [2]. We estimate that it
will take about 5-10 minutes to complete. We welcome your input even if you
have not attended Office Hours. If you prefer to not respond via Google
form, you can provide your feedback via email. We will accept responses
until April 15, 2022.
Thank you for your time and consideration.
Emily, on behalf of the Research team
[0] https://research.wikimedia.org/
[1] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[2] https://forms.gle/Y5zJ7gunk4RvqvJX8
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
Hi everyone,
Summary: Wiki Workshop 2022 [0] will take place virtually as part of
The Web Conference 2022 [1]. The call for papers is now open:
https://wikiworkshop.org/2022/#call . The deadline for papers to
appear in the proceedings of the conference is Feb 3; for all other
submissions it is March 10. The workshop will take place on April 25, 2022.
--
We are delighted to announce that Wiki Workshop 2022 [0] will be held
virtually on April 25, 2022, as part of The Web Conference 2022 [1].
In the past years, Wiki Workshop has traveled to Oxford, Montreal,
Cologne, Perth, Lyon, and San Francisco, and (virtually) to Taipei and
Ljubljana.
Last year, we had more than 150 participants in the workshop, along
with 22 accepted paper presentations, a keynote, a panel, music, and more.
The workshop is now a vibrant event for Wikimedia researchers and
those interested in this space to get together on an annual basis.
We encourage contributions by all researchers who study the Wikimedia
projects. We specifically encourage 1-2 page submissions of
preliminary research. You will have the option to publish your work as
part of the proceedings of The Web Conference 2022.
You can read more about the call for papers and the workshop at
http://wikiworkshop.org/2022/#call. Please note that the deadline for
the submissions to be considered for proceedings is February 3. All
other submissions should be received by March 10.
If you have questions about the workshop, please let us know on this
list or at wikiworkshop(a)googlegroups.com.
Looking forward to seeing many of you in this year's edition.
Best,
Srijan Kumar, Georgia Tech
Emily Lescak, Wikimedia Foundation
Miriam Redi, Wikimedia Foundation
Bob West, EPFL
Leila Zia, Wikimedia Foundation
[0] https://wikiworkshop.org/2022/
[1] https://www2022.thewebconf.org/