Hello all,
The next Wikimedia Research Showcase will be held Wednesday, December 15 at
17:30 UTC (9:30 PT / 12:30 ET / 18:30 CET).
You can view the livestream here: https://youtu.be/HKODaHgmQWw
The Showcase will feature the following talks:
*Latin American Youth and their Information Ecosystem: Finding,
Evaluating, Creating, and Sharing Content Online*
The Internet's increased importance as a core source of information in
young people's lives, now underscored by the pandemic, gives new urgency
to the need to better understand their information habits and attitudes.
Answers to questions like where young people go to look for information,
which information they decide to trust, and how they share the
information they find hold important implications for the knowledge they
obtain, the beliefs they form, and the actions they take in areas
ranging from personal health to employment to education.
In this research showcase, we will be summarizing insights from focus group
interviews in Latin America that offer a window into the experiences of
young people themselves. Taken together, these perspectives might help us
to develop a more comprehensive understanding of how young people in Latin
America use the Internet in general and interact with information from
online sources in particular.
Speakers: Lionel Brossi and Ana María Castillo. Artificial Intelligence and
Society Hub at University of Chile.
--
Characterizing the Online Learning Landscape: What and How People Learn
Online
Hundreds of millions of people learn something new online every day.
Simultaneously, the study of online education has blossomed with new
systems, experiments, and observations creating and exploring previously
undiscovered online learning environments. In this talk I will discuss our
study, in which we endeavor to characterize this entire landscape of
online learning experiences using a national survey of 2,260 US adults,
balanced to match the demographics of the US. We examine the online
learning resources that they consult, and we analyze the subjects that they
pursue using those resources. Furthermore, we compare both formal and
informal online learning experiences on a larger scale than has ever been
done before, to our knowledge, to better understand which subjects people
are seeking for intensive study. We find that there is a core set of online
learning experiences that are central to other experiences and these are
shared among the majority of people who learn online.
Speaker: Sean Kross, University of California San Diego
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
--
Janna Layton (she/her)
Administrative Associate - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi all,
Join the Research Team at the Wikimedia Foundation [1] for their monthly
Office hours this Wednesday, 2021-12-08 at 00:00-1:00 UTC (16:00 PT 12-07 /
19:00 ET 12-07 / 1:00 CET 12-08). Find your local date and time here
<https://zonestamp.toolforge.org/1638921637>. Please note the time change!
We are experimenting with our Office hours schedules to make our sessions
more globally welcoming.
To participate, join the video-call via this link [2]. There is no set
agenda - feel free to add your item to the list of topics in the etherpad
[3]. You are welcome to add questions / items to the etherpad in advance,
or when you arrive at the session. Even if you are unable to attend, you
can leave a question that we can address asynchronously. If you do not have
a specific agenda item, you are welcome to hang out and enjoy the
conversation. More detailed information (e.g. about how to attend) can be
found here [4].
Through these office hours, we aim to make ourselves more available to
answer research related questions that you as Wikimedia volunteer editors,
organizers, affiliates, staff, and researchers face in your projects and
initiatives. Here are some example cases we hope to be able to support you
with:
- You have a specific research-related question that you suspect you
should be able to answer with publicly available data, but you don't
know how to find the answer, or you just need some more help with it.
For example, how can I compute the ratio of anonymous to registered
editors on my wiki?
- You run into repetitive or very manual work as part of your Wikimedia
contributions and you wish to find out whether there are ways to use
machines to improve your workflows. These types of questions can
sometimes be harder to answer during an office hour. However, discussing
them helps us understand your challenges better, and we may find ways to
work with each other to address them in the future.
- You want to learn what the Research team at the Wikimedia Foundation
does and how we can potentially support you. Specifically for
affiliates: if you are interested in building relationships with
academic institutions in your country, we would love to talk with you
and learn more. We have a series of programs that aim to expand the
network of Wikimedia researchers globally, and we would love to
collaborate more closely with those of you interested in this space.
- You want to talk with us about one of our existing programs [5].
This is also a good opportunity to learn more about the Research Fund [6]!
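On the first example question above (the ratio of anonymous to registered
editors), here is a minimal self-serve sketch using the public Wikimedia
Analytics (AQS) REST API. The endpoint layout and response field names
("items", "results", "editors") are assumptions based on the public API
documentation and should be verified before relying on them:

```python
# Sketch: estimate the ratio of anonymous to registered editors for a wiki
# via the public Wikimedia Analytics (AQS) "editors/aggregate" endpoint.
# Endpoint path and response field names are assumptions from the public
# docs; adjust the project and date range for your wiki.
import json
import urllib.request

AQS = "https://wikimedia.org/api/rest_v1/metrics/editors/aggregate"

def editors_url(project: str, editor_type: str, start: str, end: str) -> str:
    """Monthly editor counts for one editor type (e.g. 'anonymous', 'user')."""
    return (f"{AQS}/{project}/{editor_type}/all-page-types/"
            f"all-activity-levels/monthly/{start}/{end}")

def sum_editors(payload: dict) -> int:
    """Sum the per-month editor counts out of an AQS response body."""
    return sum(r["editors"] for r in payload["items"][0]["results"])

def anon_registered_ratio(anon: int, registered: int) -> float:
    """Ratio of anonymous to registered editors; inf if none registered."""
    return anon / registered if registered else float("inf")

def fetch_total(url: str) -> int:
    """Fetch one AQS URL (network call) and return the summed count."""
    req = urllib.request.Request(url, headers={"User-Agent": "oh-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        return sum_editors(json.load(resp))
```

For example, one could compare `fetch_total(editors_url("en.wikipedia.org",
"anonymous", "20211001", "20211101"))` against the same call with editor
type `"user"` and pass both counts to `anon_registered_ratio`.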
Hope to see many of you,
Emily on behalf of the WMF Research Team
[1] https://research.wikimedia.org
[2] https://meet.jit.si/WMF-Research-Office-Hours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
[4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[5] https://research.wikimedia.org/projects.html
[6]
https://meta.wikimedia.org/wiki/Grants:Programs/Wikimedia_Research_%26_Tech…
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
Hello all,
The next Wikimedia Research Showcase will be on Wednesday, November 17, at
17:30 UTC (9:30am PST/12:30pm EST/ 18:30 CET). The topic is content
moderation.
Livestream: https://www.youtube.com/watch?v=Rx3xesDkp2o
*Amy S. Bruckman (Georgia Institute of Technology, USA)*
*Is Deplatforming Censorship? What happened when controversial figures
were deplatformed, with philosophical musings on the nature of free speech*
Abstract: When a controversial figure is deplatformed, what happens to
their online influence? In this talk, first, I’ll present results from a
study of the deplatforming from Twitter of three figures who repeatedly
broke platform rules (Alex Jones, Milo Yiannopoulos, and Owen Benjamin).
Second, I’ll discuss what happened when this study was on the front page of
Reddit, and the range of angry reactions from people who say that they’re
in favor of “free speech.” I’ll explore the nature of free speech, and why
our current speech regulation framework is fundamentally broken. Finally,
I’ll conclude with thoughts on the strength of Wikipedia’s model in
contrast to other platforms, and highlight opportunities for improvement.
*Nathan TeBlunthuis (University of Washington / Northwestern University, USA)*
*Effects of Algorithmic Flagging on Fairness: Quasi-experimental Evidence
from Wikipedia*
Abstract: Online community moderators often rely on social signals such as
whether or not a user has an account or a profile page as clues that users
may cause problems. Reliance on these clues can lead to "overprofiling"
bias when moderators focus on these signals but overlook the misbehavior of
others. We propose that algorithmic flagging systems deployed to improve
the efficiency of moderation work can also make moderation actions more
fair to these users by reducing reliance on social signals and making norm
violations by everyone else more visible. We analyze moderator behavior in
Wikipedia as mediated by RCFilters, a system which displays social signals
and algorithmic flags, and estimate the causal effect of being flagged on
moderator actions. We show that algorithmically flagged edits are reverted
more often, especially those by established editors with positive social
signals, and that flagging decreases the likelihood that moderation actions
will be undone. Our results suggest that algorithmic flagging systems can
lead to increased fairness in some contexts but that the relationship is
complex and contingent.
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
--
Janna Layton (she/her)
Administrative Associate - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi all,
Join the Research Team at the Wikimedia Foundation [1] for their monthly
Office hours this Tuesday, 2021-11-02, at 12:00-13:00 UTC (5am PT/8am
ET/1pm CET). Please note the time change! We are experimenting with our
Office hours schedules to make our sessions more globally welcoming.
To participate, join the video-call via this link [2]. There is no set
agenda - feel free to add your item to the list of topics in the etherpad
[3]. You are welcome to add questions / items to the etherpad in advance,
or when you arrive at the session. Even if you are unable to attend the
session, you can leave a question that we can address asynchronously. If
you do not have a specific agenda item, you are welcome to hang out and
enjoy the conversation. More detailed information (e.g. about how to
attend) can be found here [4].
Through these office hours, we aim to make ourselves more available to
answer research related questions that you as Wikimedia volunteer editors,
organizers, affiliates, staff, and researchers face in your projects and
initiatives. Here are some example cases we hope to be able to support you
with:
- You have a specific research-related question that you suspect you
should be able to answer with publicly available data, but you don't
know how to find the answer, or you just need some more help with it.
For example, how can I compute the ratio of anonymous to registered
editors on my wiki?
- You run into repetitive or very manual work as part of your Wikimedia
contributions and you wish to find out whether there are ways to use
machines to improve your workflows. These types of questions can
sometimes be harder to answer during an office hour. However, discussing
them helps us understand your challenges better, and we may find ways to
work with each other to address them in the future.
- You want to learn what the Research team at the Wikimedia Foundation
does and how we can potentially support you. Specifically for
affiliates: if you are interested in building relationships with
academic institutions in your country, we would love to talk with you
and learn more. We have a series of programs that aim to expand the
network of Wikimedia researchers globally, and we would love to
collaborate more closely with those of you interested in this space.
- You want to talk with us about one of our existing programs [5].
Hope to see many of you,
Emily on behalf of the WMF Research Team
[1] https://research.wikimedia.org
[2] https://meet.jit.si/WMF-Research-Office-Hours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
[4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[5] https://research.wikimedia.org/projects.html
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
Hi all,
The next Wikimedia Research Showcase will be on October 27 at 16:30 UTC
(9:30am PT / 12:30pm ET / 18:30 CEST). The Wikimedia Foundation Research Team will
present on knowledge gaps.
Livestream: https://www.youtube.com/watch?v=d0Qg98EVmuI
Speaker: Wikimedia Foundation Research Team
Title: Automatic approaches to bridge knowledge gaps in Wikimedia projects
Abstract: In order to advance knowledge equity as part of the Wikimedia
Movement’s 2030 strategic direction, the Research team at the Wikimedia
Foundation has been conducting research to “Address Knowledge Gaps” as one
of its main programs. One core component of this program is to develop
technologies to bridge knowledge gaps. In this talk, we give an overview of
how we approach this task using tools from machine learning in four
different contexts: section alignment in content translation, link
recommendation in structured editing, image recommendation in multimedia
knowledge gaps, and the equity of the recommendations themselves. We will
present how these models can assist contributors in addressing knowledge
gaps. Finally, we will discuss the impact of these models in applications
deployed across Wikimedia projects supporting different Product initiatives
at the Wikimedia Foundation.
More information:
* Section alignment:
meta:Research:Expanding_Wikipedia_articles_across_languages/Inter_language_approach#Section_Alignment
<https://meta.wikimedia.org/wiki/Research:Expanding_Wikipedia_articles_acros…>
* Link recommendation:
meta:Research:Link_recommendation_model_for_add-a-link_structured_task
<https://meta.wikimedia.org/wiki/Research:Link_recommendation_model_for_add-…>
* Image recommendation:
meta:Research:Recommending_Images_to_Wikipedia_Articles
<https://meta.wikimedia.org/wiki/Research:Recommending_Images_to_Wikipedia_A…>
* Equity in recommendations:
meta:Research:Prioritization_of_Wikipedia_Articles/Recommendation
<https://meta.wikimedia.org/wiki/Research:Prioritization_of_Wikipedia_Articl…>
--
Janna Layton (she/her)
Administrative Associate - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Data gap in API. Hey all, does anyone know if there is a plan to get the
API loaded with data for Oct 21st? I'm seeing a lot of language versions
missing data just for that day, even though they have data the day
before and after. Languages I've noticed with the gap include Arabic,
Chinese, Russian, Japanese, Turkish, Vietnamese, Thai, and Portuguese.
I'm sure there are others as well.
One example seen below:
https://pageviews.toolforge.org/?project=zh.wikipedia.org&platform=all-acce…
Dear Wikimedia analytics team,
We are three master's students from the Vrije Universiteit Amsterdam (VU) and the University of Amsterdam (UvA) working on a large-scale data engineering project on detecting DDoS attacks on Wikipedia by analysing page views and traffic, trying to distinguish, for example, DDoS attacks from trending topics.
For this project, we need a lot of data. We found two sources of public data: Pageview complete (https://dumps.wikimedia.org/other/pageview_complete/) and the filtered version thereof (https://dumps.wikimedia.org/other/pageviews/). While these dumps are already quite useful, we also found a dataset with even more information (https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Pageview_ho…); in particular, it contains the country a pageview came from and the referer, both of which could be very useful for our project.
According to that page, this dataset has been private since 2018. We would like to ask whether it would be possible to get access to it, or to any other extended version of the public dumps, which would enable us to do more in-depth research. We have our own cluster, so we could work on a copy of the data. Moreover, we would be glad to share our project and all our results with you to help contribute to your security measures.
Best regards,
Charel Felten, Gilles Magalhaes and Aleksander Janczewski
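As a starting point for working with the public pageview_complete dumps
mentioned above, here is a small parsing sketch. The field layout and the
hour-letter encoding ('A' = hour 00 through 'X' = hour 23) are assumptions
based on the dump documentation and should be checked against an actual
file:

```python
# Sketch: parse one line of a Wikimedia "pageview_complete" dump.
# Assumed space-separated layout (verify against the dump docs):
#   wiki article_title page_id client_type daily_total hourly_counts
# where hourly_counts packs per-hour views as letter+count pairs,
# 'A' = hour 00 ... 'X' = hour 23 (e.g. "A3B7" = 3 views at 00h, 7 at 01h).
import re
from typing import Dict, NamedTuple

HOUR_RE = re.compile(r"([A-X])(\d+)")

class PageviewRecord(NamedTuple):
    wiki: str
    title: str
    page_id: str
    client: str
    daily_total: int
    hourly: Dict[int, int]  # hour of day (0-23) -> view count

def parse_line(line: str) -> PageviewRecord:
    """Split a dump line into fields and decode the hourly view counts."""
    wiki, title, page_id, client, total, hours = line.strip().split(" ")
    hourly = {ord(h) - ord("A"): int(n) for h, n in HOUR_RE.findall(hours)}
    return PageviewRecord(wiki, title, page_id, client, int(total), hourly)
```

Applied line by line over a decompressed dump, this yields per-hour time
series per page, which is the shape of data an anomaly detector (e.g. for
separating attack spikes from trending topics) would consume.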
Hi all,
Join the Research Team at the Wikimedia Foundation [1] for their monthly
Office hours next Tuesday, 2021-10-05, at 16:00-17:00 UTC (9am PT/6pm
CEST).
To participate, join the video-call via this link [2]. There is no set
agenda - feel free to add your item to the list of topics in the
etherpad [3] (you can do this after you join the meeting, too);
otherwise, you are welcome to just hang out. More detailed information
(e.g. about how to attend) can be found here [4].
Through these office hours, we aim to make ourselves more available to
answer some of the research-related questions that you as Wikimedia
volunteer editors, organizers, affiliates, staff, and researchers face
in your projects and initiatives. Some example cases we hope to be able
to support you in:
- You have a specific research-related question that you suspect you
should be able to answer with publicly available data, but you don't
know how to find the answer, or you just need some more help with it.
For example, how can I compute the ratio of anonymous to registered
editors on my wiki?
- You run into repetitive or very manual work as part of your Wikimedia
contributions and you wish to find out whether there are ways to use
machines to improve your workflows. These types of questions can
sometimes be harder to answer during an office hour. However, discussing
them helps us understand your challenges better, and we may find ways to
work with each other to address them in the future.
- You want to learn what the Research team at the Wikimedia Foundation
does and how we can potentially support you. Specifically for
affiliates: if you are interested in building relationships with
academic institutions in your country, we would love to talk with you
and learn more. We have a series of programs that aim to expand the
network of Wikimedia researchers globally, and we would love to
collaborate more closely with those of you interested in this space.
- You want to talk with us about one of our existing programs [5].
Hope to see many of you,
Emily on behalf of the WMF Research Team
[1] https://research.wikimedia.org
[2] https://meet.jit.si/WMF-Research-Office-Hours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
[4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[5] https://research.wikimedia.org/projects.html
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
TL;DR: I would like to access Wikipedia articles' metadata (such as number of edits, pageviews, etc.). I need to access a large volume of instances in order to train and maintain an online classifier, and the API does not seem sustainable. I was wondering which tool is most appropriate for this task.
Hello everyone,
It is my first time interacting on this mailing list, so I will be happy to receive feedback on how to better interact with the community :)
I crossposted this message to Wiki-research-l as well.
I am trying to access Wikipedia metadata in a streaming and time/resource-sustainable manner. By metadata I mean many of the items that can be found in the statistics of a wiki article, such as edits, the list of editors, page views, etc.
I would like to do this for an online-classifier type of setup: retrieve the data from a large number of wiki pages at regular intervals and use it as input for predictions.
I tried the Wikipedia API; however, it is time- and resource-expensive, both for me and for Wikipedia.
My preferred option now would be to query the specific tables in the Wikipedia database, the same way this is done through the Quarry tool. The problem with Quarry is that I would like to build a standalone script, without depending on a user interface like Quarry. Do you think this is possible? I am still fairly new to all of this and I don't know exactly which is the best direction.
I saw [1] that I could access the Wiki Replicas both through Toolforge and PAWS, but I didn't understand which one would serve me better; could I ask you for some feedback?
Also, as far as I understood [2], directly accessing the database through Hive is too technical for what I need, right? Especially because it seems I would need an account with production shell access, and I honestly don't think I would be granted that. I am also not interested in accessing sensitive or private data.
A last resort would be parsing the analytics dumps, but this seems a less organic way of retrieving and cleaning the data. It would also be strongly decentralised and dependent on my own machine, unless I upload the processed data somewhere each time.
Sorry for the long message, but I thought it was better to give you a clearer picture (hoping this is clear enough). Even a small hint would be highly appreciated.
Best,
Cristina
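On the standalone, Quarry-free route discussed above: the same SQL that
runs in Quarry can be run directly against the Wiki Replicas from a
Toolforge account. The sketch below assumes the host-naming scheme,
`_p` database suffix, and `replica.my.cnf` credentials file described in
the Wiki Replicas documentation, plus the third-party `pymysql` driver:

```python
# Sketch: query the Wiki Replicas directly (no Quarry UI), e.g. from a
# Toolforge shell or cron job. Host name, database suffix, and credentials
# file follow the Wiki Replicas docs; treat the details as assumptions.

def edit_count_query(limit: int = 100) -> str:
    """SQL for the main-namespace pages with the most revisions (edit counts)."""
    return (
        "SELECT page_title, COUNT(*) AS edits "
        "FROM revision JOIN page ON rev_page = page_id "
        "WHERE page_namespace = 0 "
        "GROUP BY page_title ORDER BY edits DESC "
        f"LIMIT {int(limit)}"
    )

def run_on_replica(wiki: str, sql: str):
    """Run a query on the analytics replica of `wiki` (e.g. 'enwiki')."""
    import pymysql  # third-party driver; pre-installed on Toolforge
    conn = pymysql.connect(
        host=f"{wiki}.analytics.db.svc.wikimedia.cloud",
        database=f"{wiki}_p",  # public replica databases use the _p suffix
        read_default_file="~/replica.my.cnf",  # Toolforge credentials file
        charset="utf8mb4",
    )
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()
```

For example, `run_on_replica("enwiki", edit_count_query(10))` would return
ten (title, edit count) rows; because only the query string changes, such
a script stays standalone and scheduler-friendly, unlike the Quarry UI.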