Hello everyone,
The next Research Showcase will be live-streamed Wednesday, October 19, at
9:30 AM PDT/16:30 UTC. Find your local time here
<https://zonestamp.toolforge.org/1666197004>.
YouTube stream: https://www.youtube.com/watch?v=ML-ULyARpU4
Members of the Research team will collect questions on IRC at
#wikimedia-research and YouTube.
This month's presentation is a panel discussion celebrating Wikidata's 10th
birthday!
October 2022 marks the tenth anniversary of the launch of Wikidata
(www.wikidata.org). In ten years, this project has become the largest
community-driven free knowledge graph in the world, enabling a common
knowledge base for Wikimedia projects. The language-independent nature of
Wikidata has greatly improved the maintenance and consistency of knowledge
across Wikipedia language editions, fostering knowledge equity in
Wikimedia. In addition, since Wikidata is a collaborative project that can
be read and edited by humans and machines alike, it is also widely used in
third-party applications delivering knowledge as a service for all. The
Wikimedia Research community has devoted significant effort and resources
to studying the foundations, capabilities and applications of Wikidata,
from the complex requirements of representing real-world knowledge in a
multilingual environment to the need to assess the quality of data and
sources in Wikidata. To learn more about the state of the art of Wikidata
and research challenges in the era of AI/ML, we will celebrate this tenth
anniversary with a panel that will bring together established
researchers/practitioners in this field.
The panel will be moderated by Denny Vrandečić (WMF) with panelists Lydia
Pintscher (WMDE), Elena Simperl (King's College London), Katherine Thornton
(Yale), and Markus Krötzsch (Technical University of Dresden).
You can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
We hope you can join us!
Warm regards,
Emily, on behalf of the WMF Research team
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
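The zonestamp link in the announcement above appears to end in a Unix timestamp (seconds since 1970-01-01 UTC). Assuming that reading is right, a short Python sketch can decode the link to double-check the advertised start time. The helper name `zonestamp_to_utc` and the fixed UTC-7 offset are illustrative choices, not part of the zonestamp tool.

```python
from datetime import datetime, timedelta, timezone


def zonestamp_to_utc(url: str) -> datetime:
    """Read the trailing path segment of the link as Unix seconds (assumption)."""
    epoch_seconds = int(url.rstrip("/").rsplit("/", 1)[-1])
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


start_utc = zonestamp_to_utc("https://zonestamp.toolforge.org/1666197004")

# Assumption: a fixed UTC-7 offset stands in for US Pacific time on this date.
pacific = timezone(timedelta(hours=-7))
start_pacific = start_utc.astimezone(pacific)

print(start_utc.strftime("%Y-%m-%d %H:%M UTC"))
print(start_pacific.strftime("%Y-%m-%d %H:%M (UTC-7)"))
```

Under these assumptions the link decodes to 16:30 UTC on 2022-10-19, which is 09:30 at UTC-7.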
Hi all,
We are seeing missing page view data for multiple Wikipedia language
projects for September 21st, 2022.
Here is an example link showing that 2022-09-21 is not available for Times
Square:
https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedi…
Is there awareness of this issue and/or an estimate of when the data might
be available?
Regards,
Ben
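A gap like the one reported above can be checked programmatically. The following is a minimal Python sketch, assuming the documented path layout of the per-article pageviews endpoint (project, access, agent, article, granularity, start, end) and timestamps of the form YYYYMMDDHH in the response. The helper names `build_url` and `missing_days` and the trimmed sample payload are hypothetical, and the URL built here is an illustration, not a reconstruction of the truncated link in the message.

```python
from datetime import date, timedelta
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def build_url(project, article, start, end,
              access="all-access", agent="all-agents"):
    """Build a daily per-article pageviews URL for an inclusive date range."""
    title = quote(article.replace(" ", "_"), safe="")
    return (f"{BASE}/{project}/{access}/{agent}/{title}"
            f"/daily/{start:%Y%m%d}/{end:%Y%m%d}")


def missing_days(payload, start, end):
    """Return the dates in [start, end] that have no item in the payload."""
    seen = {item["timestamp"][:8] for item in payload.get("items", [])}
    span = (end - start).days + 1
    days = (start + timedelta(days=offset) for offset in range(span))
    return [day for day in days if day.strftime("%Y%m%d") not in seen]


# Hypothetical, trimmed response: real items carry more fields (for example
# the view count); only the timestamps matter for this check.
sample = {"items": [{"timestamp": "2022092000"}, {"timestamp": "2022092200"}]}

url = build_url("en.wikipedia.org", "Times Square",
                date(2022, 9, 20), date(2022, 9, 22))
gaps = missing_days(sample, date(2022, 9, 20), date(2022, 9, 22))
print(url)
print(gaps)
```

To run it against the live service, fetch `url` with an HTTP client that sends a descriptive User-Agent header and pass the decoded JSON to `missing_days`.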
Hi all,
The next Research Showcase, featuring the recipients of this year's
Wikimedia Foundation Research Awards of the Year, will be live-streamed
Wednesday, July 20, at 9:30 AM PDT/16:30 UTC. Find your local time here
<https://zonestamp.toolforge.org/1658334607>.
YouTube stream: https://www.youtube.com/watch?v=KMvXOQU5fX4
You are welcome to ask questions via YouTube chat or on IRC at
#wikimedia-research.
This month's presentations:
Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine
Learning
By *Krishna Srinivasan (Google)*
The milestone improvements brought
about by deep representation learning and pre-training techniques have led
to large performance gains across downstream NLP, IR and Vision tasks.
Multimodal modeling techniques aim to leverage large high-quality
visio-linguistic datasets for learning complementary information across
image and text modalities. In this talk, I introduce the Wikipedia-based
Image Text (WIT) Dataset to better facilitate multimodal, multilingual
learning. WIT is composed of a curated set of 37.5 million entity-rich
image-text examples with 11.5 million unique images across 108 Wikipedia
languages.
WIT’s unique advantages include:
- WIT is the largest multimodal dataset by number of image-text examples,
  by a factor of 3 (at the time of writing).
- WIT is massively multilingual (the first of its kind), covering more
  than 100 languages.
- WIT represents a more diverse set of concepts and real-world entities
  than previous datasets cover.
The WIT dataset is available for download and use via a Creative Commons
license here: https://github.com/google-research-datasets/wit
I conclude the talk with future directions to expand and extend the WIT
dataset. Link to paper: https://arxiv.org/pdf/2103.01913.pdf
Assessing the Quality of Sources in Wikidata Across Languages
By *Gabriel Amaral (King's College London)*
Wikidata is one of the most important
sources of structured data on the web, built by a worldwide community of
volunteers. As a secondary source, its contents must be backed by credible
references; this is particularly important as Wikidata explicitly
encourages editors to add claims for which there is no broad consensus, as
long as they are corroborated by references. Nevertheless, despite this
essential link between content and references, Wikidata’s ability to
systematically assess and assure the quality of its references remains
limited. To this end, we carry out a mixed-methods study to determine the
relevance, ease of access, and authoritativeness of Wikidata references, at
scale and in different languages, using online crowdsourcing, descriptive
statistics, and machine learning. The findings help us ascertain the
quality of references in Wikidata, and identify common challenges in
defining and capturing the quality of user-generated multilingual
structured data on the web. Link to paper:
https://dl.acm.org/doi/abs/10.1145/3484828
You can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
Emily, on behalf of the Research team
Hello,
I am one of the test engineers on the QTE team.
There is a plan to migrate the MediaWiki software on production to
Kubernetes.
In preparation for this, we will be migrating test2wiki to Kubernetes
first so that QTE can test it and catch any bugs before the wider
roll-out.
I am trying to identify areas of our software for which the migration to
Kubernetes might pose a risk.
I wonder if this might be true of any of the software you are
responsible for. In particular, I am thinking about where MediaWiki is
interacting with different services in our ecosystem. I don't know
enough about this area to make an informed judgement.
Any ideas about what might be risky and in need of testing, and how one
might go about testing it on test2wiki
(https://test2.wikipedia.org/wiki/Main_Page) would be of great help to
me.
Let me know if you have any questions.
Thank you,
Dom
Hi all,
Join the Research Team at the Wikimedia Foundation [1] for its monthly
office hours on Tuesday, 2022-07-05. Find your local time here
<https://zonestamp.toolforge.org/1657036800>.
To participate, join the video-call via this link [2]. There is no set
agenda - feel free to add your item to the list of topics in the etherpad
[3]. You are welcome to add questions / items to the etherpad in advance,
or when you arrive at the session. Even if you are unable to attend the
session, you can leave a question that we can address asynchronously. If
you do not have a specific agenda item, you are welcome to hang out and
enjoy the conversation. More detailed information (e.g., about how to
attend) can be found here [4].
Through these office hours, we aim to make ourselves available to answer
research-related questions that you as Wikimedia volunteer editors,
organizers, affiliates, staff, and researchers face in your projects and
initiatives. Here are some example cases we hope to be able to support you
with:
- You have a specific research-related question that you suspect you
  should be able to answer with the publicly available data and you don’t
  know how to find an answer for it, or you just need some more help with it.
  For example, how can I compute the ratio of anonymous to registered editors
  in my wiki?
- You run into repetitive or very manual work as part of your Wikimedia
  contributions and you wish to find out if there are ways to use machines to
  improve your workflows. These types of questions can sometimes be
  harder to answer during an office hour. However, discussing
  them can help us understand your challenges better and we may find ways to
  work with each other to support you in addressing them in the future.
- You want to learn what the Research team at the Wikimedia Foundation
  does and how we can potentially support you. Specifically for affiliates:
  if you are interested in building relationships with the academic
  institutions in your country, we would love to talk with you and learn
  more. We have a series of programs that aim to expand the network of
  Wikimedia researchers globally and we would love to collaborate with those
  of you interested more closely in this space.
- You want to talk with us about one of our existing programs [5].
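As an illustration of the example question in the first bullet (the ratio of anonymous to registered editors), here is a minimal Python sketch. It assumes you have already exported edit records for your wiki into a list of dictionaries; the record layout (`user`, `is_anonymous`), the function name, and the sample values are all hypothetical and do not mirror any particular Wikimedia data source.

```python
def anon_to_registered_ratio(edits):
    """Ratio of distinct anonymous editors to distinct registered editors.

    Returns None when there are no registered editors, since the ratio
    is undefined in that case.
    """
    anonymous = set()
    registered = set()
    for edit in edits:
        # Each editor is counted once, however many edits they made.
        if edit["is_anonymous"]:
            anonymous.add(edit["user"])
        else:
            registered.add(edit["user"])
    if not registered:
        return None
    return len(anonymous) / len(registered)


# Hypothetical sample: two anonymous editors (documentation-range IPs)
# and four registered editors, one of whom edited twice.
sample_edits = [
    {"user": "192.0.2.1", "is_anonymous": True},
    {"user": "192.0.2.7", "is_anonymous": True},
    {"user": "192.0.2.1", "is_anonymous": True},
    {"user": "ExampleUserA", "is_anonymous": False},
    {"user": "ExampleUserB", "is_anonymous": False},
    {"user": "ExampleUserC", "is_anonymous": False},
    {"user": "ExampleUserD", "is_anonymous": False},
    {"user": "ExampleUserA", "is_anonymous": False},
]

ratio = anon_to_registered_ratio(sample_edits)
print(ratio)
```

Counting distinct editors rather than edits is a design choice made here; counting edits instead would answer a different question (share of anonymous activity).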
Hope to see many of you,
Emily, on behalf of the WMF Research Team
[1] https://research.wikimedia.org
[2] https://meet.jit.si/WMF-Research-Office-Hours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
[4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[5] https://research.wikimedia.org/projects.html
Hi all,
The next Research Showcase, *Wikipedia's Languages*, will be live-streamed
Wednesday, June 15, at 4:00 AM PDT/11:00 AM UTC. View your local time here
<https://zonestamp.toolforge.org/1655290800>.
YouTube stream: https://www.youtube.com/watch?v=AZQM1dtn3g0
You are welcome to ask questions via YouTube chat or on IRC at
#wikimedia-research.
This month's presentations:
Quantifying knowledge synchronisation in the 21st century
By *Jisung Yoon (Pohang University of Science and Technology)*
Humans acquire and accumulate
knowledge through language usage and eagerly exchange their knowledge for
advancement. Although geographical barriers had previously limited
communication, the emergence of information technology has opened new
avenues for knowledge exchange. However, it is unclear which communication
pathway is dominant in the 21st century. Here, we explore the dominant path
of knowledge diffusion in the 21st century using Wikipedia, the largest
communal dataset. We evaluate the similarity of shared knowledge between
population groups, distinguished based on their language usage. When
population groups are more engaged with each other, their knowledge
structure is more similar, where engagement is indicated by socio-economic
connections, such as cultural, linguistic, and historical features.
Moreover, geographical proximity is no longer a critical requirement for
knowledge dissemination. Furthermore, we integrate our data into a
mechanistic model to better understand the underlying mechanism and suggest
that the knowledge "Silk Road" of the 21st century is based online.
The Language Geography of Wikipedia
By *Martin Dittus*
Every language is a
system of being, doing, knowing, and imagining. With over 7,000 active
languages in the world, how many languages are fully represented online? To
answer this question, digital non-profit Whose Knowledge? initiated the
first ever report on the State of the Internet's Languages. As part of this
report, Martin Dittus and Mark Graham have investigated the languages of
Wikipedia. Wikipedia began with a single English-language edition more than
two decades ago, and now offers more than 300 language editions, which
places it at the forefront of digital language support. However, this does
not mean that speakers of these languages get access to the same content:
Wikipedia’s language editions vary widely in scale. We further find that
this inequality is also reflected in Wikipedia’s geographic coverage: not
all places are captured in every language. Wikipedia's coverage often
follows the global distribution of speakers of the respective language. Yet
even when we account for the distribution of language populations, certain
language communities are much more strongly represented on Wikipedia than
others. As a consequence, we find that for many countries in Africa,
Central and South America, and South Asia, most of the content about those
countries is in a foreign language, often a European-colonial language. In
other words, in many of these places, people may need to be able to speak a
second (possibly foreign) language in order to access Wikipedia information
about their own places. Why do we see these differences? And what can be
done to improve things?
You can also watch our past research showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
Emily, on behalf of the Research team
Hi all,
Registration for Wiki Workshop 2022 [1] is now open. The event will be
held virtually on April 25, 12:00-18:30 UTC, as part of The Web
Conference 2022 [2]. The plenary parts of the event will be recorded
and shared publicly afterwards.
Wiki Workshop is the largest Wikimedia research event of the year (so
far;) that the Research team at the Wikimedia Foundation co-organizes
with our Research Fellow, Bob West (EPFL). This year, Srijan Kumar
(Georgia Tech) joined the organizing team as well. :) The event brings
together scholars and researchers from across the world who are
interested in or are actively engaged with research and development on
the Wikimedia projects.
While the details of the schedule are to be finalized and posted in
the coming week, we expect to generally follow the format of 2021 [3].
This year we received research submissions from more than 20 countries
and have accepted 27 research papers whose authors will present the
work as part of the workshop (if you are an author of an accepted
paper: congrats! :)). Our keynote speaker is Larry Lessig [4] and we
will have a panel to reflect on the tenth anniversary of SOPA/PIPA,
moderated by Erik Moeller (Freedom of the Press). And of course, all
the music, games, etc. will remain. :)
If you are interested in participating in the live event, please
indicate your interest by filling out the form [5]. Anyone is encouraged to
register: you don't have to be a researcher. In the registration form,
please explain why attending the live event will support you in your
work on the Wikimedia projects and beyond.
If you have questions, please don't hesitate to reach out.
Best,
Leila
[1] https://wikiworkshop.org/2022/
[2] https://www2022.thewebconf.org/
[3] https://wikiworkshop.org/2021/#schedule
[4] https://hls.harvard.edu/faculty/directory/10519/Lessig
[5] (privacy statement for the Google form survey [6])
https://docs.google.com/forms/d/e/1FAIpQLSctlkUv8FasB2Nc4RvThnxAbjPzUwmnxB2…
[6] https://foundation.wikimedia.org/wiki/Legal:Wiki_Workshop_Registration_Priv…
--
Leila Zia
Head of Research
Wikimedia Foundation
Hi all,
Join the Research Team at the Wikimedia Foundation [1] for its monthly
office hours on Tuesday, 2022-06-07. Find your local time here
<https://zonestamp.toolforge.org/1654642800>.
To participate, join the video-call via this link [2]. There is no set
agenda - feel free to add your item to the list of topics in the etherpad
[3]. You are welcome to add questions / items to the etherpad in advance,
or when you arrive at the session. Even if you are unable to attend the
session, you can leave a question that we can address asynchronously. If
you do not have a specific agenda item, you are welcome to hang out and
enjoy the conversation. More detailed information (e.g., about how to
attend) can be found here [4].
Through these office hours, we aim to make ourselves available to answer
research-related questions that you as Wikimedia volunteer editors,
organizers, affiliates, staff, and researchers face in your projects and
initiatives. Here are some example cases we hope to be able to support you
with:
- You have a specific research-related question that you suspect you
  should be able to answer with the publicly available data and you don’t
  know how to find an answer for it, or you just need some more help with it.
  For example, how can I compute the ratio of anonymous to registered editors
  in my wiki?
- You run into repetitive or very manual work as part of your Wikimedia
  contributions and you wish to find out if there are ways to use machines to
  improve your workflows. These types of questions can sometimes be
  harder to answer during an office hour. However, discussing
  them can help us understand your challenges better and we may find ways to
  work with each other to support you in addressing them in the future.
- You want to learn what the Research team at the Wikimedia Foundation
  does and how we can potentially support you. Specifically for affiliates:
  if you are interested in building relationships with the academic
  institutions in your country, we would love to talk with you and learn
  more. We have a series of programs that aim to expand the network of
  Wikimedia researchers globally and we would love to collaborate with those
  of you interested more closely in this space.
- You want to talk with us about one of our existing programs [5].
Hope to see many of you,
Emily, on behalf of the WMF Research Team
[1] https://research.wikimedia.org
[2] https://meet.jit.si/WMF-Research-Office-Hours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
[4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[5] https://research.wikimedia.org/projects.html
To all observers,
Okay, so I wouldn't even bother with the idea of altering infrastructure.
I'd focus more on the substructure in between each branch, even though the
branches would obviously have to go through phishing later, if the idea is
to get the organization through the initial intake with a filtration system
and protocols that ensure nothing is ever able to be considered stagnant.
That would especially avoid the ongoing stress on the aforementioned intake
process, which has noticeably built up attention from different standpoints.
I also haven't the slightest clue what it is I am in the middle of, but I
just reread it and it sounds like that would overcomplicate the units'
pathing/macros/scripting/triggers/actions. I am seriously so sick that I
cannot even keep focus, with cold sweats and shivering, so I will take my
leave for a little R&R. I will be strong and recuperated by next week.