Hello everybody,
I'd like to announce a new dataset of Wikipedia pageviews that shows trends
in search engine usage across countries, language editions, internet
browsers, and device operating systems (
https://techblog.wikimedia.org/2021/06/07/searching-for-wikipedia/). We are
releasing this dataset within the context of the Foundational program [1]
and in particular research focused on Wikimedia's relationship with
external platforms [2]. This dataset fills a large gap in both public
information about global search engine usage and our understanding of the
external platforms that mediate how readers reach Wikipedia.
For context, approximately 50% of Wikipedia pageviews come directly from
clicks on search engine results (another 30% come from clicks on internal
links within Wikipedia, and the rest from external sites or direct
navigation) [3]. For those 50% of pageviews coming from search engines,
this dataset (and the corresponding dashboard) lets you examine which
search engines were used and how those patterns vary by country,
Wikipedia language, internet browser, and device operating system.
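As a rough illustration of the kind of analysis the dataset enables, here
is a minimal pandas sketch. The file name and the column names (country,
search_engine, pageviews) are placeholders I made up rather than the
published schema, so adjust them to the actual files described in the
blog post.

  import pandas as pd

  # Placeholder file and column names; the published schema may differ.
  df = pd.read_csv("search_engine_referrals.tsv", sep="\t")

  # Total search-referred pageviews per (country, search engine) pair.
  agg = df.groupby(["country", "search_engine"], as_index=False)["pageviews"].sum()

  # Within each country, each search engine's share of search-referred views.
  agg["share"] = agg["pageviews"] / agg.groupby("country")["pageviews"].transform("sum")

  print(agg.sort_values(["country", "share"], ascending=[True, False]).head(10))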
If you have any questions about this dataset or related projects, please
feel free to contact me. More details here:
https://techblog.wikimedia.org/2021/06/07/searching-for-wikipedia/
Best,
Isaac
[1] https://research.wikimedia.org/foundational.html
[2]
https://meta.wikimedia.org/wiki/Research:External_Reuse_of_Wikimedia_Content
[3] https://discovery.wmflabs.org/external/
Launch of The Public Service Media and Public Service Internet Manifesto.
Online event, Thursday 17 June 2021, 16:00 UK time, 17:00 Central
European Time
https://www.eventbrite.co.uk/e/launch-of-the-public-service-media-and-publi…
This event launches “The Public Service Media and Public Service
Internet Manifesto”.
The Internet and the media landscape are broken. The dominant commercial
Internet platforms endanger democracy. The Manifesto stresses the
importance of public service media and the creation of a public service
Internet for the future of society and safeguarding democracy.
In the online event, media experts will talk about why they support and
have signed the Manifesto, which is the outcome of a discussion and
collaboration process organised as part of the AHRC research network
InnoPSM: Innovation in Public Service Media Policies.
With interventions by Alessandro D'Arma, Roy Cobby Avaria, Leonhard
Dobusch, Christian Fuchs, Minna Horowitz, Luciana Musello, Jack L. Qiu,
Barbara Thomass
The event takes place on Zoom. After registering on Eventbrite, you will
receive the Zoom access details no later than one day before the event.
The audience of the event will have the opportunity to be among the
first to read and sign the Public Service Media and Public Service
Internet Manifesto.
Hello everybody,
Within the context of the Knowledge Integrity program
<https://research.wikimedia.org/knowledge-integrity.html>, the Research
Team (and our formal collaborators
<https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations>)
has been working on releasing relevant datasets in this area.
Recently we have published the following datasets:
- Tracking Knowledge Propagation Across Wikipedia Languages: a dataset of
inter-language knowledge propagation in Wikipedia. Covering all 309
language editions and 33M articles, the dataset aims to track the full
propagation history of Wikipedia concepts and to allow follow-up research
on building predictive models of this propagation. For this purpose, we
align all Wikipedia articles in a language-agnostic manner according to
the concept they cover, their topic, and the timestamp of each article's
creation, which results in 13M propagation instances; a toy sketch of
this concept-based alignment appears after this list. (paper
<https://arxiv.org/abs/2103.16613>, dataset
<https://zenodo.org/record/4433137>, code
<https://github.com/rodolfovalentim/wikipedia-content-propagation>, meta
<https://meta.wikimedia.org/wiki/Research:Exploration_on_content_propagation…>
)
- Wiki-Reliability: A Large Scale Dataset for Content Reliability on
(English) Wikipedia: we selected the 10 most popular reliability-related
templates on English Wikipedia and proposed an effective method to label
almost 1M samples of Wikipedia article revisions as positive or negative
with respect to each template. Each positive/negative example in the
dataset comes with the full article text and 20 features from the
revision's metadata; a toy illustration of this template-based labeling
appears after this list. (paper <https://arxiv.org/abs/2105.04117>,
dataset
<https://figshare.com/articles/dataset/Wiki-Reliability_A_Large_Scale_Datase…>,
code <https://github.com/kay-wong/Wiki-Reliability/>, meta
<https://meta.wikimedia.org/wiki/Research:Wiki-Reliability:_A_Large_Scale_Da…>
).
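As a toy sketch of the concept-based alignment used for the propagation
dataset (purely illustrative: the records and field order below are made
up, not the released schema), one can group article creations by the
language-agnostic concept they cover and order each group by creation
time to obtain one propagation cascade per concept:

  from collections import defaultdict
  from datetime import datetime

  # Illustrative article records: (concept_id, language, creation timestamp).
  articles = [
      ("Q42", "en", "2001-02-05T00:00:00+00:00"),
      ("Q42", "de", "2002-07-19T00:00:00+00:00"),
      ("Q42", "es", "2003-01-02T00:00:00+00:00"),
  ]

  # Group creations by concept, then sort each group by time to get a cascade.
  cascades = defaultdict(list)
  for concept, lang, ts in articles:
      cascades[concept].append((datetime.fromisoformat(ts), lang))

  for concept, events in sorted(cascades.items()):
      events.sort()
      print(concept, "->", [lang for _, lang in events])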
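And a toy illustration of the template-based labeling behind
Wiki-Reliability. This is a deliberate simplification for intuition only:
it just checks whether one template is present in a revision's wikitext,
whereas the published labeling, as I understand it, is built around
template additions and removals across an article's revision history; see
the paper for the actual pipeline.

  import re

  # Toy revision texts for one article; the real dataset is built from full
  # revision histories, and "Unreferenced" stands in for the 10 templates.
  revisions = [
      ("rev1", "Some claim without sources."),
      ("rev2", "{{Unreferenced}} Some claim without sources."),
      ("rev3", "Some claim with a citation.<ref>...</ref>"),
  ]

  TEMPLATE = re.compile(r"\{\{\s*Unreferenced\b", re.IGNORECASE)

  # Mark a revision "positive" if the reliability template appears in it.
  for rev_id, text in revisions:
      label = "positive" if TEMPLATE.search(text) else "negative"
      print(rev_id, label)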
We hope that these datasets can be used by the research community to keep
working on understanding and modeling knowledge integrity in Wikipedia.
Currently we are working on expanding both datasets. For knowledge
propagation, we are characterizing the different types of cascades and
generating new prediction models. For the Wiki-Reliability dataset, we
are expanding it to more languages.
If you have any questions about these datasets or related projects,
please feel free to contact me.
Best,
--
Diego Sáez Trumper
Senior Research Scientist
Wikimedia Foundation.
Hi all,
Join the Research Team at the Wikimedia Foundation [1] for their monthly
Office hours on 2021-06-01 at 16:00-17:00 UTC (9am PT/6pm CEST).
To participate, join the video call via this link [2]. There is no set
agenda; feel free to add your item to the list of topics in the etherpad
[3] (you can do this after you join the meeting, too), or just hang out.
More detailed information (e.g. about how to attend) can be found here
[4].
Through these office hours, we aim to make ourselves more available to
answer some of the research-related questions that you as Wikimedia
volunteer editors, organizers, affiliates, staff, and researchers face in
your projects and initiatives. Some example cases we hope to be able to
support you in:
- You have a specific research-related question that you suspect you
should be able to answer with publicly available data, but you don’t know
how to find the answer or just need some more help with it. For example:
how can I compute the ratio of anonymous to registered editors in my
wiki? (See the sketch after this list for one possible starting point.)
- You run into repetitive or very manual work as part of your Wikimedia
contributions and you wish to find out whether there are ways to use
machines to improve your workflows. These questions can sometimes be
harder to answer during an office hour; however, discussing them helps us
understand your challenges better, and we may find ways to work with each
other to address them in the future.
- You want to learn what the Research team at the Wikimedia Foundation
does and how we can potentially support you. Specifically for affiliates:
if you are interested in building relationships with academic
institutions in your country, we would love to talk with you and learn
more. We have a series of programs that aim to expand the network of
Wikimedia researchers globally, and we would love to collaborate more
closely with those of you interested in this space.
- You want to talk with us about one of our existing programs [5].
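For instance, here is a minimal sketch of one way to approach the
anonymous-vs-registered example from the first bullet, using the public
Wikimedia Analytics API. The endpoint path and response fields reflect my
reading of the API documentation and may need adjusting; treat it as a
starting point rather than an official recipe.

  import requests

  BASE = "https://wikimedia.org/api/rest_v1/metrics/editors/aggregate"
  HEADERS = {"User-Agent": "research-office-hours-example/0.1"}

  def monthly_editors(project, editor_type, start="20210101", end="20210601"):
      # Assumed endpoint shape:
      # /{project}/{editor-type}/{page-type}/{activity-level}/{granularity}/{start}/{end}
      url = (f"{BASE}/{project}/{editor_type}/all-page-types/"
             f"all-activity-levels/monthly/{start}/{end}")
      data = requests.get(url, headers=HEADERS, timeout=30).json()
      # Assumed response shape: items -> results -> editors. Summing monthly
      # counts double-counts editors active in several months, which is fine
      # for a rough ratio.
      return sum(r["editors"] for r in data["items"][0]["results"])

  anon = monthly_editors("en.wikipedia.org", "anonymous")
  registered = monthly_editors("en.wikipedia.org", "user")
  print(f"anonymous/registered editor ratio: {anon / registered:.2f}")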
Hope to see many of you,
Martin on behalf of the WMF Research Team
[1] https://research.wikimedia.org/team.html
[2] https://meet.jit.si/WMF-Research-Office-Hours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
[4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[5] https://research.wikimedia.org/projects.html
--
Martin Gerlach
Research Scientist
Wikimedia Foundation