Hello all!
The Search Platform Team usually holds an open meeting on the first
Wednesday of each month. Come talk to us about anything related to
Wikimedia search, Wikidata Query Service (WDQS), Wikimedia Commons Query
Service (WCQS), etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, March 1st, 2023
Time: 16:00-17:00 UTC / 08:00 PST / 11:00 EST / 17:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vgj-bbeb-uyi
Join by phone: https://tel.meet/vgj-bbeb-uyi?pin=8118110806927
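If your time zone is not listed above, the start time can be converted with a short script. This is only a minimal sketch: the fixed offsets below are assumptions that hold for March 1, 2023 (before daylight saving time begins), and the `ZONES` table and `local_start_times` helper are hypothetical names used for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets, assumed valid for March 1, 2023 (standard time).
# Add your own zone here; these labels are illustrative only.
ZONES = {
    "UTC": timezone.utc,
    "PST": timezone(timedelta(hours=-8), "PST"),
    "EST": timezone(timedelta(hours=-5), "EST"),
    "CET": timezone(timedelta(hours=1), "CET"),
}


def local_start_times(start_utc):
    """Return the meeting start as HH:MM strings keyed by zone label."""
    return {
        label: start_utc.astimezone(tz).strftime("%H:%M")
        for label, tz in ZONES.items()
    }


# Meeting start taken from the announcement: 16:00 UTC on March 1, 2023.
meeting_start = datetime(2023, 3, 1, 16, 0, tzinfo=timezone.utc)
times = local_start_times(meeting_start)
print(" / ".join(f"{t} {label}" for label, t in times.items()))
# prints: 16:00 UTC / 08:00 PST / 11:00 EST / 17:00 CET
```

For dates on which daylight saving time may apply, the standard-library `zoneinfo` module with region names would be a safer choice than fixed offsets.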
Have fun and see you soon!
Guillaume
--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello Wikidata!
I'm happy to report that the WDQS reload <https://phabricator.wikimedia.org/T323096> is now complete. We believe the reload has eliminated the data discrepancies mentioned in the linked ticket. However, please let us know if this is not the case.
Thank you for your patience and have a great rest of your week!
Best,
Brian King
SRE, Search Platform Team
Wikimedia Foundation
IRC: inflatador
Hi all,
Please see the call for papers for the 10th edition of Wiki Workshop below.
The call is for extended abstracts (2 pages) of ongoing or completed work.
The deadline is March 23. The submissions are non-archival which means you
can submit work that is already published as well! :)
Submit and join us in conversations about research on the Wikimedia
projects.
Best,
Leila
--
Leila Zia
Head of Research
Wikimedia Foundation
---------- Forwarded message ---------
From: Martin Gerlach <mgerlach(a)wikimedia.org>
Date: Mon, Feb 20, 2023 at 1:29 AM
Subject: [Wiki-research-l] [events] Wiki Workshop 2023 Call for Papers
To: <wiki-research-l(a)lists.wikimedia.org>
Hi everyone,
The call for papers for the 10th Wiki Workshop in 2023 is out:
https://wikiworkshop.org/2023/#call Submit your 2-page abstracts by March
23 (all submissions are non-archival). The workshop will take place on May
11, 2023. For more information, see the workshop website [1].
If you have questions about the workshop, please let us know on this list
or at wikiworkshop(a)googlegroups.com.
Looking forward to seeing many of you in this year's edition.
Best,
Pablo Aragón, Wikimedia Foundation
Martin Gerlach, Wikimedia Foundation
Evelin Heidel, Wikimedistas de Uruguay
Emily Lescak, Wikimedia Foundation
Francesca Tripodi, University of North Carolina
Bob West, EPFL
Leila Zia, Wikimedia Foundation
[1] https://wikiworkshop.org/2023/
—
We invite contributions to the 10th edition (!) of Wiki Workshop, which
will take place virtually on May 11, 2023 (tentatively 12:00-19:00 UTC).
Wiki Workshop is the largest Wikimedia research event of the year, aimed at
bringing together researchers who study all aspects of Wikimedia projects
(including, but not limited to, Wikipedia, Wikidata, Wikimedia Commons,
Wikisource, and Wiktionary) as well as Wikimedia developers, affiliate
organizations, and volunteer editors. Co-organized by the Wikimedia
Foundation’s Research team and members of the Wikimedia research community,
the workshop facilitates a direct pathway for exchanging ideas between the
organizations that serve Wikimedia projects and the researchers actively
studying them.
New this year: Building on the successful experiences of
organizing Wiki Workshop in 2015 <https://wikiworkshop.org/2015/>, 2016
<https://wikiworkshop.org/2016/>, 2017 <https://wikiworkshop.org/2017/>,
2018 <https://wikiworkshop.org/2018/>, 2019 <https://wikiworkshop.org/2019/>,
2020 <https://wikiworkshop.org/2020/>, 2021
<https://wikiworkshop.org/2021/>, and 2022 <https://wikiworkshop.org/2022/>
and based on feedback from authors and participants over the years, we are
introducing a few updates to the research track of the workshop for 2023:
-
This 10th edition will take place as a standalone event (rather than in
co-location with a conference, as in previous years).
-
We have changed the format of submissions and will only accept 2-page
extended abstracts (following the successful IC2S2 model).
-
Submissions are non-archival, so we welcome ongoing, completed, and
already published work.
-
We are excited to share that the authors of Wiki Workshop 2023 will have
the opportunity to receive feedback, improve their work, and submit the
extended version of their research paper to a special issue of the ACM
Transactions on the Web, which will have a dedicated open call for papers
later in 2023.
Topics include, but are not limited to:
-
new technologies and initiatives to grow content, quality, equity,
diversity, and participation across Wikimedia projects
-
use of bots, algorithms, and crowdsourcing strategies to curate, source,
or verify content and structured data
-
bias in content and gaps of knowledge on Wikimedia projects
-
relation between Wikimedia projects and the broader (open) knowledge
ecosystem
-
exploration of what constitutes a source and how/if the incorporation of
other kinds of sources is possible (e.g., oral histories, video)
-
detection of low-quality, promotional, or fake content (misinformation
or disinformation), as well as fake accounts (e.g., sock puppets)
-
questions related to community health (e.g., sentiment analysis,
harassment detection, tools that could increase harmony)
-
motivations, engagement models, incentives, and needs of editors,
readers, and/or developers of Wikimedia projects
-
innovative uses of Wikipedia and other Wikimedia projects for AI and NLP
applications and vice versa
-
consensus-finding and conflict resolution on editorial issues
-
dynamics of content reuse across projects and the impact of policies and
community norms on reuse
-
privacy, security, and trust
-
collaborative content creation
-
innovative uses of Wikimedia projects' content and consumption patterns
as sensors for real-world events, culture, etc.
-
open-source research code, datasets, and tools to support research on
Wikimedia contents and communities
-
connections between Wikimedia projects and the Semantic Web
-
strategies for how to incorporate Wikimedia projects into media literacy
interventions
This year’s Wiki Workshop solicits extended abstracts (PDF format, maximum
2 pages, including references). Submissions that exceed the 2-page limit
will be automatically rejected. Authors may include 1 additional page with
figures and/or tables (including captions) only. Initial submissions
require names and affiliations of authors, 5 keywords, a title, abstract,
and a main text outlining the contribution, methods, findings, and impact
of the work, whichever is relevant. Submissions will be non-archival and as
a result may have already been published, under review, or ongoing
research. All submissions will be reviewed by multiple members of the Wiki
Workshop Program Committee. The names of the authors will be revealed to
the reviewers, whereas reviewers will remain anonymous to authors. Authors
of accepted abstracts will be invited to present their research in a
pre-recorded oral presentation with dedicated time for live Q&A on May 11,
2023. Accepted abstracts may be shared on the website prior to the event.
The template for formatting the submission as well as the submission link
to EasyChair will be made available by February 23.
--
Martin Gerlach (he/him) | Senior Research Scientist | Wikimedia Foundation
_______________________________________________
Wiki-research-l mailing list -- wiki-research-l(a)lists.wikimedia.org
To unsubscribe send an email to wiki-research-l-leave(a)lists.wikimedia.org
Hello all!
TL;DR: We expect to successfully complete the recent data reload on
Wikidata Query Service soon, but we've encountered multiple failures
related to the size of the graph, and anticipate that this issue may worsen
in the future. Although we succeeded this time, we cannot guarantee that
future reload attempts will be successful given the current trend of the
data reload process. Thank you for your understanding and patience.
Longer version:
WDQS is updated from a stream of recent changes on Wikidata, with a maximum
delay of ~2 minutes. This process was improved as part of the WDQS
Streaming Updater project to ensure data coherence[1]. However, the update
process is still imperfect and can lead to data inconsistencies in some
cases[2][3]. To address this, we reload the data from dumps a few times per
year to reinitialize the system from a known good state.
The recent reload of data from dumps started in mid-December and was
initially met with some issues related to download and instabilities in
Blazegraph, the database used by WDQS[4]. Loading the data into Blazegraph
takes a couple of weeks due to the size of the graph, and we had multiple
attempts where the reload failed after >90% of the data had been loaded.
Our understanding of the issue is that a "race condition" in Blazegraph[5],
where subtle timing changes lead to corruption of the journal in some rare
cases, is to blame.[6]
We want to reassure you that the last reload job was successful on one of
our servers. The data still needs to be copied over to all of the WDQS
servers, which will take a couple of weeks, but should not bring any
additional issues. However, reloading the full data from dumps is becoming
more complex as the data size grows, and we wanted to let you know why the
process took longer than expected. We understand that data inconsistencies
can be problematic, and we appreciate your patience and understanding while
we work to ensure the quality and consistency of the data on WDQS.
Thank you for your continued support and understanding!
Guillaume
[1] https://phabricator.wikimedia.org/T244590
[2] https://phabricator.wikimedia.org/T323239
[3] https://phabricator.wikimedia.org/T322869
[4] https://phabricator.wikimedia.org/T323096
[5] https://en.wikipedia.org/wiki/Race_condition#In_software
[6] https://phabricator.wikimedia.org/T263110
--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
Thanks to all mentors for their participation! Results are out - Wikimedia
got accepted as a mentoring organization in GSoC 2023. There is still a
month before the contributor application period begins (on March 20th), in
case you would like to propose more projects. All the finalized ideas are
published here: <https://www.mediawiki.org/wiki/Google_Summer_of_Code/2023>
[1]. For Outreachy, we have finalized two project ideas <
https://www.mediawiki.org/wiki/Outreachy/Round_26> [2]; contributors who
meet the eligibility criteria will be able to view the project details and
contribute from March 6th onwards.
Cheers,
Srishti, Sohom & Gopa (Wikimedia Org Admins)
[1] https://www.mediawiki.org/wiki/Google_Summer_of_Code/2023
[2] https://www.mediawiki.org/wiki/Outreachy/Round_26
*Srishti Sethi*
Senior Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>
On Tue, Feb 14, 2023 at 9:09 AM Susanna Ånäs <susanna.anas(a)gmail.com> wrote:
> Hi. I am reposting the Wikidocumentaries proposal as I somehow managed to
> send it to a separate thread, so here we go!
>
> Thank you for the opportunity, we are excited to join this round of GSoC
> with this Wikidocumentaries proposal!
>
> Wikidocumentaries[1] is a website that provides a language-independent way
> of browsing Wikimedia projects based on Wikidata items. It displays media
> from external repositories and integrates them as part of the pages. The
> idea is to allow the users to find relevant open content and contribute it
> to the Wikimedia projects by using the content for their purposes.
>
> So far, Wikidocumentaries has not been enabled for contributions to
> Wikimedia projects. The goal of the GSoC project is to establish the entire
> process for retrieving media from a given media repository related to the
> currently viewed topic in Wikidocumentaries, displaying it in
> Wikidocumentaries and uploading it to Wikimedia Commons, adding structured
> data statements to it.
>
> When this workflow has been completed, it will be possible to make
> further features available to match, enrich or organize the data. It is
> possible to expand the work to these areas, based on the interests of the
> intern.
>
> Tech: The UI code is created with Vue, and the API code is JavaScript. The
> work focuses on Structured Data on Commons, therefore understanding of the
> MediaWiki API, Wikidata and Structured Data on Commons is needed.
>
> * Mentors: TuukkaH, Susannaanas
> * Codebase: GitHub[2]
> * Phabricator: Ticket[3], Project[4], Microtasks[5]
> * Documentation website[6]
>
> Looking forward to tackling these issues together!
>
> Cheers
> Susanna & Tuukka
>
> [1] https://wikidocumentaries-demo.wmcloud.org/
> [2] https://github.com/Wikidocumentaries
> [3] https://phabricator.wikimedia.org/T329023
> [4] https://phabricator.wikimedia.org/tag/wikidocumentaries/
> [5] https://phabricator.wikimedia.org/T329256
> [6] https://wikidocumentaries.wmcloud.org/wiki/Main_Page
>
> On Sat, Jan 14, 2023 at 1:31 AM Srishti Sethi (ssethi(a)wikimedia.org)
> wrote:
>
>> Hello everyone,
>>
>> TL;DR: Wikimedia will soon be applying as a mentoring organization to *Google
>> Summer of Code 2023* <
>> https://www.mediawiki.org/wiki/Google_Summer_of_Code/2023> [1] and *Outreachy
>> Round 26* <https://www.mediawiki.org/wiki/Outreachy/Round_26> [2]. We
>> are currently working on a list of interesting project ideas to include in
>> the application. If you have some ideas for *coding or non-coding
>> (design, documentation, translation, outreach, research) projects*,
>> share them by *February 7th* here: <
>> https://phabricator.wikimedia.org/T326991> [3]. For non-coding projects
>> that can be promoted via Outreachy, there are only two available slots,
>> which will be allocated to mentors on a first-come, first-served basis.
>>
>> *Timeline*
>> As a mentor, you will engage potential candidates in the application
>> period for both programs between March and April. You will help candidates
>> make small contributions to your project and answer any project-related
>> queries during this time. You will work more closely with the accepted
>> candidates during the coding period between May and August.
>>
>> *Tips for proposing projects*
>> * Follow this task description template when you propose a project in
>> Phabricator: <
>> https://phabricator.wikimedia.org/tag/outreach-programs-projects> [4].
>> You can also use this workboard to pick an idea if you don't have one
>> already. Add the #Google-Summer-of-Code (2023) or #Outreachy (Round 26) tag.
>> * A project should take an experienced developer ~15 days and a newcomer
>> ~3 months to complete.
>> * Each project should have at least two mentors, including one with a
>> technical background.
>> * Ideally, the project has no tight deadlines, a moderate learning curve,
>> and fewer dependencies on Wikimedia's core infrastructure. Projects
>> addressing the needs of a language community are most welcome.
>>
>> Learn more about the roles and responsibilities of mentors on
>> MediaWiki.org: <https://www.mediawiki.org/wiki/Outreachy/Mentors> [5], <
>> https://www.mediawiki.org/wiki/Google_Summer_of_Code/Mentors> [6].
>>
>> Cheers,
>> Srishti
>>
>> [1] https://www.mediawiki.org/wiki/Google_Summer_of_Code/2023
>>
>> [2] https://www.mediawiki.org/wiki/Outreachy/Round_26
>>
>> [3] https://phabricator.wikimedia.org/T326991
>>
>> [4] https://phabricator.wikimedia.org/tag/outreach-programs-projects/
>>
>> [5] https://www.mediawiki.org/wiki/Outreachy/Mentors
>>
>> [6] https://www.mediawiki.org/wiki/Google_Summer_of_Code/Mentors
>>
>> *Srishti Sethi*
>> Senior Developer Advocate
>> Wikimedia Foundation <https://wikimediafoundation.org/>
>>
>> _______________________________________________
>> Wikitech-l mailing list -- wikitech-l(a)lists.wikimedia.org
>> To unsubscribe send an email to wikitech-l-leave(a)lists.wikimedia.org
>>
>> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>
>
Hello everyone,
If you are interested in organizing or joining a hackathon event, but
cannot attend the in-person Hackathon event in May in Athens, Greece, this
email is for you!
We encourage communities, user groups or chapters to organize satellite
events connected to the in-person Hackathon. These events are to be
organized autonomously and share the hackathon's purpose: bringing the
global technical community together to connect, hack, run technical
discussions, and explore new ideas.
You can work with your wiki community to organize these events before,
during, or after the main event. For example, you could onboard newcomers to
the technical aspects of the Wikimedia movement, or host watch parties or
meetups in your region to offer an alternative to people who cannot join the
in-person event in Athens.
To obtain help with organizing an event, you can apply for funds via the *Rapid
Grants* maintained by the Wikimedia Foundation. The deadline to apply for
funding is *March 20*. When preparing for your event, you can reach out to
the Hackathon organizing team for support with resources, designing the
program, and guidance on getting involved in the global event.
Learn more about the satellite events, funding process, and a checklist for
organizing on the wiki page: <
https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023/Satellite_events>
[1]
Cheers,
Srishti
On behalf of the Hackathon organizing team
[1] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023/Satellite_events
*Srishti Sethi*
Senior Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello everyone,
The next Research Showcase will be livestreamed next Wednesday, February 15
at 9:30AM PT / 17:30 UTC. The theme is The Free Knowledge Ecosystem.
YouTube stream: https://www.youtube.com/watch?v=8VJmR-3lTac
We welcome you to join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
The evolution of humanitarian mapping in OpenStreetMap (OSM) and how it
affects map completeness and inequalities in OSM
By *Benjamin Herfort, Heidelberg Institute for Geoinformation Technology*
Mapping efforts of communities in OpenStreetMap (OSM) over the previous
decade have created a unique global geographic database, which is
accessible to all with no licensing costs. The collaborative maps of OSM
have been used to support humanitarian efforts around the world as well as
to fill important data gaps for implementing major development frameworks
such as the Sustainable Development Goals (SDGs). Besides the well-examined
Global North - Global South bias in OSM, the OSM data as of 2023 shows a
much more spatially diverse spread pattern than previously considered,
which was shaped by regional, socio-economic and demographic factors across
several scales. Humanitarian mapping efforts of the previous decade have
already made OSM more inclusive, contributing to diversify and expand the
spatial footprint of the areas mapped. However, methods to quantify and
account for the remaining biases in OSM’s coverage are needed so that
researchers and practitioners will be able to draw the right conclusions,
e.g. about progress towards the SDGs in cities.
Dataset reuse: Toward translating principles to practice
By *Laura Koesten, University of Vienna*
The web provides access to millions of datasets. These data can have
additional impact when used beyond the context for which they were
originally created. But using a dataset beyond the context in which it
originated remains challenging. Simply making data available does not mean
it will be or can be easily used by others. At the same time, we have
little empirical insight into what makes a dataset reusable and which of
the existing guidelines and frameworks have an impact.
In this talk, I will discuss our research on what makes data reusable in
practice. This is informed by a synthesis of literature on the topic, our
studies on how people evaluate and make sense of data, and a case study on
datasets on GitHub. In the case study, we describe a corpus of more than
1.4 million data files from over 65,000 repositories. Building on reuse
features from the literature, we use GitHub’s engagement metrics as proxies
for dataset reuse and devise an initial model, using deep neural networks,
to predict a dataset’s reusability. This demonstrates the practical gap
between principles and actionable insights that might allow data publishers
and tool designers to implement functionalities that facilitate reuse.
We hope you can join us!
Warm regards,
Emily
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation