Hello!
We will soon deploy some fixes for date parsing that especially affect
Czech and possibly other languages as well.
Wikidata’s parsing of dates in the Czech language has long been affected by
some issues (T221097 <https://phabricator.wikimedia.org/T221097>), where
some reasonable representations couldn’t be parsed (e.g. 01.02.2023), while
others were parsed incorrectly: for example, 11.12.2023 (11 December 2023)
was parsed as 12 November 2023, and 07.05.1997 (7 May 1997) bizarrely
became 30 June 1997.
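To make the day/month ambiguity above concrete, here is a minimal Python sketch. It uses the standard library's `datetime.strptime`, not Wikidata's actual parser, and the function names `parse_day_first` and `parse_month_first` are hypothetical. It only illustrates how the same string yields two different dates depending on the assumed field order; it does not attempt to reproduce the 30 June 1997 result.

```python
from datetime import date, datetime


def parse_day_first(text: str) -> date:
    """Read DD.MM.YYYY, the order a Czech reader would expect."""
    return datetime.strptime(text, "%d.%m.%Y").date()


def parse_month_first(text: str) -> date:
    """Read the same string as MM.DD.YYYY (illustrative wrong guess)."""
    return datetime.strptime(text, "%m.%d.%Y").date()


if __name__ == "__main__":
    for raw in ("11.12.2023", "07.05.1997", "01.02.2023"):
        print(raw, parse_day_first(raw), parse_month_first(raw))
```

With day-first parsing, 11.12.2023 comes out as 11 December 2023; with month-first parsing, the same input becomes 12 November 2023, which is the kind of swap described above.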
Matěj Suchánek <https://www.wikidata.org/wiki/User:Mat%C4%9Bj_Such%C3%A1nek>
has investigated these errors and implemented a solution, which will be
deployed on February 15. As far as we can tell, all the changes it produces
are positive: that is, if the way a date is parsed changes, then the old
behavior was bad, and the change is an improvement. Nevertheless, it’s
possible that some users expected the old behavior, or that some external
programs might even be broken by the change. Users who add time data to
Wikidata should make sure that the date shown to them as a result of their
edit is correct. If you want to test the behavior changes, the new code is
already live on Beta Wikidata.
We are currently looking into other languages that may be affected as well.
If you have any questions or want to provide feedback, please leave us a
comment on this ticket <https://phabricator.wikimedia.org/T221097>.
Cheers,
--
Mohammed Sadat
*Community Communications Manager, Wikidata*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0) 30 577 116 2466
https://wikimedia.de
Wikimedia Deutschland – Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Berlin-Charlottenburg
district court under number 23855 B. Recognized as a non-profit by the Tax
Office for Corporations I Berlin, tax number 27/029/42207.
Hello all!
The Search Platform Team usually holds an open meeting on the first
Wednesday of each month. Come talk to us about anything related to
Wikimedia search, Wikidata Query Service (WDQS), Wikimedia Commons Query
Service (WCQS), etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, March 1st, 2023
Time: 16:00-17:00 UTC / 08:00 PST / 11:00 EST / 17:00 CET
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vgj-bbeb-uyi
Join by phone: https://tel.meet/vgj-bbeb-uyi?pin=8118110806927
Have fun and see you soon!
Guillaume
--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello Wikidata!
I'm happy to report that the WDQS reload <https://phabricator.wikimedia.org/T323096> is now complete. We believe the reload has eliminated the data discrepancies mentioned in the linked ticket. However, please let us know if this is not the case.
Thank you for your patience and have a great rest of your week!
Best,
Brian King
SRE, Search Platform Team
Wikimedia Foundation
IRC: inflatador
Hello all!
TL;DR: We expect to successfully complete the recent data reload on
Wikidata Query Service soon, but we've encountered multiple failures
related to the size of the graph, and anticipate that this issue may worsen
in the future. Although we succeeded this time, we cannot guarantee that
future reload attempts will be successful given the current trend of the
data reload process. Thank you for your understanding and patience.
Longer version:
WDQS is updated from a stream of recent changes on Wikidata, with a maximum
delay of ~2 minutes. This process was improved as part of the WDQS
Streaming Updater project to ensure data coherence[1]. However, the update
process is still imperfect and can lead to data inconsistencies in some
cases[2][3]. To address this, we reload the data from dumps a few times per
year to reinitialize the system from a known good state.
The recent reload of data from dumps started in mid-December and was
initially met with some issues related to download and instabilities in
Blazegraph, the database used by WDQS[4]. Loading the data into Blazegraph
takes a couple of weeks due to the size of the graph, and we had multiple
attempts where the reload failed after >90% of the data had been loaded.
Our understanding of the issue is that a "race condition" in Blazegraph[5],
where subtle timing changes lead to corruption of the journal in some rare
cases, is to blame.[6]
We want to reassure you that the last reload job was successful on one of
our servers. The data still needs to be copied over to all of the WDQS
servers, which will take a couple of weeks, but should not bring any
additional issues. However, reloading the full data from dumps is becoming
more complex as the data size grows, and we wanted to let you know why the
process took longer than expected. We understand that data inconsistencies
can be problematic, and we appreciate your patience and understanding while
we work to ensure the quality and consistency of the data on WDQS.
Thank you for your continued support and understanding!
Guillaume
[1] https://phabricator.wikimedia.org/T244590
[2] https://phabricator.wikimedia.org/T323239
[3] https://phabricator.wikimedia.org/T322869
[4] https://phabricator.wikimedia.org/T323096
[5] https://en.wikipedia.org/wiki/Race_condition#In_software
[6] https://phabricator.wikimedia.org/T263110
Thanks to all mentors for their participation! Results are out - Wikimedia
got accepted as a mentoring organization in GSoC 2023. There is still a
month before the contributor application period begins (on March 20th), in
case you would like to propose more projects. All the finalized ideas are
published here: <https://www.mediawiki.org/wiki/Google_Summer_of_Code/2023>
[1]. For Outreachy, we have finalized two project ideas <
https://www.mediawiki.org/wiki/Outreachy/Round_26> [2]; contributors who
meet the eligibility criteria will be able to view the project details and
contribute from March 6th onwards.
Cheers,
Srishti, Sohom & Gopa (Wikimedia Org Admins)
[1] https://www.mediawiki.org/wiki/Google_Summer_of_Code/2023
[2] https://www.mediawiki.org/wiki/Outreachy/Round_26
*Srishti Sethi*
Senior Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>
On Tue, Feb 14, 2023 at 9:09 AM Susanna Ånäs <susanna.anas(a)gmail.com> wrote:
> Hi. I am reposting the Wikidocumentaries proposal as I somehow managed to
> send it to a separate thread, so here we go!
>
> Thank you for the opportunity, we are excited to join this round of GSoC
> with this Wikidocumentaries proposal!
>
> Wikidocumentaries[1] is a website that provides a language-independent way
> of browsing Wikimedia projects based on Wikidata items. It displays media
> from external repositories and integrates them as part of the pages. The
> idea is to allow the users to find relevant open content and contribute it
> to the Wikimedia projects by using the content for their purposes.
>
> So far, Wikidocumentaries has not been enabled for contributions to
> Wikimedia projects. The goal of the GSoC project is to establish the entire
> process for retrieving media from a given media repository related to the
> currently viewed topic in Wikidocumentaries, displaying it in
> Wikidocumentaries and uploading it to Wikimedia Commons, adding structured
> data statements to it.
>
> When this workflow has been completed, it will be possible to make
> available further features to match, enrich or organize the data. It is
> possible to expand the work to these areas, based on the interests of the
> intern.
>
> Tech: The UI code is created with Vue, and the API code is JavaScript. The
> work focuses on Structured Data on Commons, therefore understanding of the
> MediaWiki API, Wikidata and Structured Data on Commons is needed.
>
> * Mentors: TuukkaH, Susannaanas
> * Codebase: GitHub[2]
> * Phabricator: Ticket[3], Project[4], Microtasks[5]
> * Documentation website[6]
>
> Looking forward to tackling these issues together!
>
> Cheers
> Susanna & Tuukka
>
> [1] https://wikidocumentaries-demo.wmcloud.org/
> [2] https://github.com/Wikidocumentaries
> [3] https://phabricator.wikimedia.org/T329023
> [4] https://phabricator.wikimedia.org/tag/wikidocumentaries/
> [5] https://phabricator.wikimedia.org/T329256
> [6] https://wikidocumentaries.wmcloud.org/wiki/Main_Page
>
> On Sat, Jan 14, 2023 at 1:31 AM Srishti Sethi (ssethi(a)wikimedia.org) wrote:
>
>> Hello everyone,
>>
>> TLDR; Wikimedia will soon be applying as a mentoring organization to *Google
>> Summer of Code 2023* <
>> https://www.mediawiki.org/wiki/Google_Summer_of_Code/2023> [1] and *Outreachy
>> Round 26* <https://www.mediawiki.org/wiki/Outreachy/Round_26> [2]. We
>> are currently working on a list of interesting project ideas to include in
>> the application. If you have some ideas for *coding or non-coding
>> (design, documentation, translation, outreach, research) projects*,
>> share them by *February 7th* here: <
>> https://phabricator.wikimedia.org/T326991> [3]. For non-coding projects
>> that can be promoted via Outreachy, there are only two available slots,
>> which will be allocated to mentors on a first-come, first-served basis.
>>
>> *Timeline*
>> As a mentor, you will engage potential candidates in the application
>> period for both programs between March and April. You will help candidates
>> make small contributions to your project and answer any project-related
>> queries during this time. You will work more closely with the accepted
>> candidates during the coding period between May and August.
>>
>> *Tips for proposing projects*
>> * Follow this task description template when you propose a project in
>> Phabricator: <
>> https://phabricator.wikimedia.org/tag/outreach-programs-projects> [4].
>> You can also use this workboard to pick an idea if you don't have one
>> already. Add the #Google-Summer-of-Code (2023) or #Outreachy (Round 26) tag.
>> * Project should require an experienced developer ~15 days and a newcomer
>> ~3 months to complete.
>> * Each project should have at least two mentors, including one with a
>> technical background.
>> * Ideally, the project has no tight deadlines, a moderate learning curve,
>> and fewer dependencies on Wikimedia's core infrastructure. Projects
>> addressing the needs of a language community are most welcome.
>>
>> Learn more about the roles and responsibilities of mentors on
>> MediaWiki.org: <https://www.mediawiki.org/wiki/Outreachy/Mentors> [5], <
>> https://www.mediawiki.org/wiki/Google_Summer_of_Code/Mentors> [6].
>>
>> Cheers,
>> Srishti
>>
>> [1] https://www.mediawiki.org/wiki/Google_Summer_of_Code/2023
>>
>> [2] https://www.mediawiki.org/wiki/Outreachy/Round_26
>>
>> [3] https://phabricator.wikimedia.org/T326991
>>
>> [4] https://phabricator.wikimedia.org/tag/outreach-programs-projects/
>>
>> [5] https://www.mediawiki.org/wiki/Outreachy/Mentors
>>
>> [6] https://www.mediawiki.org/wiki/Google_Summer_of_Code/Mentors
>>
>> *Srishti Sethi*
>> Senior Developer Advocate
>> Wikimedia Foundation <https://wikimediafoundation.org/>
Hello everyone,
If you are interested in organizing or joining a hackathon event, but
cannot attend the in-person Hackathon event in May in Athens, Greece, this
email is for you!
We encourage communities, user groups or chapters to organize satellite
events connected to the in-person Hackathon. These events are to be
organized autonomously and share the hackathon's purpose: bringing the
global technical community together to connect, hack, run technical
discussions, and explore new ideas.
You can work with your wiki community to organize these events before,
during, or after the main event to onboard newcomers to the technical
aspects of the Wikimedia movement, or to host watch parties or meetups in
your region as an alternative for people who cannot join the in-person
event in Athens.
To obtain help with organizing an event, you can apply for funds via the *Rapid
Grants* maintained by the Wikimedia Foundation. The deadline to apply for
funding is *March 20*. When preparing for your event, you can reach out to
the Hackathon organizing team for support with resources, designing the
program, and guidance on getting involved in the global event.
Learn more about the satellite events, funding process, and a checklist for
organizing on the wiki page: <
https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023/Satellite_events>
[1]
Cheers,
Srishti
On behalf of the Hackathon organizing team
[1] https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2023/Satellite_events
Hello everyone,
The next Research Showcase will be livestreamed next Wednesday, February 15
at 9:30AM PT / 17:30 UTC. The theme is The Free Knowledge Ecosystem.
YouTube stream: https://www.youtube.com/watch?v=8VJmR-3lTac
We welcome you to join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
This month's presentations:
The evolution of humanitarian mapping in OpenStreetMap (OSM) and how it
affects map completeness and inequalities in OSM
By *Benjamin Herfort, Heidelberg Institute for Geoinformation Technology*
Mapping efforts of communities in OpenStreetMap (OSM) over the previous
decade have created a unique global geographic database, which is
accessible to all with no licensing costs. The collaborative maps of OSM
have been used to support humanitarian efforts around the world as well as
to fill important data gaps for implementing major development frameworks
such as the Sustainable Development Goals (SDGs). Besides the well-examined
Global North - Global South bias in OSM, the OSM data as of 2023 shows a
much more spatially diverse spread pattern than previously considered,
which was shaped by regional, socio-economic and demographic factors across
several scales. Humanitarian mapping efforts of the previous decade have
already made OSM more inclusive, helping to diversify and expand the
spatial footprint of the areas mapped. However, methods to quantify and
account for the remaining biases in OSM’s coverage are needed so that
researchers and practitioners will be able to draw the right conclusions,
e.g. about progress towards the SDGs in cities.
Dataset reuse: Toward translating principles to practice
By *Laura Koesten, University of Vienna*
The web provides access to millions of datasets. These data can have
additional impact when used beyond the context for which they were
originally created. But using a dataset beyond the context in which it
originated remains challenging. Simply making data available does not mean
it will be or can be easily used by others. At the same time, we have
little empirical insight into what makes a dataset reusable and which of
the existing guidelines and frameworks have an impact.
In this talk, I will discuss our research on what makes data reusable in
practice. This is informed by a synthesis of literature on the topic, our
studies on how people evaluate and make sense of data, and a case study on
datasets on GitHub. In the case study, we describe a corpus of more than
1.4 million data files from over 65,000 repositories. Building on reuse
features from the literature, we use GitHub’s engagement metrics as proxies
for dataset reuse and devise an initial model, using deep neural networks,
to predict a dataset’s reusability. This demonstrates the practical gap
between principles and actionable insights that might allow data publishers
and tool designers to implement functionalities that facilitate reuse.
We hope you can join us!
Warm regards,
Emily
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation