We will soon deploy some fixes for date parsing that particularly affect
Czech, and possibly other languages as well.
Wikidata’s parsing of dates in the Czech language has long been affected by
some issues (T221097 <https://phabricator.wikimedia.org/T221097>), where
some reasonable representations couldn’t be parsed (e.g. 01.02.2023), while
others were parsed incorrectly: for example, 11.12.2023 (11 December 2023)
was parsed as 12 November 2023, and 07.05.1997 (7 May 1997) bizarrely
became 30 June 1997.
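To illustrate the corrected behavior (this is only a sketch, not the actual Wikibase parser): Czech numeric dates are written day-first as DD.MM.YYYY, so a parser should read the first field as the day. A minimal Python version:

```python
from datetime import date, datetime

def parse_czech_numeric_date(text: str) -> date:
    """Parse a Czech-style numeric date (DD.MM.YYYY), day first.

    Illustrative only: it treats the first field as the day and the
    second as the month, matching the corrected behavior described above.
    """
    return datetime.strptime(text.strip(), "%d.%m.%Y").date()

# 11.12.2023 is 11 December 2023, not 12 November 2023
assert parse_czech_numeric_date("11.12.2023") == date(2023, 12, 11)
# 01.02.2023 (with leading zeros) is 1 February 2023
assert parse_czech_numeric_date("01.02.2023") == date(2023, 2, 1)
# 07.05.1997 is 7 May 1997, not 30 June 1997
assert parse_czech_numeric_date("07.05.1997") == date(1997, 5, 7)
```

The real parser handles many more formats and languages; this sketch only captures the day-first convention that the fix restores for these examples.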
Matěj Suchánek <https://www.wikidata.org/wiki/User:Mat%C4%9Bj_Such%C3%A1nek>
has investigated these errors and implemented a solution, which will be
deployed on February 15. As far as we can tell, all the changes it produces
are positive: that is, if the way a date is parsed changes, then the old
behavior was bad, and the change is an improvement. Nevertheless, it’s
possible that some users expected the old behavior, or that some external
programs might even be broken by the change. Users who add time data to
Wikidata should make sure that the date shown to them as a result of their
edit is correct. If you want to test the behavior changes, the new code is
already live on Beta Wikidata.
We are currently looking into other languages that may be affected as well.
If you have any questions or want to provide feedback please leave us a
comment on this ticket <https://phabricator.wikimedia.org/T221097>.
*Community Communications Manager, Wikidata*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0) 30 577 116 2466
I'm happy to report that the WDQS reload <https://phabricator.wikimedia.org/T323096> is now complete. We believe the reload has eliminated the data discrepancies mentioned in the linked ticket. However, please let us know if this is not the case.
Thank you for your patience and have a great rest of your week!
SRE, Search Platform Team
TL;DR: We expect to successfully complete the recent data reload on
Wikidata Query Service soon, but we've encountered multiple failures
related to the size of the graph, and anticipate that this issue may worsen
in the future. Although we succeeded this time, we cannot guarantee that
future reload attempts will be successful given the current trend of the
data reload process. Thank you for your understanding and patience.
WDQS is updated from a stream of recent changes on Wikidata, with a maximum
delay of ~2 minutes. This process was improved as part of the WDQS
Streaming Updater project to ensure data coherence. However, the update
process is still imperfect and can lead to data inconsistencies in some
cases. To address this, we reload the data from dumps a few times per
year to reinitialize the system from a known good state.
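A common way to see how far a WDQS server is behind live Wikidata is to ask for the last-modified timestamp of the data via SPARQL and compare it to the current time. As a hedged sketch (the SPARQL query shown in the comment is the commonly documented lag check; the helper itself is illustrative and only does the offline comparison, assuming you have already fetched the timestamp):

```python
from datetime import datetime, timedelta, timezone

# A WDQS server's data freshness can be queried with SPARQL, e.g.:
#
#   SELECT * WHERE { <http://www.wikidata.org> schema:dateModified ?y }
#
# The helper below computes the lag from such a timestamp; it performs
# no network call itself.

def update_lag(date_modified_iso: str, now=None) -> timedelta:
    """Return how far behind live Wikidata a server's data is."""
    modified = datetime.fromisoformat(date_modified_iso.replace("Z", "+00:00"))
    if now is None:
        now = datetime.now(timezone.utc)
    return now - modified

# Example: a server last updated 90 seconds ago is within the ~2 minute target.
now = datetime(2023, 2, 10, 12, 0, 0, tzinfo=timezone.utc)
lag = update_lag("2023-02-10T11:58:30Z", now=now)
assert lag == timedelta(seconds=90)
assert lag <= timedelta(minutes=2)
```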
The recent reload of data from dumps started in mid-December and was
initially met with some issues related to download and instabilities in
Blazegraph, the database used by WDQS. Loading the data into Blazegraph
takes a couple of weeks due to the size of the graph, and we had multiple
attempts where the reload failed after >90% of the data had been loaded.
Our current understanding is that the failures are caused by a race
condition in Blazegraph: in rare cases, subtle timing changes corrupt the
journal.
We want to reassure you that the last reload job was successful on one of
our servers. The data still needs to be copied over to all of the WDQS
servers, which will take a couple of weeks, but should not bring any
additional issues. However, reloading the full data from dumps is becoming
more complex as the data size grows, and we wanted to let you know why the
process took longer than expected. We understand that data inconsistencies
can be problematic, and we appreciate your patience and understanding while
we work to ensure the quality and consistency of the data on WDQS.
Thank you for your continued support and understanding!
*Guillaume Lederrey* (he/him)
Wikimedia Foundation <https://wikimediafoundation.org/>
If you are interested in organizing or joining a hackathon event, but
cannot attend the in-person Hackathon event in May in Athens, Greece, this
email is for you!
We encourage communities, user groups or chapters to organize satellite
events connected to the in-person Hackathon. These events are to be
organized autonomously and share the hackathon's purpose: bringing the
global technical community together to connect, hack, run technical
discussions, and explore new ideas.
You can work with your wiki community to organize these events before,
during, or after the main event to onboard newcomers to the technical
aspects of the Wikimedia movement, or to host watch parties or meetups in
your region as an alternative for people who cannot join the in-person
event in Athens.
To obtain help with organizing an event, you can apply for funds via the *Rapid
Grants* maintained by the Wikimedia Foundation. The deadline to apply for
funding is *March 20*. When preparing for your event, you can reach out to
the Hackathon organizing team for support with resources, designing the
program, and guidance on getting involved in the global event.
Learn more about the satellite events, funding process, and a checklist for
organizing on the wiki page: <
On behalf of the Hackathon organizing team
Senior Developer Advocate
Wikimedia Foundation <https://wikimediafoundation.org/>
The next Research Showcase will be livestreamed next Wednesday, February 15
at 9:30AM PT / 17:30 UTC. The theme is The Free Knowledge Ecosystem.
YouTube stream: https://www.youtube.com/watch?v=8VJmR-3lTac
We welcome you to join the conversation on IRC at #wikimedia-research. You
can also watch our past research showcases:
This month's presentations:
The evolution of humanitarian mapping in OpenStreetMap (OSM) and how it
affects map completeness and inequalities in OSM
By *Benjamin Herfort, Heidelberg Institute for Geoinformation Technology*
Mapping efforts of communities in OpenStreetMap (OSM) over the previous
decade have created a unique global geographic database, which is
accessible to all with no licensing costs. The collaborative maps of OSM
have been used to support humanitarian efforts around the world as well as
to fill important data gaps for implementing major development frameworks
such as the Sustainable Development Goals (SDGs). Besides the well-examined
Global North - Global South bias in OSM, the OSM data as of 2023 shows a
much more spatially diverse spread pattern than previously considered,
which was shaped by regional, socio-economic and demographic factors across
several scales. Humanitarian mapping efforts of the previous decade have
already made OSM more inclusive, contributing to diversify and expand the
spatial footprint of the areas mapped. However, methods to quantify and
account for the remaining biases in OSM's coverage are needed so that
researchers and practitioners will be able to draw the right conclusions,
e.g. about progress towards the SDGs in cities.
Dataset reuse: Toward translating principles to practice
By *Laura Koesten, University of Vienna*
The web provides access to millions of datasets. These data can have
additional impact when used beyond the context for which they were
originally created. But using a dataset beyond the context in which it
originated remains challenging. Simply making data available does not mean
it will be or can be easily used by others. At the same time, we have
little empirical insight into what makes a dataset reusable and which of
the existing guidelines and frameworks have an impact.
In this talk, I will discuss our research on what makes data reusable in
practice. This is informed by a synthesis of literature on the topic, our
studies on how people evaluate and make sense of data, and a case study on
datasets on GitHub. In the case study, we describe a corpus of more than
1.4 million data files from over 65,000 repositories. Building on reuse
features from the literature, we use GitHub's engagement metrics as proxies
for dataset reuse and devise an initial model, using deep neural networks,
to predict a dataset's reusability. This demonstrates the practical gap
between principles and actionable insights that might allow data publishers
and tool designers to implement functionalities that facilitate reuse.
We hope you can join us!
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation