Hello,
We would like to announce the following security and maintenance updates to
the Wikibase 1.35 container image, which include fixes for severe security
issues in MediaWiki and instructions for disabling features in
Elasticsearch to mitigate the recently discovered Log4Shell vulnerability
<https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228>.
Here are links to important documentation related to the release, which
includes instructions for updating MediaWiki to 1.35.5 and a security fix
for Wikibase:
- MediaWiki release notes
<https://github.com/wikimedia/mediawiki/blob/REL1_35/RELEASE-NOTES-1.35>
- Wikibase release notes
<https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Wikibas…>
- Upgrade instructions
<https://github.com/wmde/wikibase-release-pipeline/blob/main/docs/topics/upg…>
If updating your Wikibase installation is not an option, please refer to
the instructions for disabling the vulnerable code in MediaWiki in the
recent security release announcement:
<https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/…>
If you have any questions please feel free to ask on this mailing list or
leave a comment at Talk:Wikibase/FAQ
<https://www.mediawiki.org/wiki/Talk:Wikibase/FAQ>.
Cheers,
--
Mohammed Sadat
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Keep up to date! Current news and exciting stories about Wikimedia,
Wikipedia and Free Knowledge in our newsletter (in German): Subscribe now
<https://www.wikimedia.de/newsletter/>.
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us to achieve our vision!
https://spenden.wikimedia.de
Wikimedia Deutschland – Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/029/42207.
Hello!
I’d like to provide a bit more background and summarize our work on the
new WDQS updater from a technical perspective.
It has been common knowledge that the old updater had its issues, chief
among them:
- Low throughput, which often caused huge spikes of lag that were very
hard to recover from ([1] is a nice example).
- Reliance on Blazegraph to reconcile the data - Blazegraph’s reads affect
writes and vice versa, which quite often caused a cascading failure for
both update latency and query performance.
- Ineffective handling of eventual consistency - this was one of the
reasons for missing data in WDQS. What is worse, we had very low
visibility into what went missing.
We’ll be publishing a series of blog posts that will provide a more
in-depth description of the architecture and the challenges during
development - stay tuned!
In the meantime, I want to explain a few things about the new updater:
- The higher best-case lag is the result of trading low latency for high
consistency - considering the data we lost with the old updater, we think
this approach is better in our situation. We would rather have a complete
data set than a faster, incomplete one. To make sure we keep the lag
manageable, we introduced an SLO [2] and will introduce alerting to keep
the lag under 10 minutes.
- Data is reconciled within the pipeline, which has a dramatically lower
effect on Blazegraph. This should help with updates, which was the goal,
but it also positively affects query engine stability.
- As we previously mentioned in the general announcement, the difference
in throughput is substantial (10 edits/sec vs. 88 edits/sec) - which means
a much faster catch-up and more room to grow for Wikidata. The new updater
can be scaled even further if necessary.
The new Streaming Updater didn’t magically resolve all the issues; there
are still two main ones that we need to address:
- Data loss - while the reconciliation mechanism works better than the
old updater’s, we previously lost updates without any way of knowing about
it other than user feedback - [3] [4]. This is a really bad way of finding
out about issues. The new Streaming Updater can still miss data,
especially due to late events or eventual consistency, as mentioned
before. What has changed, however, is that the new updater has better
inconsistency/late-event reporting, which allows us to build a subsystem
around it to reconcile the data. More information here - [5].
- Blazegraph instability - no matter how fast and stable the new updater
might be, Blazegraph is still the last node in the process. That means the
whole update process is affected by Blazegraph’s instability and will in
turn produce lag. One of the most common causes of that instability is the
so-called “GC death spiral”. A server in that state won’t answer any
queries (a problem in itself), and after a restart its lag will be high
for some time. We are investigating a solution that can help us with
this - [6].
I hope that answers at least some of the concerns already raised. Rest
assured that we are working on many more improvements than just the
updater, all of which are, as always, visible on our backlog board ([7])
and workboard ([8]).
Any and all feedback welcome!
Regards,
Zbyszko
[1]
https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=…
[2] https://grafana-rw.wikimedia.org/d/yCBd7Tdnk/wdqs-lag-slo
[3] https://phabricator.wikimedia.org/T272120
[4] https://phabricator.wikimedia.org/T291609
[5] https://phabricator.wikimedia.org/T279541
[6] https://phabricator.wikimedia.org/T293862
[7] https://phabricator.wikimedia.org/tag/wikidata-query-service/
[8] https://phabricator.wikimedia.org/project/view/1227/
--
Zbyszko Papierski (He/Him)
Senior Software Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi all!
This is a reminder / retroactive breaking change announcement for the
Wikidata Query Service and Wikibase RDF model.
The change was already announced
<https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/me…>,
and we invite you to review that announcement if you haven’t seen it
before; but that email was quite a while ago, and we also didn’t send it to
all of the usual channels, so we wanted to send this email to ensure
everyone is aware of the change.
The most important part: queries that tested for blank nodes using the
standard isBlank() SPARQL function must now use wikibase:isSomeValue()
instead.
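For illustration, a query that looks for items whose value for some
property is recorded as “unknown value” would change as sketched below
(P19, “place of birth”, chosen as an arbitrary example; this is not a
query from the original announcement):

```sparql
# Before: unknown values were exposed as blank nodes,
# so the standard isBlank() function matched them.
SELECT ?person WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P19 ?place .
  FILTER(isBlank(?place))
}

# After: use the Wikibase-specific function instead.
SELECT ?person WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P19 ?place .
  FILTER(wikibase:isSomeValue(?place))
}
```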
Best regards,
—
*Mike Pham* (he/him)
Sr Product Manager, Search
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello all,
The Wikidata ecosystem is a huge galaxy of exciting content, tools and
projects, powered by the communities as well as the organizations working
with the software and the data. For seven years, people have been
gathering, starting projects, developing tools, improving editors’
workflows, filling various gaps, and working together to give more people
more access to more knowledge.
As part of the WikidataCon 2021, we are organizing the *WikidataCon
community awards
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021/Contribute/Communit…>*
to celebrate the work of people and groups involved in Wikidata, and to
highlight some projects nominated by the community.
Until *October 10th*, you can participate and *nominate one or several
Wikidata-related projects* that you like or that are useful to you or the
community. Such a project could be, for example: a community gathering or
other initiative that led to great results (WikiProject, event,
editathon…), a tool (gadget, script, external tool…), or any other action
that improved Wikidata’s data, its editors’ workflows, or outreach.
The nomination process is taking place publicly and collaboratively on this
talk page
<https://www.wikidata.org/wiki/Wikidata_talk:WikidataCon_2021/Contribute/Com…>.
You can also help improve the descriptions of projects that have already
been nominated. After October 10th, the awards committee will select a few
projects that particularly caught their attention and will present them
during the Wikidata community awards ceremony taking place on the first
day of the WikidataCon
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021/Program/Day_1_-_Mai…>.
In order to reach as broad an audience as possible, feel free to share
this message on talk pages, in social media groups, or through other
channels where you are active, so that people from various groups and
levels of experience with Wikidata can participate. Thanks in advance for
your help!
For the Awards committee,
Cheers,
--
Léa Lacroix
Community Engagement Coordinator
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Hello,
As you may know, Wikibase currently does not normalize pagenames/filenames
on save (e.g. underscores are allowed in the input for properties of
datatype Commons media). At the same time, Wikidata’s quality constraints
extension
<https://www.mediawiki.org/wiki/Extension:WikibaseQualityConstraints>
triggers a constraint violation after saving if underscores are used. This
is by design, in keeping with long-established
<https://www.wikidata.org/wiki/Template:Constraint:Commons_link> community
practices. As a result, this inconsistency leaves users with unnecessary
manual work.
We will update Wikibase so that when a new edit is saved via UI or API, and
a pagename/filename is added or changed in that edit, then this
pagename/filename will be normalized on save ("My file_name.jpg" -> "My
file name.jpg").
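As a rough sketch of what this normalization amounts to (the helper name
here is ours, and the real implementation does more, e.g. verifying that
the file exists on Commons):

```python
def normalize_pagename(name: str) -> str:
    """Roughly mimic MediaWiki title normalization for page/file names:
    underscores become spaces, runs of whitespace collapse to one space,
    and surrounding whitespace is trimmed. (Illustrative sketch, not the
    actual Wikibase code.)"""
    return " ".join(name.replace("_", " ").split())

print(normalize_pagename("My file_name.jpg"))  # -> My file name.jpg
```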
More generally, the breaking change is that a user of the Wikibase API may
send one data value when saving an edit, and get back a slightly different
(normalized) data value after the edit was made: it is no longer the case
that data values are either saved unmodified or totally rejected (e.g. if a
file doesn’t exist on Commons). Since this guarantee is being removed with
this breaking change announcement, we may introduce further normalizations
in the future and only announce them as significant changes, not breaking
changes.
The change is currently available on test.wikidata.org and
test-commons.wikimedia.org. It will be deployed on Wikidata on or shortly
after September 6th. If you have any questions or feedback, please feel
free to let us know in this ticket
<https://phabricator.wikimedia.org/T251480>.
Cheers,
Lucas Werkmeister
--
Lucas Werkmeister (he/er)
Full Stack Developer
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Hi all,
At around 1 PM UTC today (Sep 3) we started experiencing stability issues
with WDQS, localized (at least for the moment) to a single datacenter of
the two. Unfortunately, we haven't been able to pinpoint the issue as of
now. We suspect that someone is running a query that affects Blazegraph -
that has happened a few times in the past. Unfortunately, our usual
tactics did not help us find which one.
We are working on identifying the issue, but it's clear that within a few
hours it could bring the service down, so we are working on a quick
workaround. Since we observed that the issue only causes actual service
failures around 2 hours after a restart, for now we are going to introduce
a procedure that restarts servers randomly, so that each server's uptime
is at most around 1 hour. Only one server will be restarted at any given
time. This will cause some queries to be killed when each server is
restarted, but the alternative is worse.
We'll continue working to find the root cause and will keep you informed.
We will also post our progress here: [1].
Regards,
Zbyszko Papierski
[1] https://phabricator.wikimedia.org/T290330
--
Zbyszko Papierski (He/Him)
Senior Software Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Dear all,
[Apologies for cross-posting]
I'm posting here because there is an open devOps position at the Open
Science Lab in TIB Hannover where I work, and it might be of interest to
people on this list.
https://www.tib.eu/en/tib/careers-and-apprenticeships/vacancies/details/job-advertisement-no-62-2021
We are looking for someone with experience in OSS / MediaWiki / Wikibase
software (ideally), hence I'm posting here. Please feel free to spread the
word if you know anyone who might be interested, and feel free to reach
out to me directly at lozana.rossenova(a)tib.eu
<mailto:lozana.rossenova@tib.eu> if you have any questions and want to
learn more.
The position is in Germany, but remote work is also possible.
Cheers,
Lozana Rossenova
--
Research Associate
Open Science Lab
This breaking change is relevant for anyone who consumes Wikidata RDF data
through Special:EntityData (rather than the dumps) without using the “dump”
flavor.
When an Item references other entities (e.g. the statement P31:Q5), the
non-dump RDF output (without ?flavor=dump) of that Item would include the
labels and descriptions of the referenced entities (e.g. P31 and Q5) in
all languages. That bloats the output drastically and causes performance
issues. See Special:EntityData/Q1337.rdf
<https://www.wikidata.org/wiki/Special:EntityData/Q1337.rdf> as an example.
We will change this so that for referenced entities, only labels and
descriptions in the request language (set e.g. via ?uselang=) and its
fallback languages are included in the response. For the main entity being
requested, labels, descriptions and aliases are still included in all
languages available, of course.
If you don’t actually need this “stub” data of referenced entities at all,
and are only interested in data about the main entity being requested, we
encourage you to use the “dump” flavor instead (include flavor=dump in the
URL parameters). In that case, this change will not affect you at all,
since the dump flavor includes no stub data, regardless of language.
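As a sketch (reusing Q1337 from the example above; no network access
needed), the two request styles differ only in their URL parameters:

```python
from urllib.parse import urlencode

base = "https://www.wikidata.org/wiki/Special:EntityData/Q1337.rdf"

# After the change: stub labels/descriptions of referenced entities are
# limited to the request language (and its fallbacks), set via uselang.
stub_url = f"{base}?{urlencode({'uselang': 'de'})}"

# If you only need the main entity, request the dump flavor instead:
# it contains no stub data at all, so this change does not affect it.
dump_url = f"{base}?{urlencode({'flavor': 'dump'})}"

print(stub_url)  # ...Q1337.rdf?uselang=de
print(dump_url)  # ...Q1337.rdf?flavor=dump
```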
This change is currently available for testing at test.wikidata.org. It
will be deployed on Wikidata on August 23rd. You are welcome to give us
general feedback by leaving a comment in this ticket
<https://phabricator.wikimedia.org/T285795>.
If you have any questions please do not hesitate to ask.
Cheers,
--
Mohammed Sadat
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
I thought that the "," comma was now being added to the Elasticsearch
token filter as a stopword and excluded from simple search?
Or did I miss something?
Or was it decided NOT to add the U+002C comma, so that we must use the
Advanced Search on Wikidata or the API?
I noticed that the string "foot locker inc" will not show the entity in
the dropdown, but "foot locker, inc." will.
(I've since added the full legal name as an alias to improve
searchability, but I would still like to know the stopword decision.)
Thad
https://www.linkedin.com/in/thadguidry/
https://calendly.com/thadguidry/
Hi everyone,
This change is relevant for everyone who uses the JSON serialization of
Lexeme, Form or Sense entities.
Currently, when a Lexeme, Form or Sense has no statements stored, the
statements are rendered as an empty array [] in JSON. (Example <
https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=L38…>).
We want to serialize them as an empty object {} instead. This change will
ease the deserialization process and bring more consistency to our code,
as nonempty statements are already serialized as objects, not arrays.
The impact of this change will be in the output of wbgetentities and
editing APIs, as well as Special:EntityData and the Lexeme JSON dumps.
If you’re maintaining tools that use lexicographical data, you may want to
check your code to make sure that it reflects this change, e.g. lexeme
forms or senses with no statements are properly deserialized by your tool.
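If your tool parses this JSON, a small defensive accessor can tolerate
both serializations; a minimal sketch (the helper name is ours, while the
"claims" key follows the Wikibase JSON format):

```python
import json

def get_claims(entity: dict) -> dict:
    """Return an entity's statements as a dict, tolerating both the old
    serialization (empty array []) and the new one (empty object {})."""
    claims = entity.get("claims", {})
    if isinstance(claims, list):
        # Old behavior: an empty statements group was serialized as [].
        return {}
    return claims

old = json.loads('{"id": "L38-F1", "claims": []}')
new = json.loads('{"id": "L38-F1", "claims": {}}')
assert get_claims(old) == {} and get_claims(new) == {}
```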
This change is available for testing at test.wikidata.org before deployment
to www.wikidata.org. You are welcome to give us general feedback by leaving
a comment in this ticket <https://phabricator.wikimedia.org/T241422>.
If you have any questions please do not hesitate to ask.
Cheers,
--
Mohammed Sadat
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de