Hello,
We would like to announce the following security and maintenance updates to
the Wikibase 1.35 container image, which include fixes for severe security
issues in MediaWiki and instructions for disabling features in
Elasticsearch to mitigate the recently discovered Log4Shell vulnerability
<https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-44228>.
Here are links to important documentation related to the release, which
includes instructions for updating MediaWiki to 1.35.5 and a security fix
for Wikibase:
- MediaWiki release notes
<https://github.com/wikimedia/mediawiki/blob/REL1_35/RELEASE-NOTES-1.35>
- Wikibase release notes
<https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/Wikibas…>
- Upgrade instructions
<https://github.com/wmde/wikibase-release-pipeline/blob/main/docs/topics/upg…>
If updating your Wikibase installation is not an option, please refer to
the instructions for disabling the vulnerable code in MediaWiki in the
recent security release announcement:
<https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/…>
If you have any questions please feel free to ask on this mailing list or
leave a comment at Talk:Wikibase/FAQ
<https://www.mediawiki.org/wiki/Talk:Wikibase/FAQ>.
Cheers,
--
Mohammed Sadat
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Keep up to date! Current news and exciting stories about Wikimedia,
Wikipedia and Free Knowledge in our newsletter (in German): Subscribe now
<https://www.wikimedia.de/newsletter/>.
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us to achieve our vision!
https://spenden.wikimedia.de
Wikimedia Deutschland – Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/029/42207.
Hello!
I’d like to provide a bit more background and summarize our work on the
new WDQS updater from a technical perspective.
It has been common knowledge that the old updater had its issues, chief
among them:
- Low throughput, which often caused huge spikes of lag that were very
hard to recover from ([1] is a nice example).
- Reliance on Blazegraph to reconcile the data - Blazegraph’s reads affect
writes and vice versa, which quite often caused a cascading failure for
both update latency and query performance.
- Ineffective handling of eventual consistency - this was one of the
reasons for missing data in WDQS. What is worse, we had very low
visibility into what went missing.
We’ll be publishing a series of blog posts that will provide a more
in-depth description of the architecture and the challenges during
development - stay tuned!
In the meantime, I want to explain a few things about the new updater:
- The higher best-case lag is the result of trading low latency for high
consistency - considering the data we lost with the old updater, we think
this approach is better in our situation. We would rather have a complete
data set than a faster, incomplete one. To make sure we keep the lag
manageable, we introduced an SLO [2] and will introduce alerting to keep
the lag under 10 minutes.
- Data is reconciled within the pipeline, which has a dramatically lower
effect on Blazegraph. This should help with updates, which was the goal,
but it also positively affects query engine stability.
- As we previously mentioned in the general announcement, the difference
in throughput is substantial (10 edits/sec vs. 88 edits/sec) - which means
a much faster catch-up and more room to grow for Wikidata. The new updater
can be scaled even further if necessary.
The new Streaming Updater didn’t magically resolve all the issues; there
are still two main ones that we need to address:
- Data loss - while the reconciliation mechanism works better than the
old updater’s, we previously lost updates without any way of knowing about
it other than user feedback - [3] [4]. This is a really bad way of finding
out about issues. The new Streaming Updater can still miss data,
especially due to late events or eventual consistency, as mentioned
before. What has changed, however, is that the new updater has better
inconsistency/late-event reporting, which allows us to build a subsystem
around it to reconcile the data. More information here - [5].
- Blazegraph instability - no matter how fast and stable the new updater
might be, Blazegraph is still the last node in the process. That means the
whole update process is affected by Blazegraph’s instability and will in
turn produce lag. One of the most common causes of that instability is the
so-called “GC death spiral”. A server in that state won’t answer any
queries (a problem in itself), and after a restart its lag will be high
for some time. We are investigating a solution that can help us with
this - [6].
I hope that answers at least some of the concerns already raised. Rest
assured that we are working on many more improvements than just the
updater, all of which are, as always, visible on our backlog board ([7])
and workboard ([8]).
Any and all feedback welcome!
Regards,
Zbyszko
[1]
https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=…
[2] https://grafana-rw.wikimedia.org/d/yCBd7Tdnk/wdqs-lag-slo
[3] https://phabricator.wikimedia.org/T272120
[4] https://phabricator.wikimedia.org/T291609
[5] https://phabricator.wikimedia.org/T279541
[6] https://phabricator.wikimedia.org/T293862
[7] https://phabricator.wikimedia.org/tag/wikidata-query-service/
[8] https://phabricator.wikimedia.org/project/view/1227/
--
Zbyszko Papierski (He/Him)
Senior Software Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi all!
This is a reminder / retroactive breaking change announcement for the
Wikidata Query Service and Wikibase RDF model.
The change was already announced
<https://lists.wikimedia.org/hyperkitty/list/wikidata@lists.wikimedia.org/me…>,
and we invite you to review that announcement if you haven’t seen it
before; but that email was quite a while ago, and we also didn’t send it to
all of the usual channels, so we wanted to send this email to ensure
everyone is aware of the change.
The most important part: queries that tested for blank nodes using the
standard isBlank() SPARQL function must now use wikibase:isSomeValue()
instead.
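For illustration, a query that looks for items whose value for some
property is recorded as “unknown value” would change as sketched below
(P19, “place of birth”, chosen as an arbitrary example; this is not a
query from the original announcement):

```sparql
# Before: unknown values were exposed as blank nodes,
# so the standard isBlank() function matched them.
SELECT ?person WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P19 ?place .
  FILTER(isBlank(?place))
}

# After: use the Wikibase-specific function instead.
SELECT ?person WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:P19 ?place .
  FILTER(wikibase:isSomeValue(?place))
}
```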
Best regards,
—
*Mike Pham* (he/him)
Sr Product Manager, Search
Wikimedia Foundation <https://wikimediafoundation.org/>
Hello all,
The Wikidata ecosystem is a huge galaxy of exciting content, tools and
projects, powered by the communities as well as the organizations working
with the software and the data. For seven years, people have been
gathering, starting projects, developing tools, improving editors’
workflows, filling various gaps, and working together to give more people
more access to more knowledge.
As part of the WikidataCon 2021, we are organizing the *WikidataCon
community awards
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021/Contribute/Communit…>*
to celebrate the work of people and groups involved in Wikidata, and to
highlight some projects nominated by the community.
Until *October 10th*, you can participate and *nominate one or several
Wikidata-related projects* that you like or that are useful to you or the
community. Such a project could be, for example: a community gathering or
other initiative that led to great results (WikiProject, event,
editathon…), a tool (gadget, script, external tool…), or any other action
that improved Wikidata’s data, its editors’ workflows, or outreach.
The nomination process is taking place publicly and collaboratively on this
talk page
<https://www.wikidata.org/wiki/Wikidata_talk:WikidataCon_2021/Contribute/Com…>.
You can also help improve the descriptions of projects that have already
been nominated. After October 10th, the awards committee will select a few
projects that particularly caught their attention and will present them
during the Wikidata community awards ceremony taking place on the first
day of the WikidataCon
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021/Program/Day_1_-_Mai…>.
In order to reach as broad an audience as possible, feel free to share
this message on talk pages, in social media groups, or through other
channels where you are active, so that people from various groups and
levels of experience with Wikidata can participate. Thanks in advance for
your help!
For the Awards committee,
Cheers,
--
Léa Lacroix
Community Engagement Coordinator
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Hello,
As you may know, Wikibase currently does not normalize pagenames/filenames
on save (e.g. underscores are allowed in the input for properties of
datatype Commons media). At the same time, Wikidata’s quality constraints
extension
<https://www.mediawiki.org/wiki/Extension:WikibaseQualityConstraints>
triggers a constraint violation after saving if underscores are used. This
is by design, in keeping with long-established
<https://www.wikidata.org/wiki/Template:Constraint:Commons_link> community
practices. As a result, this inconsistency leaves users with unnecessary
manual work.
We will update Wikibase so that when a new edit is saved via UI or API, and
a pagename/filename is added or changed in that edit, then this
pagename/filename will be normalized on save ("My file_name.jpg" -> "My
file name.jpg").
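As a rough sketch of what this normalization amounts to (the helper name
here is ours, and the real implementation does more, e.g. verifying that
the file exists on Commons):

```python
def normalize_pagename(name: str) -> str:
    """Roughly mimic MediaWiki title normalization for page/file names:
    underscores become spaces, runs of whitespace collapse to one space,
    and surrounding whitespace is trimmed. (Illustrative sketch, not the
    actual Wikibase code.)"""
    return " ".join(name.replace("_", " ").split())

print(normalize_pagename("My file_name.jpg"))  # -> My file name.jpg
```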
More generally, the breaking change is that a user of the Wikibase API may
send one data value when saving an edit, and get back a slightly different
(normalized) data value after the edit was made: it is no longer the case
that data values are either saved unmodified or totally rejected (e.g. if a
file doesn’t exist on Commons). Since this guarantee is being removed with
this breaking change announcement, we may introduce further normalizations
in the future and only announce them as significant changes, not breaking
changes.
The change is currently available on test.wikidata.org and
test-commons.wikimedia.org. It will be deployed on Wikidata on or shortly
after September 6th. If you have any questions or feedback, please feel
free to let us know in this ticket
<https://phabricator.wikimedia.org/T251480>.
Cheers,
Lucas Werkmeister
--
Lucas Werkmeister (he/er)
Full Stack Developer
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Hi all,
At around 1 PM UTC today (Sep 3) we started experiencing stability issues
with WDQS, localized (at least for the moment) to a single datacenter of
the two. Unfortunately, we haven't been able to pinpoint the issue as of
now. We suspect that someone is running a query that affects Blazegraph -
that has happened a few times in the past. Unfortunately, our usual
tactics did not help us find which one.
We are working on identifying the issue, but it's clear that within a few
hours it could bring the service down, so we are working on a quick
workaround. Since we observed that the issue only causes actual service
failures around 2 hours after a restart, for now we are going to introduce
a procedure that restarts servers randomly, so that each server's uptime
is at most around 1 hour. Only one server will be restarted at any given
time. This will cause some queries to be killed when each server is
restarted, but the alternative is worse.
We'll continue working to find the root cause and will keep you informed.
We will also post our progress here: [1].
Regards,
Zbyszko Papierski
[1] https://phabricator.wikimedia.org/T290330
--
Zbyszko Papierski (He/Him)
Senior Software Engineer
Wikimedia Foundation <https://wikimediafoundation.org/>
Dear all,
[Apologies for cross-posting]
I'm posting here because there is an open devOps position at the Open
Science Lab in TIB Hannover where I work, and it might be of interest to
people on this list.
https://www.tib.eu/en/tib/careers-and-apprenticeships/vacancies/details/job-advertisement-no-62-2021
We are looking for someone with experience in OSS / MediaWiki / Wikibase
software (ideally), hence I'm posting here. Please feel free to spread the
word if you know anyone who might be interested, and feel free to reach
out to me directly at lozana.rossenova(a)tib.eu
<mailto:lozana.rossenova@tib.eu> if you have any questions and want to
learn more.
The position is in Germany, but remote work is also possible.
Cheers,
Lozana Rossenova
--
Research Associate
Open Science Lab
This breaking change is relevant for anyone who consumes Wikidata RDF data
through Special:EntityData (rather than the dumps) without using the “dump”
flavor.
When an Item references other entities (e.g. the statement P31:Q5), the
non-dump RDF output (without ?flavor=dump) of that Item would include the
labels and descriptions of the referenced entities (e.g. P31 and Q5) in
all languages. That bloats the output drastically and causes performance
issues. See Special:EntityData/Q1337.rdf
<https://www.wikidata.org/wiki/Special:EntityData/Q1337.rdf> as an example.
We will change this so that for referenced entities, only labels and
descriptions in the request language (set e.g. via ?uselang=) and its
fallback languages are included in the response. For the main entity being
requested, labels, descriptions and aliases are still included in all
languages available, of course.
If you don’t actually need this “stub” data of referenced entities at all,
and are only interested in data about the main entity being requested, we
encourage you to use the “dump” flavor instead (include flavor=dump in the
URL parameters). In that case, this change will not affect you at all,
since the dump flavor includes no stub data, regardless of language.
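As a sketch (reusing Q1337 from the example above; no network access
needed), the two request styles differ only in their URL parameters:

```python
from urllib.parse import urlencode

base = "https://www.wikidata.org/wiki/Special:EntityData/Q1337.rdf"

# After the change: stub labels/descriptions of referenced entities are
# limited to the request language (and its fallbacks), set via uselang.
stub_url = f"{base}?{urlencode({'uselang': 'de'})}"

# If you only need the main entity, request the dump flavor instead:
# it contains no stub data at all, so this change does not affect it.
dump_url = f"{base}?{urlencode({'flavor': 'dump'})}"

print(stub_url)  # ...Q1337.rdf?uselang=de
print(dump_url)  # ...Q1337.rdf?flavor=dump
```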
This change is currently available for testing at test.wikidata.org. It
will be deployed on Wikidata on August 23rd. You are welcome to give us
general feedback by leaving a comment in this ticket
<https://phabricator.wikimedia.org/T285795>.
If you have any questions please do not hesitate to ask.
Cheers,
--
Mohammed Sadat
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
I thought that the "," comma was now being added to the Elasticsearch
token filter as a stopword and excluded from simple search?
Or did I miss something?
Or was it decided NOT to add the U+002C comma, so that we must use the
Advanced Search on Wikidata or the API?
I noticed that the string "foot locker inc" will not show the entity in
the dropdown, but "foot locker, inc." will.
(I've since added the full legal name as an alias to improve
searchability, but I would still like to know the stopword decision.)
Thad
https://www.linkedin.com/in/thadguidry/
https://calendly.com/thadguidry/
Hi everyone,
This change is relevant for everyone who uses the JSON serialization of
Lexeme, Form or Sense entities.
Currently, when a Lexeme, Form or Sense has no statements stored, the
statements are rendered as an empty array [] in JSON. (Example <
https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=L38…>).
We want to serialize them as an empty object {} instead. This change will
ease the deserialization process and bring more consistency to our code,
as nonempty statements are already serialized as objects, not arrays.
The impact of this change will be in the output of wbgetentities and
editing APIs, as well as Special:EntityData and the Lexeme JSON dumps.
If you’re maintaining tools that use lexicographical data, you may want to
check your code to make sure that it reflects this change, e.g. lexeme
forms or senses with no statements are properly deserialized by your tool.
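If your tool parses this JSON, a small defensive accessor can tolerate
both serializations; a minimal sketch (the helper name is ours, while the
"claims" key follows the Wikibase JSON format):

```python
import json

def get_claims(entity: dict) -> dict:
    """Return an entity's statements as a dict, tolerating both the old
    serialization (empty array []) and the new one (empty object {})."""
    claims = entity.get("claims", {})
    if isinstance(claims, list):
        # Old behavior: an empty statements group was serialized as [].
        return {}
    return claims

old = json.loads('{"id": "L38-F1", "claims": []}')
new = json.loads('{"id": "L38-F1", "claims": {}}')
assert get_claims(old) == {} and get_claims(new) == {}
```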
This change is available for testing at test.wikidata.org before deployment
to www.wikidata.org. You are welcome to give us general feedback by leaving
a comment in this ticket <https://phabricator.wikimedia.org/T241422>.
If you have any questions please do not hesitate to ask.
Cheers,
--
Mohammed Sadat
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de