Hi!
In RDF exports of Wikidata[1] and in the Wikidata Query Service, sitelinks have always been encoded by URL-encoding the sitelink text, i.e. a link to "Category:Stuffed animals" was encoded as /wiki/Category%3AStuffed%20animals.
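For illustration, the old scheme behaves roughly like percent-encoding every character outside the URL "unreserved" set. Below is a minimal Python sketch, assuming the exporter's behavior matches `urllib.parse.quote` with no extra safe characters. The helper name `old_sitelink_path` is hypothetical and is not part of Wikibase.

```python
from urllib.parse import quote

def old_sitelink_path(title: str) -> str:
    """Hypothetical helper: percent-encode the whole sitelink text (old scheme)."""
    # safe="" means ':' and ' ' are encoded too (':' -> %3A, ' ' -> %20)
    return "/wiki/" + quote(title, safe="")

print(old_sitelink_path("Category:Stuffed animals"))
# /wiki/Category%3AStuffed%20animals
```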
While this encoding produces a working link, over time we have come to the conclusion that it is very inconvenient: it does not match how titles are encoded in MediaWiki, and this mismatch makes it harder to look up the links. See more in https://phabricator.wikimedia.org/T131960
We have decided to change the encoding, so that the sitelink above would be encoded as /wiki/Category:Stuffed_animals. The encoding should now match how titles are encoded in the MediaWiki codebase (non-ASCII characters, which MediaWiki encodes, will still be encoded as before).
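As a rough sketch of the new scheme: spaces become underscores, and characters such as ':' are left unescaped, while non-ASCII characters are still percent-encoded as UTF-8. The set of unescaped characters below (`MW_SAFE`) is an assumption modeled on MediaWiki's `wfUrlencode`, and the helper `new_sitelink_path` is hypothetical; treat this as an approximation, not the actual Wikibase code.

```python
from urllib.parse import quote

# Assumed set of characters MediaWiki leaves unescaped in title URLs
# (modeled on wfUrlencode; an approximation, not the authoritative list).
MW_SAFE = ";@$!*(),/~:"

def new_sitelink_path(title: str) -> str:
    """Hypothetical helper approximating MediaWiki-style title encoding."""
    # MediaWiki uses underscores instead of spaces in title URLs
    db_key = title.replace(" ", "_")
    # Percent-encode everything else, including non-ASCII (as UTF-8 bytes)
    return "/wiki/" + quote(db_key, safe=MW_SAFE)

print(new_sitelink_path("Category:Stuffed animals"))  # /wiki/Category:Stuffed_animals
print(new_sitelink_path("Zürich"))                    # /wiki/Z%C3%BCrich
```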
Implementing this change will require a database reload, during which queries may return inconsistent results (some entities may have the new sitelink encoding and some the old one). I apologize in advance for any inconvenience this causes. I will send further announcements when the switch process has started and when it is complete.
Thanks,

[1] https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
Hi!
Heads up: this change has now been merged but not yet deployed. It will be deployed in the next regular Wikidata deployment.
Thanks,
Hi!
Continuing the saga: the change has now been deployed on Wikidata. Unfortunately, this means that for some time, some entities (those edited recently) will have the new sitelink encoding, while others (not edited recently) may still have the old one. As soon as the new dump is produced (next Tuesday), we will begin reloading the servers, after which all data will use the new encoding, matching the encoding in MediaWiki. I apologize in advance for any inconvenience caused by the inconsistent data in the meantime (it shouldn't last more than a week). I will make an announcement (hopefully the final one on this topic :) when it is all done.
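If you consume sitelinks during the transition, one possible workaround is to normalize both forms on your side: decode the title, then re-encode it the new way. A minimal sketch, assuming sitelinks look like `https://<site>/wiki/<title>` and that the unescaped-character set `MW_SAFE` approximates MediaWiki's (both are assumptions; `normalize_sitelink` is a hypothetical helper):

```python
from urllib.parse import quote, unquote

# Assumed MediaWiki-style unescaped characters (approximation)
MW_SAFE = ";@$!*(),/~:"

def normalize_sitelink(url: str) -> str:
    """Hypothetical helper: map old- or new-style sitelink URLs to the new form."""
    prefix, sep, encoded_title = url.partition("/wiki/")
    if not sep:
        return url  # not a /wiki/ link; leave it unchanged
    # Decoding either form yields the same plain title text
    title = unquote(encoded_title).replace(" ", "_")
    return prefix + sep + quote(title, safe=MW_SAFE)

old = "https://en.wikipedia.org/wiki/Category%3AStuffed%20animals"
new = "https://en.wikipedia.org/wiki/Category:Stuffed_animals"
assert normalize_sitelink(old) == normalize_sitelink(new) == new
```

Because both encodings escape '%' itself, decoding is unambiguous, so the same title always normalizes to the same URL regardless of which encoding an entity currently has.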
Thanks,