Hi.
I recently started following mediawiki/extensions/Wikibase on Gerrit,
and was quite astonished to find that nearly all of the 100 most recently
updated changes appear to be owned by WMDE employees (the exceptions being
one change by Legoktm and some from L10n-bot). This is not the case with,
for example, mediawiki/core.
While this may be desired by the Wikidata team for corporate reasons, I
feel that encouraging code review by volunteers would empower both
Wikidata and third-party communities with new ways of contributing to
the project and raise awareness of the development team's goals in the
long term.
The messy naming conventions play a role too. For example, Extension:Wikibase
<https://www.mediawiki.org/w/index.php?title=Extension:Wikibase&redirect=no>
is supposed to host technical documentation but instead redirects to the
Wikibase <https://www.mediawiki.org/wiki/Wikibase> portal, while the actual
documentation is split between Extension:Wikibase Repository
<https://www.mediawiki.org/wiki/Extension:Wikibase_Repository> and
Extension:Wikibase Client
<https://www.mediawiki.org/wiki/Extension:Wikibase_Client>, apparently
ignoring the fact that the code is developed in a single repository
(correct me if I'm wrong). To add some more confusion, there's also
Extension:Wikidata build
<https://www.mediawiki.org/wiki/Extension:Wikidata_build> with no
documentation.
And what about wmde on GitHub <https://github.com/wmde> with countless
creatively-named repos? They make life even harder for potential
contributors.
Finally, the ever-changing client-side APIs make gadget development a
pain in the ass.
Sorry if this sounds like a slap in the face, but it had to be said.
Hi,
TL;DR: Did anybody consider using the Wikidata items of Wikipedia templates
to store multilingual template parameter mappings?
Full explanation:
As in many other projects in the Wikimedia world, templates are one of the
biggest challenges in developing the ContentTranslation extension.
Translating a template between languages is tedious: many templates are
language-specific, many others have a corresponding template but
incompatible parameters, and even when the parameters are compatible, there
is usually no convenient mapping. Some work in that direction was done in
DBpedia, but AFAIK it's far from complete.
In ContentTranslation we have a simplistic mechanism for mapping template
parameters between pairs of languages, with a proof of concept for three
templates. We can extend it to more templates, but the question is how
well it can scale.
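To make the scaling concern concrete, here is a rough sketch of what such a
per-language-pair mapping looks like in principle. The structure, template and
parameter names are invented for illustration and are not the actual
ContentTranslation configuration:

# Hypothetical example only: a per-language-pair parameter map for a
# citation-like template.
TEMPLATE_PARAM_MAP = {
    ("en", "es"): {  # source language -> target language
        "Cite web": {
            "target_template": "Cita web",
            "params": {"title": "título", "url": "url", "author": "autor"},
        },
    },
}

def map_template(source_lang, target_lang, name, params):
    """Map a template call using one language pair's table (invented data)."""
    entry = TEMPLATE_PARAM_MAP[(source_lang, target_lang)][name]
    mapped = {entry["params"].get(k, k): v for k, v in params.items()}
    return entry["target_template"], mapped

print(map_template("en", "es", "Cite web",
                   {"title": "Vincent van Gogh", "url": "https://example.org"}))

The drawback is visible immediately: every language pair needs its own table,
so n languages mean on the order of n*(n-1) tables to maintain.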
Some templates shouldn't need such mapping at all - they should pull their
data from Wikidata. This is gradually being done for infoboxes in some
languages, and it's great.
But not all templates can be easily mapped to Wikidata data - for example,
reference templates, various IPA and language templates, quotation
formatting, and so on. For these, parameter mapping could be useful, but
doing it one language pair at a time doesn't seem robust and reminds me of
the old way in which interlanguage links were stored.
So, did anybody consider using the Wikidata items of templates to store
multilingual template parameter mappings?
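Purely as a hypothetical sketch of what I have in mind: if each template's
Wikidata item carried a single mapping from canonical parameter names to the
local names used in every language, any language pair could be derived from
it. Nothing like this exists on Wikidata today; all names below are invented:

# Hypothetical: one multilingual parameter mapping per template, from which
# any source/target language pair can be derived.
CANONICAL_PARAMS = {
    "title":  {"en": "title",  "es": "título", "de": "Titel"},
    "url":    {"en": "url",    "es": "url",    "de": "URL"},
    "author": {"en": "author", "es": "autor",  "de": "Autor"},
}

def map_params(source_lang, target_lang, params):
    """Rename parameters via canonical names; works for any language pair."""
    to_canonical = {names[source_lang]: canonical
                    for canonical, names in CANONICAL_PARAMS.items()
                    if source_lang in names}
    return {CANONICAL_PARAMS[to_canonical[name]].get(target_lang, name): value
            for name, value in params.items() if name in to_canonical}

print(map_params("en", "de",
                 {"title": "De sterrennacht", "author": "Vincent van Gogh"}))

With this shape, adding a language means adding one entry per parameter rather
than a whole new table per language pair - which is exactly the argument for
keeping the mapping in one central, multilingual place such as the template's
Wikidata item.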
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
Hey folks :)
The student team working on data quality is making good progress. For
some reason their emails don't go through to this list, so I am sending
their message for them below.
Cheers
Lydia
Hey :)
We are a team of students from the Hasso Plattner Institute in Potsdam,
Germany. For our bachelor project we're working together with the Wikidata
team to ensure the quality of its data.
On our wiki page (https://www.mediawiki.org/wiki/WikidataQuality) we
introduce our projects in more detail.
One of them deals with constraints, for which we need your input: we want
to model constraints as statements on properties, and the final form of
those statements still needs to be specified.
We hope you like our projects and appreciate your input!
Cheers,
the Wikidata Quality Team
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Greetings,
After a delay in updates to the Structured data on Commons[1] project, I
wanted to catch you up with what has been going on over the past three
months. In short: The project is on hold, but that doesn't mean nothing is
happening.
The meeting in Berlin[2] in October provided the engineering teams with a
lot to start on. Unfortunately the Structured Data on Commons project was
put on hold not too long after this meeting. Development of the actual
Structured data system for Commons will not begin until more resources can
be allocated to it.
The Wikimedia Foundation and Wikimedia Germany have been working to improve
the Wikidata query process on the back end. This is designed to be a
production-grade replacement of WikidataQuery, integrated with search. The
full project is described at MediaWiki.org[3]. This will benefit the
structured data project greatly, since developing a high-level search for
Commons is a desired goal of this project.
The Wikidata development team is working on the arbitrary access feature.
Currently it's only possible to access items that are connected to the
current page. So, for example, on the Vincent van Gogh article you can
access the statements on Q5582, but you can't access those statements on
Commons at Category:Vincent van Gogh or Creator:Vincent van Gogh. With
arbitrary access enabled on Commons we will no longer have this limitation.
This opens up the possibility of using Wikidata data in the Creator,
Institution, Authority control and other templates instead of duplicating
the data (as we do now). This will greatly enhance the usefulness of
Wikidata for Commons.
To use the full potential of arbitrary access, the Commons community needs
to reimplement several templates in Lua. In Lua it's possible to use the
local fields and fall back to Wikidata when the data is not available
locally. Help with this conversion is greatly appreciated. The different
tasks are tracked in Phabricator[4].
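The on-wiki implementation of this would be a Lua module using the Wikibase
client bindings. Purely to illustrate the "use the local field, otherwise
fall back to Wikidata" logic, here is a small sketch in Python against the
public wbgetentities API; the item ID is the Q5582 example from above, and
the function and parameter names are mine:

# Illustration of the "local value first, Wikidata fallback" pattern.
# On Commons this would be a Lua module; this sketch only shows the logic,
# fetching a label from the public API when no local value is supplied.
import json
from urllib.request import urlopen

def get_wikidata_label(item_id, lang="en"):
    url = ("https://www.wikidata.org/w/api.php?action=wbgetentities"
           f"&ids={item_id}&props=labels&languages={lang}&format=json")
    with urlopen(url) as response:
        data = json.load(response)
    return data["entities"][item_id]["labels"][lang]["value"]

def creator_name(local_value, item_id):
    # Prefer the locally supplied template parameter; otherwise ask Wikidata.
    if local_value:
        return local_value
    return get_wikidata_label(item_id)

print(creator_name("", "Q5582"))  # falls back to Wikidata: "Vincent van Gogh"

A Lua module would do the same: check the template parameter first and query
Wikibase only when it is empty.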
Volunteers are continuing to add data about artworks to Wikidata. Sometimes
an institution's website is used as the source, and sometimes data is
transferred from Commons to Wikidata. Wikidata now has almost 35,000 items
about paintings. This is done as part of the Wikidata WikiProject "Sum of
All Paintings"[5]. It helps us learn how to refine the metadata structure
for artworks - experience that will of course be very useful for Commons too.
Additionally, the metadata cleanup drive continues to produce results[6].
The drive, which aims to identify files missing {{information}} or similar
structured data fields and to add such fields when absent, has reduced the
number of files missing information on Commons by almost 100,000. You can
help by looking for files[7] with similarly formatted description pages and
listing them at Commons:Bots/Work requests[8] so that a bot can add the
{{information}} template to them.
At the Amsterdam Hackathon in November 2014, a couple of different models
were developed for how artworks can be viewed on the web using structured
data from Wikidata. You can browse two examples[9][10]. These examples can
give you an idea of the kind of data that file pages could display on-wiki
in the future.
The Structured Data project is a long-term one, and the volunteers and
staff will continue working together to provide the structure and support
in the back-end toward front-end development. There are still many things
to do to help advance the project, and I hope to have more news for you in
the near future. Contact me any time with questions, comments, concerns.
1. https://commons.wikimedia.org/wiki/Commons:Structured_data
2.
https://commons.wikimedia.org/wiki/Commons:Structured_data/Berlin_bootcamp
3. https://www.mediawiki.org/wiki/Wikibase/Indexing
4. https://phabricator.wikimedia.org/T89594
5. https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings
6. https://meta.wikimedia.org/wiki/File_metadata_cleanup_drive
7. https://tools.wmflabs.org/mrmetadata/commons/commons/index.html
8. https://commons.wikimedia.org/wiki/Commons:Bots/Work_requests
9. http://www.zone47.com/crotos/?p=1&p276=190804&y1=1600&y2=2014
10. http://sum.bykr.org/432253
--
Keegan Peterzell
Community Liaison, Product
Wikimedia Foundation
Hi,
In my previous post, I mentioned that there are very few empty-map
issues remaining in the JSON dumps as well. I think there are only two
right now:
https://www.wikidata.org/wiki/Special:EntityData/Q2382164.json
https://www.wikidata.org/wiki/Special:EntityData/Q4383128.json
Both have the empty reference problem (a statement has a reference that
contains no data whatsoever; look for "snaks":[]). Maybe somebody wants
to look into this (it should not really happen, even if the JSON were
serialized correctly).
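In case it helps anyone who wants to poke at these, here is a small sketch of
such a check against the live-exported JSON (Python; it just walks the
statements and flags references whose "snaks" are empty):

# Sketch: flag statements whose references contain no snaks at all, using
# the Special:EntityData JSON export for a given item.
import json
from urllib.request import urlopen

def empty_references(item_id):
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{item_id}.json"
    with urlopen(url) as response:
        entity = json.load(response)["entities"][item_id]
    claims = entity.get("claims") or {}  # tolerate []-instead-of-{} here too
    broken = []
    for statements in claims.values():
        for statement in statements:
            for reference in statement.get("references", []):
                # The reported problem: "snaks" holds no data at all.
                if not reference.get("snaks"):
                    broken.append(statement.get("id"))
    return broken

for qid in ("Q2382164", "Q4383128"):
    print(qid, empty_references(qid))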
Cheers,
Markus
--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/
Hi all,
I am happy to announce the release of Wikidata Toolkit 0.4.0 [1], the
Java library for programming with Wikidata and Wikibase. The main new
features are:
* Full support of a variety of new Wikidata features (including
statements on properties and new datatypes)
* More robust JSON parsing so as to tolerate bugs in the JSON structure
* Several new utilities to copy and filter data
* A new stand-alone command-line client to process the Wikidata dumps
without actually writing your own Java program [2]. The current main use
of this is to generate custom RDF dumps (we use it for the dumps we
publish), but it can also be extended with other processing actions in
the future.
Maven users can get the library directly from Maven Central (see [1]);
this is the preferred method of installation. There is also an
all-in-one JAR on GitHub [3] and of course the sources [4].
Feedback is very welcome. Developers are also invited to contribute via
GitHub.
Cheers,
Markus
[1] https://www.mediawiki.org/wiki/Wikidata_Toolkit
[2] https://www.mediawiki.org/wiki/Wikidata_Toolkit/Client
[3] https://github.com/Wikidata/Wikidata-Toolkit/releases
[4] https://github.com/Wikidata/Wikidata-Toolkit/
Hi,
It's that time of the year again when I am sending a reminder that we
still have broken JSON in the dump files ;-). As usual, the problem is
that empty maps {} are serialized wrongly as empty lists []. I am not
sure if there is any open bug that tracks this, so I am sending an
email. There was one, but it was closed [1].
As you know (I sent an email about this a while ago), there are some
remaining problems of this kind in the JSON dump, and also in the live
exported JSON, e.g.,
https://www.wikidata.org/wiki/Special:EntityData/Q4383128.json
(it uses [] as the value for "snaks": this item has a reference with an
empty list of snaks, which is an error in itself)
However, the situation is considerably worse in the XML dumps, which have
seen less use since the JSON dumps became available but, as it turns out,
are still preferred by some users. Surprisingly (to me), the JSON content
in the XML dumps is still not the same as in the JSON dumps. A large part
of the records in the XML dump is broken because of the map-vs-list issue.
For example, the latest dump of current revisions [2] has countless
instances of the problem. The first is in item Q3261 (empty list for
claims), but you can easily find more by grepping for things like
"claims":[]
It seems that all empty maps are serialized wrongly in this dump
(aliases, descriptions, claims, ...). In contrast, the site's export
simply omits the keys of empty maps entirely; see
https://www.wikidata.org/wiki/Special:EntityData/Q3261.json
The JSON in the JSON dumps behaves the same way.
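For consumers of these dumps, a possible workaround (sketched in Python; the
set of keys is my guess at the map-valued keys, based on the ones mentioned
above) is to normalize wrongly serialized empty lists back into empty maps
while parsing:

# Sketch of a workaround: when reading entity JSON from the dumps, replace
# empty lists under map-valued keys with empty maps.
import json

MAP_KEYS = {"labels", "descriptions", "aliases", "claims", "sitelinks",
            "snaks", "qualifiers"}

def normalize(entity_json):
    """Parse entity JSON and fix the empty-map-as-empty-list issue."""
    def fix(value, key=None):
        if isinstance(value, dict):
            return {k: fix(v, k) for k, v in value.items()}
        if isinstance(value, list):
            if value == [] and key in MAP_KEYS:
                return {}
            return [fix(item) for item in value]
        return value
    return fix(json.loads(entity_json))

# Constructed example input mimicking the problem described above.
broken = '{"id": "Q3261", "claims": [], "labels": {"en": {"value": "x"}}}'
print(normalize(broken))  # "claims" becomes {} instead of []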
Cheers,
Markus
[1] https://github.com/wmde/WikibaseDataModelSerialization/issues/77
[2]
http://dumps.wikimedia.org/wikidatawiki/20150207/wikidatawiki-20150207-page…
--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/
Hey :)
Here are all the Wikidata related submissions for Wikimania:
https://wikimania2015.wikimedia.org/wiki/Category:Wikidata It'd be
awesome if you could go and vote for your favorite ones.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Hey folks :)
Last night we added Wikibooks to Wikidata. Wikibooks' language links
can now be maintained on Wikidata. Please give them a warm welcome and
keep an eye on https://www.wikidata.org/wiki/Wikidata:Wikibooks for
questions if you can.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.