Hi.
I recently started following mediawiki/extensions/Wikibase on Gerrit,
and was quite astonished to find that nearly all of the 100 most recently
updated changes appear to be owned by WMDE employees (the exceptions being
one change by Legoktm and some from L10n-bot). This is not the case with,
for example, mediawiki/core.
While this may be desired by the Wikidata team for corporate reasons, I
feel that encouraging code review by volunteers would empower both
Wikidata and third-party communities with new ways of contributing to
the project and raise awareness of the development team's goals in the
long term.
The messy naming conventions play a role too. For example, Extension:Wikibase
<https://www.mediawiki.org/w/index.php?title=Extension:Wikibase&redirect=no>
is supposed to host technical documentation but instead redirects to the
Wikibase <https://www.mediawiki.org/wiki/Wikibase> portal, while the actual
documentation is split between Extension:Wikibase Repository
<https://www.mediawiki.org/wiki/Extension:Wikibase_Repository> and
Extension:Wikibase Client
<https://www.mediawiki.org/wiki/Extension:Wikibase_Client>, apparently
ignoring the fact that the code is developed in a single repository
(correct me if I'm wrong). To add some more confusion, there's also
Extension:Wikidata build
<https://www.mediawiki.org/wiki/Extension:Wikidata_build> with no
documentation.
And what about wmde on GitHub <https://github.com/wmde> with countless
creatively-named repos? They make life even harder for potential
contributors.
Finally, the ever-changing client-side APIs make gadget development a
pain in the ass.
Sorry if this sounds like a slap in the face, but it had to be said.
Hi,
TL;DR: Did anybody consider using the Wikidata items of Wikipedia templates
to store multilingual template parameter mappings?
Full explanation:
As in many other projects in the Wikimedia world, templates are one of the
biggest challenges in developing the ContentTranslation extension.
Translating a template between languages is tedious: many templates are
language-specific, many others have a corresponding template but
incompatible parameters, and even when the parameters are compatible, there
is usually no convenient mapping. Some work in that direction was done in
DBpedia, but AFAIK it's far from complete.
In ContentTranslation we have a simplistic mechanism for mapping template
parameters between pairs of languages, with a proof of concept for three
templates. We can extend it to more templates, but the question is how
well it can scale.
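To make the scaling concern concrete, here is a rough sketch of what such a
per-language-pair mapping looks like in principle. The structure, template and
parameter names are invented for illustration and are not the actual
ContentTranslation configuration:

# Hypothetical example only: a per-language-pair parameter map for a
# citation-like template.
TEMPLATE_PARAM_MAP = {
    ("en", "es"): {  # source language -> target language
        "Cite web": {
            "target_template": "Cita web",
            "params": {"title": "título", "url": "url", "author": "autor"},
        },
    },
}

def map_template(source_lang, target_lang, name, params):
    """Map a template call using one language pair's table (invented data)."""
    entry = TEMPLATE_PARAM_MAP[(source_lang, target_lang)][name]
    mapped = {entry["params"].get(k, k): v for k, v in params.items()}
    return entry["target_template"], mapped

print(map_template("en", "es", "Cite web",
                   {"title": "Vincent van Gogh", "url": "https://example.org"}))

The drawback is visible immediately: every language pair needs its own table,
so n languages mean on the order of n*(n-1) tables to maintain.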
Some templates shouldn't need such mapping at all - they should pull their
data from Wikidata. This is gradually being done for infoboxes in some
languages, and it's great.
But not all templates can be easily mapped to Wikidata data - for example,
reference templates, various IPA and language templates, quotation
formatting, and so on. For these, parameter mapping could be useful, but
doing it one language pair at a time doesn't seem robust and reminds me of
the old way in which interlanguage links were stored.
So, did anybody consider using the Wikidata items of templates to store
multilingual template parameter mappings?
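Purely as a hypothetical sketch of what I have in mind: if each template's
Wikidata item carried a single mapping from canonical parameter names to the
local names used in every language, any language pair could be derived from
it. Nothing like this exists on Wikidata today; all names below are invented:

# Hypothetical: one multilingual parameter mapping per template, from which
# any source/target language pair can be derived.
CANONICAL_PARAMS = {
    "title":  {"en": "title",  "es": "título", "de": "Titel"},
    "url":    {"en": "url",    "es": "url",    "de": "URL"},
    "author": {"en": "author", "es": "autor",  "de": "Autor"},
}

def map_params(source_lang, target_lang, params):
    """Rename parameters via canonical names; works for any language pair."""
    to_canonical = {names[source_lang]: canonical
                    for canonical, names in CANONICAL_PARAMS.items()
                    if source_lang in names}
    return {CANONICAL_PARAMS[to_canonical[name]].get(target_lang, name): value
            for name, value in params.items() if name in to_canonical}

print(map_params("en", "de",
                 {"title": "De sterrennacht", "author": "Vincent van Gogh"}))

With this shape, adding a language means adding one entry per parameter rather
than a whole new table per language pair - which is exactly the argument for
keeping the mapping in one central, multilingual place such as the template's
Wikidata item.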
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
Hey folks :)
The student team working on data quality is making good progress. For
some reason their emails don't go through to this list, so I am sending
their message for them below.
Cheers
Lydia
Hey :)
We are a team of students from the Hasso Plattner Institute in Potsdam,
Germany. For our bachelor project we're working together with the Wikidata
team to ensure the quality of its data.
On our wiki page (https://www.mediawiki.org/wiki/WikidataQuality) we
introduce our projects in more detail.
One of them deals with constraints, for which we need your input: we want
to model constraints as statements on properties, and the final form of
those statements still needs to be specified.
We hope you like our projects and appreciate your input!
Cheers,
the Wikidata Quality Team
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Greetings,
After a delay in updates to the Structured data on Commons[1] project, I
wanted to catch you up with what has been going on over the past three
months. In short: The project is on hold, but that doesn't mean nothing is
happening.
The meeting in Berlin[2] in October provided the engineering teams with a
lot to start on. Unfortunately the Structured Data on Commons project was
put on hold not too long after this meeting. Development of the actual
Structured data system for Commons will not begin until more resources can
be allocated to it.
The Wikimedia Foundation and Wikimedia Germany have been working to improve
the Wikidata query process on the back end. This is designed to be a
production-grade replacement of WikidataQuery, integrated with search. The
full project is described at MediaWiki.org[3]. This will benefit the
structured data project greatly, since developing a high-level search for
Commons is a desired goal of this project.
The Wikidata development team is working on the arbitrary access feature.
Currently it's only possible to access items that are connected to the
current page. So, for example, on the Vincent van Gogh article you can
access the statements on Q5582, but you can't access those statements on
Commons at Category:Vincent van Gogh or Creator:Vincent van Gogh. With
arbitrary access enabled on Commons we will no longer have this limitation.
This opens up the possibility of using Wikidata data in the Creator,
Institution, Authority control and other templates instead of duplicating
the data (as we do now). This will greatly enhance the usefulness of
Wikidata for Commons.
To use the full potential of arbitrary access, the Commons community needs
to reimplement several templates in Lua. In Lua it's possible to use the
local fields and fall back to Wikidata when the data is not available
locally. Help with this conversion is greatly appreciated. The different
tasks are tracked in Phabricator[4].
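The on-wiki implementation of this would be a Lua module using the Wikibase
client bindings. Purely to illustrate the "use the local field, otherwise
fall back to Wikidata" logic, here is a small sketch in Python against the
public wbgetentities API; the item ID is the Q5582 example from above, and
the function and parameter names are mine:

# Illustration of the "local value first, Wikidata fallback" pattern.
# On Commons this would be a Lua module; this sketch only shows the logic,
# fetching a label from the public API when no local value is supplied.
import json
from urllib.request import urlopen

def get_wikidata_label(item_id, lang="en"):
    url = ("https://www.wikidata.org/w/api.php?action=wbgetentities"
           f"&ids={item_id}&props=labels&languages={lang}&format=json")
    with urlopen(url) as response:
        data = json.load(response)
    return data["entities"][item_id]["labels"][lang]["value"]

def creator_name(local_value, item_id):
    # Prefer the locally supplied template parameter; otherwise ask Wikidata.
    if local_value:
        return local_value
    return get_wikidata_label(item_id)

print(creator_name("", "Q5582"))  # falls back to Wikidata: "Vincent van Gogh"

A Lua module would do the same: check the template parameter first and query
Wikibase only when it is empty.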
Volunteers are continuing to add data about artworks to Wikidata. Sometimes
an institution's website is used as the source, and sometimes data is
transferred from Commons to Wikidata. Wikidata now has almost 35,000 items
about paintings. This is done as part of the Wikidata WikiProject "Sum of
All Paintings"[5]. It helps us learn how to refine the metadata structure
for artworks - experience that will of course be very useful for Commons too.
Additionally, the metadata cleanup drive continues to produce results[6].
The drive, which aims to identify files missing {{information}} or similar
structured data fields and to add such fields when absent, has reduced the
number of files missing information on Commons by almost 100,000. You can
help by looking for files[7] with similarly formatted description pages and
listing them at Commons:Bots/Work requests[8] so that a bot can add the
{{information}} template to them.
At the Amsterdam Hackathon in November 2014, a couple of different models
were developed for how artworks can be viewed on the web using structured
data from Wikidata. You can browse two examples[9][10]. These examples can
give you an idea of the kind of data that file pages could display on-wiki
in the future.
The Structured Data project is a long-term one, and the volunteers and
staff will continue working together to provide the structure and support
in the back-end toward front-end development. There are still many things
to do to help advance the project, and I hope to have more news for you in
the near future. Contact me any time with questions, comments, concerns.
1. https://commons.wikimedia.org/wiki/Commons:Structured_data
2.
https://commons.wikimedia.org/wiki/Commons:Structured_data/Berlin_bootcamp
3. https://www.mediawiki.org/wiki/Wikibase/Indexing
4. https://phabricator.wikimedia.org/T89594
5. https://www.wikidata.org/wiki/Wikidata:WikiProject_sum_of_all_paintings
6. https://meta.wikimedia.org/wiki/File_metadata_cleanup_drive
7. https://tools.wmflabs.org/mrmetadata/commons/commons/index.html
8. https://commons.wikimedia.org/wiki/Commons:Bots/Work_requests
9. http://www.zone47.com/crotos/?p=1&p276=190804&y1=1600&y2=2014
10. http://sum.bykr.org/432253
--
Keegan Peterzell
Community Liaison, Product
Wikimedia Foundation
Hi,
In my previous post, I mentioned that there are very few empty-map
issues remaining in the JSON dumps as well. I think there are only two
right now:
https://www.wikidata.org/wiki/Special:EntityData/Q2382164.json
https://www.wikidata.org/wiki/Special:EntityData/Q4383128.json
Both have the empty reference problem (a statement has a reference that
contains no data whatsoever; look for "snaks":[]). Maybe somebody wants
to look into this (it should not really happen, even if the JSON were
serialized correctly).
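In case it helps anyone who wants to poke at these, here is a small sketch of
such a check against the live-exported JSON (Python; it just walks the
statements and flags references whose "snaks" are empty):

# Sketch: flag statements whose references contain no snaks at all, using
# the Special:EntityData JSON export for a given item.
import json
from urllib.request import urlopen

def empty_references(item_id):
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{item_id}.json"
    with urlopen(url) as response:
        entity = json.load(response)["entities"][item_id]
    claims = entity.get("claims") or {}  # tolerate []-instead-of-{} here too
    broken = []
    for statements in claims.values():
        for statement in statements:
            for reference in statement.get("references", []):
                # The reported problem: "snaks" holds no data at all.
                if not reference.get("snaks"):
                    broken.append(statement.get("id"))
    return broken

for qid in ("Q2382164", "Q4383128"):
    print(qid, empty_references(qid))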
Cheers,
Markus
--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/
Hi all,
I am happy to announce the release of Wikidata Toolkit 0.4.0 [1], the
Java library for programming with Wikidata and Wikibase. The main new
features are:
* Full support of a variety of new Wikidata features (including
statements on properties and new datatypes)
* More robust JSON parsing so as to tolerate bugs in the JSON structure
* Several new utilities to copy and filter data
* A new stand-alone command-line client to process the Wikidata dumps
without actually writing your own Java program [2]. The current main use
of this is to generate custom RDF dumps (we use it for the dumps we
publish), but it can also be extended with other processing actions in
the future.
Maven users can get the library directly from Maven Central (see [1]);
this is the preferred method of installation. There is also an
all-in-one JAR on GitHub [3] and of course the sources [4].
Feedback is very welcome. Developers are also invited to contribute via
GitHub.
Cheers,
Markus
[1] https://www.mediawiki.org/wiki/Wikidata_Toolkit
[2] https://www.mediawiki.org/wiki/Wikidata_Toolkit/Client
[3] https://github.com/Wikidata/Wikidata-Toolkit/releases
[4] https://github.com/Wikidata/Wikidata-Toolkit/
Hi,
It's that time of the year again when I am sending a reminder that we
still have broken JSON in the dump files ;-). As usual, the problem is
that empty maps {} are serialized wrongly as empty lists []. I am not
sure if there is any open bug that tracks this, so I am sending an
email. There was one, but it was closed [1].
As you know (I sent an email about this a while ago), there are some
remaining problems of this kind in the JSON dump, and also in the live
exported JSON, e.g.,
https://www.wikidata.org/wiki/Special:EntityData/Q4383128.json
(it uses [] as the value for "snaks": this item has a reference with an
empty list of snaks, which is an error in itself)
However, the situation is considerably worse in the XML dumps, which have
seen less use since the JSON dumps became available but, as it turns out,
are still preferred by some users. Surprisingly (to me), the JSON content
in the XML dumps is still not the same as in the JSON dumps. A large part
of the records in the XML dump is broken because of the map-vs-list issue.
For example, the latest dump of current revisions [2] has countless
instances of the problem. The first is in item Q3261 (empty list for
claims), but you can easily find more by grepping for things like
"claims":[]
It seems that all empty maps are serialized wrongly in this dump
(aliases, descriptions, claims, ...). In contrast, the site's export
simply omits the keys of empty maps entirely; see
https://www.wikidata.org/wiki/Special:EntityData/Q3261.json
The JSON in the JSON dumps behaves the same way.
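For consumers of these dumps, a possible workaround (sketched in Python; the
set of keys is my guess at the map-valued keys, based on the ones mentioned
above) is to normalize wrongly serialized empty lists back into empty maps
while parsing:

# Sketch of a workaround: when reading entity JSON from the dumps, replace
# empty lists under map-valued keys with empty maps.
import json

MAP_KEYS = {"labels", "descriptions", "aliases", "claims", "sitelinks",
            "snaks", "qualifiers"}

def normalize(entity_json):
    """Parse entity JSON and fix the empty-map-as-empty-list issue."""
    def fix(value, key=None):
        if isinstance(value, dict):
            return {k: fix(v, k) for k, v in value.items()}
        if isinstance(value, list):
            if value == [] and key in MAP_KEYS:
                return {}
            return [fix(item) for item in value]
        return value
    return fix(json.loads(entity_json))

# Constructed example input mimicking the problem described above.
broken = '{"id": "Q3261", "claims": [], "labels": {"en": {"value": "x"}}}'
print(normalize(broken))  # "claims" becomes {} instead of []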
Cheers,
Markus
[1] https://github.com/wmde/WikibaseDataModelSerialization/issues/77
[2]
http://dumps.wikimedia.org/wikidatawiki/20150207/wikidatawiki-20150207-page…
--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/
Hey :)
Here are all the Wikidata related submissions for Wikimania:
https://wikimania2015.wikimedia.org/wiki/Category:Wikidata It'd be
awesome if you could go and vote for your favorite ones.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Hey folks :)
Last night we added Wikibooks to Wikidata. Wikibooks' language links
can now be maintained on Wikidata. Please give them a warm welcome and
keep an eye on https://www.wikidata.org/wiki/Wikidata:Wikibooks for
questions if you can.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.