Hi,
TL;DR: Has anybody considered using Wikidata items of Wikipedia templates to
store multilingual template parameter mappings?
Full explanation:
As in many other projects in the Wikimedia world, templates are one of the
biggest challenges in developing the ContentTranslation extension.
Translating a template between languages is tedious - many templates are
language-specific, many others have a corresponding template but
incompatible parameters, and even when the parameters are compatible, there
is usually no convenient mapping between them. Some work in that direction
was done in DBpedia, but AFAIK it's far from complete.
In ContentTranslation we have a simplistic mechanism for mapping template
parameters between pairs of languages, with a proof of concept for three
templates. We can extend it to more templates, but the question is how
well it can scale.
Some templates shouldn't need such mapping at all - they should pull their
data from Wikidata. This is gradually being done for infoboxes in some
languages, and it's great.
But not all templates can be easily mapped to Wikidata data - for example,
reference templates, various IPA and language templates, quotation
formatting, and so on. For these, parameter mapping could be useful, but
doing it separately for each language pair doesn't seem robust, and reminds
me of the old way in which interlanguage links were stored.
So, has anybody considered using the Wikidata items of templates to store
multilingual template parameter mappings?
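To make the idea concrete, here is a rough sketch in Python of what such a
mapping could look like (this is not the actual ContentTranslation code; the
parameter names, translations and the placeholder item ID are invented for
illustration). The point is that the mapping would be stored once per
template, on its Wikidata item, rather than once per language pair:

# Hypothetical sketch - all names below are made up for illustration.
CITE_WEB_MAPPING = {
    "wikidata_item": "Q0000000",  # placeholder: the template's own item
    # language-neutral keys -> localized parameter names
    "en": {"url": "url", "title": "title", "accessdate": "accessdate"},
    "he": {"url": "url", "title": "כותרת", "accessdate": "תאריך גישה"},
    "es": {"url": "url", "title": "título", "accessdate": "fechaacceso"},
}

def translate_params(params, source_lang, target_lang, mapping=CITE_WEB_MAPPING):
    """Rename template parameters from one language's names to another's,
    going through the language-neutral keys ("url", "title", ...)."""
    to_neutral = {localized: neutral
                  for neutral, localized in mapping[source_lang].items()}
    translated = {}
    for name, value in params.items():
        neutral = to_neutral.get(name)
        if neutral and neutral in mapping[target_lang]:
            translated[mapping[target_lang][neutral]] = value
        # parameters with no mapping are simply dropped in this sketch
    return translated

# translate_params({"título": "Example"}, "es", "he") -> {"כותרת": "Example"}

Adding support for a new language would then mean adding one row to the
item, instead of a new table for every pair that includes that language.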
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
Hi everybody,
With the Structured Data for Commons project about to move into high
gear, it seems to me that there's something the Wikidata community needs
to have a serious discussion about, before APIs start getting designed
and set in stone.
Specifically: when should an object have an item with its own Q-number
created for it on Wikidata? What are the limits? (Are there any limits?)
The position so far seems to be essentially that a Wikidata item is
only created when an object either already has a fully-fledged
Wikipedia article written about it, or reasonably could have one.
So objects that aren't particularly notable typically have not had
Wikidata items made for them.
Indeed, practically the first message Lydia sent to me when I started
trying to work on Commons and Wikidata was to underline to me that
Wikidata objects should generally not be created for individual Commons
files.
But, if I'm reading the initial plans and API thoughts of the Multimedia
team correctly, eg
https://commons.wikimedia.org/w/index.php?title=File%3AStructured_Data_-_Sl…
and
https://docs.google.com/document/d/1tzwGtXRyK3o2ZEfc85RJ978znRdrf9EkqdJ0zVj…
there seems to be the key assumption that, for any image that contains
information relating to something beyond the immediate photograph or
scan, there will be some kind of 'original work' item on main Wikidata
that the file page will be able to reference, such that the 'original
work' Wikidata item will be able to act as a place to locate any
information specifically relating to the original work.
Now in many ways this is a very clean division to be able to make. It
removes any question of having to judge "notability"; and it removes any
ambiguity or diversity of where information might be located -- if the
information relates to the original work, then it will be stored on
Wikidata.
But it would appear to imply a potentially *huge* widening of the
inclusion criteria for Wikidata, and of the number of Wikidata items
that could be created.
So it seems appropriate that the Wikidata community should discuss and
sign off on just what should and should not be considered appropriate,
before things get much further.
For example, a year ago the British Library released 1 million
illustrations from out-of-copyright books, which have increasingly been
uploaded to Commons. Recently the Internet Archive has announced plans
to release a further 12 million, with more images either already being
uploaded or to follow from other major repositories, including eg the
NYPL, the Smithsonian, the Wellcome Foundation, etc, etc.
How many of these images, all scanned from old originals, are going to
need new Q-numbers for those originals? Is this okay? Or would some of
them be a step too far?
For example, for maps (cf this data schema
https://docs.google.com/spreadsheets/d/1Hn8VQ1rBgXj3avkUktjychEhluLQQJl5v6W…
), each map sheet will have its own Northernmost, Southernmost,
Easternmost and Westernmost bounding co-ordinates. Does that mean each map
sheet should have its own Wikidata item?
For book illustrations, perhaps it would be enough just to reference
the edition of the book. But if individual illustrations have their own
artist and engraver details, does that mean the illustration needs to
have its own Wikidata item? Similarly, if the same engraving has
appeared in many books, is that also a sign that it should have its own
Wikidata item?
Similarly, what about old photographs or old postcards? When should
these have their own Wikidata items? If they have a known
creator and creation date, is it simplest just to give them a
Wikidata item, so that such information about an original underlying
work is always looked for on Wikidata? What if multiple copies of the
same postcard or photograph are known, published or re-published at
different times? But the potential number of old postcards and
photographs, like the potential number of old engravings, is *huge*.
What if an engraving was re-issued in different "states" (eg a
re-issued engraving of a place might have been modified if a tower had
been built). When should these get different items?
At
https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Visual_arts#Wikidat…
where I raised some of these issues a couple of weeks ago, there has
even been the suggestion that particular individual impressions of an
engraving might deserve their own separate items; or even that everything
with a separate accession number should, so that if a museum had three
copies of an engraving, we would make three separate items, each carrying
its own accession number and identifying which accession number belonged
to which particular file.
(See also other sections at
https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Visual_arts for
further relevant discussions on how to represent often quite complicated
relations with Wikidata properties).
With enough items, we could re-create and represent essentially the
entire FRBR tree.
We could do this. We may even need to do this, if the MM team's outline for
Commons is to be implemented in its apparent current form.
But it seems to me that we shouldn't just sleepwalk into it.
It does seem to me that this represents (at least potentially) a
*very* large expansion in the number of items, and a widening of the
inclusion criteria, for what Wikidata is going to encompass.
I'm not saying it isn't the right thing to do, but given the potential
scale of the implications, I do think it is something we need to work
through properly as a community, and confirm that it is indeed what we
*want* to do.
All best,
James.
(Note that this is a slightly different, though related, discussion to
the one I raised a few weeks ago as to whether Commons categories -- eg
for particular sets of scans -- should necessarily have their own
Q-numbers on Wikidata, or whether some -- eg some intersection
categories -- should just have an item on Commons data. But it's
clearly related: is the simplest thing just to put items for everything
on Wikidata? Or does one try to keep Wikidata lean, and no larger than
it absolutely needs to be; albeit then having to cope with the
complexity that some categories would have a Q-number, and some would not.)
Hi All,
I have joined the development team of the ProteinBoxBot (
https://www.wikidata.org/wiki/User:ProteinBoxBot) . Our goal is to make
Wikidata the canonical resource for referencing and translating identifiers
for genes and proteins from different species.
Currently adding all genes from the human genome and their related
identifiers to Wikidata takes more than a month to complete. With the
objective of adding other species, as well as updating each of the genomes
frequently, it would be convenient if we could increase this throughput.
Would it be acceptable to increase the throughput by running multiple
instances of ProteinBoxBot in parallel? If so, what would be an acceptable
number of parallel instances of a bot to run? We can run multiple instances
from different geographical locations if necessary.
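To make the question concrete, the kind of parallelism we have in mind is
roughly the following (a minimal Python sketch, not ProteinBoxBot's actual
code; the worker count, the gene list and the "edit" itself are
placeholders). Each instance would work on its own slice of the gene list
and would pass maxlag on every request, so that it automatically backs off
whenever the servers report replication lag:

import time
from multiprocessing import Pool

import requests

API = "https://www.wikidata.org/w/api.php"
MAXLAG = 5       # seconds; the usual politeness setting for bots
N_WORKERS = 4    # hypothetical number of parallel instances

def edit_gene(gene):
    """Placeholder for one bot edit; only the maxlag back-off is sketched.
    A real edit would be an authenticated action=wbeditentity request."""
    while True:
        r = requests.get(API, params={
            "action": "query", "meta": "siteinfo",
            "maxlag": MAXLAG, "format": "json",
        })
        data = r.json()
        if "error" in data and data["error"].get("code") == "maxlag":
            time.sleep(5)  # servers are lagged: wait and retry
            continue
        return gene        # pretend the edit succeeded

if __name__ == "__main__":
    genes = ["BRCA1", "TP53", "CDK2"]  # stand-in for the human gene list
    with Pool(N_WORKERS) as pool:
        pool.map(edit_gene, genes)

Whether four instances (or any other number) is acceptable is exactly what
we would like to know before setting this up.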
Kind regards,
Andra
Hi all,
why are both the "subclass of" and "instance of" properties set on the ethanol (showcase) item?
To me, ethanol is a single concrete alcohol, not a class. There is only one ethanol, with a single chemical formula and structure, so isn't "instance of" the only property that is right for this item?
https://www.wikidata.org/wiki/Q153
Thank you
Alain
Dear all,
I am happy to announce the third release of Wikidata Toolkit [1], the
Java library for programming with Wikidata and Wikibase. The main new
features are:
* Full support for the (now) standard JSON format used by Wikidata
* Huge performance improvements (decompressing and parsing the whole
JSON dump now takes about 15min; was more like 80min before)
* Many new example programs for inspiration and guidance [2]
Maven users can get the library directly from Maven Central (see [1]);
this is the preferred method of installation. There is also an
all-in-one JAR at github [3] and of course the sources [4].
Version 0.3.0 is still in alpha. For the next release, we will focus on
the following tasks:
* Support for a binary format for even faster random access (some of this is
done already, but not quite ready for release yet)
* A command-line tool for data processing/conversion tasks
* Support for storing and querying data
Feedback is very welcome. Developers are also invited to contribute via
github.
Cheers,
Markus
[1] https://www.mediawiki.org/wiki/Wikidata_Toolkit
[2]
https://github.com/Wikidata/Wikidata-Toolkit/tree/master/wdtk-examples
(scroll down for documentation)
[3] https://github.com/Wikidata/Wikidata-Toolkit/releases
(you'll also need to install the third party dependencies manually when
using this)
[4] https://github.com/Wikidata/Wikidata-Toolkit/
Hi,
I was wondering how to resolve the items for safou and safoutier
correctly. There are two entries,
https://www.wikidata.org/wiki/Q2369010 and
https://www.wikidata.org/wiki/Q3461291 ,
both called Dacryodes edulis.
On the French Wikipedia there are two articles, "safoutier" for the
tree and "safou" for the fruit.
I noticed this when I created a redirect "Afrikanische Pflaume"
pointing to "Prunus" on the German Wikipedia. I then pressed "add
links" on the French Wikipedia, which did not find "Afrikanische
Pflaume". Then I pressed "add links" on the German Wikipedia, which
found "safou". Pressing OK then gave the error "an unexpected error
has occurred: $1" (in German: "es ist ein unerwarteter Fehler
aufgetreten: $1").
How is this expected to work, or how could I fix this?
rupert
Hi!
Sorry, I'm quite sure I'm re-opening an issue already discussed but I
can't find where; if so, please share the link.
I'm working with cultural item data on Wikidata, and I'm wondering what
I'm allowed to do when, for instance:
- I want to improve the item Q618719 <https://www.wikidata.org/wiki/Q618719>
- I find information in Google Books API
<https://www.googleapis.com/books/v1/volumes/?q=Asterix%20le%20gaulois>
about this item that could help me fill isbn properties in wikidata
(P212 and P957)
Google Books API's Terms of Service
<https://developers.google.com/books/terms> are elusive regarding the
licence, but it certainly isn't CC0. Still, I guess I'm allowed to
copy the information by hand, right? It's just facts about the item, and I'm
using this API just like I would have used a newspaper or anything else as a
reference, right? But could I automate or semi-automate (à la Wikidata
game) the import process without infringing either Google's or
Wikidata's policies? I would just be doing the exact same thing -
taking facts from somewhere in the world and adding them to Wikidata - but
more efficiently, no?
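For what it's worth, the semi-automated step I have in mind would be no more
than something like the following Python sketch (assuming the licensing
question is answered positively; the actual writing of P212/P957 to the item
would still be a separate, manual or tool-assisted step):

import requests

def fetch_isbns(query):
    """Return the ISBN identifiers of the first matching Google Books volume,
    e.g. {"ISBN_10": "...", "ISBN_13": "..."}."""
    resp = requests.get("https://www.googleapis.com/books/v1/volumes",
                        params={"q": query})
    resp.raise_for_status()
    items = resp.json().get("items", [])
    if not items:
        return {}
    identifiers = items[0].get("volumeInfo", {}).get("industryIdentifiers", [])
    return {i["type"]: i["identifier"] for i in identifiers}

print(fetch_isbns("Asterix le gaulois"))

The question is only whether doing this at scale, instead of looking the
numbers up by hand, changes anything legally.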
My view of this is quite blurry, so thanks in advance for clarification!
Bests,
Max
--
Maxime Lathuilière
maxlath.eu <http://maxlath.eu>
@maxlath
Zorglub27 <https://www.wikidata.org/wiki/User:Zorglub27>
Hi,
Several Wikidata-related extensions are not translatable on
translatewiki.net.
The ones I could find are:
* Wikibase DataModel
* Wikibase DataModel JavaScript
* Wikidata build
* WikimediaBadges
* The various DataValues extensions
All extensions need at least a translatable description for
Special:Version, and some of the above have actual messages to translate.
If I understand correctly, the reason for this is that it's more
convenient for their developers to do code review on GitHub, while
translatewiki's L10n export scripts for MediaWiki extensions work only with
Gerrit. The question is: is it feasible to sync GitHub and Gerrit, so that
these extensions would be easily translatable?
If I understand correctly, something like this is already done for some
MediaWiki extensions, among them SemanticResultFormats and Maps.
Thanks for any assistance.
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
Hi,
The following little change of mine was merged by Aude on September 9:
https://gerrit.wikimedia.org/r/#/c/159070/
As far as I can see, it is not deployed to Wikipedia yet.
It's not really urgent, but it made me curious: What is the deployment
schedule for Wikidata extensions?
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore