Wikidata January 2015

wikidata@lists.wikimedia.org

55 participants
31 discussions

by Paul Houle

I fed the Wikidata dump into a JSON profiling tool; in its first stage it identified the unique paths one can follow through the JSON data structures. The table below shows a count of the literal data items found behind each path. We are not just counting how many P31 claims have been made; we are also counting all of the literals inside each node, so the more information qualifies a claim, the bigger the number gets.

Path             Literals    Property
/claims/P31[]    144350720   instance of
/claims/P625[]    35948377   geographic coordinates
/claims/P17[]     35165614   country: sovereign state
/claims/P646[]    31095359   freebase identifier
/claims/P569[]    30881885   date of birth
/claims/P21[]     30466476   sex or gender
/claims/P105[]    29234406   taxon rank
/claims/P225[]    27808194   taxon name
/claims/P131[]    27806448   located in administrative div
/claims/P171[]    25159278   parent taxon

None of those is a surprise: the two great hierarchies (spatial and biological) are represented, and there are properties about people. Oddly, though, the most documented property connected with creative works is P161, which ranks #20.

Anyhow, it is not all claims. At the top level you see:

Path             Literals
/datatype             1328
/id               16647896
/type             16647896
/aliases          17824992
/sitelinks        82865847
/descriptions    112796452
/labels          120721644
/claims          772152821

Everything above /claims is part of what I have been calling the "taxonomic core". There are quite a few reasons to treat this data specially, and I'd guess this solved a chicken-vs-egg problem for WD. In Freebase the taxonomic core is roughly half the mass of the whole thing. The claims are certainly bulked up in Wikidata because of the qualifying information.

If anything is weak about the fundamental data model, it is that aliases and labels are not reified the way the claims are. This is a big deal if you want a usable lexical database. For instance, labels should be taggable as to:

* being potentially offensive (e.g. insults that start with "N")
* being a generic name for a drug versus a brand name for it
* Japanese labels should be available in kanji, hiragana, and romanized form, and should be identifiable that way
* in English we have it easy: you can generate "Mad Lib" style texts correctly if you (1) know which article to use and (2) know how to make both the plural and singular forms. (1) is easy to guess if you have semantic data, and you can get away with being imperfect at (2).
* for German, however, you need to tag by grammatical gender, and the choice of the article is a function of said gender and the relationship between the concept and the predicate, as well as the verb tense
* similar things exist for most of the other languages...
* various organizations have defined viewpoints on terminology; for instance, firefighters want you to say 'flammable' because people might get the morphology wrong on 'inflammable'; in the army you could be sexually harassed if you call your "Rifle" your "Gun"

--
Paul Houle
(607) 539 6254
paul.houle on Skype
ontology2(a)gmail.com
http://legalentityidentifier.info/lei/lookup
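
The path counting described in this message can be approximated with a short Python sketch. This is not the profiling tool the author used; the function names and the simplified sample entity are assumptions made for illustration. It follows the counting rule stated above: every literal is credited to its own path and to every enclosing path, so a claim with more qualifying information contributes a larger number.

```python
from collections import Counter


def count_literals(node, path, counts):
    """Add the number of literals found under `node` to counts[path].

    Dict keys extend the path with "/key"; list elements extend it with "[]".
    Returns the number of literals under `node` so parents can roll it up.
    """
    if isinstance(node, dict):
        total = sum(count_literals(v, f"{path}/{k}", counts) for k, v in node.items())
    elif isinstance(node, list):
        total = sum(count_literals(v, f"{path}[]", counts) for v in node)
    else:
        total = 1  # strings, numbers, booleans and null count as one literal
    counts[path] += total
    return total


def profile(entities):
    """Profile an iterable of already-parsed JSON entities."""
    counts = Counter()
    for entity in entities:
        count_literals(entity, "", counts)
    return counts


# Hypothetical, heavily simplified entity (real dump records carry more fields).
SAMPLE_ENTITY = {
    "id": "Q1",
    "type": "item",
    "labels": {"en": {"language": "en", "value": "universe"}},
    "claims": {
        "P31": [
            {"mainsnak": {"property": "P31", "datatype": "wikibase-item"},
             "rank": "normal"}
        ]
    },
}

if __name__ == "__main__":
    for path, n in profile([SAMPLE_ENTITY]).most_common():
        print(f"{path or '/'}\t{n}")
```

For a full dump, the entities would be streamed one record at a time rather than loaded into memory; that part is left out here.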

9 years, 3 months

[Wikidata-l] Questions about Wikidata subclass-of, instance-of, and part-of, guidelines

by james＠j1w.xyz

I've noticed that many of the items in Wikidata have as their root item (via combinations of subclass-of, instance-of, and part-of relationships) the item named "entity" (Q35120). For example, the musical note named "E♭" is rooted at "entity" via various paths, including the following:

E♭ (Q633464)
  instance-of note (Q263478)
  part-of musical notation (Q233861)
  part-of music (Q638)
  subclass-of art (Q735)
  subclass-of process (Q3249551)
  subclass-of event (Q1190554)
  subclass-of entity (Q35120)

I have the following questions related to the example above:

1) Is the intent that all items in Wikidata have as their root item (via combinations of subclass-of, instance-of, and part-of relationships) the item named "entity" (Q35120)?

2) Is it the intent that any item (e.g. music) that is a subclass-of another item (e.g. art), have as its root item (via only subclass-of relationships) the item named "entity" (Q35120)?

To summarize these questions as they apply to a project on which I'm working: Given a tree that has as its root the item named "entity" (Q35120), would/should it be possible to navigate from the root to every item (QXXXXXXX) in the Wikidata database?

Regards,
James Weaver
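
The two questions can be phrased as reachability checks over a graph of statements. The sketch below is only an illustration: the edge table is copied from the example path in this message (a 2015 snapshot, not current Wikidata data), and the function name and data layout are assumptions. P31, P279 and P361 are the Wikidata property IDs for instance of, subclass of and part of.

```python
from collections import deque

# Edges taken from the example path in the message: item -> [(property, target)].
EDGES = {
    "Q633464": [("P31", "Q263478")],    # E-flat instance-of note
    "Q263478": [("P361", "Q233861")],   # note part-of musical notation
    "Q233861": [("P361", "Q638")],      # musical notation part-of music
    "Q638": [("P279", "Q735")],         # music subclass-of art
    "Q735": [("P279", "Q3249551")],     # art subclass-of process
    "Q3249551": [("P279", "Q1190554")], # process subclass-of event
    "Q1190554": [("P279", "Q35120")],   # event subclass-of entity
}


def reaches_root(item, edges, root="Q35120", allowed=("P31", "P279", "P361")):
    """Breadth-first search from `item`, following only `allowed` properties."""
    seen = {item}
    queue = deque([item])
    while queue:
        current = queue.popleft()
        if current == root:
            return True
        for prop, target in edges.get(current, []):
            if prop in allowed and target not in seen:
                seen.add(target)
                queue.append(target)
    return False
```

Question 1 corresponds to calling `reaches_root(item, EDGES)` with all three properties allowed; question 2 corresponds to restricting `allowed` to `("P279",)` for items that are themselves classes.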

9 years, 4 months

[Wikidata-l] Conflict of Interest policy for Wikidata

by Denny Vrandečić

I found out the other day that there's an item about myself, and I wanted to edit it, and got a weird feeling about it. So I raised the question on the project chat https://www.wikidata.org/wiki/Wikidata:Project_chat#COI_and_editing and got told that an RFC would be a good idea. So I tried one. I don't think it has caused problems yet, though - but it might be easier to discuss these things before they cause problems. https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Conflict_of_Int… Input is highly appreciated.

9 years, 4 months

[Wikidata-l] Queries related question: relationship between queries

by Thomas Douillard

Hi,

I have an open question about Wikidata concepts, partly related to the idea of selecting a template with respect to a query for placeholder articles.

One question about this idea is what to do when several templates are possible for an item. For example, the item with no article may be in the result set of several queries associated with article stub templates, say:

* the query "anything", which could be associated with a totally generic template that shows a Wikibase-page-like article listing all the claims about the item
* a more specific query, "living organism"
* another, even more specific query, such as "animal"
* ...

In this example the result set of each more specific query is obviously a subset of that of each more generic one. In such cases it could be useful to choose the template of the most specific query. In the same spirit as the "subclass of" property, we could create a similar property for queries (or reuse "subclass of" itself). But since no property has a meaning in Wikibase itself, the choice of the template would not be possible using raw Wikibase concepts, which partly defeats the purpose of the idea.

Any thoughts about this problem?

Cheers,
TomT0m
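
One way to picture the "most specific query wins" rule is a small sketch in which each query names its more general parent, much like a "subclass of" statement. Everything here is hypothetical: the query names come from the example above, while the `MORE_GENERAL` mapping and both functions are illustrative assumptions, not an existing Wikibase feature. The sketch also assumes the queries form a tree and does not handle ties between unrelated queries.

```python
# Hypothetical "more general query" relation, mirroring "subclass of".
MORE_GENERAL = {
    "animal": "living organism",
    "living organism": "anything",
    "anything": None,  # the fully generic query has no parent
}


def depth(query, parent_of):
    """Number of steps from `query` up to the most generic query."""
    steps = 0
    while parent_of[query] is not None:
        query = parent_of[query]
        steps += 1
    return steps


def most_specific(matching_queries, parent_of=MORE_GENERAL):
    """Pick the deepest query among those whose result set contains the item."""
    return max(matching_queries, key=lambda q: depth(q, parent_of))
```

An item matched by all three queries would get the "animal" template, while an item matched only by "anything" would fall back to the generic one.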

9 years, 4 months

Re: [Wikidata-l] Organisation Linked Data dataset

by Kingsley Idehen

On 12/18/14 10:29 AM, Bianca Pereira wrote:
> Hello,
>
> I was looking for a Linked Data dataset and I always find very
> challenging to find a dataset related to a given subject. I am
> specifically looking for a dataset (not wikipedia-based such as
> DBPedia, Yago and so on) that contains information about organizations.
>
> I found there was a Linked CrunchBase [1] at some point but it does
> not seem to work anymore. Does anyone know any Linked Data dataset
> about organizations?
>
> Best Regards,
> Bianca
>
> [1] http://cbasewrap.ontologycentral.com/

Bianca,

Here's a little Linked Open Data follow-your-nose sequence that exposes data sources that could be relevant to your quest:

1. https://legalentityidentifier.info/lei/get/787RXPR0UX0O0XUXPZ81 -- https://legalentityidentifier.info/lei/lookup/
2. http://lod.openlinksw.com/describe/?url=http%3A%2F%2Frdf.basekb.com%2Fns%2F… -- LOD Cloud Cache
3. http://lod.openlinksw.com/describe/?url=http%3A%2F%2Frdf.basekb.com%2Fns%2F… -- instances of Company (from :baseKB)
4. http://lod.openlinksw.com/c/IMQDH3A -- by industry.

There's a lot more in the LOD Cloud cache, assuming this piques your interest :)

--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this

9 years, 4 months

[Wikidata-l] How are language fallbacks working for you?

by Lydia Pintscher

Hey folks :)

Language fallbacks have been live for a bit now and I'd like to get some feedback. Are they working for you? Anything where they're horribly wrong? Anything where they could be better? I'd especially like feedback from people who use Wikidata in a language other than English.

Everything I am already aware of is linked as a "blocked by" ticket of https://phabricator.wikimedia.org/T76216

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

9 years, 4 months

[Wikidata-l] How to deal with bad data / vandalism?

by Ben McCann

Hi, I see the page for Google lists Mario Artero as the founder and CEO of Google. This is definitely not correct. What is the procedure for fixing this? I can't tell from the history page which edit added that. http://www.wikidata.org/wiki/Q95 Thanks, Ben -- about.me/benmccann

9 years, 4 months

[Wikidata-l] Scaling Wikidata: success means making the pie bigger

by Lydia Pintscher

Hey folks :) Happy new year everyone. It is surely going to be an exciting one for Wikidata. Over the last weeks I've been thinking a lot about the year ahead of us. One thing is clear to me: It will be about successfully scaling Wikidata and keeping all the amazing things we have achieved in the process. I've written down my thoughts on the subject in a blog post to kick off some thinking and discussions: http://blog.wikimedia.de/2015/01/03/scaling-wikidata-success-means-making-t… Cheers Lydia

9 years, 4 months

[Wikidata-l] weekly summary #139

by Lydia Pintscher

Hey folks :)

Here's your summary of what happened around Wikidata over the past 2 weeks. Enjoy!

Discussions
- Provide your input on article placeholders <https://www.wikidata.org/wiki/Wikidata:Article_placeholder_input> based on Wikidata
- new RfC: CoI editing <https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Conflict_of_Int…>
- Anonymous artists <https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Visual_arts#Anonymo…> at wiki project visual arts

Events <https://www.wikidata.org/wiki/Wikidata:Events>/Press/Blogs <https://www.wikidata.org/wiki/Wikidata:Press_coverage>
- We need to start talking about scaling Wikidata over the next months and years: Scaling Wikidata: success means making the pie bigger <http://blog.wikimedia.de/2015/01/03/scaling-wikidata-success-means-making-t…>
- The next big step for Wikidata—forming a hub for researchers <https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2014-12-31/News_…>
- wikidata4research4all <http://gondwanaland.com/mlog/2014/12/21/wd4r4a/>
- There was a well-attended Wikidata meetup at 31C3 <http://events.ccc.de/congress/2014/wiki/Session:Wikidata:_The_free_knowledg…>

Other Noteworthy Stuff
- Wikidata Query <https://wdq.wmflabs.org/> now runs on multiple, load-balanced, self-restarting servers. Thanks to Yuvi for the help.
- Vandals suck. Don't know how to help us fight them? Here are two pages to get you started: Wikidata:Vandalism <https://www.wikidata.org/wiki/Wikidata:Vandalism> and Wikidata:Abuse filter <https://www.wikidata.org/wiki/Wikidata:Abuse_filter>.
- The Wikidata BEACON generator <https://tools.wmflabs.org/wikidata-todo/beacon.php> was updated by Magnus. It now uses all properties with “formatter URL”, so always up-to-date with target URLs. It is faster, too.
- Number of visual artworks on Wikidata by institution <http://www.zone47.com/crotos/collections.php>

Did you know?
- Newest properties: Terminologia Histologica (TH) <https://www.wikidata.org/wiki/Property:P1694>, Terminologia Embryologica (TE) <https://www.wikidata.org/wiki/Property:P1693>, ICD-9-CM <https://www.wikidata.org/wiki/Property:P1692>, operations and procedures key (OPS) <https://www.wikidata.org/wiki/Property:P1691>, ICD-10-PCS <https://www.wikidata.org/wiki/Property:P1690>, central government debt as a percent of GDP <https://www.wikidata.org/wiki/Property:P1689>, AniDB identifier <https://www.wikidata.org/wiki/Property:P1688>, main property <https://www.wikidata.org/wiki/Property:P1687>, awarded for work <https://www.wikidata.org/wiki/Property:P1686>, Pokémon browser number <https://www.wikidata.org/wiki/Property:P1685>, inscription <https://www.wikidata.org/wiki/Property:P1684>, quote <https://www.wikidata.org/wiki/Property:P1683>, subtitle <https://www.wikidata.org/wiki/Property:P1680>, BBC Your Paintings artwork identifier <https://www.wikidata.org/wiki/Property:P1679>, has vertex figure <https://www.wikidata.org/wiki/Property:P1678>, index case of <https://www.wikidata.org/wiki/Property:P1677>, number suspected <https://www.wikidata.org/wiki/Property:P1676>, number probable <https://www.wikidata.org/wiki/Property:P1675>, number confirmed <https://www.wikidata.org/wiki/Property:P1674>, general formula <https://www.wikidata.org/wiki/Property:P1673>, this taxon is source of <https://www.wikidata.org/wiki/Property:P1672>, Route number <https://www.wikidata.org/wiki/Property:P1671>, LAC identifier <https://www.wikidata.org/wiki/Property:P1670>, CONA <https://www.wikidata.org/wiki/Property:P1669>, ATCvet <https://www.wikidata.org/wiki/Property:P1668>, TGN <https://www.wikidata.org/wiki/Property:P1667>, Chess Club ID <https://www.wikidata.org/wiki/Property:P1666>, Chess Games ID <https://www.wikidata.org/wiki/Property:P1665>, Cycling Database ID <https://www.wikidata.org/wiki/Property:P1664>, ProCyclingStats ID <https://www.wikidata.org/wiki/Property:P1663>, DOI Prefix <https://www.wikidata.org/wiki/Property:P1662>, Alexa rank <https://www.wikidata.org/wiki/Property:P1661>, has index case <https://www.wikidata.org/wiki/Property:P1660>, related property <https://www.wikidata.org/wiki/Property:P1659>, number of faces <https://www.wikidata.org/wiki/Property:P1658>, MPAA film rating <https://www.wikidata.org/wiki/Property:P1657>, unveiled by <https://www.wikidata.org/wiki/Property:P1656>, station number <https://www.wikidata.org/wiki/Property:P1655>, wing configuration <https://www.wikidata.org/wiki/Property:P1654>, TERYT municipality code <https://www.wikidata.org/wiki/Property:P1653>, referee <https://www.wikidata.org/wiki/Property:P1652>, YouTube video identifier <https://www.wikidata.org/wiki/Property:P1651>, BBF identifier <https://www.wikidata.org/wiki/Property:P1650>, Korean Movie Database ID <https://www.wikidata.org/wiki/Property:P1649>, Dictionary of Welsh Biography ID <https://www.wikidata.org/wiki/Property:P1648>
- Showcase items <https://www.wikidata.org/wiki/Wikidata:Showcase_items>: Iggy Azalea <https://www.wikidata.org/wiki/Q2748803>, Helsinki <https://www.wikidata.org/wiki/Q1757>

Development
- Happy new year! :) It'll be a great one for Wikidata!
- Have you filed bugs in the past? Awesome! It'd be super helpful if you have a look at your old bugs and see if they are still relevant. You can find them at https://phabricator.wikimedia.org/maniphest/query/authored/ (make sure you're logged in on Phabricator)

You can see all open bugs related to Wikidata here <https://phabricator.wikimedia.org/maniphest/query/4RotIcw5oINo/#R>

Monthly Tasks
- Hack on one of these <https://phabricator.wikimedia.org/maniphest/query/R8GRzX1eH0tb/#R>.
- Help fix these items <https://www.wikidata.org/wiki/Wikidata:The_Game/Flagged_items> which have been flagged using Wikidata - The Game.
- Help develop the next summary here! <https://www.wikidata.org/wiki/Wikidata:Status_updates/Next>
- Contribute to a Showcase item <https://www.wikidata.org/wiki/Wikidata:Showcase_items>

Anything to add? Please share! :)

Cheers
Lydia

9 years, 4 months

[Wikidata-l] subclass-of vs. instance-of

by james＠j1w.xyz

Having followed Freebase and the announcement about migrating to Wikidata, I'm trying to get up to speed on the structure of Wikidata. I read on the site that relationships such as subclass-of and instance-of are managed by everyone. Looking at automobile (Q1420) I see that it is both subclass-of and instance-of motor road vehicle, which I imagine is not correct. Are there processes in place to manage the integrity of these structural components of Wikidata? Thanks, James Weaver
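
The kind of overlap noticed here can be detected mechanically. The sketch below is only an illustration of such a check, not a description of any process Wikidata actually runs: it flags items whose instance-of (P31) and subclass-of (P279) statements point at the same target. The input layout and the placeholder ID `Q_MOTOR_ROAD_VEHICLE` are assumptions, since the message does not give the target's ID.

```python
def instance_and_subclass_overlap(statements):
    """Return {item: [targets]} where a target appears under both P31 and P279.

    `statements` maps an item ID to {property ID: [target IDs]}.
    A hit is a candidate for human review, not automatically an error.
    """
    flagged = {}
    for item, props in statements.items():
        overlap = set(props.get("P31", [])) & set(props.get("P279", []))
        if overlap:
            flagged[item] = sorted(overlap)
    return flagged


# Hypothetical input modelled on the example in the message.
EXAMPLE_STATEMENTS = {
    "Q1420": {  # automobile
        "P31": ["Q_MOTOR_ROAD_VEHICLE"],   # placeholder ID, not a real QID
        "P279": ["Q_MOTOR_ROAD_VEHICLE"],
    },
    "Q638": {"P279": ["Q735"]},  # music subclass-of art, no overlap
}
```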

9 years, 4 months
