Hey folks :)
Happy new year everyone. It is surely going to be an exciting one for Wikidata. Over the last weeks I've been thinking a lot about the year ahead of us. One thing is clear to me: It will be about successfully scaling Wikidata and keeping all the amazing things we have achieved in the process.
I've written down my thoughts on the subject in a blog post to kick off some thinking and discussions: http://blog.wikimedia.de/2015/01/03/scaling-wikidata-success-means-making-th...
Cheers Lydia
Hoi, I understand why you used the pie notion; however, Wikidata is more like a 3D ball [1]. A ball that is dense in places and more open in others. If there is one challenge that we have to face and where we will find no help elsewhere, it is the representation of our data in other languages.
Language diversity is, in my opinion, the hardest problem to tackle. Interconnecting to Wikipedia or to other sources is something that is well understood. It will become increasingly easy and obvious as we add more statements. The labels, however... there is not much help to be found. That help exists in dictionaries, but it is a route that is sadly closed for now.
Yes, 2015 will be exciting. Thanks, GerardM
[1] http://ultimategerardm.blogspot.nl/2015/01/wikidata-is-more-like-ball-and-no...
On 3 January 2015 at 15:48, Lydia Pintscher lydia.pintscher@wikimedia.de wrote:
-- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata
Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 10963 Berlin www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
I like this phrase from the blog post:
"Is the data going to be used? Data that is not used is exponentially harder to maintain because less people see it"
To take a specific example, I've been building a name and identifier recognition system, based on data from Freebase, that focuses on certain kinds of spatial regions. I want to underline that this is not an academic project (where, in the worst case, I might proudly announce that I got 81.2% accuracy and beat the last group, which got 80.3%) but a commercial system that (1) needs to be hyperaccurate (at least three nines, if not four) and (2) requires me to fix anything that management or customers find wrong right away.
Another aspect is that I can (barely) reach two nines of accuracy on entity mentions while resolving only about 40% of place names counted once each, because the mentions are concentrated on certain places. Many of the most popular regions need data corrections to resolve correctly, because they tend to be either national capitals, where multiple geographic entities occupy the same land area, or ontologically troubled islands.
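The gap between those two figures comes from how skewed mention frequencies are: a handful of popular places account for most occurrences. A minimal sketch with an entirely invented toy corpus (the names, counts and the `coverage` helper are hypothetical, not taken from the system described above) shows how the share of mentions resolved can be far higher than the share of distinct names resolved:

```python
from collections import Counter


def coverage(mentions: list[str], resolvable: set[str]) -> tuple[float, float]:
    """Return (share of mentions resolved, share of distinct names resolved).

    mentions:   every place-name occurrence in the corpus, duplicates included
    resolvable: the names the knowledge base can resolve correctly
    """
    counts = Counter(mentions)
    resolved_mentions = sum(n for name, n in counts.items() if name in resolvable)
    resolved_names = sum(1 for name in counts if name in resolvable)
    return resolved_mentions / len(mentions), resolved_names / len(counts)


# Invented toy corpus: two popular capitals dominate, followed by a long tail of rare names.
corpus = ["Capital A"] * 50 + ["Capital B"] * 40 + ["Island C"] * 5 + [f"Village {i}" for i in range(5)]
resolvable = {"Capital A", "Capital B", "Island C"}

mention_share, name_share = coverage(corpus, resolvable)
print(f"mentions resolved: {mention_share:.0%}, distinct names resolved: {name_share:.1%}")
# prints: mentions resolved: 95%, distinct names resolved: 37.5%
```

The same arithmetic explains why fixing a few popular but troublesome entities moves mention-level accuracy far more than extending coverage of the long tail.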
Looking in Freebase, I don't find all of the identifiers used in my data set. Another issue is that some containment relationships are missing, because @fbase sometimes couldn't figure out the relative hierarchy of places.
I address both of those issues by applying "fact patches" to my knowledge base.
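One way to picture "fact patches" is as an overlay of additions and retractions applied to a base set of triples before any query runs. The sketch below is a hypothetical illustration, not the system described here: the triple format, the `contained_in` predicate name, the `FactPatch` structure and the place names are all assumptions. It also shows why a missing containment link matters: a transitive lookup only reaches the parent regions once the patch restores the link.

```python
from dataclasses import dataclass, field

# Hypothetical triple format: (subject, predicate, object).
Triple = tuple[str, str, str]


@dataclass
class FactPatch:
    """A local correction: triples to add and triples to retract."""
    add: set[Triple] = field(default_factory=set)
    retract: set[Triple] = field(default_factory=set)


def apply_patches(base: set[Triple], patches: list[FactPatch]) -> set[Triple]:
    """Return a patched copy of the knowledge base; `base` itself is not modified."""
    kb = set(base)
    for patch in patches:
        kb -= patch.retract
        kb |= patch.add
    return kb


def ancestors(kb: set[Triple], place: str, pred: str = "contained_in") -> set[str]:
    """Every place that transitively contains `place` via `pred` links."""
    found: set[str] = set()
    frontier = [place]
    while frontier:
        current = frontier.pop()
        for s, p, o in kb:
            if s == current and p == pred and o not in found:
                found.add(o)
                frontier.append(o)
    return found


# Invented example: the upstream data lacks the link from the district to the island.
base = {("Island Y", "contained_in", "Territory Z")}
patch = FactPatch(add={("District X", "contained_in", "Island Y")})

print(ancestors(base, "District X"))  # set()
kb = apply_patches(base, [patch])
print(sorted(ancestors(kb, "District X")))  # ['Island Y', 'Territory Z']
```

Keeping patches separate from the base data means they can be re-applied whenever a fresh upstream dump arrives.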
In principle I could push these changes back to @fbase, but since mqlwrite is broken and @fbase is heading towards EOL, I won't.
There are other problems, though, that I end up addressing in my rule base, or that require adding different vocabulary to solve. For instance, I get a lot of references to "Hong Kong Island", which is not to be confused with
http://en.wikipedia.org/wiki/Islands_District
It turns out HKI has four administrative districts. With a little more logic I could probably figure out which district each reference falls in, but that may make no real difference to end users, and I'm not sure mail addressed to the "Central and Western District" would even be delivered. So I could make HKI an "honorary" administrative district (something I wouldn't push back upstream).
So you'll notice two themes here. Some of my patches belong in Wikidata, because they fill in fields that Wikidata already has and follow established conventions.
Other patches reflect requirements of my application, and I'd never want to push those upstream, because they are "correct" in the context of my application but "incorrect" or "arguable" in general.
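The two themes suggest tagging every patch with its scope, so that only the conventional fixes are ever candidates for upstream submission. A hypothetical sketch: the `Scope` enum, the `Patch` record, the `is_a` predicate and the example entries are illustrative assumptions, not part of Wikidata or Freebase.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    UPSTREAM = "upstream"  # fills a field the public KB already models, following its conventions
    LOCAL = "local"        # "correct" only for this application; never pushed upstream


@dataclass(frozen=True)
class Patch:
    subject: str
    predicate: str
    obj: str
    scope: Scope
    note: str = ""


def split_by_scope(patches: list[Patch]) -> tuple[list[Patch], list[Patch]]:
    """Separate patches that could be contributed upstream from application-only ones."""
    upstream = [p for p in patches if p.scope is Scope.UPSTREAM]
    local = [p for p in patches if p.scope is Scope.LOCAL]
    return upstream, local


patches = [
    Patch("Island Y", "contained_in", "Territory Z", Scope.UPSTREAM, "missing containment link"),
    Patch("Hong Kong Island", "is_a", "administrative district", Scope.LOCAL,
          "'honorary' district for this application only"),
]
upstream, local = split_by_scope(patches)
print([p.subject for p in upstream])  # ['Island Y']
print([p.subject for p in local])     # ['Hong Kong Island']
```

Both lists are still applied to the local knowledge base; the scope only decides what is ever exported.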
-----
One problem people have consistently had with DBpedia is getting a list of the top cities in the world (by one metric or another). It's hard to do for two reasons:
(1) some facts are absent from DBpedia, and (2) many of the biggest/most important "cities" in the world, such as London and Tokyo, are not, technically, cities.
Success in this project therefore requires not only patching absent or incorrect facts in DBpedia but also creating a vernacular concept of "city" that reflects the common-sense perception.
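A minimal sketch of the "vernacular city" idea, assuming a toy in-memory table rather than a real DBpedia query: the entity names, type labels and population figures are invented placeholders, and the `vernacular_city` flag and `population_patches` map are hypothetical local patches. The ranking runs over the union of formally typed cities and entities that a patch marks as cities in the common-sense sense, with patched populations filling absent facts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Place:
    name: str
    formal_type: str               # type as recorded in the source knowledge base
    population: Optional[int]      # None models a fact that is absent upstream
    vernacular_city: bool = False  # hypothetical local patch: "commonly thought of as a city"


def top_cities(places: list[Place], population_patches: dict[str, int], n: int) -> list[str]:
    """Rank places that are cities formally or vernacularly, using patched populations."""
    ranked: list[tuple[int, str]] = []
    for place in places:
        if place.formal_type != "City" and not place.vernacular_city:
            continue  # neither a formal nor a vernacular city
        population = population_patches.get(place.name, place.population)
        if population is None:
            continue  # still unknown after patching; cannot be ranked
        ranked.append((population, place.name))
    ranked.sort(reverse=True)
    return [name for _, name in ranked[:n]]


# Invented placeholder data; names and figures are not real.
places = [
    Place("Metro A", "Region", 9_000_000, vernacular_city=True),
    Place("Metro B", "Prefecture", None, vernacular_city=True),
    Place("Town C", "City", 2_000_000),
    Place("Town D", "City", 500_000),
    Place("District E", "Region", 3_000_000),
]
population_patches = {"Metro B": 13_000_000}

print(top_cities(places, population_patches, 3))  # ['Metro B', 'Metro A', 'Town C']
```

Without the patches, "Metro B" drops out for lack of a population; without the vernacular flags, neither metro qualifies at all, which is the failure mode described above.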