Wikidata May 2013

wikidata@lists.wikimedia.org

36 participants
27 discussions

[Wikidata-l] Creating an item from another item

by David Cuenca

Hi there! Now we have properties that require a datatype item. In terms of interface interaction I miss the option to create an item from that field, a "create item" button that would add the missing entry, and let adding properties to it. It is similar to the case of adding qualifiers to an item. How it is now: - A property is added to an item - Type to see if the item exists - If not, click on create new item (link on the left bar) - (1st screen) enter label and description - (2nd screen) enter label and description - copy and paste item number into the property from the item that required it A quicker method could be: - A property is added to an item - Type to see if the item exists - Show a "create item" message at the bottom of the suggestion list - The item is created with the specified label - Further properties can be added to the newly created item in the same way as statements What is your opinion on this method? Cheers David --Micru

10 years, 11 months

[Wikidata-l] 1 en article appears in 2 Q entities

by Jiang BIAN

Hi, Is this expected result on wikidata or just a dirty result caused by unknown issues during porting interlanguage link? http://en.wikipedia.org/wiki/Chitete appears on the following Q entities: http://www.wikidata.org/wiki/Q11203372 http://www.wikidata.org/wiki/Q11202679 Also I noticed on http://sw.wikipedia.org/wiki/Chitete, there is no lnterlang tool on the left panel, although it appears in the Q entity: http://www.wikidata.org/wiki/Q11202679 -- Jiang BIAN This email may be confidential or privileged. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it went to the wrong person. Thanks.

10 years, 11 months

[Wikidata-l] weekly summary #58

by Lydia Pintscher

Heya folks :) I've got a fresh summary of what happened around Wikidata this week for you: http://meta.wikimedia.org/wiki/Wikidata/Status_updates/2013_05_17 Thanks everyone who's helping me write these at http://www.wikidata.org/wiki/Wikidata:Status_updates/Next Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Technical Projects Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

10 years, 12 months

[Wikidata-l] Code review of the Wikidata code

by Denny Vrandečić

Hey, in order to sanity check the code we have written in the Wikidata project, we have asked an external company to review our code and discuss it with the team. The effort was very instructional for us. We want to share the results with you. The report looks dauntingly big, but this is mostly due to the appendixes. The first 20 pages are quite worth a read. Since the Wikidata code is an extension to MediaWiki, a number of the issues raised are also relevant to the MediaWiki code proper. I would hope that this code review can be regarded as a contribution towards the discussion of the status of our shared code-base as well. I will unfortunately not be at the Hackathon, but a number of Wikidata developers will, please feel free to chat them up. Daniel Kinzler is also preparing a presentation to discuss a few lessons learned and ideas at the Hackathon. The review is available through this page: < http://meta.wikimedia.org/wiki/Wikidata/Development/Code_Review> Cheers, Denny -- Project director Wikidata Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin Tel. +49-30-219 158 26-0 | http://wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

10 years, 12 months

[Wikidata-l] Sex Ratios in Wikidata and Authorty Files

by Klein,Max

Hello All, I wanted to share with you a visualisation about sex ratios in Wikidata, after the so called categorygate New York Times article. I'm really excited about how Wikidata is going to allow us to compare Claims data against the interwiki link section. Is there going to be official support for this in phase 3? This is how Items with Property:Sex (P21) compare by language: [http://hangingtogether.org/wp-content/uploads/2013/05/WikidataSexRatiosByLa…] In the full blog post I compare it against Sex data from Library Authority Files if you're curious http://hangingtogether.org/?p=2877 Maximilian Klein Wikipedian in Residence, OCLC +17074787023

10 years, 12 months

[Wikidata-l] Question about wikipedia categories.

by Chris Maloney

I am just curious if there has ever been discussion about the potential for reimplementing / replacing the category system in Wikipedia with semantic tagging in WikiData. It seem to me that the recent kerfuffle with regards to "American women writers" would not have happened if the pages were tagged with simple RDF assertions instead of these convoluted categories. I know, of course, that it would be a huge undertaking, but I just don't see how the category system can continue to scale (I'm amazed it has scaled as well as it has already, of course). I am trying to learn more about wikidata, and have perused the various infos and FAQs for the last two hours, and can't find any discussion of this particular issue. -- Chris

11 years

[Wikidata-l] weekly summary #57

by Lydia Pintscher

Heya folks :) Here's your summary of what happened around Wikidata this week: http://meta.wikimedia.org/wiki/Wikidata/Status_updates/2013_05_10 Help write the next one? http://www.wikidata.org/wiki/Wikidata:Status_updates/Next Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Community Communications for Technical Projects Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

11 years

Re: [Wikidata-l] Question about wikipedia categories.

by Paul A. Houle

Statistical methods can deal with black swans, but you've got to get away from normal distributions and also model the risk that your model is wrong. Since training sets come from the same place sausage comes from, training sets in machine learning rarely teach the algorithm the correct prior distribution of the class. Punch a new prior into the system and it will perform much better. Some kinds of sampling biases can be somewhat overcome. Involvement of multiple people smoothes out individual bias. (Kurzweil's project of stealing a human soul with a neural network is already being scoops by projects that are stealing statistical models of many souls.) Language zone Wikipedias are obviously biased towards the viewpoint of people in that language zone. Mostly that's a good thing, because a Chinese knowledge base that reflected an Anglophone bias would seem unnatural to Chinese speakers. And that's the point. Useful systems don't "eliminate bias" but they are given the bias that they need in order to do their job. I agree categories are most useful when they are the categories you need. The toolbox above can help you estimate these with precision so high that it's difficult to measure. Arnold S isn't the best case for categories because humans, bodybuilders, places, chemicals and such are well ontologized. Look at the collection that comes up for the word "Intersection", http://en.wikipedia.org/wiki/Intersection Most of these are connected to the larger mass through just a few categories that would be hard to express as restriction types. Wikipedia is reasonable to require concepts to have a category because really, if you want to assert something exists and can't find some category that this thing is a member of, I wouldn't be so sure that this thing exists. I'm not sure if there is anything I can't do with the current situation, but bear in mind that I'm going to look at DBpedia, Wikidata and Freebase facts too and be willing to do data cleaning processing and hand cleaning of results that I cannot accept. It's a tricky and somewhat expensive process (though it's cheaper than conventional ontology construction), so cleaner data makes this process cheaper and quicker and available to more end users personalized to their own needs to define the categories they need.

11 years

[Wikidata-l] Pulling item name translation from a linked Wikipedia article name

by Jan Kučera

Hi, will this be implemented? Sounds like a reasonable feature to me... Maybe it could be actually prohibited to locally specify (or actually override) a name of Wikidata item in certain Wwikidata viewing language if that item has Wikipedia link in that particular language... We could save a lot of editing work like this and resolve name differencies, am I right? Kozuch

11 years

Re: [Wikidata-l] Question about wikipedia categories.

by Paul A. Houle

From my viewpoint, biases are an issue of statistical sampling. Wikipedia is an encyclopedia by humans for humans so of course it has a anthropocentric background, in which the mass of all the concepts swirling around the Earth like an atmosphere curves the graph, keeping the Sun in orbit around our world. I find Wikipedia categories useful today, warts and all. They've got two things going for them: (1) Class and out-of-class dichotomies are the atom of ontology. Well-designed categories have an operational definition that allows class members to be determined with practically perfect precision (2) They are densely populated. Look at the categories on this guy's web page http://en.wikipedia.org/wiki/Arnold_Schwarzenegger each one of those categories states a useful and correct fact, even if the organization of those facts is entirely haphazard. For instance, it would be better if he was coded as an "American" and an "Austrian", "Californian", "Los Angelino" and he is also a "Bodybuilder" and an "Actor" and a zillion other things and then infer that he was a "American Bodybuilder", "Austrian Actor" and such. But it's not that easy because he was an "Austrian soldier" but not an "American soldier" and I'd feel uncomfortable calling him an "Austrian Politician". A lot of nuance is encoded in that sticky mess. It's very easy to analyze those categories and produce desired concepts like "Car" and "Bodybuilder" from junky categories like "Front-wheel drive vehicle," "General Motors Concept Cars", "Bodybuilder Actor" and "Actor Bodybuilder", in fact, that's exactly what the semantic web is for. There is so much rich and precise information in the categories that you get great results despite sampling error caused by low recall in the categories. I'd love to see better structure, but not at the cost of fact density or precision. If we can take advantage of the knowledge in the graph to exert gentle pressure that improves categorization in Wikipedia that would be great. It's definitely time for the social industry to move beyond "tags"

11 years

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

Wikidata May 2013