We now have properties whose datatype is item. In terms of the interface,
I miss an option to create an item directly from that field: a "create
item" button that would add the missing entry and allow properties to be
added to it. This is similar to the case of adding qualifiers to an item.
How it is now:
- A property is added to an item
- Type to see if the item exists
- If not, click on create new item (link on the left bar)
- (1st screen) enter label and description
- (2nd screen) enter label and description
- Copy and paste the new item's number into the property of the item that required it
A quicker method could be:
- A property is added to an item
- Type to see if the item exists
- Show a "create item" message at the bottom of the suggestion list
- The item is created with the specified label
- Further properties can be added to the newly created item in the same way
What is your opinion on this method?
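
As a rough illustration of the quicker method, the suggestion list could be built as in the sketch below. This is not Wikibase code: `build_suggestions` and the sample labels are hypothetical, and the rule of offering the "create item" entry only when no label matches exactly is my own assumption.

```python
def build_suggestions(query, existing_labels):
    """Return labels matching the typed text, with a create-item entry last.

    Hypothetical helper: the create entry is appended only when the typed
    text does not exactly match an existing label (an assumption).
    """
    needle = query.strip().lower()
    matches = [label for label in existing_labels if needle in label.lower()]
    if needle and not any(label.lower() == needle for label in matches):
        # Entry shown at the bottom of the suggestion list.
        matches.append(f'Create item "{query.strip()}"')
    return matches


labels = ["Berlin", "Bern", "Paris"]  # made-up sample labels
partial = build_suggestions("Ber", labels)
exact = build_suggestions("berlin", labels)
print(partial)
print(exact)
```

Selecting the last entry would then create the item with the typed label, as in the fourth step of the list.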
In order to sanity-check the code we have written in the Wikidata project,
we asked an external company to review our code and discuss it with
the team. The effort was very instructive for us.
We want to share the results with you. The report looks dauntingly big, but
this is mostly due to the appendixes. The first 20 pages are quite worth a
Since the Wikidata code is an extension to MediaWiki, a number of the
issues raised are also relevant to the MediaWiki code proper. I would hope
that this code review can be regarded as a contribution towards the
discussion of the status of our shared code-base as well.
I will unfortunately not be at the Hackathon, but a number of Wikidata
developers will be; please feel free to chat them up. Daniel Kinzler is also
preparing a presentation to discuss a few lessons learned and ideas at the
The review is available through this page: <
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a non-profit by
the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
I wanted to share with you a visualisation of sex ratios in Wikidata, following the so-called "categorygate" New York Times article. I'm really excited about how Wikidata is going to allow us to compare claims data against the interwiki link section. Is there going to be official support for this in phase 3?
This is how Items with Property:Sex (P21) compare by language:
If you're curious, in the full blog post I compare it against sex data from library authority files: http://hangingtogether.org/?p=2877
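
For anyone who wants to try a comparison like this, the tally behind it can be sketched in a few lines. This is only an outline under stated assumptions: the `items` records are made-up sample data in a simplified shape (not the Wikidata export format), `sex_ratio_by_language` is a hypothetical helper, and Q6581097 / Q6581072 are taken to be the P21 values for male / female.

```python
from collections import Counter, defaultdict

MALE, FEMALE = "Q6581097", "Q6581072"  # assumed P21 values for male / female

# Made-up sample records: an item, its P21 value, and the language codes
# of the Wikipedias that have an article linked to it.
items = [
    {"id": "a", "P21": MALE, "languages": ["en", "de"]},
    {"id": "b", "P21": FEMALE, "languages": ["en"]},
    {"id": "c", "P21": MALE, "languages": ["de", "fr"]},
    {"id": "d", "P21": FEMALE, "languages": ["en", "fr"]},
    {"id": "e", "P21": None, "languages": ["en"]},  # no P21 claim, skipped
]


def sex_ratio_by_language(records):
    """Return {language: share of female items among male/female items}."""
    counts = defaultdict(Counter)
    for record in records:
        if record["P21"] not in (MALE, FEMALE):
            continue
        for lang in record["languages"]:
            counts[lang][record["P21"]] += 1
    return {
        lang: c[FEMALE] / (c[MALE] + c[FEMALE])
        for lang, c in counts.items()
    }


ratios = sex_ratio_by_language(items)
for lang in sorted(ratios):
    print(f"{lang}: {ratios[lang]:.0%} female")
```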
Wikipedian in Residence, OCLC
I am just curious if there has ever been discussion about the
potential for reimplementing / replacing the category system in
Wikipedia with semantic tagging in Wikidata. It seems to me that the
recent kerfuffle with regards to "American women writers" would not
have happened if the pages were tagged with simple RDF assertions
instead of these convoluted categories. I know, of course, that it
would be a huge undertaking, but I just don't see how the category
system can continue to scale (I'm amazed it has scaled as well as it
has already, of course).
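
To make the idea concrete, here is a minimal sketch of what I mean by simple assertions. Everything in it is illustrative: the triples, the predicate names and the `subjects_matching` helper are hypothetical, and plain Python tuples stand in for RDF.

```python
# Hypothetical (subject, predicate, object) assertions.
triples = {
    ("Author A", "nationality", "American"),
    ("Author A", "gender", "female"),
    ("Author A", "occupation", "writer"),
    ("Author B", "nationality", "American"),
    ("Author B", "gender", "male"),
    ("Author B", "occupation", "writer"),
    ("Author C", "nationality", "Canadian"),
    ("Author C", "gender", "female"),
    ("Author C", "occupation", "writer"),
}


def subjects_matching(facts, **constraints):
    """Return the subjects that carry every requested predicate=object pair."""
    wanted = set(constraints.items())
    by_subject = {}
    for subject, predicate, obj in facts:
        by_subject.setdefault(subject, set()).add((predicate, obj))
    return sorted(s for s, pairs in by_subject.items() if wanted <= pairs)


american_writers = subjects_matching(
    triples, nationality="American", occupation="writer"
)
american_women_writers = subjects_matching(
    triples, nationality="American", occupation="writer", gender="female"
)
print(american_writers)
print(american_women_writers)
```

The point of the sketch is that the narrower set is computed as an intersection at query time, so nobody has to be moved out of "American writers" in order to appear among "American women writers".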
I am trying to learn more about Wikidata, and have perused the various
infos and FAQs for the last two hours, and can't find any discussion
of this particular issue.
Statistical methods can deal with black swans, but you've got to get
away from normal distributions and also model the risk that your model is
Since training sets come from the same place sausage comes from,
training sets in machine learning rarely teach the algorithm the correct
prior distribution of the class. Punch a new prior into the system and it
will perform much better.
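
A minimal sketch of "punching in a new prior", assuming the classifier outputs posterior probabilities and that the class prior implied by the training set is known and nonzero. Each posterior is reweighted by the ratio of the new prior to the training prior and the result is renormalized; `adjust_for_prior` and the numbers are hypothetical.

```python
def adjust_for_prior(posterior, train_prior, new_prior):
    """Reweight class posteriors for a different class prior.

    Each class gets posterior * new_prior / train_prior, and the weights
    are then renormalized to sum to 1. Assumes all training priors are > 0.
    """
    weighted = {
        c: posterior[c] * new_prior[c] / train_prior[c] for c in posterior
    }
    total = sum(weighted.values())
    return {c: w / total for c, w in weighted.items()}


# Made-up numbers: the training set was balanced, but in the real
# population the positive class is rare.
posterior = {"pos": 0.6, "neg": 0.4}
train_prior = {"pos": 0.5, "neg": 0.5}
new_prior = {"pos": 0.1, "neg": 0.9}

adjusted = adjust_for_prior(posterior, train_prior, new_prior)
print(adjusted)
```

With these numbers a prediction that leaned positive under the balanced training prior leans clearly negative once the rarer real-world prior is applied.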
Some kinds of sampling biases can be somewhat overcome. Involvement of
multiple people smooths out individual bias. (Kurzweil's project of
stealing a human soul with a neural network is already being scooped by
projects that are stealing statistical models of many souls.)
Language zone Wikipedias are obviously biased towards the viewpoint of
people in that language zone. Mostly that's a good thing, because a
Chinese knowledge base that reflected an Anglophone bias would seem
unnatural to Chinese speakers.
And that's the point. Useful systems don't "eliminate bias" but they
are given the bias that they need in order to do their job.
I agree categories are most useful when they are the categories you
need. The toolbox above can help you estimate these with precision so high
that it's difficult to measure.
Arnold S isn't the best case for categories because humans,
bodybuilders, places, chemicals and such are well ontologized. Look at
the collection that comes up for the word "Intersection",
Most of these are connected to the larger mass through just a few
categories that would be hard to express as restriction types. It is
reasonable for Wikipedia to require concepts to have a category because,
really, if you want to assert that something exists and can't find some
category that it is a member of, I wouldn't be so sure that it exists.
I'm not sure there is anything I can't do with the current situation,
but bear in mind that I'm going to look at DBpedia, Wikidata and Freebase
facts too, and I'm willing to do automated data cleaning and hand cleaning
of results that I cannot accept. It's a tricky and somewhat expensive process
(though cheaper than conventional ontology construction), so cleaner
data makes it cheaper, quicker, and available to more end users, who could
then define the categories they need for their own purposes.
Will this be implemented? It sounds like a reasonable feature to me. Maybe
it could even be prohibited to locally specify (or override) the name of a
Wikidata item in a given viewing language if that item has a Wikipedia
link in that particular language. We could save a lot of editing work this
way and resolve name differences, am I right?
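
Roughly, the rule I have in mind could look like the sketch below. It is only an illustration of the proposal, not existing Wikidata behaviour: the item is a plain dict with made-up `labels` and `sitelinks` keys, and `display_label` is a hypothetical helper.

```python
def display_label(item, lang):
    """Prefer the linked Wikipedia article title over a locally typed label.

    Hypothetical rule: if the item has a Wikipedia link in `lang`, the
    article title is used as the name; otherwise the local label (if any).
    """
    title = item.get("sitelinks", {}).get(lang)
    if title is not None:
        return title
    return item.get("labels", {}).get(lang)


# Made-up sample item.
item = {
    "labels": {"en": "Locally typed name", "cs": "Místní název"},
    "sitelinks": {"en": "Article title"},
}

print(display_label(item, "en"))  # uses the Wikipedia link
print(display_label(item, "cs"))  # no link, falls back to the local label
```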
From my viewpoint, biases are an issue of statistical sampling.
Wikipedia is an encyclopedia by humans for humans, so of course it has an
anthropocentric background, in which the mass of all the concepts swirling
around the Earth like an atmosphere curves the graph, keeping the Sun in
orbit around our world.
I find Wikipedia categories useful today, warts and all. They've got
two things going for them:
(1) Class and out-of-class dichotomies are the atom of ontology.
Well-designed categories have an operational definition that allows class
members to be determined with practically perfect precision.
(2) They are densely populated.
Look at the categories on this guy's web page
each one of those categories states a useful and correct fact, even if the
organization of those facts is entirely haphazard.
For instance, it would be better if he were coded as "American", "Austrian",
"Californian" and "Los Angelino", and also as "Bodybuilder", "Actor" and a
zillion other things, so that one could then infer that he was an
"American Bodybuilder", an "Austrian Actor" and such. But it's not that easy
because he was an "Austrian soldier" but not an "American soldier" and I'd
feel uncomfortable calling him an "Austrian Politician". A lot of nuance is
encoded in that sticky mess.
It's very easy to analyze those categories and produce desired concepts like
"Car" and "Bodybuilder" from junky categories like "Front-wheel drive
vehicle," "General Motors Concept Cars", "Bodybuilder Actor" and "Actor
Bodybuilder"; in fact, that's exactly what the semantic web is for.
There is so much rich and precise information in the categories that you get
great results despite sampling error caused by low recall in the categories.
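
As a toy version of that kind of analysis, the sketch below pulls target concepts out of category names with word patterns. It is deliberately crude and everything in it is an assumption for illustration: the `CONCEPT_PATTERNS` vocabulary is hypothetical, and treating "vehicle" as evidence for "Car" is a rule of thumb that real data would need to refine.

```python
import re

# Hypothetical vocabulary: each target concept with a surface pattern.
CONCEPT_PATTERNS = {
    "Car": re.compile(r"\b(cars?|vehicles?)\b", re.IGNORECASE),
    "Bodybuilder": re.compile(r"\bbodybuilders?\b", re.IGNORECASE),
    "Actor": re.compile(r"\bactors?\b", re.IGNORECASE),
}


def concepts_from_categories(categories):
    """Return the set of target concepts mentioned in any category name."""
    found = set()
    for category in categories:
        for concept, pattern in CONCEPT_PATTERNS.items():
            if pattern.search(category):
                found.add(concept)
    return found


car_concepts = concepts_from_categories(
    ["Front-wheel drive vehicle", "General Motors Concept Cars"]
)
person_concepts = concepts_from_categories(
    ["Bodybuilder Actor", "Actor Bodybuilder"]
)
print(car_concepts)
print(person_concepts)
```

Because many categories vote for the same concept, a page only needs one of them to be present, which is why low recall in individual categories hurts less than one might expect.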
I'd love to see better structure, but not at the cost of fact density or
If we can take advantage of the knowledge in the graph to exert gentle
pressure that improves categorization in Wikipedia that would be great.
It's definitely time for the social industry to move beyond "tags"