Date: Mon, 6 May 2013 15:21:23 -0400
From: voldrani(a)gmail.com
To: wikidata-l(a)lists.wikimedia.org
Subject: Re: [Wikidata-l] Question about wikipedia categories.
Michael, that's really closely in line with what I was thinking. Why
don't you take a crack at improving
I am not sure if this is just a crazy pipe dream or not, but I can't
help but be a little bit excited at the possibility that it might
actually get done, and I think it would be a huge improvement.
On Mon, May 6, 2013 at 2:32 PM, Michael Hale <hale.michael.jr(a)live.com> wrote:
I agree they are extremely useful for many scenarios already. Earlier today
I sorted the human proteins category by popularity, and by reading the
articles for the most popular ones that I didn't know I felt like I was
browsing the table of contents of a live molecular biology book that was
more comprehensive than any existing book in print. I do think we are on
track for undeniable improvements though. Arnold Schwarzenegger is in about
40 categories right now. His Wikidata item has about 20 statements.
Eventually, at least all of the information you can glean from those
categories will be contained in the statements on Wikidata. Then we could
update the pages so that the links at the bottom aren't to relevant
categories, but are to relevant queries. At first, it would look sort of the
same. You can click on the 20th-century American actors category now, and
you could click on the 20th-century American actors query in the future. But
when you get to the query page you can easily specialize or generalize the
query with another click in many more directions than are currently
supported in the category system. Right now, I can specialize the pages I
see by going to the subcategory for American silent film actors. I can
generalize the pages I see by going to a supercategory that drops the
American requirement, the actor requirement, or the 20th century
requirement. But if your first click away from the article doesn't take you
to a category but instead to a query page, you now have many more
options. For example, you could delete the 20th-century requirement and add
a politician requirement to the actor requirement. Then you are looking at
Americans who are both actors and politicians, which you can't do in the
category system.
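The query-refinement idea above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Wikidata queries actually work: a query is modeled as a set of (property, value) requirements, an item matches when it has every required statement, specializing adds a requirement, and generalizing drops one. The property names (`occupation`, `citizenship`, `century`) and the two "Hypothetical" items are made up for the example.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical toy model: a requirement is a (property, value) pair and a
# query is a frozenset of requirements that must ALL hold.
Requirement = Tuple[str, str]
Query = FrozenSet[Requirement]
Item = Dict[str, Set[str]]

# Illustrative items; property names are placeholders, not Wikidata property IDs.
ITEMS: Dict[str, Item] = {
    "Arnold Schwarzenegger": {
        "occupation": {"actor", "politician", "bodybuilder"},
        "citizenship": {"United States", "Austria"},
        "century": {"20th", "21st"},
    },
    "Hypothetical Actor": {
        "occupation": {"actor"},
        "citizenship": {"United States"},
        "century": {"20th"},
    },
    "Hypothetical Politician": {
        "occupation": {"politician"},
        "citizenship": {"United States"},
        "century": {"21st"},
    },
}


def matches(item: Item, query: Query) -> bool:
    """True if the item has a statement for every requirement in the query."""
    return all(value in item.get(prop, set()) for prop, value in query)


def run(query: Query) -> List[str]:
    """Names of all matching items, sorted for stable output."""
    return sorted(name for name, item in ITEMS.items() if matches(item, query))


def specialize(query: Query, req: Requirement) -> Query:
    """One click narrower: add a requirement."""
    return query | {req}


def generalize(query: Query, req: Requirement) -> Query:
    """One click broader: drop a requirement."""
    return query - {req}


# Roughly "20th-century American actors" expressed as a query.
actors_20c_us: Query = frozenset({
    ("century", "20th"),
    ("citizenship", "United States"),
    ("occupation", "actor"),
})
print(run(actors_20c_us))  # ['Arnold Schwarzenegger', 'Hypothetical Actor']

# Drop the 20th-century requirement, then add a politician requirement.
actor_politicians = specialize(generalize(actors_20c_us, ("century", "20th")),
                               ("occupation", "politician"))
print(run(actor_politicians))  # ['Arnold Schwarzenegger']
```

Each click is just a set operation on the requirements, so any combination of requirements is reachable, whereas a category tree only offers the combinations someone has already created as categories.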
From: paul(a)ontology2.com
To: wikidata-l(a)lists.wikimedia.org
Date: Mon, 6 May 2013 18:08:04 +0000
Subject: Re: [Wikidata-l] Question about wikipedia categories.
From my viewpoint, biases are an issue of statistical sampling.
Wikipedia is an encyclopedia by humans for humans, so of course it has an
anthropocentric background, in which the mass of all the concepts swirling
around the Earth like an atmosphere curves the graph, keeping the Sun in
orbit around our world.
I find Wikipedia categories useful today, warts and all. They've got
two things going for them:
(1) Class and out-of-class dichotomies are the atom of ontology.
Well-designed categories have an operational definition that allows class
members to be determined with practically perfect precision.
(2) They are densely populated.
Look at the categories on this guy's web page:
http://en.wikipedia.org/wiki/Arnold_Schwarzenegger
Each one of those categories states a useful and correct fact, even if the
organization of those facts is entirely haphazard.
For instance, it would be better if he were coded as "American",
"Austrian", "Californian", and "Los Angelino", and also as "Bodybuilder",
"Actor", and a zillion other things, so that we could infer that he was an
"American Bodybuilder", an "Austrian Actor", and so on. But it's not that
easy, because he was an "Austrian soldier" but not an "American soldier",
and I'd feel uncomfortable calling him an "Austrian Politician". A lot of
nuance is encoded in that sticky mess.
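The overgeneration problem described above can be made concrete with a minimal sketch. Assume, hypothetically, that nationality and role are stored as independent flat tags: crossing them produces every combination, including the two combinations the text calls wrong. Attaching the nationality to each role, roughly as a qualifier would, avoids that. The role-to-nationality encoding is an illustrative assumption limited to the soldier and politician examples in the text.

```python
from itertools import product
from typing import Dict, List, Set

# Flat, independent tags (hypothetical encoding).
NATIONALITIES = ["American", "Austrian"]
ROLES = ["Politician", "Soldier"]

# Qualified encoding (hypothetical): each role records which nationality it
# applies to, following the examples in the text.
QUALIFIED_ROLES: Dict[str, Set[str]] = {
    "Soldier": {"Austrian"},
    "Politician": {"American"},
}


def naive_compounds(nationalities: List[str], roles: List[str]) -> List[str]:
    """Cross every nationality with every role; this overgenerates."""
    return sorted(f"{n} {r}" for n, r in product(nationalities, roles))


def qualified_compounds(qualified: Dict[str, Set[str]]) -> List[str]:
    """Emit only the combinations that were actually asserted together."""
    return sorted(f"{n} {r}" for r, nats in qualified.items() for n in nats)


naive = naive_compounds(NATIONALITIES, ROLES)
good = qualified_compounds(QUALIFIED_ROLES)
print(sorted(set(naive) - set(good)))  # ['American Soldier', 'Austrian Politician']
```

The spurious pairs are exactly the nuance that flat tags lose and that compound categories like "Austrian soldier" happen to preserve.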
It's very easy to analyze those categories and produce desired concepts
like "Car" and "Bodybuilder" from junky categories like "Front-wheel drive
vehicle", "General Motors Concept Cars", "Bodybuilder Actor", and "Actor
Bodybuilder"; in fact, that's exactly what the semantic web is for.
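The kind of analysis described above can be sketched, very crudely, as mapping the words of a category name to atomic concepts through a lexicon. This is an assumed keyword approach, not a semantic web technique; the lexicon (including mapping "vehicle" to "Car") and the strip-a-trailing-"s" singularization are hypothetical simplifications chosen to fit the examples in the text.

```python
import re
from typing import Dict, Set

# Hypothetical lexicon from normalized words to atomic concepts.
LEXICON: Dict[str, str] = {
    "car": "Car",
    "vehicle": "Car",
    "actor": "Actor",
    "bodybuilder": "Bodybuilder",
}


def concepts(category: str) -> Set[str]:
    """Return the atomic concepts whose keywords appear in a category name."""
    found: Set[str] = set()
    for word in re.findall(r"[a-z]+", category.lower()):
        # Crude singularization: "cars" -> "car"; real plurals need more care.
        singular = word[:-1] if word.endswith("s") else word
        for candidate in (word, singular):
            if candidate in LEXICON:
                found.add(LEXICON[candidate])
    return found


for name in ["Front-wheel drive vehicle", "General Motors Concept Cars",
             "Bodybuilder Actor", "Actor Bodybuilder"]:
    print(name, "->", sorted(concepts(name)))
```

Because the result is a set, "Bodybuilder Actor" and "Actor Bodybuilder" collapse to the same pair of concepts, so word order in a junky category name stops mattering.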
There is so much rich and precise information in the categories that you
get great results despite sampling error caused by low recall in the
categories.
I'd love to see better structure, but not at the cost of fact density or
precision.
If we can take advantage of the knowledge in the graph to exert gentle
pressure that improves categorization in Wikipedia, that would be great.
It's definitely time for the social industry to move beyond "tags".
_______________________________________________
Wikidata-l mailing list
Wikidata-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l