On Wed, Jan 28, 2009 at 2:52 PM, David Gerard dgerard@gmail.com wrote:
http://ebiquity.umbc.edu/blogger/2009/01/27/extracting-wikipedia-infoboxes-v...
Some infoboxes are designed for that sort of thing, some aren't. Some have footnotes, for example, and lots of flexibility, which makes the data harder, but not impossible, to parse. And some projects (for good reason) still vehemently reject infoboxes, mainly because people who don't understand a particular subject try to force simplified statements (i.e. sentences, not words or numbers) into an infobox. That loses nuance and context and devalues the article as a whole (reading the full text is ultimately more educational).
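To illustrate the footnote problem, here is a rough sketch in Python of what a naive extractor has to do. It is not how any existing tool works: the sample wikitext, the template name, the field names and the parse_infobox helper are all made up for illustration. It assumes a flat infobox with one "| field = value" per line and no nested templates.

```python
import re

# Hypothetical sample wikitext; template name, fields and values are made up.
SAMPLE = """{{Infobox example
| name = Example Town
| population = 12345<ref name="census">Made-up figure.</ref>
| area_km2 = 67.8<ref name="census"/>
}}"""

# Matches self-closing <ref ... /> tags and paired <ref ...>...</ref> footnotes.
# The \b stops the pattern from also matching <references/>.
REF_RE = re.compile(r"<ref\b[^>]*/>|<ref\b[^>]*>.*?</ref>", re.DOTALL)


def parse_infobox(wikitext):
    """Return {field: value} for a flat infobox, with footnotes stripped.

    Assumes one '| field = value' pair per line and no nested templates,
    which real infoboxes often violate.
    """
    text = REF_RE.sub("", wikitext)
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            # Split on the first '=' only, so values may contain '='.
            key, _, value = line[1:].partition("=")
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    for field, value in parse_infobox(SAMPLE).items():
        print(field, "->", value)
```

Even this toy version needs a special case for footnotes, and it would break on nested templates or multi-line values, which is the sort of flexibility that makes some infoboxes hard to mine.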
And not all such data is in infoboxes:
http://en.wikipedia.org/wiki/Wikipedia:Metadata
That page is something I tried to improve; it still needs expansion and TLC.
Some areas of data are in separate templates (not infobox templates) and some are in categories.
I'd like to add some of the data-heavy infoboxes to that list, like the ones in maths, physics, astronomy, geography, geology, chemistry and the other 'hard' sciences. Are any of those infoboxes organised for data extraction the way the geographical coordinate templates are?
Carcharoth