maru dubshinki wrote:
Magnus: I'm glad to hear someone is working on the extracting-persondata-from-article idea, which has been bruited about for a while (but never actually acted on); but how exactly does it work? Does it parse the first bolded run of text as the name, the first two years as birth and death dates, and ditto for locations and birth and death places, etc.? Is it an AWB program, a pywikipedia (my personal hope), a standalone program, what? I am rather interested in such a program. Also, will it run on Linux and be Free?
It's a PHP script running on the toolserver: http://tools.wikimedia.de/~magnus/persondata.php
Append "category=XYZ" or "title=XYZ" to the URL to have it scan a whole category or a single article, respectively.
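
For anyone who wants to script this, here is a minimal Python sketch of building such a URL. It assumes the parameters are passed as an ordinary query string ("?category=..." or "?title=..."); the helper name persondata_url and the example article and category names are made up for illustration.

```python
from urllib.parse import urlencode

# URL of the tool as given in this message.
BASE_URL = "http://tools.wikimedia.de/~magnus/persondata.php"


def persondata_url(category=None, title=None):
    """Build a request URL for exactly one of category= or title=.

    Assumption: the script reads these as normal query-string parameters.
    """
    if (category is None) == (title is None):
        raise ValueError("pass exactly one of category or title")
    if category is not None:
        params = {"category": category}
    else:
        params = {"title": title}
    # urlencode escapes spaces and other special characters in the name.
    return BASE_URL + "?" + urlencode(params)


# Hypothetical example names, for illustration only.
print(persondata_url(title="Albert Einstein"))
print(persondata_url(category="Physicists"))
```

The two modes are kept mutually exclusive here only because the message describes them as alternatives; whether the script accepts both at once is not stated.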
Currently, it ignores the "bold" marker but uses my algorithm to find the first text paragraph of an article, as implemented in commons_sumitup.php (same location).
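
To give a rough idea of what "finding the first text paragraph" can involve, here is a small Python sketch. It is not the code from commons_sumitup.php (read the source for that); it is only one plausible heuristic, assuming that templates, images, tables, headings and list items should be skipped. The function names and the sample article are hypothetical.

```python
import re

# Line prefixes treated as non-prose wiki markup (an assumption, not an
# exhaustive list): images, categories, tables, headings, lists, HTML.
SKIP_PREFIXES = ("[[Image:", "[[File:", "[[Category:", "{|", "|", "!",
                 "=", "*", "#", ":", ";", "<", "__", "----")


def strip_templates(text):
    """Remove {{...}} templates, innermost first, until none are left."""
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    return text


def first_text_paragraph(wikitext):
    """Return the first run of consecutive prose lines, joined by spaces."""
    paragraph = []
    for line in strip_templates(wikitext).splitlines():
        line = line.strip()
        if line and not line.startswith(SKIP_PREFIXES):
            paragraph.append(line)
        elif paragraph:
            # A blank or markup line ends the paragraph once it has started.
            break
    return " ".join(paragraph)


# Made-up sample article, for illustration only.
sample = """{{Infobox person
| name = Jane Doe
}}
[[Image:Jane.jpg|thumb|Jane Doe]]
'''Jane Doe''' (1900 - 1980) was a
hypothetical physicist.

== Life ==
She was born in [[Paris]].
"""

print(first_text_paragraph(sample))
```

A paragraph that begins with an ordinary link such as [[Paris]] is still accepted, since only the image, file and category prefixes are skipped.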
Like all my tools, it is licensed under the GPL. You can get the source code at the above URL.
Magnus