---- Magnus Manske magnus.manske@web.de wrote:
Kirill Lokshin wrote:
On 7/11/06, Magnus Manske magnus.manske@web.de wrote:
Are we preferring {{Persondata}} or {{Infobox biography}}?
Should they be merged, somehow?
I'm asking this because I'm writing a tool that generates Persondata from article text as copy-and-paste output. It can scan whole categories at once.
One ({{Persondata}}) is raw metadata, but is applicable to _all_ biographies; the other ({{Infobox biography}}) is designed for display to users, but has been replaced in certain types of biographies (politicians, royalty, military leaders) with more specialized templates.
I'll stick to Persondata, then.
Ideally, we could have a tool that would be able to parse at least the major infobox types and fill out the persondata fields; but I'm not sure how much work it would be, considering that there is a certain variation in how different infoboxes deal with particular data.
I've done raw text extraction for the German Personendaten. While the en version is still experimental, it works OK in many cases: http://tools.wikimedia.de/~magnus/persondata.php?category=1984_deaths
I could add popular infoboxes (any suggestions?) and German Personendaten, if a de article exists.
Magnus
Hello Magnus,
Could you explain a little more about how you plan to use the tool?
I have some concerns about using it on biographies of living people on Wikipedia-en. We have too much unverified information added to these articles. I strongly believe that each article needs to be examined closely to verify that the content meets our Wikipedia:BLP guidelines before it is added to a template.
Regards, Sydney Poore
Well, it can't run fully automatically; there has to be a user making the changes. I could prevent the tool from showing a template for people with no death date (= probably living ;-) if there's great concern about that. Otherwise, the tool doesn't generate new data; it merely tries to extract the data already in the article and put it into a form that is more machine-readable.
Magnus
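The safeguard described above (no template for people without a death date) could be sketched roughly as follows. This is only an illustration in Python, not the tool's actual code, which is a PHP script; the record layout and the name `should_offer_template` are assumptions made for this example.

```python
from typing import Optional


def should_offer_template(record: dict) -> bool:
    """Return True only when a death date was extracted.

    Assumption: a missing or empty death date means the person is
    probably still living, so no template is offered for them.
    """
    death: Optional[str] = record.get("date_of_death")
    return bool(death and death.strip())


# Hypothetical extracted records, for illustration only.
records = [
    {"name": "Person A", "date_of_birth": "1901", "date_of_death": "1984"},
    {"name": "Person B", "date_of_birth": "1950", "date_of_death": ""},
    {"name": "Person C", "date_of_birth": "1960"},
]

offered = [r["name"] for r in records if should_offer_template(r)]
print(offered)
```

A filter like this errs on the side of caution: an article about a dead person whose death date could not be extracted would also be skipped.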
Magnus: I'm glad to hear someone is working on the extracting-persondata-from-article idea, which has been bruited about for a while (but never actually acted on); but how exactly does it work? Does it parse the first bolded run of text as the name, the first two years as the birth and death dates, and likewise the first two locations as the birth and death places, etc.? Is it an AWB program, a pywikipedia script (my personal hope), a standalone program, or what? I am rather interested in such a program. Also, will it run on Linux and be Free?
~maru
It's a PHP script running on the toolserver: http://tools.wikimedia.de/~magnus/persondata.php
Add "category=XYZ" or "title=XYZ" to have it scan a whole category or a single article, respectively.
Currently, it ignores the "bold" marker but uses my algorithm to find the first text paragraph of an article, as implemented in commons_sumitup.php (same location).
Like all my tools, it is GPL. You can get the source code at the above URL.
Magnus
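For readers who want a feel for this kind of extraction, here is a minimal Python sketch. It is not the algorithm used by persondata.php or commons_sumitup.php. It combines a naive "first prose paragraph" search with the heuristics guessed at earlier in the thread (first bolded run as the name, first two years as the birth and death dates), even though the real tool is said to ignore the bold marker. All function names, the sample article, and the choice of output fields are assumptions made for illustration.

```python
import re

# Four-digit years from 1000 to 2099; good enough for a sketch.
YEAR = re.compile(r"\b(1\d{3}|20\d{2})\b")
# Wikitext bold markup: '''text'''
BOLD = re.compile(r"'''(.+?)'''")

# Prefixes that mark a block as markup rather than prose (assumed list).
NON_PROSE = ("{{", "{|", "|", "}}", "=", "<", "[[File:", "[[Image:")


def first_paragraph(wikitext: str) -> str:
    """Return the first blank-line-separated block that looks like prose."""
    for block in re.split(r"\n\s*\n", wikitext):
        block = block.strip()
        if block and not block.startswith(NON_PROSE):
            return block
    return ""


def guess_persondata(wikitext: str) -> dict:
    """Guess a few fields from the first paragraph only."""
    para = first_paragraph(wikitext)
    bold = BOLD.search(para)
    years = YEAR.findall(para)
    return {
        "NAME": bold.group(1) if bold else "",
        "DATE OF BIRTH": years[0] if len(years) > 0 else "",
        "DATE OF DEATH": years[1] if len(years) > 1 else "",
    }


def render(fields: dict) -> str:
    """Format the guessed fields as copy-and-paste template text."""
    lines = ["{{Persondata"]
    lines += [f"|{key}={value}" for key, value in fields.items()]
    lines.append("}}")
    return "\n".join(lines)


# Hypothetical article text, for illustration only.
sample = """{{Infobox biography
| name = Jane Example
}}

'''Jane Example''' (12 March 1901 - 5 June 1984) was a hypothetical chemist.

== Life ==
She was born in Exampletown."""

fields = guess_persondata(sample)
print(render(fields))
```

Heuristics like these fail easily (a year in the first sentence that is not a birth year, a living person with a single year, a bolded nickname), which is why a human has to review the result before pasting it into an article.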
Forgive my cluelessness, but what do you mean by "Add 'category...'"? There seems to be no text input box, and I can't figure out whether or how that should be encoded into the URL. I'm fairly (but not entirely) sure you don't mean downloading the PHP and adding those lines, since variables in PHP usually start with $, don't they?
~maru
http://tools.wikimedia.de/~magnus/persondata.php?category=XYZ
Magnus
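In other words, the parameter goes into the query string of the URL. A small Python sketch of building such URLs is shown below; the helper name `persondata_url` is made up for this example, and whether the script expects underscores or percent-encoded spaces in names is an assumption, not something stated in the thread.

```python
from urllib.parse import urlencode

BASE_URL = "http://tools.wikimedia.de/~magnus/persondata.php"


def persondata_url(category=None, title=None):
    """Build a URL that scans either a whole category or a single article."""
    # Exactly one of the two parameters must be given.
    if (category is None) == (title is None):
        raise ValueError("pass exactly one of category or title")
    params = {"category": category} if category is not None else {"title": title}
    return f"{BASE_URL}?{urlencode(params)}"


print(persondata_url(category="1984_deaths"))
print(persondata_url(title="XYZ"))
```

The first call reproduces the example URL given earlier in the thread for the category 1984_deaths.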
I would say that most of the very basic data that Persondata is concerned with is largely correct. I have found that simple things like birth and death dates are usually correct. However, since this tool is human-mediated, it may be a good opportunity to verify the basic information provided by many of our biographies.
WikiEN-l mailing list WikiEN-l@Wikipedia.org To unsubscribe from this mailing list, visit: http://mail.wikipedia.org/mailman/listinfo/wikien-l