jayvdb created this task. jayvdb added a subscriber: jayvdb. jayvdb added a project: Pywikibot-Wikidata. jayvdb changed Security from none to none.
TASK DESCRIPTION Some data in Wikipedia is easier to extract from the rendered HTML than from the template wikitext, because the rendered HTML exposes the values as microformats. Other web pages that use microformats could also be used to extract information and add it to Wikidata. I expect this should be done in a new script, but it would be based on the existing script harvest_template.py.
https://en.wikipedia.org/wiki/Help:Microformats .
Birth date and death date are good examples: on English Wikipedia they are placed in dedicated spans, in a consistent format (YYYY-MM-DD in the example below).
view-source:https://en.wikipedia.org/wiki/Benjamin_Franklin
<span class="bday">1706-01-17</span> <span class="dday deathdate">1790-04-17</span>
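A minimal sketch of how these spans could be read, using only the Python standard library. The class name `DateSpanParser` and the function `extract_dates` are hypothetical names chosen for illustration and are not part of Pywikibot. The sketch assumes the page HTML has already been fetched and that each span holds plain text only.

```python
from html.parser import HTMLParser


class DateSpanParser(HTMLParser):
    """Collect the text of <span> elements carrying a wanted class name."""

    # Class names taken from the Benjamin Franklin example above.
    WANTED = ("bday", "dday")

    def __init__(self):
        super().__init__()
        self.found = {}        # class name -> first text seen for it
        self._current = None   # class name of the span being read, if any

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        # A span may carry several classes, e.g. "dday deathdate".
        classes = (dict(attrs).get("class") or "").split()
        for name in self.WANTED:
            if name in classes and name not in self.found:
                self._current = name
                break

    def handle_data(self, data):
        if self._current is not None:
            self.found[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        # An empty span must not capture text that follows it.
        if tag == "span":
            self._current = None


def extract_dates(html):
    """Return a dict such as {'bday': ..., 'dday': ...} for the given HTML."""
    parser = DateSpanParser()
    parser.feed(html)
    parser.close()
    return parser.found
```

Only the first occurrence of each class is kept, on the assumption that an article describes a single person; a real script would need to decide how to handle pages with several matches.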
The {{Persondata}} template is relatively easy to parse, but its output is also well labelled in the HTML. https://en.wikipedia.org/wiki/Wikipedia:Persondata
<table id="persondata" class="persondata noprint" style="border:1px solid #aaa; display:none; speak:none;">
  <tr> <th colspan="2"><a href="/wiki/Wikipedia:Persondata" title="Wikipedia:Persondata">Persondata</a></th> </tr>
  <tr> <td class="persondata-label" style="color:#aaa;">Name</td> <td>Franklin, Benjamin</td> </tr>
  <tr> <td class="persondata-label" style="color:#aaa;">Alternative names</td> <td></td> </tr>
  <tr> <td class="persondata-label" style="color:#aaa;">Short description</td> <td>American printer, writer, politician</td> </tr>
  <tr> <td class="persondata-label" style="color:#aaa;">Date of birth</td> <td>January 17, 1706</td> </tr>
  <tr> <td class="persondata-label" style="color:#aaa;">Place of birth</td> <td>Boston, Massachusetts</td> </tr>
  <tr> <td class="persondata-label" style="color:#aaa;">Date of death</td> <td>April 17, 1790</td> </tr>
  <tr> <td class="persondata-label" style="color:#aaa;">Place of death</td> <td><a href="/wiki/Philadelphia" title="Philadelphia">Philadelphia</a>, Pennsylvania</td> </tr>
</table>
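The table above could be turned into label/value pairs along these lines. This is a sketch under assumptions, not a proposed implementation: `PersondataParser` and `parse_persondata` are hypothetical names, and the code relies only on the `id="persondata"` and `class="persondata-label"` attributes visible in the sample, with no nested tables.

```python
from html.parser import HTMLParser


class PersondataParser(HTMLParser):
    """Map each persondata label cell to the text of the cell that follows it."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._in_table = False
        self._cell = None    # "label" or "value" while inside a <td>
        self._label = None   # most recent label awaiting its value
        self._text = []      # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table" and attrs.get("id") == "persondata":
            self._in_table = True
        elif self._in_table and tag == "td":
            classes = (attrs.get("class") or "").split()
            self._cell = "label" if "persondata-label" in classes else "value"
            self._text = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> is accumulated as well.
        if self._cell is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "table":
            self._in_table = False
        elif tag == "td" and self._cell is not None:
            text = "".join(self._text).strip()
            if self._cell == "label":
                self._label = text
            elif self._label is not None:
                self.fields[self._label] = text
                self._label = None
            self._cell = None


def parse_persondata(html):
    """Return a dict of persondata labels to plain-text values."""
    parser = PersondataParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

Values come back as plain text, so a linked place such as Philadelphia loses its link target; a script that wants to resolve places to Wikidata items would probably need to keep the `href` too.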
More at https://en.wikipedia.org/wiki/Wikipedia:Metadata
A list of templates which generate microformats is at https://en.wikipedia.org/wiki/Category:Templates_generating_microformats , and sample pages can be found by using 'whatlinkshere'.
For example, a vcard with the fn and org classes can be seen in the source of the infobox here:
view-source:https://en.wikipedia.org/wiki/Manchester_Ship_Canal
TASK DETAIL https://phabricator.wikimedia.org/T78416