George Herbert wrote:
> This discussion brings to mind several historical threads.
> I wonder if a project to simply mine the whole article contents and provide a DB of some sort with the articles and infobox contents would be worthwhile. Develop a specific parser and generate and publish the complete set of article-infobox-(key-value) sets...
I don't know anybody on the data side at Metaweb anymore, but I know that they did something like that to import a lot of structured Wikipedia data into their Freebase project. They publish some sort of data dump here:
http://download.freebase.com/wex/
Perhaps they'd be willing to open-source their parser.
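
To make the idea concrete, here is a minimal sketch of the kind of infobox-to-key-value extraction George describes. It is only an illustration and has nothing to do with Metaweb's actual parser, which I haven't seen. The function name extract_infobox and the sample article text are made up, and the approach (simple bracket counting, first infobox only) is an assumption that real wikitext would quickly outgrow.

```python
def extract_infobox(wikitext):
    """Return (template_name, {key: value}) for the first {{Infobox ...}}
    template in wikitext, or None if none is found or it is unbalanced.

    Hypothetical helper for illustration; not part of any existing library.
    """
    start = wikitext.find("{{Infobox")
    if start == -1:
        return None

    depth = 0      # nesting depth of {{ }} and [[ ]] (counted together)
    parts = []     # top-level pieces of the template, split on "|"
    current = []   # characters of the piece being collected
    i = start
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            if depth > 1:            # keep nested openers as part of the value
                current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            if depth == 0:           # closing braces of the infobox itself
                parts.append("".join(current))
                break
            current.append(pair)
            i += 2
        elif wikitext[i] == "|" and depth == 1:
            # A pipe at the top level of the infobox separates fields.
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(wikitext[i])
            i += 1
    else:
        return None                  # ran off the end: unbalanced braces

    name = parts[0].strip()
    fields = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:                      # skip positional (unnamed) parameters
            fields[key.strip()] = value.strip()
    return name, fields


if __name__ == "__main__":
    # Made-up article text, purely for demonstration.
    sample = (
        "{{Infobox settlement\n"
        "| name = Exampleville\n"
        "| leader = [[Jane Doe|Mayor Doe]]\n"
        "| area = {{convert|10|km2}}\n"
        "}}\n"
        "Exampleville is a town."
    )
    print(extract_infobox(sample))
```

Running something like this over every article in a dump would yield the article-infobox-(key-value) sets, though a real project would need to cope with multiple infoboxes per page, comments, nowiki sections, and template redirects.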
William