2009/10/23 William Pietri <william@scissor.com>:
George Herbert wrote:
This discussion brings to mind several historical threads. I wonder if a project to simply mine the whole article contents and provide a DB of some sort with the articles and infobox contents would be worthwhile. Develop a specific parser and generate and publish the complete set of article-infobox-(key-value) sets...
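As a rough illustration of the kind of parser suggested here, the following is a minimal Python sketch, not anyone's actual implementation. It assumes the article wikitext is already available as a string and that the infobox is a template whose name starts with "Infobox". The function name `extract_infobox` and the sample text are hypothetical. It handles nested `{{...}}` templates and `[[...|...]]` links but ignores other wikitext constructs such as `{{{parameter}}}` syntax, comments, and `<nowiki>` sections.

```python
import re


def extract_infobox(wikitext):
    """Return (template_name, {key: value}) for the first infobox, or None."""
    match = re.search(r"\{\{\s*infobox", wikitext, re.IGNORECASE)
    if match is None:
        return None
    start = match.start()

    # Find the closing braces that match the opening "{{", tracking nesting.
    depth = 0
    end = -1
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                end = i
                break
        else:
            i += 1
    if end == -1:
        return None  # unbalanced braces

    body = wikitext[start + 2:end - 2]

    # Split on top-level "|" only, so pipes inside nested templates
    # or links stay part of the value.
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))

    name = parts[0].strip()
    fields = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:  # skip positional (unnamed) parameters
            fields[key.strip()] = value.strip()
    return name, fields


# Hypothetical sample article text, for illustration only.
sample = (
    "{{Infobox person\n"
    "| name = Ada Lovelace\n"
    "| birth_date = {{birth date|1815|12|10}}\n"
    "| field = [[Mathematics|maths]]\n"
    "}}\n"
    "'''Ada Lovelace''' was an English mathematician."
)
print(extract_infobox(sample))
```

Values are kept as raw wikitext, so a published article-infobox-(key-value) set would still need a second pass to resolve nested templates and links.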
I don't know anybody on the data side at Metaweb anymore, but I know that they did something like that to import a lot of structured Wikipedia data into their Freebase project. They publish some sort of data dump here:
http://download.freebase.com/wex/
Perhaps they'd be willing to open-source their parser.
They're right into open source; I suspect they would.
- d.