2009/10/23 William Pietri <william(a)scissor.com>:
George Herbert wrote:
> This discussion brings to mind several historical threads.
> I wonder if a project to simply mine the whole article contents and
> provide a DB of some sort with the articles and infobox contents would
> be worthwhile. Develop a specific parser and generate and publish the
> complete set of article-infobox-(key-value) sets...
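As a rough illustration of the article-infobox-(key-value) extraction
proposed above, here is a minimal sketch in Python. It is not Metaweb's
parser or any existing tool; the function name extract_infobox and the
SAMPLE text are hypothetical. It assumes the infobox is the first
template whose name starts with "{{Infobox" (case-sensitive) and that
braces and brackets in the wikitext are balanced. Real articles have
many more edge cases (comments, nowiki, redirected template names).

```python
def extract_infobox(wikitext):
    """Return (template_name, {key: value}) for the first infobox, or None.

    Hypothetical helper: only handles a template starting with "{{Infobox".
    """
    start = wikitext.find("{{Infobox")
    if start == -1:
        return None

    # Find the "}}" that closes the infobox, counting nested templates.
    depth = 0
    i = start
    end = None
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                end = i
                break
        else:
            i += 1
    if end is None:
        return None  # unbalanced braces; give up rather than guess

    body = wikitext[start + 2:end - 2]

    # Split on "|" only at the top level, so pipes inside nested
    # templates ({{...}}) and links ([[...]]) stay within their value.
    parts = []
    current = []
    depth = 0
    i = 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))

    name = parts[0].strip()
    fields = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:  # skip positional (unnamed) parameters
            fields[key.strip()] = value.strip()
    return name, fields


# Made-up sample wikitext, for illustration only.
SAMPLE = """{{Infobox settlement
| name = Springfield
| population = 30,000
| country = [[United States|USA]]
| area = {{convert|10|km2}}
}}
Springfield is a town."""

if __name__ == "__main__":
    print(extract_infobox(SAMPLE))
```

Run over a full dump, the (article, template name, key, value) tuples
could then be loaded into whatever DB is convenient. Values are left as
raw wikitext here; cleaning them up is a separate problem.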
I don't know anybody on the data side at Metaweb anymore, but I know
that they did something like that to import a lot of structured
Wikipedia data into their Freebase project. They publish some sort of
data dump here:
http://download.freebase.com/wex/
Perhaps they'd be willing to open-source their parser.
They're right into open source; I suspect they would.
- d.