Re: [Wikitech-l] Datamining infoboxes

23 Oct 2009


On Fri, Oct 23, 2009 at 08:37, George Herbert <george.herbert@gmail.com> wrote:
> ...
> I wonder if a project to simply mine the whole article contents and
> provide a DB of some sort with the articles and infobox contents would
> be worthwhile.  Develop a specific parser and generate and publish the
> complete set of article-infobox-(key-value) sets...
That's what DBpedia is doing.
The extracted data can be found here, in N-Triples and CSV format:
http://wiki.dbpedia.org/Downloads
The entries in the row labelled 'Infoboxes' are files
that contain the extracted values of all template
properties in each page of a Wikipedia instance.
For large Wikipedias like en, the unzipped files are
pretty big (several GB).
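
As a rough illustration of consuming the N-Triples files, here is a minimal Python sketch that streams a dump line by line, so the multi-GB file never has to fit in memory, and collects the infobox properties of one page. It assumes the simple `<subject> <predicate> object .` layout with IRI or literal objects; blank nodes and the rest of the full N-Triples grammar are not handled. The function names, the file name and the example URIs are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Simplified N-Triples patterns (assumption: IRI subject/predicate, IRI or literal object).
_LINE_RE = re.compile(r'^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$')
_LITERAL_RE = re.compile(r'^"((?:[^"\\]|\\.)*)"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?$')
_ESCAPE_RE = re.compile(r'\\(u[0-9A-Fa-f]{4}|U[0-9A-Fa-f]{8}|[tbnrf"\'\\])')
_ESCAPES = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
            '"': '"', "'": "'", "\\": "\\"}


def _unescape(text: str) -> str:
    """Decode N-Triples escapes such as \\u00FC and \\n inside a literal."""
    def repl(m):
        code = m.group(1)
        if code[0] in "uU":
            return chr(int(code[1:], 16))  # \uXXXX or \UXXXXXXXX code point
        return _ESCAPES[code]
    return _ESCAPE_RE.sub(repl, text)


def parse_ntriple(line: str) -> Optional[Tuple[str, str, str]]:
    """Split one N-Triples line into (subject, predicate, object); None if skipped."""
    m = _LINE_RE.match(line.strip())
    if m is None:
        return None  # blank line, comment, or syntax this sketch does not handle
    subject, predicate, obj = m.groups()
    if obj.startswith("<") and obj.endswith(">"):
        return subject, predicate, obj[1:-1]  # object is an IRI
    lit = _LITERAL_RE.match(obj)
    if lit is None:
        return None
    # Language tag or datatype is dropped; only the lexical value is kept.
    return subject, predicate, _unescape(lit.group(1))


def infobox_properties(lines: Iterable[str], subject: str) -> Dict[str, List[str]]:
    """Collect predicate -> values for one subject, reading the input only once."""
    props: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        triple = parse_ntriple(line)
        if triple is not None and triple[0] == subject:
            props[triple[1]].append(triple[2])
    return dict(props)


# Usage (hypothetical file name; bz2.open streams the compressed dump):
# import bz2
# with bz2.open("infobox_en.nt.bz2", "rt", encoding="utf-8") as f:
#     print(infobox_properties(f, "http://dbpedia.org/resource/Berlin"))
```

Scanning the whole file for one page is slow but keeps memory flat; loading everything into a database would be the obvious next step for repeated queries.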
Most of the extraction code can be found in these
PHP classes:
https://dbpedia.svn.sourceforge.net/svnroot/dbpedia/extraction/extractors/In...
https://dbpedia.svn.sourceforge.net/svnroot/dbpedia/extraction/extractors/in...
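
To give a feel for what such an extractor has to do, here is a minimal Python sketch that pulls the key-value pairs out of the first `{{Infobox ...}}` template in a page's wikitext. This is not DBpedia's extraction code. It assumes a single top-level infobox whose parameters are written as `| key = value`, and it tracks `{{ }}` and `[[ ]]` nesting only so that pipes inside nested templates and piped links are not mistaken for separators. Comments, `<ref>` tags, `{{{parameters}}}` and other wikitext details are ignored, and the function names are hypothetical.

```python
import re
from typing import Dict, List, Optional


def find_infobox(wikitext: str) -> Optional[str]:
    """Return the text inside the first {{Infobox ...}}, without the outer braces."""
    match = re.search(r"\{\{\s*[Ii]nfobox", wikitext)
    if match is None:
        return None
    depth = 0
    i = match.start()
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            if depth == 0:
                return wikitext[match.start() + 2:i]
            i += 2
        else:
            i += 1
    return None  # braces never balanced


def split_top_level(body: str) -> List[str]:
    """Split on '|' only when not inside a nested {{template}} or [[link]]."""
    parts: List[str] = []
    current: List[str] = []
    templates = links = 0
    i = 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "}}", "[[", "]]"):
            if pair == "{{":
                templates += 1
            elif pair == "}}":
                templates -= 1
            elif pair == "[[":
                links += 1
            else:
                links -= 1
            current.append(pair)
            i += 2
            continue
        if body[i] == "|" and templates == 0 and links == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(body[i])
        i += 1
    parts.append("".join(current))
    return parts


def extract_infobox(wikitext: str) -> Dict[str, str]:
    """Map infobox parameter names to their raw (unparsed) wikitext values."""
    body = find_infobox(wikitext)
    if body is None:
        return {}
    properties: Dict[str, str] = {}
    for part in split_top_level(body)[1:]:  # element 0 is the template name
        key, sep, value = part.partition("=")
        if sep:  # skip positional (unnamed) parameters
            properties[key.strip()] = value.strip()
    return properties
```

Values come back as raw wikitext (e.g. `{{formatnum:12345}}` or `[[Jane Doe|J. Doe]]`), so a real extractor would still need to resolve templates, links and units before publishing clean key-value sets.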
Christopher
