[Foundation-l] Wikipedia Thesaurus Posted enwiki-20070206

Jeffrey V. Merkey jmerkey at wolfmountaingroup.com
Wed Apr 4 10:30:13 UTC 2007


A machine-generated thesaurus constructed from the interwiki links, wiki
links, and tags contained in the XML dumps has been posted at:

ftp://ftp.wikigadugi.org/wiki/thesaurus

This build analyzes the 20070206 enwiki dumps and constructs a thesaurus
based upon the relationships between the wiki links and interwiki links
contained within the XML dumps.  Included are raw files of links, a
lexicon, and an XML dump, all derived from the thesaurus embedded in
Wikipedia.
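
As a rough illustration of the idea, and not the code used for this
build, the sketch below pulls wiki links and interwiki links out of page
wikitext and groups them under each page title as related terms. The
function names, the regular expression, and the small set of language
prefixes are all assumptions made for the example; the real build works
on the full XML dumps and its rules are not described here.

```python
import re
from collections import defaultdict

# Matches [[Target]] and [[Target|label]]; captures only the target part.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")

# Hypothetical, deliberately tiny set of language prefixes (assumption).
LANG_PREFIXES = {"de", "fr", "es", "ja", "chr"}


def extract_links(wikitext):
    """Split the links in one page into wiki links and interwiki links."""
    wiki_links, interwiki_links = [], []
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        prefix, sep, rest = target.partition(":")
        if sep and rest and prefix.lower() in LANG_PREFIXES:
            # A language prefix marks an interwiki link, e.g. [[de:Hund]].
            interwiki_links.append((prefix.lower(), rest.strip()))
        else:
            wiki_links.append(target)
    return wiki_links, interwiki_links


def build_thesaurus(pages):
    """Map each page title to the set of terms it links to.

    `pages` is a dict of {title: wikitext}; a real run would stream
    these pairs from the XML dump instead of holding them in memory.
    """
    thesaurus = defaultdict(set)
    for title, text in pages.items():
        wiki_links, interwiki_links = extract_links(text)
        thesaurus[title].update(wiki_links)
        thesaurus[title].update(f"{lang}:{name}" for lang, name in interwiki_links)
    return dict(thesaurus)


if __name__ == "__main__":
    sample = {"Dog": "The [[dog]] is a [[Canidae|canid]]. [[de:Hund]] [[fr:Chien]]"}
    for title, terms in build_thesaurus(sample).items():
        print(title, sorted(terms))
```

Treating every linked title as a related term is the simplest possible
rule; it is only meant to show how link structure can be read as
thesaurus entries.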

Text file of stripped links and interwiki links with tags:

ftp://www.wikigadugi.org/wiki/thesaurus/wikipedia-thesaurus-20070206.links.bz2

Machine-generated text lexicon of stripped links and interwiki links:

ftp://www.wikigadugi.org/wiki/thesaurus/wikipedia-thesaurus-20070206.lex.bz2

The machine-generated XML MediaWiki dump, which can be imported as a
basic thesaurus, has a few title problems.  I will finish it up in the
morning and post it to the thesaurus area.

Jeff


