---------- Forwarded message ----------
From: Jeffrey V. Merkey <jmerkey@wolfmountaingroup.com>
Date: 04-Apr-2007 11:30
Subject: [Wikitech-l] Wikipedia Thesaurus Posted enwiki-20070206
To: Wikimedia Foundation Mailing List <foundation-l@lists.wikimedia.org>, Wikimedia developers <wikitech-l@lists.wikimedia.org>

A machine-generated thesaurus, constructed from the wiki links, interwiki links, and tags contained in the XML dumps, has been posted at
ftp://ftp.wikigadugi.org/wiki/thesaurus.
This build analyzes the 20070206 enwiki dumps and constructs a thesaurus based on the relationships between the wiki links and interwiki links contained within the XML dumps. Included are the raw links file, the lexicon, and an XML dump, all generated from the thesaurus embedded in Wikipedia.
Text file of stripped links and interwiki links with tags:
ftp://www.wikigadugi.org/wiki/thesaurus/wikipedia-thesaurus-20070206.links.bz2
Machine-generated text lexicon of stripped links and interwiki links:
ftp://www.wikigadugi.org/wiki/thesaurus/wikipedia-thesaurus-20070206.lex.bz2
The machine-generated XML MediaWiki dump, which can be imported as a basic thesaurus, has a few title problems. I will finish it up in the morning and post it to the thesaurus area.
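
For readers who want a feel for the link-based approach described above, here is a minimal sketch in Python. It is not the code used for this build, and it does not read the posted .links or .lex files, whose format is not described here. The function names (extract_links, build_thesaurus), the in-memory page dictionary, and the rule that a two- or three-letter lowercase prefix marks an interwiki link are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Matches [[Target]], [[Target|label]] and [[Target#Section|label]];
# group 1 is the link target without section or label.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Assumption: a 2-3 letter lowercase prefix such as "fr:" is a language
# code. Real interwiki prefixes are more varied than this.
INTERWIKI = re.compile(r"^([a-z]{2,3}):(.+)$")


def extract_links(wikitext):
    """Return (wiki_links, interwiki_links) found in one page's wikitext."""
    wiki_links, interwiki_links = [], []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        if not target:
            continue
        iw = INTERWIKI.match(target)
        if iw:
            interwiki_links.append((iw.group(1), iw.group(2).strip()))
        else:
            wiki_links.append(target)
    return wiki_links, interwiki_links


def build_thesaurus(pages):
    """Map each term to the set of terms it is linked with.

    `pages` is a hypothetical dict of {title: wikitext}. Wiki links are
    recorded in both directions; interwiki links are recorded on the
    linking page only, as "lang:Title".
    """
    thesaurus = defaultdict(set)
    for title, text in pages.items():
        wiki_links, interwiki_links = extract_links(text)
        for target in wiki_links:
            thesaurus[title].add(target)
            thesaurus[target].add(title)
        for lang, target in interwiki_links:
            thesaurus[title].add(f"{lang}:{target}")
    return thesaurus


if __name__ == "__main__":
    sample = {
        "Cat": "The cat is a [[Feline|feline]] and a [[Pet]]. [[fr:Chat]]",
        "Pet": "A [[Cat]] or a [[Dog]].",
    }
    for term, related in sorted(build_thesaurus(sample).items()):
        print(term, "->", sorted(related))
```

Treating every link as a symmetric "related term" is the simplest possible choice; a real build would likely also need to handle redirects, namespaces such as Category: and Image:, and title normalization.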
Jeff
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
http://lists.wikimedia.org/mailman/listinfo/wikitech-l