[WikiEN-l] I want to do an analysis of wikipedia

Andrew Gray shimgray at gmail.com
Wed Jun 4 10:00:17 UTC 2008


2008/6/4 Sylvan Arevalo <khakiducks at gmail.com>:

> Oh and if anyone has suggestions on the best way to make the database of
> hyperlinks that reference each other (spidering all of wikipedia, or
> is there a better way to do it?)

Spidering is bad!

(It's both time-consuming for you and very annoying for us)

You can get the dataset you're looking for via dumps.wikimedia.org -
you want the enwiki pagelinks.sql.gz file, I believe. Not entirely
sure what you'd do with it after that, but it ought to have the data
you're looking for in a suitably stripped-down form.
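
As a rough sketch of one way to use the file once downloaded: the dump
is a gzipped SQL script, so you can stream it and pull the rows out of
the INSERT statements without loading it into a database. The function
names below are made up for illustration, and the column layout (source
page id, target namespace, target title as the first three fields) is an
assumption; check the CREATE TABLE statement at the top of your copy of
the dump before relying on it.

```python
import gzip
from typing import Iterator, List, Tuple


def parse_values(line: str) -> Iterator[List[str]]:
    """Yield each row of an 'INSERT INTO ... VALUES (...),(...);' line as a list of strings."""
    start = line.find(" VALUES ")
    if not line.startswith("INSERT INTO") or start < 0:
        return  # not a data line (comment, CREATE TABLE, etc.)
    i = start + len(" VALUES ")
    n = len(line)
    row: List[str] = []
    field: List[str] = []
    in_row = False
    in_str = False
    while i < n:
        ch = line[i]
        if in_str:
            if ch == "\\" and i + 1 < n:
                # Backslash escape: keep the escaped character literally.
                field.append(line[i + 1])
                i += 1
            elif ch == "'":
                in_str = False
            else:
                field.append(ch)
        elif ch == "'":
            in_str = True
        elif ch == "(" and not in_row:
            in_row, row, field = True, [], []
        elif ch == "," and in_row:
            row.append("".join(field))
            field = []
        elif ch == ")" and in_row:
            row.append("".join(field))
            yield row
            in_row = False
        elif in_row:
            field.append(ch)
        i += 1


def iter_links(path: str) -> Iterator[Tuple[int, int, str]]:
    """Stream (source page id, target namespace, target title) from a gzipped dump.

    Assumes those are the first three columns of every row (hypothetical layout).
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            for row in parse_values(line):
                yield int(row[0]), int(row[1]), row[2]
```

Looping over iter_links("enwiki-pagelinks.sql.gz") (the filename here is
a placeholder for whatever you downloaded) gives one link per iteration,
which you can write out as an edge list for whatever graph tool you
prefer. Source pages are numeric ids, so you would probably also want
the page table dump to map ids back to titles.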

-- 
- Andrew Gray
 andrew.gray at dunelm.org.uk


