[WikiEN-l] I want to do an analysis of wikipedia
Andrew Gray
shimgray at gmail.com
Wed Jun 4 10:00:17 UTC 2008
2008/6/4 Sylvan Arevalo <khakiducks at gmail.com>:
> Oh and if anyone has suggestions on the best way to make the database of
> hyperlinks that reference each other (spidering all of wikipedia, or
> is there a better way to do it?)
Spidering is bad!
(It's both time-consuming for you and very annoying for us.)
You can get the dataset you're looking for via dumps.wikimedia.org -
you want the enwiki pagelinks.sql.gz file, I believe. Not entirely
sure what you'd do with it after that, but it ought to have the data
you're looking for in a suitably stripped-down form.
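
As a rough sketch of what could be done with the file afterwards, the dump can be streamed and turned into (source page id, target namespace, target title) link tuples without loading it into MySQL. This assumes each row in the INSERT statements has the three-column shape (pl_from, pl_namespace, 'pl_title'); other dump versions may lay the table out differently, so check the CREATE TABLE statement at the top of the file first. The function names and the file name in the usage note are illustrative assumptions, not part of any official tool.

```python
import gzip
import re

# Assumed row shape: (pl_from, pl_namespace, 'pl_title').
# The title group accepts backslash-escaped characters such as \' inside the quotes.
ROW = re.compile(r"\((\d+),(-?\d+),'((?:[^'\\]|\\.)*)'\)")


def parse_insert_line(line):
    """Return a list of (from_id, namespace, title) tuples found in one INSERT line."""
    rows = []
    for from_id, namespace, title in ROW.findall(line):
        # Drop the backslash from MySQL escapes such as \' and \\.
        title = re.sub(r"\\(.)", r"\1", title)
        rows.append((int(from_id), int(namespace), title))
    return rows


def iter_pagelinks(path):
    """Stream link tuples from a gzipped SQL dump, one INSERT line at a time."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as dump:
        for line in dump:
            if line.startswith("INSERT INTO"):
                yield from parse_insert_line(line)
```

Something like `for from_id, namespace, title in iter_pagelinks("enwiki-latest-pagelinks.sql.gz"): ...` would then walk the links one at a time (the file name here is an assumption; use whatever the dump you downloaded is called). Note that the source is a numeric page id while the target is a title, so building a full page-to-page graph would also need the page table dump to map ids to titles.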
--
- Andrew Gray
andrew.gray at dunelm.org.uk