A dump is indeed your best bet, especially if the alternative is spidering all 2,400,000 articles.
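For what it's worth, once you have the pagelinks.sql.gz file Andrew mentions below, turning it into an adjacency structure isn't much code. Here's a rough Python sketch; it assumes the (pl_from, pl_namespace, pl_title) column layout, uses a file name of my own invention, handles the SQL quoting only naively, and you'd still need the page.sql dump to resolve the numeric pl_from ids back into titles:

import gzip
import re
from collections import defaultdict

# one (pl_from, pl_namespace, pl_title) tuple inside an INSERT statement;
# the title is a single-quoted SQL string with backslash escapes
ROW = re.compile(r"\((\d+),(-?\d+),'((?:[^'\\]|\\.)*)'\)")

def load_pagelinks(path):
    """Build {source page id: set of target titles} from a pagelinks SQL dump."""
    links = defaultdict(set)
    with gzip.open(path, 'rt', encoding='utf-8', errors='replace') as f:
        for line in f:
            if not line.startswith('INSERT INTO'):
                continue
            for pl_from, ns, title in ROW.findall(line):
                if ns == '0':  # main (article) namespace only
                    links[int(pl_from)].add(title)
    return links

links = load_pagelinks('enwiki-latest-pagelinks.sql.gz')  # illustrative file name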
As to whether anything similar has been done before, have a look at the six-degrees project (http://toolserver.org/~river/pages/projects/six-degrees), a replication-based implementation of roughly what you're suggesting. It has since been taken down, but a comparable tool at http://www.netsoc.tcd.ie/~mu/wiki/ ("find shortest paths") queries the most recent database dump for the shortest path between two articles.
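If you'd rather reproduce the "find shortest paths" part yourself than rely on that tool, the core of it is just a breadth-first search over the link graph. A minimal sketch, assuming you've already boiled the dump down to a dict mapping each article title to the set of titles it links out to (the names here are mine, not anything taken from the toolserver code):

from collections import deque

def shortest_path(graph, start, goal):
    """Return the shortest chain of titles from start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        title = queue.popleft()
        if title == goal:
            # walk the parent pointers back to reconstruct the chain
            path = []
            while title is not None:
                path.append(title)
                title = parent[title]
            return path[::-1]
        for nxt in graph.get(title, ()):
            if nxt not in parent:
                parent[nxt] = title
                queue.append(nxt)
    return None

# e.g. shortest_path(graph, 'Six_degrees_of_separation', 'Kevin_Bacon')

Storing parent pointers rather than whole paths keeps memory roughly linear in the number of visited pages, which matters at this scale.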
- H
Andrew Gray wrote (Wed 04/06/2008 11:00):
2008/6/4 Sylvan Arevalo khakiducks@gmail.com:
Oh and if anyone has suggestions on the best way to make the database of hyperlinks that reference each other (spidering all of wikipedia, or is there a better way to do it?)
Spidering is bad!
(It's both time-consuming for you and very annoying for us)
You can get the dataset you're looking for via dumps.wikimedia.org - you want the enwiki pagelinks.sql.gz file, I believe. Not entirely sure what you'd do with it after that, but it ought to have the data you're looking for in a suitably stripped-down form.