James Linden wrote:
>> Why do you need to access the live wikipedia for this?
>> Using categorylinks.sql and page.sql you should be able to fetch the
>> same data. Probably faster.
> In my research, the answer to this question is two-fold:
> A) Creating a local copy of Wikipedia (using MediaWiki and various
> import tools) is quite a process, and requires a significant
> investment of time and research in itself.
You don't need to do a full copy to, e.g., fetch infoboxes.
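For what it's worth, once page.sql and categorylinks.sql are loaded into a local database, pulling category membership is a single join on the standard MediaWiki schema (`page` and `categorylinks` tables). A minimal sketch — shown here with SQLite and made-up sample rows purely so it is self-contained; the real dumps are MySQL dumps:

```python
import sqlite3

# Sketch only: SQLite with invented rows stands in for a local MySQL
# database loaded from the page.sql and categorylinks.sql dumps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT)")
conn.execute("CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT)")
conn.executemany("INSERT INTO page VALUES (?, ?, ?)", [
    (1, 0, "Albert_Einstein"),    # article (namespace 0)
    (2, 0, "Quantum_mechanics"),  # article (namespace 0)
    (3, 14, "Physics"),           # category page (namespace 14)
])
conn.executemany("INSERT INTO categorylinks VALUES (?, ?)", [
    (1, "Physicists"),
    (2, "Physics"),
    (3, "Science"),
])

# The join that answers "which articles are in category X?" straight
# from the dump tables -- no live API involved.
QUERY = """
    SELECT p.page_title
    FROM page AS p
    JOIN categorylinks AS cl ON cl.cl_from = p.page_id
    WHERE cl.cl_to = ? AND p.page_namespace = 0
"""

def pages_in_category(category):
    return [title for (title,) in conn.execute(QUERY, (category,))]

result = pages_in_category("Physics")
print(result)  # ['Quantum_mechanics']
```

Against a real MySQL load the query is the same apart from the driver's parameter style.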
> B) A few months ago, I pulled 333 semi-random articles from the live
> API -- of those, 329 have significant enough changes since the
> 20100312 dump (which was the newest dump at the time). A new check
> against the 20110115 dump shows a similar percentage.
Getting updated data may be a reason, but I don't think that's what
Ramesh wanted.
Plus, you wanted 333 articles, not all 3 million...
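The staleness check described in B) reduces to comparing each article's latest revision timestamp (e.g., as returned by the MediaWiki API with prop=revisions&rvprop=timestamp) against the dump's date. A hedged sketch of just that comparison step — the API call itself and the "significant change" judgment are omitted:

```python
from datetime import datetime

def changed_since_dump(last_rev_iso, dump_date):
    """True if the article was edited after the dump was taken.

    last_rev_iso: e.g. "2010-06-01T12:00:00Z" (MediaWiki API timestamp format)
    dump_date:    e.g. "20100312" (the dump's YYYYMMDD date string)
    """
    rev = datetime.strptime(last_rev_iso, "%Y-%m-%dT%H:%M:%SZ")
    dump = datetime.strptime(dump_date, "%Y%m%d")
    return rev > dump

print(changed_since_dump("2010-06-01T12:00:00Z", "20100312"))  # True
print(changed_since_dump("2010-03-01T08:30:00Z", "20100312"))  # False
```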