Re: [Wikitech-l] Error in html-dump of category-pages

30 May 2007

On Wed, 30 May 2007 16:56:11 +0200, Frank Schumacher wrote:

FS: Dear NG,
FS: 
FS: I use the html-download of wikipedia to extract a net of main- and 
FS: subcategories with the connected articles.
FS: 
FS: To achieve this, I parse all Category~*.* pages.
FS: 
FS: Now it happens that categories with more than 200 entries (e.g. 
FS: subcategories) aren't represented completely in the html-dump. The 
FS: page contains only the first 200 elements; the remaining elements are 
FS: missing. The link "next 200" points back to the same page, and no page 
FS: with the "next 200" can be found.
FS: 
FS: So I can only extract the first 200 elements. Can anything be done about 
FS: this?

You can work with the XML dumps instead. Import them into MySQL and have a
look at http://meta.wikimedia.org/wiki/Database_layout
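
Once the data is in a database, the category net can be read with a join
instead of parsing HTML, so the 200-entry page limit no longer applies. Below
is a rough sketch only. It assumes the usual MediaWiki layout: a "page" table
(page_id, page_namespace, page_title) and a "categorylinks" table (cl_from,
cl_to), with namespace 14 for categories and 0 for articles. Check these
names against the layout page above, since they may differ between versions.
An in-memory SQLite database with made-up rows stands in for the MySQL
import, and the helper names are hypothetical.

```python
import sqlite3

CATEGORY_NS = 14  # assumed namespace number for category pages
ARTICLE_NS = 0    # assumed namespace number for articles


def build_demo_db():
    """Create a toy database with an assumed, simplified MediaWiki layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_namespace INTEGER,
            page_title TEXT
        );
        CREATE TABLE categorylinks (
            cl_from INTEGER,  -- page_id of the member page
            cl_to TEXT        -- title of the containing category
        );
        """
    )
    # Made-up sample rows, for illustration only.
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [
            (1, CATEGORY_NS, "Science"),
            (2, CATEGORY_NS, "Physics"),
            (3, ARTICLE_NS, "Atom"),
            (4, ARTICLE_NS, "Experiment"),
        ],
    )
    conn.executemany(
        "INSERT INTO categorylinks VALUES (?, ?)",
        [(2, "Science"), (4, "Science"), (3, "Physics")],
    )
    return conn


def category_members(conn, category):
    """Return (subcategories, articles) listed directly in one category."""
    rows = conn.execute(
        """
        SELECT p.page_namespace, p.page_title
        FROM categorylinks AS cl
        JOIN page AS p ON p.page_id = cl.cl_from
        WHERE cl.cl_to = ?
        ORDER BY p.page_title
        """,
        (category,),
    ).fetchall()
    subcats = [title for ns, title in rows if ns == CATEGORY_NS]
    articles = [title for ns, title in rows if ns == ARTICLE_NS]
    return subcats, articles


if __name__ == "__main__":
    conn = build_demo_db()
    print(category_members(conn, "Science"))
```

Against the real import, the same SELECT should work through a MySQL client
library in place of sqlite3, though the parameter placeholder may differ.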

--
Emmanuel
