Dear NG,
I use the HTML dump of Wikipedia to extract a network of main categories
and subcategories together with the articles linked to them.
To do this, I parse all Category~*.* pages.
It turns out that categories with more than 200 entries (e.g.
subcategories) are not represented completely in the HTML dump. The
page contains only the first 200 elements; the remaining elements are
missing. The "next 200" link points back to the same page, and no page
containing the next 200 elements can be found anywhere in the dump.
So I can only extract the first 200 elements. Can anything be done about
this?
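
In case it helps to reproduce the problem, here is a minimal Python
sketch of how the affected category pages could be listed. It assumes
that a truncated page can be recognised by a link whose text contains
"next 200"; the names NextLinkFinder, is_truncated and
truncated_categories are made up for this illustration, and the UTF-8
encoding of the dump files is an assumption as well.

```python
from html.parser import HTMLParser
from pathlib import Path


class NextLinkFinder(HTMLParser):
    """Collects the visible text of every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self._in_anchor = False
        self.anchor_texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self.anchor_texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Only text that appears inside a link is recorded.
        if self._in_anchor:
            self.anchor_texts[-1] += data


def is_truncated(html_text, marker="next 200"):
    """Return True if the page contains a link whose text includes marker.

    Assumption: such a link only appears on category pages that list
    more elements than fit on one page.
    """
    finder = NextLinkFinder()
    finder.feed(html_text)
    finder.close()
    return any(marker in text for text in finder.anchor_texts)


def truncated_categories(dump_dir):
    """List all Category~*.* files below dump_dir that look truncated."""
    result = []
    for path in sorted(Path(dump_dir).rglob("Category~*.*")):
        # Assumption: the dump files are UTF-8; bad bytes are replaced.
        text = path.read_text(encoding="utf-8", errors="replace")
        if is_truncated(text):
            result.append(path)
    return result
```

Calling truncated_categories() on the root directory of the unpacked
dump would then give the list of category pages that are cut off after
the first 200 elements.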
Thanks in advance,
Frank