[Mediawiki-l] parsing Allpages?

Juhana Sadeharju kouhia at nic.funet.fi
Sat Aug 26 19:44:52 UTC 2006


>From: "Rob Church" <robchur at gmail.com>
>
[ ... ]

A Special:AllpagesRaw page really would help. What was your opinion on it?

I'm downloading a few wikis not maintained by me. I'm not registered
as an author in them.

The export help says to take the page names from Special:Allpages.
I want to do that automatically, and that turns out not to be simple.
I have started writing C code to download and parse Allpages, but
it is not a simple task. I can extract all the pages with a trick,
but it is a trick of a questionable sort
(e.g., first search for "<hr", then the pages are in the next table).
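
To make the trick concrete, here is a rough sketch of it in C. The names
allpages_links and struct link are made up for this example and are not
taken from wikiget. It assumes the page list is the first table after the
first "<hr", that tag names are lowercase, and that href values are
double-quoted. The actual Allpages markup depends on the MediaWiki version
and skin, so treat this as an illustration of how fragile the approach is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One extracted link: points into the caller's HTML buffer, not copied. */
struct link {
    const char *start;
    size_t len;
};

/* Hypothetical helper: collect href values from the first table that
 * follows the first "<hr" in html. Stores at most max entries in out
 * and returns how many were stored. */
size_t allpages_links(const char *html, struct link *out, size_t max)
{
    const char *p, *end, *q;
    size_t n = 0;

    assert(html != NULL && out != NULL);

    p = strstr(html, "<hr");            /* step 1: find the rule */
    if (p == NULL)
        return 0;
    p = strstr(p, "<table");            /* step 2: the next table */
    if (p == NULL)
        return 0;
    end = strstr(p, "</table");         /* stop at the end of that table */
    if (end == NULL)
        end = p + strlen(p);

    /* step 3: every href="..." inside the table is taken as a page link */
    while (n < max && (p = strstr(p, "href=\"")) != NULL && p < end) {
        p += 6;                         /* skip over href=" */
        q = strchr(p, '"');
        if (q == NULL || q > end)
            break;
        out[n].start = p;
        out[n].len = (size_t)(q - p);
        n++;
        p = q + 1;
    }
    return n;
}
```

Any change to the page layout (a second <hr>, a navigation table before
the list, single-quoted attributes) breaks this, which is why a raw list
would be preferable.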
 
My wikiget will download both the HTML and XML versions
of the pages, and the images, both small and large. The URLs will be
converted to relative ones. The filenames will be converted from
"Category:Functions" to "Category-Functions.html" because
lynx, galeon and mozilla do not like ":" characters in filenames.
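
The filename conversion could look roughly like this. title_to_filename
is a made-up name for this sketch, not the actual wikiget function, and
it only handles the ":" case described above. Other characters that are
awkward in filenames (for example "/") are left untouched here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build a local filename from a page title by
 * replacing every ':' with '-' and appending ".html".
 * Returns 0 on success, -1 if out is too small for the result. */
int title_to_filename(const char *title, char *out, size_t outsize)
{
    static const char ext[] = ".html";
    size_t len, i;

    assert(title != NULL && out != NULL);

    len = strlen(title);
    if (len + sizeof ext > outsize)     /* sizeof ext includes the NUL */
        return -1;

    for (i = 0; i < len; i++)
        out[i] = (title[i] == ':') ? '-' : title[i];
    memcpy(out + len, ext, sizeof ext); /* copies ".html" and the NUL */
    return 0;
}
```

With a large enough buffer, the title "Category:Functions" is written out
as "Category-Functions.html".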

Juhana
-- 
  http://music.columbia.edu/mailman/listinfo/linux-graphics-dev
  for developers of open source graphics software


