This is actually quite a cool solution!
It is a little roundabout, but it's good to know I have alternatives!
Harish
On 1/18/07, Anthony <wikitech(a)inbox.org> wrote:
On 1/17/07, Harish TM <harish.tmh(a)gmail.com> wrote:
Just to further clarify what I am looking for: let's say I want to
PRINT out a copy of Wikipedia (I know that's insane, but I need the
text to be as clean as if I were printing it out), with the articles
indexed by title and category. How would I get that data?
The easiest way would probably be:
1) download the dumps
2) install MediaWiki
3) have a bot scrape from *your own* MediaWiki installation (printable
version)
4) use the categorylinks table to sort by category (or scrape the
categories at the bottom of each article)
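
Steps 3 and 4 could be sketched roughly as below. This is only an illustration under assumptions, not a tested bot: BASE_URL is a hypothetical address for a local install, the helper names printable_url and index_by_category are made up for this example, and the SQL assumes the classic page/categorylinks columns (page_id, page_title, page_namespace, cl_from, cl_to), which may differ between MediaWiki versions.

```python
from collections import defaultdict
from urllib.parse import urlencode

# Hypothetical address of your own local MediaWiki install.
BASE_URL = "http://localhost/wiki/index.php"

# Query one might run against the imported database to get
# (category, title) pairs for main-namespace articles. Column names
# follow the classic schema and are an assumption for your version.
CATEGORY_QUERY = """
SELECT cl_to, page_title
FROM categorylinks
JOIN page ON page_id = cl_from
WHERE page_namespace = 0
"""


def printable_url(title, base_url=BASE_URL):
    """Build the printable-version URL a bot would fetch for one title."""
    params = {"title": title.replace(" ", "_"), "printable": "yes"}
    return base_url + "?" + urlencode(params)


def index_by_category(rows):
    """Group (category, title) rows into {category: sorted list of titles}."""
    index = defaultdict(set)
    for category, title in rows:
        index[category].add(title)
    # Sort categories and the titles inside each one for a stable index.
    return {cat: sorted(titles) for cat, titles in sorted(index.items())}


if __name__ == "__main__":
    # Stand-in rows; in practice these would come from CATEGORY_QUERY.
    rows = [("Physics", "Quark"), ("Biology", "Cell"), ("Physics", "Atom")]
    for category, titles in index_by_category(rows).items():
        print(category)
        for title in titles:
            print("   ", title, "->", printable_url(title))
```

The grouping is kept separate from the database access so the same index can be built from scraped category links instead of the categorylinks table.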
On the other hand, you could always write your own parser :).
Anthony