On 1/17/07, Harish TM <harish.tmh@gmail.com> wrote:
Just to further clarify what I am looking for: let's say I want to PRINT out a copy of Wikipedia (I know that's insane, but I need the text to be as clean as if I were printing it out), with the articles indexed by title and category. How would I get that data?
The easiest way would probably be:
1) Download the dumps.
2) Install MediaWiki.
3) Have a bot scrape the printable version of each page from *your own* MediaWiki installation.
4) Use the categorylinks table to sort by category (or scrape the categories at the bottom of each article).
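To make steps 3 and 4 a bit more concrete, here is a rough Python sketch. It assumes the classic schema, where categorylinks has cl_from (page id) and cl_to (category name) and page has page_id, page_namespace and page_title; newer MediaWiki releases may lay this out differently, so check your own installation. The in-memory SQLite database and its rows are made-up stand-ins for the real MySQL database, the base URL is hypothetical, and the function names are mine. The printable=yes parameter is what the "Printable version" link used at the time.

```python
import sqlite3
from collections import defaultdict
from urllib.parse import urlencode


def printable_url(index_php, title):
    """Build the printable-version URL for a page on your own wiki.

    index_php is the path to your local index.php (hypothetical here).
    """
    query = urlencode({"title": title.replace(" ", "_"), "printable": "yes"})
    return index_php + "?" + query


def titles_by_category(conn):
    """Return {category: [article titles]} for main-namespace pages."""
    rows = conn.execute(
        "SELECT cl.cl_to, p.page_title "
        "FROM categorylinks AS cl "
        "JOIN page AS p ON p.page_id = cl.cl_from "
        "WHERE p.page_namespace = 0 "  # namespace 0 = articles
        "ORDER BY cl.cl_to, p.page_title"
    )
    index = defaultdict(list)
    for category, title in rows:
        index[category].append(title)
    return dict(index)


def demo_connection():
    """Tiny in-memory stand-in for the imported dump (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
        INSERT INTO page VALUES (1, 0, 'Python_(programming_language)');
        INSERT INTO page VALUES (2, 0, 'Perl');
        INSERT INTO page VALUES (3, 1, 'Perl');  -- talk page, filtered out
        INSERT INTO categorylinks VALUES (1, 'Programming_languages');
        INSERT INTO categorylinks VALUES (2, 'Programming_languages');
        INSERT INTO categorylinks VALUES (2, 'Scripting_languages');
        INSERT INTO categorylinks VALUES (3, 'Programming_languages');
        """
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for category, titles in titles_by_category(conn).items():
        print(category)
        for title in titles:
            print("   ", printable_url("http://localhost/w/index.php", title))
```

Against the real database you would run the same query through a MySQL client library instead of sqlite3, and have the bot fetch each printed URL.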
On the other hand, you could always write your own parser :).
Anthony