Delip Rao wrote:
Hello!
We have been working on a Java API for reading Wikipedia XML dumps for some time, and it's now reasonably functional. Check out:
http://code.google.com/p/wikixmlj/
*Features:*
* Easy access to important elements of a Wikipedia page
* Also provides interfaces for Wiki text parsing
* Memory efficient
  o SAX interface for parsing
  o Lazy loading of files for DOM
  o Callback support with DOM
* Directly operate on compressed Wikipedia dumps (gzip/bzip2/native XML supported)
Interesting. Is it usable by people who don't know what a "DOM parser" or "SAX interface" is?
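For anyone else wondering about the terms: a SAX parser streams through the XML and calls your code back as each element goes by, so memory use stays small even on a huge dump, whereas a DOM parser builds a tree of the document in memory first. Below is a rough sketch of the SAX style using only the parsers that ship with the JDK. It is not wikixmlj's own API, which isn't shown in the announcement; the class name `SaxTitleSketch` and the tiny inline sample are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; uses only standard JDK XML APIs.
class SaxTitleSketch {

    // Streams through the XML and collects the text of every <title> element.
    static List<String> titles(InputStream in) throws Exception {
        List<String> found = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(in, new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <title>

            @Override
            public void startElement(String uri, String local, String qName, Attributes attrs) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one element, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("title".equals(qName)) {
                    found.add(text.toString());
                    text = null;
                }
            }
        });
        return found;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample shaped loosely like a dump: pages with a title and text.
        String xml = "<mediawiki>"
                + "<page><title>Java</title><text>...</text></page>"
                + "<page><title>XML</title><text>...</text></page>"
                + "</mediawiki>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(titles(in)); // prints [Java, XML]
    }
}
```

To read a gzip-compressed file the same way, the input could be wrapped in `java.util.zip.GZIPInputStream` before being passed to `titles`. The JDK has no built-in bzip2 reader, so that format would need an extra library.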