Hello!
We have been working on a Java API for reading Wikipedia XML dumps for
some time, and it's now reasonably functional. Check it out:
http://code.google.com/p/wikixmlj/
*Features:*
- Easy access to important elements of a Wikipedia page
- Interfaces for wiki text parsing
- Memory efficient
- SAX interface for parsing
- Lazy loading of files for DOM
- Callback support with DOM
- Operates directly on compressed Wikipedia dumps (gzip and bzip2), as well
as plain XML
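
To give a rough idea of the SAX-plus-callback style on a compressed dump, here is a minimal sketch that uses only the JDK. It is not the wikixmlj API: the names DumpTitles, PageCallback and parseGzip are made up for illustration, and it only handles gzip and only reports page titles.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DumpTitles {
    /** Hypothetical callback, invoked once per <title> element in the dump. */
    public interface PageCallback {
        void process(String title);
    }

    /** Streams a gzip-compressed dump and hands each title to the callback. */
    public static void parseGzip(InputStream gzipped, PageCallback callback) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Decompress on the fly, so the full XML is never held in memory.
        try (InputStream xml = new GZIPInputStream(gzipped)) {
            parser.parse(xml, new DefaultHandler() {
                private final StringBuilder text = new StringBuilder();
                private boolean inTitle = false;

                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attrs) {
                    if ("title".equals(qName)) {
                        inTitle = true;
                        text.setLength(0);
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    // SAX may deliver text in several chunks, so accumulate.
                    if (inTitle) {
                        text.append(ch, start, length);
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    if ("title".equals(qName)) {
                        inTitle = false;
                        callback.process(text.toString());
                    }
                }
            });
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java DumpTitles <dump.xml.gz>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            parseGzip(in, title -> System.out.println(title));
        }
    }
}
```

The point of the callback is that each page is handed over as soon as it has been read, so memory use stays flat regardless of the size of the dump.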
Best,
Delip