Delip Rao wrote:
Hello!
We have been working on a Java API for reading Wikipedia XML dumps for
some time and it's now reasonably functional. Check out:
http://code.google.com/p/wikixmlj/
*Features:*
* Easy access to important elements of a Wikipedia page
* Also provides interfaces for Wiki text parsing.
* Memory efficient
    o SAX interface for parsing
    o Lazy loading of files for DOM
    o Callback support with DOM
* Directly operate on compressed Wikipedia dumps (gzip/bzip2/native
  XML supported)
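
For anyone unsure of the terms in the list above: a SAX parser streams
through the XML and calls your code back on each element, so memory use stays
flat, while a DOM parser loads the document into an in-memory tree. Below is
a minimal sketch of the SAX approach on a gzip-compressed input. It uses only
the JDK's built-in parser and GZIPInputStream, not wikixmlj's own API. The
class name SaxTitleSketch, the helpers readTitles and gzip, and the tiny
sample document are hypothetical, made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of wikixmlj.
class SaxTitleSketch {

    // Streams through the XML and collects the text of every <title> element.
    // Only the title currently being read is held in memory.
    static List<String> readTitles(InputStream xml) throws Exception {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside <title>

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one text node in several chunks.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("title".equals(qName)) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        return titles;
    }

    // Helper that gzips a string in memory, standing in for a compressed dump file.
    static byte[] gzip(String s) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(s.getBytes(StandardCharsets.UTF_8));
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample shaped loosely like a dump: pages with titles.
        String xml = "<mediawiki><page><title>Alpha</title></page>"
                   + "<page><title>Beta</title></page></mediawiki>";
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzip(xml)))) {
            System.out.println(readTitles(in)); // prints [Alpha, Beta]
        }
    }
}
```

For a real dump, the in-memory stream would be replaced by a FileInputStream
wrapped in GZIPInputStream. The JDK has no built-in bzip2 reader, so that
format would need a third-party library.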
Interesting. Is it usable by people who don't know what a "DOM parser"
or "SAX interface" is?
--
Piotr Konieczny
"The problem about Wikipedia is, that it just works in reality, not in
theory."