On Tue, Nov 24, 2009 at 10:46 PM, Piotr Konieczny <piokon(a)post.pl> wrote:
Delip Rao wrote:
We have been working on a Java API for reading Wikipedia XML dumps for
some time, and it's now reasonably functional. Check out:
* Easy access to important elements of a Wikipedia page
* Also provides interfaces for Wiki text parsing.
* Memory efficient
o SAX interface for parsing
o Lazy loading of files for DOM
o Callback support with DOM
* Directly operate on compressed wikipedia dumps (gzip/bzip2/native
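To make the SAX, callback and compressed-input bullets more concrete, here is a minimal sketch that uses only the JDK's built-in SAX parser (javax.xml.parsers) and java.util.zip. It is not the API of the library announced in this thread. WikiDumpSketch, Page, forEachPage, openDump and titles are hypothetical names chosen for illustration. The sketch streams <page> elements one at a time, keeps only the current page's title and text in memory, and hands each page to a callback so the caller never handles SAX events directly. Only gzip is covered because the JDK ships no bzip2 decoder; a .bz2 dump would need a third-party stream such as Apache Commons Compress's BZip2CompressorInputStream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical names throughout; this is NOT the API of the library announced in the thread.
public class WikiDumpSketch {

    // The two fields this sketch pulls out of each <page> element.
    public record Page(String title, String text) {}

    // Streams the dump with SAX and calls onPage once per <page>;
    // only the current page's title and text are held in memory.
    public static void forEachPage(InputStream dump, Consumer<Page> onPage) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // dumps use a default xmlns, so match on local names
        factory.newSAXParser().parse(dump, new DefaultHandler() {
            private final StringBuilder buf = new StringBuilder();
            private String title;
            private String text;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                buf.setLength(0); // begin collecting this element's character data
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                buf.append(ch, start, length); // SAX may split text into several chunks
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                switch (local) {
                    case "title" -> title = buf.toString();
                    case "text" -> text = buf.toString(); // last revision wins if there are several
                    case "page" -> {
                        onPage.accept(new Page(title, text));
                        title = null;
                        text = null;
                    }
                    default -> { }
                }
                buf.setLength(0);
            }
        });
    }

    // Adds a gzip decoder when the file name ends in ".gz" (bzip2 is not in the JDK).
    public static InputStream openDump(InputStream raw, String fileName) throws IOException {
        InputStream in = new BufferedInputStream(raw);
        return fileName.endsWith(".gz") ? new GZIPInputStream(in) : in;
    }

    // Convenience wrapper: collect every page title in document order.
    public static List<String> titles(InputStream dump) throws Exception {
        List<String> out = new ArrayList<>();
        forEachPage(dump, page -> out.add(page.title()));
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<mediawiki xmlns=\"http://www.mediawiki.org/xml/export-0.4/\">"
                + "<page><title>Alpha</title><revision><text>First body</text></revision></page>"
                + "<page><title>Beta</title><revision><text>Second body</text></revision></page>"
                + "</mediawiki>";
        // Gzip the sample in memory to exercise the compressed-input path.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        InputStream dump = openDump(new ByteArrayInputStream(bytes.toByteArray()), "sample.xml.gz");
        forEachPage(dump, p -> System.out.println(p.title() + ": " + p.text().length() + " chars"));
    }
}
```

The trade-off is the one the bullets hint at: a SAX handler like this keeps memory roughly flat no matter how large the dump is, whereas a DOM parser builds the whole document tree before you can look at the first page.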
Interesting. Is it usable by people who don't know what a "DOM parser"
or "SAX interface" is?
Thanks, Piotr, and everyone else for the interest. Yes, the API does not
require any understanding of the underlying parser. I hope it fits your
needs, but let me know if there is anything I can accommodate.
Wiki-research-l mailing list