On Tue, Nov 24, 2009 at 10:46 PM, Piotr Konieczny <piokon@post.pl> wrote:

Delip Rao wrote:
> Hello!
>
> We have been working on a Java API for reading Wikipedia XML dumps for
> some time and it's now reasonably functional. Check out:
>
> http://code.google.com/p/wikixmlj/
>
> *Features:*
>
> * Easy access to important elements of a Wikipedia page
> * Also provides interfaces for Wiki text parsing
> * Memory efficient
>     o SAX interface for parsing
>     o Lazy loading of files for DOM
>     o Callback support with DOM
> * Directly operate on compressed Wikipedia dumps (gzip/bzip2/native
>   XML supported)

Interesting. Is it usable by people who don't know what a "DOM parser"
or "SAX interface" is?

Thanks, Piotr and everyone else, for the interest. Yes, the API does not require any understanding of the underlying parser. I hope it fits your needs, but let me know if there is anything I can accommodate.
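
For readers wondering what a callback-style reader hides, here is a minimal sketch of streaming page titles out of a dump. It deliberately uses only the JDK's built-in SAX parser and is not the wikixmlj API; the class and method names (TitleStreamer, forEachTitle, collectTitles) are hypothetical, and it assumes the usual dump layout in which each <page> element contains a <title> element.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of wikixmlj: streams a MediaWiki XML dump
// and hands each page title to a callback without building a DOM tree.
class TitleStreamer {

    // Parses the stream once; the callback fires as each </title> is reached,
    // so memory use does not grow with the size of the dump.
    static void forEachTitle(InputStream dump, Consumer<String> callback) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(dump, new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private boolean inTitle = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("title".equals(qName)) {
                    inTitle = true;
                    text.setLength(0); // start collecting a fresh title
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text node in several chunks, so append.
                if (inTitle) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    inTitle = false;
                    callback.accept(text.toString());
                }
            }
        });
    }

    // Convenience wrapper for small inputs: gathers every title into a list.
    static List<String> collectTitles(InputStream dump) throws Exception {
        List<String> titles = new ArrayList<>();
        forEachTitle(dump, titles::add);
        return titles;
    }
}
```

A gzip-compressed dump could be read the same way by wrapping the file stream in java.util.zip.GZIPInputStream before passing it in. The JDK has no bzip2 decoder, so a .bz2 dump would need a third-party library.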

Happy Thanksgiving!

--
Piotr Konieczny

"The problem about Wikipedia is, that it just works in reality, not in
theory."

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l