Hello!
We have been working on a Java API for reading Wikipedia XML dumps for some time, and it's now reasonably functional. Check out:
http://code.google.com/p/wikixmlj/
*Features:*
- Easy access to important elements of a Wikipedia page
- Interfaces for wikitext parsing
- Memory efficient:
  - SAX interface for parsing
  - Lazy loading of files for DOM
  - Callback support with DOM
- Direct operation on compressed Wikipedia dumps (gzip and bzip2; uncompressed XML is also supported)
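For readers wondering what "SAX interface" and "callback support" amount to in practice, here is a minimal sketch of the general technique using only the JDK: it streams a gzip-compressed dump through the built-in SAX parser and hands each `<page>` to a callback, so only the current page is held in memory. This is not wikixmlj's API; `DumpPageStreamer`, `Page` and `streamPages` are hypothetical names. Only gzip is handled, because the JDK has no bzip2 decoder (that would need a third-party library).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper illustrating SAX streaming with a per-page callback;
// it is NOT the wikixmlj API.
public class DumpPageStreamer {

    // Hypothetical minimal page record: title and wikitext of the (last) revision.
    public record Page(String title, String text) {}

    // Decompresses a gzip dump on the fly and calls `callback` once per <page>.
    public static void streamPages(InputStream gzipped, Consumer<Page> callback) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName works with the dump's default namespace
        SAXParser parser = factory.newSAXParser();
        try (InputStream in = new GZIPInputStream(gzipped)) {
            parser.parse(in, new DefaultHandler() {
                private final StringBuilder buf = new StringBuilder();
                private String title;
                private String text;

                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    buf.setLength(0); // start collecting character data for this element
                    if (localName.equals("page")) {
                        title = null;
                        text = null;
                    }
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    buf.append(ch, start, length); // may arrive in several chunks
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    switch (localName) {
                        case "title" -> title = buf.toString();
                        case "text" -> text = buf.toString();
                        case "page" -> callback.accept(new Page(title, text)); // one page at a time
                        default -> { }
                    }
                    buf.setLength(0);
                }
            });
        }
    }

    // Usage: builds a tiny gzip-compressed dump in memory and prints each page.
    public static void main(String[] args) throws Exception {
        String xml = "<mediawiki><page><title>Example</title>"
                + "<revision><text>Hello [[world]]</text></revision></page></mediawiki>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        streamPages(new ByteArrayInputStream(bytes.toByteArray()),
                page -> System.out.println(page.title() + ": " + page.text()));
        // prints "Example: Hello [[world]]"
    }
}
```

The caller only supplies a `Consumer<Page>`; the SAX details stay inside the helper, which is the kind of separation that lets users work without knowing how the parser operates.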
Best, Delip
Really interesting contribution.
I'll give it a try and let you know :).
Thanks!
F --
--- On Wed, 18/11/09, Delip Rao deliprao@gmail.com wrote:
From: Delip Rao deliprao@gmail.com
Subject: [Wiki-research-l] Java API for reading Wikipedia XML dumps
To: wiki-research-l@lists.wikimedia.org
Date: Wednesday, 18 November 2009, 18:20
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Delip Rao wrote:
Hello!
We have been working on a Java API for reading Wikipedia XML dumps for some time, and it's now reasonably functional. Check out:
http://code.google.com/p/wikixmlj/
*Features:*
* Easy access to important elements of a Wikipedia page
* Interfaces for wikitext parsing
* Memory efficient:
  o SAX interface for parsing
  o Lazy loading of files for DOM
  o Callback support with DOM
* Direct operation on compressed Wikipedia dumps (gzip and bzip2; uncompressed XML is also supported)
Interesting. Is it usable by people who don't know what a "DOM parser" or "SAX interface" is?
On Tue, Nov 24, 2009 at 10:46 PM, Piotr Konieczny piokon@post.pl wrote:
Interesting. Is it usable by people who don't know what a "DOM parser" or "SAX interface" is?
-- Piotr Konieczny
"The problem about Wikipedia is, that it just works in reality, not in theory."
Thanks, Piotr, and everyone else for the interest. Yes: the API does not require any understanding of the underlying parser. I hope it fits your needs; let me know if there is anything I can accommodate.
Happy Thanksgiving!