Hello,
I am making an application in Python which will make heavy use of Wikipedia data. I've downloaded the 10.4GB .zim archive of Wikipedia's English articles so that I can run this application offline and don't have to rely on extensive hits to Wikipedia's API or website.
Perhaps my Google-fu is very rusty, but I can't find any documentation or tools for accessing the .zim file from Python. I've found pyzim (https://code.launchpad.net/zim/pyzim), but this appears to be a Python build of the zim reader and writer, not a Python library that would let me search and access information in the .zim file from my own code. If it does include a library of such functions, I can't find them, or any documentation or examples. Ideally, I'd like to be able to search the .zim file just like searching Wikipedia (as I understand it, the .zim file is already indexed?), and I'd like to be able to pull out articles, in JSON form for example.
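To make the request more concrete, here is a rough sketch of the kind of interface I'm hoping for. Everything in it is hypothetical: ZimArchive, search and get_article are names I made up for illustration, not the API of pyzim or any other library, and a small in-memory dictionary stands in for the real .zim file.

```python
import json


class ZimArchive:
    """Hypothetical stand-in for a .zim reader (not a real library's API)."""

    def __init__(self, articles):
        # A real reader would open the .zim file here; this sketch just
        # keeps a dict mapping article title -> article text.
        self._articles = dict(articles)

    def search(self, query):
        """Return sorted titles whose title or text contains the query."""
        needle = query.lower()
        return sorted(
            title
            for title, text in self._articles.items()
            if needle in title.lower() or needle in text.lower()
        )

    def get_article(self, title):
        """Return a single article serialized as a JSON string."""
        return json.dumps({"title": title, "text": self._articles[title]})


# Made-up sample data standing in for the archive contents.
archive = ZimArchive({
    "Python (programming language)": "Python is a high-level programming language.",
    "Monty Python": "Monty Python were a British comedy troupe.",
    "Zebra": "Zebras are African equines.",
})

titles = archive.search("python")
article = json.loads(archive.get_article(titles[0]))
```

In other words: one call to search by keyword, and one call to fetch an article in a form I can load straight into my own data structures.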
Can someone point me to a resource to get me started with this?
I hope this is an OK question for this mailing list. If not, any suggestions for a better place to ask it would be most welcome!
Thanks in advance!
mediawiki-l@lists.wikimedia.org