I'm happy to introduce you to Python-libzim.
The Python-libzim package lets you read and write ZIM files in Python. It
provides a thin Python interface on top of the libzim C++ library. It
supports macOS and GNU/Linux out of the box. On other OSes you will
have to compile libzim manually.
After Node.js, this is the second scripting language for which openZIM
offers a binding to its reference implementation of the ZIM
open specification. This move is really important because it allows more
people to benefit from the file format and the ZIM files already published.
On our side, Python-libzim was critical for a few other projects which
are currently running. In the coming months a few critical scrapers will
be migrated from zimwriterfs to python-libzim and will benefit from a
significant code simplification and speed-up.
Install python-libzim easily with pip and give it a try:
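As a minimal sketch of what trying it out might look like, the function below collects the metadata of a ZIM file into a dictionary. It assumes the package is installed from PyPI under the name `libzim` and that the installed release exposes `libzim.reader.Archive` with `metadata_keys` and `get_metadata()`, as recent releases do; the release announced here may have used different names. The function name `read_zim_metadata` and the file path in the usage note are hypothetical.

```python
from pathlib import Path


def read_zim_metadata(zim_path):
    """Return the metadata of a ZIM file as a {name: bytes} dict.

    Assumption: a python-libzim release that provides libzim.reader.Archive.
    """
    # Imported lazily so this sketch can be loaded even without libzim installed.
    from libzim.reader import Archive

    archive = Archive(Path(zim_path))
    # metadata_keys lists the metadata names; get_metadata returns raw bytes.
    return {key: archive.get_metadata(key) for key in archive.metadata_keys}
```

With a ZIM file on disk, `read_zim_metadata("some_file.zim")` would typically return entries such as `Title` or `Language`, with values as bytes to be decoded by the caller.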
Kiwix - Wikipedia Offline & more
* Web: https://kiwix.org/
* Twitter: https://twitter.com/KiwixOffline
* Wiki: https://wiki.kiwix.org/
Congratulations on completing the huge task of producing the EN Wikip.
I also look forward to pyzim.
Looking at the notes I see an example of reading an article. However, I
would like to be able to read the zim metadata. Is this possible? Even
further afield would it be possible to extract the search index so as to
merge it with another index, even one prepared from another source?
As always great work,
On Sat, Jul 4, 2020 at 8:00 AM <offline-l-request(a)lists.wikimedia.org>
> Today's Topics:
> 1. At long last, a new version of offline enwp
> (Stephane Coillet-Matillon)
> 2. Re: At long last, a new version of offline enwp
> (Emmanuel Engelhart)
> 3. [OPENZIM] Introducing Python-libzim (Emmanuel Engelhart)
> 4. Re: [OPENZIM] Introducing Python-libzim (Wilfredo Rodríguez)
Quick announcement of a major success on our side: late last night we finally released an updated version of the English Wikipedia. The last full version (i.e. with images) we had was from October 2018 (!), and since then we had been plagued by regressions, bugs, resource limitations and probably some very dark magic.
The new .zim file adds 900,000 articles (6.1 vs. 5.2 million) and a healthy 11 GB in size (89 vs. 78 GB). The numbers are somewhat misleading because we need to include internal links and redirects, which brings the total to 100+ million interlinked items. Emmanuel will have more details on the hurdles that he had to deal with.
Updates will now run on a monthly basis, which is another major improvement: we had initially planned on bimonthly updates as a single run used to take up to three weeks. It can now be done in 5-6 days \o/
Congrats to everyone involved or who supported us one way or another, including the Foundation with the new & bigger servers they recently gave us access to. Hopefully now we can move on to newer problems.
http://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2020-06.zim
You can also torrent it by adding .torrent at the end (seeders welcome, as a matter of fact)
A recent discussion around high-volume APIs led to a ticket about
generating HTML dumps on the Foundation servers:
Kelson weighed in there, highlighting that it isn't clear whether everyone
working on this is aware of the existing Kiwix pipeline. Seemed worth mentioning
Sj and I have discussed and decided the following.
1) We have decided that the offline wikimedians UG will not answer the
The main argument is purely technical. Due to the complex design of the
survey, we consider it too difficult to collect your collective opinions
and draw relevant conclusions to feed into the survey.
We could have made it simpler by filling in a survey that reflected Sj's
and my opinion, but we did not feel we should do that.
So, if you want to provide your opinion and answer the WMF survey,
please do so at the individual level (we strongly suggest you do).
2) With regard to signing the COOL (Community Open Letter on renaming),
we would like to ask you to please vote by saying:
either "Yes, we should sign the letter"
or "No, we should not sign the letter"
A neutral opinion will be counted as No.
If we have at least 7 members voting and at least 2/3 YES, we will sign
the letter.
If we have at least 7 members voting and less than 2/3 YES, we will not
sign the letter.
If we have fewer than 7 members voting, we will not sign the letter.
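For anyone who wants to check an outcome against these rules, here is a small sketch of the decision in Python. The function name `should_sign` and the vote labels are hypothetical, and the sketch assumes that neutral votes count toward the 7-member quorum while being tallied as No.

```python
from fractions import Fraction

MIN_VOTERS = 7                  # quorum stated in the email
YES_THRESHOLD = Fraction(2, 3)  # minimum share of YES votes


def should_sign(votes):
    """Return True if the letter should be signed under the stated rules.

    `votes` is a list of strings such as "yes", "no" or "neutral".
    Anything other than "yes" (including "neutral") is tallied as No.
    """
    if len(votes) < MIN_VOTERS:
        return False  # fewer than 7 members voting: do not sign
    yes_count = sum(1 for vote in votes if vote.strip().lower() == "yes")
    # Exact fractions avoid floating-point trouble right at the 2/3 boundary.
    return Fraction(yes_count, len(votes)) >= YES_THRESHOLD
```

For example, 6 YES out of 9 votes is exactly 2/3 and leads to signing, while 6 YES out of 6 votes does not, because the quorum is not reached.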
Please vote by replying to this email (public statement).
Link to the letter:
At the moment, 34 affiliates have signed it (and 335 individuals).
Flo and Sj