Hi
I'm happy to introduce you to Python-libzim.
The python-libzim package lets you read and write ZIM files in Python. It
provides a shallow Python interface on top of the libzim C++ library. It
supports macOS and GNU/Linux out of the box. On other operating systems you
will have to compile libzim manually.
After Node.js, this is the second scripting language for which openZIM
offers a binding to its famous reference implementation of the ZIM
open specification. This move is really important because it allows more
people to benefit from the file format and the ZIM files already published.
On our side, python-libzim was critical for a few other projects which
are currently running. In the coming months a few critical scrapers will
be migrated from zimwriterfs to python-libzim and will benefit from a
significant code simplification and speed-up.
You can easily install python-libzim with pip and give it a try:
https://pypi.org/project/libzim/
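As a first taste, here is a minimal sketch of reading the metadata of a ZIM
file. It assumes a recent libzim release in which libzim.reader exposes an
Archive class with a metadata_keys property and a get_metadata() method
returning bytes; other releases may name these differently, so treat the
calls as assumptions and check the package documentation. The helper name
read_zim_metadata is made up for illustration.

```python
def read_zim_metadata(zim_path):
    """Return the metadata of a ZIM file as a {name: text} dictionary.

    Assumption: libzim.reader.Archive, Archive.metadata_keys and
    Archive.get_metadata() exist as in recent libzim releases.
    """
    # Imported lazily so this module still loads when libzim is not installed.
    from libzim.reader import Archive

    archive = Archive(zim_path)
    metadata = {}
    for name in archive.metadata_keys:
        raw = archive.get_metadata(name)
        # Most metadata is UTF-8 text; binary entries (e.g. illustrations)
        # are decoded leniently instead of raising an error.
        metadata[name] = bytes(raw).decode("utf-8", errors="replace")
    return metadata

# Hypothetical usage, with a ZIM file of your own:
#     for key, value in read_zim_metadata("some_file.zim").items():
#         print(key, value[:80])
```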
Happy coding!
Regards
Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: https://kiwix.org/
* Twitter: https://twitter.com/KiwixOffline
* Wiki: https://wiki.kiwix.org/
Congratulations on completing the huge task of producing the EN Wikip.
I also look forward to pyzim.
Looking at the notes I see an example of reading an article. However, I
would like to be able to read the zim metadata. Is this possible? Even
further afield would it be possible to extract the search index so as to
merge it with another index, even one prepared from another source?
As always great work,
Tim
On Sat, Jul 4, 2020 at 8:00 AM <offline-l-request(a)lists.wikimedia.org>
wrote:
> Today's Topics:
>
> 1. At long last, a new version of offline enwp
> (Stephane Coillet-Matillon)
> 2. Re: At long last, a new version of offline enwp
> (Emmanuel Engelhart)
> 3. [OPENZIM] Introducing Python-libzim (Emmanuel Engelhart)
> 4. Re: [OPENZIM] Introducing Python-libzim (Wilfredo Rodríguez)
>
>
>
Hi everyone,
quick announcement for a major success on our side: late last night we finally released an updated version of the English Wikipedia[1]. The last full version (i.e. with images) we had was from October 2018 (!), and since then we had been plagued by regressions, bugs, resource limitations and probably some very dark magic.
The new .zim file adds 900,000 articles (6.1 vs. 5.2 million) and a healthy 11 GB in size (89 vs. 78 GB). The numbers are somewhat misleading because we need to include internal links and redirects, which brings the total to 100+ million interlinked items. Emmanuel will have more details on the hurdles that he had to deal with.
Updates will now run on a monthly basis, which is another major improvement: we had initially planned on bimonthly updates as a single run used to take up to three weeks. It can now be done in 5-6 days \o/
Congrats to everyone involved or who supported us one way or another, including the Foundation with the new & bigger servers they recently gave us access to. Hopefully now we can move on to newer problems.
Stephane
[1] http://download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2020-06.zim
You can also torrent it by adding .torrent at the end (seeders welcome, as a matter of fact)
A recent discussion around high-volume APIs led to a ticket about
generating HTML dumps on the foundation servers:
https://phabricator.wikimedia.org/T254275
Kelson weighed in there, highlighting that it isn't clear everyone working
on this is aware of the existing kiwix pipeline. Seemed worth mentioning
here.
SJ
--
Hello all
Sj and I have discussed and decided the following.
1) We have decided that the offline wikimedians UG will not answer the
WMF survey.
The main argument is purely technical. Due to the complex design of the
survey, we consider it too difficult to collect your collective opinions
and draw relevant conclusions to inject into the survey.
We could have made it simpler by filling in a survey that reflected Sj's
and my own opinion, but we did not feel we should do that.
So, if you want to provide your opinion and answer the WMF survey,
please do so at the individual level (we strongly suggest you do)
--->
https://meta.wikimedia.org/wiki/Communications/Wikimedia_brands/2030_moveme…
2) With regards to signing the COOL (Community Open Letter on renaming),
we would like to ask you to please vote by saying :
either "Yes, we should sign the letter"
or "No, we should not sign the letter"
Neutral opinion will be counted as No
If we have at least 7 members voting and at least 2/3 YES, we will sign
the letter.
If we have at least 7 members voting and less than 2/3 YES, we will not
sign the letter.
If we have fewer than 7 members voting, we will not sign the letter.
Please vote by replying to this email (public statement)
Deadline: Wednesday
Link to the letter :
https://meta.wikimedia.org/wiki/Community_open_letter_on_renaming
At the moment, 34 affiliates have signed it (and 335 individuals)
Flo and Sj
I support signing COOL!
On Sun, Jun 28, 2020 at 2:00 PM <offline-l-request(a)lists.wikimedia.org>
wrote:
> Message: 1
> Date: Sun, 28 Jun 2020 00:59:28 +0200
> From: Florence Devouard <anthere(a)anthere.org>
> To: Using Wikimedia projects and MediaWiki offline
> <offline-l(a)lists.wikimedia.org>
> Subject: [Offline-l] Rebranding: please give your opinion NOW
> Message-ID: <7275054a-ad0a-708e-fcbe-e08788db305f(a)anthere.org>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
--
Michael Graaf, M.I.T.(UCT)
Researcher, Editor & Community
Informatics Practitioner
Mob +27795487242
WhatsApp +27647754342
ORCID 0000-0002-1951-5739
Hello everyone,
We might as well keep this list rolling - it’s been an eventful couple of months and there’s plenty to tell.
Just as COVID-19 lockdowns started to roll out across much of the world, our good friends at Orange (the French Telco) reached out asking for us to roll out Kiwix directly onto their West African network. So yes, here’s the short story of us making offline available online (!)
Background
In a nutshell, it’s easier/faster for a telco to carry data on its own, local network than it is to carry that same amount of data internationally. It does make sense in hindsight, particularly if you think of the internets as a series of tubes.
Mission
They asked that we roll out Kiwix and a collection of ZIMs in Arabic, French and English onto their Ivory Coast hub: Orange customers were to be directed to a specific page[1] and would be offered the content at zero-rating or a special low rate (markets could choose their pricing model). 11 markets were selected for the operation (mostly sub-Saharan Africa).
We rolled out the whole thing in a few days using Kiwix-serve[2] - most of the time needed was for them to secure a big-ass server and grant us root access. It’s been running smoothly ever since - up to 100,000 users/month at peak, which was nice. Contents deployed were Wikipedia, Khan Academy, Wiktionary, Vikidia and a couple of video channels we also serve as ZIMs.
So what did we learn?
- Kiwix-serve is super easy to install, and can manage large loads robustly;
- Most requested content: Wikipedia and Khan Academy, then Wiktionary & the Gutenberg library;
- Information circulated around somehow: we’ve had users from 130 countries so far (about 20-30% of total traffic), definitely not bots. A gentleman from An-Najah university in Palestine even reached out asking that we deploy the same thing on their local network.
- The URL that Orange set up was overly long, which probably impacted adoption. We lobbied to get https://kiwix.orange (they own the TLD) but to no avail :-/ There is also a huge difference between markets that communicated on the initiative in a sustained manner (e.g. Liberia) and those who did it as a one-off.
Cookie points
They made a simple but sweet video[3] - in French only but you’ll get the idea.
[1] https://kiwix.campusafrica.gos.orange.com/
[2] https://github.com/kiwix/kiwix-tools
[3] https://www.youtube.com/watch?v=2Ug0XEFhByc
Hi
There is a topic I have wanted to talk about here for a long time, but I
never managed to take the time to write something. A few recent events
have been a healthy reminder that I should present one of our most
recent and most useful tools: Zimfarm.
The Zimfarm is the online tool which is in charge of building and
publishing all our ZIM files. After years of creating ZIM files by
launching scrapers more or less manually, we had to automate the
process just to be able to scale the operations, i.e. to publish more
ZIM files, more often.
The effort started 3 years ago with the support of the WMF, but we have
only been using it in production since spring 2019. The tool now runs
perfectly and we fully rely on it. If we can publish an update of all our
wikis once a month, it is thanks to this piece of software too.
The Zimfarm is a half-decentralized solution which has a central node
(called "dispatcher") in charge of orchestrating the work to do and
multiple decentralized nodes (called "workers") which run the scraping
tasks.
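The dispatcher/worker split can be pictured with a toy sketch. This is not
Zimfarm code: the function run_farm, the recipe names and the in-process
threads standing in for remote workers are all made up for illustration;
the real workers are separate machines.

```python
import queue
import threading


def run_farm(recipes, worker_count=2):
    """Toy dispatcher: queue every recipe and let free workers pick them up."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker(name):
        while True:
            recipe = tasks.get()
            if recipe is None:  # sentinel: no more work for this worker
                return
            # A real worker would run a scraper here; we only record who ran it.
            outcome = f"{recipe}.zim built by {name}"
            with lock:
                results[recipe] = outcome

    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}",))
        for i in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for recipe in recipes:
        tasks.put(recipe)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results
```

The dispatcher never decides which worker takes which recipe; whichever
worker is idle pulls the next one, which is what lets extra workers be added
without changing the central node.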
The dispatcher provides an API to manage the ZIM recipes and tasks; have
a look at https://api.farm.openzim.org/. We have set up a Web frontend on
top of this API to allow easy management through a Web browser. For better
transparency, even anonymous users can have a look and monitor what is
going on. Look at https://farm.openzim.org/.
One important point is that, like all the rest of our infrastructure,
the whole system is Dockerized, which means it is really easy to
install a Zimfarm worker. We invite anybody with a spare server to
help us provide offline snapshots of the best of the Web. The
procedure is documented and a few volunteers have already joined in.
Look at https://farm.openzim.org/about for more details.
The development is fully transparent at
https://github.com/openzim/zimfarm. A few items on the roadmap would
welcome volunteer Python developers. Look at the
good first issues and make your first PR!
https://github.com/openzim/zimfarm/issues?q=is%3Aissue+is%3Aopen+label%3A%2…
Regards
Emmanuel
_*New elements*_
1. The online meetings planned last week-end to discuss how to move
forward with regards to the survey were successfully held.
Essie (WMF staff in charge of collecting the survey) attended the first
part of the meeting to answer questions participants could have.
The second part of the meeting was "without WMF staff". Long discussions etc.
The main point is that a team (Andrew Lih, Phoebe and Richard) *proposed to
write a Letter to the Board*. A draft was produced and will be made
public tomorrow (I will share the link, it is currently still private,
but Sj and I have access to the current version).
*Essentially this Letter asks for a "pause" in the process to allow
further discussions.* Keep in mind that when the meeting was held, the
deadline to answer the survey was June 30th.
This letter will be offered to Affiliates for signature tomorrow.
<------ to be discussed from tomorrow on, once the link is publicly
posted.
2. In the meantime, *the board issued a statement*, which you can read
here:
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Board_noticeboard/Boar…
Summary and key points
- it was published by Nataliia and, though a "board statement", it mostly
reads in the "I", apparently due to the urgency of the situation and not
all board members being available over the weekend.
- it points out that, contrary to what Heather said in the exec statement,
the final decision has not been made yet. It is likely to be made in August
2020 during the next board meeting
- and it notes that no decision has been made regarding the naming of the
affiliates
- Nataliia mentions the upcoming Wikipedia 20th Anniversary and the former
wish of the Board to get everything fixed before that date (which
actually came up a bit oddly in the discussion because very few community
members discussed that in relation to the rebranding process)
3. This morning, Samir sent us an email and said *the deadline to answer
the survey is extended until July 7th*. He also says
there are 3 office hours this week where the Brand Project Team will
continue to answer questions. All links are available here in the news
section:
https://meta.wikimedia.org/wiki/Communications/Wikimedia_brands/2030_moveme…
If you are interested in joining and asking questions, this is the right
time to do it. If you are short of time and/or already made it your time
---> drop
4. *There is a community feedback and straw poll* here: *In light of
recent events, including the publication of survey text and naming
proposals, it may benefit the WMF to see how the community feels about
certain naming-related issues, transparently and on-wiki. Therefore, the
poll. This is an informal poll, and does not replace the WMF survey. At
least, the results of this one are publicly visible ;)*
https://meta.wikimedia.org/wiki/Communications/Wikimedia_brands/2030_moveme…
-----------
_*My personal take on this is*_
* that the board already decided a LONG time ago to rename the Wikimedia
Foundation into Wikipedia something, and that they will do that, no
matter what.
* that the two communication/brand companies were only hired to
facilitate the process and take the heat, being blamed for failing to
provide good suggestions or supposedly pushing the board to adopt a new
name. Whatever, they are just the safety valve. They will not take all
the heat, but part of it, making the pill easier to go down the throat of
the community.
* that the process is being "pushed" with a feeling of urgency, which
surprises many because usually we benefit from longer timeframes and,
covid19 oblige, everything is slowed. But the truth is... the main
benefit of the renaming is likely to be financial, with an easier and
better way to fundraise. With the current crisis, it is likely the future
fundraising season will be bad. Fundraising season starts around
September. So the name change would best be done before this year's
fundraising season. Additionally, the Wikipedia 20th birthday could be an
excellent communication opportunity to promote the new name of the WMF.
Hence the urgency and the unlikeliness that the process significantly
slows down.
* in comparison, the renaming (or not) of affiliates is perceived as
non-urgent and non-essential, which is quite logical from the WMF
perspective. So it is possible to cut some slack here to cool down
spirits.
* Nataliia is possibly being the other sacrificed piece in the process.
----------
_*Your suggested todo list*_
1) Quickly read this page :
https://meta.wikimedia.org/wiki/Communications/Wikimedia_brands/2030_moveme…
2) read the executive statement if you have not done so. Always
interesting:
https://meta.wikimedia.org/wiki/Communications/Wikimedia_brands/2030_moveme…
3) read the board statement. Always interesting.
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Board_noticeboard/Boar…
4) reflect on the implications of the WMF rename on your own activities
and structure
5) decide whether you will answer the *individual survey* (before July
07th) : https://wikimedia.qualtrics.com/jfe/form/SV_9G2dN7P0T7gPqpD
6) decide whether you want to take the *community poll *:
https://meta.wikimedia.org/wiki/Communications/Wikimedia_brands/2030_moveme…
7) tomorrow, read the open letter when I send the link, and tell Sj and
me whether you think the Offline UG should sign it or not
8) tell Sj and me if you have a strong opinion on what we should answer
in the *Affiliate Survey*
Flo
On 21/06/2020 at 19:40, Florence Devouard wrote:
>
> It is a fair question Emmanuel
>
>
> Well, what you say is true. In short, if I summarize super briefly
>
> 1) According to Heather, the brand redefinition was a request from the
> board back in 2015. But there is no mention in board meeting minutes
> and two former board members do not remember this decision. Note: this
> was in Lila time.
> However, it seems indeed that the board confirmed its non-opposition
> to the communication team to work on that topic in 2018:
> https://foundation.wikimedia.org/wiki/Minutes/2018-11-9,10,11#Branding
> Note that this does not appear to be a request from the board to the
> staff, but rather a request from the staff to be allowed to explore.
>
> 2) Brand awareness survey done in 7 countries in 2017 showed poor
> visibility and understanding of the wikimedia brand
> https://meta.wikimedia.org/wiki/Strategy/Wikimedia_movement/2017/Sources/Br…
>
> 2) When a survey was done a bit later, the statistical results were
> displayed in such a way that the case was made from the brand team
> that there was very little opposition from the community
> https://meta.wikimedia.org/w/index.php?title=Communications%2FWikimedia_bra…
> https://meta.wikimedia.org/wiki/Requests_for_comment/Should_the_Foundation_…
> Evidence was made that the statistical presentation was broken and
> misleading.
> Arguments from opponents of the change include the fact that board
> members might have been misled into believing there was no opposition
> from the community, and thus approved a rebranding without correct
> context.
>
> 3) Following that situation, an RFC was launched by the community, and
> showed overwhelming opposition to replacing Wikimedia with Wikipedia in
> our orgs' and projects' names.
> Note that RFC is opt-in only, so might over represent those who oppose
> the rebranding. Hence the case made for the final survey to poll
> community members about their position on the matter.
> Those who want to further explore:
> https://meta.wikimedia.org/wiki/Requests_for_comment/Should_the_Foundation_…
>
> 4) The Brand team continued its work. Extensive discussions followed,
> with face to face brainstorming events to try to identify "good
> ideas". The key argument to opponents was that it was still in
> the discussion phase etc.
> A Brand network was created to better inform etc., give arguments in
> favor of the change etc. (I joined it as representative of the offline UG
> to keep track of what was going on)
> There was further information provided about a month ago during a
> public meeting, revealing a collection of "words/directions"
> There were repeated requests from the people following this topic, for
> the final survey to include the "no change please" option. But this
> has been dismissed repeatedly.
>
> 5) Then finally a new survey (the one I mentioned earlier) was
> proposed a few days ago with a short list of options. The "no change"
> option is not proposed, and the three options all involve replacing
> wikimedia with wikipedia.
> This is creating social unrest. Best person to know more about that is
> Andrew Lih.
>
> 6) An executive statement was published 2 days ago, stating that a)
> this rebranding was done per board request, and b) the rename will happen
> Quote: *"We should have been clearer: a rebrand will happen. This has
> already been decided by the Board. The place where we seek
> consultation and input is on what an optimal rebrand looks like, and
> what the path to get there will be."*
> To read full statement :
> https://meta.wikimedia.org/wiki/Communications/Wikimedia_brands/2030_moveme…
>
> 7) There is a heated discussion on whether to set up a central banner
> to invite participants to respond to the survey, with community
> opposition to setting up the banner.
> I have actually been contacted by some staff about this, who were
> apparently trying to evaluate the risk of WMF staff being
> unsysoped if they decided to override the community and activate the
> banner anyway
> https://meta.wikimedia.org/wiki/CentralNotice/Request/Movement_Brand_naming…
> I am not sure the banner is live yet. At least, I see no banner
> myself. It should have gone live on the 16th
>
> 8) Thus followed much discussion after the executive statement, on
> telegram and on meta.
> Probably central place is here :
> https://meta.wikimedia.org/wiki/Talk:Communications/Wikimedia_brands/2030_m…
> APPARENTLY, a statement from the board is expected. Unless I am
> mistaken, it has not been published yet.
>
> 9) There is a meeting TONIGHT (21h UTC+2), community organized, on the
> matter.
> https://meta.wikimedia.org/wiki/All-Affiliates_Brand_Meeting
> I'll attend and will try to summarize
>
>