Dear ZIM hackers,
I recently improved the Kiwix HTTP software called kiwix-serve. A small
reminder: this software is an HTTP server able to deliver ZIM file
contents, so it acts as a Web server. Kiwix-serve now has the ability to
deal with many ZIM files at the same time (with only one binary
instance). That means you can serve, from the same Web server, contents
belonging to many different ZIM files. There is a demo here: http://library.kiwix.org
Both the ZIM files we make at Kiwix and the ZIM files generated from
Wikipedia have article HTML with absolute internal URLs. That means that,
in the HTML of articles, a link pointing to the article "Wikipedia" (for
example) has a URL like "/A/Wikipedia" (or "/A/Wikipedia.html" in my
case, but this does not matter).
Until now, this was not a problem because we always had a "one by one"
usage of ZIM files: the context was clear. But now, in my case, I need
to specify which ZIM file I want to deal with. Here is the problem: I
have HTML code with URLs looking like "/A/Wikipedia.html", but to open
the "Wikipedia" article in WPEN I need something like
"/wikipedia_en_all_nopic/A/Wikipedia.html". I have found a workaround by
rewriting the URLs on the fly, but this is an ugly solution which is
absolutely not sustainable.
As far as I know, we do not have any specification on this point.
In my opinion, absolute internal URLs should be forbidden. To continue
with my example, "/wikipedia_en_all_nopic/A/Wikipedia.html":
"wikipedia_en_all_nopic" is something decided by the kiwix-serve
operator, not something that should be imposed by the ZIM publisher. The
publisher cannot assume what the full absolute path could or should be,
so it should not use absolute paths for internal URLs. URLs should
therefore be relative, and I only see two options (still with my
example): if you are in the same namespace, simply use "Wikipedia.html";
otherwise, go back to the relative root of the file: "../A/Wikipedia.html".
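For what it's worth, the second form can be derived mechanically from the two in-file paths. A minimal sketch in Python, using the example paths above (`relative_link` is a hypothetical helper, not part of any ZIM tool):

```python
import posixpath

def relative_link(source, target):
    """Return a relative URL from one ZIM article to another.

    Both arguments are absolute in-file paths such as "/A/Wikipedia.html".
    """
    # A relative link is resolved against the directory that
    # contains the source page, so compute relpath from there.
    return posixpath.relpath(target, posixpath.dirname(source))

# Same namespace: a plain file name is enough.
print(relative_link("/A/India.html", "/A/Wikipedia.html"))  # Wikipedia.html
# Different namespace: go back to the file root first.
print(relative_link("/A/India.html", "/I/m/Flag.png"))      # ../I/m/Flag.png
```

This works no matter what prefix (e.g. "/wikipedia_en_all_nopic") the server operator mounts the file under, which is exactly the point.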
Before filing a feature request for the MediaWiki Collection
extension and patching my own ZIM generation scripts, I think we should
discuss this and take a decision (and afterwards update the specs on
the wiki). So I await your feedback.
I recently made a "book" via the PediaPress Book Creator prior to my
trip to India, and it has been delightful to use and read on the flight and
in my hotel room here. It had been a while since I tried to make one, and I
wanted to say great work and good job to PediaPress! Also, the integration
with Kiwix was wonderful, and I love that it now shows up so seamlessly in
my "Library" within Kiwix.
I am not sure if you are aware, but in the recent Readership survey of
Wikipedia readers (from Sept 2011, which is only just now being analyzed),
the *number one request by readers was saving of articles for offline use
(as a PDF)*: 40% of readers said they would be MORE LIKELY to use Wikipedia
if such a service were available (note: this percentage is even higher in
target areas like India (50%) and Brazil (52%)). This is fascinating, for
it shows that we (a) have a broader desire for offline content than just
those without Internet access, and (b) have a great opportunity for
marketing the "Book Creator" tool.
I want to discuss the points needed to get to (b). The Book Creator tool is
great, and I think is the exact right type of tool to meet the needs of our
readers; but there is much room for improvement. Right now, I personally
find the experience of getting to and from the Book Creator tool to be
not as straightforward as it could be. As this service is in such huge
demand, I think there are some opportunities for refining the
"book creator" tool and process. I'd love thoughts on the following:
- *Rebranding: *What are our thoughts on the title "Book Creator"? I
wonder if the title itself is a bit confusing, since people are apparently
unaware of the ability to download as PDF at all! Plus, I personally don't
utilize the tools as a means for creating an actual book, though I
recognize this was the initial purpose: I view it as a way to read a couple
of specific articles offline. I think using the word "collection," which we do
informally anyway, is likely more appropriate here. Perhaps "Offline
Collection Creator" or "Article Aggregator" (both terrible ideas, I know,
but I'm just throwing things out there:))
- *Website placement: *I think it is obvious the space the Book Creator
takes up on the left-hand toolbar is not enough to draw attention to the
feature. I wonder if we should attempt to have some sort of a "Save for
Offline Use" button on each article, which would then open a new window
into the collection creator screen? This could look similar to the "Share
this" links which exist on most information websites (for Facebook,
Twitter, email, etc.). This could be next to the "Print" button.
- *Marketing: *Once we feel a bit more confident about usability, it
would be great to market the tool. We can do this in three phases:
- Phase 1: emails to different mailing lists announcing the project,
and asking for suggestions and feedback on the tools
- Phase 2: "pilot" testing of the tool, with banner advertising to
- Phase 3: advertise this functionality via a banner at the top of
- *Measurement*: clearly, we should have careful tracking of *books
created* and *downloads by file type* by day. @PediaPress: is this
I have some other ideas as well, but wanted to throw these out there for
some immediate reactions. What are people's thoughts? Any other ideas?
Anyone good with website design who could help with rearranging of the
"Book Creator"?? :)
Looking forward to the discussion (which should be moved onto a wiki soon) -
Are there people who will be at Wikimania who would be interested in
meeting up to discuss Offline strategy? It appears there are perhaps just
three of us on the offline project pages. If there is a bigger group, it
would be great to set aside some time during the Hackathon to talk more
specifically about what progress we would like to see in Offline Wikipedia
going into the future.
Otherwise, it appears Manuel's submission got selected, and we have some
time during Wikimania itself! Since this session is only an hour,
though, it seems like all the time will be spent reporting what the
different activities going on around the world are - which is much needed,
but a slightly different purpose.
What do people think? Will anyone be around?
Global Development, Manager
Imagine a world in which every single human being can freely share in
the sum of all knowledge. Help us make it a reality!
Donate to Wikimedia <https://donate.wikimedia.org/>
I am wondering whether you could give me some tips to find an
appropriate solution for our scenario! We have got a small amount of
resources to throw at this, and can hopefully develop something that
will be useful for others.
We have a wiki here:
which has a small number of pages (by comparison!), but it does contain video.
We would like to be able to create an offline (read-only) copy of this,
because some of it is aimed at teacher education in Africa,
where connectivity is poor. At one of the schools we work with we have
a low-power server with a local wifi network, so basically we'd like
to get an offline copy onto that server. HTML is probably the best
solution (as it could then be indexed for searching on our server),
and we want to be able to generate updates fairly frequently (over
low bandwidth), so updates need to be incremental.
One approach I've taken in the past was this: pulling HTML versions of
pages via the API and doing regexp transformations on them (e.g. to
preserve local links between pages, but leave links for editing
pointing at the online version). This could be a good solution, but it
requires creating a static HTML version on the server that is then
periodically updated. It also requires a lot of hacky regexps to get
the pulled HTML into the right format. It would be good to have
something in the API that could output pages directly in a 'localised'
HTML format. Another issue was that the API doesn't easily allow
finding the most recent version of all pages that have changed since a
certain date, so I had to mess around with the OAI extension to get
this information. Overall, I got it to work, but it relied on so many
hacks that it wasn't really maintainable.
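The rewriting step described above can stay fairly small if the regexp only has to decide between "mirrored locally" and "still online". A rough sketch of that idea (the base URL, link pattern, and function name are illustrative, not our actual wiki's layout):

```python
import re

WIKI_BASE = "https://example-wiki.org"  # illustrative online wiki base URL

def localise_links(html, local_pages):
    """Rewrite internal /wiki/Title links: pages mirrored locally become
    relative file links; everything else points back at the online wiki."""
    def repl(match):
        title = match.group(1)
        if title in local_pages:
            # Page is in the offline copy: link to the local file.
            return 'href="%s.html"' % title
        # Not mirrored (e.g. edit targets): keep pointing online.
        return 'href="%s/wiki/%s"' % (WIKI_BASE, title)
    return re.sub(r'href="/wiki/([^"#?]+)"', repl, html)

sample = '<a href="/wiki/Home">Home</a> <a href="/wiki/Editing">Edit me</a>'
print(localise_links(sample, {"Home"}))
```

This is still a regexp hack, of course; the point of the proposal is that the API could do this resolution itself.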
What would be ideal would be a local script (i.e. on the
remote server) managing this, without us having to create an
(intermediary) HTML copy every so often. The remote server contacts our
wiki when it has connectivity and fetches updates at night. The only
things the wiki does are produce a list of pages that have changed since
a certain date and (via the API) provide suitable HTML for download. I
should say that massive scalability isn't an issue: we won't have
loads of pages any time soon, and we won't have lots of mirror sites.
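On the "changed since a certain date" part: the core API's list=recentchanges module may already get close to this without the OAI extension; deduplicating titles on the client then yields the latest version of each changed page. A sketch of just the request parameters (the timestamp and helper name are illustrative):

```python
def changed_pages_params(since, limit=500):
    """Build query parameters for MediaWiki's list=recentchanges module,
    asking for edits and page creations newer than `since` (ISO 8601)."""
    return {
        "action": "query",
        "list": "recentchanges",
        "rcend": since,               # results run newest -> oldest, so
                                      # rcend is the oldest timestamp wanted
        "rctype": "edit|new",
        "rcprop": "title|timestamp",
        "rclimit": str(limit),
        "format": "json",
    }

params = changed_pages_params("2012-06-01T00:00:00Z")
print(params["list"], params["rcend"])
```

Each returned title could then be fed to action=parse to fetch rendered HTML for that page, which covers the download half of the nightly sync.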
Once we've got an HTML version (i.e. wiki -> local HTML), we'd also
like to build a little app for Android that can download such an HTML
copy (again, ideally straight from the wiki, without an intermediary
HTML copy somewhere). The app would manage downloading the updates, keep
the content as HTML on the SD card, and allow users to open the SD
card content in a mobile browser. (We'd need to take care of how
to handle video; probably use HTML5 video tags.)
I would really appreciate your feedback on this! The above strategy
(modify the API to give localised html) seems simple enough to me - do
you have particular views on what could work best?
(Btw. I am also interested in ePub, but that's a different scenario!)
Renaud will speak there about his experience as a wikipedian, Kiwix
developer and open source evangelist...
-------- Original Message --------
Subject: [Wiki-research-l] Wikimedia at Open Source Bridge conference,
late June, Portland
Date: Tue, 19 Jun 2012 06:36:58 -0400
From: Sumana Harihareswara <sumanah(a)wikimedia.org>
Reply-To: Research into Wikimedia content and communities
Organization: Wikimedia Foundation
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>,
wiki-research-l(a)lists.wikimedia.org, sstierch(a)wikimedia.org, Wikitext-l
June 26-29, a bunch of us will be in Portland, Oregon, USA for the Open
Source Bridge conference.
WMF is sponsoring the Friday unconference day, and will host a hacking
table that day as well as (I hope) the Tuesday "Hacker Lounge".
Wikimedians are giving several talks during OSBridge:
"Identity, Reputation and Gratitude: Designing for a community" by
Brandon Harris: Tuesday, 1:30
"A snapshot of Open Source in West Africa" by Renaud Gaudin:
"Building A Visual Editor for Wikipedia" by Roan Kattouw and Trevor
Parscal: Tuesday, 4:45
"Internationalization @Wikipedia: Helping add the next billion web
users" by Alolita Sharma: Wednesday, 10am
"Why you need to host 100 new wikis just for yourself." by Ward
Cunningham: Wednesday, 2:30
"Outreach Events: My Triumphs, My Mistakes" by Asheesh Laroia and
me: Thursday, 3:45
I give the opening keynote address on Tuesday morning. My tentative
title: "Be Bold."
If you're in or near Portland and want to come, let me know; I might be
able to hook you up with a free conference pass.
Engineering Community Manager
Wiki-research-l mailing list