Hi Madeleine,
Madeleine Price Ball wrote:
> I was curious if you include images? If not, are you considering doing so, and what's stopping you? If so, how do you pick them?
We haven't done this on our "Test DVD" this summer, even though it is easily possible. Well, "easily" means that it is easy for the format; the problem is choosing the images. Emmanuel Engelhart (of Kiwix; he is also part of the openZIM team) has written some Perl scripts for that.
We didn't do it for two reasons. First, lack of time: even though the tools exist, it's a lot of work to search through the articles, collect all the image URLs, fetch the images, decide what size to resize them to, and so on.
Second, the openZIM project is not a publisher of offline content. We are developing a stable, efficient format that allows free interchange of content between reader applications and devices, and we provide a GPL'ed sample implementation of it.
> I did the work for picking which articles & images went into the XO activity, based on traffic stats. (We only had 100MB, we got 24k articles in 80MB and spent the other 20MB on highly compressed images.) OLPC has other more critical things to worry about these days, but some of the volunteers who worked on that project might be interested in helping others.
Well, we had an "Offline Meeting" at Wikimania in Buenos Aires this summer, in which Samuel Klein also participated. Our goal is to contribute the right technology so that all the offline projects are able to collaborate. Currently everyone is reinventing the wheel when it comes to storing the content. We think that the specific expertise of the publishers should be how to select the content (which content goes where, and in which form) and not technical questions such as compression, storage, or retrieving the data on the user's end.
The Wikimedia Foundation is supporting us so far, as they share our goal and are working on a regular export of all Wikimedia wikis into ZIM format (like the SQL and XML dumps you already get on download.wikimedia.org). They also have lots of contacts with publishers and other projects working on Wikipedia Offline and connect them with us, so that we can improve ZIM to fit all the Wikipedia Offline projects. That's why we defined a new format version at the last Developers Meeting.
Greets,
Manuel