I was curious whether you include images. If not, are you considering
doing so, and what's stopping you? If so, how do you pick them?
We didn't do this on our "Test DVD" this summer, even though it is
easily possible.
Well, "easily" means that the format handles it easily; the hard part is
choosing the images. Emmanuel Engelhart (Kiwix, who is also part of the
openZIM team) has written some Perl scripts for that.
We didn't do it for two reasons:
Lack of time: even though the tools exist, it's a lot of work, such as
searching through the articles, collecting all the image URLs, fetching
the images, deciding what size to resize them to, etc.
I could help with generating the list of images to pick; I've done
that work before for the OLPC activity. I use traffic statistics, which
(when used appropriately) work quite well for picking which articles or
images to include.
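A minimal sketch of traffic-based image selection, assuming you already
have per-article page-view counts and a list of the images each article
embeds (both input formats are hypothetical, and scoring an image by the
traffic of the articles that use it is one possible choice, not
necessarily the method used for the OLPC activity). It ranks images by
that combined traffic, then greedily keeps the top-ranked ones that fit
in a byte budget.

```python
from collections import defaultdict


def rank_images_by_traffic(article_views, article_images):
    """Score each image by the total page views of the articles that embed it.

    article_views:  hypothetical dict {article title: page views in some period}
    article_images: hypothetical dict {article title: list of image file names}
    """
    scores = defaultdict(int)
    for title, images in article_images.items():
        views = article_views.get(title, 0)
        for image in set(images):  # count each image at most once per article
            scores[image] += views
    # Highest-traffic images first; ties broken by name so the order is stable.
    return sorted(scores, key=lambda img: (-scores[img], img))


def pick_images(ranked, image_sizes, budget_bytes):
    """Greedily keep the highest-ranked images that still fit in the byte budget."""
    chosen, used = [], 0
    for image in ranked:
        size = image_sizes.get(image)
        if size is None:
            continue  # unknown size: skip rather than guess
        if used + size <= budget_bytes:
            chosen.append(image)
            used += size
    return chosen


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    views = {"Llama": 500, "Peru": 300, "Andes": 100}
    images = {
        "Llama": ["llama.jpg", "andes.jpg"],
        "Peru": ["flag.png", "andes.jpg"],
        "Andes": ["andes.jpg"],
    }
    sizes = {"andes.jpg": 40_000, "llama.jpg": 30_000, "flag.png": 5_000}

    ranked = rank_images_by_traffic(views, images)
    print(ranked)  # ['andes.jpg', 'llama.jpg', 'flag.png']
    print(pick_images(ranked, sizes, 50_000))  # ['andes.jpg', 'flag.png']
```

The greedy budget step is a simple heuristic: it never drops a
higher-ranked image to make room for several smaller ones, which keeps
the most-viewed images in.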
Ben Schwartz can help too; I believe he was responsible for
automatically acquiring and resizing images (and even converting SVG
to JPEG). He's the other major contributor to the OLPC activity who
still has an interest in the general goal of offline Wikipedias.
And the openZIM project is not a publisher of offline content. We are
developing a stable, efficient format that allows free interchange of
content between reader applications and devices, and we provide a
GPL-licensed sample implementation of it.
So should I be talking to someone else? Who should I talk to?
Well, we had an "Offline Meeting" at Wikimania in Buenos Aires this
summer, in which Samuel Klein also participated. Our goal is to
contribute the right technology so that all the offline projects can
collaborate. Currently everyone is reinventing the wheel when it comes
to storing the content.
Unfortunately, SJ had very little to do with the actual program, which
ended up being created by volunteers who were not on the wikibrowse
mailing list.
We think the publishers' specific expertise should be in selecting the
content (which content goes where, and in which form), not in technical
questions such as compression, storage, or retrieving the data on the
user's end.
OK, if I shouldn't be talking to you guys, tell me who to talk to.
Yes, selecting content is very difficult. I couldn't get Peru or SJ to
contribute meaningfully to generating a simple blacklist of articles
that should NOT be included in the OLPC activity. (Recall that it is
being given to young children!) I ended up making the blacklist myself,
based on my own gut feelings. If Peru's board of education or OLPC's
"director of content" couldn't get their act together for this simple
task, expecting others to do it for you will be a huge roadblock to
getting content out.
Traffic-based content selection is simple and effective, and it doesn't
involve a lot of opinions about what should or should not be included.
- Madeleine