Hi Madeleine,
Madeleine Price Ball wrote:
I was curious if you include images? If not, are you considering doing so, and what's stopping you? If so, how do you pick them?
We haven't done this on our "Test DVD" this summer, even though it is easily possible. Well, "easily" means it is easy as far as the format is concerned; the problem is choosing the images. Emmanuel Engelhart (Kiwix; he is also part of the openZIM team) has written some Perl scripts for that.
We didn't do it for two reasons. First, lack of time. Even though the tools exist, it's a lot of work: searching through the articles, collecting all the image URLs, fetching the images, deciding what size to resize them to, etc.
And the openZIM project is not a publisher of offline content. We are developing a stable, efficient format allowing free interchange of contents between reader applications and devices and providing a GPL'ed sample implementation of it.
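As an illustration of the image-gathering steps mentioned above (walking an article and collecting its image URLs), here is a minimal Python sketch. It is not Emmanuel's Perl tooling; the names `ImageCollector` and `collect_image_urls` and the example URLs are hypothetical, and it assumes the article is already available as rendered HTML. Downloading and resizing are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the absolute URL of every <img> tag, without duplicates."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return  # <img> without a usable src attribute
        url = urljoin(self.base_url, src)  # resolve relative paths
        if url not in self.urls:
            self.urls.append(url)


def collect_image_urls(html, base_url):
    """Return the image URLs referenced by one article, in document order."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls


# Hypothetical article fragment, for illustration only.
sample_html = (
    '<p><img src="/images/a.png"><img src="b.jpg"/>'
    '<img alt="no source"><img src="/images/a.png"></p>'
)
sample_urls = collect_image_urls(sample_html, "http://example.org/wiki/Page")
```

Running this over every selected article would give the list of images to fetch; deciding which of them to keep, and at what size, is still the editorial part.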
I did the work for picking which articles & images went into the XO activity, based on traffic stats. (We only had 100MB, we got 24k articles in 80MB and spent the other 20MB on highly compressed images.) OLPC has other more critical things to worry about these days, but some of the volunteers who worked on that project might be interested in helping others.
Well, we had an "Offline Meeting" at Wikimania in Buenos Aires this summer where Samuel Klein was also participating. Our goal is to contribute the right technology to make all the offline projects able to collaborate. Currently everyone is reinventing the wheel when it comes to storage of the content. We think that the specific knowledge of the publishers should be how to select the content - which content goes where in which form - and not technical questions such as compression, storage or retrieving the data on the user's end.
The Wikimedia Foundation has been supporting us so far, as they share our goal and are working on a regular export of all Wikimedia wikis into ZIM format (just as you can already get SQL and XML dumps on download.wikimedia.org). They also have a lot of contacts with publishers and other projects working on Wikipedia Offline and connect them with us, so we can improve ZIM to fit all the Wikipedia Offline projects. That's why we defined a new format version at the last Developers Meeting.
Greets,
Manuel
I was curious if you include images? If not, are you considering doing so, and what's stopping you? If so, how do you pick them?
We haven't done this on our "Test DVD" this summer, even though it is easily possible. Well, "easily" means it is easy as far as the format is concerned; the problem is choosing the images. Emmanuel Engelhart (Kiwix; he is also part of the openZIM team) has written some Perl scripts for that.
We didn't do it for two reasons. First, lack of time. Even though the tools exist, it's a lot of work: searching through the articles, collecting all the image URLs, fetching the images, deciding what size to resize them to, etc.
I could help with generating the list of images to pick, I've done that work before for the OLPC activity. I use traffic stats. Traffic stats (when used appropriately) work quite well for picking which articles or images to include.
Ben Schwartz can help too, I believe he was responsible for automatically acquiring and resizing images (and even converting svg to jpg). He's the other major contributor to the OLPC activity that still has interest in the general goal of offline Wikipedias.
And the openZIM project is not a publisher of offline content. We are developing a stable, efficient format allowing free interchange of contents between reader applications and devices and providing a GPL'ed sample implementation of it.
So should I be talking to someone else? Who should I talk to?
Well, we had an "Offline Meeting" at Wikimania in Buenos Aires this summer where Samuel Klein was also participating. Our goal is to contribute the right technology to make all the offline projects able to collaborate. Currently everyone is reinventing the wheel when it comes to storage of the content.
Unfortunately, SJ had very little to do with the actual program, which ended up being created by volunteers not on the wikibrowse mailing list.
We think that the specific knowledge of the publishers should be how to select the content - which content goes where in which form - and not technical questions such as compression, storage or retrieving the data on the user's end.
OK, if I shouldn't be talking to you guys, tell me who to talk to.
Yes, selecting content is very difficult. I couldn't get Peru or SJ to contribute meaningfully to generating a simple blacklist of articles that should NOT be included on the OLPC activity. (Recall it is being given to young children!) I ended up making the blacklist myself based on my own gut feelings. If Peru's board of education or OLPC's "director of content" couldn't get their act together for this simple task, expecting others to do this task for you will be a huge roadblock to getting content out.
Traffic based content is simple and effective and it doesn't involve a lot of opinions on what should or should not be included.
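To make the traffic-based approach concrete, here is a minimal Python sketch. It is not the actual OLPC tooling: the function name `select_by_traffic`, the tuple layout and the sample numbers are all hypothetical. It ranks items by view count, drops anything on a blacklist, and keeps adding items while they fit into a fixed size budget.

```python
def select_by_traffic(items, budget_bytes, blacklist=frozenset()):
    """Pick the most-viewed items that fit into budget_bytes.

    items is an iterable of (title, views, size_bytes) tuples.
    Returns (chosen_titles, bytes_used).
    """
    chosen = []
    used = 0
    # Highest traffic first; ties are broken alphabetically by title.
    ranked = sorted(items, key=lambda item: (-item[1], item[0]))
    for title, views, size in ranked:
        if title in blacklist:
            continue  # explicitly excluded content
        if used + size > budget_bytes:
            continue  # too big for the remaining space; a smaller item may still fit
        chosen.append(title)
        used += size
    return chosen, used


# Made-up sample data, for illustration only.
sample_items = [
    ("A", 100, 60),
    ("B", 90, 50),
    ("C", 80, 30),
    ("D", 200, 10),
]
chosen, used = select_by_traffic(sample_items, budget_bytes=100, blacklist={"D"})
```

The same function could be run twice with separate budgets, once for articles and once for images. The greedy fill is a simplification: it does not try to find the best possible packing, only a cheap and predictable one.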
- Madeleine
Madeleine Price Ball wrote:
We didn't do it for two reasons. First, lack of time. Even though the tools exist, it's a lot of work: searching through the articles, collecting all the image URLs, fetching the images, deciding what size to resize them to, etc.
I could help with generating the list of images to pick, I've done that work before for the OLPC activity. I use traffic stats. Traffic stats (when used appropriately) work quite well for picking which articles or images to include.
Ben Schwartz can help too, I believe he was responsible for automatically acquiring and resizing images (and even converting svg to jpg). He's the other major contributor to the OLPC activity that still has interest in the general goal of offline Wikipedias.
I think this is great news for Emmanuel and the WP1.0 project.
And the openZIM project is not a publisher of offline content. We are developing a stable, efficient format allowing free interchange of contents between reader applications and devices and providing a GPL'ed sample implementation of it.
So should I be talking to someone else? Who should I talk to?
Depends on what you actually need. We're the technicians. Out there are several projects and of course the Wikimedia Foundation.
Well, we had an "Offline Meeting" at Wikimania in Buenos Aires this summer where Samuel Klein was also participating. Our goal is to contribute the right technology to make all the offline projects able to collaborate. Currently everyone is reinventing the wheel when it comes to storage of the content.
Unfortunately, SJ had very little to do with the actual program, which ended up being created by volunteers not on the wikibrowse mailing list.
We think that the specific knowledge of the publishers should be how to select the content - which content goes where in which form - and not technical questions such as compression, storage or retrieving the data on the user's end.
OK, if I shouldn't be talking to you guys, tell me who to talk to.
I think the WP1.0 project is what you are looking for. They have these criteria to rate articles and select them by these ratings to compile special selections.
Emmanuel should get back to you.
/Manuel