Hi Chengbin,
ZIM is an upcoming standard for using HTML content offline. It is derived from the Zeno file format used on the German Wikipedia DVDs since 2006 (ZIM = Zeno IMproved).
There are currently several reader applications for it, for instance the zimreader made by the openZIM project, or Kiwix. There are also some ports, such as Kiwix on Windows and zimreader on Openmoko / ARM.
The zimreader by openZIM works like a small web server: it serves the contents of the ZIM file locally.
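To illustrate the "small web server" idea, here is a minimal sketch in Python. It is not how the zimreader is actually implemented: the `ARCHIVE` dictionary is a hypothetical stand-in for an opened ZIM file, and the paths, function names and handler class are made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical stand-in for an opened ZIM file: URL path -> (MIME type, body).
ARCHIVE = {
    "/index.html": ("text/html", b"<html><body><h1>Zurich</h1></body></html>"),
    "/style.css": ("text/css", b"h1 { font-family: serif; }"),
}


class ArchiveHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        entry = ARCHIVE.get(self.path)
        if entry is None:
            self.send_error(404, "No such entry")
            return
        mime, body = entry
        self.send_response(200)
        self.send_header("Content-Type", mime)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress the default per-request logging


def start_server(port=0):
    """Serve ARCHIVE on localhost in a background thread (port 0 = any free port)."""
    server = ThreadingHTTPServer(("127.0.0.1", port), ArchiveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, path):
    """Request one path from the local server; return (status, MIME type, body)."""
    host, port = server.server_address
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Content-Type"), resp.read()
    finally:
        conn.close()
```

A browser on the same device would then simply be pointed at the local address, for example `http://127.0.0.1:<port>/index.html`, where the port is whatever `start_server()` was given or assigned.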
Once the HTML dump on static.wikimedia.org is fixed and ZIM file creation has been integrated, you will be able to download fresh ZIM files of all Wikimedia projects directly from download.wikimedia.org.
Currently the Kiwix team has created some ZIM files, and we are trying to build a ZIM file directory: http://openzim.org/ZIM_File_Archive
ZIM actually stores the article text portion of the wiki's HTML output in compressed clusters. It can also hold content of any other MIME type, such as images, CSS files etc. http://openzim.org/ZIM_File_Format
It is an open standard and is currently being developed and implemented in C++ by the openZIM team (sponsored by Wikimedia CH). There is a library (zimlib) which can be integrated into other reader or dumping applications to make them ZIM-aware.
Using the open documentation, ZIM can be implemented in any other language as well. The idea of ZIM is to make the data files freely interchangeable between reader applications. It is also flexible enough to store works other than data from Wikipedia/MediaWiki. Beyond that, it tries to keep the reader application as simple and stupid as possible: only decompression and HTML rendering have to be done, and an HTML renderer should be available on nearly all devices.
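As a rough illustration of the cluster idea (several items compressed together, with an offset table to find each one again), here is a small Python sketch. The byte layout below is invented for this example and is not the ZIM on-disk format, which is documented at the URL above; `zlib` is only a stand-in for whatever compression the format specifies, and `pack_cluster` / `read_blob` are hypothetical names.

```python
import struct
import zlib


def pack_cluster(blobs):
    """Concatenate blobs behind an offset table and compress everything at once."""
    # n blobs need n + 1 offsets; offsets are relative to the start of the data area.
    offsets = [0]
    for blob in blobs:
        offsets.append(offsets[-1] + len(blob))
    # Illustrative layout: blob count, then the offsets, all little-endian uint32.
    table = struct.pack(f"<{len(offsets) + 1}I", len(blobs), *offsets)
    return zlib.compress(table + b"".join(blobs))


def read_blob(cluster, index):
    """Decompress the cluster and slice out a single blob by its index."""
    raw = zlib.decompress(cluster)
    (count,) = struct.unpack_from("<I", raw, 0)
    if not 0 <= index < count:
        raise IndexError(f"blob index {index} out of range")
    offsets = struct.unpack_from(f"<{count + 1}I", raw, 4)
    data_start = 4 + 4 * (count + 1)
    return raw[data_start + offsets[index]:data_start + offsets[index + 1]]
```

Compressing several articles together lets the compressor exploit the markup they share, at the price of having to decompress the whole cluster to read one item.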
Greets,
Manuel
On Wednesday, 2 September 2009, Chengbin Zheng wrote:
On Wed, Sep 2, 2009 at 8:13 AM, Manuel Schneider <manuel.schneider@wikimedia.ch> wrote:
Hi Chengbin, hi list,
static.wikimedia.org is currently not being updated, and while dump processing has been assigned to and completely rewritten by Tomasz Finc (developer at WMF), no assignment has been made concerning the HTML dumps.
We had a Wikipedia Offline meeting at Wikimania last week and discussed several issues. One of them is that WMF wants to see the ZIM file format used for offline dumps and has suggested including it in the regular dumping process. So one question was: when will that happen, and what is the status of ZIM dumping at WMF? As ZIM uses HTML extracts, Tomasz clarified that once static.wikimedia.org has been rebuilt to be stable and sustainable, integrating ZIM would be trivial. But he also informed us that this task has not yet been assigned.
As Brion Vibber and Erik Möller were at the meeting as well, we hope that this assignment will be made soon and that the task has been given higher priority.
That said, I would also advise you to use the ZIM files rather than the pure HTML dumps for your Archos, because that's what they are meant for. A ZIM file containing all German Wikipedia articles (>900,000) is 1.4 GB; an additional full-text search index takes another 1 GB.
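For a feel of what these figures mean per article, here is a back-of-the-envelope calculation in Python. It assumes decimal gigabytes (10^9 bytes) and exactly 900,000 articles; since the real count is somewhat higher, the results are upper bounds on the averages. The function name is made up for this example.

```python
GB = 10**9  # assumption: decimal gigabytes


def average_bytes_per_article(total_gb, article_count):
    """Average storage per article, in bytes, for a file of total_gb gigabytes."""
    return total_gb * GB / article_count


# Articles only, and articles plus the full-text search index.
text_only = average_bytes_per_article(1.4, 900_000)
with_index = average_bytes_per_article(1.4 + 1.0, 900_000)
print(f"articles only: {text_only:.0f} bytes, with index: {with_index:.0f} bytes")
```

So the compressed article text averages roughly 1.6 kB per article, and under 2.7 kB once the search index is included.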
Greets,
Manuel
On Wednesday, 2 September 2009, Chengbin Zheng wrote:
I bring this old issue up because I want to know whether any progress has been made (or any plans exist) to update the static HTML version of Wikipedia. B&H Photo just leaked the next generation of Archos portable media players. Unbelievably, the rumors of a 500GB version are true! This is already tempting (especially at the price of $420). Just waiting for the specs on September 15, at the Archos event. I really hope it will support NTFS so I can use the compression feature.
It would be really cool and convenient to have an offline copy of Wikipedia anywhere I go without the need for Wi-Fi. What am I gonna do with 500GB?
BTW, does anyone know the uncompressed size of the current static HTML version of the English Wikipedia? Thanks.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
-- Regards Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens Wikimedia CH - Association for the advancement of free knowledge www.wikimedia.ch
I'm not familiar with the file extension .zim. What is that? Some sort of compressed HTML format like .chm? Where can I get a .zim file? I need to check whether this format is compatible with my Archos's Opera browser.
On Wed, Sep 2, 2009 at 8:45 AM, Manuel Schneider <manuel.schneider@wikimedia.ch> wrote:
[snip]
Well, as I said, Archos devices are not computers. They're merely portable video players with an internet browser. That's why I seek the static HTML version of Wikipedia.
Will there be easy extraction of zim to HTML? Extracting a dump is too difficult.
On Wednesday, 2 September 2009, Chengbin Zheng wrote:
Well, as I said, Archos devices are not computers. They're merely portable video players with an internet browser. That's why I seek the static HTML version of Wikipedia.
I see. But maybe it is possible to install a reader, or at least the zimreader as a web server, which can then be used with the built-in browser. At least it won't take many resources.
Will there be easy extraction of zim to HTML? Extracting a dump is too difficult.
Of course it is possible; that's exactly what the zimreader does when serving pages. But since ZIM file creation requires a working HTML dump on the Wikimedia clusters, you can just as well take the HTML dump from there directly.
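Conceptually, extracting to plain HTML files is just a loop over the entries. The Python sketch below assumes the archive has already been read into a dictionary mapping URL paths to content; that dictionary and the `extract_all` function are hypothetical and not part of zimlib or the zimreader.

```python
from pathlib import Path


def extract_all(entries, out_dir):
    """Write every entry (URL path -> bytes) to a file below out_dir."""
    out = Path(out_dir)
    written = []
    for url, body in entries.items():
        relative = Path(url.lstrip("/"))
        # Refuse paths that would escape the output directory.
        if ".." in relative.parts:
            raise ValueError(f"unsafe entry path: {url}")
        target = out / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(body)
        written.append(target)
    return written
```

The resulting directory tree can then be opened directly in a browser, without any reader running.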
Greets,
Manuel
Hi, everyone,
Wikimedia Commons, the media repository site used by Wikipedia, today just reached the 5 million media files milestone. Every one of these media files is available under a free license, such that anyone can use them for any purpose. Wikimedia Commons is the largest free media repository on the internet.
Zeyi He
Wikimedia UK
On Wed, Sep 2, 2009 at 6:14 AM, zh509@york.ac.uk wrote:
Hi, everyone,
Wikimedia Commons, the media repository site used by Wikipedia, today just reached the 5 million media files milestone. Every one of these media files is available under a free license, such that anyone can use them for any purpose. Wikimedia Commons is the largest free media repository on the internet.
Not counting the ~5000 non-free files that are currently identified as "Copyright by Wikimedia".
And not worrying about the 18M CC-BY / CC-BY-SA images on Flickr (which is arguably still the largest free content image repository, though that's not the only way Flickr is used).
-Robert Rohde