New subject: Any news to update static HTML Wikipedia?

2 Sep 2009


      Hi Chengbin,
ZIM is an upcoming standard for using HTML contents offline. It is derived 
from the Zeno file format used on the German Wikipedia DVDs since 2006 (ZIM = 
Zeno IMproved).
There are currently several reader applications for it, for instance the 
zimreader made by the openZIM project, or Kiwix.
There are also some ports, such as Kiwix on Windows and zimreader on 
Openmoko / ARM.
The zimreader by openZIM works like a small web server: it serves the 
contents of the ZIM file locally.
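To illustrate the "small web server" idea, here is a minimal Python sketch. It 
is not the zimreader code: the ARCHIVE dictionary, its paths and the helper 
names start_server and fetch are made up for this example, and a plain dict 
stands in for a real ZIM file.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for a ZIM file: URL path -> (MIME type, body).
ARCHIVE = {
    "/Example.html": ("text/html; charset=utf-8",
                      b"<html><body><h1>Example</h1></body></html>"),
    "/style.css": ("text/css", b"h1 { font-family: sans-serif; }"),
}


class ArchiveHandler(BaseHTTPRequestHandler):
    """Serve entries of ARCHIVE over HTTP; unknown paths get a 404."""

    def do_GET(self):
        entry = ARCHIVE.get(self.path)
        if entry is None:
            self.send_error(404, "No such entry")
            return
        mime_type, body = entry
        self.send_response(200)
        self.send_header("Content-Type", mime_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_server(port=0):
    """Start the server on localhost in a background thread.

    Port 0 lets the operating system pick a free port.
    """
    server = HTTPServer(("127.0.0.1", port), ArchiveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server, path):
    """Request one path and return (status, content type, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Content-Type"), response.read()
    finally:
        conn.close()
```

A browser on the same device would then simply be pointed at the local 
address, which is why the reader side needs nothing beyond an HTML renderer.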
Once the HTML dump on static.wikimedia.org is fixed and ZIM file creation has 
been integrated, you will be able to download fresh ZIM files of all Wikimedia 
projects directly from download.wikimedia.org.
Currently the Kiwix team has created some ZIM files, and we are trying to 
build a ZIM file directory:
http://openzim.org/ZIM_File_Archive
ZIM stores the article text portion of the wiki's HTML output in compressed 
clusters. It can also hold content of any other MIME type, such as images and 
CSS files.
http://openzim.org/ZIM_File_Format
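To make the cluster idea more concrete, here is a rough Python sketch. It is 
only an illustration, not the actual ZIM layout: the functions build_cluster 
and read_article are hypothetical, and zlib is used only because it ships 
with Python. See the format page above for the real structure.

```python
import zlib


def build_cluster(articles):
    """Pack several articles into one compressed blob.

    `articles` maps title -> HTML text. Returns (cluster, index), where
    `cluster` is the compressed bytes and `index` maps each title to its
    (offset, length) inside the uncompressed data.
    """
    index = {}
    parts = []
    offset = 0
    for title, html in articles.items():
        data = html.encode("utf-8")
        index[title] = (offset, len(data))
        parts.append(data)
        offset += len(data)
    return zlib.compress(b"".join(parts)), index


def read_article(cluster, index, title):
    """Decompress the cluster and slice out a single article."""
    offset, length = index[title]
    raw = zlib.decompress(cluster)
    return raw[offset:offset + length].decode("utf-8")
```

Compressing several articles together lets the compressor exploit the markup 
they share, while the reader still only needs to decompress and hand the HTML 
to a renderer.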
It is an open standard and is currently being developed and implemented in 
C++ by the openZIM team (sponsored by Wikimedia CH). There is a library 
(zimlib) which can be integrated into other reader or dumping applications to 
make them ZIM-aware.
Using the open documentation, ZIM can be implemented in any other language as 
well.
The idea of ZIM is to make the data files freely interchangeable between 
reader applications. It is also flexible enough to store works other than 
data from Wikipedia/MediaWiki. Finally, it tries to keep the reader 
application as simple and stupid as possible: the reader only has to do 
decompression and HTML rendering, and an HTML renderer should be available on 
nearly all devices.
Greets,
Manuel
On Wednesday, 2 September 2009, Chengbin Zheng wrote:
...
On Wed, Sep 2, 2009 at 8:13 AM, Manuel Schneider
<manuel.schneider@wikimedia.ch> wrote:
...
Hi Chengbin, hi list,
static.wikimedia.org is currently not being updated. While the regular
dumps processing has been assigned to, and completely rewritten by, Tomasz
Finc (developer at WMF), no assignment has been made concerning HTML dumps.
We had a Wikipedia Offline meeting at Wikimania last week and discussed
several issues. One issue is that WMF wants to see the ZIM file format
being used for offline dumps and has suggested including it in the regular
dumping process.
So one question was: when will that happen, and what is the status of WMF
ZIM dumping?
As ZIM uses HTML extracts, Tomasz clarified that once static.wikimedia.org
has been rebuilt to be stable and sustainable, integrating ZIM would be
trivial. But he also informed us that this task has not yet been assigned.
As Brion Vibber and Erik Möller were at the meeting as well, we hope that
this assignment will be made soon and that the task will get higher
priority.
That said, I would also advise you not to use the pure HTML dumps but the
ZIM files for your Archos, because that's what they are meant for.
A ZIM file containing all German Wikipedia articles (>900,000) is 1.4 GB;
an additional full text search index takes another 1 GB.
Greets,
Manuel
On Wednesday, 2 September 2009, Chengbin Zheng wrote:
...
I bring this old issue up because I want to know whether progress has been
made (or plans exist) to update the static HTML version of Wikipedia. B&H
Photo just leaked the next generation of Archos portable media players.
Unbelievably, the rumors of a 500GB version are true! This is already
tempting (especially the price at $420). Just waiting for specs on
September 15, the Archos event. I really hope it will support NTFS so I can
use the compression feature.
It would be really cool and convenient to have an offline copy of Wikipedia
anywhere I go without the need of Wi-Fi. What am I gonna do with 500GB?
BTW, does anyone know the uncompressed size of the current static HTML
English Wikipedia version? Thanks.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch

I'm not familiar with the file extension .zim. What is that? Some sort of
compressed HTML format like .chm? Where can I get a .zim file? I need to
check if this format is compatible with my Archos's Opera browser.

Re: [Wikitech-l] Any news to update static HTML Wikipedia?