On Wed, Sep 2, 2009 at 8:45 AM, Manuel Schneider <manuel.schneider(a)wikimedia.ch> wrote:
Hi Chengbin,
ZIM is an emerging standard for using HTML content offline. It is derived
from the Zeno file format used on the German Wikipedia DVDs since 2006
(ZIM = Zeno IMproved).
There are currently several reader applications for it, for instance the
zimreader made by the openZIM project, or Kiwix.
There are also some ports, such as Kiwix on Windows and zimreader on
Openmoko / ARM.
The zimreader by openZIM works like a small web server: it serves the
contents of the ZIM file locally.
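As a rough illustration of that design (not the zimreader code itself), a
local server only has to map a requested URL path to a stored entry and
return it with its MIME type. The sketch below uses Python's standard
http.server module and a plain dictionary in place of a real ZIM file. The
names ENTRIES, lookup, ZimLikeHandler and serve, and the example paths, are
made up for this example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the contents of a ZIM file:
# URL path -> (MIME type, body). A real reader would look these up in the file.
ENTRIES = {
    "/Zeno.html": ("text/html", b"<html><body><h1>Zeno</h1></body></html>"),
    "/style.css": ("text/css", b"h1 { color: navy; }"),
}


def lookup(path):
    """Return (status, mime_type, body) for a requested path."""
    entry = ENTRIES.get(path)
    if entry is None:
        return 404, "text/plain", b"not found"
    mime_type, body = entry
    return 200, mime_type, body


class ZimLikeHandler(BaseHTTPRequestHandler):
    """Answer GET requests from the in-memory entry table."""

    def do_GET(self):
        status, mime_type, body = lookup(self.path)
        self.send_response(status)
        self.send_header("Content-Type", mime_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve the entries on localhost only, until interrupted."""
    HTTPServer(("127.0.0.1", port), ZimLikeHandler).serve_forever()
```

Calling serve() and pointing any browser on the same device at
http://127.0.0.1:8080/Zeno.html would then display the stored page.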
Once the HTML dump on static.wikimedia.org is fixed and ZIM file creation
has been integrated, you will be able to download fresh ZIM files of all
Wikimedia projects directly from download.wikimedia.org.
Currently the Kiwix team has created some ZIM files, and we are trying to
build a ZIM file directory:
http://openzim.org/ZIM_File_Archive
ZIM stores the article text portion of the wiki's HTML output in a
compressed cluster. It can also hold content of any other MIME type, such
as images, CSS files, etc.
http://openzim.org/ZIM_File_Format
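To make the cluster idea concrete, here is a minimal sketch of packing
several entries into one compressed blob and reading a single entry back.
It is not the actual ZIM on-disk layout (the format page above is the
reference for that). The offset table, the use of zlib, and the function
names pack_cluster and read_blob are assumptions for illustration only.

```python
import struct
import zlib


def pack_cluster(blobs):
    """Pack byte strings into one compressed cluster.

    Illustrative layout: blob count, then count + 1 offsets, then the data.
    """
    offsets = [0]
    for blob in blobs:
        offsets.append(offsets[-1] + len(blob))
    header = struct.pack("<I", len(blobs))
    header += struct.pack(f"<{len(offsets)}I", *offsets)
    return zlib.compress(header + b"".join(blobs))


def read_blob(cluster, index):
    """Decompress the cluster and return the blob at the given index."""
    raw = zlib.decompress(cluster)
    (count,) = struct.unpack_from("<I", raw, 0)
    if not 0 <= index < count:
        raise IndexError("blob index out of range")
    offsets = struct.unpack_from(f"<{count + 1}I", raw, 4)
    data_start = 4 + 4 * (count + 1)  # skip the count and the offset table
    return raw[data_start + offsets[index]:data_start + offsets[index + 1]]
```

Compressing several similar articles together lets the compressor exploit
what they have in common. The price is that the whole cluster has to be
decompressed to read one entry.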
It is an open standard and has so far been developed and implemented in
C++ by the openZIM team (sponsored by Wikimedia CH). There is a library
(zimlib) which can be integrated into other reader or dumping applications
to make them ZIM-aware.
Using the open documentation, ZIM can be implemented in any other language
as well.
The idea of ZIM is to make the data files freely interchangeable between
reader applications. It is also flexible enough to store works other than
data from Wikipedia/MediaWiki. Finally, it tries to keep the reader
application as simple and stupid as possible: the only work to be done is
decompression and HTML rendering, and an HTML renderer should be available
on nearly all devices.
Greets,
Manuel
On Wednesday, 2 September 2009, Chengbin Zheng wrote:
On Wed, Sep 2, 2009 at 8:13 AM, Manuel Schneider <manuel.schneider(a)wikimedia.ch> wrote:
> Hi Chengbin, hi list,
>
> static.wikimedia.org is currently not being updated, and while the dump
> processing has been assigned to and completely rewritten by Tomasz Finc
> (developer at WMF), no assignment has been made concerning HTML dumps.
>
> We had a Wikipedia Offline meeting at Wikimania last week and discussed
> several issues. One issue is that WMF wants to see the ZIM file format
> used for offline dumps and has suggested including it in the regular
> dumping process.
> So one question was: when will that happen, and what is the status of
> WMF ZIM dumping?
> As ZIM uses HTML extracts, Tomasz clarified that once
> static.wikimedia.org has been rebuilt to be stable and sustainable,
> integrating ZIM would be trivial. But he also informed us that this task
> has not yet been assigned.
>
> As Brion Vibber and Erik Möller were at the meeting as well, we hope
> that this assignment will be made soon and that the task will get higher
> priority.
>
> That said, I would advise you to use the ZIM files rather than the pure
> HTML dumps for your Archos, because that's what they are meant for.
> A ZIM file containing all German Wikipedia articles (>900,000) is 1.4 GB;
> an additional full-text search index takes another 1 GB.
>
> Greets,
>
>
> Manuel
>
> On Wednesday, 2 September 2009, Chengbin Zheng wrote:
> > I bring this old issue up because I want to know whether any progress
> > has been made (or is planned) on updating the static HTML version of
> > Wikipedia. B&H Photo just leaked the next generation of Archos
> > portable media players. Unbelievably, the rumors of a 500GB version
> > are true! This is already tempting (especially the price at $420). Just
> > waiting for specs on September 15, the Archos event. I really hope it
> > will support NTFS so I can use the compression feature.
> >
> > It would be really cool and convenient to have an offline copy of
> > Wikipedia anywhere I go without the need for Wi-Fi. What am I gonna do
> > with 500GB?
BTW, does anyone know the uncompressed size of the current static HTML
version of the English Wikipedia? Thanks.
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
--
Regards
Manuel Schneider
Wikimedia CH - Verein zur Förderung Freien Wissens
Wikimedia CH - Association for the advancement of free knowledge
www.wikimedia.ch
I'm not familiar with the file extension .zim. What is that? Some sort of
compressed HTML format like .chm? Where can I get a .zim file? I need to
check whether this format is compatible with my Archos's Opera browser.
Well, as I said, Archos devices are not computers. They're merely portable
video players with an internet browser. That's why I seek the static HTML
version of Wikipedia.
Will there be an easy way to extract ZIM files to HTML? Extracting a dump
is too difficult.