Hi
For the first time, we have managed to release a complete dump of all
encyclopedic articles of the English Wikipedia, *with thumbnails*.
This ZIM file is 40 GB in size and contains the current 4.5 million
articles with their 3.5 million pictures:
http://download.kiwix.org/zim/wikipedia_en_all.zim.torrent
This ZIM file is directly and easily usable on many types of devices
like Android smartphones and Win/OSX/Linux PCs with Kiwix, or Symbian
with Wikionboard.
You don't need a modern computer with a powerful CPU. You can, for
example, create a (read-only) Wikipedia mirror on a Raspberry Pi for
~100 USD by using kiwix-serve, our dedicated Web server for ZIM files.
A demo is available here: http://library.kiwix.org/wikipedia_en_all/
As always, we also provide a packaged version (for the main PC
systems) which includes a full-text search index, the ZIM file and the
binaries:
http://download.kiwix.org/portable/wikipedia_en_all.zip.torrent
Also interesting: this file was generated in less than 2 weeks thanks
to several recent innovations:
* The Parsoid (cluster), which gives us an HTML output with additional
semantic RDF tags
* mwoffliner, a nodejs script able to dump pages based on the MediaWiki
API (and Parsoid API)
* zimwriterfs, a solution able to compile any local HTML directory to a
ZIM file
We now have an efficient way to generate new ZIM files. Consequently, we
will work to industrialize and automate the ZIM file generation
process, which is probably the oldest and most important problem we
still face at Kiwix.
None of this would have been possible without the support of:
* Wikimedia CH and the "ZIM autobuild" project
* Wikimedia France and the Afripedia project
* Gwicke from the WMF Parsoid dev team.
BTW, we need additional help from developers with JavaScript/nodejs
skills to fix a few issues in mwoffliner:
* Recreate the "table of contents" based on the HTML DOM (*)
* Scrape the MediaWiki ResourceLoader in such a way that it continues to
work offline (***)
* Scrape categories (**)
* Localize the script (*)
* Improve overall performance by introducing workers (**)
* Create nodezim, the libzim nodejs binding, and use it (***, also needs
compilation and C++ skills)
* Evaluate the work necessary to merge mwoffliner and the new WMF PDF
Renderer (***)
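To give an idea of what the first item involves, here is a minimal
sketch, written in Python for brevity (mwoffliner itself is a nodejs
script, and this is not its code). It assumes that section headings in
the HTML are plain <h2>..<h6> elements carrying an "id" attribute; the
names HeadingCollector and build_toc are hypothetical and only serve the
illustration.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, id, text) for every <h2>..<h6> element.

    <h1> is skipped on the assumption that it holds the page title.
    """

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._id = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "23456":
            self._level = int(tag[1])
            self._id = dict(attrs).get("id")
            self._text = []

    def handle_data(self, data):
        # Text inside nested tags (<i>, <span>, ...) is collected as well.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            text = "".join(self._text).strip()
            self.headings.append((self._level, self._id, text))
            self._level = None


def build_toc(html):
    """Return a list of (number, anchor, title), e.g. ("1.2", "Late", "Late")."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()

    stack = []     # heading levels of the currently open ancestors
    counters = []  # counters[d] = position of the current entry at depth d + 1
    toc = []
    for level, anchor, text in collector.headings:
        # Close every open section that is at the same level or deeper.
        while stack and stack[-1] >= level:
            stack.pop()
        stack.append(level)
        depth = len(stack)
        del counters[depth:]          # reset numbering of deeper levels
        if len(counters) < depth:
            counters.append(0)
        counters[depth - 1] += 1
        toc.append((".".join(map(str, counters)), anchor, text))
    return toc


if __name__ == "__main__":
    page = '<h1>Kiwix</h1><h2 id="History">History</h2><h3 id="ZIM">ZIM</h3>'
    for number, anchor, title in build_toc(page):
        print(f"{number} {title} -> #{anchor}")
```

Numbering follows the nesting actually found in the page rather than
the absolute heading level, so an <h4> placed directly under an <h2> is
still numbered as its first child.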
Emmanuel
--
Kiwix - Wikipedia Offline & more
* Web: http://www.kiwix.org
* Twitter: https://twitter.com/KiwixOffline
* more: http://www.kiwix.org/wiki/Communication
On 01/03/2014 19:26, James Forrester wrote:
> On Saturday, March 1, 2014, Emmanuel Engelhart <kelson(a)kiwix.org> wrote:
>> fix a few issues on mwoffliner:
>> * Recreate the "table of content" based on the HTML DOM (*)
>
> We are currently working on doing similar work to this in VisualEditor (to
> provide for a Table of Contents that can change 'live' as the document is
> edited); this code may ultimately be used to generate the "real" Tables of
> Contents for the reading HTML, as part of the plans to replace the output
> of the PHP parser with Parsoid everywhere.
>
> It should be possible for Kiwix to re-use this in some way (rather than
> have to re-implement it!). We hope to have something to show in the next
> few weeks, if that's helpful.
Great news!
Yes, indeed, it would be great to be able to re-use your work.
I have subscribed to this Bugzilla entry, which I guess is what you mean:
https://bugzilla.wikimedia.org/show_bug.cgi?id=49224
Of course, for our use case, the best solution would be to have the
initial TOC rendering done on the Parsoid side, which would also offer
the advantage of speeding up the initial VE rendering.
Emmanuel