Hi
For the first time, we have managed to release a complete dump of all encyclopedic articles of the English Wikipedia, *with thumbnails*.
This ZIM file is 40 GB in size and contains the current 4.5 million articles with their 3.5 million pictures: http://download.kiwix.org/zim/wikipedia_en_all.zim.torrent
This ZIM file is directly and easily usable on many types of devices like Android smartphones and Win/OSX/Linux PCs with Kiwix, or Symbian with Wikionboard.
You don't need a modern computer with a big CPU. You can, for example, create a (read-only) Wikipedia mirror on a Raspberry Pi for ~100 USD using our dedicated ZIM Web server, kiwix-serve. A demo is available here: http://library.kiwix.org/wikipedia_en_all/
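Once the ZIM file is downloaded, running such a mirror is a one-liner. A minimal sketch (the exact option names may vary slightly between kiwix-serve versions):

  kiwix-serve --port=8000 wikipedia_en_all.zim

Then point any browser on the local network at port 8000 of the machine running it.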
As always, we also provide a packaged version (for the main PC systems) which includes the fulltext search index, the ZIM file and the binaries: http://download.kiwix.org/portable/wikipedia_en_all.zip.torrent
What is also interesting: this file was generated in less than 2 weeks thanks to multiple recent innovations:
* The Parsoid (cluster), which gives us an HTML output with additional semantic RDF tags
* mwoffliner, a nodejs script able to dump pages based on the Mediawiki API (and Parsoid API)
* zimwriterfs, a solution able to compile any local HTML directory into a ZIM file
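To give an idea of what a dumper like mwoffliner does under the hood, here is a minimal sketch (not mwoffliner code itself) of fetching one article's rendered HTML through the standard Mediawiki "action=parse" API with nodejs; the article title and the User-Agent string are just examples:

  var https = require('https');

  // Fetch the rendered HTML of a single article via the Mediawiki API.
  // 'Paris' is only an example title; a real dumper iterates over all articles.
  var title = 'Paris';
  var options = {
    hostname: 'en.wikipedia.org',
    path: '/w/api.php?action=parse&format=json&prop=text&page=' +
          encodeURIComponent(title),
    headers: { 'User-Agent': 'zim-dump-sketch/0.1 (example)' } // example UA
  };

  https.get(options, function (res) {
    var body = '';
    res.on('data', function (chunk) { body += chunk; });
    res.on('end', function () {
      var html = JSON.parse(body).parse.text['*']; // rendered article HTML
      console.log('Got ' + html.length + ' bytes of HTML for "' + title + '"');
    });
  });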
We now have an efficient way to generate new ZIM files. Consequently, we will work to industrialize and automate the ZIM file generation process, which is probably the oldest and most important problem we still face at Kiwix.
All this would not have been possible without the support of:
* Wikimedia CH and the "ZIM autobuild" project
* Wikimedia France and the Afripedia project
* Gwicke from the WMF Parsoid dev team.
BTW, we need additional developers with javascript/nodejs skills to help fix a few issues on mwoffliner:
* Recreate the "table of content" based on the HTML DOM (*) (see the sketch after this list)
* Scrape the Mediawiki ResourceLoader in such a manner that it continues to work offline (***)
* Scrape categories (**)
* Localize the script (*)
* Improve the global performance by introducing the usage of workers (**)
* Create nodezim, the libzim nodejs binding, and use it (***, also needs compilation and C++ skills)
* Evaluate the work necessary to merge mwoffliner and the new WMF PDF Renderer (***)
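Regarding the first item, a minimal sketch of what "recreate the TOC from the DOM" could look like; it assumes a DOM `document` is available (in nodejs, a DOM library such as jsdom can provide one) and uses only standard DOM calls:

  function buildToc(document) {
    // Collect the section headings in document order.
    var headings = document.querySelectorAll('h2, h3, h4, h5, h6');
    var toc = [];
    for (var i = 0; i < headings.length; i++) {
      var h = headings[i];
      toc.push({
        level: Number(h.tagName.charAt(1)), // 2 for <h2>, 3 for <h3>, ...
        text: h.textContent.replace(/\s+/g, ' ').trim(),
        id: h.id // anchor target, when the heading carries one
      });
    }
    return toc; // flat list; nesting can be derived from `level`
  }

The flat list plus `level` is enough to render a nested TOC, and keeps the function independent of any particular output markup.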
Emmanuel
On Saturday, March 1, 2014, Emmanuel Engelhart kelson@kiwix.org wrote:
Hi
For the first time, we have managed to release a complete dump of all encyclopedic articles of the English Wikipedia, *with thumbnails*.
Great news, Emmanuel – congratulations!
[Snip]
BTW, we need additional developers with javascript/nodejs skills to help fix a few issues on mwoffliner:
- Recreate the "table of content" based on the HTML DOM (*)
We are currently working on doing similar work to this in VisualEditor (to provide for a Table of Contents that can change 'live' as the document is edited); this code may ultimately be used to generate the "real" Tables of Contents for the reading HTML, as part of the plans to replace the output of the PHP parser with Parsoid everywhere.
It should be possible for Kiwix to re-use this in some way (rather than have to re-implement it!). We hope to have something to show in the next few weeks, if that's helpful.
J.
On 01/03/2014 19:26, James Forrester wrote:
On Saturday, March 1, 2014, Emmanuel Engelhart kelson@kiwix.org wrote:
BTW, we need additional developers with javascript/nodejs skills to help fix a few issues on mwoffliner:
- Recreate the "table of content" based on the HTML DOM (*)
We are currently working on doing similar work to this in VisualEditor (to provide for a Table of Contents that can change 'live' as the document is edited); this code may ultimately be used to generate the "real" Tables of Contents for the reading HTML, as part of the plans to replace the output of the PHP parser with Parsoid everywhere.
It should be possible for Kiwix to re-use this in some way (rather than have to re-implement it!). We hope to have something to show in the next few weeks, if that's helpful.
Nice news!
Yes, indeed, it would be great to be able to re-use your work.
I have subscribed to this Bugzilla entry, which I guess is what you mean: https://bugzilla.wikimedia.org/show_bug.cgi?id=49224
Of course, for our usage, the best solution would be to have the initial TOC rendering done on the Parsoid side, which would also offer the advantage of speeding up the initial VE rendering.
Emmanuel
On Sun, 2 Mar 2014, at 4:01, Emmanuel Engelhart wrote:
Hi
For the first time, we have managed to release a complete dump of all encyclopedic articles of the English Wikipedia, *with thumbnails*.
This ZIM file is 40 GB in size and contains the current 4.5 million articles with their 3.5 million pictures: http://download.kiwix.org/zim/wikipedia_en_all.zim.torrent
This ZIM file is directly and easily usable on many types of devices like Android smartphones and Win/OSX/Linux PCs with Kiwix, or Symbian with Wikionboard.
You don't need a modern computer with a big CPU. You can, for example, create a (read-only) Wikipedia mirror on a Raspberry Pi for ~100 USD using our dedicated ZIM Web server, kiwix-serve. A demo is available here: http://library.kiwix.org/wikipedia_en_all/
Perhaps do this for the other sister projects too?
Emmanuel,
On 03/01/2014 09:01 AM, Emmanuel Engelhart wrote:
Hi
For the first time, we have managed to release a complete dump of all encyclopedic articles of the English Wikipedia, *with thumbnails*.
this is great news. Congratulations!
Gabriel