Hi!
I would like to build a ZIM file of the Simple English Wikipedia including the full-resolution images for educational use in a region without reliable Internet access.
So far as I can tell, I would need to:
* grab a copy of the current dump and use something like http://meta.wikimedia.org/wiki/Wikix to populate it with images
* set it up to be served via HTTP, possibly by importing into a local MediaWiki instance
...alternatively:
* http://www.nongnu.org/wp-mirror/
And then:
* spider the whole thing into HTML/images using curl/wget (rough sketch below)
* use Zimwriter+buildZimFromDirectory as per https://openzim.org/Build_your_ZIM_file
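(FWIW, a minimal sketch of that spider step, assuming the dump is already being served by a local MediaWiki; the URL and destination path are placeholders, not tested values. The resulting directory tree would then go to Zimwriter's buildZimFromDirectory per the openZIM page above.)

    # Mirror the locally served wiki into a plain HTML/image tree.
    # URL and destination directory are assumptions about the setup.
    wget --mirror --convert-links --page-requisites --adjust-extension \
         --no-parent --directory-prefix=/srv/simple-html \
         http://localhost/wiki/Main_Page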
Good plan? Better plan? Is there a flag on any of this to limit things to redistributable images, so I can add a torrent for the resulting (est. 50-60 GB) file to the Kiwix site?
Best, Jason Skomorowski
Just a quick answer before someone who actually is knowledgeable can help you: I think people dumping from a live MediaWiki just use InstantCommons https://www.mediawiki.org/wiki/InstantCommons
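(On MediaWiki 1.16 or later that is a one-line setting; a minimal sketch, assuming a stock install under /var/www/mediawiki:)

    # Enable InstantCommons: [[File:...]] references resolve against
    # Wikimedia Commons via its API. $wgUseInstantCommons is the
    # documented setting; the LocalSettings.php path is an assumption.
    echo '$wgUseInstantCommons = true;' >> /var/www/mediawiki/LocalSettings.php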
Nemo
Thanks, hadn't heard about InstantCommons. So the idea is to set up the dump locally in MediaWiki with that, and it will be populated as I wget it?
Seems like that API has some bandwidth limitations built-in...
J
On 12-10-10 02:51 PM, Federico Leva (Nemo) wrote:
<snip>
Jason Skomorowski, 10/10/2012 20:58:
Thanks, hadn't heard about InstantCommons. So the idea is to set up the dump locally in MediaWiki with that, and it will be populated as I wget it?
No wget; you'd export it with something like DumpHTML from your wiki installation, since MediaWiki caches only thumbnails. But again, don't trust me. ;-)
Seems like that API has some bandwidth limitations built-in...
Somewhat, but then you'd probably also not be downloading the highest resolutions, etc.
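(If DumpHTML ends up being the route, the invocation is roughly the sketch below; the options vary by version, so treat the flags as assumptions and check the script's own --help rather than trusting them.)

    # Export the wiki to static HTML via the DumpHTML maintenance
    # script, run from the MediaWiki root. -d is assumed to name the
    # destination directory; verify against your version's --help.
    cd /var/www/mediawiki
    php extensions/DumpHTML/dumpHTML.php -d /srv/simple-html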
Nemo
Dear Jason,
In regards to WP-MIRROR: version 0.4 is currently in development and, when ready, will install the 'simple' wiki by default. A key design objective of v0.4 is to completely automate the configuration of its dependencies (MySQL, MediaWiki, cURL, etc.). The idea is that it should 'just work'.
If you are pressed for time, you can try version 0.3, but, as other users have remarked, it requires a fair bit of configuration of the dependencies. Please see the discussion at http://lists.nongnu.org/archive/html/wp-mirror-list/.
In regards to disk space: I have a laptop that mirrors the 'simple' wiki. Images and data currently occupy a bit under 50 GB. Of course this will grow over time as contributors add to the 'simple' wiki.
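(Since provisioning came up: a quick pre-flight disk check before mirroring might look like the sketch below. The 80 GB threshold is just a guessed safety margin over today's ~50 GB, not a WP-MIRROR requirement, and the path is a placeholder.)

    # Abort early if the target filesystem lacks ~80 GB free:
    # headroom over the ~50 GB the 'simple' wiki occupies today.
    NEEDED_KB=$((80 * 1024 * 1024))
    AVAIL_KB=$(df -Pk /var/lib | awk 'NR==2 {print $4}')
    if [ "$AVAIL_KB" -lt "$NEEDED_KB" ]; then
        echo "Not enough disk for the mirror" >&2
        exit 1
    fi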
Sincerely Yours, Kent
On 10/10/12, Jason Skomorowski jason@skomorowski.net wrote:
<snip>
Thanks for that, Kent, especially the size info; it helped me provision. And thanks for building it!
I'm fairly savvy with most of the deps, but not Lisp. However, I look at that as an adventure :)
Is there a source repo somewhere I can use to play with what exists so far? From the mailing list it sounds like the current 0.3 version will be a hassle to get going on Ubuntu 12.10, beyond just setting up a MediaWiki deployment and installing a few debs as it seems to be on squeeze. So I may as well contribute back as I research how to wedge things in.
Best, Jason
On 12-10-10 04:28 PM, wp mirror wrote:
<snip>
On 10/10/2012 20:32, Jason Skomorowski wrote:
<snip>
This is perfectly feasible that way... but it won't be easy to get a good result.
Emmanuel
On 12-10-10 05:15 PM, Emmanuel Engelhart wrote:
<snip>
This is perfectly feasible that way... but it won't be easy to get a good result.
Beyond getting all those pieces running, what hurdles do you anticipate? Does the dump not import cleanly into a stock MediaWiki?
On 24/10/2012 16:22, Jason Skomorowski wrote:
<snip>
Beyond getting all those pieces running, what hurdles do you anticipate? Does the dump not import cleanly into a stock MediaWiki?
* mwDumper is not really maintained, but this is not the worst part (**)
* Then you will have to import the pictures from simple and Commons (**)
* Configure everything with DumpHTML to get a quality result (***)
* Build the ZIM file (**)
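(For the first step, the usual mwDumper invocation is something like the sketch below; the database name and credentials are placeholders. mwDumper reads the bzip2 dump directly and emits SQL for the 1.5+ schema.)

    # Stream the current Simple English dump into MySQL via mwDumper.
    # Database name and credentials are placeholders for your install.
    wget http://dumps.wikimedia.org/simplewiki/latest/simplewiki-latest-pages-articles.xml.bz2
    java -jar mwdumper.jar --format=sql:1.5 simplewiki-latest-pages-articles.xml.bz2 \
        | mysql -u wikiuser -p simplewiki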
Emmanuel