Hi, everyone.
Can someone explain what procedure you use to add (some) images to the dump before packaging a ZIM file?
I am preparing a fresh Hebrew Wikipedia ZIM file, and would like to test the integration of images as well as recent improvements to Kiwix.
So far, I found Wikix (http://meta.wikimedia.org/wiki/Wikix), but no specific instructions on including images in Emmanuel's ZIM-building script, so I'm guessing that it's enough if the images are downloaded and integrated into the local Wikipedia server?
My questions are:
1. How do you know which images are referenced by the local Wikipedia? I see Wikix extracts this information into a bunch of shell script files, but maybe there's another/better way? What do you use?
2. Given a list of images, what is the best way to retrieve them without pounding the Wikimedia servers? Is there an accepted way? Should I coordinate it with anyone? The shell scripts generated by Wikix don't seem to make any provision for delays or anything, and I'm afraid running them would get me banned. Again, what do you use?
3. What if we want only the thumbnail/low-res version incorporated in the articles themselves, and not the full resolution version from commons etc.?
4. Once you have a local tree of the image files (in directories 0, 1, 2,..., f), what else do you need to do to get Emmanuel's buildZim....pl script to include them in the ZIM file?
Many thanks in advance,
Asaf Bartov, Wikimedia Israel
Asaf Bartov wrote:
> Can someone explain what procedure you use to add (some) images to the dump before packaging a ZIM file?
Yes, at least I can try :)
> I am preparing a fresh Hebrew Wikipedia ZIM file, and would like to test the integration of images as well as recent improvements to Kiwix.
> So far, I found Wikix (http://meta.wikimedia.org/wiki/Wikix), but no specific instructions on including images in Emmanuel's ZIM-building script, so I'm guessing that it's enough if the images are downloaded and integrated into the local Wikipedia server?
That sounds right, but I don't know exactly how Wikix works.
> My questions are:
> - How do you know which images are referenced by the local Wikipedia?
Thanks to the MediaWiki API.
I have code to handle that easily, and also a Perl script called "listDependences.pl". In daily use, though, the whole thing is done by the mirroring script.
> I see Wikix extracts this information into a bunch of shell script files, but maybe there's another/better way? What do you use?
Personally, I use my script "mirrorMediawikiPages.pl".
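Not listDependences.pl itself, but for illustration here is a rough Perl sketch of the same kind of API query (the he.wikipedia.org endpoint, the imlimit value and the LWP::UserAgent/JSON/URI::Escape modules are all assumptions, so adapt as needed):

#!/usr/bin/perl
# Hypothetical sketch: list the images used by one article via the
# MediaWiki API (action=query&prop=images). Not an existing script.
use strict;
use warnings;
use LWP::UserAgent;
use JSON;
use URI::Escape qw(uri_escape_utf8);

binmode STDOUT, ':encoding(UTF-8)';    # image titles may contain Hebrew

my $title = shift @ARGV or die "usage: $0 <article title>\n";
my $ua = LWP::UserAgent->new( agent => 'zim-image-lister/0.1 (test run)' );

my $url = 'http://he.wikipedia.org/w/api.php'
        . '?action=query&prop=images&format=json&imlimit=500'
        . '&titles=' . uri_escape_utf8($title);

my $response = $ua->get($url);
die 'API request failed: ' . $response->status_line . "\n"
    unless $response->is_success;

# Ask for the raw UTF-8 bytes so decode_json() can handle them itself.
my $data  = decode_json( $response->decoded_content( charset => 'none' ) );
my $pages = $data->{query}{pages};

# Each page entry carries an "images" array of { ns, title } records;
# pages embedding more than 500 files would need the query-continue value.
foreach my $page ( values %$pages ) {
    print "$_->{title}\n" for @{ $page->{images} || [] };
}

Run it over every article and de-duplicate the printed titles to get the set of files to fetch.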
> - Given a list of images, what is the best way to retrieve them without
> pounding the Wikimedia servers? Is there an accepted way? Should I coordinate it with anyone? The shell scripts generated by Wikix don't seem to make any provision for delays or anything, and I'm afraid running them would get me banned. Again, what do you use?
I think someone from the Foundation can better answer your last questions. I try not to make too many requests against the MediaWiki instance at the same time.
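For illustration only, a minimal throttled fetcher could look like the sketch below (nothing official; the two-second delay, the User-Agent string and the URL-list-on-stdin convention are just assumptions):

#!/usr/bin/perl
# Hypothetical sketch: download a list of image URLs one by one, with a
# pause between requests so the servers are not hammered.
use strict;
use warnings;
use LWP::UserAgent;
use File::Basename;

my $delay = 2;    # seconds between requests; tune it to stay polite
my $ua    = LWP::UserAgent->new(
    agent   => 'zim-image-fetcher/0.1 (contact: your-email@example.org)',
    timeout => 60,
);

# One URL per line on STDIN, e.g. built from the list of image titles.
while ( my $url = <STDIN> ) {
    chomp $url;
    next unless $url;

    my $target = basename($url);    # may still be percent-encoded
    if ( -e $target ) {             # skip files already fetched, so reruns are cheap
        print "skip $target\n";
        next;
    }

    my $response = $ua->get( $url, ':content_file' => $target );
    print $response->is_success
        ? "ok   $target\n"
        : "fail $target (" . $response->status_line . ")\n";

    sleep $delay;                   # the actual throttling
}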
> - What if we want only the thumbnail/low-res version incorporated in the
> articles themselves, and not the full resolution version from commons etc.?
Write your own skin and add hooks (or something like that) to rewrite the generated HTML on the fly; that's what I do.
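If you would rather post-process an already dumped HTML tree than hook the skin, a rough equivalent could look like this (only an approximation of the idea; it assumes the usual <a class="image"><img ...></a> thumbnail markup, which you should check against your own output):

#!/usr/bin/perl
# Hypothetical sketch: strip the links that point to the full-resolution
# file pages, keeping only the inline <img> thumbnails, so the low-res
# files are the only ones the ZIM needs to ship.
use strict;
use warnings;

local $/;    # slurp whole files at once
foreach my $file (@ARGV) {
    open my $in, '<', $file or die "cannot read $file: $!";
    my $html = <$in>;
    close $in;

    # Keep the <img> tag, drop the surrounding <a class="image"> link.
    $html =~ s{<a\b[^>]*class="image"[^>]*>\s*(<img\b[^>]*>)\s*</a>}{$1}gis;

    open my $out, '>', $file or die "cannot write $file: $!";
    print {$out} $html;
    close $out;
}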
> - Once you have a local tree of the image files (in directories 0, 1,
> 2,..., f), what else do you need to do to get Emmanuel's buildZim....pl script to include them in the ZIM file?
Nothing, I think... buildZimFileFromDirectory.pl simply puts a valid HTML directory tree into a ZIM file. If that does not work (i.e., the ZIM file does not give you what you can see offline in your HTML directory/file tree)... that means I certainly have a bug in my script :)
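For reference, the 0..f layout is (to my understanding) derived from the MD5 of the underscored file name; here is a small sketch, in case fetched files have to be placed by hand (please verify against your local wiki before relying on it):

#!/usr/bin/perl
# Hedged sketch of MediaWiki's hashed upload layout: images/<x>/<xy>/<name>,
# where <x> and <xy> are the first one and two hex digits of md5(name).
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

foreach my $name (@ARGV) {
    ( my $key = $name ) =~ s/ /_/g;    # MediaWiki stores names with underscores
    my $hash = md5_hex($key);
    my $dir  = substr( $hash, 0, 1 ) . '/' . substr( $hash, 0, 2 );
    print "images/$dir/$key\n";
}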
Regards, Emmanuel