On Mon 28/09/09 14:52, Asaf Bartov asaf.bartov(a)gmail.com wrote:
In trying to retrieve the images for the Hebrew
Wikipedia ZIM I'm
making, I tried running Emmanuel's script mirrorMediawikiPages.pl. My
command line was this:
./mirrorMediawikiPages.pl --sourceHost=he.wikipedia.org
--destinationHost=localhost --useIncompletePagesAsInput --sourcePath=w
Strange... I do not have this issue on my side. Please check out a recent version from the svn to
This should also theoretically not happen, because at that point you only have, in the
worst case, twice the list of articles in memory.
After working for more than 20 hours, and still
populating @pages with incomplete pages, it aborted with "out of
memory". The machine has 4GB physical memory, and the last time I
checked -- several hours before it aborted -- the script was consuming
Is there a way to do this in several large chunks, without specifying
each individual page? How do you do it?
I do it the same way you do and have never had this issue.
The script's memory usage should never increase.
On my machine, with 8GB of memory, it uses almost 5%.
But if you only want to get the pictures, there is an alternative way:
* use ./listAllPages to get the list of all pages
* use ./listDependences.pl to get the missing images (you may also need grep to
filter out a few templates)
* finally, use ./mirrorMediawikiPages.pl with your list of images on STDIN
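
If memory is still a problem, the last step can probably be run in several large chunks instead of one big run. Here is a minimal sketch, not something taken from the scripts themselves: run_in_chunks is a hypothetical helper name, images.txt is an assumed file holding one image title per line, and I am assuming mirrorMediawikiPages.pl accepts the same host options when it reads its list from STDIN.

```shell
#!/bin/sh
# Sketch: feed a long list to a command in fixed-size chunks on STDIN,
# so that each run only has to hold one chunk of titles in memory.
# run_in_chunks is a hypothetical helper, not part of the mirroring scripts.
# Usage: run_in_chunks LIST_FILE CHUNK_SIZE COMMAND [ARGS...]
run_in_chunks() {
    list_file=$1
    chunk_size=$2
    shift 2

    tmp_dir=$(mktemp -d) || return 1

    # split writes chunk_aaaa, chunk_aaab, ... with at most chunk_size lines each
    split -a 4 -l "$chunk_size" "$list_file" "$tmp_dir/chunk_"

    status=0
    for chunk in "$tmp_dir"/chunk_*; do
        [ -e "$chunk" ] || continue      # empty list: nothing to do
        # Run the command once per chunk; stop at the first failure
        "$@" < "$chunk" || { status=$?; break; }
    done

    rm -rf "$tmp_dir"
    return "$status"
}

# Assumed invocation (option names copied from the command line quoted above):
# run_in_chunks images.txt 10000 ./mirrorMediawikiPages.pl \
#     --sourceHost=he.wikipedia.org --destinationHost=localhost --sourcePath=w
```

The chunk size of 10000 is only a placeholder. Because the helper stops at the first failing chunk, you can see which part of the list caused trouble and restart from there.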