Hi Asaf,
On Mon 28/09/09 14:52, Asaf Bartov asaf.bartov(a)gmail.com wrote:
> In trying to retrieve the images for the Hebrew Wikipedia ZIM I'm
> making, I tried running Emmanuel's script mirrorMediawikiPages.pl. My
> command line was this:
>
> ./mirrorMediawikiPages.pl --sourceHost=he.wikipedia.org
> --destinationHost=localhost --useIncompletePagesAsInput --sourcePath=w
Bizarre... I do not have this issue on my machine. Please check out a recent version from the SVN to be sure.
This should theoretically not happen either, because at that moment you have, in the worst case, only two copies of the article list in memory.
> After working for more than 20 hours, and still in the stage of
> populating the @pages array with incomplete pages, it aborted with "out of
> memory". The machine has 4GB physical memory, and the last time I
> checked -- several hours before it aborted -- the script was consuming
> 3.6GB.
>
> Is there a way to do this in several large chunks, without specifying
> each individual page? How do you do it?
I do the same as you and have never had this issue.
The script's memory usage should never increase.
On my machine, with 8 GB of memory, it uses almost 5%.
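If you nevertheless want to run it in several chunks, here is a rough sketch, assuming mirrorMediawikiPages.pl can read page titles on STDIN (as in the image workflow below) and reusing the option values from your command line; all_pages.lst would come from ./listAllPages as described below:

  # split the full page list into chunks of 10000 titles
  split -l 10000 all_pages.lst chunk_
  # mirror each chunk in a separate run, so memory is freed between runs
  for f in chunk_*; do
    ./mirrorMediawikiPages.pl --sourceHost=he.wikipedia.org \
        --destinationHost=localhost --sourcePath=w < "$f"
  done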
But if you only want to get the pictures, there is an alternative way (see the sketch after this list):
* use ./listAllPages to get the list of all pages
* use ./listDependences.pl to get the missing images (you may also need to use grep to filter out a few templates)
* finally, feed your list of images to ./mirrorMediawikiPages.pl on STDIN
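Put together, it could look like this. This is only a sketch: I have not given the exact options of listAllPages and listDependences.pl here (check their source or --help), I assume listDependences.pl reads the page list on STDIN, and the Template:/Image: prefixes in the grep are assumptions that must be adapted to the Hebrew namespaces:

  # 1. get the list of all pages of the source wiki
  ./listAllPages > all_pages.lst
  # 2. get the missing dependencies of those pages, filter out the
  #    templates and keep only the images
  ./listDependences.pl < all_pages.lst > deps.lst
  grep -v '^Template:' deps.lst | grep '^Image:' > images.lst
  # 3. mirror the images, with the list given on STDIN
  ./mirrorMediawikiPages.pl --sourceHost=he.wikipedia.org \
      --destinationHost=localhost --sourcePath=w < images.lst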
Regards
Emmanuel