Hi John,
The order matters because bigger projects take much longer to complete their dumps. For example, for the English Wikipedia it takes about 14 days to process all dumps, so if the dump starts later in the month, it will only be ready the following month. And if something goes wrong, it takes too long to rerun the process. This has already happened several times in the last year alone.
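To make the arithmetic concrete, here is a minimal Python sketch. It assumes a fixed 14-day run, which is only my rough figure for the English Wikipedia, and `finishes_in_same_month` is a hypothetical helper, not anything from the dump infrastructure.

```python
from datetime import date, timedelta


def finishes_in_same_month(start, run_days=14):
    """Return True if a dump started on `start` ends in the same calendar month.

    `run_days` is an assumed, fixed duration; real runs vary.
    """
    end = start + timedelta(days=run_days)
    return (end.year, end.month) == (start.year, start.month)


# A run started early in the month finishes within that month.
print(finishes_in_same_month(date(2015, 1, 5)))
# A run started on the 20th spills over into the next month.
print(finishes_in_same_month(date(2015, 1, 20)))
```

Under this assumption, any start after roughly the middle of the month pushes the finished dump into the next month.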
Regards, Alex
On Sat, Jan 17, 2015 at 9:31 PM, John phoenixoverride@gmail.com wrote:
All dumps are scheduled to run once a month. Some just take longer than others to run, and there are sometimes issues. I am not sure why the order really matters.
The WMF dumps all of its active databases (any database that is hosting a site, regardless of the status of the site). If you want dumps/wikis removed, file a request on meta for the complete shutdown of the project. Otherwise just use some common sense and ignore the inactive wikis.
On Sat, Jan 17, 2015 at 2:54 PM, Alex Druk alex.druk@gmail.com wrote:
Agreed! I also propose scheduling dumps according to their importance, for example by article count.
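As a rough illustration of the idea, here is a minimal Python sketch that orders wikis by article count, largest first. The wiki names and counts are made-up placeholders, and `order_by_article_count` is a hypothetical helper, not part of the actual dump scheduler.

```python
def order_by_article_count(article_counts):
    """Return wiki names sorted by article count, largest first.

    Ties are broken alphabetically so the order is deterministic.
    """
    return sorted(article_counts, key=lambda name: (-article_counts[name], name))


# Placeholder figures for illustration only, not real statistics.
example_counts = {"tinywiki": 12, "bigwiki": 4_000_000, "midwiki": 250_000}
schedule = order_by_article_count(example_counts)
print(schedule)
```

Starting the largest wikis first would give the longest-running dumps the best chance of finishing within the month.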
All the best,
On Sat, Jan 17, 2015 at 7:59 PM, Richard Jelinek rj@petamem.com wrote:
Hi,
we basically mirror all the generated dumps, extract them, harvest data, etc. Lately I examined some of the more exotic languages, and to my surprise they were even more exotic than I thought. I propose to ditch them.
Afar (aa) Wikipedia
The latest on our servers is aar-20141223.xml.bz at 22974 bytes (we convert the language codes to ISO 639-3).
It seems the wiki has been closed or moved to the Incubator:
http://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Afa...
Nevertheless, this wiki keeps showing up in the xmldumps, pretending something is there. I believe we'd all be better off if dumps of it ceased.
Basically the same applies to the Ndonga Wikipedia:
http://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Ndo...
But the xmldumps keep pouring in:
ndo-20141223.xml.bz2
etc. It is the same story with several other Wikimedia projects in other languages.
So in general: Could we stop dumping closed projects?
kind regards,
Dipl.-Inf. Univ. Richard C. Jelinek
PetaMem GmbH - www.petamem.com       Managing Director: Richard Jelinek
Language Technology - We Mean IT!    Registered office: Fürth
2.58921 * 10^8 Mind Units            Register court: AG Fürth, HRB-9201
Xmldatadumps-l mailing list Xmldatadumps-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
-- Thank you.
Alex Druk www.wikipediatrends.com alex.druk@gmail.com (775) 237-8550 Google voice