Hi John,
The order matters because dumps of bigger projects take much longer to
complete. For example, English Wikipedia takes about 14 days to process
all of its dumps, so if its dump starts later in the month, it will only be
ready the next month. And if something goes wrong, it takes too long to
rerun the process. This has already happened several times in the last
year alone.
Regards,
Alex
On Sat, Jan 17, 2015 at 9:31 PM, John <phoenixoverride(a)gmail.com> wrote:
All dumps are scheduled to run once a month; some just take longer than
others to run, and there are issues sometimes. I'm not sure why the order
really matters.
The WMF dumps all of its active databases (any database that is hosting a
site, regardless of the status of the site). If you want dumps/wikis removed,
file a request on Meta for the complete shutdown of the project. Otherwise,
just use some common sense and ignore the inactive wikis.
On Sat, Jan 17, 2015 at 2:54 PM, Alex Druk <alex.druk(a)gmail.com> wrote:
Agree!
I also propose scheduling dumps according to their importance, for example
by article count.
All the best,
On Sat, Jan 17, 2015 at 7:59 PM, Richard Jelinek <rj(a)petamem.com> wrote:
Hi,
we're basically mirroring all the generated dumps, extracting them,
harvesting data, etc. Lately I have been examining some of the more exotic
languages, and to my surprise they were even more exotic than I
thought. I propose to ditch them.
Afar (aa) Wikipedia
The latest on our servers is aar-20141223.xml.bz with 22974 bytes
(we convert into iso639-3)
It seems the wiki has been closed or moved into incubator:
http://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Af…
Nevertheless, this wiki keeps showing up in the XML dumps, pretending
something is there. I believe we'd all be better off if dumps of it
ceased.
---
Basically the same applies to Ndonga Wikipedia:
http://meta.wikimedia.org/wiki/Proposals_for_closing_projects/Closure_of_Nd…
But the xmldumps keep pouring in:
ndo-20141223.xml.bz2
etc. It's the same story with several other Wikimedia projects in other languages.
So in general: Could we stop dumping closed projects?
kind regards,
--
Dipl.-Inf. Univ. Richard C. Jelinek
PetaMem GmbH - www.petamem.com
Language Technology - We Mean IT!
2.58921 * 10^8 Mind Units
Managing Director (Geschäftsführer): Richard Jelinek
Registered office (Sitz der Gesellschaft): Fürth
Register court (Registergericht): AG Fürth, HRB-9201
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
--
Thank you.
Alex Druk
www.wikipediatrends.com
alex.druk(a)gmail.com
(775) 237-8550 Google voice