Indeed, I run parallel dumps based on a range of ids, although the
algorithm needed tweaking. I expect to get back to looking at that
pretty soon.
Ariel
On Wed, 15 Dec 2010, at 13:01 -0800, Diederik van Liere wrote:
Dear devs,
I would like to initiate a discussion about how to reduce the time required to generate
dump files. A while ago Emmanuel Engelhart opened a bug report suggesting that we
parallelize this feature, and I would like to go through the available options and
hopefully determine a course of action.
The current process is straightforward and sequential (as far as I know): it reads table
by table and row by row and stores the output. The drawbacks of this process are that
generating a dump takes increasingly more time as the different projects continue to grow,
and that when the process halts or is interrupted it needs to start all over again.
I believe that there are two approaches to parallelizing the export dump:
1) Launch multiple PHP processes that each take care of a particular range of ids. This
might not be called true parallelization, but it achieves the same goal. The reason for
this approach is that PHP has very limited (maybe no) support for parallelization /
multiprocessing; the only thing PHP can do is fork a process (I might be incorrect about
this).
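To make option 1 concrete, here is a minimal sketch of the range-splitting step: divide the id space into contiguous, near-equal chunks and hand each chunk to an independent OS process. The exact dump command and its flags below are illustrative assumptions, not a confirmed MediaWiki invocation.

```python
def split_id_range(start, end, workers):
    """Split the half-open id range [start, end) into `workers`
    contiguous, near-equal sub-ranges."""
    step, rem = divmod(end - start, workers)
    ranges, lo = [], start
    for i in range(workers):
        # Spread any remainder over the first `rem` workers.
        hi = lo + step + (1 if i < rem else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges

if __name__ == "__main__":
    # Each line would be launched as its own process; the script name
    # and flags here are hypothetical placeholders.
    for lo, hi in split_id_range(1, 1_000_001, 4):
        print(f"php dumpRange.php --start={lo} --end={hi}")
```

Since each worker is a separate process writing its own partial dump, a crash in one range would only require redoing that range, which also addresses the restart-from-scratch drawback mentioned above.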
2) Use a different language with built-in support for multiprocessing, like Java or Python.
I am not intending to start a heated debate, but I think this is an option that at least
should be on the table and be discussed. Obviously, an important reason not to do it is
that it's a different language. I am not sure how integral the export functionality is
to MediaWiki, and if it is integral then this is a dead end.
However, if the export functionality is primarily used by Wikimedia and nobody else, then
we might consider a different language. Or we could make a standalone app that is not part
of MediaWiki and whose use is internal to Wikimedia.
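As a sketch of option 2, the same range-based split can be driven from a language with built-in multiprocessing support, Python here. The `dump_range` worker is a hypothetical stand-in for code that would read the rows in its id range and write one partial dump file; only the process-pool orchestration is the point.

```python
from multiprocessing import Pool

def dump_range(bounds):
    """Hypothetical worker: a real version would query rows in
    [lo, hi) and serialize them to a partial dump file."""
    lo, hi = bounds
    return f"dumped ids {lo}..{hi - 1}"

def parallel_dump(start, end, workers=4):
    """Split [start, end) into chunks and dump them in parallel
    across a pool of worker processes."""
    step = (end - start + workers - 1) // workers  # ceiling division
    ranges = [(lo, min(lo + step, end)) for lo in range(start, end, step)]
    with Pool(workers) as pool:
        return pool.map(dump_range, ranges)

if __name__ == "__main__":
    print(parallel_dump(1, 1_000_001, workers=4))
```

The partial outputs would then be concatenated (or kept split) in id order, and a failed chunk could simply be re-queued rather than restarting the whole dump.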
If I am missing other approaches or solutions, please chime in.
Best regards,
Diederik
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l