Diederik van Liere wrote:
To continue the discussion on how to improve performance: would it be possible to distribute the dumps as a 7z / gz / other-format archive containing multiple smaller XML files? It's quite tricky to split a very large XML file into smaller valid XML files, and if the dumping process is already parallelized, then we do not have to cat the different XML files into one large XML file; instead we can distribute the multiple smaller files that the parallel jobs produce.
best,
Diederik
That has already been done for enwiki.
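
For dumps that are still shipped as a single file, the splitting step described above could look roughly like the following. This is only a minimal sketch, assuming Python's standard xml.etree.ElementTree and assuming that the records (named `page` here) are direct children of the root element. The names split_xml, write_chunk, local_name and the chunk-NNNN.xml file pattern are hypothetical and are not part of any dump tooling. Each output file repeats the root element and its attributes so that it parses on its own; other children of the root, such as a header block, are dropped in this sketch.

```python
import os
import xml.etree.ElementTree as ET


def local_name(tag):
    """Reduce a namespaced tag such as '{uri}page' to 'page'."""
    return tag.rsplit("}", 1)[-1]


def write_chunk(chunk, out_dir, index):
    """Write one chunk as a standalone XML file and return its path."""
    path = os.path.join(out_dir, f"chunk-{index:04d}.xml")  # hypothetical naming
    ET.ElementTree(chunk).write(path, encoding="utf-8", xml_declaration=True)
    return path


def split_xml(src_path, out_dir, records_per_file, record_tag="page"):
    """Split one large XML file into smaller well-formed XML files.

    Assumption: records are direct children of the root element.
    Non-record children of the root are skipped, not copied.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    root = None
    root_tag, root_attrib = None, {}
    chunk = None
    depth = 0

    # Stream the file so the whole document is never held in memory.
    for event, elem in ET.iterparse(src_path, events=("start", "end")):
        if event == "start":
            depth += 1
            if root is None:
                # First start event is the root; remember how to recreate it.
                root = elem
                root_tag, root_attrib = elem.tag, dict(elem.attrib)
            continue

        depth -= 1
        if depth != 1:
            continue  # not a direct child of the root

        if local_name(elem.tag) == record_tag:
            if chunk is None:
                chunk = ET.Element(root_tag, dict(root_attrib))
            chunk.append(elem)
            if len(chunk) == records_per_file:
                paths.append(write_chunk(chunk, out_dir, len(paths)))
                chunk = None

        # Detach the processed child from the root to keep memory bounded.
        root.remove(elem)

    # Flush the last, possibly smaller, chunk.
    if chunk is not None:
        paths.append(write_chunk(chunk, out_dir, len(paths)))
    return paths
```

Calling split_xml("dump.xml", "out", 1000) would write out/chunk-0000.xml, out/chunk-0001.xml and so on, each holding at most 1000 records. If the dump job already writes its parallel pieces as separate files, none of this is needed.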