Re: [Toolserver-l] Dumps handling / storage / updating etc...

11 Dec 2011


      On 11/12/11 10:45, Stefan Kühn wrote:
...
On 10.12.2011 20:52, Jeremy Baron wrote:
...
Is it sufficient to receive the XML on stdin or do you need to be able to seek?
It is trivial to give you XML on stdin e.g.
$ < path/to/bz2 bzip2 -d | perl script.pl
Hmm, stdin is possible, but I think this will need a lot of RAM on the
server. I don't think this is an option for the future. Every language
edition grows every day, and the dumps will grow with it. The next problem
is parallel use of a compressed file. If more users read the same
compressed file the way you suggest, then bzip2 will crash the server IMHO.
I think it is no problem to store the uncompressed XML files for easy
use. We should make rules about where they are kept and for how long, or we
need a list where every user can say "I need only the two newest dumps
of enwiki, dewiki,...". If a dump is no longer needed, we can delete the
file.
Stefan (sk)
You seem to think that piping the output from bzip2 will hold the XML
dump uncompressed in memory until your script processes it. That's wrong.
bzip2 will begin uncompressing and writing to the pipe; when the pipe
fills, bzip2 blocks. As your Perl script reads from the pipe, space is
freed and the decompression can progress.
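
To illustrate the same point without a shell pipe, here is a minimal sketch
in Python (not the Perl script discussed in this thread). It reads a bzip2
stream incrementally and counts lines containing "<page>", so only a small
buffer is held in memory at any time. The function name count_pages and the
tiny in-memory sample are made up for illustration; a real dump would be
passed in as a file object instead.

```python
import bz2
import io


def count_pages(compressed_stream):
    """Count lines containing b"<page>" in a bzip2-compressed byte stream.

    The stream is decompressed incrementally while iterating, so the
    uncompressed XML is never held in memory as a whole.
    """
    pages = 0
    # bz2.open accepts an already-open binary file object as well as a path.
    with bz2.open(compressed_stream, "rb") as xml:
        for line in xml:  # one decompressed line at a time
            if b"<page>" in line:
                pages += 1
    return pages


if __name__ == "__main__":
    # Hypothetical miniature "dump", compressed in memory for the demo.
    sample = bz2.compress(
        b"<mediawiki>\n  <page>\n  </page>\n  <page>\n  </page>\n</mediawiki>\n"
    )
    print(count_pages(io.BytesIO(sample)))  # prints 2
```

For a dump arriving on stdin, the same function could be called with
sys.stdin.buffer as its argument. Memory use should then depend on the read
buffer and the longest line, not on the size of the dump.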
