Re: [Toolserver-l] Dumps handling / storage / updating etc...

12 Dec 2011

On 12/12/11 13:59, Carl (CBM) wrote:
...
 This is correct, but the overall memory usage depends on the XML
 library and programming technique being used. For XML that is too
 large to comfortably fit in memory, there are techniques to allow for
 the script to process the data before the entire XML file is parsed
 (google "SAX" or "stream-oriented parsing"). But this requires more
 advanced programming techniques, such as callbacks, compared to the
 more naive method of parsing all the XML into a data structure and
 then returning the data structure.  That naive technique can result in
 large memory use if, say, the program tries to create a memory array
 of every page revision on enwiki.

 Of course if the perl script is doing the parsing itself, by just
 matching regular expressions, this is not hard to do in a
 stream-oriented way.

 - Carl 
Obviously. Whether it is read from a .xml or a .xml.bz2, if the script
tried to build an XML tree in memory, the memory usage would be enormous.
I would expect such an app to get killed for that.
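
For illustration, here is a rough sketch of the callback-based ("SAX") approach Carl describes, reading straight from a bz2-compressed stream so neither the decompressed file nor an XML tree is ever held in memory. It is written in Python rather than Perl, and the element names (page, title), the PageHandler class, the scan_dump function and the tiny sample document are all assumptions made up for the example, not taken from any real dump tool.

```python
import bz2
import io
import xml.sax


class PageHandler(xml.sax.ContentHandler):
    """Calls on_page(title) as each <page> closes; stores no other pages."""

    def __init__(self, on_page):
        super().__init__()
        self.on_page = on_page      # hypothetical user-supplied callback
        self._in_title = False
        self._title_parts = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._title_parts = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._in_title:
            self._title_parts.append(content)

    def endElement(self, name):
        if name == "title":
            self._in_title = False
        elif name == "page":
            # Hand the finished page to the caller, then forget it.
            self.on_page("".join(self._title_parts))
            self._title_parts = []


def scan_dump(stream, on_page):
    """Parse a binary file-like object incrementally with SAX callbacks."""
    xml.sax.parse(stream, PageHandler(on_page))


# Made-up miniature "dump", compressed in memory to stand in for a .xml.bz2.
SAMPLE = (
    b"<mediawiki>"
    b"<page><title>A</title><revision><text>x</text></revision></page>"
    b"<page><title>B</title><revision><text>y</text></revision></page>"
    b"</mediawiki>"
)

titles = []
with bz2.open(io.BytesIO(bz2.compress(SAMPLE)), "rb") as stream:
    scan_dump(stream, titles.append)
```

The demo collects titles in a list only to show the callback firing; against a real dump the callback would process each page and discard it, which is what keeps the memory use flat.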
