On 12/12/11 13:59, Carl (CBM) wrote:
This is correct, but the overall memory usage depends on the XML library and programming technique being used. For XML that is too large to fit comfortably in memory, there are techniques that let the script process the data before the entire XML file is parsed (google "SAX" or "stream-oriented parsing"). But this requires more advanced programming techniques, such as callbacks, compared to the more naive method of parsing all the XML into a data structure and then returning it. That naive technique can result in large memory use if, say, the program tries to build an in-memory array of every page revision on enwiki.
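To make the callback idea concrete, here is a minimal sketch of stream-oriented parsing using Python's stdlib xml.sax module (the thread talks about Perl, but the event-callback pattern is the same). The <page> and <title> element names follow the MediaWiki dump format; the tiny inline sample document is invented for illustration. The handler reacts to parser events and keeps only a counter and a list of titles, never a full tree:

```python
import xml.sax


class PageCounter(xml.sax.ContentHandler):
    """SAX handler: receives callbacks per element, never builds a tree."""

    def __init__(self):
        super().__init__()
        self.pages = 0
        self.titles = []
        self._in_title = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "page":
            self.pages += 1
        elif name == "title":
            self._in_title = True
            self._buf = []

    def characters(self, content):
        # May be called several times per text node, so accumulate.
        if self._in_title:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buf))
            self._in_title = False


# Invented two-page sample standing in for a multi-gigabyte dump.
SAMPLE = b"""<mediawiki>
  <page><title>Foo</title><revision><text>...</text></revision></page>
  <page><title>Bar</title><revision><text>...</text></revision></page>
</mediawiki>"""

handler = PageCounter()
xml.sax.parseString(SAMPLE, handler)
print(handler.pages)   # 2
print(handler.titles)  # ['Foo', 'Bar']
```

For a real dump you would pass a file object (or a bz2-decompressing stream) to xml.sax.parse() instead of parseString(); memory use stays roughly constant regardless of dump size.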
Of course, if the Perl script is doing the parsing itself by matching regular expressions, this is not hard to do in a stream-oriented way.
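A sketch of that regex-over-lines approach, again in Python for consistency (the thread's scripts are Perl, but the idiom translates directly). The regex assumes each <title> element sits on one line, which holds for the MediaWiki dumps but is not guaranteed by XML in general; the sample lines are invented:

```python
import re

TITLE_RE = re.compile(r"<title>([^<]*)</title>")


def stream_titles(lines):
    """Yield page titles one at a time from an iterable of lines.

    Processes one line at a time, so memory use stays constant
    no matter how large the input file is.
    """
    for line in lines:
        m = TITLE_RE.search(line)
        if m:
            yield m.group(1)


# Invented sample; in practice `lines` would be an open file object,
# e.g. bz2.open("enwiki-pages.xml.bz2", "rt").
sample = [
    "<page>\n",
    "  <title>Foo</title>\n",
    "</page>\n",
]
print(list(stream_titles(sample)))  # ['Foo']
```

This is fragile compared to a real XML parser (no entity decoding, breaks if the layout changes), which is the usual trade-off with the regex shortcut.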
- Carl
Obviously. No matter whether it's read from a .xml or a .xml.bz2, if it tried to build an XML tree in memory, the memory usage would be incredibly huge. I would expect such an app to get killed for that.