Hi Adrian,

2017-05-12 14:55 GMT+02:00 Adrian Bielefeldt <Adrian.Bielefeldt@mailbox.tu-dresden.de>:
Hello everyone,

I have a problem on stat1002.eqiad.wmnet using
https://github.com/Wikidata/QueryAnalysis/blob/master/tools/hourlyFieldValue.py
on two files (1.1 GB and 2.4 GB respectively); the process ends with
Killed.
My guess is that my script uses too much memory. However, it was my
understanding that csv.DictReader reads line-by-line, so the file sizes
should not matter.

If anyone can tell me why my script is taking up so much memory or if
there is any other reason for the script getting killed I'd be grateful.

I checked dmesg on stat1002, and the kernel OOM killer is what ended your process. I didn't check very carefully, but maybe the problem is the size of the data structures built in https://github.com/Wikidata/QueryAnalysis/blob/master/tools/hourlyFieldValue.py#L34-L36 ?

I'd also check the usage of zip: according to https://docs.python.org/2/library/functions.html#zip, in Python 2 it returns a full list rather than an iterator, so it seems to pull all the rows of your csv dictionaries into memory in one go.
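
To illustrate the zip point, here is a minimal sketch of reading two CSV files in lockstep without building the full list first. It is not taken from your script: the function name count_field_pairs and the column name used below are made up for the example, and I am assuming the two files are meant to be walked row by row in parallel. On Python 2 it uses itertools.izip, which yields one pair at a time; on Python 3 the built-in zip is already lazy.

```python
import csv
from collections import Counter

try:
    # Python 2: itertools.izip yields pairs lazily instead of building a list.
    from itertools import izip as lazy_zip
except ImportError:
    # Python 3: the built-in zip is already lazy.
    lazy_zip = zip


def count_field_pairs(file_a, file_b, field):
    """Hypothetical helper: count (value_a, value_b) pairs of `field`
    across two open CSV files that are read row by row in parallel."""
    counts = Counter()
    reader_a = csv.DictReader(file_a)
    reader_b = csv.DictReader(file_b)
    # Only one row from each file is held at a time; memory grows with the
    # number of distinct pairs in `counts`, not with the file sizes.
    for row_a, row_b in lazy_zip(reader_a, reader_b):
        counts[(row_a[field], row_b[field])] += 1
    return counts
```

Note that the aggregate itself can still grow large if the field has many distinct values, so this only helps with the part of the memory that comes from zip.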

Hope that helps!

Luca