On 04.11.2014 18:18, Cristian Consonni wrote:
Hi Markus,
2014-11-01 0:29 GMT+01:00 Markus Krötzsch markus@semantic-mediawiki.org:
Nice. We are running the RDF generation on a shared cloud environment and I am not sure we can really use a lot of RAM there. Do you have any guess how much RAM you needed to get this done?
I didn't take any stats (my bad), but I would say that for the combined dump, starting from the compressed (gz) file, it took around 50 GB of RAM. I don't have time to re-run the experiment right now, but next time I will take some measurements.
Ok, thanks, this is already a good indicator for us. I don't think we could use up that much memory on Wikimedia Labs ...
Btw., there was a bug in our RDF exports that made them bigger than they should have been (no wrong triples, but many duplicates). I have corrected the issue now and uploaded new versions. Maybe this will also make processing faster next time.
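(For anyone still working with the older, duplicate-heavy dumps: if the file is in N-Triples, duplicates are simply repeated lines and can be dropped with a small streaming filter. A minimal sketch below, assuming one triple per line and placeholder file names; note the set of distinct lines can itself grow large for a full dump, so something like `sort -u` may be preferable there.)

import gzip

# Sketch: drop duplicate triples from a gzipped N-Triples dump.
# Assumes one triple per line; file names are placeholders.
seen = set()
with gzip.open("dump.nt.gz", "rt", encoding="utf-8") as src, \
     gzip.open("dump-dedup.nt.gz", "wt", encoding="utf-8") as dst:
    for line in src:
        if line not in seen:
            seen.add(line)
            dst.write(line)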
Markus