Yep, if we can use compressed dumps, we would need far fewer resources than we are using now (currently >800GB).
I was taking a look at our dumps in user-store and none of them are compressed, which shocked me. I know a lot of people use pywikipedia to parse the dumps, and I know it can handle bz2 files. Is there any reason we don't just make them all bz2?
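
For anyone not going through pywikipedia, here is a rough sketch of reading a bz2 dump directly without unpacking it to disk first. It uses only Python's standard bz2 module, not pywikipedia's own reader, and the function name count_pages and the path in the usage line are made up for illustration. It also assumes each <page> tag sits on its own line, which may not hold for every dump.

```python
import bz2


def count_pages(dump_path):
    """Count <page> opening tags in a bz2-compressed XML dump.

    Hypothetical helper for illustration; assumes one <page> tag per line.
    """
    pages = 0
    # bz2.open in text mode decompresses incrementally as lines are read,
    # so the uncompressed dump never has to exist on disk.
    with bz2.open(dump_path, "rt", encoding="utf-8") as dump:
        for line in dump:
            if "<page>" in line:
                pages += 1
    return pages
```

Usage would be something like count_pages("some-dump.xml.bz2"), where the file name is a placeholder. Reading this way costs some CPU time for decompression in exchange for the disk space saved.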
John
_______________________________________________
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette