The current gzip module in the Python library sucks. It requires seekable file objects. This means that if you have a socket as a file object, you need to read() it out into a string, put that in a StringIO and pass it on to the gzip module. Not only is this a very bad way of programming, it may also cause memory issues for large files. What I would like is a gzip module that performs buffering itself, so that it can be used (in the new API) without the intermediate StringIO step.
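To illustrate, the workaround currently looks roughly like this (a rough sketch; the URL is just made up as an example):

    import gzip
    import urllib
    from StringIO import StringIO

    # Current workaround: slurp the entire response into memory first,
    # because GzipFile wants a seekable file object.
    response = urllib.urlopen('http://example.com/dump.xml.gz')  # example URL
    buffered = StringIO(response.read())   # whole payload held in memory
    data = gzip.GzipFile(fileobj=buffered).read()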
Anybody in for a job ;) ?
Bryan
2007/11/11, Bryan Tong Minh <bryan.tongminh@gmail.com>:
> The current gzip module in the Python library sucks. It requires seekable file objects. This means that if you have a socket as a file object, you need to read() it out into a string, put that in a StringIO and pass it on to the gzip module. Not only is this a very bad way of programming, it may also cause memory issues for large files. What I would like is a gzip module that performs buffering itself, so that it can be used (in the new API) without the intermediate StringIO step.
That's because the gzip module is intended for physical files, being a wrapper around the low-level zlib module: http://www.python.org/doc/lib/module-zlib.html
zlib's compressobj looked promising, but I couldn't make it work directly with, for example, urllib.urlopen. Some googling ( http://www.google.com/search?&q=python%20zlib%20on%20the%20fly) yielded this: http://effbot.org/librarybook/zlib.htm
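The rough idea (an untested sketch, and using decompressobj rather than compressobj since this is the reading side) would be something like the following; the 16 + MAX_WBITS argument asks zlib to handle the gzip header itself and assumes a reasonably recent zlib underneath:

    import urllib
    import zlib

    stream = urllib.urlopen('http://example.com/dump.xml.gz')  # example URL
    # 16 + MAX_WBITS tells zlib to expect gzip framing rather than a
    # raw zlib stream (needs a new enough zlib library).
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

    while True:
        chunk = stream.read(8192)
        if not chunk:
            break
        data = decompressor.decompress(chunk)
        # ... process data chunk by chunk as it arrives ...

    tail = decompressor.flush()  # whatever is left in the internal buffer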
Now the last example (the ZipInputStream wrapper) looks promising - we'd just need the reverse class (ZipOutputStream) and probably have to modify it so it doesn't require .seek() on the underlying stream at all.
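Off the top of my head, an untested sketch of such a ZipOutputStream might look like this (raw zlib framing only; proper gzip output would still need the header and trailer written by hand, and nothing here is standard library):

    import zlib

    class ZipOutputStream:
        """Wrap any object with a write() method and compress on the fly.
        Never calls seek() on the underlying stream, so a socket works."""

        def __init__(self, fileobj):
            self.fileobj = fileobj
            self.compressor = zlib.compressobj()

        def write(self, data):
            self.fileobj.write(self.compressor.compress(data))

        def close(self):
            # flush whatever the compressor is still holding back
            self.fileobj.write(self.compressor.flush())

Usage would just be out = ZipOutputStream(sock); out.write(...); out.close().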
Misza