Nicolas Dumazet wrote:
Well, I just wanted to update the date, and I thought a generic statement was better. In fact, why would I put my name there, knowing that purodha made some important fixes to the file over the years?
Note that I'm very flexible about those attribution sections. Any suggestion is welcome, and is likely to be fine with me.
Forget it. I cannot talk about changes I haven't seen.
index = 1
while True:
    path = config.datafilepath('cache', 'pagestore' + str(index))
    if not os.path.exists(path):
        break
    index += 1
At least this looks nice for the diskcache module too, so we can easily get rid of the imported random module and the ugly '*-abfdexjwi'-like filenames.
Thinking about this again: those files are temporary, and are only accessed from one specific entry point. A tempfile would be even cleaner, right? (http://docs.python.org/library/tempfile.html , standard since 2.3) I think I could do this for both diskcache and interwiki, and remove the cache/ directory. Comments?
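To make the tempfile idea concrete, here is a minimal sketch of what it might look like. The helper name make_pagestore and the 'pagestore' prefix are only illustrative assumptions, not code from diskcache or interwiki; the point is that tempfile.mkstemp picks a unique name by itself, so no index probing or random suffix is needed.

```python
import os
import tempfile


def make_pagestore():
    """Create a uniquely named temporary file and return (file object, path).

    Hypothetical helper: the name and the 'pagestore' prefix are assumptions.
    """
    # mkstemp creates the file atomically and returns an OS-level descriptor,
    # so there is no need to loop looking for a free 'pagestore<N>' name.
    fd, path = tempfile.mkstemp(prefix='pagestore')
    return os.fdopen(fd, 'w+b'), path


store, store_path = make_pagestore()
try:
    store.write(b'cached page text')
    store.seek(0)
    data = store.read()
finally:
    # The caller is responsible for cleanup; mkstemp never deletes the file.
    store.close()
    os.remove(store_path)
```

If nothing else needs to find the file by name, tempfile.TemporaryFile would remove even the explicit cleanup step.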
It would be preferable to create a single file, instead of adding a new file for each separate but identical Site and repeating the same download within a relatively short time. It would work much like a web browser cache.
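A rough sketch of that browser-cache behaviour, under stated assumptions: one file per site key, reused while it is younger than some freshness window. The function cached_fetch, the one-hour MAX_AGE, the 'wikipedia_en' key and the fake download callable are all hypothetical and only stand in for the real download.

```python
import os
import tempfile
import time

MAX_AGE = 3600  # seconds; hypothetical freshness window


def cached_fetch(cache_dir, site_key, download, max_age=MAX_AGE):
    """Return the data for site_key, downloading only if the copy is stale."""
    path = os.path.join(cache_dir, site_key)
    # Reuse the existing file when it was written recently enough.
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < max_age:
        with open(path, 'rb') as f:
            return f.read()
    # Otherwise download once and store the result for later callers.
    data = download()
    with open(path, 'wb') as f:
        f.write(data)
    return data


calls = []


def fake_download():
    # Stand-in for the real network request; counts how often it runs.
    calls.append(1)
    return b'interwiki data'


cache_dir = tempfile.mkdtemp()
first = cached_fetch(cache_dir, 'wikipedia_en', fake_download)
second = cached_fetch(cache_dir, 'wikipedia_en', fake_download)
```

Two identical Site objects asking for the same key share one file here, and the second call is served from disk without a download.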
You can use tempfile in the current implementation, but the "cache" directory is used by featured.py too, instead of "featured" (r5536). Maybe it's better to keep it, as it's a common name. For example, some of my external scripts use it, and maybe in the future more scripts will do so.
Speaking of diskcache: I wondered whether a simple Shelf ( http://docs.python.org/library/shelve.html ) wouldn't be faster than diskcache. Shelf is implemented at a low level and has a different interface for each system family. Naturally I would expect Shelf to be faster and more appropriate than our custom-made module, but Shelf might be too generic and introduce unnecessary overhead?
I am not sure it is worth replacing it with shelve here; probably not, if the aim is to speed up the code.
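For reference, this is roughly what a shelve-backed store would look like. It is only a sketch: the file location and the message keys are made-up examples, and it says nothing about whether shelve is actually faster than diskcache, which would need measuring.

```python
import os
import shelve
import shutil
import tempfile

# Hypothetical location; a real bot would use a path under its data directory.
cache_dir = tempfile.mkdtemp()
cache_path = os.path.join(cache_dir, 'messages')

# Write a few entries; each value is pickled when it is assigned.
db = shelve.open(cache_path)
try:
    db['nstab-main'] = 'Article'  # example keys, not real bot data
    db['edit'] = 'Edit'
finally:
    db.close()

# Reopen and read back; a value is unpickled only when its key is accessed.
db = shelve.open(cache_path)
try:
    value = db['edit']
    keys = sorted(db.keys())
finally:
    db.close()

shutil.rmtree(cache_dir)
```

The dictionary-like interface keeps the calling code simple, at the cost of a pickle round trip on every access.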
I have always wondered why we adopted this solution, because I have doubts about how much RAM is really required by the mediawiki-messages that the bot actually uses. I think a list of items not to discard would have been simpler, although I really appreciate this more sophisticated solution.