https://bugzilla.wikimedia.org/show_bug.cgi?id=54574
--- Comment #4 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- Wow, that's a big patch =)
* codecs is fine with me * can you avoid lines > 80 characters? I know that this is not something we do everywhere, but that's bad looking code. Same goes for if foo: bar. Please skip a line. * can you document thoroughly what's being done? parameters in the generators? In replace.py ? I find it really hard to understand the "choice" table in the docstring explaining -scanonce & others. * What's this: + f = codecs.open(filename, 'r', 'utf-8') + f.close() ??
I am also not convinced by the fact that after each page, FilterFileAppend is called, and #1 path is computed, #2 a file is opened, written in, and closed. I'm thinking that a possible cleaner way to do this would be to have a Filter object: put everything you need in it (an opened file descriptor, a list of titles to ignore if you need to use this, etc...) and keep a reference to it from the replace & disambig bots. How does that sound to you?
I also know that Daniel wanted first to keep the same file format, but... a couple of things are wrong here: * if you output titles with page.urlname() it will not be possible to read the file with TextfilePageGenerator afaik. Think of special characters, being url encoded, and not decoded. * if you want to use a Page title for a filename, you want Page.titleforFilename, not Page.urlname
Thank you!