Revision: 6306
Author:   wikipedian
Date:     2009-01-27 20:38:34 +0000 (Tue, 27 Jan 2009)

Log Message:
-----------
Bugfix: replace.py a b -xml:mydump.xml -namespace:0 also changed pages in other namespaces.
This is still not an ideal solution, as pages in other namespaces are still processed in the XmlDumpReplacePageGenerator. This takes a lot of time, especially for large deletion logs etc. which are in the Wikipedia namespace.
cydeweys, russblau and the others: it would be nice if you would review my changes, and also try to come up with a more elegant solution. Once we have agreed on a good solution, we also need to make similar changes to other scripts, such as selflink.py, table2wiki.py, and template.py. --Daniel
Modified Paths:
--------------
    trunk/pywikipedia/pagegenerators.py
    trunk/pywikipedia/replace.py
Modified: trunk/pywikipedia/pagegenerators.py
===================================================================
--- trunk/pywikipedia/pagegenerators.py 2009-01-27 19:58:05 UTC (rev 6305)
+++ trunk/pywikipedia/pagegenerators.py 2009-01-27 20:38:34 UTC (rev 6306)
@@ -806,11 +806,15 @@
         self.namespaces = []
 
     """
-    This function returns the combination of all accumulated generators
-    that have been created in the process of handling arguments.
-    Only call it after all arguments have been parsed.
+    This method returns the combination the given generator and all
+    accumulated generators that have been created in the process of handling
+    arguments.
+
+    Only call this method after all arguments have been parsed.
     """
-    def getCombinedGenerator(self):
+    def getCombinedGenerator(self, gen = None):
+        if gen:
+            self.gens.insert(0, gen)
         if (len(self.gens) == 0):
             return None
         if (len(self.gens) == 1):
Modified: trunk/pywikipedia/replace.py
===================================================================
--- trunk/pywikipedia/replace.py 2009-01-27 19:58:05 UTC (rev 6305)
+++ trunk/pywikipedia/replace.py 2009-01-27 20:38:34 UTC (rev 6306)
@@ -684,9 +684,8 @@
                  for PageTitle in PageTitles]
         gen = iter(pages)
 
+    gen = genFactory.getCombinedGenerator(gen)
     if not gen:
-        gen = genFactory.getCombinedGenerator()
-    if not gen:
         # syntax error, show help text from the top of this file
         wikipedia.showHelp('replace')
         return
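For anyone reviewing the change, the intended behaviour can be sketched as a standalone toy. This is not the pywikipedia code: ToyFactory, get_combined_generator and the (namespace, title) tuples are hypothetical names for illustration only, and the sketch assumes the factory applies its namespace filter to the combined generator, as the log message implies.

```python
import itertools


class ToyFactory:
    """Hypothetical stand-in for the generator factory touched by this commit."""

    def __init__(self, namespaces=None):
        self.gens = []                      # generators collected from arguments
        self.namespaces = namespaces or []  # e.g. [0] for -namespace:0

    def get_combined_generator(self, gen=None):
        # Same idea as the patch: an externally built generator is put first.
        if gen is not None:
            self.gens.insert(0, gen)
        if not self.gens:
            return None
        combined = itertools.chain(*self.gens)
        if self.namespaces:
            # Assumed filter step; pages are (namespace, title) tuples here.
            combined = (p for p in combined if p[0] in self.namespaces)
        return combined


# Toy "XML dump" generator yielding pages from two namespaces.
dump = iter([(0, "Foo"), (4, "Deletion log"), (0, "Bar")])
factory = ToyFactory(namespaces=[0])
result = list(factory.get_combined_generator(dump))
```

Because the dump generator now passes through the factory, the namespace restriction also applies to it. Note that the filter only runs after each page has been produced, so pages from other namespaces are still read from the dump, which is the remaining cost described in the log message.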
Hello!
2009/1/27 wikipedian@svn.wikimedia.org:
> Revision: 6306
> Author:   wikipedian
> Date:     2009-01-27 20:38:34 +0000 (Tue, 27 Jan 2009)
>
> cydeweys, russblau and the others: it would be nice if you would review my changes, and also try to come up with a more elegant solution. Once we have agreed on a good solution, we also need to make similar changes to other scripts, such as selflink.py, table2wiki.py, and template.py. --Daniel
The only problem I have with this change is that XML dumps now go through DuplicateFilterPageGenerator, which is, in my opinion, redundant. I guess that maintaining a seenPages list containing all the page objects of the dump, and probing this huge list for every page, is also costly. That said, I'm fine with the change.
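To illustrate the cost concern: a duplicate filter has to remember every page it has already yielded. The sketch below is a toy under stated assumptions, not the actual DuplicateFilterPageGenerator; duplicate_filter and the plain string titles are hypothetical. If the seen pages are kept in a list, each membership probe scans the whole list; a set makes the probe a hash lookup, although memory still grows with the size of the dump.

```python
def duplicate_filter(pages):
    """Yield each page once; a toy version of a duplicate-filtering generator."""
    seen = set()  # hash-based lookup; a list would be scanned on every probe
    for page in pages:
        if page not in seen:
            seen.add(page)
            yield page


titles = ["Foo", "Bar", "Foo", "Baz", "Bar"]
unique = list(duplicate_filter(titles))
```

For a dump that never contains the same page twice, this bookkeeping adds nothing, which is why running the filter there looks redundant.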
pywikipedia-l@lists.wikimedia.org