I have done the work for compat and it is running now; I plan to open the ticket once I have the numbers.
As far as I know, compat is unfortunately fully deprecated. Is there still any way to upload a patch for it? Otherwise I can describe here what I did. I know that people still use compat.
Let's talk about the second problem. I am not sure it can easily be solved to everybody's satisfaction, but I have already been thinking about it. (I have plenty of plans for replace.py, which is quite poor at the moment.)
Advanced use of replace.py needs a two-run approach. In the first run we collect candidates from a dump with no human interaction, while the bot owner sleeps, works, or does something else useful.
The titles are collected into a file, and in the second run the owner processes them interactively, which is much faster. All the parts of this process that I implemented in compat are completely missing from core now, which makes replace.py useless for me, but this is obviously a temporary state. So let's assume that replace.py must support both direct, immediate replacements and two-run replacements.
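A minimal sketch of the two runs, using only the Python standard library. The dump reader is replaced by a plain list of (title, text) tuples, and the function names collect_titles and load_titles are hypothetical; they are not part of replace.py in either compat or core.

```python
import re
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple


def collect_titles(entries: Iterable[Tuple[str, str]], pattern: str,
                   titles_file: Path) -> int:
    """First run: no interaction, just write every matching title to a file."""
    regex = re.compile(pattern)
    found = 0
    with titles_file.open('w', encoding='utf-8') as out:
        for title, text in entries:
            if regex.search(text):
                out.write(title + '\n')
                found += 1
    return found


def load_titles(titles_file: Path) -> List[str]:
    """Second run: read the saved titles back for interactive processing."""
    with titles_file.open(encoding='utf-8') as inp:
        return [line.strip() for line in inp if line.strip()]


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / 'titles.txt'
        # Stand-in for dump entries: (title, wikitext) pairs.
        dump = [('Foo', 'colour'), ('Bar', 'nothing here'), ('Baz', 'color')]
        print(collect_titles(dump, r'colou?r', path))  # 2
        print(load_titles(path))  # ['Foo', 'Baz']
```

The file is the only interface between the runs, so the unattended first run can take as long as it needs while the interactive second run only touches pages already known to match.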
If you want to replace immediately in one run, the replacement should be done only once to save time. But where? XMLdumpgenerator is a separate class that can yield pages, and I don't think we should remove it. The main class holds the logic for handling the separate cases and the human interaction, and I don't think we should move that into XMLdumpgenerator. Perhaps the best solution is for XMLdumpgenerator not to replace at all, but only to search. This will be somewhat faster.
If you save the titles for later human processing, XMLdumpgenerator does not have to do the replacement in most cases either; again, it can just search.
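To illustrate why searching is enough for collecting titles, here is a sketch comparing the two checks on plain (old, new) regex tuples, with exceptions ignored. The function names are hypothetical. Because no text changes until some pattern matches, the search version can never miss a page that the replace version would change; it can only over-report, for example when a replacement leaves the matched text unchanged, which is harmless when the titles only go to a file for human review.

```python
import re
from typing import List, Tuple

Replacement = Tuple[str, str]  # (old regex, new text)


def changed_by_replacing(text: str, replacements: List[Replacement]) -> bool:
    """Build the whole new text, then compare it with the old one."""
    new_text = text
    for old, new in replacements:
        new_text = re.sub(old, new, new_text)
    return new_text != text


def matched_by_searching(text: str, replacements: List[Replacement]) -> bool:
    """Only look for a match; stop at the first pattern that hits."""
    return any(re.search(old, text) for old, _new in replacements)


if __name__ == '__main__':
    fix = [(r'colou?r', 'color')]
    print(changed_by_replacing('colour', fix), matched_by_searching('colour', fix))
    # An identity replacement: search reports a hit, replace reports no change.
    same = [(r'color', 'color')]
    print(changed_by_replacing('color', same), matched_by_searching('color', same))
```

The search version also avoids building a new copy of every page text, which is where the time is saved on a large dump.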
There is a third case: when I develop new fixes, I often experiment, and it is useful to see the planned replacements during the first run, since this helps me improve the fix. So I wouldn't remove the replacing ability entirely. This needs a separate switch, which can be ON by default when we use -xml.
Please keep in mind that speeding up the generator is important, but keeping the main replace bot fast is even more important. If you want to avoid double work entirely, you don't use a dump at all.
So I have three ideas for how to do this:
- The switch tells the generator to replace the second element of each replacement tuple with ''. I don't have numbers on how much faster this would be. This is somewhat dangerous, so the bot must ensure that the switch takes effect only when we save titles to a file or run in simulation mode, so that we don't damage the wiki.
- The generator will search instead of replace. I don't like this idea, because textlib.py contains the complex logic for honoring exceptions, comments, nowikis, etc.
- We enhance textlib.py so that replaceExcept() gets a new parameter that makes it search rather than replace. This is the good solution. In this case the function could return a dummy text that differs from the original, so that we don't have to rewrite the scripts that use it.
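A rough sketch of the proposed extra parameter, written as a standalone function rather than as a patch to textlib.py. The name replace_except, the search_only parameter and the NUL marker are all hypothetical, and exceptions are simplified to a list of compiled regexes whose matches must not be touched.

```python
import re
from typing import List, Tuple

# Hypothetical convention: in search-only mode this is appended to the text,
# so callers that only test "new_text != text" still see a change.
SEARCH_HIT_MARKER = '\x00'


def _exception_spans(text: str, exceptions: List[re.Pattern]) -> List[Tuple[int, int]]:
    """Return the (start, end) ranges that must not be touched."""
    return [m.span() for rx in exceptions for m in rx.finditer(text)]


def replace_except(text: str, old: re.Pattern, new: str,
                   exceptions: List[re.Pattern],
                   search_only: bool = False) -> str:
    """Replace old with new outside the exception ranges (sketch only).

    With search_only=True nothing is substituted: at the first match outside
    the exceptions, a dummy text that differs from the input is returned.
    """
    spans = _exception_spans(text, exceptions)

    def in_exception(start: int, end: int) -> bool:
        # True if the match overlaps any protected range.
        return any(s < end and start < e for s, e in spans)

    pieces = []
    last = 0
    for match in old.finditer(text):
        if in_exception(*match.span()):
            continue
        if search_only:
            return text + SEARCH_HIT_MARKER
        pieces.append(text[last:match.start()])
        pieces.append(match.expand(new))
        last = match.end()
    pieces.append(text[last:])
    return ''.join(pieces)


if __name__ == '__main__':
    exceptions = [re.compile(r'<nowiki>.*?</nowiki>', re.S),
                  re.compile(r'<!--.*?-->', re.S)]
    old = re.compile(r'colou?r')
    text = 'colour <nowiki>colour</nowiki>'
    print(replace_except(text, old, 'color', exceptions))
    # color <nowiki>colour</nowiki>
    print(replace_except(text, old, 'color', exceptions, search_only=True) != text)
    # True
```

A script that only checks whether the returned text differs keeps working unchanged, but it must never save the dummy text, which is why the same safety guard as in the first idea (file output or simulation mode only) would still be needed.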
Anyhow, replaceExcept() needs another enhancement, which I have already made in my copy: it should optionally return the (old, new) pairs for further processing. This is very useful for developing fixes, measuring efficiency, creating statistics, etc. This will be a separate task, but if you agree with this solution, we can add both new parameters in one go.
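A sketch of the optional (old, new) output, again as a standalone hypothetical function (replace_collecting_pairs is not an existing textlib.py name) with exception handling left out. Python's re.sub accepts a function as the replacement, which makes it easy to record each pair as it is produced.

```python
import re
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]


def replace_collecting_pairs(text: str, old: re.Pattern,
                             new: str) -> Tuple[str, List[Pair]]:
    """Replace old with new and return the new text plus every (old, new) pair.

    Hypothetical sketch; exception handling (nowiki, comments, ...) is omitted.
    """
    pairs: List[Pair] = []

    def substitute(match: re.Match) -> str:
        replacement = match.expand(new)
        pairs.append((match.group(0), replacement))
        return replacement

    return old.sub(substitute, text), pairs


if __name__ == '__main__':
    new_text, pairs = replace_collecting_pairs(
        'colour, Colour and colour', re.compile(r'([Cc])olour'), r'\1olor')
    print(new_text)  # color, Color and color
    print(Counter(pairs).most_common(1))  # [(('colour', 'color'), 2)]
```

Counting the pairs with collections.Counter gives quick statistics on what a fix actually changes across many pages.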
So we have three tickets now. :-)