This is why a fully automated bot is unlikely to be successful. Until you see the changes that are actually going to occur, you don’t know what terrible mistakes might be made due to unforeseen types of usage. Hence my suggestion is to use AWB, which enables you to eyeball every change and SKIP those that are inappropriate. You can also update your rules on the fly to reflect new patterns you hadn’t foreseen but suspect will recur. I also note that AWB has a checkbox to exclude changes to marked-up content, such as references, image file names, URLs, etc. (which would be appropriate in this case).
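
To make the idea of skipping marked-up content concrete, here is a rough Python sketch. It is not how AWB works internally; the function name `apply_rule`, the `PROTECTED` pattern, and the `oldterm`/`newterm` rule are all hypothetical and chosen only for illustration. The list of protected spans is deliberately incomplete.

```python
import re

# Spans a replacement rule should leave alone (illustrative, not exhaustive):
# <ref>...</ref> blocks, the file-name part of image syntax, bare URLs,
# and double-quoted text on a single line.
PROTECTED = re.compile(
    r"<ref[^>/]*>.*?</ref>"
    r"|\[\[(?:File|Image):[^\]|]*"
    r"|https?://[^\s\]|]+"
    r'|"[^"\n]*"',
    re.IGNORECASE | re.DOTALL,
)


def apply_rule(wikitext, pattern, replacement):
    """Apply one find/replace rule outside protected spans.

    Returns the new text plus a list of (before, after) pairs so that a
    human can review each proposed change before anything is saved.
    """
    rule = re.compile(pattern)
    changes = []

    def record(match):
        new = match.expand(replacement)
        changes.append((match.group(0), new))
        return new

    pieces = []
    pos = 0
    for protected in PROTECTED.finditer(wikitext):
        # Rewrite the unprotected text before this span, then copy the
        # protected span through unchanged.
        pieces.append(rule.sub(record, wikitext[pos:protected.start()]))
        pieces.append(protected.group(0))
        pos = protected.end()
    pieces.append(rule.sub(record, wikitext[pos:]))
    return "".join(pieces), changes


if __name__ == "__main__":
    sample = (
        "The oldterm spoke. [[File:oldterm.jpg|thumb|The oldterm]] "
        '<ref>http://example.org/oldterm</ref> "oldterm quoted"'
    )
    new_text, changes = apply_rule(sample, r"\boldterm\b", "newterm")
    print(new_text)
    for before, after in changes:
        print(f"review: {before!r} -> {after!r}")
```

Even in this sketch the image caption is still rewritten, and the quotation pattern only catches straight double quotes, which is the kind of thing you only notice by looking at the proposed changes one at a time.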

 

In my not-so-humble opinion, the effort in writing the perfect bot for a one-off task is almost always going to be far greater than the semi-automated AWB approach. Even for a task that is not a one-off, I would still do the first run with AWB to figure out which patterns the bot is likely to need for ongoing use.

 

Kerry

 


From: gendergap-bounces@lists.wikimedia.org [mailto:gendergap-bounces@lists.wikimedia.org] On Behalf Of Daniel and Elizabeth Case
Sent: Friday, 27 February 2015 1:41 AM
To: Addressing gender equity and exploring ways to increase the participation of women within Wikimedia projects.
Subject: Re: [Gendergap] Random musings about a bot

 

>You also need to avoid making such a change in URLs and quotations, or at least quotations that were originally in English.

And filenames, too, in image syntax (although of course we should probably rename the image files, too).

 

Daniel Case