This is why a fully automated bot is unlikely to be successful. Until you
see the changes that are actually going to occur, you don't know what
terrible mistakes might be made due to unforeseen types of usage. Hence
my suggestion is to use AWB, which enables you to eyeball every change and
SKIP those that are inappropriate. You can also update your rules on the fly
to reflect new patterns you hadn't foreseen but suspect will recur. I
also note that AWB has a checkbox to exclude marked-up content, such as
references, image file names, URLs, etc., from changes (which would be
appropriate in this case).
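
As a rough illustration of the kind of skipping described above, here is a
minimal Python sketch. It is not AWB's code and does not claim to match how
AWB works internally; the names PROTECTED, propose_changes and
apply_reviewed are made up for this example, and the list of protected
patterns (references, URLs, image file names, double-quoted text) is an
assumption that would need tuning against real articles. Each proposed
change is passed to a reviewer callback, so a human can still skip it.

```python
import re

# Spans that should never be edited (an assumed, incomplete list):
# <ref>...</ref>, bare URLs, [[File:...]] / [[Image:...]] names, quotations.
PROTECTED = re.compile(
    r"<ref[^>/]*>.*?</ref>"          # references
    r"|https?://[^\s\]|<]+"          # URLs
    r"|\[\[(?:File|Image):[^|\]]+"   # image file names (captions stay editable)
    r'|"[^"\n]*"',                   # double-quoted quotations
    re.IGNORECASE | re.DOTALL,
)


def propose_changes(text, pattern, replacement):
    """Yield (start, end, new_text) for each match outside protected spans."""
    protected = [m.span() for m in PROTECTED.finditer(text)]
    for m in re.finditer(pattern, text):
        # Skip any match that overlaps a protected span.
        if any(s < m.end() and m.start() < e for s, e in protected):
            continue
        yield m.start(), m.end(), m.expand(replacement)


def apply_reviewed(text, pattern, replacement, approve):
    """Apply only the changes the reviewer approves; skip the rest."""
    out, pos = [], 0
    for start, end, new in propose_changes(text, pattern, replacement):
        context = text[max(0, start - 30):end + 30]
        if approve(text[start:end], new, context):
            out.append(text[pos:start])
            out.append(new)
            pos = end
    out.append(text[pos:])
    return "".join(out)
```

In practice the approve callback would show the old text, the new text and
the surrounding context to a person and return True or False, which is the
"eyeball and SKIP" step; passing lambda old, new, ctx: True turns it into
the fully automated run that I am arguing against.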
In my not-so-humble opinion, the effort in writing the perfect bot for a
one-off task is almost always going to be far greater than the
semi-automated AWB approach. Even for a task that is not a one-off, I would
still do the first run with AWB to figure out what patterns are likely to be
needed in the bot for ongoing use.
Kerry
_____
From: gendergap-bounces(a)lists.wikimedia.org
[mailto:gendergap-bounces@lists.wikimedia.org] On Behalf Of Daniel and
Elizabeth Case
Sent: Friday, 27 February 2015 1:41 AM
To: Addressing gender equity and exploring ways to increase
the participation of women within Wikimedia projects.
Subject: Re: [Gendergap] Random musings about a bot
You also need to avoid making such a change in URLs and quotations, or at
least quotations that were originally in English.
And filenames, too, in image syntax (although of course we should probably
rename the image files, too).
Daniel Case