2012/3/18 Chris Watkins <chriswaterguy(a)appropedia.org>
There are a couple of options then. If the first solution, from Bináris,
requires the page to be identified, I could make a list from AllPages for
that namespace...
Yes, and once you clean it up properly, you may repeat the cleaning
regularly, e.g. on a weekly basis, from Recentchanges which is much faster.
But for now (given my lack of skill in SQL and Python)
I tell you how I became a Python programmer. I behan to use Pywikipedia,
then I realized that I wanted to modify something for my own needs, and
tried to understand, then I began to exepriment with basic.py, than I began
to write my own scripts... Now I use Python as my general hobby programming
language.
it occurs to me that I can do a search for any match
from a list of spam
strings and replace with a delete tag. "(Florida|real estate|home
insurance... )" - I have a list of a few hundred spammy phrases.
That's a way, too, if your list is comprehensive enough. I strongly suggest
you to use fixes instead of command line replacements. You may create a fix
in fixes.py or user-fixes.py with all your stopwords, following the same
pattern, while you can't type a few hundred words into a command. If your
list is in a well ordered form, you may put the words in a column of an
Excel table, and create the replacements with a text function and copy back
to user-fixes.py.
You may also want to replace these words with a special category instead of
a deleet tag and tell delete.py to kill them en masse.
One more question: do you know blacklist? This is an admins' tool to list
ugly websites. You just write the bad-faced website on the blacklist, and
users will be unhappy to see they cannot save the page until it contains
the link to it. :-)
--
Bináris