2012/3/18 Chris Watkins chriswaterguy@appropedia.org
There are a couple of options then. If the first solution, from Bináris, requires the page to be identified, I could make a list from AllPages for that namespace...
Yes, and once you clean it up properly, you may repeat the cleaning regularly, e.g. on a weekly basis, from Recentchanges which is much faster.
But for now (given my lack of skill in SQL and Python)
I tell you how I became a Python programmer. I behan to use Pywikipedia, then I realized that I wanted to modify something for my own needs, and tried to understand, then I began to exepriment with basic.py, than I began to write my own scripts... Now I use Python as my general hobby programming language.
it occurs to me that I can do a search for any match from a list of spam strings and replace with a delete tag. "(Florida|real estate|home insurance... )" - I have a list of a few hundred spammy phrases.
That's a way, too, if your list is comprehensive enough. I strongly suggest you to use fixes instead of command line replacements. You may create a fix in fixes.py or user-fixes.py with all your stopwords, following the same pattern, while you can't type a few hundred words into a command. If your list is in a well ordered form, you may put the words in a column of an Excel table, and create the replacements with a text function and copy back to user-fixes.py. You may also want to replace these words with a special category instead of a deleet tag and tell delete.py to kill them en masse.
One more question: do you know blacklist? This is an admins' tool to list ugly websites. You just write the bad-faced website on the blacklist, and users will be unhappy to see they cannot save the page until it contains the link to it. :-)