On Mon, Mar 19, 2012 at 02:05, Bináris <wikiposta(a)gmail.com> wrote:
I'll tell you how I became a Python programmer. I began to use Pywikipedia,
then I realized that I wanted to modify something for my own needs, and
tried to understand it, then I began to experiment with basic.py, then I began
to write my own scripts... Now I use Python as my general hobby programming
language.
Hadn't seen basic.py - thanks. With all the commenting, it looks like a
much better place to start than the other Python I've looked at.
It occurs to me that I can do a search for any match from a list of spam
strings and replace with a delete tag. "(Florida|real estate|home
insurance... )" - I have a list of a few hundred spammy phrases.
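The idea above can be sketched in plain Python. This is only an illustration, not Pywikipedia code: the phrase list is a three-item stand-in for the real one, the `{{delete|spam}}` tag is an assumed template name, and `build_spam_pattern` and `tag_first_match` are hypothetical helper names. Only the first match is replaced, so a page with several hits does not end up with several tags.

```python
import re

# Hypothetical sample; the real list would hold a few hundred phrases.
SPAM_PHRASES = ["Florida", "real estate", "home insurance"]

# Assumed deletion template; use whatever tag your wiki expects.
DELETE_TAG = "{{delete|spam}}"


def build_spam_pattern(phrases):
    """Compile one case-insensitive regex that matches any phrase as whole words."""
    # Longest first, so a longer phrase wins over a shorter one it starts with.
    ordered = sorted(phrases, key=len, reverse=True)
    # re.escape keeps characters such as '.' or '(' in a phrase literal.
    alternation = "|".join(re.escape(p) for p in ordered)
    # \b assumes each phrase starts and ends with a letter or digit.
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def tag_first_match(text, pattern, tag=DELETE_TAG):
    """Replace the first spam phrase found in text with the delete tag."""
    # A function is used as the replacement so the tag is inserted verbatim.
    return pattern.sub(lambda match: tag, text, count=1)


if __name__ == "__main__":
    pattern = build_spam_pattern(SPAM_PHRASES)
    print(tag_first_match("Cheap real estate deals", pattern))
```

The compiled `pattern.pattern` string is also what could be pasted into a regex replacement, after checking it against a few known spam pages.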
That's a way, too, if your list is comprehensive enough. I strongly
suggest you use fixes instead of command line replacements. You can't
type a few hundred words into a command, but you can create a fix in
fixes.py or user-fixes.py with all your stopwords, following the same
pattern as the existing fixes. If your list is in a well ordered form, you
may put the words in a column of an Excel table, create the replacements
with a text function, and copy them back to user-fixes.py.
You may also want to replace these words with a special category instead
of a delete tag and tell delete.py to kill them en masse.
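As an alternative to the Excel text function, a few lines of Python could generate the replacement pairs. This is a rough sketch under assumptions: the phrases are placeholders, the category name `Spam to delete` is made up, and `replacement_lines` is a hypothetical helper. It only prints `(pattern, replacement),` lines; the surrounding layout of the fix should be copied from an existing entry in fixes.py.

```python
import re

# Hypothetical phrases; in practice, read them from your working file.
phrases = ["Florida", "real estate", "home insurance"]

# Assumed replacement text: a category that a later deletion run can target.
REPLACEMENT = "[[Category:Spam to delete]]"


def replacement_lines(phrases, replacement=REPLACEMENT):
    """Return one '(pattern, replacement),' source line per non-blank phrase."""
    lines = []
    for phrase in phrases:
        phrase = phrase.strip()
        if not phrase:
            continue  # skip blank entries from the working file
        # Escape the phrase so it is matched literally when used as a regex.
        pattern = re.escape(phrase)
        # %r writes valid Python string literals, including any backslashes.
        lines.append("        (%r, %r)," % (pattern, replacement))
    return lines


if __name__ == "__main__":
    print("\n".join(replacement_lines(phrases)))
```

The printed lines are meant to be pasted into the list of replacements of a fix, then tried on a handful of pages before running against the whole wiki.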
Thanks, good ideas. I wasn't familiar with fixes.py/user-fixes.py. I copy
and paste from my working file to the command line rather than typing, but
having the fixes in a file is still much better.
I don't expect to do this often - now we have AbuseFilter set up and it's
catching new spam quite well. It's just a cleanup tool for old spam.
One more question: do you know the blacklist? This is an admins' tool to list
ugly websites. You just write the bad-faced website on the blacklist, and
users will be unhappy to see they cannot save the page as long as it contains
a link to it. :-)
Yes, SpamBlacklist <http://www.mediawiki.org/wiki/Extension:SpamBlacklist> -
it apparently updates from the Wikimedia blacklist every 10-15 minutes,
and we have added sites on there as well. Very good, but we can't keep up
with all the new spam sites (which makes the updates from Wikimedia very
valuable).
Thanks again,
Chris
--
Bináris
--
Chris Watkins
Appropedia.org - Sharing knowledge to build rich, sustainable lives.