On Mon, Mar 19, 2012 at 02:05, Bináris <wikiposta@gmail.com> wrote:

I'll tell you how I became a Python programmer. I began to use Pywikipedia, then I realized that I wanted to modify something for my own needs and tried to understand it, then I began to experiment with basic.py, then I began to write my own scripts... Now I use Python as my general hobby programming language.

I hadn't seen basic.py - thanks. With all the commenting, it looks like a much better place to start than the other Python code I've looked at.

 
It occurs to me that I can search for any match from a list of spam strings and replace it with a delete tag: "(Florida|real estate|home insurance... )". I have a list of a few hundred spammy phrases.
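A rough sketch of that idea in plain Python, using only the standard `re` module rather than anything from Pywikipedia. The phrase list is a hypothetical three-item sample, and `build_spam_pattern` / `find_spam` are illustrative names. Escaping each phrase and joining the phrases into one alternation is roughly what the `(Florida|real estate|...)` regex does on the command line.

```python
import re

# Hypothetical sample; the real list would hold a few hundred phrases.
SPAM_PHRASES = ["Florida", "real estate", "home insurance"]


def build_spam_pattern(phrases):
    """Join phrases into one case-insensitive alternation regex.

    re.escape() keeps characters such as '.' or '+' inside a phrase from
    being read as regex syntax; word-boundary anchors stop a phrase from
    matching inside a longer word.
    """
    # Longest first, so a longer phrase wins over a shorter prefix of it.
    ordered = sorted(set(phrases), key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def find_spam(text, pattern):
    """Return the spam phrases found in text, in order of appearance."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    pattern = build_spam_pattern(SPAM_PHRASES)
    print(find_spam("Cheap Real Estate and home insurance in Florida!", pattern))
```

Because of the word boundaries, "Floridas" would not be flagged; drop the `\b` anchors if partial-word matches are wanted.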
That's one way, too, if your list is comprehensive enough. I strongly suggest that you use fixes instead of command-line replacements. You can create a fix in fixes.py or user-fixes.py with all your stopwords, following the same pattern as the existing fixes; you can't type a few hundred words into a command line. If your list is in a well-ordered form, you can put the words in a column of an Excel table, generate the replacements with a text function, and copy them back into user-fixes.py.
You may also want to replace these words with a special category instead of a delete tag, and then tell delete.py to delete those pages en masse.
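A minimal sketch of generating such an entry with Python instead of an Excel text function. It assumes the entry shape used by the examples in fixes.py (a `fixes[...]` dict with `'regex'`, `'msg'` and `'replacements'` keys); check it against your own copy, since details may differ between Pywikipedia versions. The fix name `spam-cleanup`, the `{{delete}}` tag, the edit summary and the helper name `make_fix_entry` are all hypothetical placeholders.

```python
import re


def make_fix_entry(name, phrases, replacement="{{delete}}",
                   summary="Robot: tagging spam for deletion"):
    """Return the source text of a user-fixes.py entry built from phrases.

    Each phrase becomes its own (pattern, replacement) tuple, mirroring the
    one-row-per-word approach of the spreadsheet method. Assumes ASCII
    phrases; the output uses plain '...' literals, so older Python 2 setups
    may prefer u'...' literals instead.
    """
    lines = [
        "fixes[%r] = {" % name,
        "    'regex': True,",
        "    'msg': {",
        "        '_default': %r," % summary,
        "    },",
        "    'replacements': [",
    ]
    for phrase in phrases:
        # Escape regex metacharacters and anchor on word boundaries.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        lines.append("        (%r, %r)," % (pattern, replacement))
    lines += ["    ],", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical phrases; in practice read them from your spam list file.
    print(make_fix_entry("spam-cleanup", ["Florida", "real estate"]))
```

Paste the printed text into user-fixes.py and run replace.py with `-fix:spam-cleanup`. To follow the category suggestion instead, pass something like `replacement="[[Category:Spam to delete]]"` (a hypothetical category name). Since every phrase gets its own tuple, a page containing several spam phrases would receive several tags.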

Thanks, good ideas. I wasn't familiar with fixes.py/user-fixes.py. I copy and paste from my working file to the command line rather than typing, but having the fixes in a file is still much better.

I don't expect to do this often: now that we have AbuseFilter set up, it's catching new spam quite well. This is just a cleanup tool for old spam.


One more question: do you know about the blacklist? It is an admin tool for listing unwanted websites. You just add the offending website to the blacklist, and users will be unhappy to find that they cannot save a page as long as it contains a link to that site. :-)

Yes, SpamBlacklist. It apparently updates from the Wikimedia blacklist every 10-15 minutes, and we have added sites to it as well. It's very good, but we can't keep up with all the new spam sites, which makes the updates from Wikimedia very valuable.

Thanks again,
Chris


--
Bináris

_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l




--
Chris Watkins

Appropedia.org - Sharing knowledge to build rich, sustainable lives.