On Fri, 24 May 2013, Daniel Friesen wrote:
.. The proper way to deal with this spam is not by IP but by content. We need some people who are knowledgeable about matching spam by training programs with spam and non-spam. ..
Well, Daniel, I have some ideas about how to realize the automatic analysis of the content of articles and to qualify some of them as spam. They are based on the TORI axioms, and I am not sure whether this is the right place to describe them. I would rather try to do it myself, but I have no experience programming in PHP or writing robots. (My best achievements: I have written a few PHP scripts, and I once killed a hundred users through MySQL with a single command; I am not sure an intelligent robot should use such a brutal method.) In order to participate in the project, I need some help from the professionals. Namely, I need somebody to post a detailed tutorial describing the basic "plug-in" and "plug-out", with very simple examples:

1. Code that opens the wiki, downloads the list of new pages, and saves the list as a text file in the working directory.
2. Code that opens a specific page for editing and saves its source in the working directory.
3. Code that opens a specific page for editing and replaces its content with a prepared source file from the working directory.
4. Code that opens the corresponding discussion page for editing and adds a warning there.
5. Code that blocks a specific user.
6. Code that removes a specific page.
7. Code that collects all the complaints about its activity and transfers them to the human administrator.
8. Code that performs a Google search and saves the results as a text file.

The spammers already have these examples; it would be good to supply the same tools to the colleagues who handle wikis. The samples mentioned above should be short, preferably one line each. They should be optimized not for the best performance, but for the easiest understanding by a human; in particular, no loops or complicated logical expressions should be involved. The rest I plan to write in C++, which seems to be faster than PHP and (more importantly) with which I am more familiar. The goal is a robot-admin, a robot-editor, that would be indistinguishable from an intelligent professional human and that follows an explicitly formulated, transparent editorial policy. If it succeeds, you will be able to rewrite it from C++ into PHP and optimize it for MediaWiki.
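For orientation, requests 1 to 6 might look roughly like the sketch below, written against the MediaWiki action API (api.php). Everything concrete in it is a placeholder or an assumption: the wiki URL https://example.org/w/api.php, the page titles, the file names and the cookie path are invented, the robot account is assumed to be already logged in with the rights to edit, block and delete, and the token query assumes a reasonably recent MediaWiki. It is a sketch to be checked against the API documentation, not a finished tool.

<?php
// Sketch of a maintenance robot talking to the MediaWiki action API.
// All URLs, titles and file names are hypothetical placeholders.

$api = 'https://example.org/w/api.php';   // assumed location of api.php
$dir = __DIR__;                           // working directory for saved files

// 1. Download the list of newly created pages and save it as a text file.
$new = json_decode(file_get_contents(
    $api . '?action=query&list=recentchanges&rctype=new&rclimit=50&format=json'
), true);
$titles = array();
foreach ($new['query']['recentchanges'] as $rc) {
    $titles[] = $rc['title'];
}
file_put_contents("$dir/new_pages.txt", implode("\n", $titles));

// 2. Fetch the wikitext source of one specific page and save it.
$page = 'Some article';                   // hypothetical title
$resp = json_decode(file_get_contents(
    $api . '?action=query&prop=revisions&rvprop=content&format=json&titles='
         . urlencode($page)
), true);
$rev = reset($resp['query']['pages']);
file_put_contents("$dir/page_source.txt", $rev['revisions'][0]['*']);

// 3.-6. Editing, warning, blocking and deleting all go through POST requests
// (action=edit, action=block, action=delete). They need a logged-in session
// (cookies) and a CSRF token; the login step itself is omitted here.
function mw_post($api, array $params) {
    $ch = curl_init($api);
    curl_setopt_array($ch, array(
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($params),
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_COOKIEJAR      => '/tmp/bot_cookies',  // keeps the session
        CURLOPT_COOKIEFILE     => '/tmp/bot_cookies',
    ));
    $out = curl_exec($ch);
    curl_close($ch);
    return json_decode($out, true);
}

// CSRF token (action=query&meta=tokens on recent MediaWiki versions).
$tok  = mw_post($api, array('action' => 'query', 'meta' => 'tokens', 'format' => 'json'));
$csrf = $tok['query']['tokens']['csrftoken'];

// 3. Replace the content of the page with a prepared source file.
mw_post($api, array(
    'action'  => 'edit', 'title' => $page, 'token' => $csrf, 'format' => 'json',
    'text'    => file_get_contents("$dir/replacement.txt"),
    'summary' => 'robot: replacing suspected spam',
));

// 4. Append a warning to the corresponding discussion page.
mw_post($api, array(
    'action'     => 'edit', 'title' => "Talk:$page", 'token' => $csrf, 'format' => 'json',
    'appendtext' => "\n== Robot warning ==\nThis page was flagged as possible spam. ~~~~",
    'summary'    => 'robot: warning',
));

// 5. Block a specific user (needs the block right).
mw_post($api, array(
    'action' => 'block', 'user' => 'SpamAccount', 'reason' => 'spam',
    'token'  => $csrf, 'format' => 'json',
));

// 6. Delete a specific page (needs the delete right).
mw_post($api, array(
    'action' => 'delete', 'title' => $page, 'reason' => 'spam',
    'token'  => $csrf, 'format' => 'json',
));
?>

Item 7 would amount to fetching the robot's own talk page with the same prop=revisions query, and item 8 depends on whatever external search API is available, so both are left out of the sketch.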
Also, it would be good to arrange an option so that a new page, by default, opens with certain content from a sample page (for example, http://mizugadro.mydns.jp/o/index.php/SamplePage ) that helps the human to provide the necessary elements of a good article: preamble, introduction, definition(s), description of the new concept(s), support of the suggested concept, criticism of the suggested concept, ways of refuting the suggested concept, humor about the concept, conclusion, references, keywords, categories. Then any article that lacks the elements above should be qualified as spam and treated accordingly.
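For this last suggestion, MediaWiki already provides most of the mechanism: when the edit form of a page that does not yet exist is opened with a preload parameter, the text of the named page is copied into the edit box. A hypothetical link of this kind (the wiki URL is a placeholder) would be

https://example.org/w/index.php?title=New_article&action=edit&preload=SamplePage

and the InputBox extension can generate such links from a "create an article" box, so that every new article starts from the sample skeleton; whether the skeleton is then actually filled in would still have to be checked by the robot.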