Hello,
Spam programs have been posting spam links on our wiki for a while. We installed the SpamBlacklist extension and used "php cleanup.php" to revert the spam links. Since we installed the ConfirmEdit extension, it has become difficult for spam programs to post spam automatically. However, the spam links are still in the page histories and in the database. For example, the following page has many revisions containing spam links:
http://kb.linuxvirtualserver.org/wiki?title=Building_Scalable_TFTP_Cluster_u...
It's really annoying to keep this spam in the database, where it occupies a lot of space.
I am not sure whether there are any tools to remove spam from the database automatically and permanently. For example, could a script iterate over every page, check whether any revision in the history is identical to the current page, and, if so, remove all revisions after the matching one?
Thanks in advance for your suggestions,
Wensong
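The revert-matching idea in Wensong's question can be sketched in a few lines of Python. This is a minimal illustration, not an existing MediaWiki tool: it assumes a page's history is available as hypothetical `(rev_id, wikitext)` pairs, oldest first, and it only reports which revisions such a script would remove. It also exposes the weakness Boris raises: a legitimate edit that was later reverted to an identical earlier text would be swept up along with the spam.

```python
from typing import List, Tuple

# Hypothetical representation of one page's history: (rev_id, wikitext), oldest first.
Revision = Tuple[int, str]


def revisions_after_last_match(history: List[Revision]) -> List[int]:
    """Return the rev_ids the proposed cleanup script would delete.

    Scans backwards for the most recent earlier revision whose text is identical
    to the current text. Every revision after that match (the spam edits plus
    the revert that restored the text) is returned. Returns [] if nothing matches.
    """
    if len(history) < 2:
        return []
    current_text = history[-1][1]
    for i in range(len(history) - 2, -1, -1):  # newest earlier revision first
        if history[i][1] == current_text:
            return [rev_id for rev_id, _ in history[i + 1:]]
    return []


example_history = [
    (1, "TFTP cluster intro"),
    (2, "TFTP cluster intro, expanded"),
    (3, "TFTP cluster intro, expanded [http://spam.example/ buy]"),
    (4, "TFTP cluster intro, expanded [http://spam.example/ buy] [http://spam2.example/]"),
    (5, "TFTP cluster intro, expanded"),  # revert, e.g. by cleanup.php
]
print(revisions_after_last_match(example_history))  # [3, 4, 5]
```

The current revision (5 here) is in the result, so a real script would also have to make the matched revision the page's latest one again, which is part of why editing history directly is risky.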
Well, first you need to define spam - that is the most difficult part.
On 10/2/07, wensong zhang wensong.zhang@gmail.com wrote:
I am not sure whether there are any tools to remove spam from the database automatically and permanently. For example, could a script iterate over every page, check whether any revision in the history is identical to the current page, and, if so, remove all revisions after the matching one?
_______________________________________________ MediaWiki-l mailing list MediaWiki-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
On 10/3/07, Boris Epstein borepstein@gmail.com wrote:
Well, first you need to define spam - that is the most difficult part.
Sorry, I don't quite understand your comment. In SpamBlacklist we have defined which links are spam, so changes that add spam links can be reverted.
Sorry, I should have worded my comment better. What I meant was that recognizing spam on the fly requires defining it first, and spam is not easy to define. If you have pre-defined links in SpamBlacklist, then sure, the task becomes fairly trivial: simply deleting those links.
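Boris's point, that pre-defined blacklist entries make the job tractable, can be illustrated with a short Python sketch that flags the revisions introducing a blacklisted link. It is a simplification under stated assumptions: the blacklist is modeled as a list of regular-expression fragments checked against the start of each external link, the history is a list of `(rev_id, wikitext)` pairs, oldest first, and `BLACKLIST`, `compile_blacklist`, and `revisions_adding_spam` are hypothetical names, not part of the SpamBlacklist extension.

```python
import re
from typing import Iterable, List, Set, Tuple

# Hypothetical blacklist entries: one regular-expression fragment per entry.
# Assumption: real SpamBlacklist syntax and matching details may differ.
BLACKLIST = [r"spam\.example", r"cheap-pills\.example"]

# Rough matcher for bare and bracketed external links in wikitext.
URL_RE = re.compile(r'https?://[^\s\[\]<>"]+', re.IGNORECASE)


def compile_blacklist(fragments: Iterable[str]) -> re.Pattern:
    """Combine fragments into one pattern applied to the host part of a URL."""
    joined = "|".join(f"(?:{fragment})" for fragment in fragments)
    return re.compile(rf"^https?://+[a-z0-9_\-.]*(?:{joined})", re.IGNORECASE)


def spam_links(text: str, pattern: re.Pattern) -> Set[str]:
    """All external links in `text` that match the blacklist."""
    return {url for url in URL_RE.findall(text) if pattern.match(url)}


def revisions_adding_spam(history: List[Tuple[int, str]], pattern: re.Pattern) -> List[int]:
    """rev_ids of revisions that introduce a blacklisted link not present before."""
    flagged: List[int] = []
    previous: Set[str] = set()
    for rev_id, text in history:
        current = spam_links(text, pattern)
        if current - previous:  # a new blacklisted link appeared in this revision
            flagged.append(rev_id)
        previous = current
    return flagged


pattern = compile_blacklist(BLACKLIST)
history = [
    (1, "TFTP cluster intro"),
    (2, "TFTP cluster intro [http://www.spam.example/ buy]"),
    (3, "TFTP cluster intro [http://www.spam.example/ buy] http://cheap-pills.example/x"),
    (4, "TFTP cluster intro"),  # reverted
]
print(revisions_adding_spam(history, pattern))  # [2, 3]
```

Flagging revisions is the easy part; actually removing them from the database is the risky step discussed later in the thread.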
On 10/3/07, Boris Epstein borepstein@gmail.com wrote:
If you have pre-defined links in SpamBlacklist, then sure, the task becomes fairly trivial: simply deleting those links.
Are there any existing tools to delete those revisions, or do I need to write a program to remove the spam revisions from the database?
Thanks,
Wensong
wensong zhang wrote:
Are there any existing tools to delete those revisions, or do I need to write a program to remove the spam revisions from the database?
cleanup.php in the SpamBlacklist extension.
-- brion vibber (brion @ wikimedia.org)
Hi Brion,
I run SpamBlacklist's cleanup.php from time to time, but it only reverts the spam changes. All the spam links are still in the page history; see http://kb.linuxvirtualserver.org/wiki?title=Building_Scalable_TFTP_Cluster_u...
All I want is to permanently remove the spam revisions from the database. Are there any existing tools for this purpose?
Thanks,
Wensong
On 10/4/07, Brion Vibber brion@wikimedia.org wrote:
cleanup.php in the SpamBlacklist extension.
On 04/10/2007, wensong zhang wensong.zhang@gmail.com wrote:
All I want is to permanently remove the spam revisions from the database. Are there any existing tools for this purpose?
Essentially, no, since messing about with previous revisions and text records is dodgy at best and damaging in the worst case scenario.
You're better off running maintenance/storage/compressOld.php if disk space is an issue, although if it is *that much* of an issue, then you're probably better off addressing that limitation first.
Rob Church
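One plausible reason compressing old revisions pays off for a spam-heavy history is that successive revisions of a page are usually nearly identical, so compressing them together can save far more space than compressing each one separately. The following Python sketch illustrates only that general principle with the standard `zlib` module on a hypothetical history; it does not reproduce compressOld.php's storage format or command-line options.

```python
import random
import zlib
from typing import List


def separate_size(revisions: List[str]) -> int:
    """Total bytes when every revision is compressed on its own."""
    return sum(len(zlib.compress(r.encode("utf-8"))) for r in revisions)


def concatenated_size(revisions: List[str]) -> int:
    """Bytes when all revisions are concatenated and compressed as one blob."""
    return len(zlib.compress("".join(revisions).encode("utf-8")))


# Hypothetical history: 50 revisions of a ~4 KB article that differ only in
# one appended spam link each.
rng = random.Random(0)
words = ["cluster", "tftp", "server", "director", "real", "node", "ipvs",
         "udp", "packet", "timeout", "kernel", "load", "balancer", "config"]
article = " ".join(rng.choice(words) for _ in range(600))
revisions = [article + f"\n[http://spam{i}.example/ link]" for i in range(50)]

raw = sum(len(r.encode("utf-8")) for r in revisions)
print("raw bytes:         ", raw)
print("compressed singly: ", separate_size(revisions))
print("compressed jointly:", concatenated_size(revisions))
```

Each revision here is far smaller than zlib's 32 KB window, so the joint stream can refer back to the previous revision's text instead of storing it again.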
There must be a clean way to do this. Leaving spam in the histories does relegate it to the realm of noindex/nofollow, but it still means that humans going through the histories have to see it. I'd like to see an option for sysops to expunge a specific revision. Wasn't there also some discussion of this in the context of collapsing consecutive minor revisions from the same user? Heck, as a user, I'd like to collapse my own consecutive minor edits!
And could you be more specific about the danger of damage? From what I can tell, MW builds the page history by timestamp, so dropping a record in the middle shouldn't affect that. There's no pointer from revision to revision, right?
Recentchanges might link to a nonexistent revision, but wouldn't rebuildrecentchanges handle that?
Permalinks to the spam revision would break, but that's not exactly a tragedy.
I'm sure I'm missing something, probably because I'm not thinking in terms of the distributed architecture used for Wikipedia.
Jim
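Jim's reasoning can be explored with a toy model. The following Python sketch uses the standard `sqlite3` module and a hypothetical, heavily simplified three-table schema (pages, revisions, and revision texts) that is only loosely inspired by a wiki database and is not MediaWiki's real schema. It shows that a history ordered by timestamp survives the deletion of a middle revision, and it also shows the bookkeeping that deletion drags along: orphaned text rows and a page's pointer to its latest revision. Other tables a real installation may have that refer to a revision, such as recent changes, are deliberately left out.

```python
import sqlite3

# Hypothetical, heavily simplified schema; NOT MediaWiki's real tables.
SCHEMA = """
CREATE TABLE pages     (page_id INTEGER PRIMARY KEY, latest_rev INTEGER);
CREATE TABLE texts     (text_id INTEGER PRIMARY KEY, body TEXT);
CREATE TABLE revisions (rev_id INTEGER PRIMARY KEY, page_id INTEGER,
                        text_id INTEGER, rev_timestamp TEXT);
"""


def page_history(conn: sqlite3.Connection, page_id: int) -> list:
    """History is derived purely by ordering on the timestamp column."""
    rows = conn.execute(
        "SELECT rev_id FROM revisions WHERE page_id = ? ORDER BY rev_timestamp",
        (page_id,),
    )
    return [row[0] for row in rows]


def expunge_revision(conn: sqlite3.Connection, rev_id: int) -> None:
    """Delete one revision and tidy up the rows that would otherwise dangle."""
    row = conn.execute(
        "SELECT page_id, text_id FROM revisions WHERE rev_id = ?", (rev_id,)
    ).fetchone()
    if row is None:
        return
    page_id, text_id = row
    conn.execute("DELETE FROM revisions WHERE rev_id = ?", (rev_id,))
    # Drop the text row unless another revision still shares it.
    still_used = conn.execute(
        "SELECT 1 FROM revisions WHERE text_id = ?", (text_id,)
    ).fetchone()
    if still_used is None:
        conn.execute("DELETE FROM texts WHERE text_id = ?", (text_id,))
    # If the deleted revision was the page's latest, point at the newest survivor.
    conn.execute(
        """UPDATE pages SET latest_rev = (
               SELECT rev_id FROM revisions WHERE page_id = ?
               ORDER BY rev_timestamp DESC LIMIT 1)
           WHERE page_id = ? AND latest_rev = ?""",
        (page_id, page_id, rev_id),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for rev_id, body in enumerate(["good text", "spam text", "good text again"], start=1):
    conn.execute("INSERT INTO texts VALUES (?, ?)", (rev_id, body))
    conn.execute("INSERT INTO revisions VALUES (?, 1, ?, ?)",
                 (rev_id, rev_id, f"2007100{rev_id}000000"))
conn.execute("INSERT INTO pages VALUES (1, 3)")

expunge_revision(conn, 2)      # remove the spam revision in the middle
print(page_history(conn, 1))   # [1, 3]
```

Even in this toy model, deleting one revision touches three tables; a real wiki has more places that can refer to a revision, which is one concrete reading of Rob's warning that editing history directly is risky.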
On Oct 4, 2007, at 1:14 AM, Rob Church wrote:
Essentially, no, since messing about with previous revisions and text records is dodgy at best and damaging in the worst case scenario.
===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054