Hi everyone!
I have found myself in the following situation several times: I create a wiki for some event or small project, everything works fine, and once the event or project is over nobody looks at the wiki or does anything on it for several months. Then somebody needs the wiki again and realizes that the wiki database now holds 3 GB of text spam. Suppose the wiki hosting offers no backup or rollback option. So here is the question: how to
1) remove all the spam
2) delete all the spam accounts
3) reduce the database size from 3 GB to the original size
Cheers, Yury Katkov, WikiVote
Do you have a list of legitimate known good accounts?
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
I'm actually really interested in this too. I just deleted the databases for two copies of MediaWiki that I ran, for similar reasons...
Thank you, Derric Atzrott
No backups, no way to roll back to a date? That's bad. You could start a wiki from scratch and manually copy over whatever was good from the old one. Maybe share this task with a few selected volunteers. Start the new one without anonymous edits, with a sexy theme and a huge campaign to attract people: "Not like the old wiki! This one is actually good and maintained!" Maybe the lack of maintenance contributed to the decay. I wonder if a wiki without enough contributors is worth existing, like a garden without anyone to cut the grass.
Certainly. If for no other reason than the historical value.
We still keep all the Wikimania wikis around.
Thank you, Derric Atzrott
Given enough facts, it would be rather easy for me to write a script that nukes said spam. I did something similar on http://manual.fireman.com.br/wiki/Especial:Registro/Betacommand
Hi everyone! I agree with everyone in this thread, but the main problem is that even if I create a bot or use extensions that remove pages, the actual database records won't be deleted. If I understand correctly, the MediaWiki philosophy is that we cannot just drop a page or an account from the database: a deletion only hides those nasty spam pages.
Consequently, after the deletions the size of my database won't shrink to the original 100 MB; it will remain around 3 GB, which is a problem for hosting.
The proposed solution of exporting all the pages to a brand new wiki solves this problem. Are there any other solutions that do not involve dropping my old spammed database? ----- Yury Katkov
What can be done after mass deletion is to purge the archive database table, which should reduce the database size significantly. In the example I mentioned, where I cleaned up an existing site, I reduced the database size by about 90%.
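A minimal sketch of that purge step, assuming a standard MediaWiki installation whose maintenance/ directory ships deleteArchivedRevisions.php. The install path and the helper names are hypothetical, and the command is only printed by default so it can be reviewed before anything is deleted for good.

```python
import subprocess
from pathlib import Path


def build_purge_commands(wiki_root):
    """Return the maintenance commands to run after mass deletion.

    wiki_root is an assumed path to the MediaWiki installation.
    """
    maintenance = Path(wiki_root) / "maintenance"
    return [
        # Permanently removes the revisions of deleted pages that
        # MediaWiki keeps in the archive table.
        ["php", str(maintenance / "deleteArchivedRevisions.php"), "--delete"],
    ]


def run_purge(wiki_root, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in build_purge_commands(wiki_root):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    run_purge("/var/www/wiki")  # hypothetical install path, dry run only
```

Recent MediaWiki releases may expect maintenance scripts to be started through maintenance/run.php instead, and the database may still need its tables optimized (for example OPTIMIZE TABLE on MySQL) before the disk space is actually returned.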
http://www.mediawiki.org/wiki/Manual:Reduce_size_of_the_database and here is the manual on how to purge the archive database! Thanks John, that's a perfect solution! ----- Yury Katkov
Like I said, if you want I can whip up a script to nuke the spam; just drop me an email off-list.
Speaking of scripts, it would be cool if someone would polish this set of anti-spam scripts a little bit and see if it's worth advertising more:
https://www.noisebridge.net/wiki/Secretaribot https://github.com/dannyob/secretaribot
Hi John, thanks! Take your time! If you already have such a script, and can share it - please do! But if not - I think it will be a good exercise in pywikipediabot or extension development for me. ----- Yury Katkov
It's rather easy to write in pywiki; I just need some information from you about your wiki (e.g. are all edits after date X bad, or do we only have Y valid users, and here are their names), etc. Stuff like that allows me to tailor the script to your needs.
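The two criteria mentioned above (a cutoff date and a list of valid users) can be sketched as a small selection step. This is only an illustration under assumptions: the revision records are plain dictionaries with hypothetical keys, no wiki is contacted, and it is not the script discussed in this thread.

```python
from datetime import datetime


def is_spam(rev, cutoff, trusted_users):
    """Assumed rule: made after the cutoff by a user who is not trusted."""
    return rev["timestamp"] > cutoff and rev["user"] not in trusted_users


def plan_cleanup(revisions, cutoff, trusted_users):
    """Sort page titles and accounts into delete / revert / block lists."""
    spam_titles, good_titles, spam_users = set(), set(), set()
    for rev in revisions:
        if is_spam(rev, cutoff, trusted_users):
            spam_titles.add(rev["title"])
            spam_users.add(rev["user"])
        else:
            good_titles.add(rev["title"])
    return {
        # Pages with nothing but spam revisions can be deleted outright.
        "delete": sorted(spam_titles - good_titles),
        # Pages with good history need a revert instead of a deletion.
        "revert": sorted(spam_titles & good_titles),
        "block": sorted(spam_users),
    }


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    revisions = [
        {"title": "Main Page", "user": "Alice", "timestamp": datetime(2012, 1, 10)},
        {"title": "Main Page", "user": "Spam4u", "timestamp": datetime(2012, 5, 2)},
        {"title": "Cheap pills", "user": "Spam4u", "timestamp": datetime(2012, 5, 3)},
        {"title": "Schedule", "user": "Alice", "timestamp": datetime(2012, 6, 1)},
    ]
    print(plan_cleanup(revisions, datetime(2012, 3, 1), {"Alice"}))
```

Separating "delete" from "revert" matters because a legitimate page that spammers later overwrote should be restored to its last good revision rather than removed.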
I think that we have the date after which there was only spam. ----- Yury Katkov
Can I get a link to your site? I would love to take a look and write you that script (I always love a challenge).
If you give your script configuration variables or something along those lines for these different things, then you could release it and many people could be helped by it.
If you do decide to release it, I would cross-post to the mailing list for MediaWiki administrators as well. I'm sure someone on there could use it.
Thank you, Derric Atzrott
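One way the suggested configuration variables could look is a small settings object loaded from JSON, so the same script can be pointed at different wikis. The field names and the JSON layout here are assumptions for illustration, not taken from any released script.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CleanupConfig:
    """Per-wiki settings; all field names are hypothetical."""
    cutoff: datetime          # edits after this date are suspect
    trusted_users: frozenset  # accounts that are never touched
    dry_run: bool = True      # default to reporting only


def load_config(text):
    """Parse a JSON document into a CleanupConfig."""
    raw = json.loads(text)
    return CleanupConfig(
        cutoff=datetime.strptime(raw["cutoff"], "%Y-%m-%d"),
        trusted_users=frozenset(raw.get("trusted_users", [])),
        dry_run=bool(raw.get("dry_run", True)),
    )


# Made-up example settings.
EXAMPLE = '{"cutoff": "2012-03-01", "trusted_users": ["Alice", "Bob"]}'

if __name__ == "__main__":
    print(load_config(EXAMPLE))
```

Defaulting dry_run to True is a deliberate choice in this sketch: a cleanup tool that deletes pages should have to be told explicitly to do so.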
I've got a script but would like to test it before I make it public. If someone has a site with spam and would let me test it, it would be appreciated.
I do! http://wiki.sittv.com has been building up spam for a number of months (or longer).
-- Tyler Romeo, Stevens Institute of Technology, Class of 2015, Major in Computer Science, www.whizkidztech.com | tylerromeo@gmail.com
Tyler, how are the results? John, can you upload it to some repository? Google Code, GitHub?
P.S. Sorry for that super-late response, I appreciate your effort to help! ----- Yury Katkov
His wiki is clean. I've found that the scripts require tweaking for each wiki.
I think that it's possible to make the tool more generic. Can you put the scripts in a repository? I'm giving a talk on fighting spam at the Semantic MediaWiki conference and would be glad to include your bot as an example of a solution to this common problem. ----- Yury Katkov
On Sat, Oct 6, 2012 at 5:42 PM, John phoenixoverride@gmail.com wrote:
His wiki is clean, Ive found that the scripts require tweaking for each wiki
On Sat, Oct 6, 2012 at 9:21 AM, Yury Katkov katkov.juriy@gmail.com wrote:
Tyler, how are the results? John, can you upload it on some repository? Google code, github?
P.S. Sorry for that super-late response, I appreciate your effort to help!
Yury Katkov
On Fri, Aug 24, 2012 at 8:56 PM, Tyler Romeo tylerromeo@gmail.com wrote:
I do! http://wiki.sittv.com has been building up spam for a number of months (or longer).
-- Tyler Romeo, Stevens Institute of Technology, Class of 2015, Major in Computer Science, www.whizkidztech.com | tylerromeo@gmail.com
Indeed. The script eliminated some tens of thousands of spam pages among only ~400 actual content pages. It was not perfect (there were still a few pages that had spam on them), but it definitely worked amazingly and did not have any false positives that I am aware of.
-- Tyler Romeo, Stevens Institute of Technology, Class of 2015, Major in Computer Science, www.whizkidztech.com | tylerromeo@gmail.com
Be aware that by default InnoDB uses a file called ibdata1 for all of its data storage. When you remove data from the database, InnoDB does not shrink ibdata1. So even if you reduce your 3GB database to under 1GB, the freed space is only reused internally: you have room for about 2GB of content to be added before ibdata1 grows again.
The actual size on disk that your database takes up will likely remain at 3GB.
So if you really want to reduce on-disk size, exporting and re-importing at least your raw database becomes necessary at some point, since InnoDB will never give you that disk space back.
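The export and re-import step could be sketched roughly as follows. This only builds the argument lists for the standard `mysqldump` and `mysql` command-line clients; the database names, user name, and file name are placeholders, and credentials are assumed to come from an option file such as `~/.my.cnf`. It does not by itself shrink a shared ibdata1: the dump has to be loaded into a fresh server, or ibdata1 has to be recreated while the server is stopped, and that part is not shown.

```python
def dump_command(database, user, dump_file):
    """Build the argv for dumping one database to a SQL file with mysqldump."""
    return [
        "mysqldump",
        "--single-transaction",         # consistent snapshot of InnoDB tables
        f"--user={user}",
        f"--result-file={dump_file}",   # write the dump to this file
        database,
    ]


def reload_command(database, user):
    """Build the argv for the mysql client; the dump file is fed on stdin."""
    return ["mysql", f"--user={user}", database]
```

These lists can be passed to `subprocess.run(..., check=True)`, opening the dump file and passing it as `stdin` for the reload step.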
On Fri, 24 Aug 2012 08:54:26 -0700, Yury Katkov katkov.juriy@gmail.com wrote:
http://www.mediawiki.org/wiki/Manual:Reduce_size_of_the_database and here is the manual on how to purge the archive database! Thanks John, that's a perfect solution!
Yury Katkov
On Fri, Aug 24, 2012 at 7:51 PM, John phoenixoverride@gmail.com wrote:
What can be done after mass deleting is to purge the archive database table, which should reduce the database size significantly. If you take a look at the example where I cleaned up an existing site, I reduced the database size by about 90%.
On Fri, Aug 24, 2012 at 11:47 AM, Yury Katkov katkov.juriy@gmail.com wrote:
Hi everyone! I agree with everyone in this thread, but the main problem is that even if I create a bot or use extensions that remove pages, the actual database records won't be deleted. If I understand correctly, the MediaWiki philosophy tells us that we cannot just drop a page or an account from the database: deletion only means that we hide those nasty spam pages.
Consequently, after the deletions my database won't shrink to its original 100 MB; it remains around 3 GB, which is a problem for hosting.
The proposed solution of exporting all the pages to a brand new wiki solves this problem. Are there any other solutions that do not involve dropping my old spammed database?
Yury Katkov
On Fri, Aug 24, 2012 at 4:13 PM, John phoenixoverride@gmail.com wrote:
Given enough facts, it would be rather easy for me to write a script that nukes said spam. I did something similar on http://manual.fireman.com.br/wiki/Especial:Registro/Betacommand
Technically speaking, pages and accounts can be permanently deleted. (There is an extension for it I believe.) However, since MediaWiki does not use foreign keys, you have to be careful not to break things in the process.
-- Tyler Romeo, Stevens Institute of Technology, Class of 2015, Major in Computer Science, www.whizkidztech.com | tylerromeo@gmail.com
Export everything, filter out revisions newer than the spam start, import in a new db.
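The filtering step could look roughly like the sketch below. It assumes a full-history XML export (for example one produced by MediaWiki's `dumpBackup.php --full`) that is small enough to hold in memory, and a known date at which the spam started. The function name `drop_revisions_after` is made up for this example. Pages left with no revisions, i.e. pages created by spammers, are dropped entirely.

```python
import xml.etree.ElementTree as ET


def drop_revisions_after(xml_text, cutoff):
    """Return export XML without revisions timestamped at or after `cutoff`.

    `cutoff` is a UTC string such as "2012-03-01T00:00:00Z". Export
    timestamps use that same fixed format, so plain string comparison
    orders them correctly.
    """
    root = ET.fromstring(xml_text)
    # The export namespace differs between MediaWiki versions, so read it
    # off the root tag instead of hard-coding it.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    if ns:
        # Keep the namespace as the default one when serializing again.
        ET.register_namespace("", ns[1:-1])
    for page in list(root.findall(ns + "page")):
        for rev in list(page.findall(ns + "revision")):
            # Revisions without a timestamp compare as "" and are kept.
            if rev.findtext(ns + "timestamp", default="") >= cutoff:
                page.remove(rev)
        # A page with no remaining revisions existed only as spam.
        if page.find(ns + "revision") is None:
            root.remove(page)
    return ET.tostring(root, encoding="unicode")
```

The filtered XML can then be imported into a fresh wiki, for instance with `importDump.php`. Legitimate edits made after the cutoff would be lost too, so this fits best when the wiki was abandoned before the spam began.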
wikitech-l@lists.wikimedia.org