On Tue, 24 Jul 2012 08:08:23 -0700, Platonides <Platonides(a)gmail.com>
wrote:
On 24/07/12 16:11, matanya wrote:
Over the last few months, the spam rate stewards
deal with has been rising.
Can you provide references?
What is the basis of the spam/work to do? Maybe we could make their
lives easier through creating a new tool, or better anti-spam measures.
I suggest we implement a new mechanism:
Instead of giving the user a CAPTCHA to solve, give him an image from
Commons and ask him to add a brief description in his own language.
We can give him two images, one with a known description and the other
with an unknown one. After enough users translate the unknown one in the
same way, we can use it as a verified translation. We rely on the known
image's description to decide whether to allow the user to create the
account.
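To make the agreement step concrete, here is a minimal sketch in Python.
The function name `verified_description`, the `min_votes` threshold of 3,
and the case/whitespace normalization are all assumptions made for
illustration; none of them come from an existing extension.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def verified_description(submissions, min_votes=3):
    """Return the most common normalized description once at least
    `min_votes` users agree on it, otherwise None.

    `min_votes` is a hypothetical threshold, not a recommended value.
    """
    counts = Counter(normalize(s) for s in submissions)
    if not counts:
        return None
    best, votes = counts.most_common(1)[0]
    return best if votes >= min_votes else None


# Hypothetical submissions for one image with an unknown description.
submissions = ["Royal Palace", "royal  palace", "Royal palace", "a house"]
print(verified_description(submissions))
```

Note that this only counts answers that match after normalization, so it
depends entirely on users phrasing the description the same way.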
What if the known image isn't described the same way?
Even assuming that we provide them the English description (so that they
know what it is, e.g. it's not a "house" but the Royal Palace of XYZ!),
and that all our users understand English well enough to make a
translation, not all translations will be the same.
Suppose we get these different results: Fairytale graphic illustration,
House of the grandmother of Little Red Riding Hood, House of Little Red
Riding Hood granny, Picture of Perrault fairytale about Little Red
Riding Hood, Image of Little Red Riding Hood story from Grimms' Fairy
Tales.
It wouldn't be that bad to have differing _proposed descriptions_. But
that is not so for the check-description, where you would need to guess
how it was translated previously by others (even if all translations were
fair and accurate, with no misspellings at all).
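As a rough illustration of the matching problem, here is a small Python
sketch that compares the example descriptions above by word overlap
(Jaccard similarity). The helper names `tokens` and `jaccard` are mine,
and word overlap is only one assumed way of scoring a match, not
something the proposal specifies.

```python
def tokens(description):
    """Split a description into a set of lowercase words."""
    return set(description.lower().split())


def jaccard(a, b):
    """Word-set overlap between two descriptions, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


answers = [
    "Fairytale graphic illustration",
    "House of the grandmother of Little Red Riding Hood",
    "House of Little Red Riding Hood granny",
    "Picture of Perrault fairytale about Little Red Riding Hood",
    "Image of Little Red Riding Hood story from Grimms' Fairy Tales",
]

# Treat the second answer as the stored check-description and score the rest.
reference = answers[1]
for answer in answers:
    print(f"{jaccard(reference, answer):.2f}  {answer}")
```

An exact-match check rejects four of the five answers, and the first one
shares no words at all with the reference even though it is a fair
description of the same picture.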
Is it possible to embed a file from Commons in the login page? Is it
possible to parse the entered text and store it?
Yes and yes (with some effort dedicated to making it happen, of course).
Benefits:
A) It would be harder for bots to create automated accounts.
B) We will get translations to many languages with little effort from
the users signing up.
What do you think?
I agree with (B) in that we would get many translations (although
probably low-quality ones).
I am not so sure about (A). If the accounts are being created by bots,
the captcha should be changed to stop them and/or new mechanisms (such as
throttles) created.
If they are handmade, I see little difference from a spammer POV. Making
up a description is harder than typing a word, but we would need to dumb
the process down, so not a big difference. And in little time they would
learn how to game the system.
As for moving it forward, I think learning from entered values should
be done in a generic way, and then the "recaptcha" proposal for helping
Wikisource implemented. Your idea could be added later on (I see those
flaws, though).
Regards
I see flaws in this idea.
Firstly, every image is going to need a pre-existing text we can match
against for this to work. We can't go purely off of what is entered into
the captcha, because the first few users who are given an image will have
no text to match against. We would annoy a lot of users by giving false
negatives and rejecting edits from valid users.
Given that we will already have text out in the open, this captcha will be
trivial to attack. It will be a fairly trivial job to iterate over all of
the Commons images, and save a hash of the image and a description
extracted from the image page into a key/value database.
After that, when the bot is served an image it would hash the image, look
up the value for the hash key in its own database, and then use that
description, bypassing the captcha.
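To show how little is involved, here is a minimal Python sketch of that
lookup. The catalog is stand-in in-memory data rather than anything
fetched from Commons, the function names are hypothetical, and using
SHA-256 over the raw bytes is my assumption; any stable hash would do.

```python
import hashlib


def fingerprint(image_bytes):
    """Exact-match key: SHA-256 hex digest of the raw image bytes."""
    return hashlib.sha256(image_bytes).hexdigest()


def build_index(catalog):
    """Build the key/value store from (image_bytes, description) pairs."""
    return {fingerprint(image): text for image, text in catalog}


def answer_challenge(index, served_image):
    """Return the stored description, or None if the image is unknown."""
    return index.get(fingerprint(served_image))


# Stand-in data: fake "image" bytes paired with their public descriptions.
catalog = [
    (b"fake-image-bytes-1", "Royal Palace of XYZ"),
    (b"fake-image-bytes-2", "House of Little Red Riding Hood granny"),
]
index = build_index(catalog)

print(answer_challenge(index, b"fake-image-bytes-1"))
print(answer_challenge(index, b"never-seen-before"))
```

The index is built once, and each challenge after that costs one hash and
one dictionary lookup.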
Even if you were to try resizing and slightly distorting an image I don't
think it would help much. Image comparison is much easier than the task of
extracting text from an image. Bots are liable to have access to some
algorithm for creating fingerprints of images that will match even after
you try to make slight changes to it.
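One simple fingerprint of that kind is an "average hash". The Python
sketch below is only an illustration under stated assumptions: it takes
an image that has already been reduced to a tiny grid of grayscale
values (a real tool would downscale first), and the pixel values are
made up.

```python
def average_hash(pixels):
    """One bit per pixel: 1 if brighter than the image's mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return [1 if p > mean else 0 for p in flat]


def hamming(a, b):
    """Number of positions where two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


# Made-up 2x2 grayscale grids standing in for downscaled images.
original = [[10, 200], [220, 30]]
brightened = [[p + 5 for p in row] for row in original]  # slight distortion
unrelated = [[200, 10], [30, 220]]

print(hamming(average_hash(original), average_hash(brightened)))
print(hamming(average_hash(original), average_hash(unrelated)))
```

A uniform brightness shift moves every pixel and the mean together, so
the fingerprint does not change, while an unrelated image differs in many
bits. A bot would accept any stored fingerprint within a small Hamming
distance instead of requiring an exact hash match.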
But lastly, there is a very important fact in captcha cracking you're
missing. Human-aided captcha cracking already exists. No matter how hard
you make it for a computer to understand captchas there are already bots
breaking captchas by sending the captcha back to some home server, giving
the captcha to either a badly paid turk worker or tricking some person
wanting to look at porn into solving the captcha, then sending the
solution back to the bot and breaking through the captcha.
At this point "Pick the kitten" captchas will be less vulnerable than this
proposal. (And even those can be broken with a human in the mix).
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [