Seems like the spammers have found the web equivalent of an SMTP open relay.
For example: [http://wiki.cs.uiuc.edu/VisualWorks/DOWNLOAD/sb/index.htm sitz bath] [http://www.buddy4u.com/view/?u=monophonic+ringtone monophonic ringtone] [http://www.buddyprofile.com/viewprofile.php?username=nextelringtone nextel ringtone]
These are links to legitimate sites that perform poor input validation... The spammers have managed to convert the pages into HTTP redirects.
Because of how the various search engines work, a link to a redirect page is just as good as a link to the redirect target.
Since the spammers can make an effectively unlimited number of unique URLs at these sites, blocking exact URLs is pointless. So right now our only choices are to block legitimate sites, because their poor hygiene allows them to be used as spam bouncers, or to allow ourselves to be spammed with these sites and contribute to the declining usefulness of the internet.
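To make the mechanism concrete, here is a minimal sketch of the kind of redirect script being abused (the filename and parameter are invented for illustration; this is not the code running on the sites linked above):

<?php
// redirect.php -- hypothetical sketch of an open redirect (not the code
// on the sites above). Whatever is passed in ?u= is bounced to verbatim,
// so a spammer can mint unlimited distinct URLs on this host that all
// land on their own site.
header( 'Location: ' . $_GET['u'] );

// e.g. http://example.com/redirect.php?u=http://spammer.example/
// answers with a 302 pointing at the spammer's page.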
Things like this make nofollow more attractive all the time. Has there ever been any discussion on perhaps allowing a white-list for non-spam sites that we won't no-follow? This would be useful for wikis that don't want to kill all their externals with nofollow.
Gregory Maxwell wrote:
Things like this make nofollow more attractive all the time. Has there ever been any discussion on perhaps allowing a white-list for non-spam sites that we won't no-follow? This would be useful for wikis that don't want to kill all their externals with nofollow.
Or at least a second blacklist for sites that can be linked to but will have rel="nofollow" applied.
Also, shouldn't we enable nofollow for talk pages? For example, if this discussion had been on a talk page, those example links above would now be indexed by search engines and thus help the spammers.
On 4/10/06, Ilmari Karonen nospam@vyznev.net wrote:
Gregory Maxwell wrote:
Things like this make nofollow more attractive all the time. Has there ever been any discussion on perhaps allowing a white-list for non-spam sites that we won't no-follow? This would be useful for wikis that don't want to kill all their externals with nofollow.
Or at least a second blacklist for sites that can be linked to but will have rel="nofollow" applied.
Also, shouldn't we enable nofollow for talk pages? For example, if this discussion had been on a talk page, those example links above would now be indexed by search engines and thus help the spammers.
I'm not sure how to implement black- or whitelisting for nofollow... :( Right now we do our URL blacklist check at page submission; that's not a fast path, so we can do computationally expensive things like applying a long list of regexes. To black- or whitelist URLs for nofollow we'd need to perform the check at page load, which might not be acceptable. The only alternatives I can see involve complex changes. For example, we could extend our external link syntax so that you must explicitly tag links in order to prevent them from being no-followed, and the black- or whitelist would control which links are allowed to be tagged.
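As a rough sketch of the contrast being described (illustrative function and variable names, not the actual MediaWiki or spam-filter code): the save path can afford a loop over a long regex list because it runs once per edit, while anything at render time would run for every external link on every page view.

<?php
// Save-time path (sketch): one scan of the submitted text against a long
// regex blacklist is affordable because it only happens once per edit.
function editHitsBlacklist( $text, array $blacklistRegexes ) {
    foreach ( $blacklistRegexes as $regex ) {
        if ( preg_match( $regex, $text ) ) {
            return true; // reject or flag the edit
        }
    }
    return false;
}
// A nofollow black/whitelist, by contrast, would have to be consulted per
// external link at render time, unless the result were cached in the
// parser output -- that's the cost question raised above.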
Can a developer please comment on the idea of applying an external-links filter at page load time?
Nofollow per namespace would actually be really good. If we had per-namespace nofollow right now I'd enable it everywhere except the main namespace ... you can argue that the main ns links are reviewed and are of some quality, but that doesn't extend to anywhere else.
On 4/10/06, Gregory Maxwell gmaxwell@gmail.com wrote:
I'm not sure how to implement black- or whitelisting for nofollow... :( Right now we do our URL blacklist check at page submission; that's not a fast path, so we can do computationally expensive things like applying a long list of regexes. To black- or whitelist URLs for nofollow we'd need to perform the check at page load, which might not be acceptable. The only alternatives I can see involve complex changes. For example, we could extend our external link syntax so that you must explicitly tag links in order to prevent them from being no-followed, and the black- or whitelist would control which links are allowed to be tagged.
I was thinking something similar once.

* Deny all links that are not wrapped in a {{url}} template.
* Deny {{url}} template to anons. (contentious) Perhaps allow white-listed anons? Or allow some other template which generates a comment rather than an actual link?
* Add "nofollow" to all links that are not wrapped in {{urlfollow}}.
* Deny {{urlfollow}} to everyone except confirmed users (reg + 10 days or something).
Though there's something I don't like about people deciding on a case-by-case basis whether to use nofollow or not.
Perhaps, alternatively, devise a mechanism whereby URLs automatically become "follow" links after some time period, like 5 days.
Steve
On 4/10/06, Steve Bennett stevage@gmail.com wrote:
I was thinking something similar once.
- Deny all links that are not wrapped in a {{url}} template.
- Deny {{url}} template to anons. (contentious) Perhaps allow white-listed anons? Or allow some other template which generates a comment rather than an actual link?
- Add "nofollow" to all links that are not wrapped in {{urlfollow}}.
- Deny {{urlfollow}} to everyone except confirmed users (reg + 10 days or something).
FWIW, anons are not a real spam liability... anyone competent enough to be a significant annoyance is able to make an account.
The template solution you've proposed is effectively an extension to the wikitext syntax... I don't see a reason to overload the template system for that purpose... just extend the syntax if that's what we really want.
Though there's something I don't like about people deciding on a case-by-case basis whether to use nofollow or not.
Perhaps, alternatively, devise a mechanism whereby URLs automatically become "follow" links after some time period, like 5 days.
"And then a miracle occurs"... Making that scale will be a challenge. ...and still doesn't cover the cases of spam on namespaces that get less attention.
On 4/10/06, Gregory Maxwell gmaxwell@gmail.com wrote:
FWIW, anons are not a real spam liability... anyone competent enough to be a significant annoyance is able to make an account.
Ok, replace "anon" with some risk-based metric.
The template solution you've proposed is effectively an extension to the wikitext syntax... I don't see a reason to overload the template system for that purpose... just extend the syntax if that's what we really want.
Is it? I'm seeing how much we can do without having to get a developer to do anything :)
Perhaps, alternatively, devise a mechanism whereby URLs automatically become "follow" links after some time period, like 5 days.
"And then a miracle occurs"... Making that scale will be a challenge. ...and still doesn't cover the cases of spam on namespaces that get less attention.
True.
Steve
Gregory Maxwell wrote:
On 4/10/06, Ilmari Karonen nospam@vyznev.net wrote:
Also, shouldn't we enable nofollow for talk pages? For example, if this discussion had been on a talk page, those example links above would now be indexed by search engines and thus help the spammers.
Nofollow per namespace would actually be really good. If we had per-namespace nofollow right now I'd enable it everywhere except the main namespace ... you can argue that the main ns links are reviewed and are of some quality, but that doesn't extend to anywhere else.
Ask and ye shall receive: http://bugzilla.wikimedia.org/show_bug.cgi?id=5523
On 4/10/06, Ilmari Karonen nospam@vyznev.net wrote:
Gregory Maxwell wrote:
On 4/10/06, Ilmari Karonen nospam@vyznev.net wrote:
Also, shouldn't we enable nofollow for talk pages? For example, if this discussion had been on a talk page, those example links above would now be indexed by search engines and thus help the spammers.
Nofollow per namespace would actually be really good. If we had per-namespace nofollow right now I'd enable it everywhere except the main namespace ... you can argue that the main ns links are reviewed and are of some quality, but that doesn't extend to anywhere else.
Ask and ye shall receive: http://bugzilla.wikimedia.org/show_bug.cgi?id=5523
The patch seems obviously correct. I've tested it here with $wgNoFollowNsExceptions = array(0); and it works just fine.
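For readers who haven't looked at bug 5523, the effect is roughly the following sketch. $wgNoFollowLinks and $wgNoFollowNsExceptions are the real setting names; the helper function is invented for illustration and is not the patch itself.

<?php
// What Gregory tested in LocalSettings.php: nofollow everywhere except
// the main (article) namespace, which is namespace 0.
$wgNoFollowLinks        = true;
$wgNoFollowNsExceptions = array( 0 );

// The decision the patch adds is roughly this: apply rel="nofollow" only
// when nofollow is enabled and the page's namespace is not in the
// exception list.
function useNofollow( $namespace, $noFollowLinks, $nsExceptions ) {
    return $noFollowLinks && !in_array( $namespace, $nsExceptions );
}

echo useNofollow( 1, $wgNoFollowLinks, $wgNoFollowNsExceptions ) ? "nofollow\n" : "follow\n"; // Talk: -> nofollow
echo useNofollow( 0, $wgNoFollowLinks, $wgNoFollowNsExceptions ) ? "nofollow\n" : "follow\n"; // main  -> follow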
It would be great to see this make it into the main branch; it would remove some of the incentive for things like: http://en.wikipedia.org/w/index.php?title=Talk:Timpani&diff=22299267&... (which wasn't found and removed for over a month)
From the angle of being a good netizen, this would appear to be an ideal compromise.
Gregory Maxwell <gmaxwell@...> writes:
Right now we do our URL blacklist check at page submission; that's not a fast path, so we can do computationally expensive things like applying a long list of regexes. To black- or whitelist URLs for nofollow we'd need to perform the check at page load, which might not be acceptable. The only alternatives I can see involve complex changes.
Well, I'm a developer wanna-be and not a real developer :-) but if we set aside for a moment the policy-type questions about whitelisting and nofollow, whitelisting does not need to be the same process as blacklisting. For blacklisting we need fancy regexes because people will try to get around the list; for whitelisting we could probably define a much simpler validation (to be applied when the parser parses the URL), because if someone is posting a URL that is whitelisted, they will know what URL they're trying to match. There's still an issue of scale - checking against thousands or tens of thousands of whitelisted URLs may not be feasible - but calling "in_array", for example, is probably *much* faster than bunches of regexes.
-Aerik
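A minimal sketch of the kind of check Aerik describes (the function name, hook point, and host list are invented for illustration). Keying the whitelist by host and using isset() is even cheaper than in_array(), since it is a hash lookup rather than a linear scan:

<?php
// Hypothetical whitelist check applied where the parser turns an external
// link into HTML.
$followWhitelist = array(
    'www.w3.org'       => true,
    'en.wikipedia.org' => true,
);

function urlIsWhitelisted( $url, array $followWhitelist ) {
    $parts = parse_url( $url );
    if ( !is_array( $parts ) || !isset( $parts['host'] ) ) {
        return false;
    }
    // Exact host match: a cheap hash lookup, no regexes involved.
    return isset( $followWhitelist[ strtolower( $parts['host'] ) ] );
}

var_dump( urlIsWhitelisted( 'http://en.wikipedia.org/wiki/Spam', $followWhitelist ) ); // bool(true)
var_dump( urlIsWhitelisted( 'http://spam.example/redirect?u=x',  $followWhitelist ) ); // bool(false)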
On 4/10/06, Aerik aerik@thesylvans.com wrote:
Well, I'm a developer wanna-be and not a real developer :-) but if we set aside for a moment the policy-type questions about whitelisting and nofollow, whitelisting does not need to be the same process as blacklisting. For blacklisting we need fancy regexes because people will try to get around the list; for whitelisting we could probably define a much simpler validation (to be applied when the parser parses the URL), because if someone is posting a URL that is whitelisted, they will know what URL they're trying to match. There's still an issue of scale - checking against thousands or tens of thousands of whitelisted URLs may not be feasible - but calling "in_array", for example, is probably *much* faster than bunches of regexes.
Right, right... There are all sorts of ways to make set membership testing fast... especially if "probably a member" is good enough (which it is for us).
Really then, whitelisting becomes the same as the redlink/bluelink challenge... There are 5.9 million external links currently in enwiki vs. ~3.8 million pages, so it's obviously not intractable.
Still, we come back to the question... is it worth it? Why not just set nofollow on all links and save our spam-fighting resources to deal with the people who aren't just concerned with SEO?
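The classic trick behind the "probably a member is good enough" remark above is a Bloom filter. A toy sketch of the shape of it (not anything from MediaWiki; false positives are possible and tunable via the bit-array size and hash count, false negatives are not):

<?php
// Toy Bloom filter -- illustration only.
class TinyBloom {
    private $bits;
    private $size;
    private $hashes;

    public function __construct( $sizeInBits = 1048576, $hashes = 4 ) {
        $this->size   = $sizeInBits;
        $this->hashes = $hashes;
        $this->bits   = str_repeat( "\0", $sizeInBits >> 3 );
    }

    private function positions( $key ) {
        $pos = array();
        for ( $i = 0; $i < $this->hashes; $i++ ) {
            // Derive several hash values from md5 by salting with $i.
            $h = hexdec( substr( md5( $i . ':' . $key ), 0, 7 ) );
            $pos[] = $h % $this->size;
        }
        return $pos;
    }

    public function add( $key ) {
        foreach ( $this->positions( $key ) as $p ) {
            $this->bits[ $p >> 3 ] = chr( ord( $this->bits[ $p >> 3 ] ) | ( 1 << ( $p & 7 ) ) );
        }
    }

    // false means "definitely not in the set"; true means "probably is".
    public function probablyContains( $key ) {
        foreach ( $this->positions( $key ) as $p ) {
            if ( !( ord( $this->bits[ $p >> 3 ] ) & ( 1 << ( $p & 7 ) ) ) ) {
                return false;
            }
        }
        return true;
    }
}

$whitelist = new TinyBloom();
$whitelist->add( 'en.wikipedia.org' );
var_dump( $whitelist->probablyContains( 'en.wikipedia.org' ) ); // bool(true)
var_dump( $whitelist->probablyContains( 'spam.example' ) );     // almost certainly bool(false)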