Following the mediawiki-l discussion (http://lists.wikimedia.org/pipermail/mediawiki-l/2013-November/042038.html) about $wgNoFollowLinks and various other discussions, in which some discontent was expressed with the current two options of either applying or not applying nofollow to all external links, I wanted to see what support there might be for applying nofollow only to external links added in revisions that are still unpatrolled (bug 42599, https://bugzilla.wikimedia.org/show_bug.cgi?id=42599).
How common do you think it would be for a use case to arise in which one could be confident that a revision's being patrolled means that the external links added in that revision have been adequately reviewed for spamminess? Nemo had mentioned "sysadmins would be interested in this only if their wiki has a strict definition of what's patrollable which matches the assumptions here." In my experience, spam is pretty easy to spot because the bots aren't very subtle about it.
I would think that if someone went around marking such obviously spammy edits as patrolled, and there were any bureaucrats around who cared about keeping spam off the wiki, that person's patrol rights would end up getting taken away. Spam is a form of vandalism, so it would fall under the duties of patrollers. At Wikipedia, RecentChanges patrollers are expected to be on the lookout for spam: https://en.wikipedia.org/wiki/Wikipedia:Recent_changes_patrol#Spam
On 11/17/2013 06:41 AM, Nathan Larson wrote:
I wanted to see what support there might be for applying nofollow only to external links added in revisions that are still unpatrolled (bug 42599)
I think I could support this.
After that wiki-spammer site last week, I went looking for various forums and such where these things are discussed and saw that they (wiki-spammers) don't seem to see nofollow as a real impediment to their work.
So, I won't say we should just drop nofollow, but it obviously doesn't put much in the way of spammers.
Mark.
On 17 November 2013 11:41, Nathan Larson nathanlarson3141@gmail.com wrote:
In my experience, spam is pretty easy to spot because the bots aren't very subtle about it.
I'm sure spam directed at, say, enwiki, would get very subtle very quickly if spammers thought there was a real chance of it being able to use enwiki's pagerank weight. Don't underestimate spammers' ability to learn and adapt.
--HM
On Mon, Nov 18, 2013 at 3:09 PM, Happy Melon happy.melon.wiki@gmail.com wrote:
I'm sure spam directed at, say, enwiki, would get very subtle very quickly if spammers thought there was a real chance of it being able to use enwiki's pagerank weight. Don't underestimate spammers' ability to learn and adapt.
+1. I think this would be a very bad idea. If we opened up external links to Google, I'm sure it would only be a matter of time before spammers started figuring out how to get revision reviewing rights. It's just a question of economics. Which is cheaper: paying an SEO company $5000 to improve your pagerank, or paying a Wiki-PR editor $100 to do the same (and probably more effectively)?
Ryan Kaldari
On 11/18/2013 04:39 AM, Happy Melon wrote:
I'm sure spam directed at, say, enwiki, would get very subtle very quickly if spammers thought there was a real chance of it being able to use enwiki's pagerank weight. Don't underestimate spammers' ability to learn and adapt.
Also +1; pagerank is a valuable thing and Wikipedia has lots of it. Spammers would be quick to find ways to cheat, lie and manipulate their way into tapping into it.
Right now, we are plagued with the spammers that are too desperate or stupid to care; if we turned nofollow off, they would all descend upon us like a plague of locusts.
-- Marc
I agree. While spammers are so pathetic they will do anything for page views, I have to admire (and detest) their ability to adapt in order to spread their nonsense.
Anything that slows them down, even in the slightest degree, is something I support and recommend.
Date: Mon, 18 Nov 2013 09:24:54 -0500 From: marc@uberbox.org To: wikitech-l@lists.wikimedia.org Subject: Re: [Wikitech-l] Applying nofollow only to external links added in revisions that are still unpatrolled
On 11/18/2013 04:39 AM, Happy Melon wrote:
I'm sure spam directed at, say, enwiki, would get very subtle very quickly if spammers thought there was a real chance of it being able to use enwiki's pagerank weight. Don't underestimate spammers' ability to learn and adapt.
Also +1; pagerank is a valuable thing and Wikipedia has lots of it. Spammers would be quick to find ways to cheat, lie and manipulate their way into tapping into it.
Right now, we are plagued with the spammers that are too desperate or stupid to care; if we turned nofollow off, they would all descend upon us like a plague of locusts.
-- Marc
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
I also agree.
Perhaps more importantly, I don't see any actual argument for *not* using nofollow. We're not here to drive pagerank for other websites, and our doing so can be harmful to those sites, or to the article subject.
Risker
On 18 November 2013 09:44, Arcane 21 arcane@live.com wrote:
I agree. While spammers are so pathetic they will do anything for page views, I have to admire (and detest) their ability to adapt in order to spread their nonsense.
Anything that slows them down, even in the slightest degree, is something I support and recommend.
On Mon, Nov 18, 2013 at 11:21 AM, Risker risker.wp@gmail.com wrote:
Perhaps more importantly, I don't see any actual argument for *not* using nofollow. We're not here to drive pagerank for other websites, and our doing so can be harmful to those sites, or to the article subject.
Wikipedia's purpose may not be to drive PageRank, but nonetheless I think the argument for using nofollow is pretty clear. Why would Wikipedia want to purposely make search engine results less useful? The question here is whether spammers are smart enough to get around us and boost their PageRank artificially, which seems to be the case.
-- Tyler Romeo, Stevens Institute of Technology, Class of 2016, Major in Computer Science
To aggregate some of the arguments and counter-arguments, I posted https://www.mediawiki.org/wiki/The_dofollow_FAQ and https://www.mediawiki.org/wiki/Manual:Costs_and_benefits_of_using_nofollow. It does seem, from my googling of what the owners of smaller wikis have to say about it, that nofollow is less popular outside of WMF with many of those wiki owners who have taken the time to analyze the issue. On the other hand, it could be that people who were happy with the default felt less dissatisfied with MediaWiki devs' decision and therefore didn't feel as much need to voice their opinions, since they had already gotten their way and didn't have to take any measures to override the default.
I do think the implications of changing how nofollow is applied are very different on, say, Wikipedia than they would be on a small or even medium-sized wiki where the average user watches RecentChanges instead of a watchlist. In a small town, you can leave your doors unlocked and get away with it because you don't have as much traffic coming through and the neighbors would notice and care about (for curiosity, if no other reason) the presence of anyone who seemed out of place. It's the same way on these small wikis; it's rare that anyone comes along to try to subtly add a spam link, and when they do, it's noticed. Likewise, if someone starts marking spammy edits as patrolled, that gets noticed.
Spambots are not yet able to be subtle, and getting accustomed to the norms of a wiki and becoming fluent enough in its language to fit in requires skilled labor that is more expensive than what is needed to simply pass a CAPTCHA. So, I think that putting dofollow on patrolled external links would be okay, especially on smaller wikis, as the patrol would stop the spambots from getting a pagerank boost and the labor costs would deter the subtler ones. Even on Wikipedia, those fighting spam can take advantage of the same economies of scale as those adding spam, such as using pattern recognition on the entire wiki to catch people, or blacklisting individual spammers and taking measures to keep them out (on the smaller wikis, a person caught spamming can just go to another wiki, but if you're caught spamming on Wikipedia, there isn't another site of Wikipedia's size and scope you can go to).
To say that patrolling wouldn't do enough to keep spam out is basically to say, at least to some extent, that patrolling is not a very effective system and that the wiki way doesn't work very well. If Google agrees, they can stop giving wikis in general, or certain wikis, such influence over pagerank. The spammers have market incentives to become more sophisticated, but so does Google, since their earnings depend on keeping their search results relevant and useful, so that people don't switch to competitors that do a better job.
The question of what the default configuration should be, or what configuration should be used on WMF sites, can be addressed in other bugs besides this one. It doesn't take much coding to change a default setting from "true" to "false". For now, I would just like to implement the feature and make it available for those wikis who want to use it. So, is there support for putting this in the core as an optional feature, and is there anyone who will do the code review if I write this?
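To make the proposal concrete, here is a minimal sketch in Python of the decision being discussed. It is not MediaWiki code: the `ExternalLink` model, the `nofollow_unpatrolled_only` option, and the function names are all hypothetical, and it assumes the software can already tell whether the revision that added a link has been patrolled.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class ExternalLink:
    """Hypothetical model: a URL plus the patrol state of the revision that added it."""
    url: str
    added_in_patrolled_revision: bool


def rel_for(link: ExternalLink, nofollow_unpatrolled_only: bool = False) -> str:
    """Return the rel attribute value for an external link.

    With the (hypothetical) option off, every external link gets nofollow,
    which mirrors the all-or-nothing behaviour when nofollow is enabled.
    With it on, only links from still-unpatrolled revisions get nofollow.
    """
    if nofollow_unpatrolled_only and link.added_in_patrolled_revision:
        return ""
    return "nofollow"


def render(link: ExternalLink, nofollow_unpatrolled_only: bool = False) -> str:
    """Render the link as an HTML anchor, omitting rel when it is empty."""
    rel = rel_for(link, nofollow_unpatrolled_only)
    rel_attr = f' rel="{rel}"' if rel else ""
    href = escape(link.url, quote=True)
    return f'<a href="{href}"{rel_attr}>{href}</a>'
```

Because the option defaults to off in this sketch, a wiki that never sets it would keep nofollow on every external link.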
On 11/18/2013 09:46 AM, Nathan Larson wrote:
I do think the implications of changing how nofollow is applied are very different on, say, Wikipedia than they would be on a small or even medium-sized wiki
As I said, at least for Google there should be no difference as it ignores rel=nofollow on MediaWiki-powered sites anyway. See https://bugzilla.wikimedia.org/show_bug.cgi?id=52617.
Gabriel
On Mon, Nov 18, 2013 at 12:58 PM, Gabriel Wicke gwicke@wikimedia.org wrote:
On 11/18/2013 09:46 AM, Nathan Larson wrote:
I do think the implications of changing how nofollow is applied are very different on, say, Wikipedia than they would be on a small or even medium-sized wiki
As I said, at least for Google there should be no difference as it ignores rel=nofollow on MediaWiki-powered sites anyway. See https://bugzilla.wikimedia.org/show_bug.cgi?id=52617.
Gabriel
Do we have any way of knowing that Yong-Gang Wang of Google is correct about this? I sent a message to this individual, https://plus.google.com/105349418663822362024/about (hopefully it's the same guy), asking for more information. It seems like a pretty major departure from past Google policy/practice.
On 11/18/2013 10:11 AM, Nathan Larson wrote:
Do we have any way of knowing that Yong-Gang Wang of Google is correct about this? I sent a message to this individual, https://plus.google.com/105349418663822362024/about (hopefully it's the same guy), asking for more information. It seems like a pretty major departure from past Google policy/practice.
I think it is highly likely that he is correct about this. Professional spammers will likely monitor the effect of their campaigns closely, so would know about this first. I would expect less wiki spam if rel=nofollow was actually honored. Especially hidden (unclickable) links don't have much value apart from page rank.
Gabriel
On 2013-11-18 2:19 PM, "Gabriel Wicke" gwicke@wikimedia.org wrote:
I think it is highly likely that he is correct about this. Professional spammers will likely monitor the effect of their campaigns closely, so would know about this first. I would expect less wiki spam if rel=nofollow was actually honored. Especially hidden (unclickable) links don't have much value apart from page rank.
Gabriel
That certainly sounds logical for Wikipedia and friends. However, it sounds kind of odd for MediaWiki in general. There exist many unmaintained MW installs just collecting spam.
It would also be interesting to know if other search engines do something similar.
-bawolff
On 18 November 2013 17:46, Nathan Larson nathanlarson3141@gmail.com wrote:
If Google agrees, they can stop giving wikis in general, or certain wikis, such influence over pagerank. The spammers have market incentives to become more sophisticated, but so does Google, since their earnings depend on keeping their search results relevant and useful, so that people don't switch to competitors that do a better job.
Market forces are not our friend. Google's incentive is to *ignore* spammy links, not to stop them existing; spammers' incentive is to get their links wherever they possibly can, and particularly in the places where they're effective, not to avoid putting links where they're not effective. Pure market forces would leave wikis (large and small) attacked by progressively more sophisticated spam, search engines being progressively smarter about ignoring the spam, and wikis *still being served with as much spam as before* (and it being progressively harder to identify and remove).
Wikis can only participate in the arms race by exposing publicly the *extent* to which spamming is pointless. Google publicising the fact that nofollow is ignored (and hence spamming is pointful) is actually a really unhelpful thing for them to do. If they really have taken the nofollow weapon away from wikis altogether, then we need to find a way to get it back.
--HM
On 11/18/2013 01:03 PM, Happy Melon wrote:
wikis *still being served with as much spam as before* (and it being progressively harder to identify and remove).
I do think the implications of changing how nofollow is applied are very different on, say, Wikipedia than they would be on a small or even medium-sized wiki where the average user watches RecentChanges instead of a watchlist. In a small town, you can leave your doors unlocked and get away with it because you don't have as much traffic coming through and the neighbors would notice and care about (for curiosity, if no other reason) the presence of anyone who seemed out of place. It's the same way on these small wikis; it's rare that anyone comes along to try to subtly add a spam link, and when they do, it's noticed. Likewise, if someone starts marking spammy edits as patrolled, that gets noticed.
That's actually the opposite of what I expect. Small wikis have far fewer resources to deal with spam, so the per capita spam is significantly larger (imho).
The question of what the default configuration should be, or what configuration should be used on WMF sites, can be addressed in other bugs besides this one. It doesn't take much coding to change a default setting from "true" to "false". For now, I would just like to implement the feature and make it available for those wikis who want to use it. So, is there support for putting this in the core as an optional feature, and is there anyone who will do the code review if I write this?
If there conceivably exist third-party users who want such a feature, I (speaking just for myself) see no problem with having it as an off-by-default feature in core.
-bawolff
On 11/17/2013 03:41 AM, Nathan Larson wrote:
Following the mediawiki-l discussion (http://lists.wikimedia.org/pipermail/mediawiki-l/2013-November/042038.html) about $wgNoFollowLinks and various other discussions, in which some discontent was expressed with the current two options of either applying or not applying nofollow to all external links, I wanted to see what support there might be for applying nofollow only to external links added in revisions that are still unpatrolled (bug 42599, https://bugzilla.wikimedia.org/show_bug.cgi?id=42599).
Google and probably other search engines have a custom rule to ignore rel=nofollow in MediaWiki-powered wikis. It seems that our external links are too high quality to pass up. See https://bugzilla.wikimedia.org/show_bug.cgi?id=52617.
Gabriel
On 18 November 2013 11:47, Gabriel Wicke gwicke@wikimedia.org wrote:
Google and probably other search engines have a custom rule to ignore rel=nofollow in MediaWiki-powered wikis. It seems that our external links are too high quality to pass up. See https://bugzilla.wikimedia.org/show_bug.cgi?id=52617.
Oh dear. This becomes a philosophical versus practical discussion. Practically, we have nowhere near enough spam-fighters to keep just the obvious spam off our projects, let alone the not-as-obvious spam.
People keep mixing up English Wikipedia (with thousands of active editors, many of whom do nothing but page patrolling) with the rest of the Wikimedia projects, many of which have only a handful of active editors, who then get stuck having to choose between spam-fighting or adding content. Software decisions should not be made based on the assumption that some editor somewhere will clean up the problems.
Given the ease with which all MediaWiki wikis can be infiltrated by useless and spam links, and the particular ease with which most Wikimedia wikis can be infiltrated, Google's pretty badly polluting their pageranks if they're giving links from Wikimedia projects any significant rank.
To be honest, I suspect if the Google fellow said anything like this, it was that they might ignore nofollow on Wikimedia wikis, but I'm pretty certain that he didn't say MediaWiki wikis. There are thousands and thousands of them out there that have been completely abandoned to spam.
Risker
On 11/18/2013 12:27 PM, Risker wrote:
To be honest, I suspect if the Google fellow said anything like this, it was that they might ignore nofollow on Wikimedia wikis, but I'm pretty certain that he didn't say Mediawiki wikis.
I remember being surprised too that it applied to all MediaWiki installations rather than just Wikimedia sites. I have pinged him about it.
Gabriel
On 18 November 2013 21:59, Gabriel Wicke gwicke@wikimedia.org wrote:
On 11/18/2013 12:27 PM, Risker wrote:
To be honest, I suspect if the Google fellow said anything like this, it was that they might ignore nofollow on Wikimedia wikis, but I'm pretty certain that he didn't say Mediawiki wikis.
I remember being surprised too that it applied to all MediaWiki installations rather than just Wikimedia sites. I have pinged him about it.
I run a small but public MW installation and I've seen others overrun by spam. I appreciate the noble intent in removing nofollow by default, but as one of those third-party users, I'd still rather it didn't change. If your intent is to change it on en:wp, that's a local config switch.
- d.
On 11/18/2013 01:59 PM, Gabriel Wicke wrote:
On 11/18/2013 12:27 PM, Risker wrote:
To be honest, I suspect if the Google fellow said anything like this, it was that they might ignore nofollow on Wikimedia wikis, but I'm pretty certain that he didn't say Mediawiki wikis.
I remember being surprised too that it applied to all MediaWiki installations rather than just Wikimedia sites. I have pinged him about it.
I have now received an answer from my contact at Google:
Google will not follow rel=nofollow links, and not flow pagerank through them. That includes Wiki{m,p}edia sources.
So the information I got at Wikimania was either not correct or the result of a misunderstanding on my part. Another possibility is that this detail of how pagerank works is considered too sensitive for publication.
It should not be too hard to verify this independently by setting up a fresh page with an unguessable URL and linking it from a wiki page with rel=nofollow. If googlebot visits that page (or it turns up in search results), then rel=nofollow was ignored.
Gabriel
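The verification idea above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the helper names, the `/probe-` prefix, and the example.org host are hypothetical, the log lines are assumed to include both the request path and the user-agent string (as in the common "combined" access-log format), and a user-agent match alone does not prove the visitor really was Googlebot.

```python
import secrets


def make_probe_path(prefix: str = "/probe-") -> str:
    """Build a URL path with 128 bits of randomness so it cannot be guessed."""
    return f"{prefix}{secrets.token_urlsafe(16)}.html"


def nofollow_link(base_url: str, path: str) -> str:
    """HTML for the link to place on the wiki page.

    The probe page must not be linked from anywhere else, or a crawler
    visit would not tell us anything about rel=nofollow.
    """
    return f'<a rel="nofollow" href="{base_url}{path}">probe</a>'


def crawler_hits(log_lines, path: str, agent: str = "Googlebot"):
    """Return the access-log lines that request `path` with `agent` in the user agent."""
    return [line for line in log_lines if path in line and agent in line]


if __name__ == "__main__":
    path = make_probe_path()
    print(nofollow_link("https://example.org", path))
    # Later, feed the web server's access log to crawler_hits(); the log
    # location depends on the server setup, so it is not assumed here.
```

An empty result from `crawler_hits` after a reasonable wait would be consistent with nofollow being honored, though it would not prove it, since the crawler might simply not have reached the wiki page yet.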
wikitech-l@lists.wikimedia.org