Hi all, bug 42594 https://bugzilla.wikimedia.org/show_bug.cgi?id=42594 proposes changing the default value of $wgNoFollowLinks https://www.mediawiki.org/wiki/Manual:$wgNoFollowLinks from true to false. The status quo is that, by default, external URL links in wiki text will be given the rel="nofollow" attribute as a hint to search engines that they should not be followed for ranking purposes as they are user-supplied and thus subject to spamming. If the change is implemented, you will need to change your LocalSettings.php to switch $wgNoFollowLinks to true if you want to keep the status quo on your wiki.
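For anyone who wants to see which behaviour a given wiki currently has, here is a rough sketch that scans rendered page HTML and reports whether each external link carries rel="nofollow". The sample markup is hand-written for illustration, vendor.example is a made-up host, and the class and function names are my own; it is not taken from MediaWiki itself.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ExternalLinkAudit(HTMLParser):
    """Collect external links and whether each carries rel="nofollow"."""

    def __init__(self, wiki_host):
        super().__init__()
        self.wiki_host = wiki_host
        self.links = []  # list of (href, has_nofollow)

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        host = urlparse(href).netloc
        # Relative links and links back to the wiki itself count as internal.
        if not host or host == self.wiki_host:
            return
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def audit(html, wiki_host):
    """Return [(href, has_nofollow), ...] for every external link in html."""
    parser = ExternalLinkAudit(wiki_host)
    parser.feed(html)
    return parser.links


# Hand-written sample markup (an assumption, not real wiki output).
sample = (
    '<p><a href="/wiki/Main_Page">Main page</a> '
    '<a rel="nofollow" class="external text" href="http://foowidget.com/">FooWidget</a> '
    '<a class="external text" href="http://vendor.example/">vendor</a></p>'
)
for href, has_nofollow in audit(sample, "wiki.foowidget.com"):
    print(href, "nofollow" if has_nofollow else "followed")
```

Feeding it the HTML of a real page from your own wiki, instead of the sample string, should show at a glance whether the default is in effect there.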
The argument for the status quo is that nofollow deters spammers. The argument for the proposed change is that it's better for the Internet as a whole, and arguably for the individual wikis, to have the links followed for ranking purposes. I'll focus on the arguments in favor of the change and let others rebut them.
Suppose you run a wiki, wiki.foowidget.com, devoted to documenting your software application, FooWidget. If you link to, say, the main foowidget.com site or to a vendor that stocks your software, would you not want to improve their pagerank, since this benefits you?
The same goes for, e.g., nonprofits that are promoting a cause. If you run CancerWiki and there are a bunch of links on your site to the American Cancer Society and other allied causes, would you not want to increase their pagerank? I think that in the wikisphere, what we commonly see is wikis devoted to niche interests they are trying to promote or share information about. The reason they link to certain websites is that a community consensus has decided that those sites are useful for effectively promoting, or informing people about, those topics.
If the links are spammy, then the editing community at that wiki should revert those spam edits. If they do so promptly, then if they have any effect on pagerank at all, it won't be for long. A well-maintained wiki will mostly have links to good sites, and the effect of the pagerank boost those provide will drown out the pagerank boost that goes to the short-lived spam links.
Also, we have other antispam tools that are way more effective than nofollow at deterring spam. Sites that mirror a wiki may not apply nofollow anyway, in which case those links might still increase the spammers' pagerank, regardless of your nofollow setting. It's hard to reduce the benefits that accrue to the spammers, except by vigilantly reverting their edits; it's easier to increase the costs that the spammers incur, by using CAPTCHAs and the like.
$wgNoFollowLinks was introduced in MediaWiki 1.4.0 as a setting that defaults to true, so I'm not sure that we really gave the other option much of a chance. Also, well-designed search engines should have other measures too for sorting out what's spammy. There should be some sort of algorithm for identifying wikis that have been overrun by spam, much as the search engines have ways of figuring out which sites have a bunch of links just for SEO purposes.
On 09.11.2013, 20:57 Nathan wrote:
Hi all, bug 42594 https://bugzilla.wikimedia.org/show_bug.cgi?id=42594 proposes changing the default value of $wgNoFollowLinks https://www.mediawiki.org/wiki/Manual:$wgNoFollowLinks from true to false. The status quo is that, by default, external URL links in wiki text will be given the rel="nofollow" attribute as a hint to search engines that they should not be followed for ranking purposes as they are user-supplied and thus subject to spamming. If the change is implemented, you will need to change your LocalSettings.php to switch $wgNoFollowLinks to true if you want to keep the status quo on your wiki.
Please email wikitech-l about this proposal; this list is the wrong place to discuss it.
On Sun, Nov 10, 2013 at 8:37 AM, Max Semenik maxsem.wiki@gmail.com wrote:
Please email wikitech-l about this proposal; this list is the wrong place to discuss it.
My intent was to invite the community of wiki system administrators, not just developers, to weigh in with their arguments in support of, or in opposition to, the proposed change. A lot of the system administrators probably aren't subscribed to wikitech-l. We can have a parallel discussion on wikitech-l if there are more complicated technical issues about the change that need to be discussed.
I'm opposed to this change. A site administrator who has a big enough community to address spammy links, and who wants to enable this feature, is likely savvy enough to change the preference from true to false.
I think setting this to false by default is going to encourage spam bot authors to target MediaWiki specifically, more than they currently do.
On Sat, Nov 9, 2013 at 8:57 AM, Nathan Larson nathanlarson3141@gmail.com wrote:
-- Nathan Larson https://mediawiki.org/wiki/User:Leucosticte
Distribution of my contributions to this email is hereby authorized pursuant to the CC0 license <http://creativecommons.org/publicdomain/zero/1.0/>.
_______________________________________________
MediaWiki-l mailing list
MediaWiki-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
On Tue, Nov 12, 2013 at 1:29 PM, Chris Steipp csteipp@wikimedia.org wrote:
I'm opposed to this change. A site administrator who has a big enough community to address spammy links, and who wants to enable this feature, is likely savvy enough to change the preference from true to false.
I think setting this to false by default is going to encourage spam bot authors to target MediaWiki specifically, more than they currently do.
The available statistics http://www.wikirobot.net/wikibase_breakdown.aspx show that most sites leave nofollow on. That could be because they carefully weighed the pros and cons and decided to leave it on, or (in my opinion more likely) they didn't think much, or at all, about it. When people don't have a strong opinion one way or another about what to do, or any immediate crisis impelling them to action, they often tend to get distracted by other priorities and leave the default in place.
You're right; if more wikis were to switch off nofollow, it almost certainly would encourage spammers to target MediaWiki more. That in turn would likely tend to prompt affected site owners to install more/better antispam tools, and would stimulate demand for development of such tools. Of course, there are costs associated with allocating labor to those activities; I'm just saying the problems would be less severe than they would be if the community could not adapt. Admittedly, there might be some attrition because some site owners will simply give up. In economics parlance, it's a question of how inelastic https://en.wikipedia.org/wiki/Price_elasticity_of_demand the demand for the benefits of wiki site ownership is; if it's pretty inelastic, then site owners won't be deterred by the spammers.
Shutting off nofollow could encourage more editing in general, not just spambot editing. Sometimes there's a grey area of semi-spam, in which people make edits that are somewhat useful to the project's goals and also somewhat promotional. Arguably, much of Wikipedia's content was contributed by people pursuing some sort of personal agenda that happened to be enough aligned with Wikipedia's goals that the two were able to coexist. Sometimes, people who wanted to engage in promotional editing probably got involved in other parts of the community, and edited unrelated articles, in order to make their agendas less obvious.
If people realize that they can bring up the pagerank of sites pertaining to their favorite entities, activities, interests, etc. by editing wikis, they might get more interested in doing so. An increase in contributed wiki content can in turn attract more visitors (who typically find the sites by search engines); some of these visitors will become editors, and so on, in a virtuous cycle. The more visitors and contributors there are, the more resources (including editors' labor) become available for fighting spam, and thus the problem takes care of itself, and then some. It might be that with a more vibrant http://wikiindex.org/Category:Vibrant wikisphere, there will actually be less spam on wikis because it will get noticed and removed faster, and the larger wiki community will be able to support the allocation of more MediaWiki developer labor, some of which will go to developing antispam tools.
To use a gardening analogy, I think it's a question of whether the plants can outgrow the weeds and choke them out, or grow away from them (like a tomato plant whose tendrils climb a fence) if we stop applying a certain weed killer. Ideally, you don't want to apply an unnecessary weed killer. Some kinds of plants can handle the weeds on their own; some (e.g. corn, if I recall correctly) can't. I'm not sure if the wikisphere is more like a tomato plant or a cornstalk. I've operated several wikis and never had to rely on nofollow; Asirra and a reasonable level of diligence always took care of the spammers pretty well.
-Nathan
On 11/13/2013 04:59 AM, Nathan Larson wrote:
To use a gardening analogy, I think it's a question of whether the plants can outgrow the weeds and choke them out, or grow away from them (like a tomato plant whose tendrils climb a fence) if we stop applying a certain weed killer.
To continue the gardening analogy, effective spam fighting tends to work better when more people are working in the garden.
But you can have a successful wiki with only one or two gardeners. MediaWiki's default setup doesn't provide a good environment for this, though.
I'd like to get the default setup addressed before we start to play with nofollow.
Mark.
On Wed, Nov 13, 2013 at 1:59 AM, Nathan Larson nathanlarson3141@gmail.com wrote:
You're right; if more wikis were to switch off nofollow, it almost certainly would encourage spammers to target MediaWiki more. That in turn would likely tend to prompt affected site owners to install more/better antispam tools, and would stimulate demand for development of such tools. Of course, there are costs associated with allocating labor to those activities; I'm just saying the problems would be less severe than they would be if the community could not adapt. Admittedly, there might be some attrition because some site owners will simply give up. In economics parlance, it's a question of how inelastic https://en.wikipedia.org/wiki/Price_elasticity_of_demand the demand for the benefits of wiki site ownership is; if it's pretty inelastic, then site owners won't be deterred by the spammers.
Speaking as the former Admin Tools lead, I'm (currently) seeing lots of spam issues piling up, but even serious issues fail to be addressed because there are bigger issues that have to be prioritized. I'm not optimistic that new tools will be developed and installed by site administrators. Developer time is, sadly, more expensive than lots of volunteers doing manual work.
Shutting off nofollow could encourage more editing in general, not just spambot editing. Sometimes there's a grey area of semi-spam, in which people make edits that are somewhat useful to the project's goals and also somewhat promotional. Arguably, much of Wikipedia's content was contributed by people pursuing some sort of personal agenda that happened to be enough aligned with Wikipedia's goals that the two were able to coexist. Sometimes, people who wanted to engage in promotional editing probably got involved in other parts of the community, and edited unrelated articles, in order to make their agendas less obvious.
I think a few people would definitely argue this point, but I do see your point. Maybe someone on the growth team could comment on editor motivations, and on whether editors are motivated by giving (SEO-relevant) links to other sites? If there are a whole lot of editors who would suddenly spend more time editing if they knew it affected SEO, then there might be a case for turning this on (even on WMF wikis). But there are other people on this list far more qualified to make those calls.
On 12 November 2013 18:29, Chris Steipp csteipp@wikimedia.org wrote:
I'm opposed to this change. A site administrator who has a big enough community to address spammy links, and who wants to enable this feature, is likely savvy enough to change the preference from true to false. I think setting this to false by default is going to encourage spam bot authors to target MediaWiki specifically, more than they currently do.
Concur. If someone wants to switch link following on, they can. But making that the default? Spammer magnet.
- d.
On 10/11/13 03:57, Nathan Larson wrote:
Also, we have other antispam tools that are way more effective than nofollow at deterring spam.
Like what? AbuseFilter is unusable for small wikis, SpamBlacklist is poorly maintained, and FancyCaptcha is comprehensively broken -- we have had reports of sites with FancyCaptcha being spammed to death. Pretty much any captcha can be broken for $1.39 per 1000. You wouldn't need very many impressions per edit for that to be economical.
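To put rough numbers on that, here is a back-of-the-envelope sketch. The $1.39 per 1000 price is the figure quoted above; the value of one impression to a spammer is purely an assumed placeholder, and the function names are mine.

```python
PRICE_PER_1000_SOLVES = 1.39  # US dollars, the figure quoted above


def cost_per_edit(captchas_per_edit=1):
    """Dollar cost of paying a solving service for one spam edit."""
    return PRICE_PER_1000_SOLVES / 1000 * captchas_per_edit


def break_even_impressions(value_per_impression, captchas_per_edit=1):
    """Impressions one spam edit must earn to cover its captcha cost."""
    return cost_per_edit(captchas_per_edit) / value_per_impression


# ASSUMPTION: a tenth of a cent per impression, chosen only for illustration.
hypothetical_value = 0.001
print(f"cost per edit: ${cost_per_edit():.5f}")
print(f"break-even impressions: {break_even_impressions(hypothetical_value):.2f}")
```

Whatever value per impression you plug in, the captcha cost per edit stays at a small fraction of a cent, which is the point being made.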
Also, well-designed search engines should have other measures too for sorting out what's spammy.
Sure, they do have such measures, but after years of incremental development, the measures were clearly not working, which is why Google introduced nofollow.
-- Tim Starling
On Wed, Nov 13, 2013 at 7:42 PM, Tim Starling tstarling@wikimedia.org wrote:
On 10/11/13 03:57, Nathan Larson wrote:
Also, we have other antispam tools that are way more effective than nofollow at deterring spam.
Like what? AbuseFilter is unusable for small wikis, SpamBlacklist is poorly maintained, and FancyCaptcha is comprehensively broken -- we have had reports of sites with FancyCaptcha being spammed to death. Pretty much any captcha can be broken for $1.39 per 1000. You wouldn't need very many impressions per edit for that to be economical.
Asirra works pretty well. I've never seen it fail to bring spam down to almost nil.
On 14/11/13 20:52, Nathan Larson wrote:
Asirra works pretty well. I've never seen it fail to bring spam down to almost nil.
So, is there any open source solution?
-- Tim Starling
Tim Starling <tstarling <at> wikimedia.org> writes:
On 10/11/13 03:57, Nathan Larson wrote:
Also, we have other antispam tools that are way more effective than nofollow at deterring spam.
Like what? AbuseFilter is unusable for small wikis, SpamBlacklist is poorly maintained, and FancyCaptcha is comprehensively broken -- we have had reports of sites with FancyCaptcha being spammed to death. Pretty much any captcha can be broken for $1.39 per 1000. You wouldn't need very many impressions per edit for that to be economical.
Except that's a bit of a strawman. Sure, all of those spam tools are relatively broken. You can use them on a wiki and still get a pile of spam. But most of those wikis being filled with spam haven't turned off $wgNoFollowLinks. So even though the spam tools are broken, the same goes for nofollow.
What really matters for these comparisons is how much spam each of these tools is capable of deflecting, and what the overlap between them is.
For example (percentages are arbitrary): suppose nofollow carries negative effects and deflects 5% of spam, but one of these other anti-spam tools deflects 25% of spam, including the same 5% that nofollow stops. Then nofollow is irrelevant, since it doesn't deflect any spam that wouldn't already be deflected, and enabling it is a net negative.
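The overlap argument can be sketched with sets. The numbers are the same arbitrary percentages as in the example, applied to 100 hypothetical spam attempts; nothing here is measured data, and the function name is mine.

```python
def marginal_deflection(tool, others, total):
    """Share of all spam stopped by `tool` that no tool in `others` stops."""
    already_stopped = set().union(*others)
    return len(tool - already_stopped) / total


# 100 hypothetical spam attempts, numbered 0..99 (arbitrary, as in the text).
TOTAL = 100
nofollow = set(range(0, 5))      # deflects 5% of spam
other_tool = set(range(0, 25))   # deflects 25%, including the same 5%

print(marginal_deflection(nofollow, [other_tool], TOTAL))   # 0.0
print(marginal_deflection(other_tool, [nofollow], TOTAL))   # 0.2
```

What matters for the decision is the first number: the spam nofollow stops that nothing else would have stopped.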
Or another example: even if we "know" that one spam tool has the option to take nofollow into account, if nofollow only stops <1% of spam from showing up in practice, then it's pretty irrelevant.
Another point: if you can put a MediaWiki installation up under default settings right now and have it flooded with spam, then $wgNoFollowLinks being on by default is pretty irrelevant if the only difference is that a tiny bit more spam shows up. Whether it's on or off, the wiki is still being flooded with an unbearable amount of spam.
Unfortunately, I don't know of any actual statistical test anyone's done comparing the amount of spam made on a wiki with and without nofollow, or with and without other anti-spam tools.
On 14/11/13 22:51, Daniel Friesen wrote:
Unfortunately I don't know of any actual statistical test anyone's done comparing the amount of spam made on a wiki with and without nofollow. Or other anti-spam tools.
I couldn't find any useful statistics after a bit of web searching. It would be interesting to set up two wikis, and get one of them listed on "dofollow" lists such as these:
http://www.blackhatlists.com/view-item/52/352-DoFollow-Wikis-List.html
http://www.milanchymcak.com/blog/fresh-dofollow-wiki-list-download-updated-dofollow-wiki-list-for-free
http://www.blackhatgroup.com/f8/%5Bget%5D-list-do-follow-wiki-sites-105103.html
... and leave the other one off them, and see which gets more spam. I gather the actual spambots are stupid and wouldn't notice whether the wiki actually has nofollow enabled, but getting a wiki included on these crawler-generated lists may have some impact.
-- Tim Starling
On 2013-11-18 3:57 AM, Tim Starling wrote:
I couldn't find any useful statistics after a bit of web searching. It would be interesting to set up two wikis, and get one of them listed on "dofollow" lists such as these:
http://www.blackhatlists.com/view-item/52/352-DoFollow-Wikis-List.html
http://www.milanchymcak.com/blog/fresh-dofollow-wiki-list-download-updated-dofollow-wiki-list-for-free
http://www.blackhatgroup.com/f8/%5Bget%5D-list-do-follow-wiki-sites-105103.html
... and leave the other one off them, and see which gets more spam. I gather the actual spambots are stupid and wouldn't notice whether the wiki actually has nofollow enabled, but getting a wiki included on these crawler-generated lists may have some impact.
-- Tim Starling
If we're going to leave a wiki off the list, we might as well make it 4 wikis (on list but nofollow, on list and dofollow, off list and nofollow, off list and dofollow) to get proper data not skewed by the settings. In addition to data on whether the dofollow list increases spam, we can also get some data on whether bots using dofollow lists actually test for nofollow, and on whether nofollow makes a difference for public wikis not originally on the list.
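A minimal sketch of that four-wiki layout, treating it as a 2x2 design with two factors (listed or not, nofollow or dofollow). The helper for comparing spam counts is only an illustration of how the results might be read once counts exist; no counts are supplied here because none have been collected, and all names are my own.

```python
from itertools import product

LISTED = ("on list", "off list")
LINKS = ("nofollow", "dofollow")

# The four wikis: every combination of the two factors.
CONDITIONS = list(product(LISTED, LINKS))


def main_effect(spam_counts, factor_index, level_a, level_b):
    """Mean spam count at level_a minus mean at level_b for one factor.

    spam_counts maps a (listed, links) tuple to an observed spam count.
    factor_index is 0 for the list factor and 1 for the link factor.
    """
    def mean_at(level):
        values = [n for cond, n in spam_counts.items() if cond[factor_index] == level]
        return sum(values) / len(values)

    return mean_at(level_a) - mean_at(level_b)


for listed, links in CONDITIONS:
    print(f"wiki: {listed}, {links}")
```

With real counts, main_effect(counts, 0, "on list", "off list") would estimate the effect of being listed, and main_effect(counts, 1, "dofollow", "nofollow") the effect of the setting itself.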
Ideally, long term, we should be running honeypots with a variety of anti-spam tools, watching what kind of spam gets sent, which bots and types of spam bypass the different tools, etc. Project Honeypot (https://www.projecthoneypot.org/) runs a bunch, but they're not wikis, and they're not monitoring our types of spam or collecting data about the bots we get.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
On Mon, Nov 18, 2013 at 7:27 AM, Daniel Friesen daniel@nadir-seen-fire.com wrote:
http://www.blackhatlists.com/view-item/52/352-DoFollow-Wikis-List.html
http://www.milanchymcak.com/blog/fresh-dofollow-wiki-list-download-updated-d...
http://www.blackhatgroup.com/f8/%5Bget%5D-list-do-follow-wiki-sites-105103.h...
The first link has you download an executable file that prompts Windows to ask whether it's okay to make system configuration changes. If I had a computer with a hard drive I didn't mind exposing to malware infections, I might see what that installation would do. (These are black hats we're dealing with, after all.) The second link asks you to endorse them on social media to unlock the content, and the third link doesn't seem to actually lead eventually to the promised content (it says it's down). It did, however, feel pretty ironic to have to pass a CAPTCHA to register to get a file allegedly useful in spamming.
On 19/11/13 03:19, Nathan Larson wrote:
The first link has you download an executable file that prompts Windows to ask whether it's okay to make system configuration changes. If I had a computer with a hard drive I didn't mind exposing to malware infections, I might see what that installation would do. (These are black hats we're dealing with, after all.) The second link asks you to endorse them on social media to unlock the content, and the third link doesn't seem to actually lead eventually to the promised content (it says it's down). It did, however, feel pretty ironic to have to pass a CAPTCHA to register to get a file allegedly useful in spamming.
That's a pretty normal user experience for this sort of thing, but it doesn't mean that the lists don't exist. There is a similar list for forums which can easily be verified:
http://www.techmaish.com/700-dofollow-forums-list/
It is quite old (2010), but it demonstrates the principle. The comments leave you in no doubt as to what the list was used for, e.g.:
"really good list. yes i m agree with you. if we work hard and post 50 per forum then its easy get 35000 back link. thank you very much for sharing this huge list"
My point is, it's difficult to answer the question "will setting $wgNoFollowLinks=false on my wiki cause it to be spammed more?" but easier to answer the question "are spammers particularly interested in finding and spamming dofollow websites?"
Look at this site, which dates from 2012:
http://how2getbacklinks.com/wikibacklinks/
It talks about how important it is to get dofollow links, and offers to spam "12,000 wiki sites + 400 dofollow links" for $45.
-- Tim Starling