Regardless of rel="nofollow", the current whitelist and blacklist features, which are used across languages anyway, will still be needed.
Therefore, could I put in a request for a "greenlist" feature to allow sysop approved links to be generated without rel="nofollow"?
If it is hard to do directly (and I cannot see an easy way), an acceptable alternative is the creation of http://greenlist.wikipedia.org as an interwiki target which auto-redirects:
http://greenlist.wikipedia.org/first-string/second-string to http://first-string/second-string for any first-string on a greenlist.
It seems to me inevitable that this feature will be wanted sooner or later on en, and will be useful for other implementations so doing it sooner rather than later is preferable...
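For illustration, a minimal sketch of what such a redirector could look like as a standalone PHP script; the greenlist.txt file name and the URL layout are assumptions taken from the proposal above, not an existing setup:

<?php
// Hypothetical standalone redirector for greenlist.wikipedia.org (sketch only).
// Reads a sysop-maintained list of approved domains, one per line.
$greenlist = array_flip( array_map( 'trim', file( 'greenlist.txt' ) ) );

// Expect paths of the form /first-string/second-string,
// e.g. /example.org/page -> http://example.org/page
$path = ltrim( $_SERVER['REQUEST_URI'], '/' );
list( $domain, $rest ) = array_pad( explode( '/', $path, 2 ), 2, '' );

if ( isset( $greenlist[$domain] ) ) {
    // 301 so that search engines follow the link through to the target.
    header( 'Location: http://' . $domain . '/' . $rest, true, 301 );
} else {
    header( 'HTTP/1.0 404 Not Found' );
    echo 'Domain not on the greenlist.';
}

Because greenlist.wikipedia.org would be registered as an interwiki prefix, links through it would not be treated as external links at all, and so would escape rel="nofollow".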
On 24.01.2007 13:45, Andrew Cates wrote:
Regardless of rel="nofollow", the current whitelist and blacklist features, which are used across languages anyway, will still be needed.
Therefore, could I put in a request for a "greenlist" feature to allow sysop approved links to be generated without rel="nofollow"?
"sysop approved" links?
Sysops are chosen by consensus. How can that consensus be expanded to "can choose external non-nofollow links at their own discretion"?
Or do we hold "Request(s) for external non-nofollow link" discussions then? (Um, WP:RFN is already taken ;-)
What's next?
Sysop approved new articles? Sysop approved article content? Sysop approved edits? Requests for article edits?
I know I'm exaggerating, but you might get the idea...
(I wouldn't be surprised if Google starts ignoring rel=nofollow anyway - sooner or later. Or they might give it some weight depending on some obscure unknown criteria.)
On 1/24/07, Ligulem ligulem@pobox.com wrote:
"sysop approved" links?
Sysops are chosen by consensus. How can that consensus be expanded to "can choose external non-nofollow links at their own discretion"?
They already choose whether to spam-blacklist particular links.
Sysop approved edits? Requests for article edits?
{{editprotected}}
On 1/24/07, Simetrical Simetrical+wikitech@gmail.com wrote:
On 1/24/07, Ligulem ligulem@pobox.com wrote:
"sysop approved" links?
Sysops are chosen by consensus. How can that consensus be expanded to "can choose external non-nofollow links at their own discretion"?
They already choose whether to spam-blacklist particular links.
Wiki admins don't. Meta admins do.
It's an issue of scaling. It's not reasonable to expect the smallish number of admins on each wiki to maintain a list of this size, while it is reasonable to expect the even smaller number of admins on Meta to maintain the tiny SBL.
(We don't even have to go as far as using a Bloom filter: we handle link coloring by hitting the DB during parsing.. I'd proposed using a Bloom filter for this (and even coded an external daemon to handle link color testing) but it just wasn't needed.)
Gregory Maxwell schreef:
On 1/24/07, Simetrical Simetrical+wikitech@gmail.com wrote:
On 1/24/07, Ligulem ligulem@pobox.com wrote:
"sysop approved" links?
Sysops are chosen by consensus. How can that consensus be expanded to "can choose external non-nofollow links at their own discretion"?
They already choose whether to spam-blacklist particular links.
Wiki admins don't. Meta admins do.
But local sysops can in turn override that blacklist by including that domain on the local whitelist.
And it seems that a local blacklist will also become active at some point: http://bugzilla.wikimedia.org/show_bug.cgi?id=8492
-- Contact: walter AT wikizine DOT org Wikizine.org - news for and about the Wikimedia community
On 1/24/07, Walter Vermeir walter@wikipedia.be wrote:
But local sysops can in turn override that blacklist by including that domain on the local whitelist.
Yes, this was due to a single weird case where a free hosting site was being used by spammers to hit all of our wikis.. but the same free hosting site is widely used by speakers of a single language for legitimate hosting (including things like government sites and such).
The local whitelists are not widely used.
And it seems that a local blacklist will also become active at some point: http://bugzilla.wikimedia.org/show_bug.cgi?id=8492
If every bug or even patch in bugzilla were closed the world would be a very different place.
I'm generally against making the SBL a local decision.. if a URL is being spamvertised and it is without value, and the spamming can't be stopped via less aggressive means, then all the wikis should be protected from it. Although for non-spamming related uses, a local URLBL makes sense.
On 24/01/07, Gregory Maxwell gmaxwell@gmail.com wrote:
If every bug or even patch in bugzilla were closed the world would be a very different place.
We'd have no users, for a start.
Rob Church
On 1/24/07, Rob Church robchur@gmail.com wrote:
On 24/01/07, Gregory Maxwell gmaxwell@gmail.com wrote:
If every bug or even patch in bugzilla were closed the world would be a very different place.
We'd have no users, for a start.
Details.
More importantly, the software wouldn't work very well.
;)
On 24/01/07, Gregory Maxwell gmaxwell@gmail.com wrote:
I'm generally against making the SBL a local decision.. if a URL is being spamvertised and it is without value, and the spamming can't be stopped via less aggressive means, then all the wikis should be protected from it. Although for non-spamming related uses, a local URLBL makes sense.
On en:, the problem is finding a Meta admin in less than geological time.
- d.
David Gerard schreef:
On 24/01/07, Gregory Maxwell gmaxwell@gmail.com wrote:
I'm generally against making the SBL a local decision.. if a URL is being spamvertised and it is without value, and the spamming can't be stopped via less aggressive means, then all the wikis should be protected from it. Although for non-spamming related uses, a local URLBL makes sense.
On en:, the problem is finding a Meta admin in less than geological time.
- d.
Hoi, I am almost always online. Then again, I am not on en. I can be found on IRC easily. The same is true for so many other Meta admins. If finding things means that they have to be on the project itself, try finding a Meta admin on the Swahili Wikipedia .. :) Thanks, GerardM
On 25/01/07, Gerard Meijssen gerard.meijssen@gmail.com wrote:
I am almost always online. Then again, I am not on en. I can be found on IRC easily. The same is true for so many other Meta admins. If finding things means that they have to be on the project itself, try finding a Meta admin on the Swahili Wikipedia .. :)
If you don't mind having the en:wp spamfighters messaging you lots, then good :-)
- d.
David Gerard schreef:
On 25/01/07, Gerard Meijssen gerard.meijssen@gmail.com wrote:
I am almost always online. Then again, I am not on en. I can be found on IRC easily. The same is true for so many other Meta admins. If finding things means that they have to be on the project itself, try finding a Meta admin on the Swahili Wikipedia .. :)
If you don't mind having the en:wp spamfighters messaging you lots, then good :-)
- d.
Hoi, it depends on what "a lot" means. When there is a need for a Meta admin because someone wants to do such a thing, I am sure that he/she can become a Meta admin fairly quickly. Your notion that people cannot be found is wrong. There are plenty of them about. They just do not need to hang out on en. Thanks, GerardM
On 1/24/07, Andrew Cates andrew@catesfamily.org.uk wrote:
Therefore, could I put in a request for a "greenlist" feature to allow sysop approved links to be generated without rel="nofollow"?
There are well over 8 million external links in enwiki alone, including well over a half million distinct domains via HTTP.
A greenlist is not a remotely sane solution because of this...
Sane solutions are, however, possible.
I would suggest that any proposed feature interact with whatever version flagging system we end up implementing. I don't believe that will likely be sufficient, since the proposal most likely to be implemented just provides us with a not-vandalized flag... which doesn't do much to indicate that any member of the editing community has actually reviewed the link (and pure time limits are even worse in that regard). Just keep in mind that anything proposed must scale, and it must be robust against editing (special text that only admins can insert is not robust against editing... for example, a non-admin couldn't revert page blanking on a page that had special text links).
Thanks for the number: about in line with my guesstimate.
I agree about the issue of a non-admin reverting a sysop-authorised page. That's why I didn't propose that route.
But why does the number make the proposed solution insane? Is it the list maintenance that bothers you? There are plenty of ways of maintaining an adequate list. For example, DMOZ is a Wikipedia partner project, and we could take all the links from the non-commercial pages of DMOZ as a 95% solution (then they can handle all the arguments). Or you could use a bot to find links entered by legitimate users which have lived for a year without reversion; whatever. Once we have a starting point for a list, changes will be much more manageable. In the end it is entirely up to the community to decide a process for choosing greenlist links; the problem is finding the simplest way to implement it. The divert route seems much simpler because it doesn't clog up existing processes with a big pile of check data.
================= Gregory Maxwell wrote:
On 1/24/07, Andrew Cates andrew@catesfamily.org.uk wrote:
Therefore, could I put in a request for a "greenlist" feature to allow sysop approved links to be generated without rel="nofollow"?
There are well over 8 million external links in enwiki alone, including well over a half million distinct domains via HTTP.
A greenlist is not a remotely sane solution because of this...
Sane solutions are, however, possible.
I would suggest that any proposed feature interact with whatever version flagging system we end up implementing. I don't believe that will likely be sufficient, since the proposal most likely to be implemented just provides us with a not-vandalized flag... which doesn't do much to indicate that any member of the editing community has actually reviewed the link (and pure time limits are even worse in that regard). Just keep in mind that anything proposed must scale, and it must be robust against editing (special text that only admins can insert is not robust against editing... for example, a non-admin couldn't revert page blanking on a page that had special text links).
On 24/01/07, Andrew Cates andrew@catesfamily.org.uk wrote:
maintaining an adequate list. For example, DMOZ is a Wikipedia partner project, and we could take all the links from the non-commercial pages of
Is it?
Rob Church
Well, it says so on every page below the home page at DMOZ:
"*Visit our sister sites* mozilla.org (http://www.mozilla.org/) | ChefMoz (http://chefmoz.org/) | MusicMoz (http://musicmoz.org/) | Open-Site (http://open-site.org/) | Wikipedia (http://en.wikipedia.org/)"
Unrequited love perhaps?
==================== Rob Church wrote:
On 24/01/07, Andrew Cates andrew@catesfamily.org.uk wrote:
maintaining an adequate list. For example, DMOZ is a Wikipedia partner project, and we could take all the links from the non-commercial pages of
Is it?
Rob Church
FWIW I think Angela was in some dialogue with them a few years ago. I wonder if they know we've "nofollowed" all our links to them, given what they give us.
===================== Andrew Cates wrote:
Well, it says so on every page below the home page at DMOZ:
"*Visit our sister sites* mozilla.org (http://www.mozilla.org/) | ChefMoz (http://chefmoz.org/) | MusicMoz (http://musicmoz.org/) | Open-Site (http://open-site.org/) | Wikipedia (http://en.wikipedia.org/)"
Unrequited love perhaps?
==================== Rob Church wrote:
On 24/01/07, Andrew Cates andrew@catesfamily.org.uk wrote:
maintaining an adequate list. For example, DMOZ is a Wikipedia partner project, and we could take all the links from the non-commercial pages of
Is it?
Rob Church
On 1/24/07, Andrew Cates andrew@catesfamily.org.uk wrote:
FWIW I think Angela was in some dialogue with them a few years ago. I wonder if they know we've "nofollowed" all our links to them, given what they give us.
That was a long time ago, before I was on the Board I think. I've lost my account through inactivity since then, so I don't remember exactly what was said, but someone there had been in contact with Jimbo about becoming a sister site and didn't seem to mind that Wikipedia wasn't considering them a sister site even though they consider Wikipedia one.
Angela
On 24/01/07, Andrew Cates andrew@catesfamily.org.uk wrote:
Well, it says so on every page below the home page at DMOZ:
"*Visit our sister sites* mozilla.org (http://www.mozilla.org/) | ChefMoz (http://chefmoz.org/) | MusicMoz (http://musicmoz.org/) | Open-Site (http://open-site.org/) | Wikipedia (http://en.wikipedia.org/)"
I'm sure someone from the Board will be able to confirm it. It's not mentioned on the Foundation web site as far as I'm aware.
Rob Church
On 1/24/07, Andrew Cates andrew@catesfamily.org.uk wrote: [snip]
There are plenty of ways of maintaining an adequate list. For example, DMOZ is a Wikipedia partner project, and we could take all the links from the non-commercial pages of DMOZ as a 95% solution (then they can handle all the arguments).
[snip]
Anyone else find it deeply ironic that DMOZ would somehow be mentioned as a solution to our spamming problems?
For more background on the current state of DMOZ: http://www.skrenta.com/2006/12/dmoz_had_9_lives_used_up_yet.html
Okay, okay. As a DMOZ editor I take it on the chin; the technical side there in particular has serious issues.
But in fact that's a red herring. The point is that, in principle, it is relatively easy to maintain a long list via a bot+algorithm or a harvest or whatever, and to implement it using interwiki and a redirect site. And a nice visible list that wiki-spammers can diff every morning is a compact solution compared to the zillions-of-needles-in-a-huge-haystack approach. As I said, the community can solve the list maintenance once the technology for one is sorted. A list has to be better than link by link.
So far every other way that you've suggested, Greg, has had a series of fairly tough-to-fix problems. I don't think "the answer may well fall out of a tree" is really going to find a way to implement something which has been discussed for at least a year.
================= Gregory Maxwell wrote:
On 1/24/07, Andrew Cates andrew@catesfamily.org.uk wrote: [snip]
There are plenty of ways of maintaining an adequate list. For example, DMOZ is a Wikipedia partner project, and we could take all the links from the non-commercial pages of DMOZ as a 95% solution (then they can handle all the arguments).
[snip]
Anyone else find it deeply ironic that DMOZ would somehow be mentioned as a solution to our spamming problems?
For more background on the current state of DMOZ: http://www.skrenta.com/2006/12/dmoz_had_9_lives_used_up_yet.html
On 1/24/07, Andrew Cates andrew@catesfamily.org.uk wrote:
Okay, okay. As a DMOZ editor I take it on the chin; the technical side there in particular has serious issues.
What, you didn't sell your DMOZ account to SEOs during the big rush? Sucker. :)
But in fact that's a red herring. The point is that, in principle, it is relatively easy to maintain a long list via a bot+algorithm or a harvest or whatever, and to implement it using interwiki and a redirect site. And a nice visible list that wiki-spammers can diff every morning is a compact solution compared to the zillions-of-needles-in-a-huge-haystack approach.
Oh, our goal is to help spammers track their progress?
I must be confused, because I thought we were trying to make quality articles... Remember those article things? The blocks of text that hopefully someone with knowledge about the subject cares about? The sort of folks who might actually have a clue about a link being useful or a worthless advertisement?
As I said, the community can solve the list maintenance once the technology for one is sorted. A list has to be better than link by link.
So... We can handle writing an encyclopedia on a word by word basis.. but suddenly approving links is too hard?
Do you actually think that it's reasonable to ask admins to continually edit a page which is hundreds of megabytes, and somehow keep up with the thousands of totally valid links added across millions of pages by thousands of non-admins every week? Certainly you can't expect them to use bots to do this, because on enwiki at least it seems that running a bot with a sysop flag is a crime worse than murder.
So far every other way that you've suggested, Greg, has had a series of fairly tough-to-fix problems. I don't think "the answer may well fall out of a tree" is really going to find a way to implement something which has been discussed for at least a year.
What are you talking about? I don't follow you at all here.
Some form of article validation/stabilization feature is one of the higher technical priorities for the Foundation, and one which directly impacts the quality of our product (unlike passing out PageRank to third-party sites). Once implemented, it would be utterly trivial to make the links in approved versions non-nofollow, and it wouldn't require overloading adminship with yet another inappropriate role which the admins can't scale up to perform.
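To make the "utterly trivial" claim concrete, a rough sketch; the function and flag names below are hypothetical, not an existing MediaWiki hook:

<?php
// Hypothetical helper: pick the rel attribute for an external link depending
// on whether the revision being rendered has been approved/flagged.
function externalLinkRel( $isApprovedRevision ) {
    // Approved versions pass link value; everything else keeps nofollow.
    return $isApprovedRevision ? '' : ' rel="nofollow"';
}

// Usage sketch with made-up values.
$url  = 'http://example.org/';
$text = 'Example';
$isApprovedRevision = true;

echo '<a href="' . htmlspecialchars( $url ) . '"'
    . externalLinkRel( $isApprovedRevision ) . '>'
    . htmlspecialchars( $text ) . "</a>\n";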
I can tell you for free (since I went through thousands of them for the WP CD) that an article having a quality flag doesn't reduce the risk of vandalism or spam. If anything, it makes the article a more attractive target.
And as I'm sure you know, once you've finished enjoying your eloquence, "wiki-spammers" or "wiki-spam-members" is what the folk at "Project-Wikispam" call themselves. I cannot bring myself to use "spam-member" for someone who is helping the project, hence the first title. Removing spam from WP is ever harder work, and no one has predicted that "nofollow" will help make en better quality or help keep spam away. Phrases involving babies and bathwater come to mind.
As for what's reasonable... a bot with sysop powers may be an issue, but plenty of sysops run bots.
btw the suckers were the ones who bought the accounts... ;)
======================== Gregory Maxwell wrote:
On 1/24/07, Andrew Cates andrew@catesfamily.org.uk wrote:
Okay, okay. As a DMOZ editor I take it on the chin; the technical side there in particular has serious issues.
What, you didn't sell your DMOZ account to SEOs during the big rush? Sucker. :)
But in fact that's a red herring. The point is that, in principle, it is relatively easy to maintain a long list via a bot+algorithm or a harvest or whatever, and to implement it using interwiki and a redirect site. And a nice visible list that wiki-spammers can diff every morning is a compact solution compared to the zillions-of-needles-in-a-huge-haystack approach.
Oh, our goal is to help spammers track their progress?
I must be confused, because I thought we were trying to make quality articles... Remember those article things? The blocks of text that hopefully someone with knowledge about the subject cares about? The sort of folks who might actually have a clue about a link being useful or a worthless advertisement?
As I said, the community can solve the list maintenance once the technology for one is sorted. A list has to be better than link by link.
So... We can handle writing an encyclopedia on a word by word basis.. but suddenly approving links is too hard?
Do you actually think that it's reasonable to ask admins to continually edit a page which is hundreds of megabytes, and somehow keep up with the thousands of totally valid links added across millions of pages by thousands of non-admins every week? Certainly you can't expect them to use bots to do this, because on enwiki at least it seems that running a bot with a sysop flag is a crime worse than murder.
So far every other way that you've suggested, Greg, has had a series of fairly tough-to-fix problems. I don't think "the answer may well fall out of a tree" is really going to find a way to implement something which has been discussed for at least a year.
What are you talking about? I don't follow you at all here.
Some form of article validation/stabilization feature is one of the higher technical priorities for the Foundation, and one which directly impacts the quality of our product (unlike passing out PageRank to third-party sites). Once implemented, it would be utterly trivial to make the links in approved versions non-nofollow, and it wouldn't require overloading adminship with yet another inappropriate role which the admins can't scale up to perform.
On 1/24/07, Andrew Cates andrew@catesfamily.org.uk wrote:
I can tell you for free (since I went through thousands of them for the WP CD) that an article having a quality flag doesn't reduce the risk of vandalism or spam. If anything, it makes the article a more attractive target.
Yes, as I said in my initial post.. I do not believe that simply overloading the not-vandalized flag by itself is sufficient, mostly because people don't bother looking at the external links.. and often people feel uncomfortable removing them. The same mechanism, however, could easily be applied to collect data on link reviews as they are performed... Even without that it would still give us a point of attack (i.e. when you validate a page, be sure to check new links); today we don't have such a hook.
[snip]
Removing spam from WP is ever harder work, and no one has predicted that "nofollow" will help make en better quality or help keep spam away. Phrases involving babies and bathwater come to mind.
We saw an almost complete end to User: pages created by otherwise inactive users and filled with nothing but external links after we turned on nofollow for the User: namespace a few months back.
I'd comment more, but this is going offtopic into the land of local wiki politics. Perhaps this discussion should be continued once someone has actually coded something?
Agreed, let's see something first and talk politics elsewhere. FWIW I put the end of those new user pages down to an intelligent change to robots.txt which seems to have recently been reversed ("What links here" isn't under /w/ any more... why? That helped). I might take it up on your user page when I have time.
================== Gregory Maxwell wrote:
On 1/24/07, Andrew Cates andrew@catesfamily.org.uk wrote:
I can tell you for free (since I went through thousands of them for the WP CD) that an article having a quality flag doesn't reduce the risk of vandalism or spam. If anything, it makes the article a more attractive target.
Yes, as I said in my initial post.. I do not believe that simply overloading the not-vandalized flag by itself is sufficient, mostly because people don't bother looking at the external links.. and often people feel uncomfortable removing them. The same mechanism, however, could easily be applied to collect data on link reviews as they are performed... Even without that it would still give us a point of attack (i.e. when you validate a page, be sure to check new links); today we don't have such a hook.
[snip]
Removing spam from WP is ever harder work, and no one has predicted that "nofollow" will help make en better quality or help keep spam away. Phrases involving babies and bathwater come to mind.
We saw an almost complete end to User: pages created by otherwise inactive users and filled with nothing but external links after we turned on nofollow for the User: namespace a few months back.
I'd comment more, but this is going offtopic into the land of local wiki politics. Perhaps this discussion should be continued once someone has actually coded something?
On 24 January 2007 14:38, Gregory Maxwell wrote (Re: [Wikitech-l] Extension request: Greenlist):
On 1/24/07, Andrew Cates andrew@catesfamily.org.uk wrote:
Therefore, could I put in a request for a "greenlist" feature to allow sysop approved links to be generated without rel="nofollow"?
There are well over 8 million external links in enwiki alone, including well over a half million distinct domains via HTTP.
A greenlist is not a remotely sane solution because of this...
A Bloom filter could hold the entire 8 million URLs in roughly 10 megabytes with a false positive probability of 1%, or about 14 megabytes with a 0.1% probability; the half million distinct domains alone would fit in under a megabyte.
Whilst loading in and out of PHP wouldn't be nice, a separate service is certainly feasible.
Jared
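For reference, a quick sketch of the standard sizing arithmetic in plain PHP (nothing MediaWiki-specific); an optimally sized filter needs about -ln(p)/(ln 2)^2 bits per entry:

<?php
// Bytes needed by an optimally sized Bloom filter for n entries at
// false-positive probability p: m = -n * ln(p) / (ln 2)^2 bits.
function bloomFilterBytes( $n, $p ) {
    $bits = -$n * log( $p ) / pow( log( 2 ), 2 );
    return (int)ceil( $bits / 8 );
}

printf( "8M URLs at 1%% FP:      %.1f MB\n", bloomFilterBytes( 8000000, 0.01 ) / 1e6 );
printf( "8M URLs at 0.1%% FP:    %.1f MB\n", bloomFilterBytes( 8000000, 0.001 ) / 1e6 );
printf( "500k domains at 1%% FP: %.1f MB\n", bloomFilterBytes( 500000, 0.01 ) / 1e6 );

That works out to roughly 9.6 MB, 14.4 MB and 0.6 MB respectively.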
Jared Williams wrote:
Gregory Maxwell wrote:
On 1/24/07, Andrew Cates andrew@catesfamily.org.uk wrote:
Therefore, could I put in a request for a "greenlist" feature to allow sysop approved links to be generated without rel="nofollow"?
There are well over 8 million external links in enwiki alone, including well over a half million distinct domains via HTTP.
A greenlist is not a remotely sane solution because of this...
A Bloom filter could hold the entire 8 million URLs in roughly 10 megabytes with a false positive probability of 1%, or about 14 megabytes with a 0.1% probability; the half million distinct domains alone would fit in under a megabyte.
Whilst loading in and out of PHP wouldn't be nice, a separate service is certainly feasible.
A simple database table containing a list of hostnames is probably the simplest solution, and can be queried efficiently. The size of the table should not be an issue -- after all, we _already have_ all this data in our database in a much less compact format.
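A sketch of that approach with plain PDO; the nofollow_exempt_host table and the connection details are hypothetical, not part of the existing schema:

<?php
// Hypothetical greenlist lookup at parse time.
// CREATE TABLE nofollow_exempt_host (
//     neh_host VARBINARY(255) NOT NULL PRIMARY KEY
// );
function isGreenlistedHost( PDO $db, $url ) {
    $host = parse_url( $url, PHP_URL_HOST );
    if ( $host === null || $host === false ) {
        return false;
    }
    $stmt = $db->prepare( 'SELECT 1 FROM nofollow_exempt_host WHERE neh_host = ?' );
    $stmt->execute( array( strtolower( $host ) ) );
    return (bool)$stmt->fetchColumn();
}

// Usage sketch (placeholder credentials).
$db = new PDO( 'mysql:host=localhost;dbname=wikidb', 'wikiuser', 'secret' );
var_dump( isGreenlistedHost( $db, 'http://example.org/some/page' ) );

That is one indexed lookup per distinct host during parsing, and the result could be cached alongside the parser output.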
On 24.01.2007 15:37, Gregory Maxwell wrote:
On 1/24/07, Andrew Cates andrew@catesfamily.org.uk wrote:
Therefore, could I put in a request for a "greenlist" feature to allow sysop approved links to be generated without rel="nofollow"?
There are well over 8 million external links in enwiki alone, including well over a half million distinct domains via HTTP.
The foundation should be able to receive 10 cents per outgoing click to a commercial site :-)
Some implementation ideas (just kidding):
Links could be routed to an intermediate page stating "You have clicked a link to a commercial site. Thank you for using Wikipedia, you are now redirected to the target".
Commercial target domain owners could then create logins and prepay money.
Domains with zero prepay balance would result in a page with big pictures/text about the details how to donate to the foundation and optionally with some other advertising.
Logged in users (Wikipedians) should be able to turn off the intermediate page in their preferences (default: turned off).
On 1/25/07, Gregory Maxwell gmaxwell@gmail.com wrote:
There are well over 8 million external links in enwiki alone, including well over a half million distinct domains via HTTP.
A greenlist is not a remotely sane solution because of this...
Sane solutions are, however, possible.
I was thinking the simplest (from a user perspective) solution would be to designate any external link that had survived some given time frame (e.g., a week) as a non-spam link, and thus to switch off nofollow for it.
Obviously it wouldn't be perfect, but it might be close. Driveby spamming would have no effect, but useful links added by Wikipedians would, after a week, be considered real links.
I imagine this would be fairly difficult to implement though.
Steve
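A rough sketch of the bookkeeping this would need, assuming a hypothetical el_first_seen table that is not part of the current schema:

<?php
// "Survived a week" heuristic: record when each external URL was first added,
// and keep rel="nofollow" only while the link is younger than the grace period.
// CREATE TABLE el_first_seen (
//     efs_url   VARBINARY(512) NOT NULL PRIMARY KEY,
//     efs_added INT UNSIGNED   NOT NULL  -- Unix timestamp
// );
define( 'NOFOLLOW_GRACE_PERIOD', 7 * 24 * 3600 ); // one week, in seconds

// Called on page save for every external link in the new text.
function recordLinkFirstSeen( PDO $db, $url ) {
    $stmt = $db->prepare(
        'INSERT IGNORE INTO el_first_seen (efs_url, efs_added) VALUES (?, ?)'
    );
    $stmt->execute( array( $url, time() ) );
}

// Called when rendering a link.
function needsNofollow( PDO $db, $url ) {
    $stmt = $db->prepare( 'SELECT efs_added FROM el_first_seen WHERE efs_url = ?' );
    $stmt->execute( array( $url ) );
    $added = $stmt->fetchColumn();
    if ( $added === false ) {
        return true; // never recorded: treat as brand new
    }
    return ( time() - (int)$added ) < NOFOLLOW_GRACE_PERIOD;
}

Even this sketch glosses over the awkward parts: a spammed link that is reverted and later re-added would keep its original timestamp, and cached page HTML would not notice when a link crosses the one-week boundary, which is probably part of what would make it "fairly difficult to implement".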