On 4/10/06, Aerik <aerik@thesylvans.com> wrote:
Well, I'm a developer wanna-be and not a real developer :-) but if we set aside for a moment the policy questions about whitelisting and nofollow, whitelisting does not need to work the same way as blacklisting. For blacklisting we need fancy regexes because people will try to get around the list; for whitelisting we could probably define a much simpler validation (applied when the parser parses the URL), because someone posting a whitelisted URL knows exactly which URL they're trying to match. There's still an issue of scale - checking against thousands or tens of thousands of whitelisted URLs may not be feasible - but a simple membership test like "in_array" is probably *much* faster than bunches of regexes.
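For instance (a rough sketch only - the function name and the list entries here are invented), an exact host match needs no regexes at all:

    <?php
    // Sketch: whitelist keyed by host. Keying the array makes the
    // lookup effectively O(1), versus running every URL through
    // piles of blacklist regexes.
    $whitelist = array_flip( array(
        'example.edu',
        'www.w3.org',
    ) );

    function isWhitelistedHost( $url, $whitelist ) {
        // A good-faith poster knows which URL they want to match,
        // so an exact host comparison is enough - no regexes needed.
        $host = strtolower( (string) parse_url( $url, PHP_URL_HOST ) );
        return isset( $whitelist[ $host ] );
    }

in_array() over a plain list works too, but that's a linear scan; flipping the hosts into array keys keeps the test constant-time even with tens of thousands of entries.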
Right, right. There are all sorts of ways to make set-membership testing fast, especially if "probably a member" is good enough (which it is for us).
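A Bloom filter gives exactly that "probably a member" answer in constant time and a small fixed amount of memory. A rough sketch - the class name, sizes, and the salted-crc32 hashing are all just illustrative, not tuned:

    <?php
    // Bloom filter sketch: "probably a member" semantics - false
    // positives happen, false negatives never do.
    class BloomFilter {
        private $bits;
        private $size;
        private $hashes;

        public function __construct( $size = 1048576, $hashes = 4 ) {
            $this->size   = $size;   // 2^20 bits = 128 KB by default
            $this->hashes = $hashes;
            $this->bits   = str_repeat( "\0", $size >> 3 ); // packed bit array
        }

        private function positions( $key ) {
            // Salted crc32 stands in for k independent hash functions.
            $pos = array();
            for ( $i = 0; $i < $this->hashes; $i++ ) {
                $pos[] = ( crc32( $i . ':' . $key ) & 0x7fffffff ) % $this->size;
            }
            return $pos;
        }

        public function add( $key ) {
            foreach ( $this->positions( $key ) as $p ) {
                $this->bits[ $p >> 3 ] =
                    chr( ord( $this->bits[ $p >> 3 ] ) | ( 1 << ( $p & 7 ) ) );
            }
        }

        public function probablyContains( $key ) {
            foreach ( $this->positions( $key ) as $p ) {
                if ( !( ord( $this->bits[ $p >> 3 ] ) & ( 1 << ( $p & 7 ) ) ) ) {
                    return false; // definitely not in the set
                }
            }
            return true; // probably in the set
        }
    }

    $bf = new BloomFilter();
    $bf->add( 'example.edu' );
    var_dump( $bf->probablyContains( 'example.edu' ) );  // bool(true)
    var_dump( $bf->probablyContains( 'spam.example' ) ); // bool(false), almost certainly

A false positive here just means an unlisted link occasionally escapes nofollow; if that bothers anyone, the rare "probably" hits could be double-checked against the real list.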
Really then, whitelisting becomes the same problem as the redlink/bluelink challenge. There are currently 5.9 million external links in enwiki vs. ~3.8 million pages, so it's obviously not intractable.
Still, we come back to the question: is it worth it? Why not just set nofollow on all links and save our spam-fighting resources for dealing with the people who aren't just concerned with SEO?