On 4/10/06, Aerik <aerik(a)thesylvans.com> wrote:
> Well, I'm a developer wanna-be and not a real developer :-) but if we
> set aside for a moment the policy-type questions about whitelisting
> and nofollow, whitelisting does not need to be the same process as
> blacklisting. For blacklisting, we need fancy regexes because people
> will try to get around the list; for whitelisting, we could probably
> define a much simpler validation (applied when the parser parses the
> URL), because if someone is posting a URL that is whitelisted, they
> will know what URL they're trying to match. There's still an issue of
> scale - checking against thousands or tens of thousands of whitelisted
> URLs may not be feasible - but calling "in_array", for example, is
> probably *much* faster than bunches of regexes.
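(The contrast Aerik describes - regex scans for the blacklist, exact membership tests for the whitelist - can be sketched roughly as below. The hostnames and patterns are made-up examples, not the wiki's actual lists, and a hashed set lookup is used as the fast analogue of PHP's in_array.)

```python
import re
from urllib.parse import urlparse

# Hypothetical examples; the real lists would come from the wiki's config.
blacklist_patterns = [re.compile(p) for p in [
    r"casino[-_]?bonus",
    r"cheap.*pills",
]]
whitelist = {"en.wikipedia.org", "www.gutenberg.org"}  # exact hostnames

def is_blacklisted(url):
    # Blacklisting needs regexes: spammers vary URLs to evade the list,
    # so every pattern must be scanned against every link.
    return any(p.search(url) for p in blacklist_patterns)

def is_whitelisted(url):
    # Whitelisting can be an exact-match set lookup (average O(1)):
    # the poster knows which whitelisted URL they are trying to match,
    # so no fuzzy matching is required.
    return urlparse(url).netloc in whitelist
```

The asymmetry is the point: the blacklist check is linear in the number of patterns, while the whitelist check costs roughly the same no matter how many entries the set holds.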
Right, right... There are all sorts of ways to make set-membership
testing fast, especially if "probably a member" is good enough (which
it is for us).
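("Probably a member" suggests a Bloom filter: it can report false positives but never false negatives, which is acceptable for a whitelist check. A minimal sketch, assuming SHA-256 as the hash source and the standard double-hashing trick to derive k bit positions:)

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: membership tests may return false positives
    ("probably a member") but never false negatives."""

    def __init__(self, num_bits=1 << 20, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive k positions from two halves of one SHA-256 digest
        # (the Kirsch-Mitzenmacher double-hashing construction).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "probably in the set"; False is definitive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A miss is always trustworthy, so the expensive exact check only needs to run on the (rare) hits.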
Really then, whitelisting becomes the same as the redlink/bluelink
challenge... There are 5.9 million external links currently in enwiki
vs. ~3.8 million pages. So it's obviously not intractable.
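(As a rough sizing check on that 5.9 million figure: the standard Bloom filter formulas m = -n ln p / (ln 2)^2 and k = (m/n) ln 2 give the bit-array size and hash count for a chosen false-positive rate. The 0.1% target below is an assumption, not anything from the thread.)

```python
import math

n = 5_900_000   # external links in enwiki (figure from the thread)
p = 0.001       # target false-positive rate, 0.1% (an assumption)

# Optimal Bloom filter sizing for n items at false-positive rate p.
bits = -n * math.log(p) / (math.log(2) ** 2)
hashes = (bits / n) * math.log(2)

print(f"~{bits / 8 / 2**20:.1f} MiB, {round(hashes)} hash functions")
```

That works out to roughly 10 MiB, which comfortably fits in memory on 2006-era hardware, supporting the "not intractable" conclusion.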
Still, we come back to the question... is it worth it? Why not just
set nofollow on all links and save our spam-fighting resources for the
people who aren't just concerned with SEO?