Gregory Maxwell <gmaxwell@...> writes:
> Right now we do our URL blacklist at page submission; that's not a
> fast path, so we can do computationally expensive things like apply a
> long list of regexes... to black- or whitelist URLs for nofollow we'd
> need to perform it at page load, which might not be acceptable. The
> only alternatives I can see involve complex changes.
Well, I'm a developer wanna-be and not a real developer :-) but if we
set aside for a moment the policy-type questions about whitelisting
and nofollow, whitelisting does not need to be the same process as
blacklisting. For blacklisting we need fancy regexes because people
will try to get around the list; for a whitelist, we could probably
define a much simpler validation (applied when the parser parses the
URL), because someone posting a whitelisted URL already knows exactly
which URL they're trying to match. There's still an issue of scale -
checking against thousands or tens of thousands of whitelisted URLs
may not be feasible - but calling "in_array", for example, is probably
*much* faster than bunches of regexes.
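To illustrate the asymmetry I mean (this is just a sketch in Python rather than PHP, with made-up patterns and hostnames, not the actual MediaWiki code): a blacklist has to run every regex against the URL, while an exact-match whitelist can be a single membership test.

```python
import re

# Hypothetical blacklist: a pile of regexes that may all have to run.
BLACKLIST_PATTERNS = [re.compile(p) for p in (
    r"casino[-.]?\d*\.example",
    r"(?:cheap|free)-?pills\.example",
)]

# Hypothetical whitelist: exact hostnames, so a simple lookup suffices.
WHITELISTED_HOSTS = {"en.wikipedia.org", "example.org"}

def is_blacklisted(url: str) -> bool:
    # Worst case, every pattern is tried: cost grows with the list.
    return any(p.search(url) for p in BLACKLIST_PATTERNS)

def is_whitelisted(host: str) -> bool:
    # One hash lookup regardless of list size.  (PHP's in_array() is a
    # linear scan rather than a hash lookup, but even a linear scan of
    # exact strings is far cheaper than running thousands of regexes.)
    return host in WHITELISTED_HOSTS
```

The point is only the shape of the cost: the whitelist check stays cheap enough to run at page load even with tens of thousands of entries, whereas the regex list does not.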
-Aerik