Gregory Maxwell <gmaxwell@...> writes:
> Right now we do our URL blacklist at page submission; that's not a fast path, so we can do computationally expensive things like apply a long list of regexes... to blacklist or whitelist URLs for nofollow we'd need to perform it at page load, which might not be acceptable. The only alternatives I can see involve complex changes.
Well, I'm a developer wanna-be and not a real developer :-) but if we set aside the policy-type questions about whitelisting and nofollow for a moment, whitelisting does not need to be the same process as blacklisting. For blacklisting we need fancy regexes because people will try to get around the list; for whitelisting we could probably define a much simpler validation, applied when the parser parses the URL, because someone posting a whitelisted URL already knows which URL they're trying to match. There's still an issue of scale (checking against thousands or tens of thousands of whitelisted URLs may not be feasible), but calling "in_array", for example, is probably *much* faster than running bunches of regexes.
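To make that point concrete, here's a minimal sketch of the idea (in Python rather than PHP, and with made-up hostnames and patterns; none of these names come from any actual MediaWiki list): an exact-match whitelist check is one parse plus a constant-time hash lookup, while a blacklist-style check has to try every pattern against every URL.

```python
import re
from urllib.parse import urlparse

# Hypothetical whitelist of exact hostnames (illustrative only).
WHITELIST = {"example.org", "wikipedia.org", "archive.org"}

def is_whitelisted(url):
    """Cheap check: parse the URL once, then do hash lookups.

    Also accepts subdomains of a whitelisted host, e.g.
    en.wikipedia.org matches wikipedia.org.
    """
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    return any(".".join(parts[i:]) in WHITELIST for i in range(len(parts)))

# By contrast, a blacklist-style check must try every regex in turn
# (again, made-up patterns purely for illustration).
BLACKLIST_PATTERNS = [re.compile(p) for p in (r"casino", r"cheap-.*-pills")]

def is_blacklisted(url):
    return any(p.search(url) for p in BLACKLIST_PATTERNS)
```

The whitelist cost stays roughly constant no matter how many entries the set holds, whereas the blacklist cost grows with the number of patterns, which is the asymmetry the paragraph above is pointing at.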
-Aerik