Andrew Gray wrote:
Mmm. We get quite a few of these (and if we're unlucky, it's squatted by a pornsite). Is there any practical way of spidering through our links to check for these?
Interesting question. I can think of two approaches.
One would be to take a large sample of domain names, check them before and after expiration, and develop some sort of fingerprint for the squatters: IP hosting blocks, DNS servers, WHOIS records, page content, page links, or server info.
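Roughly what I'm picturing for the per-domain snapshot, as an untested Python sketch using only the standard library. The parking-page phrases are just illustrative guesses, not a vetted list; the idea is to take one of these before expiration and one after and diff them.

import hashlib
import socket
import urllib.request

# Placeholder phrases that tend to show up on squatter/parking pages.
PARKING_PHRASES = ["this domain is for sale", "related searches", "buy this domain"]

def fingerprint(domain):
    """Collect a few cheap signals we can compare before/after expiration."""
    info = {"domain": domain, "ip": None, "content_hash": None, "parking_hits": 0}
    try:
        info["ip"] = socket.gethostbyname(domain)
    except OSError:
        return info  # NXDOMAIN or other resolution failure
    try:
        with urllib.request.urlopen("http://%s/" % domain, timeout=10) as resp:
            body = resp.read(100_000).decode("utf-8", errors="replace").lower()
    except OSError:
        return info
    info["content_hash"] = hashlib.sha1(body.encode()).hexdigest()
    info["parking_hits"] = sum(phrase in body for phrase in PARKING_PHRASES)
    return info

print(fingerprint("example.org"))

With enough of these snapshots we could start looking for clusters (same IP blocks, same nameservers, same boilerplate text) that mark the squatters.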
The other would be to crawl all our external links and check for significant changes in the pages after WHOIS changes (or perhaps after major nameserver changes, if we can't find a source for bulk WHOIS queries). I think we could get at significance by using our article text to learn which words or word-frequency patterns are expected on the linked page, and then flagging links where the page no longer matches.
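Something like the following, assuming we already have the article's text and the linked page's current text as strings. The 10% overlap threshold is a guess that would need tuning against real data.

import re
from collections import Counter

STOPWORDS = {"the", "and", "of", "to", "a", "in", "is", "for", "on", "that"}

def keywords(text, top_n=50):
    """Return the most frequent non-trivial words in a chunk of text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 3)
    return {w for w, _ in counts.most_common(top_n)}

def looks_replaced(article_text, page_text, threshold=0.1):
    """Flag the link if too few of the article's distinctive words survive on the page."""
    expected = keywords(article_text)
    if not expected:
        return False
    found = keywords(page_text, top_n=500)
    overlap = len(expected & found) / len(expected)
    return overlap < threshold

A squatted page usually drops the subject vocabulary entirely, so even a crude overlap score like this should separate "page redesigned" from "page replaced by a link farm" most of the time.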
The lamer version would just be to make a list of links to domains that appear to have changed hands recently. That'd have a higher error rate, but would be pretty easy to build.
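For that lamer version, a first pass could just flag domains whose WHOIS record was created or updated recently. Sketch below, assuming the third-party python-whois package is acceptable; WHOIS field names vary by registry, so treat this as best-effort.

from datetime import datetime, timedelta

import whois  # third-party: pip install python-whois

def recently_changed(domain, days=90):
    """Best-effort check for a WHOIS creation/update date within the last N days."""
    try:
        record = whois.whois(domain)
    except Exception:
        return False  # treat lookup failures as "unknown", not "changed"
    cutoff = datetime.utcnow() - timedelta(days=days)
    for field in (record.creation_date, record.updated_date):
        dates = field if isinstance(field, list) else [field]
        if any(isinstance(d, datetime) and d > cutoff for d in dates):
            return True
    return False

suspects = [d for d in ["example.org", "example.net"] if recently_changed(d)]
print(suspects)

That list would still need a human (or the fingerprint check above) to weed out ordinary registration renewals, but it's the easy 80%.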
William