Andrew Gray wrote:
> Mmm. We get quite a few of these (and if we're unlucky, it's squatted by a
> pornsite). Is there any practical way of spidering through our links to
> check for these?
Interesting question. I can think of two ways.
One would be to take a large sample of domain names, check them before
and after expiration, and develop some sort of fingerprint for the
squatters. E.g., IP hosting blocks, DNS servers, WHOIS records, page
content, page links, or server info.
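Here is a rough Python sketch of the fingerprinting idea. It assumes we have already gathered per-domain records for a sample of known parked or squatted domains. The feature names (nameserver, ip_block, registrar, server_header) and the 50% threshold are placeholders chosen for illustration, not taken from any real dataset:

```python
from collections import Counter

# Hypothetical feature names; in practice these would come from DNS, WHOIS,
# and HTTP lookups made on each domain after it expired.
FEATURES = ("nameserver", "ip_block", "registrar", "server_header")


def build_fingerprint(parked_samples, min_share=0.5):
    """Return (feature, value) pairs seen on at least min_share of known-parked domains."""
    fingerprint = set()
    total = len(parked_samples)
    for feature in FEATURES:
        counts = Counter(s[feature] for s in parked_samples if feature in s)
        for value, n in counts.items():
            if n / total >= min_share:
                fingerprint.add((feature, value))
    return fingerprint


def squatter_score(record, fingerprint):
    """Fraction of a domain's known features that match the squatter fingerprint."""
    present = [(f, record[f]) for f in FEATURES if f in record]
    if not present:
        return 0.0
    return sum(pair in fingerprint for pair in present) / len(present)
```

Values shared by many parked domains (a parking company's nameservers, say) end up in the fingerprint. A high score on one of our linked domains would only mean it deserves a human look.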
The other would be to crawl all our external links and check for
significant changes in the pages after WHOIS changes (or perhaps after
major nameserver changes, if we can't find a source for bulk WHOIS
queries). I think we could get at significance by using our article pages
to recognize important words or word-frequency patterns, then noting
significant deviations from them on the linked pages.
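The word-frequency comparison could look something like the following Python sketch. It uses plain cosine similarity over word counts as the measure of "significance." That choice, the function names, and the max_drop threshold are all assumptions. Fetching the before and after snapshots of each page is left out:

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercased word frequencies; a real version would strip markup and stopwords."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def looks_hijacked(article_text, page_before, page_after, max_drop=0.5):
    """Flag a link whose page drifted away from our article's vocabulary.

    Compares how similar the linked page is to the article before and after
    the WHOIS/nameserver change; a large relative drop suggests new content.
    max_drop is an arbitrary illustrative threshold.
    """
    article = word_counts(article_text)
    before = cosine(article, word_counts(page_before))
    after = cosine(article, word_counts(page_after))
    if before == 0.0:
        return False  # the page never resembled the article, so there is no baseline
    return (before - after) / before > max_drop
```

Measuring against the article, rather than just diffing the old and new page, means routine redesigns of a relevant site shouldn't trip the check as long as the vocabulary stays on topic.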
The lamer version would just be to make a list of links to domains that
appear to have changed hands recently. That'd have a higher error rate,
but would be pretty easy to build.
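The lamer version really is only a few lines. This sketch assumes a hypothetical last_change table that maps each host to the date its WHOIS or nameserver record last changed. Gathering that data is the hard part and isn't shown:

```python
from datetime import date, timedelta
from urllib.parse import urlsplit


def recently_changed_links(links, last_change, today, window_days=90):
    """Return links whose host changed hands within window_days of today.

    last_change is a hypothetical dict of hostname -> date of the last
    WHOIS/nameserver change; window_days is an arbitrary default.
    """
    cutoff = today - timedelta(days=window_days)
    flagged = []
    for url in links:
        host = urlsplit(url).hostname or ""
        # Crude normalization: treat www.example.org and example.org as one domain.
        if host.startswith("www."):
            host = host[4:]
        changed = last_change.get(host)
        if changed is not None and changed >= cutoff:
            flagged.append(url)
    return flagged
```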
William
--
William Pietri <william(a)scissor.com>
http://en.wikipedia.org/wiki/User:William_Pietri