This query compiles a list of external links (from the main namespace) grouped by website:
select count(*),
       trim(leading 'www.' from
            substring_index(substring_index(el_to, '/', 3), '/', -1)) as site
from externallinks, page
where el_from = page_id
  and page_namespace = 0
group by 2
having count(*) > 10
order by 1;
Running this for no.wikipedia (nowiki_p), these are the top 5 sites:
 8073 | snl.no
 8586 | vegvesen.no
 8907 | imdb.com
21924 | commons.wikimedia.org
46662 | toolserver.org
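For reference, the nested substring_index calls in the query extract the host part of el_to: the inner call keeps everything before the third '/', the outer call keeps everything after the last remaining '/', and the trim strips a leading "www.". A standalone check (the URL is just an illustration):

```sql
select trim(leading 'www.' from
       substring_index(substring_index(
           'http://www.imdb.com/title/tt0111161/', '/', 3), '/', -1));
-- returns 'imdb.com'
```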
I'd like to run this once a week for many of our sites and compile the statistics. Is anybody already doing this? Is it a bad idea, or even impossible?
Where should I store the results? How do I select from each database (nowiki_p, dawiki_p, svwiki_p, ...) and insert into one central table? Does MySQL handle that? The statistics table should also have columns for the wiki and the date, e.g. these four columns:
no.wikipedia 2011-11-25 8907 imdb.com
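MySQL does handle this, as long as the replica database and your own database live on the same server: you can qualify table names as database.table and do the copy with a single INSERT ... SELECT. A sketch, assuming a user database u_myuser and a table linkstats (both names are placeholders):

```sql
-- hypothetical statistics table in your own user database
create table u_myuser.linkstats (
    wiki      varchar(32),   -- e.g. 'no.wikipedia'
    stat_date date,
    cnt       int,
    site      varchar(255)
);

-- run once per wiki, substituting the database name and wiki label
insert into u_myuser.linkstats (wiki, stat_date, cnt, site)
select 'no.wikipedia', curdate(), count(*),
       trim(leading 'www.' from
            substring_index(substring_index(el_to, '/', 3), '/', -1)) as site
from nowiki_p.externallinks, nowiki_p.page
where el_from = page_id
  and page_namespace = 0
group by site
having count(*) > 10;
```

Note that database.table references only work within one server. If the replicas are spread over several servers, you would have to run the select on each server and insert the rows into the central table over a separate connection (e.g. from a small script).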
Afterwards, we could select sum()...group by... any of the columns and make all sorts of interesting statistics.
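For example, against the hypothetical linkstats table above, totals per site across all wikis, or a per-wiki time series, would look like:

```sql
-- total link counts per site, across all wikis and dates
select site, sum(cnt)
from u_myuser.linkstats
group by site
order by 2 desc;

-- how one site develops over time on one wiki
select stat_date, cnt
from u_myuser.linkstats
where wiki = 'no.wikipedia' and site = 'imdb.com'
order by stat_date;
```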