Tracking external link statistics over time - Toolserver-l

25 Nov 2011

This query compiles a list of external links (from the main namespace)
grouped by website:

select count(*), trim(leading 'www.' from substring_index(substring_index(el_to,
'/', 3),'/',-1)) as site
  from externallinks, page
  where el_from = page_id and page_namespace=0
  group by 2  having count(*)>  10 order by 1;

Running this for no.wikipedia (nowiki_p), these are the top 5 sites:

   8073 | snl.no
   8586 | vegvesen.no
   8907 | imdb.com
  21924 | commons.wikimedia.org
  46662 | toolserver.org

I'd like to run this once a week for many of our sites, and compile
the statistics. Is anybody doing this already? Is it bad or impossible?

Where should I store the results? How do I select... from each database
(nowiki_p, dawiki_p, svwiki_p, ...) and insert into a one central
database table? Does MySQL handle that? The statistics table
should have columns also for site and date, e.g. the four columns:

no.wikipedia 2011-11-25 8907 imdb.com

Afterwards, we could select sum()...group by... any of the columns
and make all sorts of interesting statistics.

-- 
   Lars Aronsson (lars(a)aronsson.se)
   Aronsson Datateknik - http://aronsson.se