[Toolserver-l] Tracking external link statistics over time
Lars Aronsson
lars at aronsson.se
Fri Nov 25 21:20:49 UTC 2011
This query compiles a list of external links (from the main namespace)
grouped by website:
select count(*),
       trim(leading 'www.' from
            substring_index(substring_index(el_to, '/', 3), '/', -1)) as site
from externallinks, page
where el_from = page_id and page_namespace = 0
group by 2 having count(*) > 10
order by 1;
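For reference, the inner substring_index() cuts the URL down to
"scheme://host" and the outer one keeps only what follows the last
slash, i.e. the host; trim() then strips a leading "www.". A quick
sanity check in plain MySQL (nothing Toolserver-specific assumed):

select trim(leading 'www.' from
       substring_index(substring_index(
           'http://www.example.com/some/page', '/', 3), '/', -1));

which returns 'example.com'.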
Running this for no.wikipedia (nowiki_p), these are the top 5 sites:
8073 | snl.no
8586 | vegvesen.no
8907 | imdb.com
21924 | commons.wikimedia.org
46662 | toolserver.org
I'd like to run this once a week for many of our wikis and compile
the statistics over time. Is anybody doing this already? Is it a bad
idea, or even impossible?
Where should I store the results? How do I select... from each database
(nowiki_p, dawiki_p, svwiki_p, ...) and insert into one central
database table? Does MySQL handle that? The statistics table should
also have columns for the wiki and the date, e.g. these four columns:
no.wikipedia 2011-11-25 8907 imdb.com
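One way this could look, assuming a user database named u_lars (just a
placeholder) that lives on the same MySQL server as the wiki replica
being read; MySQL can then insert across databases by qualifying the
table names, otherwise the rows would have to be copied over by a
script:

create table u_lars.linkstats (
    wiki varchar(32)  not null,  -- e.g. 'no.wikipedia'
    day  date         not null,  -- date the snapshot was taken
    hits int unsigned not null,  -- number of external links
    site varchar(255) not null,  -- external host, e.g. 'imdb.com'
    index (wiki, day)
);

insert into u_lars.linkstats (wiki, day, hits, site)
select 'no.wikipedia', curdate(), count(*),
       trim(leading 'www.' from
            substring_index(substring_index(el_to, '/', 3), '/', -1)) as site
from nowiki_p.externallinks, nowiki_p.page
where el_from = page_id and page_namespace = 0
group by site having count(*) > 10;

The second statement would then be repeated (from cron, say) with the
wiki name and database prefix swapped for each wiki in the list.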
Afterwards, we could run select sum() ... group by ... over any of the
columns and get all sorts of interesting statistics.
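For example, to follow how links to one external site grow across all
wikis over time (column names as in the sketch above):

select day, sum(hits)
from u_lars.linkstats
where site = 'imdb.com'
group by day
order by day;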
--
Lars Aronsson (lars at aronsson.se)
Aronsson Datateknik - http://aronsson.se