Peter Gervai wrote:
Hello,
I see I've created quite a stir, but so far nothing really useful has popped up. :-(
But I did see this one from Neil:
Yes, modifying the http://stats.grok.se/ systems looks like the way to go.
To me it doesn't really seem that way, since it appears to use an extremely dumbed-down input, which contains only page views and [unreliable] byte counters. Most probably it would require large rewrites, and a magical new data source.
What do people actually want to see from the traffic data? Do they want referrers, anonymized user trails, or what?
Are you old enough to remember stats.wikipedia.org? As far as I remember it originally ran webalizer, then something else, then nothing. If you look at a webalizer report you'll see what it contains. We are using (or we used, until our nice fellow editors broke it) awstats, which provides basically the same thing with more caching.
The most used and useful stats are page views (daily and hourly breakdowns are pretty useful too), referrers, visitor domain and provider stats, OS and browser stats, screen resolution stats, bot activity stats, and visitor duration and depth, among probably others.
At a brief glance I could replicate the grok.se stats easily, since it seems to work off http://dammit.lt/wikistats/, but that data is completely useless for anything beyond page hit counts.
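To illustrate why that data only gets you hit counts: a minimal sketch of parsing those hourly dump lines, assuming the format is one space-separated record per page of the form "project page_title view_count bytes_transferred" (the sample lines below are hypothetical):

```python
def parse_pagecounts(lines):
    """Yield (project, title, views, bytes) tuples, skipping malformed lines.

    Assumes the dammit.lt/wikistats hourly format: four space-separated
    fields per line, with spaces in titles encoded as underscores.
    """
    for line in lines:
        parts = line.split(" ")
        if len(parts) != 4:
            continue  # not a well-formed record
        project, title, views, size = parts
        try:
            yield project, title, int(views), int(size)
        except ValueError:
            continue  # non-numeric counters, skip

# Hypothetical sample input, just to show the shape of the data.
sample = [
    "en Main_Page 42 1234567",
    "hu Kezdőlap 7 98765",
    "garbage line",              # malformed, silently skipped
]
records = list(parse_pagecounts(sample))
print(records)
```

As you can see, per-page view and byte totals are all there is: no referrers, no visitor trails, no user-agent data, so none of the stats listed above can be derived from it.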
Is there any possibility of writing code that processes the raw squid data? Who do I have to bribe? :-/
We do have http://stats.wikimedia.org/ which includes things like http://stats.wikimedia.org/EN/VisitorsSampledLogOrigins.htm