Re: [Wikitech-l] User agent statistics

25 Nov 2008

Domas Mituzas wrote:
...
  Helloes,

  I think this would be a very cool way to have accurate statistics
  about browser usage, not only on Wikipedia.

 I was hesitating to work on that simply because it is not as
 interesting as 24/7 pageview statistics, though it may be interesting
 to see day-long snapshots with a per-project + per-country split every
 quarter or so.
 For that we'd need:

 a) good UA header parser (in C...)

What should it parse?
Some people have talked about cutting the UA down to some level of
abstraction, but IMHO it's better to aggregate the whole header and
group the results by browser.

So you could get something like this:

*Mozilla Firefox 2%
**Mozilla Firefox 3
**Mozilla Firefox 2
**Mozilla Firefox 1 and older

*Internet Explorer 5%
**Internet Explorer 8 (beta) - 0.5%
**Internet Explorer 7 - 2%
**Internet Explorer 6 - 2%
**Internet Explorer 5 and older - 0.5%
***Mozilla/4.0 Windows 95 IE4.0 broken 0.0001%

....

Keeping the hits at that level of detail lets us check that the filters
are right. And how many different UA headers might we get? 50, 80, 100?
That's perfectly acceptable.
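
The grouping described above could be sketched roughly like this in Python. It is only an illustration: the function names (classify, aggregate, report) and the two matching rules are made up for the example, and real headers would need many more cases. Hits are counted per full User-Agent string first and only then grouped by browser and major version, so the raw strings stay visible under each group and a wrong filter is easy to spot.

```python
from collections import Counter, defaultdict

def classify(ua):
    """Map a raw User-Agent string to a (browser, major version) pair.

    Hypothetical, deliberately naive rules for illustration only.
    """
    if "MSIE " in ua:
        return ("Internet Explorer", ua.split("MSIE ", 1)[1].split(".", 1)[0])
    if "Firefox/" in ua:
        return ("Mozilla Firefox", ua.split("Firefox/", 1)[1].split(".", 1)[0])
    return ("Other", "?")

def aggregate(user_agents):
    """Count hits per full header, then nest them as browser -> version -> header."""
    raw = Counter(user_agents)
    tree = defaultdict(lambda: defaultdict(Counter))
    for ua, hits in raw.items():
        browser, version = classify(ua)
        tree[browser][version][ua] += hits
    return tree

def report(tree):
    """Render the nested counts as '*', '**', '***' lines with percentages of all hits."""
    total = sum(hits for versions in tree.values()
                for uas in versions.values()
                for hits in uas.values())
    lines = []
    for browser in sorted(tree):
        versions = tree[browser]
        browser_hits = sum(sum(uas.values()) for uas in versions.values())
        lines.append("*%s - %.1f%%" % (browser, 100.0 * browser_hits / total))
        # Versions are compared as strings, which is good enough for a sketch.
        for version in sorted(versions, reverse=True):
            uas = versions[version]
            lines.append("**%s %s - %.1f%%"
                         % (browser, version, 100.0 * sum(uas.values()) / total))
            for ua, hits in uas.most_common():
                lines.append("***%s - %.1f%%" % (ua, 100.0 * hits / total))
    return lines
```

Calling report(aggregate(list_of_headers)) returns the lines of the listing; anything that ends up under "Other" points at a rule that is still missing.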

Yes, there is information in User-Agent headers which shouldn't be
there; most notably, IE wants to announce your OS, service pack, and
even several .NET versions everywhere.
But once they're aggregated, just knowing that 0.01% of the hits (not
even users!) came from Windows 3.1 isn't really breaking Foo's privacy.
It might, if you put your name and address into your User-Agent, but
since every site you browse learns it too, you have bigger problems
than Wikimedia finding it.
I can think of one case where you might get in trouble: your boss
finding the company-customized UA when employees weren't supposed to
visit Wiktionary. But he could just as well install a proxy or sniff
the traffic.

Plus, all this data is also useful in other ways (e.g. aggregations by
OS; I'm sure we would get some surprises) and can itself be used as a
source for subsequent studies.
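
As a rough illustration of reusing the same data, per-header hit counts could be re-aggregated by OS without touching the logs again. The rule table and the names os_of and hits_by_os are assumptions made for this sketch, not an existing tool, and the list of OS tokens is far from complete.

```python
from collections import Counter

# Hypothetical substring rules, checked in order; the first match wins.
OS_RULES = [
    ("Windows NT 6.0", "Windows Vista"),
    ("Windows NT 5.1", "Windows XP"),
    ("Windows 95", "Windows 95"),
    ("Mac OS X", "Mac OS X"),
    ("Linux", "Linux"),
]

def os_of(ua):
    """Return an OS label for a raw User-Agent string, or 'Unknown'."""
    for token, name in OS_RULES:
        if token in ua:
            return name
    return "Unknown"

def hits_by_os(ua_hits):
    """Sum hits per OS, given a mapping of full User-Agent string -> hits."""
    totals = Counter()
    for ua, hits in ua_hits.items():
        totals[os_of(ua)] += hits
    return totals
```

Because the input is the already aggregated header counts, the same snapshot can back both the per-browser and the per-OS view.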
