On Wed, Nov 26, 2008 at 6:08 AM, Platonides <Platonides@gmail.com> wrote:
Gregory Maxwell wrote:
On Tue, Nov 25, 2008 at 5:31 PM, Platonides wrote: [snip]
Getting hits at that level of detail would let us check that the filters are right. And how many different UA headers might we get? 50, 80, 100? That's perfectly acceptable.
On Tue, Nov 25, 2008 at 5:52 PM, Marco Schuster wrote:
I'd basically think of 300 different UAs, but that shouldn't be a major problem to handle, I think.
Counting only the 1:100 sample of JS-executing browsers hitting enwp, there were 78,033 unique user agent strings yesterday.
This is due to all the weird crap that gets thrown into the strings which takes me back to my original post.
Sometimes to the point of making them almost unique to some machines: http://meta.wikimedia.org/w/index.php?title=Vandalism_reports&diff=prev&...
Really. Making a manual mapping will not work.
Not necessarily manual, but I thought the number would be small enough to abstract and review.
Could you share the list of headers?
If you'd like to try writing a scrubber I'd be glad to run it and give you feedback. If you need some examples of weird agents, I can make some for you.
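As a starting point, a scrubber could look roughly like the Python sketch below. It collapses each raw user agent string into a coarse "family major-version" bucket, so that tens of thousands of unique strings reduce to a short list that can be reviewed by hand. Everything here is an assumption for illustration: the rule table, the bucket names, and the `scrub` and `summarize` functions are hypothetical, not anything actually run against the logs, and the patterns would need tuning against real samples.

```python
import re
from collections import Counter

# Hypothetical rule table: (family, pattern capturing the major version).
# Order matters: Opera can masquerade as MSIE, and Chrome also carries a
# "Safari/" token, so the more specific rules come first.
FAMILY_RULES = [
    ("Opera", re.compile(r"Opera[/ ](\d+)")),
    ("MSIE", re.compile(r"MSIE (\d+)")),
    ("Firefox", re.compile(r"Firefox/(\d+)")),
    ("Chrome", re.compile(r"Chrome/(\d+)")),
    ("Safari", re.compile(r"Version/(\d+).*Safari/")),
]


def scrub(ua):
    """Reduce a raw user agent string to a coarse 'family major' bucket."""
    for family, pattern in FAMILY_RULES:
        match = pattern.search(ua)
        if match:
            return f"{family} {match.group(1)}"
    # Anything unrecognised lands in one bucket for later manual review.
    return "other"


def summarize(user_agents):
    """Count how many raw strings fall into each scrubbed bucket."""
    return Counter(scrub(ua) for ua in user_agents)


if __name__ == "__main__":
    # Made-up sample strings in the style of the weird agents discussed above.
    samples = [
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; FunWebProducts)",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; SV1; InfoPath.2)",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.4) Gecko/2008102920 Firefox/3.0.4",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50",
        "SomeCrawler/1.0",
    ]
    for bucket, count in summarize(samples).most_common():
        print(bucket, count)
```

The unmatched strings are kept in an explicit "other" bucket rather than dropped, so its size shows how much of the traffic the rules still miss.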