Assuming this was public, I could use this data on seldom-edited wikis to find out which editors likely have old browser/OS versions with vulnerabilities that I could attack[1]. This gets easier the more dimensions you add to the data.

<re-reads>

OK. The anonymization strategy of dropping records that represent < 50 distinct editors seems to address this concern. 50 distinct editors is a lot, so this data wouldn't be terribly useful for under-active wikis. Then again, if you just want a sense of what the dominant browser/OS pairs are, those pairs will likely represent > 50 unique editors on most projects.
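
(For my own sanity-checking, a minimal sketch of that filter as I understand it; the column names here are made up, not the actual checkuser schema:)

    from collections import defaultdict

    def releasable_agents(rows, k=50):
        """Keep only user agents seen with at least k distinct usernames."""
        editors_per_agent = defaultdict(set)
        for row in rows:  # each row looks like {"user_agent": ..., "username": ...}
            editors_per_agent[row["user_agent"]].add(row["username"])
        return {agent: len(editors)
                for agent, editors in editors_per_agent.items()
                if len(editors) >= k}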

1. Props to Matt Flaschen and Dan Andreescu for helping me work through the implications of that one. 

On Tue, Mar 3, 2015 at 9:59 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
Yeah, makes sense.

On 3 March 2015 at 20:38, Nuria Ruiz <nuria@wikimedia.org> wrote:
>>Agreed. Do we have a way of syncing files to Labs yet?
> No need to sync if the file is available at an endpoint like
> http://some-data-here
>
> On Tue, Mar 3, 2015 at 4:50 PM, Oliver Keyes <okeyes@wikimedia.org> wrote:
>>
>> On 3 March 2015 at 19:35, Nuria Ruiz <nuria@wikimedia.org> wrote:
>> >>Erik has asked me to write an exploratory app for user-agent data. The
>> >>idea is to enable Product Managers and engineers to easily explore
>> >>what users use so they know what to support. I've thrown up an example
>> >>screenshot at http://ironholds.org/agents_example_screen.png
>> >
>> > I cannot speak to the community's interest in this data, but for
>> > developers and PMs we should make sure we have a solid way to update
>> > any data we put up. User agent data is outdated as soon as a new
>> > version of Android or iOS is released, a new popular phone comes
>> > along, or a new auto-update for popular browsers ships. Not only
>> > that: if we make changes to, say, redirect all iPad users to the
>> > desktop site, we want to assess the effect of those changes as soon
>> > as possible. A monthly update will be a must. Also, distinguishing
>> > between browser percentages on the desktop site versus the mobile
>> > site versus the apps is a must for this data to be really useful to
>> > PMs and developers (especially for bug triage).
>> >
>>
>> Yes! However, I am addressing a specific ad-hoc request. If there is a
>> need for this (I agree there is), I hope Toby and Kevin can eke out the
>> time on the Analytics Engineering schedule to work on it; y'all are a
>> lot better at infrastructure work than I am :).
>>
>> >
>> > We have a couple of backlog items for monthly reports in this regard.
>> > A UI on top of them would be superb.
>> >
>>
>> Agreed. Do we have a way of syncing files to Labs yet? That's the
>> biggest blocker. The UI doesn't care what the file contains as long as
>> it's a TSV with a header row - I've deliberately built it so that
>> things like the download links are dynamic and can change.
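
[Note from me, not Oliver: to make the "any TSV with a header row" contract concrete, here is a consumer-side sketch; the filename and column names are hypothetical.]

    import csv

    def load_agent_table(path):
        """Read a tab-separated file whose first row names the columns."""
        with open(path, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f, delimiter="\t"))

    # Hypothetical file and columns, purely for illustration:
    # rows = load_agent_table("editor_agents_parsed.tsv")
    # print(rows[0]["browser"], rows[0]["os"])
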
>>
>> >
>> >
>> >
>> >
>> > On Tue, Mar 3, 2015 at 1:05 PM, Oliver Keyes <okeyes@wikimedia.org>
>> > wrote:
>> >>
>> >> Hey all,
>> >>
>> >> (Sending this to the public list because it's more transparent and I'd
>> >> like people who think this data is useful to be able to shout out)
>> >>
>> >> Erik has asked me to write an exploratory app for user-agent data. The
>> >> idea is to enable Product Managers and engineers to easily explore
>> >> what users use so they know what to support. I've thrown up an example
>> >> screenshot at http://ironholds.org/agents_example_screen.png (I'd
>> >> host it on Commons, inb4 Dario, but I'm not sure of the copyright
>> >> status of the UI.)
>> >>
>> >> One side-effect of this is that we end up with files of common user
>> >> agents, split between {readers, editors} and {mobile, desktop}, parsed
>> >> and unparsed. I'd like to release these files. The reuse potential is
>> >> twofold: researchers and engineers can use the parsed files to see what
>> >> browser penetration looks like globally and which browsers should be
>> >> supported at a top-10 level, and software engineers can use the
>> >> unparsed files to improve detection rates.
>> >>
>> >> The privacy implications /should/ be minimal, because of how this data
>> >> is gathered. The editor data is gathered from the checkuser table,
>> >> globally, and automatically excludes any user agent used by fewer than
>> >> 50 distinct usernames. The reader data is gathered from a month of
>> >> 1:1000 sampled log files, and excludes any agent responsible for fewer
>> >> than 500 pageviews in a 24-hour period in the sampled logs (so,
>> >> practically speaking, roughly 500,000 actual pageviews).
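
[Note from me, not Oliver: to make the reader-side cutoff and the sampling arithmetic concrete, a sketch with made-up field names; 500 pageviews in 1:1000 sampled logs corresponds to roughly 500 * 1000 = 500,000 actual pageviews.]

    from collections import Counter

    SAMPLING_RATE = 1000   # 1:1000 sampled request logs
    SAMPLED_CUTOFF = 500   # minimum sampled pageviews per agent per 24 hours
    # i.e. roughly SAMPLED_CUTOFF * SAMPLING_RATE = 500,000 actual pageviews

    def releasable_reader_agents(sampled_requests):
        """Count pageviews per user agent in a day of sampled logs and
        drop any agent below the cutoff."""
        counts = Counter(r["user_agent"] for r in sampled_requests)
        return {agent: n for agent, n in counts.items() if n >= SAMPLED_CUTOFF}
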
>> >>
>> >> What do people think about making this a data release? Would people
>> >> get value from the data, as well as the tool?
>> >>
>> >> --
>> >> Oliver Keyes
>> >> Research Analyst
>> >> Wikimedia Foundation
>> >>
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Oliver Keyes
>> Research Analyst
>> Wikimedia Foundation
>>
>
>
>
>



--
Oliver Keyes
Research Analyst
Wikimedia Foundation

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics