Garry,
Replying again to clarify matters a bit.
> Your proposed user agent would basically mean that every single person
> using the most up-to-date version of the app on a particular platform would
> be indistinguishable from each other. This would, unfortunately, lead to
> lots of innocent users getting blocked as sockpuppets.
There are two types of data we are considering when it comes to data
gathering and storage: operational data and application data.
Operational data is retained for a 90-day period; it is logged per request
and not manipulated in any way. The second type, application data, comes
from logging an application event (as with event logging) or from tracking
the usage of a feature. This data will be aggregated to avoid privacy
concerns.
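To make "aggregated" concrete, here is a rough sketch (not the analytics team's actual pipeline) of how strings in the format proposed further down this thread, "WikipediaApp/<version> <OS>/<form-factor>/<version>", could be reduced to per-platform shares. The function names and the "other" bucket are my own assumptions for illustration.

```python
import re
from collections import Counter

# Pattern for the format proposed in this thread:
#   WikipediaApp/<version> <OS>/<form-factor>/<version>
UA_PATTERN = re.compile(
    r"^WikipediaApp/(?P<app_version>\S+) "
    r"(?P<os>[^/\s]+)/(?P<form_factor>[^/\s]+)/(?P<os_version>\S+)$"
)


def parse_app_ua(ua):
    """Return the UA fields as a dict, or None if the string does not match."""
    match = UA_PATTERN.match(ua.strip())
    return match.groupdict() if match else None


def aggregate_by_platform(user_agents):
    """Return the share of requests per (OS, form factor) bucket.

    Only the aggregated shares are returned; nothing per user or per
    request is kept. Unrecognized strings go into a hypothetical
    ("other", "other") bucket.
    """
    counts = Counter()
    for ua in user_agents:
        fields = parse_app_ua(ua)
        if fields is None:
            counts[("other", "other")] += 1
        else:
            counts[(fields["os"], fields["form_factor"])] += 1
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}
```

The output is the kind of statement we would report ("x% of requests come from Android phones"), with no way to get back to an individual request.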
The Google doc that describes user agent data collection does so for
application data, i.e. data we wish to retain long term:
https://office.wikimedia.org/wiki/Analytics/Internal/EventLogging/PrivateDa…
I think perhaps the confusion here comes from not defining who the
consumers of the user agent format proposed by the mobile team are.
> Your proposed user agent would basically mean that every single person
> using the most up-to-date version of the app on a particular platform would
> be indistinguishable from each other. This would, unfortunately, lead to
> lots of innocent users getting blocked as sockpuppets.
By whom, and how? Via the CheckUser extension or some other system? It is
worth keeping in mind that a mobile application is not a website (i.e.
requests do not come from a browser, they come from an HTTP client), and
thus you might not be able to detect false accounts in the same fashion.
For example, it is not unusual for all users of a telco to appear to come
from a small set of IP addresses. In that case the IP part of a request is
not very significant when it comes to uniquely identifying a user.
On Thu, Mar 27, 2014 at 10:20 AM, Nuria Ruiz <nuria(a)wikimedia.org> wrote:
>
> > As a checkuser, user agents are an important part of my workflow for
> > identifying that multiple accounts are owned by the same person.
> > So I'm going to have to argue for including more information in the user
> > agent.
>
> Including more information in the UA, while covered by legal under
> the new privacy policy, really goes against the wishes of the community, as
> they do not wish to be fingerprinted.
> See:
> https://www.mediawiki.org/wiki/Talk:EventLogging/UserAgentSanitization or
> https://meta.wikimedia.org/wiki/Talk:Privacy_policy
> There have been plenty more discussions about this on the analytics
> e-mail list.
>
>
>
> > Your proposed user agent would basically mean that every single person
> > using the most up-to-date version of the app on a particular platform would
> > be indistinguishable from each other. This would, unfortunately, lead to
> > lots of innocent users getting blocked as sockpuppets.
>
> However, note that the UA "WikipediaApp/<version>
> <OS>/<form-factor>/<version>" clearly satisfies the use case of the mobile
> team. It provides as much information as they need from their users without
> sending any private data.
>
> Can you please describe your use case? Namely, how are you identifying
> "false" accounts? Perhaps relying on the user agent to do so is not the
> best strategy going forward. Keep in mind that under the old privacy policy
> UA data needed to be discarded after 90 days. With the new policy there is
> more legal room, but given community feedback the analytics team is planning
> on aggregating all UA information in the future. This means that UA data
> will not be stored (or reported) per user or request but rather aggregated
> (as in "4% of users use iPhone").
>
> We recently gathered information from all teams about use cases pertaining
> to UA data collection:
>
> https://office.wikimedia.org/wiki/Analytics/Internal/EventLogging/PrivateDa…
>
> Let's talk about your use case and add it to the existing document that
> describes usages of user agent data. This document was sent out to
> all teams a couple of months ago, but there is no description of your use
> case there:
>
> https://docs.google.com/a/wikimedia.org/document/d/1bp6qrvYi0Mh7l0s1psGnXEE…
>
> On Wed, Mar 26, 2014 at 11:20 PM, Dan Garry <dgarry(a)wikimedia.org> wrote:
>
>> Hey Yuvi,
>>
>> As a checkuser, user agents are an important part of my workflow for
>> identifying that multiple accounts are owned by the same person. So I'm
>> going to have to argue for including more information in the user agent.
>> Your proposed user agent would basically mean that every single person
>> using the most up-to-date version of the app on a particular platform would
>> be indistinguishable from each other. This would, unfortunately, lead to
>> lots of innocent users getting blocked as sockpuppets.
>>
>> Here's an example of a user agent from an iPhone using Safari:
>> Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_1 like Mac OS X; zh-tw)
>> AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8G4
>> Safari/6533.18.5
>>
>> Look at all of that wonderful information! ;-) In general, the more
>> information you can include without breaching the user's privacy, the
>> better.
>>
>> I'd be happy to work with you on this.
>>
>> Thanks,
>> Dan
>>
>> P.S. You may also want to consult with the legal team, to ensure that
>> unacceptable levels of private information are not given out. They would
>> also be a good complement to me; I would likely be pulling in the direction
>> of "MOAR INFORMATION!", whereas they would likely be pulling in the
>> direction of "LESS INFORMATION!". :-)
>>
>>
>> On 26 March 2014 15:00, Yuvi Panda <yuvipanda(a)gmail.com> wrote:
>>
>>> Add Analytics to cc, as I think they'll be interested as well :)
>>>
>>> On Thu, Mar 27, 2014 at 3:20 AM, Yuvi Panda <yuvipanda(a)gmail.com> wrote:
>>> > Hello!
>>> >
>>> > We are getting closer to a general release of the Wikipedia Android
>>> > and iOS apps, and I think we should standardize on a User-Agent
>>> > format. The old app just appended an identifier in front of the
>>> > phone's default UA[1] but I think we can do better, to avoid privacy
>>> > concerns[2].
>>> >
>>> > How about:
>>> >
>>> > WikipediaApp/<version> <OS>/<form-factor>/<version>
>>> >
>>> > This gives us all the info we need (App version, OS, Form Factor
>>> > (Tablet / Phone) and OS version) without giving away too much. It is
>>> > also fairly simple to construct and parse.
>>> >
>>> > For the latest alpha, my Nexus 4 would generate
>>> >
>>> > WikipediaApp/32 Android/Phone/4.4
>>> >
>>> > While an iOS device might generate
>>> >
>>> > WikipediaApp/2.0 iOS/Phone/7.1
>>> >
>>> > form-factor would just be Phone|Tablet for now, and can be expanded
>>> > later if necessary.
>>> >
>>> > Thoughts?
>>> >
>>> > [1]: https://www.mediawiki.org/wiki/Mobile/User_agents#Apps
>>> > [2]: https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
>>> > --
>>> > Yuvi Panda T
>>> > http://yuvi.in/blog
>>>
>>>
>>>
>>> --
>>> Yuvi Panda T
>>> http://yuvi.in/blog
>>>
>>> _______________________________________________
>>> Analytics mailing list
>>> Analytics(a)lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>>
>>
>>
>>
>> --
>> Dan Garry
>> Associate Product Manager for Platform
>> Wikimedia Foundation
>>
>>
>>
>