More. Ideas? I'll email wikitech-l separately.
---------- Forwarded message ----------
From: Adam Baso <abaso(a)wikimedia.org>
Date: Wed, Apr 16, 2014 at 2:20 PM
Subject: Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation
To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
Great idea!
Anyone on the list know if there's a way to make the debug log facilities
do the YYYYMMDD timestamp instead of the longer one?
If not, I suppose we could work to update the core MediaWiki code. [1]
-Adam
1. For those with PHP skills or equivalent, I'm referring to
https://git.wikimedia.org/blob/mediawiki%2Fcore.git/a26687e81532def3faba646….
Scroll to the bottom of the function definition to see the datetimestamp
approach.
On Wed, Apr 16, 2014 at 12:47 PM, Andrew Gray <andrew.gray(a)dunelm.org.uk> wrote:
> Hi Adam,
>
> One thought: you don't really need the date/time data at any detailed
> resolution, do you? If what you're wanting it for is to track major
> changes ("last month it all switched to this IP") and to purge old
> data ("delete anything older than 10 March"), you could simply log day
> rather than datetime.
>
> enwiki / 127.0.0.1 / 123.45 / 2014-04-16:1245.45
>
> enwiki / 127.0.0.1 / 123.45 / 2014-04-16
>
> - the latter gives you the data you need while making it a lot harder
> to do any kind of close user-identification.
>
> Andrew.
> On 16 Apr 2014 19:17, "Adam Baso" <abaso(a)wikimedia.org> wrote:
>
> > Inline.
> >
> > Thanks for starting this thread.
> > >
> > > Sorry if I've overlooked this, but who/what will have access to this
> > > data? Only members of the mobile team? Local project CheckUsers?
> > > Wikimedia Foundation-approved researchers? Wikimedia shell users?
> > > AbuseFilter filters?
> > >
> >
> > It's a good question. The thought is to put it in the customary
> > wfDebugLog location (with, for example, filename "mccmnc.log") on
> > fluorine.
> >
> > It just occurred to me that the wiki name (e.g., "enwiki"), but not the
> > full URL, gets logged additionally as part of the wfDebugLog call; to
> > make the implicit explicit, wfDebugLog adds a datetime stamp as well, and
> > that's useful for purging old records. I'll forward this email to
> > mobile-l and wikitech-l to underscore this.
> >
> >
> > > And this may be a silly question, but is there a reasonable means of
> > > approximating how identifying these two data points alone are? That is,
> > > using a mobile country code and exit IP address, is it possible to
> > > identify a particular editor or reader? Or perhaps rephrased, is this
> > > data considered anonymized?
> > >
> >
> > Not a silly question. My approximation is these tuples (datetime, now
> > that it hit me - XYwiki, exit IP, and MCC-MNC) alone, although not
> > perfectly anonymized, are low identifying (that is, indirect inferences
> > on the data in isolation are unlikely, but technically possible, through
> > examination of short tail outliers in a cluster analysis where such
> > readers/editors exist in the short tail outliers sets), in contrast to
> > regular web access logs (where direct inferences are easy).
> >
> > Thanks. I'll forward this along now.
> >
> > -Adam
> > _______________________________________________
> > Wikimedia-l mailing list
> > Wikimedia-l(a)lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
>
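Andrew's suggestion quoted above (log the day rather than the full datetime) could be sketched roughly as follows. This is only an illustration in Python, not MediaWiki code: it assumes the slash-separated line layout from his example, and the helper names `coarsen_log_line` and `is_older_than` are hypothetical.

```python
from datetime import date, datetime


def coarsen_log_line(line: str) -> str:
    """Truncate the trailing timestamp field of a log line to day resolution.

    Assumes the illustrative layout from the thread:
    "<wiki> / <exit IP> / <MCC-MNC> / <YYYY-MM-DD>[:<time>]".
    """
    fields = [field.strip() for field in line.split(" / ")]
    # Keep only the date part that precedes the first ':' in the last field.
    fields[-1] = fields[-1].split(":", 1)[0]
    return " / ".join(fields)


def is_older_than(line: str, cutoff: date) -> bool:
    """Return True when the day stamp of a coarsened line predates cutoff."""
    day_text = line.rsplit(" / ", 1)[-1]
    day = datetime.strptime(day_text, "%Y-%m-%d").date()
    return day < cutoff
```

A day stamp is still enough for the purge use case ("delete anything older than 10 March"), which is what `is_older_than` illustrates.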
Okay, I updated https://gerrit.wikimedia.org/r/#/c/126188/.
Here's what the UA looks like:
WikipediaApp/4.0 (iPhone OS 7.1; Phone)
Is that right?
Note, the #.# will change from 4.0 to a #.#.# per Apple guidelines. Looks
like the YAML covers that.
The form factor can take on "Phone", "Tablet", or "Other". If another form
factor idiom is introduced, I suppose we'll want to update the code to
report it explicitly and not just fall back to "Other".
-Adam
On Wed, Apr 16, 2014 at 12:45 PM, Christian Aistleitner <
christian(a)quelltextlich.at> wrote:
> On Wed, Apr 16, 2014 at 10:07:44AM -0700, Oliver Keyes wrote:
> > Okay, worked it out for iOS. You want something like...
> >
> > WikipediaApp/v2.0 (iPhone OS 4_4; [tablet/mobile])
>
> The "v" after the slash in "WikipediaApp/v2.0" looks unusual.
> While the RfC does not forbid them, none of the RfC's examples comes
> with such a "v", nor do we see "v"s used much in the wild.
>
> Best regards,
> Christian
>
> --
> ---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
> Companies' registry: 360296y in Linz
> Christian Aistleitner
> Gruendbergstrasze 65a Email: christian(a)quelltextlich.at
> 4040 Linz, Austria Phone: +43 732 / 26 95 63
> Fax: +43 732 / 26 95 63
> Homepage: http://quelltextlich.at/
> ---------------------------------------------------------------
>
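For reference, the User-Agent shape Adam describes in this message can be assembled as in the sketch below. It is written in Python purely for illustration (the apps themselves are not Python), and `build_user_agent` and `FORM_FACTORS` are hypothetical names; the fallback to "Other" mirrors the behaviour described in the message.

```python
# Form factor values named in the message above.
FORM_FACTORS = ("Phone", "Tablet", "Other")


def build_user_agent(app_version: str, os_name: str, os_version: str,
                     form_factor: str) -> str:
    """Build a string shaped like "WikipediaApp/4.0 (iPhone OS 7.1; Phone)"."""
    # Any idiom the code does not know about is reported as "Other".
    if form_factor not in FORM_FACTORS:
        form_factor = "Other"
    return f"WikipediaApp/{app_version} ({os_name} {os_version}; {form_factor})"
```

A three-part app version such as "4.0.1" passes through unchanged, since the version is treated as an opaque string here.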
Moving to mobile-l.
---------- Forwarded message ----------
From: Vibha Bamba <vbamba(a)wikimedia.org>
Date: Wed, Apr 16, 2014 at 11:41 PM
Subject: Captcha Inspiration
To: Yuvaraj Pandian <yuvipanda(a)wikimedia.org>, Monte Hurd <
mhurd(a)wikimedia.org>
I've been seeing some simpler examples of captcha.
Certainly we need community consensus to simplify form entry. What do you
think about this?
http://minus.com/i/PRu9xzYTFDHy
----
Vibha Bamba
Senior Designer | WMF Design
*To expect the unexpected shows a thoroughly modern intellect - Oscar Wilde*
FYI to mobile-l - reply to MZMcBride. See
http://lists.wikimedia.org/pipermail/wikimedia-l/2014-April/071131.html to
follow or contribute to the thread on wikimedia-l if you're not subscribed
there already.
-Adam
---------- Forwarded message ----------
From: Adam Baso <abaso(a)wikimedia.org>
Date: Wed, Apr 16, 2014 at 11:16 AM
Subject: Re: [Wikimedia-l] Mobile Operator IP Drift Tracking and Remediation
To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
Inline.
Thanks for starting this thread.
>
> Sorry if I've overlooked this, but who/what will have access to this data?
> Only members of the mobile team? Local project CheckUsers? Wikimedia
> Foundation-approved researchers? Wikimedia shell users? AbuseFilter
> filters?
>
It's a good question. The thought is to put it in the customary wfDebugLog
location (with, for example, filename "mccmnc.log") on fluorine.
It just occurred to me that the wiki name (e.g., "enwiki"), but not the
full URL, gets logged additionally as part of the wfDebugLog call; to make
the implicit explicit, wfDebugLog adds a datetime stamp as well, and that's
useful for purging old records. I'll forward this email to mobile-l and
wikitech-l to underscore this.
> And this may be a silly question, but is there a reasonable means of
> approximating how identifying these two data points alone are? That is,
> using a mobile country code and exit IP address, is it possible to
> identify a particular editor or reader? Or perhaps rephrased, is this data
> considered anonymized?
>
Not a silly question. My approximation is these tuples (datetime, now that
it hit me - XYwiki, exit IP, and MCC-MNC) alone, although not perfectly
anonymized, are low identifying (that is, indirect inferences on the data
in isolation are unlikely, but technically possible, through examination of
short tail outliers in a cluster analysis where such readers/editors exist
in the short tail outliers sets), in contrast to regular web access logs
(where direct inferences are easy).
Thanks. I'll forward this along now.
-Adam
On 16 April 2014 10:21, Christian Aistleitner <christian(a)quelltextlich.at> wrote:
> Hi Oliver,
>
> On Wed, Apr 16, 2014 at 09:15:53AM -0700, Oliver Keyes wrote:
> > So, it identifies the first one as Android, but can't pick out version
> > number,
>
> you're lagging behind master. Android version should be correctly
> picked since
>
>
> https://github.com/tobie/ua-parser/commit/e9d5238513b3184ef0cbcb6e4c403a20f…
>
>
Good catch! Updated at my end.
> > and identifies the second as running Mobile Safari, but can't pick
> > out the OS or device.
> >
> > I would recommend tweaking and testing these strings
> > before deploying them [...]
>
> Regardless of how you tweak the User-Agent strings ... how would you
> get ua_parser to report the User-Agent family as "WikipediaApp"?
>
> You would have to teach ua_parser about it.
>
> And if we have to teach ua_parser something anyways ... we might as
> well stick with standards for our User-Agents and teach ua_parser to
> extract not only the User-Agent, but also to be more robust when
> extracting OS information.
>
> That would benefit us and ua_parser.
>
> It's just a simple two line patch [1].
>
>
Sure; for app identification we could just handle it ourselves - we
probably want to avoid pushing WM-specific strings upstream.
> > if we want accurate device numbers (and we totally
> > want accurate device numbers).
>
> Device information is not at all included in the User-Agent.
> And that's actually good. No need to leak all over the Internet who
> uses which device.
>
> But as device information is not included in the User-Agent, we cannot
> parse it out to get per device numbers.
>
It's not at the moment, but it could be, and I think that just including
device */class/* (tablet versus mobile versus other) would probably be
fine. I don't see how this would be 'leak[ing] all over the internet'.
>
> Have fun,
> Christian
>
>
>
> [1] Something along the lines of (probably do not want to split
> version number parts at -, but do not know)
>
> git diff HEAD^
> diff --git a/regexes.yaml b/regexes.yaml
> index 3ecd0b4..cfdf595 100644
> --- a/regexes.yaml
> +++ b/regexes.yaml
> @@ -1,6 +1,8 @@
> user_agent_parsers:
> #### SPECIAL CASES TOP ####
>
> + - regex: '(WikipediaApp)/([^-]*)-([^-]*)-([^ ]*) '
> +
> # HbbTV standard defines what features the browser should understand.
> # but it's like targeting "HTML5 browsers", effective browser support depends on the model
> # See os_parsers if you want to target a specific TV
> @@ -645,7 +647,7 @@ os_parsers:
> # iOS
> # http://en.wikipedia.org/wiki/IOS_version_history
> ##########
> - - regex: '(CPU OS|iPhone OS|CPU iPhone) (\d+)[_\.](\d+)(?:[_\.](\d+))?'
> + - regex: '(CPU OS|iPhone OS|CPU iPhone)[ /](\d+)[_\.](\d+)(?:[_\.](\d+))?'
> os_replacement: 'iOS'
>
> # remaining cases are mostly only opera uas, so catch opera as to not catch iphone spoofs
>
>
>
>
>
> _______________________________________________
> Analytics mailing list
> Analytics(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
--
Oliver Keyes
Research Analyst
Wikimedia Foundation
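The os_parsers change in Christian's patch [1] can be tried in isolation. The sketch below uses Python's `re` module directly, not ua-parser itself, so it only shows what the two patterns match. The slash-separated sample string is an assumption made for illustration, and `parse_ios_version` is a hypothetical helper.

```python
import re

# iOS pattern before the patch: only a space may precede the version.
ORIGINAL_IOS = re.compile(
    r"(CPU OS|iPhone OS|CPU iPhone) (\d+)[_\.](\d+)(?:[_\.](\d+))?")

# iOS pattern after the patch: a space or a slash may precede the version.
PATCHED_IOS = re.compile(
    r"(CPU OS|iPhone OS|CPU iPhone)[ /](\d+)[_\.](\d+)(?:[_\.](\d+))?")


def parse_ios_version(user_agent, pattern=PATCHED_IOS):
    """Return the iOS version as a tuple of ints, or None if nothing matches."""
    match = pattern.search(user_agent)
    if match is None:
        return None
    # Group 1 is the OS marker; the remaining groups are version parts.
    return tuple(int(part) for part in match.groups()[1:] if part is not None)
```

With these definitions, a string such as "WikipediaApp/4.0 (iPhone OS 7.1; Phone)" is matched by both patterns, while a slash-separated "iPhone OS/7.1" is matched only by the patched one.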
Hello!
We are getting closer to a general release of the Wikipedia Android
and iOS apps, and I think we should standardize on a User-Agent
format. The old app just put an identifier in front of the
phone's default UA[1] but I think we can do better, to avoid privacy
concerns[2].
How about:
WikipediaApp/<version> <OS>/<form-factor>/<version>
This gives us all the info we need (App version, OS, Form Factor
(Tablet / Phone) and OS version) without giving away too much. It is
also fairly simple to construct and parse.
For the latest alpha, my Nexus 4 would generate
WikipediaApp/32 Android/Phone/4.4
While an iOS device might generate
WikipediaApp/2.0 iOS/Phone/7.1
form-factor would just be Phone|Tablet for now, and can be expanded
later if necessary.
Thoughts?
[1]: https://www.mediawiki.org/wiki/Mobile/User_agents#Apps
[2]: https://www.mediawiki.org/wiki/EventLogging/UserAgentSanitization
--
Yuvi Panda T
http://yuvi.in/blog
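To illustrate the "fairly simple to construct and parse" claim for the format proposed in this message, here is a rough Python sketch. The regular expression and the names `AppUserAgent` and `parse_proposed_ua` are assumptions for illustration only; they are not taken from the apps or from ua-parser.

```python
import re
from typing import NamedTuple, Optional


class AppUserAgent(NamedTuple):
    app_version: str
    os: str
    form_factor: str
    os_version: str


# WikipediaApp/<version> <OS>/<form-factor>/<version>
PROPOSED_UA = re.compile(
    r"^WikipediaApp/(\S+) ([^/\s]+)/(Phone|Tablet)/(\S+)$")


def parse_proposed_ua(user_agent: str) -> Optional[AppUserAgent]:
    """Split a proposed-format User-Agent into its four fields."""
    match = PROPOSED_UA.match(user_agent)
    if match is None:
        return None
    return AppUserAgent(*match.groups())
```

Because the form factor is restricted to Phone|Tablet here, the pattern would need extending if further form factors are added later, as the message anticipates.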
I ran into an Android implementation of
http://mattt.github.io/Chroma-Hash/ lately, and was wondering if
experimenting with that would be a good idea for the Android app.
Thoughts?
--
Yuvi Panda T
http://yuvi.in/blog
Forwarding
---------- Forwarded message ----------
From: Dario Taraborelli <dtaraborelli(a)wikimedia.org>
Date: Wed, Apr 16, 2014 at 10:32 AM
Subject: Mobile first?
To: WMF Product Team <wmfproduct(a)lists.wikimedia.org>, mobile-tech <
mobile-tech(a)wikimedia.org>
Keep an eye on mobile traffic and user agents
http://lifehacker.com/use-wikipedias-mobile-site-for-easier-split-screen-re…