On 16 April 2014 10:21, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi Oliver,

On Wed, Apr 16, 2014 at 09:15:53AM -0700, Oliver Keyes wrote:
> So, it identifies the first one as Android, but can't pick out version
> number,

You're lagging behind master. The Android version should be picked up
correctly since

  https://github.com/tobie/ua-parser/commit/e9d5238513b3184ef0cbcb6e4c403a20f46bd5db


Good catch! Updated at my end.
 
> and identifies the second as running Mobile Safari, but can't pick
> out the OS or device.
>
> I would recommend tweaking and testing these strings
> before deploying them [...]

Regardless of how you tweak the User-Agent strings ... how would you
get ua_parser to report the User-Agent family as “WikipediaApp”?

You would have to teach ua_parser about it.

And if we have to teach ua_parser something anyway ... we might as
well stick with standards for our User-Agents and teach ua_parser to
extract not only the User-Agent family, but also to be more robust when
extracting OS information.

That would benefit us and ua_parser.

It's just a simple two-line patch [1].


Sure. For app identification we could just handle it ourselves; we probably want to avoid pushing WM-specific strings upstream.
 
> if we want accurate device numbers (and we totally
> want accurate device numbers).

Device information is not included in the User-Agent at all.
And that's actually good. There's no need to leak all over the Internet
who uses which device.

But as device information is not included in the User-Agent, we cannot
parse it out to get per-device numbers.

It's not at the moment, but it could be. I think including just the device /class/ (tablet versus mobile versus other) would probably be fine; I don't see how that would be 'leak[ing] all over the internet'.

Have fun,
Christian



[1] Something along the lines of the following (we probably do not want
to split version number parts at "-", but I do not know):

git diff HEAD^
diff --git a/regexes.yaml b/regexes.yaml
index 3ecd0b4..cfdf595 100644
--- a/regexes.yaml
+++ b/regexes.yaml
@@ -1,6 +1,8 @@
 user_agent_parsers:
   #### SPECIAL CASES TOP ####

+  - regex: '(WikipediaApp)/([^-]*)-([^-]*)-([^ ]*) '
+
   # HbbTV standard defines what features the browser should understand.
   # but it's like targeting "HTML5 browsers", effective browser support depends on the model
   # See os_parsers if you want to target a specific TV
@@ -645,7 +647,7 @@ os_parsers:
   # iOS
   # http://en.wikipedia.org/wiki/IOS_version_history
   ##########
-  - regex: '(CPU OS|iPhone OS|CPU iPhone) (\d+)[_\.](\d+)(?:[_\.](\d+))?'
+  - regex: '(CPU OS|iPhone OS|CPU iPhone)[ /](\d+)[_\.](\d+)(?:[_\.](\d+))?'
     os_replacement: 'iOS'

   # remaining cases are mostly only opera uas, so catch opera as to not catch iphone spoofs
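To sanity-check the patch above, the two regexes can be exercised directly with Python's re module. This is a minimal sketch under stated assumptions: the User-Agent strings are hypothetical (the thread does not specify the app's actual format), the helper name `groups` is made up for illustration, and it tests only the raw patterns with `re.search`, not ua-parser's own replacement logic.

```python
import re

# Patterns copied from the regexes.yaml diff (note the trailing space in APP_RE).
APP_RE = re.compile(r'(WikipediaApp)/([^-]*)-([^-]*)-([^ ]*) ')
IOS_OLD = re.compile(r'(CPU OS|iPhone OS|CPU iPhone) (\d+)[_\.](\d+)(?:[_\.](\d+))?')
IOS_NEW = re.compile(r'(CPU OS|iPhone OS|CPU iPhone)[ /](\d+)[_\.](\d+)(?:[_\.](\d+))?')

# Hypothetical User-Agent strings, for illustration only.
APP_UA = 'WikipediaApp/2.0-beta-2014 iPhone OS/7.1'
SAFARI_UA = 'Mozilla/5.0 (iPhone; CPU iPhone OS 7_1 like Mac OS X)'


def groups(pattern, ua):
    """Hypothetical helper: capture groups of the first match anywhere in ua, or None."""
    m = pattern.search(ua)
    return m.groups() if m else None


if __name__ == '__main__':
    print(groups(APP_RE, APP_UA))      # ('WikipediaApp', '2.0', 'beta', '2014')
    print(groups(IOS_OLD, APP_UA))     # None: old rule needs a space before the version
    print(groups(IOS_NEW, APP_UA))     # ('iPhone OS', '7', '1', None)
    print(groups(IOS_NEW, SAFARI_UA))  # ('iPhone OS', '7', '1', None)
```

The `[ /]` change keeps existing Mobile Safari strings matching while also accepting a slash-separated "iPhone OS/7.1" token.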




--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Gruendbergstrasze 65a        Email:  christian@quelltextlich.at
4040 Linz, Austria           Phone:          +43 732 / 26 95 63
                             Fax:            +43 732 / 26 95 63
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics




--
Oliver Keyes
Research Analyst
Wikimedia Foundation