Thanks Christian -- nice to see you back on the lists :)

-Toby

On Thu, Aug 20, 2015 at 1:16 PM, Christian Aistleitner <christian@quelltextlich.at> wrote:
Hi,

I've been asked in a private email why WMF forked ua-parser [1]
(a library used to extract information from User-Agent headers).
There is no need to discuss this in private, hence I am replying to
the mailing list.

TL;DR: It was no real fork. We just worked around issues with
upstream's release management.

-----------------------------------------------------

What follows is a bit detailed. But given the context, I decided to
err on the side of being over-verbose.

Back in October 2014, WMF pushed towards analyzing User-Agent headers
in the logs, for example to allow more accurate estimates of how many
requests WMF sees from Android vs. iPhone devices, which browsers get
used in which versions, etc.

Extracting information from User-Agents is a bit tricky, as there are
quite a few corner cases. So it was decided to use a third-party
library for it. ua-parser [1] was chosen for this purpose.
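
To illustrate one such corner case: a Chrome-on-Android User-Agent
also contains the token "Safari", so a careless substring check
misclassifies it. The sketch below is not ua-parser's implementation;
the sample string and both function names are assumptions made up for
illustration.

```python
# Illustrative User-Agent string (an assumption, not taken from WMF logs).
CHROME_ON_ANDROID = (
    "Mozilla/5.0 (Linux; Android 4.4.2; Nexus 5 Build/KOT49H) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/38.0.2125.114 Mobile Safari/537.36"
)


def naive_browser(ua):
    """Check tokens in a careless order: Safari first."""
    if "Safari" in ua:
        return "Safari"
    if "Chrome" in ua:
        return "Chrome"
    return "Other"


def ordered_browser(ua):
    """Check the more specific token (Chrome) before Safari."""
    if "Chrome" in ua:
        return "Chrome"
    if "Safari" in ua:
        return "Safari"
    return "Other"


print(naive_browser(CHROME_ON_ANDROID))    # misclassified as Safari
print(ordered_browser(CHROME_ON_ANDROID))  # Chrome
```

Even the ordered variant only covers this one case, which is why
relying on a maintained library is the more robust route.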

ua-parser comes with a Java build, so it naturally matched the log
processing's Java ecosystem. However, (at least) back then ua-parser
did not offer compelling prebuilt jars, and the versioning and
release cycle of its Java part were broken.
The latest release was about a year old, and no proper release was in
sight. So all upstream gave us was a jar versioned as

  ua-parser-1.3.0-SNAPSHOT.jar

Deploying such a jar to the cluster is a bad idea, as its name gives
no clue about which commit it is based on.  In this concrete setting,
about 250 commits in ua-parser would produce the same version
number.  That would make debugging hard and nix
reproducibility.

Since WMF cannot do a proper release for ua-parser, the typical
workaround for WMF in such cases is to produce a “wmf” branch in
Gerrit and do “wmf” releases at known commits. And that's what the
ua-parser “fork” in Gerrit does.

Comparing upstream with the “fork” in Gerrit, the only difference is:

  https://gerrit.wikimedia.org/r/#/c/169204/

That commit allows for a wmf release, is tagged 1.3.0-wmf1 and results
in an artifact name of

  ua-parser-1.3.0-wmf1.jar

which (due to the 1.3.0-wmf1 tag) is good for releasing [2].
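
The difference between the two jar names can be sketched as a small
check that one could run before deploying. This is only an
illustration: the helper name and the file name pattern are
assumptions, not part of any WMF tooling.

```python
import re

# Assumed naming scheme "<artifact>-<version>.jar": split at the first
# dash that is followed by a digit.
JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def pins_a_release(jar_name):
    """Return True if the jar's version is not a moving SNAPSHOT."""
    match = JAR_NAME.match(jar_name)
    if match is None:
        raise ValueError("unexpected jar name: " + jar_name)
    return not match.group("version").endswith("-SNAPSHOT")


print(pins_a_release("ua-parser-1.3.0-SNAPSHOT.jar"))  # False
print(pins_a_release("ua-parser-1.3.0-wmf1.jar"))      # True
```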

As one of the questions in the private email was whether WMF could
switch back to upstream ... I hope you see that WMF never switched
away from upstream and WMF never “forked” upstream. WMF only rolled
their own release.
If upstream now provides proper releases, sure, just use them :-)

Have fun,
Christian



P.S.:
* How can I find out who actually created a repository?

Look at the first commit to the meta/config branch. Like here:

  https://git.wikimedia.org/commit/analytics%2Fua-parser/2fd5dc00ac9e087b307f42669029f9b05cdcb090


* How can I see the difference between branches?

Use `git cherry` (Yes, really. Just “cherry”, no trailing “-pick”)

An example session is at [3].
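
The output of `git cherry <upstream> <head>` is also easy to
post-process: lines starting with "+" are commits on <head> that have
no equivalent change in <upstream>, lines starting with "-" are
commits that do. A minimal sketch, with hypothetical function names
and assuming `git` is on the PATH when `git_cherry` is called:

```python
import subprocess


def git_cherry(upstream, head):
    """Run `git cherry` in the current repository and return its output."""
    result = subprocess.run(
        ["git", "cherry", upstream, head],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def unmerged_commits(cherry_output):
    """Return the hashes that `git cherry` marks with '+'."""
    commits = []
    for line in cherry_output.splitlines():
        sign, _, commit = line.strip().partition(" ")
        if sign == "+":
            commits.append(commit)
    return commits


# Output as printed for `git cherry origin/master gerrit/wmf`:
sample = "+ 2a44875355b558d9f880a63c86630af229044a63\n"
print(unmerged_commits(sample))
```

An empty result means the two branches carry equivalent changes.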


* How could one have found out about the wmf1 thing?

For example, from the IRC logs from the day of the commit [4]:
[20:23:08] <ottomata>    we can just make wmf1 be our release of the current master?
[20:23:13] <qchris>      k



----------------------------------------------------------------------

[1] Back then at

  https://github.com/tobie/ua-parser

now the relevant repos for WMF seem to be at

  https://github.com/ua-parser/uap-core
  https://github.com/ua-parser/uap-java

.



[2] It made it into archiva:

  https://archiva.wikimedia.org/#artifact/ua_parser/ua-parser/1.3.0-wmf1

into the refinery-hive jars:

  https://gerrit.wikimedia.org/r/#/c/166142/11..14/refinery-hive/pom.xml

and also to the cluster:

  https://gerrit.wikimedia.org/r/#/c/170373/1/refinery-hive/pom.xml
  https://gerrit.wikimedia.org/r/#/c/170375/



[3]
_________________________________________________________________
christian@spencer // jobs: 0 // time: 21:40:28 // exit code: 0
cwd: ~/tmp
git clone https://github.com/tobie/ua-parser
Cloning into 'ua-parser'...
remote: Counting objects: 4507, done.
remote: Total 4507 (delta 0), reused 0 (delta 0), pack-reused 4507
Receiving objects: 100% (4507/4507), 4.31 MiB | 923 KiB/s, done.
Resolving deltas: 100% (2301/2301), done.

_________________________________________________________________
christian@spencer // jobs: 0 // time: 21:41:10 // exit code: 0
cwd: ~/tmp
cd ua-parser

_________________________________________________________________
christian@spencer // jobs: 0 // time: 21:41:14 // exit code: 0
cwd: ~/tmp/ua-parser
git remote add gerrit https://gerrit.wikimedia.org/r/analytics/ua-parser

_________________________________________________________________
christian@spencer // jobs: 0 // time: 21:41:33 // exit code: 0
cwd: ~/tmp/ua-parser
git fetch gerrit
remote: Finding sources: 100% (4/4)
remote: Total 4 (delta 3), reused 4 (delta 3)
Unpacking objects: 100% (4/4), done.
From https://gerrit.wikimedia.org/r/analytics/ua-parser
 * [new branch]      master     -> gerrit/master
 * [new branch]      wmf        -> gerrit/wmf
 * [new tag]         v1.3.0-wmf1 -> v1.3.0-wmf1

_________________________________________________________________
christian@spencer // jobs: 0 // time: 21:41:38 // exit code: 0
cwd: ~/tmp/ua-parser
git cherry origin/master gerrit/master

_________________________________________________________________
christian@spencer // jobs: 0 // time: 21:42:10 // exit code: 0
cwd: ~/tmp/ua-parser
git cherry origin/master gerrit/wmf
+ 2a44875355b558d9f880a63c86630af229044a63

_________________________________________________________________
christian@spencer // jobs: 0 // time: 21:42:17 // exit code: 0
cwd: ~/tmp/ua-parser
git cherry origin/master v1.3.0-wmf1
+ 2a44875355b558d9f880a63c86630af229044a63



[4] http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-analytics/20141027.txt



--
---- quelltextlich e.U. ---- \\ ---- Christian Aistleitner ----
                           Companies' registry: 360296y in Linz
Christian Aistleitner
Kefermarkterstrasze 6a/3     Email:  christian@quelltextlich.at
4293 Gutau, Austria          Phone:          +43 7946 / 20 5 81
                             Fax:            +43 7946 / 20 5 81
                             Homepage: http://quelltextlich.at/
---------------------------------------------------------------

_______________________________________________
Analytics mailing list
Analytics@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics