Hi,
I've been asked in a private email why WMF forked ua-parser [1] (a library used to extract information from User-Agent headers). There is no need to discuss this in private, hence I am replying to the mailing list.
TL;DR: It was not really a fork. We just worked around issues with upstream's release management.
-----------------------------------------------------
What follows is rather detailed, but given the context I decided to err on the side of being over-verbose.
Back in October 2014, WMF started pushing to analyze the User-Agent headers in its logs, for example to get more accurate estimates of how many requests WMF receives from Android vs. iPhone devices, or which browsers are used in which versions.
Extracting information from User-Agent strings is tricky, as there are quite a few corner cases. So it was decided to use a third-party library for it, and ua-parser [1] was chosen.
ua-parser comes with a Java build, so it was a natural match for the Java ecosystem of our log processing. However, (at least) back then, ua-parser did not offer usable prebuilt jars, and the versioning and release cycle of its Java part were broken: the latest release was about a year old, and no proper release was in sight. So all upstream gave us was a jar versioned as
ua-parser-1.3.0-SNAPSHOT.jar
Deploying such a jar to the cluster is a bad idea, as its name gives no clue which commit it was built from. In this concrete case, about 250 commits in ua-parser would produce the same version number. That would make debugging hard and rule out reproducibility.
Since WMF cannot cut proper upstream releases of ua-parser, the typical workaround in such cases is to create a “wmf” branch in Gerrit and make “wmf” releases from known commits. That is all the ua-parser “fork” in Gerrit does.
Comparing upstream with the “fork” in Gerrit, the only difference is:
https://gerrit.wikimedia.org/r/#/c/169204/
That commit allows for a wmf release, is tagged 1.3.0-wmf1 and results in an artifact name of
ua-parser-1.3.0-wmf1.jar
which (due to the 1.3.0-wmf1 tag) is good for releasing [2].
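For anyone wanting to try the “wmf branch plus tag” idea locally, here is a minimal sketch in a throwaway repository. It is not WMF's actual procedure: all commits are dummies, and a hypothetical plain-text VERSION file stands in for wherever the real build keeps its version number. Only the tag name v1.3.0-wmf1 mirrors the real one.

```shell
# Throwaway sketch of a "wmf" release branch. The repository, the VERSION
# file and the commit messages are hypothetical stand-ins, not ua-parser's build.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # same branch name on any git version
git config user.name  "Demo User"
git config user.email "demo@example.org"

# Upstream: every commit builds the same "1.3.0-SNAPSHOT" version.
echo "1.3.0-SNAPSHOT" > VERSION
git add VERSION
git commit -q -m "upstream: start 1.3.0 development"
echo "rules v2" > regexes.yaml
git add regexes.yaml
git commit -q -m "upstream: update regexes"

# Branch off at the known upstream commit that should be released.
git checkout -q -b wmf master

# The single local change: give the build a unique version string.
echo "1.3.0-wmf1" > VERSION
git commit -q -am "Prepare 1.3.0-wmf1 release"

# Tag it, so the release name points at exactly one commit.
git tag v1.3.0-wmf1

# Shows the one commit that exists on wmf but not on master.
git log --oneline --decorate master..wmf
```

Checking out v1.3.0-wmf1 later gives exactly the code that was built, which a SNAPSHOT version cannot guarantee.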
As one of the questions in the private email was whether WMF could switch back to upstream: I hope it is clear now that WMF never switched away from upstream and never “forked” it. WMF only rolled its own release. If upstream now provides proper releases, sure, just use them :-)
Have fun,
Christian
P.S.:
* How can I find out who actually created a repository?
Look at the first commit on the meta/config branch, like here:
https://git.wikimedia.org/commit/analytics%2Fua-parser/2fd5dc00ac9e087b307f4...
* How can I see the difference between branches?
Use `git cherry` (yes, really: just “cherry”, without a trailing “-pick”).
An example session is at [3].
* How could one have found out about the wmf1 thing?
For example, from the IRC logs of the day of the commit [4]:
  [20:23:08] <ottomata> we can just make wmf1 be our release of the current master?
  [20:23:13] <qchris> k
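To see what `git cherry` reports without access to GitHub or Gerrit, here is a minimal sketch that re-creates the situation in a throwaway repository. The branch names master, mirror and wmf and all commits are hypothetical stand-ins for origin/master, gerrit/master and gerrit/wmf. One extra case is included: a wmf commit that upstream also applied under a different commit id, which `git cherry` marks with “-” instead of “+”.

```shell
# Throwaway demo of `git cherry`; nothing here talks to a real remote.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # same branch name on any git version
git config user.name  "Demo User"
git config user.email "demo@example.org"

echo base > base.txt
git add base.txt
git commit -q -m "upstream: base"

# An exact copy of upstream, like gerrit/master in the session [3].
git branch mirror master

# A "wmf" branch with two local commits on top of upstream.
git checkout -q -b wmf
echo release > release.txt
git add release.txt
git commit -q -m "wmf-only change"
wmf_only=$(git rev-parse HEAD)

echo fix > fix.txt
git add fix.txt
git commit -q -m "fix that upstream also picks up"
shared=$(git rev-parse HEAD)

# Upstream applies the same patch, which gives it a different commit id.
git checkout -q master
git cherry-pick "$shared" > /dev/null

echo "master vs. mirror:"
git cherry master mirror   # no output: no commit is missing upstream
echo "master vs. wmf:"
git cherry master wmf      # "+ <id>": not upstream; "- <id>": same patch is upstream
```

Because `git cherry` compares patches rather than commit ids, it tells “only in our branch” apart from “already upstream under another id”, which is exactly the question when checking whether a branch is a real fork.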
----------------------------------------------------------------------
[1] Back then at
  https://github.com/tobie/ua-parser
  Now, the relevant repos for WMF seem to be at
  https://github.com/ua-parser/uap-core
  https://github.com/ua-parser/uap-java
[2] It made it into Archiva:
  https://archiva.wikimedia.org/#artifact/ua_parser/ua-parser/1.3.0-wmf1
  into the refinery-hive jars:
  https://gerrit.wikimedia.org/r/#/c/166142/11..14/refinery-hive/pom.xml
  and also onto the cluster:
  https://gerrit.wikimedia.org/r/#/c/170373/1/refinery-hive/pom.xml
  https://gerrit.wikimedia.org/r/#/c/170375/
[3]
  ~/tmp$ git clone https://github.com/tobie/ua-parser
  Cloning into 'ua-parser'...
  remote: Counting objects: 4507, done.
  remote: Total 4507 (delta 0), reused 0 (delta 0), pack-reused 4507
  Receiving objects: 100% (4507/4507), 4.31 MiB | 923 KiB/s, done.
  Resolving deltas: 100% (2301/2301), done.
  ~/tmp$ cd ua-parser
  ~/tmp/ua-parser$ git remote add gerrit https://gerrit.wikimedia.org/r/analytics/ua-parser
  ~/tmp/ua-parser$ git fetch gerrit
  remote: Finding sources: 100% (4/4)
  remote: Total 4 (delta 3), reused 4 (delta 3)
  Unpacking objects: 100% (4/4), done.
  From https://gerrit.wikimedia.org/r/analytics/ua-parser
   * [new branch] master -> gerrit/master
   * [new branch] wmf -> gerrit/wmf
   * [new tag] v1.3.0-wmf1 -> v1.3.0-wmf1
  ~/tmp/ua-parser$ git cherry origin/master gerrit/master
  ~/tmp/ua-parser$ git cherry origin/master gerrit/wmf
  + 2a44875355b558d9f880a63c86630af229044a63
  ~/tmp/ua-parser$ git cherry origin/master v1.3.0-wmf1
  + 2a44875355b558d9f880a63c86630af229044a63
[4] http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-analytics/20141027.txt