[QA] Package managers caching

Antoine Musso hashar+wmf at free.fr
Thu Oct 22 13:54:53 UTC 2015


Hello,

I am looking for advice on maintaining a cache store for the various
package managers.  The Nodepool instances start with cold caches, which
causes npm/pip/gem/composer/gradle/maven to fetch everything over the
internet.

When a job starts, I would like the cache to be populated from some copy
to save time (and to be nice to the package manager repositories).

The tracking task:

  Disposable VMs need a cache for package managers
  https://phabricator.wikimedia.org/T112560


I evaluated a few solutions, each with its own task. In summary:


  devpi for pip / pypi
  https://phabricator.wikimedia.org/T114871

Just point the pip index URL at it and it transparently keeps a cache.
Con: it only works for pip.
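
As a rough sketch of that setup, the snippet below builds the
environment a job could export before calling pip. The host name is
made up for illustration. Port 3141 and the root/pypi/+simple/ path are
devpi's defaults as far as I know, so adjust them to the real instance.

```python
import os


def pip_env_for_devpi(devpi_host, base_env=None):
    """Return a copy of the environment with pip pointed at a devpi mirror."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: the devpi server still exposes its default root/pypi
    # caching mirror on the default port 3141.
    env["PIP_INDEX_URL"] = "http://%s:3141/root/pypi/+simple/" % devpi_host
    # pip warns about or ignores plain-HTTP indexes unless the host is
    # explicitly marked as trusted.
    env["PIP_TRUSTED_HOST"] = devpi_host
    return env


if __name__ == "__main__":
    # "devpi.example.org" is a placeholder, not a real service.
    print(pip_env_for_devpi("devpi.example.org", {})["PIP_INDEX_URL"])
```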


  angry-caching-proxy
  https://phabricator.wikimedia.org/T112561

A Node.js proxy. Set http_proxy to it and override the index URL to use
HTTP.  Drawback: it does not use HTTPS with upstream, and some package
manager repositories are only available over HTTPS.
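
To illustrate the two knobs involved, here is a small sketch using npm
as the example client. The proxy URL is whatever the proxy ends up
listening on (the one in the usage note is a placeholder), and the
helper names are mine, not part of angry-caching-proxy.

```python
from urllib.parse import urlsplit, urlunsplit


def downgrade_to_http(index_url):
    """Rewrite an https:// index URL to http:// so a plain proxy can cache it."""
    parts = urlsplit(index_url)
    if parts.scheme != "https":
        return index_url
    # Keep host, path, query and fragment; only the scheme changes.
    return urlunsplit(("http",) + tuple(parts[1:]))


def env_for_http_proxy(proxy_url, npm_registry="https://registry.npmjs.org/"):
    """Environment for a job whose npm traffic should go through the proxy."""
    return {
        # Honoured by most HTTP clients for plain-HTTP requests.
        "http_proxy": proxy_url,
        # npm reads any npm_config_* variable as a configuration override.
        "npm_config_registry": downgrade_to_http(npm_registry),
    }
```

For example, env_for_http_proxy("http://proxy.example.org:8080") yields
a registry of http://registry.npmjs.org/. This only helps for
registries that still answer over plain HTTP, which is exactly the
drawback noted above.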


  Squid-based man-in-the-middle proxy
  https://phabricator.wikimedia.org/T116015

Point https_proxy to it. Squid has a feature to terminate the TLS
connection and re-sign it with a custom CA.  It works like a charm, but
I had to rebuild the squid package with SSL support, and I hit a wall
with bundler/gem.  Having to maintain a custom CA is intimidating.
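
On the client side, each tool has to both use the proxy and trust the
custom CA. A minimal sketch of what a job might export follows. The
proxy URL and CA path are placeholders, the variable names are the ones
I know pip, npm and OpenSSL-based tools to read, and nothing here is
claimed to fix the bundler/gem problem.

```python
def env_for_tls_intercepting_proxy(proxy_url, ca_bundle_path):
    """Environment for clients behind a proxy that re-signs TLS with a custom CA."""
    return {
        # HTTPS requests are tunnelled through (and intercepted by) the proxy.
        "https_proxy": proxy_url,
        # Equivalent to pip's --cert option.
        "PIP_CERT": ca_bundle_path,
        # npm's "cafile" setting, passed as an npm_config_* override.
        "npm_config_cafile": ca_bundle_path,
        # Read by OpenSSL as the default CA file; tools that ship their own
        # certificate store may ignore it.
        "SSL_CERT_FILE": ca_bundle_path,
    }
```

Usage would be something like
env_for_tls_intercepting_proxy("http://squid.example.org:3128",
"/etc/ssl/certs/ci-proxy-ca.pem"), where both values are hypothetical.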



Another strategy, suggested by Dan, is to let the job run and download
everything from the internet, then save that cache to some central
place. The next time the job runs, we can warm the cache from that place
and eventually refresh it.

https://phabricator.wikimedia.org/T116017

To avoid cache pollution, we could namespace it by repo name/branch and
have it only be refreshed after a change is merged.
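
One possible shape for the namespace, as a sketch: turn the repo name
and branch into a relative path that is safe to use as a directory. The
function name and the sanitizing rules are my own assumptions, and the
repo name in the usage note is only an illustration.

```python
import posixpath
import re


def cache_namespace(repo, branch):
    """Map a repo name and branch to a relative cache directory."""
    def sanitize(name):
        # Keep letters, digits, dot, underscore and dash; replace everything
        # else (notably "/") with "-", and strip leading dots so a component
        # can never be "." or "..".
        cleaned = re.sub(r"[^A-Za-z0-9._-]", "-", name)
        return cleaned.lstrip(".") or "_"

    return posixpath.join(sanitize(repo), sanitize(branch))
```

With this, cache_namespace("mediawiki/core", "master") gives
"mediawiki-core/master". Two repos whose names differ only by "/"
versus "-" would share a namespace, so a real implementation may want a
different escaping scheme.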

Each package manager would be configured to use a cache path such as
/home/jenkins/cache/<package manager>
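
A sketch of what that configuration could look like through
environment variables. The variable names are the ones I know each tool
to honour. gem/bundler is left out because I am not sure of a clean
equivalent, and note that GRADLE_USER_HOME holds more than just caches.

```python
import posixpath

CACHE_ROOT = "/home/jenkins/cache"


def cache_env(cache_root=CACHE_ROOT):
    """Environment variables pointing each package manager at its cache dir."""
    def sub(name):
        return posixpath.join(cache_root, name)

    return {
        "PIP_CACHE_DIR": sub("pip"),            # pip --cache-dir
        "npm_config_cache": sub("npm"),         # npm "cache" setting
        "COMPOSER_CACHE_DIR": sub("composer"),  # Composer cache directory
        "GRADLE_USER_HOME": sub("gradle"),      # Gradle user home, caches included
        # Maven has no dedicated variable; pass the local repository
        # location as a property instead.
        "MAVEN_OPTS": "-Dmaven.repo.local=" + sub("maven"),
    }
```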

After a merge occurs, we can trigger the install phases and rsync the
result to some central place.
For other jobs, just rsync from the central place to /home/jenkins/cache,
then execute the rest of the job.
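
The two directions could be sketched as below. The central location is
passed in as a parameter since it does not exist yet, and the function
names are hypothetical. Only standard rsync options are used.

```python
import subprocess


def rsync_argv(src, dest, delete=False):
    """Build the rsync command line for copying one directory tree to another."""
    argv = ["rsync", "--archive", "--compress"]
    if delete:
        # Drop files from the destination that no longer exist in the source.
        argv.append("--delete")
    # A trailing slash on the source makes rsync copy the directory's
    # contents instead of creating a nested directory.
    argv += [src.rstrip("/") + "/", dest.rstrip("/") + "/"]
    return argv


def save_cache(local_dir, central_url):
    """Post-merge job: publish the freshly populated cache."""
    subprocess.check_call(rsync_argv(local_dir, central_url, delete=True))


def restore_cache(central_url, local_dir):
    """Any other job: warm the local cache before running the build."""
    subprocess.check_call(rsync_argv(central_url, local_dir))
```

Using --delete only when saving keeps the central copy from growing
forever, while a restore never removes anything the instance already
has.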


Mukunda suggested Phabricator Drydock
https://phabricator.wikimedia.org/T116038

That would be a good step toward having CI under Phabricator, but it is
a bit intimidating compared to a homemade rsync solution.



I am tempted to pick an rsync-based solution and sprint on it, since it
should be straightforward.

Any ideas?

-- 
Antoine "hashar" Musso



