Hello,
On Tue, 7 Jul 2020 at 00:30, Kunal Mehta <legoktm(a)member.fsf.org> wrote:
hosting a *private node package repository* in some form, typically in a
git repository, where only *vetted versions* of packages are checked in.
In theory this addresses the problems, but I think the biggest problem
is just the volume and quality of code that needs reviewing.
I suspected that vetting and the sheer quantity of code pose big
challenges. To my surprise, I haven't found any information on how that's
done.
To discuss this topic, I've created a subtask of the build step RFC:
T257072 <https://phabricator.wikimedia.org/T257072> "Determine Node package
auditing workflows"
Here's what the actual diff looks like:
<https://libup-diff.wmflabs.org/change/605082>.
Analyzing all that information is a superhuman task. A Gerrit/GitLab-like
review interface would make it more approachable.
The way pnpm stores packages (uncompressed in a git repo) would enable
that. Yarn stores .zip files, but a separate uncompressed repo could be
used for code review, with updates submitted as regularly reviewed patches.
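To illustrate what such a review view could be built from, here is a
minimal sketch in Python. It assumes both versions of a package have
already been unpacked and read into dicts mapping file path to text;
review_diff and the file names in the usage note are hypothetical, and
this is not how pnpm, Yarn or Gerrit work internally.

```python
import difflib


def review_diff(old_files, new_files):
    """Build one unified diff covering every file that differs.

    old_files / new_files: dicts mapping a relative path to the file's
    text. A path missing on one side is treated as an empty file, so
    added and removed files show up in the diff too.
    """
    chunks = []
    for path in sorted(set(old_files) | set(new_files)):
        old = old_files.get(path, "").splitlines(keepends=True)
        new = new_files.get(path, "").splitlines(keepends=True)
        # unified_diff yields nothing when the two sides are identical.
        chunks.extend(
            difflib.unified_diff(
                old, new, fromfile=f"a/{path}", tofile=f"b/{path}"
            )
        )
    return "".join(chunks)
```

Calling review_diff({"index.js": "a\n"}, {"index.js": "a\nb\n"}) returns a
patch that a reviewer could read or submit to a code review tool, while
identical trees return an empty string.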
I don't believe it's possible to review that much code on a regular
basis, reacting to the speed at which many npm packages move. We could
stop upgrading all the time, but that would effectively be forking and
IMO put us in a worse position.
100% review wouldn't be sustainable IMO (it would cause developer burnout
very quickly), but looking for specific patterns exhibited by malicious
packages could be a successful approach to increase trust in the audited
code.
Patterns like:
* An unmaintained repo receiving an update, which is a common way to
inject malicious code.
* New packages added to the dependency tree.
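Both patterns are cheap to check mechanically. The sketch below is only an
illustration in Python: it assumes the dependency trees are available as
dicts of package name to version and that release dates have already been
fetched (for example from the registry metadata). The function names and
the one-year threshold are my assumptions, not an existing tool.

```python
from datetime import date, timedelta


def new_packages(old_tree, new_tree):
    """Return the names that appear in new_tree but not in old_tree.

    Both arguments map package name -> version, e.g. flattened from a
    lockfile before and after an update.
    """
    return sorted(set(new_tree) - set(old_tree))


def dormant_then_updated(release_dates, min_gap_days=365):
    """Return True if the latest release follows a long silence.

    release_dates: datetime.date objects, in any order.
    min_gap_days: assumed threshold for "unmaintained"; tune as needed.
    """
    if len(release_dates) < 2:
        return False
    ordered = sorted(release_dates)
    return ordered[-1] - ordered[-2] >= timedelta(days=min_gap_days)
```

A package flagged by either function would not be rejected automatically;
it would only be moved to the front of the queue for a human review.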
An interesting article in this regard:
https://portswigger.net/daily-swig/new-npm-scanning-tool-sniffs-out-malicio…
I wonder if the npm-scan <https://github.com/spaceraccoon/npm-scan> tool
mentioned therein has been evaluated.
The repo of previously discovered malicious packages (npm-zoo
<https://github.com/spaceraccoon/npm-zoo>) is also worth mentioning.
I've collected a few notable incidents in the RFC under section
A_few_examples_of_NPM_incidents
<https://www.mediawiki.org/wiki/User:Aron_Manning/RfC:_Evaluate_alternative_Node_package_managers_for_improved_package_security#A_few_examples_of_NPM_incidents>.
I also note that it's impossible to review just the git changelog of a
package, because the npm maintainer can upload any arbitrary tarball of
code to npm, whether or not it matches the git repo. (This isn't
exclusive to npm; PyPI and crates.io suffer from this problem too.
composer/packagist doesn't, though.)
A tool looking for differences between the git repo and the npm tarball
could be useful.
It's possible, though, that many packages would require special treatment
if the tarball isn't simple to map onto the git repo.
However, just a simple check to see if a new npm release has a
corresponding git tag or release - or any commits at all - would catch
injections done with a stolen npm token.
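The simple tag check could look roughly like this in Python. It is a
sketch under assumptions: the version list and the tag list are fetched
elsewhere (e.g. with `npm view <pkg> versions --json` and
`git ls-remote --tags`), tags are assumed to be named either "1.2.3" or
"v1.2.3", and untagged_releases is a hypothetical helper. Projects with
other tag naming schemes would need their own mapping.

```python
def normalize_tag(tag):
    """Strip a leading 'v' before a digit, so 'v1.2.3' becomes '1.2.3'."""
    if tag[:1] == "v" and tag[1:2].isdigit():
        return tag[1:]
    return tag


def untagged_releases(npm_versions, git_tags):
    """Return the npm versions that have no matching git tag.

    The order of npm_versions is preserved, so the newest suspicious
    release is easy to spot when the input is in publishing order.
    """
    tagged = {normalize_tag(t) for t in git_tags}
    return [v for v in npm_versions if v not in tagged]
```

With versions ["1.0.0", "1.0.1", "1.0.2"] and tags ["v1.0.0", "v1.0.1"],
the result is ["1.0.2"], which would be held back until someone confirms
where that release came from.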
How do these alternative package managers address the quantity of npm
packages installed that need review?
None of the package managers can or intend to do code review, apart from
`npm audit`, which is available with all 3.
It seems to me there is an expectation that PMs will protect us. It should
be clarified that no tool can do that; the purpose of these tools is to
give us 100% control over which packages and versions are installed.
What these PMs provide is detailed in the RfC for evaluating alternative PMs
<https://www.mediawiki.org/wiki/User:Aron_Manning/RfC:_Evaluate_alternative_Node_package_managers_for_improved_package_security#Package_managers>
What versions we add to the local package repo is up to us. Separate
ticket: T257072 <https://phabricator.wikimedia.org/T257072>
I think a two-stage deployment process would subject packages to as much
scrutiny as possible within the constraints:
1. An auditing package repository with all the updates to be vetted. This
would be used in sandboxed environments to expose updates to developers,
who could notice unusual behavior or warning signs.
2. A stable package repository for CI and non-sandboxed environments.
The review process:
1. The auditing repo only includes packages used in WMF projects.
2. Package versions need to be greenlighted for auditing too. This is
preceded by a basic check of the validity of that version to look for,
e.g., stolen credential injections, but code is only reviewed if
suspicious, e.g. new packages or unexpected updates.
3. Package versions would stay in this stage for some time (e.g. 2 weeks),
depending on the package and urgency.
4. A changelog in the auditing repo tracks the newest updates, informing
developers about what packages to pay attention to.
5. One of the developers dedicated to package vetting does a deeper review
of the code. This should be aided by heuristic tools.
6. When confidence in a version is built, that version is greenlighted for
the stable repo.
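The promotion rule in steps 3, 5 and 6 can be sketched as a small check in
Python. Everything here is an assumption for illustration: the Candidate
record, its field names and the 14-day default only mirror the "2 weeks"
example above, and a real workflow would probably keep this state in the
auditing repo or in Phabricator.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Candidate:
    """A package version currently sitting in the auditing repo."""
    name: str
    version: str
    entered_auditing: date
    deep_review_passed: bool = False  # set by the vetting developer
    min_days: int = 14  # waiting period; may vary per package/urgency


def ready_for_stable(candidate, today):
    """True once the waiting period is over and the deep review passed."""
    waited = today - candidate.entered_auditing >= timedelta(
        days=candidate.min_days
    )
    return waited and candidate.deep_review_passed
```

An urgent security fix could be given a smaller min_days, while the deep
review flag stays a human decision that no tool sets automatically.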
Demian (Aron)