Thank you, Alexandra! I like this format; it was informative and more
On Thu, 2 Jul 2020 at 01:32, Alexandra Paskulin <apaskulin(a)wikimedia.org>
Present: Dan Andreescu, Daniel Kinzler, Timo Tijhof,
== RFC Frontend build step ==
* TT: This impacts security for the developers running insecure code on
developer host machines, security for production (can be contained/network
isolated), and security for the end-user (this isn’t just helping create
commits or run tests; it modifies and adds code we send to a billion
devices). Reproducing the same locally, in CI and prod. Workflow problems
like cherry-pick and revert in production branches.
Recently, I haven't seen the security question of npm being discussed on
the ticket, and I thought that question was off-topic there. If that's not
the case, is there another discussion I'm not aware of?
The discussion on the ticket seems to be focused on a different topic now
and I don't see security discussed elsewhere. To avoid distraction, I'm
asking here. If you are aware of a relevant ticket, please share.
The issue of package security has been addressed by the Node community
multiple times, in different forms, over the years.
Were these solutions evaluated? They generally avoid unvetted code by
hosting a *private node package repository* in some form, typically a git
repository, where only *vetted versions* of packages are stored.
These include the complete dependency trees, the package hashes and the
full source code of packages, and therefore provide more complete security
than libraryupgrader2, which, as I understand it, only controls our
top-level dependencies.
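As a rough illustration of the difference between top-level and complete control, the sketch below reads the `packages` map of an npm lockfile and compares the declared top-level dependencies with everything that actually gets installed. It assumes the lockfileVersion 2/3 layout introduced with npm 7 (npm 6 nests entries under `dependencies` instead); the `summarise` helper and the sample data, including the placeholder `integrity` values, are hypothetical.

```typescript
// Sketch only: assumes the lockfileVersion 2/3 layout (npm 7+), where every
// installed package is an entry in the flat "packages" map keyed by install path.
// Workspace and link entries are not handled.
interface LockEntry {
  version?: string;
  integrity?: string;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

interface Lockfile {
  lockfileVersion: number;
  packages: Record<string, LockEntry>;
}

// Hypothetical helper: compares what package.json declares with what gets installed.
function summarise(lock: Lockfile) {
  // The "" key describes the root project itself.
  const root: LockEntry = lock.packages[""] ?? {};
  const topLevel = Object.keys({ ...root.dependencies, ...root.devDependencies }).sort();

  const installed: string[] = [];
  let withIntegrity = 0;
  const marker = "node_modules/";
  for (const [path, entry] of Object.entries(lock.packages)) {
    if (path === "") continue;
    // Nested paths look like "node_modules/a/node_modules/c"; the name follows the last marker.
    const name = path.slice(path.lastIndexOf(marker) + marker.length);
    installed.push(`${name}@${entry.version ?? "?"}`);
    if (entry.integrity) withIntegrity++;
  }
  return { topLevel, installed: installed.sort(), withIntegrity };
}

// Hypothetical sample data; the integrity strings are placeholders, not real hashes.
const example: Lockfile = {
  lockfileVersion: 2,
  packages: {
    "": { dependencies: { a: "^1.0.0" } },
    "node_modules/a": { version: "1.0.0", integrity: "sha512-AAA", dependencies: { b: "^2.0.0", c: "0.3.0" } },
    "node_modules/b": { version: "2.1.0", integrity: "sha512-BBB" },
    "node_modules/a/node_modules/c": { version: "0.3.0" },
  },
};

const s = summarise(example);
// s.topLevel      → ["a"]
// s.installed     → ["a@1.0.0", "b@2.1.0", "c@0.3.0"]
// s.withIntegrity → 2
```

One declared dependency pulls in three installed packages here; only the lockfile (or a private store holding the full tree) pins all of them.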
Without aiming for completeness, I'd mention two recent solutions:
* Yarn 2 (berry) offline cache <https://yarnpkg.com/features/offline-cache>.
Yarn 2 (introductory article
<https://dev.to/arcanis/introducing-yarn-2-4eh1>) came out this year. It is
fundamentally different from (Classic) Yarn 1, which made only minor
improvements over npm. Yarn 2 adds package deduplication and removes the
need to uncompress packages, thus speeding up the install step (the
equivalent of `npm ci`).
* Pnpm package store <https://pnpm.js.org/en/about-package-store>. Pnpm
came out 3 years ago, after Classic Yarn. It adds package deduplication with
symlinking and recreates node_modules with uncompressed packages, but
without flattening the dependency tree (node_modules contains only the
direct dependencies).
Both package managers can live alongside npm, so they can be evaluated and
introduced gradually, without a hard transition:
* 'package.json' is shared; the installed package versions are expected to
be the same if installed at the same time (same state of the npm registry).
* 'package-lock.json' is NOT shared; there are separate 'yarn.lock' and
'pnpm-lock.yaml' files, which need to be generated separately.
My questions in this regard:
1. Were these options discussed? Is there an initiative to evaluate these
package managers and their impact? I haven't found signs of this on
Phabricator.
2. I've been using these package managers and made the necessary additions
to some of our projects. Where should I discuss my findings, and is anyone
interested in reviewing patches that add the missing dependencies to
'package.json' files? These would make the repositories compatible with
these package managers without altering npm's behavior.
To clarify: this inquiry is about evaluating these alternatives, without
affecting the use of npm or aiming to replace it.
Some of my findings, in advance:
Neither package manager is plug-and-play: both are stricter than npm, so
some packages need to be updated and our 'package.json' files need additions.
1. npm's flattening of the package tree in node_modules circumvents strict
dependency checking: a package can load packages that were installed for
some other, non-direct dependency, or just any other package in
node_modules. This hides some missing dependency declarations; most
notably, packages' `peerDependencies` are often not declared in the
`dependencies` of the including package. Both Yarn 2 and Pnpm are stricter
than npm in this matter and require these to be declared. This is a trivial
update to MediaWiki's 'package.json' files that is worth the added security.
2. Yarn 2 removes the node_modules folder and uses a custom package loader
that mounts package archives directly, without uncompressing them to disk
(except for "unplugged" packages, such as the usual suspects 'node-gyp' and
'fibers'...). Packages using `require.resolve()` work seamlessly, but many
packages make assumptions about the node_modules folder and try to load
files directly. Major packages have been updated (support list
<https://yarnpkg.com/features/pnp#native-support>), but some packages in our
dependency tree haven't been yet. These can be fixed by a simple upstream
patch, which also benefits the open-source community, or by unplugging
those packages. In any case, this is a cross-cutting concern for all
repositories, and a coordinated approach would be beneficial.
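The peer dependency problem from point 1 can be checked mechanically. The following is a minimal sketch, assuming the root 'package.json' and the manifests of its direct dependencies have already been parsed into objects; the `missingPeers` helper and the package names in the example are hypothetical, and optional peers (`peerDependenciesMeta`) are not handled.

```typescript
// Minimal subset of package.json fields used by this sketch.
interface Manifest {
  name: string;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
  peerDependencies?: Record<string, string>;
}

// Hypothetical helper: lists peer dependencies of the root's direct
// dependencies that the root does not declare itself. With npm's flattened
// node_modules these often resolve by accident; stricter layouts do not.
function missingPeers(root: Manifest, direct: Manifest[]): string[] {
  const declared = new Set<string>([
    ...Object.keys(root.dependencies ?? {}),
    ...Object.keys(root.devDependencies ?? {}),
  ]);
  const missing: string[] = [];
  for (const dep of direct) {
    for (const peer of Object.keys(dep.peerDependencies ?? {})) {
      if (!declared.has(peer)) missing.push(`${peer} (peer of ${dep.name})`);
    }
  }
  return missing.sort();
}

// Example with hypothetical package names.
const root: Manifest = { name: "my-extension", devDependencies: { "plugin-a": "1.0.0" } };
const pluginA: Manifest = { name: "plugin-a", peerDependencies: { "host-lib": ">=2.0.0" } };
const report = missingPeers(root, [pluginA]); // → ["host-lib (peer of plugin-a)"]
```

Adding "host-lib" to the root's `devDependencies` empties the report, which is the kind of trivial 'package.json' addition described in point 1.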
* DK: To what extent are we willing to run arbitrary code on our systems?
Which communities do we trust? (Example: Debian maintainers are vetted, NPM
packages are not.)
* TT: NPM packages are known for depending on a lot of unreviewed/unknown
packages <https://phabricator.wikimedia.org/T199004#6045136>. But there
are communities within the NPM ecosystem that follow different principles
and use fewer dependencies.
* DA: We could set a policy for reviewing and vendoring such services,
running them in a sandbox, pinned to specific versions. We could require
that packages be vetted.
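For the "pinned to specific versions" part, a simple check could flag every specifier in a 'package.json' dependency map that is not an exact version. This is a sketch with a simplified version regex (not a full semver parser), and the `unpinned` helper is hypothetical. It is deliberately strict: it also flags git URLs, even ones pinned to a commit.

```typescript
// Simplified check for an exact version: MAJOR.MINOR.PATCH with an optional
// prerelease and build suffix. Not a full semver parser.
const EXACT_VERSION = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;

// Hypothetical helper: returns "name@specifier" for every dependency that is
// not pinned to an exact version (ranges, tags, URLs, ...).
function unpinned(deps: Record<string, string> = {}): string[] {
  return Object.entries(deps)
    .filter(([, spec]) => !EXACT_VERSION.test(spec))
    .map(([name, spec]) => `${name}@${spec}`)
    .sort();
}

// Example with hypothetical package names.
const flagged = unpinned({ a: "1.2.3", b: "^1.2.3", c: "~0.4.0", d: "latest", e: "2.0.0-beta.1" });
// flagged → ["b@^1.2.3", "c@~0.4.0", "d@latest"]
```

Pinning top-level versions this way does not pin transitive dependencies; that is what a lockfile or a private package store adds.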
With a privately hosted package store, 100% of the code can be vetted. How
thoroughly the code is reviewed is up to the capacity and diligence of the
maintainers.
A git package store would need to be evaluated by the performance team. I
assume that for CI it would act as a local cache, possibly reducing network
traffic. For developers, I wouldn't suggest putting load on a WMF-hosted store.
Packages downloaded from the public repositories are checksummed, and any
package whose contents differ from the vetted version is rejected.
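The checksum comparison can be sketched with Node's `crypto` module. npm lockfiles record an `integrity` field in Subresource Integrity format (`sha512-` followed by the base64 digest); the sketch below, with hypothetical helper names and stand-in data, recomputes that value for downloaded bytes and compares it with the vetted one. The package managers themselves already perform this check during installation; this only shows the idea.

```typescript
import { createHash } from "node:crypto";

// Computes a Subresource Integrity string ("sha512-" + base64 digest), the
// format npm lockfiles use in their "integrity" fields.
function sriSha512(data: Uint8Array | string): string {
  return "sha512-" + createHash("sha512").update(data).digest("base64");
}

// Hypothetical helper: accepts the downloaded bytes only if they match the
// vetted integrity value. SRI strings may list several space-separated hashes.
function matchesVetted(data: Uint8Array | string, vettedIntegrity: string): boolean {
  return vettedIntegrity.trim().split(/\s+/).includes(sriSha512(data));
}

// Usage with stand-in data; a real check would hash the downloaded tarball.
const vetted = sriSha512("vetted tarball contents");
const ok = matchesVetted("vetted tarball contents", vetted); // true
const tampered = matchesVetted("tampered tarball contents", vetted); // false
```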
I assume the repository would be updated by libraryupgrader, which would
then manage a significantly larger number of packages than it does now.
Thank you for reading.