Thank you, Alexandra! I like this format; it was informative and more interesting.
On Thu, 2 Jul 2020 at 01:32, Alexandra Paskulin apaskulin@wikimedia.org wrote:
Present: Dan Andreescu, Daniel Kinzler, Timo Tijhof, Alex Paskulin, Niklas Laxstrom
== RFC Frontend build step == https://phabricator.wikimedia.org/T199004 ==
...
- TT: This impacts security for the developers running insecure code on
developer host machines, security for production (can be contained/network isolated), and security for the end-user (this isn’t just helping create commits or run tests, it modifies and adds code we sent to a billion people’s devices). Reproducing the same locally, in CI and prod. Workflow problems like cherry-pick and revert in production branches.
Recently, I haven't seen the security question of npm being discussed on https://phabricator.wikimedia.org/T199004 and thought that question was not on topic. If that's not the case: is there another discussion I'm not aware of?
The discussion on the ticket seems to be focused on a different topic now and I don't see security discussed elsewhere. To avoid distraction, I'm asking here. If you are aware of a relevant ticket, please share.
The issue of package security has been answered by the Node community multiple times in different forms through the years. Were these solutions evaluated? These generally avoid unvetted code by hosting a *private node package repository* in some form, typically in a git repository, where only *vetted versions* of packages are checked in.
These include the complete dependency trees, the package hashes and the full source code of packages, and therefore provide more complete security than libraryupgrader2, which - in my understanding - only controls our top-level dependencies.
Without aiming for completeness, I'd mention two recent solutions:
* Yarn 2 (berry) offline cache https://yarnpkg.com/features/offline-cache. Yarn 2 (introductory article https://dev.to/arcanis/introducing-yarn-2-4eh1) came out this year. It is fundamentally different from (Classic) Yarn 1, which made only minor improvements to npm. Yarn 2 adds package deduplication and removes the need to uncompress packages, thus speeding up the install step (its equivalent of `npm ci`).
* Pnpm package store https://pnpm.js.org/en/about-package-store. Pnpm came out 3 years ago, after Classic Yarn. It adds package deduplication with symlinking, recreating node_modules with uncompressed packages but without dependency tree flattening (the top level of node_modules contains only the direct dependencies).
Both package managers can live alongside npm, so they can be evaluated and introduced gradually, without hard transitions:
* 'package.json' is shared; the installed package versions are expected to be the same if the installs run at the same time (same state of the npm registry).
* 'package-lock.json' is NOT shared; there are separate 'yarn.lock' and 'pnpm-lock.yaml' files, which need to be generated separately.
My questions in this regard:
1. Were these options discussed, and is there an initiative to evaluate these PMs and their impact? I haven't found signs of this on phab.
2. I've been using these PMs and made the necessary additions to some of our projects. Where should I discuss my findings, and is anyone interested in reviewing patches that add the missing dependencies to 'package.json' files? These patches would make the repositories compatible with these PMs without altering npm's behavior.
To clarify: this inquiry is about evaluating these alternatives without affecting the use of npm or aiming to replace it.
Some of my findings, in advance:
Neither PM is plug-and-play: both are stricter than npm, so some packages need to be updated and our 'package.json' files need some corrections.
1. npm's package tree flattening in node_modules bypasses strict dependency checking: packages can use other packages installed by non-direct dependencies, or by some other unrelated package. This hides some missing dependency declarations; most notably, packages' `peerDependencies` are often not declared in the `dependencies` of the including package. Both Yarn 2 and Pnpm are stricter than npm in this matter and require those to be declared. This is a trivial update to MediaWiki's 'package.json' files that's worth the added security.
2. Yarn 2 removes the node_modules folder and uses a custom package loader that mounts package archives directly, without uncompressing them to disk (except for "unplugged" packages, such as the usual suspects 'node-gyp' and 'fibers'...). Packages using `require.resolve()` work seamlessly, but there are many packages that make assumptions about the node_modules folder and try to load files directly. Major packages have been updated (support list https://yarnpkg.com/features/pnp#native-support), but there are packages in our dependency tree that haven't been yet. These can be solved by a simple patch to upstream - thus benefiting the open-source community - or by unplugging those packages. In any case, this is a cross-cutting concern in all repos and a coordinated approach would be beneficial.
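To illustrate the first finding, here is a minimal Python sketch of the kind of check that uncovers undeclared peer dependencies. It is not how Yarn 2 or Pnpm implement their checks. The function name `missing_peer_dependencies` and the sample package names are hypothetical, and the manifests are passed in as plain dictionaries instead of being read from disk.

```python
def missing_peer_dependencies(root_manifest, installed_manifests):
    """Return {package: [peer names]} for peers the root manifest does not declare.

    root_manifest: parsed contents of the repository's package.json.
    installed_manifests: hypothetical mapping of package name to the parsed
    package.json of that installed package.
    """
    declared = set(root_manifest.get("dependencies", {}))
    declared |= set(root_manifest.get("devDependencies", {}))

    missing = {}
    for name in sorted(declared):
        manifest = installed_manifests.get(name)
        if manifest is None:
            continue  # manifest not available, nothing to check
        peers = manifest.get("peerDependencies", {})
        absent = sorted(peer for peer in peers if peer not in declared)
        if absent:
            missing[name] = absent
    return missing


# Hypothetical manifests, for illustration only.
root = {"devDependencies": {"eslint-plugin-example": "1.0.0", "grunt": "1.1.0"}}
installed = {
    "eslint-plugin-example": {"peerDependencies": {"eslint": ">=6"}},
    "grunt": {},
}

print(missing_peer_dependencies(root, installed))
```

With a flattened node_modules such a gap usually goes unnoticed, because the peer package tends to be installed by something else anyway. Declaring it in the root 'package.json' makes the report empty.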
* DK: To what extent are we willing to run arbitrary code on our systems?
Which communities do we trust? (Example: Debian maintainers are vetted, NPM packages are not)
* TT: NPM packages are known for depending on a lot of unreviewed/unknown
code. See https://phabricator.wikimedia.org/T199004#6045136. But, there are communities within the NPM ecosystem that follow different principles, and use fewer dependencies.
- DA: We could set a policy about reviewing and vendoring such service,
run in a sandbox, pinned to specific versions. We could set a requirement that packages need to be vetted.
With a privately hosted package store, 100% of the code can be vetted. It's up to the capacity and diligence of the maintainers how thoroughly the code is reviewed. A git package store would require the evaluation of the performance team. I assume for CI that would act as a local cache, possibly reducing network traffic. For developers I wouldn't suggest putting load on a WMF hosted store. Packages downloaded from the public repositories are verified against the recorded checksums, and any package whose contents differ from the vetted version is rejected. I assume the repository would be updated by libraryupgrader, which would manage a significantly larger number of packages than now.
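The checksum step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the package managers' actual code: the helper names `sri_digest` and `matches_vetted` are hypothetical, and the `sha512-<base64>` string format follows the `integrity` field that npm records in 'package-lock.json'. Other package managers may record checksums differently.

```python
import base64
import hashlib
import hmac

# Algorithms accepted by this sketch (an assumption, not a registry rule).
SUPPORTED = {"sha256", "sha384", "sha512"}


def sri_digest(data: bytes, algorithm: str = "sha512") -> str:
    """Build an integrity string of the form '<algorithm>-<base64 digest>'."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def matches_vetted(data: bytes, vetted_integrity: str) -> bool:
    """Return True only if the data hashes to the vetted integrity string."""
    algorithm, _, _ = vetted_integrity.partition("-")
    if algorithm not in SUPPORTED:
        return False  # unknown or missing algorithm: reject
    return hmac.compare_digest(sri_digest(data, algorithm), vetted_integrity)


# Hypothetical tarball contents, for illustration only.
vetted_tarball = b"contents of the reviewed package archive"
vetted_integrity = sri_digest(vetted_tarball)

print(matches_vetted(vetted_tarball, vetted_integrity))         # same bytes
print(matches_vetted(b"tampered contents", vetted_integrity))   # different bytes
```

The first call accepts the archive and the second rejects it, which is the behavior a developer machine would need when downloading from the public registry instead of the vetted store.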
Thank you for reading. Aron (Demian)