Hi all,
This week, we're sending out a more detailed version of the TechCom meeting minutes. Let us know if you find this helpful.
Present: Dan Andreescu, Daniel Kinzler, Timo Tijhof, Alex Paskulin, Niklas Laxstrom
== RFC Frontend build step ==
https://phabricator.wikimedia.org/T199004

* Recently moved from open to stalled
* Discussion of the issues presented in the RFC from a process perspective:
* DA: The question on the RFC was originally: how can we implement a build step? The discussion on the RFC has instead centered on whether we should implement a build step at all. The authors are trying to address the concerns raised so that they can implement it, since it's an industry-wide practice.
* TT: We should focus on the underlying problems this is trying to solve. A build step is inevitable, but many of the problems discussed so far on the RFC don't call for one, and some of the architectural issues in what has been proposed can't be reconciled. We can build towards a deliverable of compiling a Vue component.
* DK: Process-wise, what is the problem with a team deciding that they want a server-side build step? What's the impact? In theory, we want to maximize coherence and autonomy; there's a polarity between the two.
* DA: If they find a mainstream way, they'll migrate to it. I don't think it's problematic. They're saying that they'll address all the concerns if they can.
* TT: This impacts security for developers (running insecure code on developer host machines), security for production (which can be contained/network-isolated), and security for the end user (this isn't just helping create commits or run tests; it modifies and adds code we ship to a billion people's devices). There are also reproducibility concerns (getting the same result locally, in CI, and in production) and workflow problems such as cherry-picks and reverts in production branches.
* DK: To what extent are we willing to run arbitrary code on our systems? Which communities do we trust? (Example: Debian maintainers are vetted, NPM packages are not.)
* TT: NPM packages are known for depending on a lot of unreviewed/unknown code; see https://phabricator.wikimedia.org/T199004#6045136. But there are communities within the NPM ecosystem that follow different principles and use fewer dependencies.
* DA: We could set a policy about reviewing and vendoring such services, running them in a sandbox, pinned to specific versions. We could set a requirement that packages be vetted.
== Send regular overview about Wikimedia development policies ==
https://phabricator.wikimedia.org/T164538

* Moved from Inbox to In progress on TechCom board
== RFC: Amendment to the Stability interface policy ==
https://phabricator.wikimedia.org/T255803

* On last call, ending July 8
* Discussion ongoing
== RFC: Hybrid extension management ==
https://phabricator.wikimedia.org/T250406

* In Phase 3: Explore
* Discussion ongoing
== Next week public IRC discussion ==

* No discussion scheduled for next week
See also the TechCom RFC board https://phabricator.wikimedia.org/tag/mediawiki-rfcs/.
Thank you, Alexandra! I like this format; it was informative and more interesting.
On Thu, 2 Jul 2020 at 01:32, Alexandra Paskulin <apaskulin@wikimedia.org> wrote:
> Present: Dan Andreescu, Daniel Kinzler, Timo Tijhof, Alex Paskulin, Niklas Laxstrom
>
> == RFC Frontend build step ==
> https://phabricator.wikimedia.org/T199004
>
> ...
>
> * TT: This impacts security for the developers running insecure code on developer host machines, security for production (can be contained/network isolated), and security for the end-user (this isn’t just helping create commits or run tests, it modifies and adds code we sent to a billion people’s devices). Reproducing the same locally, in CI and prod. Workflow problems like cherry-pick and revert in production branches.
I haven't seen the security question of npm being discussed recently on https://phabricator.wikimedia.org/T199004, and I thought that question was not on topic there. If that's not the case, is there another discussion I'm not aware of?
The discussion on the ticket seems to be focused on a different topic now, and I don't see security discussed elsewhere. To avoid distraction, I'm asking here. If you are aware of a relevant ticket, please share it.
The issue of package security has been addressed by the Node community multiple times, in different forms, over the years. Were these solutions evaluated? They generally avoid unvetted code by hosting a *private node package repository* in some form, typically in a git repository, where only *vetted versions* of packages are checked in.
Such a repository includes the complete dependency tree, the package hashes, and the full source code of every package, and therefore provides more complete security than libraryupgrader2, which - in my understanding - only controls our top-level dependencies.
Without aiming for completeness, I'd mention two recent solutions:

* Yarn 2 (berry) offline cache: https://yarnpkg.com/features/offline-cache. Yarn 2 (introductory article: https://dev.to/arcanis/introducing-yarn-2-4eh1) came out this year. It is a recent, fundamentally different version from (Classic) Yarn 1, which made only minor improvements over npm. Yarn 2 adds package deduplication and removes the need to uncompress packages, thus speeding up the `npm ci` step. (A minimal configuration sketch follows this list.)
* Pnpm package store: https://pnpm.js.org/en/about-package-store. Pnpm came out 3 years ago, after Classic Yarn. It adds package deduplication with symlinking, recreating node_modules with uncompressed packages, but without dependency tree flattening (node_modules contains only the direct dependencies).
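To make the offline cache concrete: in essence, Yarn 2 keeps its tarball cache inside the repository and can be told never to touch the network. A minimal sketch of the relevant `.yarnrc.yml` settings (key names per the Yarn 2 configuration docs; the values are illustrative, not a recommendation):

    # .yarnrc.yml
    cacheFolder: "./.yarn/cache"   # per-project tarball cache, committed to git so it can be vetted
    enableGlobalCache: false       # keep the cache inside the repository rather than in the user's home
    enableNetwork: false           # once the cache is populated, installs must resolve offline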
Both package managers can live alongside npm, and thus can be evaluated and introduced gracefully, without hard transitions:

* 'package.json' is shared; the package versions installed are expected to be the same if run at the same time (same state of the npm repository).
* 'package-lock.json' is NOT shared; there are separate 'yarn.lock' and 'pnpm-lock.yaml' files, and these need to be generated separately (as sketched below).
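For example, both lock files can be generated in an existing checkout without touching npm's own files (commands as of mid-2020):

    # Pnpm: install it globally, then let it write its own lockfile
    npm install -g pnpm
    pnpm install            # writes pnpm-lock.yaml; package-lock.json is left untouched

    # Yarn 2 (berry): opt in per repository from Yarn 1.22+, then install
    yarn set version berry  # downloads the Yarn 2 release into .yarn/releases
    yarn install            # writes yarn.lock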
My questions in this regard:

1. Were these options discussed? Is there an initiative to evaluate these package managers and their impact? I haven't found signs of this on Phabricator.
2. I've been using these package managers and have made the necessary additions to some of our projects. Where should I discuss my findings, and is there someone interested in reviewing patches that add the missing dependencies to 'package.json' files? These patches would make the repositories compatible with these package managers without altering npm's behavior.
To clarify: this inquiry is about evaluating these alternatives without affecting the use of npm; it does not target the replacement of npm.
Some of my findings, in advance:
Neither package manager is plug-and-play: both are stricter than npm, so some packages need to be updated and our 'package.json' files need some corrections.
1. npm's package tree flattening in node_modules sidesteps strict dependency checking: packages can use other packages installed by non-direct dependencies, or some other random package. This hides some missing dependency declarations; most notably, packages' `peerDependencies` are often not declared in the `dependencies` of the including package. Both Yarn 2 and Pnpm are stricter than npm in this matter and require those declarations (a sketch of the fix follows this list). This is a trivial update to MediaWiki's 'package.json' files that's worth the added security.
2. Yarn 2 removes the node_modules folder and uses a custom package loader that mounts package archives directly, without uncompressing them to disk (except for "unplugged" packages, such as the usual suspects 'node-gyp' and 'fibers'...). Packages using `require.resolve()` work seamlessly, but there are many packages that make assumptions about the node_modules folder and try to load files directly. Major packages have been updated (support list: https://yarnpkg.com/features/pnp#native-support), but there are packages in our dependency tree that aren't yet. These can be solved by a simple patch upstream - thus benefiting the open-source community - or by unplugging those packages. In any case, this is a cross-cutting concern in all repos and a coordinated approach would be beneficial.
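To illustrate the first point with a hypothetical example: a repository that uses stylelint-config-wikimedia (which lists stylelint in its peerDependencies) should itself declare stylelint, instead of relying on it being hoisted into node_modules by some other dependency. The version ranges below are illustrative placeholders:

    {
      "devDependencies": {
        "stylelint": "^13.0.0",
        "stylelint-config-wikimedia": "^0.10.0"
      }
    }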
> * DK: To what extent are we willing to run arbitrary code on our systems? Which communities do we trust? (Example: Debian maintainers are vetted, NPM packages are not)
> * TT: NPM packages are known for depending on a lot of unreviewed/unknown code. See https://phabricator.wikimedia.org/T199004#6045136. But, there are communities within the NPM ecosystem that follow different principles, and use fewer dependencies.
> * DA: We could set a policy about reviewing and vendoring such service, run in a sandbox, pinned to specific versions. We could set a requirement that packages need to be vetted.
With a privately hosted package store, 100% of the code can be vetted; how thoroughly the code is reviewed is up to the capacity and diligence of the maintainers. A git-based package store would require evaluation by the performance team. I assume that for CI it would act as a local cache, possibly reducing network traffic. For developers, I wouldn't suggest putting load on a WMF-hosted store. Packages downloaded from the public repositories are checksummed, so any package whose contents differ from the vetted version is rejected. I assume the repository would be updated by libraryupgrader, which would then manage a significantly larger number of packages than it does now.
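This checksumming relies on the same mechanism npm already records: package-lock.json pins every entry to a specific tarball and its hash, so a vetted store only has to guarantee that the recorded hash is the one that was reviewed. A lockfileVersion 1 entry looks roughly like this (version illustrative, hash elided):

    "lodash": {
      "version": "4.17.19",
      "resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.19.tgz",
      "integrity": "sha512-<base64 hash elided>"
    }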
Thank you for reading. Aron (Demian)
Hi,
On 2020-07-02 02:41, Aron Manning wrote:
> Recently, I haven't seen the security question of npm being discussed on https://phabricator.wikimedia.org/T199004 and thought that question was not on topic. If that's not the case: is there another discussion I'm not aware of?
There's been some work around protecting developers from npm packages, especially https://gerrit.wikimedia.org/g/fresh/ but I don't believe there are any active public discussions.
> The issue of package security has been answered by the Node community multiple times in different forms through the years. Were these solutions evaluated? These generally avoid unvetted code by hosting a *private node package repository* in some form, typically in a git repository, where only *vetted versions* of packages are checked in.
In theory this addresses the problems, but I think the biggest problem is just the volume and quality of code that needs reviewing.
Take https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/605082/ for example. It bumps 3 packages: stylelint-config-wikimedia (owned by us), webpack (minor version bump), and markdown-to-jsx (patch bump). The latter two are security issues (whether they're actually exploitable in our context is another discussion).
Here's what the actual diff looks like: https://libup-diff.wmflabs.org/change/605082.
There are at least a thousand lines changed, not including the minified/compiled code that's impossible to review. And that doesn't even show the npm packages that download compiled binaries on installation.
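(As a rough illustration, not something libup does: packages that run arbitrary code at install time can be spotted by their lifecycle scripts. This assumes jq is available and only looks at the usual node_modules layout:)

    # List installed packages that declare install/preinstall/postinstall scripts,
    # i.e. packages that execute code during `npm install`.
    find node_modules -maxdepth 3 -name package.json -exec jq -r \
      'select((.scripts // {}) | (has("preinstall") or has("install") or has("postinstall"))) | .name' \
      {} + 2>/dev/null | sort -u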
(Feel free to throw other patches at the libup-diff tool, but it's still alpha quality; I haven't fixed it up enough yet to announce it.)
I don't believe it's possible to review that much code on a regular basis, reacting to the speed at which many npm packages move. We could stop upgrading all the time, but that would effectively be forking and IMO put us in a worse position.
I also note that it's impossible to review just the git changelog of a package, because the npm maintainer can upload any arbitrary tarball of code to npm, whether or not it matches the git repo. (This isn't exclusive to npm; PyPI and crates.io suffer from this problem too. Composer/Packagist doesn't, though.)
> These include the complete dependency trees, the package hashes and the full source code of packages, therefore provide more complete security than libraryupgrader2, which - in my understanding - only controls our top-level dependencies.
npm dependency trees should be fully locked via package-lock.json. libup currently only upgrades top-level dependencies, but it also runs `npm audit fix` in response to npm security advisories (see the current listing: https://libraryupgrader2.wmflabs.org/vulns/npm).
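For reference, the corresponding commands when working on a repository locally are roughly:

    npm ci          # install exactly what package-lock.json records
    npm audit       # list known advisories affecting the locked tree
    npm audit fix   # bump locked versions to patched releases where possible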
But honestly I consider libup's "npm audit fix" mode just damage control at this point. If we can't trust the code we're running, then we're more likely to protect ourselves by fixing the security issues we do know about.
In short, if you've been running npm install/test/etc. on your machine, I would consider any ssh/GPG/etc. keys that it could've accessed compromised. Krinkle goes into this much better in his blog post: https://timotijhof.net/posts/2019/protect-yourself-from-npm/.
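The practical mitigation is to never let npm run with access to your real home directory; a minimal sketch using a throwaway container (which, as I understand it, is the idea Fresh, linked earlier in this thread, packages up much more conveniently):

    # Run install and tests inside a disposable container that only sees the checkout,
    # so lifecycle scripts cannot reach ~/.ssh, GPG keys, or browser sessions.
    docker run --rm -it \
      -v "$PWD":/src -w /src \
      node:14 \
      bash -c 'npm ci && npm test'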
> Without completeness I'd mention 2 recent solutions:
> <snip>
How do these alternative package managers address the quantity of npm packages installed that need review?
> - I've been using these PMs and made the necessary additions to some of our projects. Where to discuss my findings and is there someone interested in reviewing patches adding the missing dependencies to 'package.json' files, that would make the repositories compatible with these PMs without altering npm's behavior?
First, I'd start by documenting the current problems we face with npm and its ecosystem, and once we have agreement on that, begin looking for solutions.
Ultimately, I think it's important to remember that npm is a proprietary service run by a for-profit company (formerly npm Inc., now GitHub). We're always going to be fighting against them, with little ability to cause significant change. I think a free-software, community-based registry, like practically every other major language has, could do so much better in this area.
-- Legoktm
Hello,
On Tue, 7 Jul 2020 at 00:30, Kunal Mehta <legoktm@member.fsf.org> wrote:
>> hosting a *private node package repository* in some form, typically in a git repository, where only *vetted versions* of packages are checked in.
>
> In theory this addresses the problems, but I think the biggest problem is just the volume and quality of code that needs reviewing.
I suspected that vetting and quantity pose big challenges. To my surprise, I haven't found any information on how that's currently done. To discuss this topic, I've created a subtask of the build step RFC: T257072 (https://phabricator.wikimedia.org/T257072), "Determine Node package auditing workflows".
> Here's what the actual diff looks like: https://libup-diff.wmflabs.org/change/605082.
Analyzing all that information is a superhuman task. A Gerrit/GitLab-like review interface would make it more approachable. The way Pnpm stores packages (uncompressed, in a git repo) would enable that. Yarn stores .zip files, but a separate uncompressed repo could be used for code review, with updates submitted as regularly reviewed patches.
> I don't believe it's possible to review that much code on a regular basis, reacting to the speed at which many npm packages move. We could stop upgrading all the time, but that would effectively be forking and IMO put us in a worse position.
100% review wouldn't be sustainable IMO (it would cause developer burnout very quickly), but looking for specific patterns exhibited by malicious packages could be a successful approach to increasing trust in the audited code. Patterns like (a rough sketch for the second one follows the list):

* An unmaintained repo suddenly receiving an update, which is a common way to inject malicious code.
* New packages added to the dependency tree.
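A rough sketch for flagging newly added packages (it assumes a lockfileVersion 1 package-lock.json, with the previous revision saved as package-lock.old.json):

    # List every resolved tarball URL (at any depth) in a package-lock.json,
    # then show the entries that only appear in the updated lockfile.
    list_pkgs() {
      jq -r '.. | objects | select(has("resolved") and has("integrity")) | .resolved' "$1" | sort -u
    }
    comm -13 <(list_pkgs package-lock.old.json) <(list_pkgs package-lock.json)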
An interesting article in this regard: https://portswigger.net/daily-swig/new-npm-scanning-tool-sniffs-out-maliciou... I wonder if the npm-scan tool mentioned therein (https://github.com/spaceraccoon/npm-scan) has been evaluated. The repository of former malicious packages (npm-zoo: https://github.com/spaceraccoon/npm-zoo) is also worth mentioning. I've collected a few notable incidents in the RFC under the section "A few examples of NPM incidents": https://www.mediawiki.org/wiki/User:Aron_Manning/RfC:_Evaluate_alternative_Node_package_managers_for_improved_package_security#A_few_examples_of_NPM_incidents
> I also note that it's impossible to review just the git changelog of a package, because the npm maintainer can upload any arbitrary tarball of code to npm, whether or not it matches the git repo. (This isn't exclusive to npm, pypi, crates.io suffer from this problem too. composer/packagist doesn't though.)
A tool looking for differences between the git repo and the npm tarball could be useful. It's possible, though, that many packages would require special treatment if the tarball isn't simple to map onto the git repo. However, even a simple check of whether a new npm release has a corresponding git tag or release - or any commits at all - would catch injections done with a stolen npm token.
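A minimal sketch of such a comparison (the package name, version, and repository URL are placeholders, and it assumes the common "v<version>" tag convention):

    pkg=some-package; ver=1.2.3; repo=https://github.com/example/some-package.git
    npm pack "$pkg@$ver"                        # fetches some-package-1.2.3.tgz from the registry
    mkdir -p tarball && tar -xzf "$pkg-$ver.tgz" -C tarball --strip-components=1
    git clone --depth 1 --branch "v$ver" "$repo" gitcopy
    diff -r gitcopy tarball                     # a human still has to judge the differences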
> How do these alternative package managers address the quantity of npm packages installed that need review?
None of the package managers can or intend to do code review, apart from `npm audit`, which is available with all three. It seems to me there is an expectation that package managers will protect us. It should be clarified that no tool can do that; the purpose of these tools is to give us 100% control over which packages and versions are installed. What these package managers provide is detailed in the RfC for evaluating alternative PMs: https://www.mediawiki.org/wiki/User:Aron_Manning/RfC:_Evaluate_alternative_Node_package_managers_for_improved_package_security#Package_managers
Which versions we add to the local package repository is up to us; that is covered by the separate ticket T257072 (https://phabricator.wikimedia.org/T257072).
I think a two-stage deployment process would subject packages to as much scrutiny as possible within these constraints:
1. An auditing package repository with all the updates to be vetted. This would be used in sandboxed environments to expose updates to developers, who could notice unusual behavior and warning signs.
2. A stable package repository for CI and non-sandboxed environments. (Which repository an environment uses could be a per-environment registry setting; see the sketch below.)
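For example (the registry hostnames are hypothetical):

    # .npmrc in sandboxed developer environments
    registry=https://npm-audit.wikimedia.example/

    # .npmrc in CI and other non-sandboxed environments
    # registry=https://npm-stable.wikimedia.example/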
The review process:
1. The auditing repo only includes packages used in WMF projects.
2. Package versions need to be greenlighted for auditing too. This is preceded by a basic check of the validity of that version, looking for e.g. stolen-credential injections; the code is only reviewed if suspicious, e.g. new packages or unexpected updates.
3. Package versions would stay in this stage for some time (e.g. 2 weeks), depending on the package and the urgency.
4. A changelog in the auditing repo tracks the newest updates, informing developers about which packages to pay attention to.
5. One of the developers dedicated to package vetting does a deeper review of the code. This should be aided by heuristic tools.
6. When confidence in a version is built, that version is greenlighted for the stable repo.
Demian (Aron)