On Fri, Feb 24, 2017 at 10:34 AM, Florian Schmidt <florian.schmidt.welzow@t-online.de> wrote:
I searched Phabricator for an existing task but couldn't find one. However, since the Phabricator search and I aren't really good friends, it's possible that I missed something, so I'm asking on this list :) Do we already have a task to track the work on this topic? A short GitHub search[1] showed some usages of sha1 (at least the string), so I suspect there are some places where we use it, right?
[1] https://github.com/wikimedia/mediawiki/search?utf8=%E2%9C%93&q=SHA1
Usages are listed at https://www.mediawiki.org/wiki/Manual:Hashing -- I've added a "purpose" column on the DB fields list. There might be a couple we missed, so feel free to edit!
It looks like img_sha1 and fa_storage_key are the biggest practical collision dangers: someone could create a pair of images with the same hash, say a cute kitten and a shock image, so that deletion and undeletion might surface a different file from the one the undeleting admin expected.
However, since the attack requires creating a common prefix and suffix for the matching pair of files, the attacker has to craft both files. There's little or no danger of an attack that replaces legitimate files already uploaded by someone else.
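To make the deletion/undeletion risk concrete, here is a toy sketch in Python. It is not MediaWiki code: `ToyArchive` is a hypothetical content-addressed store, and `weak_key` deliberately keeps only 8 bits of a digest so that a collision can be brute-forced in a moment, standing in for a crafted SHA-1 collision.

```python
import hashlib


def weak_key(data: bytes) -> str:
    """Deliberately weak stand-in for a broken hash: 8 bits of SHA-256."""
    return hashlib.sha256(data).hexdigest()[:2]


def find_colliding_pair():
    """Brute-force two different byte strings that share a weak key."""
    seen = {}
    i = 0
    while True:  # at most 257 iterations, by the pigeonhole principle
        data = f"file-{i}".encode()
        key = weak_key(data)
        if key in seen:
            return seen[key], data
        seen[key] = data
        i += 1


class ToyArchive:
    """Hypothetical store keyed by content hash: one blob per key, first write wins."""

    def __init__(self):
        self.blobs = {}

    def archive(self, data: bytes) -> str:
        key = weak_key(data)
        self.blobs.setdefault(key, data)  # a colliding second blob is silently dropped
        return key

    def restore(self, key: str) -> bytes:
        return self.blobs[key]


# The attacker controls both files of the pair.
kitten, shock = find_colliding_pair()

store = ToyArchive()
store.archive(shock)          # the shock image is archived first
key = store.archive(kitten)   # the kitten maps to the same key
restored = store.restore(key) # restoring "the kitten" returns the shock image
```

The point of the sketch is only that a store addressed by hash cannot tell two colliding files apart; how the real filearchive code behaves in that case is not modelled here.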
I would probably recommend migrating the collision-detection usages of rev_sha1 and img_sha1 over to new rev_sha256 and img_sha256 columns, and reworking filearchive to use a SHA-256 hash for content addressing of new files, but this doesn't need to be a "drop everything" hurry task.
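As a rough sketch of what a SHA-256 content-addressing key could look like, here is some Python. It assumes the new columns would keep the existing convention as I understand it (a zero-padded base-36 digest, with the storage key being the digest plus a file extension). The function names and the 50-character width are illustrative only, not an agreed schema.

```python
import hashlib

BASE36_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

# A 256-bit value needs 50 base-36 digits (36**49 < 2**256 <= 36**50);
# by the same arithmetic a 160-bit SHA-1 value needs 31.
SHA256_BASE36_WIDTH = 50


def to_base36(n: int, width: int) -> str:
    """Encode a non-negative integer in base 36, left-padded with zeros."""
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(BASE36_DIGITS[r])
    return "".join(reversed(digits)).rjust(width, "0")


def sha256_base36(data: bytes) -> str:
    """SHA-256 of the content as a fixed-width base-36 string."""
    n = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return to_base36(n, SHA256_BASE36_WIDTH)


def storage_key(data: bytes, extension: str) -> str:
    """Hypothetical content-addressed key: '<base-36 hash>.<extension>'."""
    return f"{sha256_base36(data)}.{extension}"


example_key = storage_key(b"example file contents", "png")
```

New files would get a key like this, while existing rows keep their SHA-1 values until backfilled, which fits this not being a "drop everything" task.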
-- brion