When we process Event Logging events, we hash the origin IP address and add it to the event as part of the "capsule". We salt the hash function and rotate the salt frequently for security, but within each rotation period the same IP would get hashed to the same value, and some people depended on that.
We recently made the Event Logging processor parallel, and we forgot to make this hashing consistent across all the parallel instances. So from September 10, 2015 until we fix the bug, the same client IP may be hashed to different values depending on which instance handles the event.
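To make the failure mode concrete, here is a minimal sketch of one plausible way it could happen: each parallel instance ends up with its own salt. This is not the actual Event Logging code; the `Worker` class, the `hash_ip` helper, the use of MD5, and the example IP are all assumptions made for illustration.

```python
import hashlib
import os
from typing import Optional


def hash_ip(ip: str, salt: bytes) -> str:
    """Return a salted MD5 hex digest of an IP address (illustrative only)."""
    return hashlib.md5(salt + ip.encode("utf-8")).hexdigest()


class Worker:
    """Hypothetical processor instance; not the real Event Logging processor."""

    def __init__(self, salt: Optional[bytes] = None):
        # With no salt supplied, each worker draws its own random salt,
        # which is the situation that makes hashes inconsistent.
        self.salt = salt if salt is not None else os.urandom(16)

    def process(self, ip: str) -> str:
        return hash_ip(ip, self.salt)


ip = "198.51.100.7"  # documentation-range address, made up for the example

# Independent salts: the same IP gets a different hash on each worker.
a, b = Worker(), Worker()
independent_match = a.process(ip) == b.process(ip)

# Shared salt: every worker agrees on the hash for a given IP.
shared = os.urandom(16)
c, d = Worker(shared), Worker(shared)
shared_match = c.process(ip) == d.process(ip)

print(independent_match, shared_match)
```

If the real cause is along these lines, the fix amounts to having every instance obtain the current salt from one shared place, so that rotation still happens but all instances rotate together.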
We are tracking this issue here: https://phabricator.wikimedia.org/T112688
If you have some data crunching that's affected by this, come talk to us. We are already adding a temporary fix to the scripts that generate the edit-analysis dashboard [1].
Does this mean:
1. Same IP, different hashes; 2. Different IPs, same hash; 3. Both?
(I imagine just 1, MD5 isn't /that/ crap at collision resistance, but.)
On 15 September 2015 at 15:50, Dan Andreescu dandreescu@wikimedia.org wrote:
[1] https://edit-analysis.wmflabs.org/compare/
Analytics mailing list Analytics@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/analytics
- Same IP, different hashes;
Yes, the same IP could emit events that get handled by different instances of the now-parallel processor. This means that events with the same IP could have different hashes.
- Different IPs, same hash;
This was always possible in the old system due to hash collisions. But, like, very very unlikely. And this hasn't been influenced by the new bug in any way I'm aware of. Though maybe some number theoretician somewhere just did a telepathic spit take.
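For a rough sense of "very very unlikely", the birthday bound can be worked out directly. The sketch below assumes an idealized 128-bit hash (the digest size of MD5, which the thread suggests but does not confirm) that behaves like a uniformly random function; the function name is made up for this example.

```python
from fractions import Fraction


def collision_probability_bound(n: int, bits: int = 128) -> Fraction:
    """Upper bound on the chance that any two of n distinct inputs share a hash.

    Uses the union bound: (number of pairs) / (number of possible digests).
    """
    return Fraction(n * (n - 1), 2 * 2**bits)


# Even if every one of the 2**32 possible IPv4 addresses showed up
# within a single salt period, the bound stays tiny (about 2.7e-20).
n = 2**32
bound = collision_probability_bound(n)
print(float(bound))
```

This only covers accidental collisions under the idealized model. It says nothing about deliberately constructed MD5 collisions, which are a separate concern.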
On 15 September 2015 at 16:02, Dan Andreescu dandreescu@wikimedia.org wrote:
- Same IP, different hashes;
Yes, the same IP could emit events that get handled by different instances of the now-parallel processor. This means that events with the same IP could have different hashes.
Awesome; thanks!
- Different IPs, same hash;
This was always possible in the old system due to hash collisions. But, like, very very unlikely. And this hasn't been influenced by the new bug in any way I'm aware of. Though maybe some number theoretician somewhere just did a telepathic spit take.
Hah!