On Wed, Jun 17, 2009 at 5:39 AM, Daniel Kinzler <daniel@brightbyte.de> wrote:
As to disabling on Wikimedia projects: this would make things easier for the toolserver mainly. We don't need this info there, and having it might conceivably make WMDE the target of legal action. We can easily exclude cu_changes from replication, but we can't easily filter rc_ip from recentchanges. So it would be better if it wasn't there in the first place.
We could create a trigger to stop rc_ip from being populated, or maybe something fancy with inserting into views. Or even change rc_ip to a CHAR(0) or something so MySQL silently truncates the inserted value (that only works if the server isn't in strict SQL mode, where the insert would fail instead). But the values would still be in the binlogs, so it's not perfect. On the other hand, we don't keep binlogs for very long, so this would be a large improvement over the current situation.
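To make the trigger idea concrete, here is a rough sketch, not something that has been tried on the toolserver. It uses Python's sqlite3 with an in-memory database as a stand-in for MySQL so it can be run anywhere. The three-column recentchanges table is a cut-down assumption, and the trigger name rc_blank_ip is made up. SQLite can't rewrite NEW in a BEFORE trigger, so this version blanks the column right after the insert; in MySQL a BEFORE INSERT trigger could do SET NEW.rc_ip = '' directly.

```python
import sqlite3


def make_db():
    """Build an in-memory stand-in for a replica with a simplified recentchanges table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE recentchanges (
            rc_id        INTEGER PRIMARY KEY,
            rc_user_text TEXT NOT NULL,
            rc_ip        TEXT NOT NULL DEFAULT ''
        );

        -- Hypothetical trigger: blank rc_ip for every newly inserted row.
        CREATE TRIGGER rc_blank_ip AFTER INSERT ON recentchanges
        BEGIN
            UPDATE recentchanges SET rc_ip = '' WHERE rc_id = NEW.rc_id;
        END;
        """
    )
    return conn


conn = make_db()

# The insert still carries an IP (a documentation-range address here),
# just as a replicated statement from the master would.
conn.execute(
    "INSERT INTO recentchanges (rc_user_text, rc_ip) VALUES (?, ?)",
    ("ExampleUser", "192.0.2.10"),
)

stored_ip = conn.execute("SELECT rc_ip FROM recentchanges").fetchone()[0]
print(repr(stored_ip))  # the stored value is the empty string
```

Note that the insert statement itself is unchanged, which is exactly why the address would still show up in the binlogs.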
If we need to scrub the binlogs as well, we'd have to delete IP addresses from the binlogs as they're pulled in from the Wikipedia masters, before they get executed. I recall River discussing the possibility of doing something similar for a different purpose, but I assume it would be more involved.
I don't think this change would be a good idea from the perspective of non-toolserver users.