Hi all
I wonder if the column rc_ip can be safely removed from MediaWiki. As far as I know, it has been superseded by the information in cu_changes, right? So I propose to
a) either remove it completely,
b) or set $wgPutIPinRC = false by default,
c) and set $wgPutIPinRC = false for Wikimedia projects.
The rationale is that IP addresses should only be recorded if this is explicitly requested. In some jurisdictions (e.g. Germany) it is illegal to record this information without notifying the users, and there is a requirement to record data only if necessary. So it should be off by default. Many privacy-conscious hosters disable IP records in the server logs; recording the IP in the database by default subverts this.
As to disabling on Wikimedia projects: this would make things easier for the toolserver mainly. We don't need this info there, and having it might conceivably make WMDE the target of legal action. We can easily exclude cu_changes from replication, but we can't easily filter rc_ip from recentchanges. So it would be better if it wasn't there in the first place.
Thoughts?
Daniel
On 17/06/2009, at 10:39 AM, Daniel Kinzler wrote:
Hi all
I wonder if the column rc_ip can be safely removed from MediaWiki. As far as I know, it has been superseded by the information in cu_changes, right?
rc_ip is used by the retroactive autoblock functionality (i.e. when a user is first blocked, an autoblock is placed on the last IP address they used, using rc_ip as the data source).
-- Andrew Garrett Contract Developer, Wikimedia Foundation agarrett@wikimedia.org http://werdn.us
Andrew Garrett wrote:
On 17/06/2009, at 10:39 AM, Daniel Kinzler wrote:
Hi all
I wonder if the column rc_ip can be safely removed from MediaWiki. As far as I know, it has been superseded by the information in cu_changes, right?
rc_ip is used by the retroactive autoblock functionality (i.e. when a user is first blocked, an autoblock is placed on the last IP address they used, using rc_ip as the data source).
Hm... I guess that could be made to use cu_changes instead, using a hook like "getPastUserIPs". That feels a little icky, but might still be worth it...
-- daniel
nc the CU is an extension and not part of trunk, so that will not work
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
John Doe wrote:
nc the CU is an extension and not part of trunk, so that will not work
Huh, why shouldn't it? The idea was that the extension would hijack the way Block.php looks up the past IP addresses, and do it via cu_changes. So the IPs would no longer be needed in rc.
I admit, though, that this is completely pointless except for the one thing I want it for: keeping user IPs off the toolserver.
I have access to the toolserver and rc_ip is not available. Please note that the checkuser table is created by an extension and not by MediaWiki, so you cannot depend on extensions that may or may not be installed on all wikis. It may be that way on WMF wikis, but MediaWiki is not just used on WMF wikis. What you want cannot be done until CU is merged into trunk.
2009/6/17 John Doe phoenixoverride@gmail.com:
Please note that the checkuser table is created by an extension and not by MediaWiki, so you cannot depend on extensions that may or may not be installed on all wikis. It may be that way on WMF wikis, but MediaWiki is not just used on WMF wikis. What you want cannot be done until CU is merged into trunk.
The point is that in core, the hook would be implemented using rc_ip. The CheckUser extension could then override the hook to use cu_changes, which enables any wiki with the CheckUser extension to set $wgPutIPinRC = false without any loss of functionality.
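A minimal sketch of that arrangement in plain PHP. Only the hook idea and $wgPutIPinRC come from this thread; runHooks(), getPastIPs(), makeCheckUserHandler() and the sample data are hypothetical stand-ins, not MediaWiki's actual API.

```php
<?php
// Simplified hook runner: calls each handler in turn and stops
// early if one of them returns false.
function runHooks( array $handlers, array $args ) {
    foreach ( $handlers as $handler ) {
        if ( call_user_func_array( $handler, $args ) === false ) {
            return false;
        }
    }
    return true;
}

// Core side: let registered handlers supply past IPs first. If none did,
// fall back to rc_ip data, which only exists when $putIPinRC is true.
function getPastIPs( $user, array $handlers, $putIPinRC, array $rcIps ) {
    $ips = array();
    runHooks( $handlers, array( $user, &$ips ) );
    if ( !$ips && $putIPinRC && isset( $rcIps[$user] ) ) {
        $ips = $rcIps[$user];
    }
    return $ips;
}

// Extension side: build a handler that reads from the extension's own
// data (standing in for cu_changes) instead of rc_ip.
function makeCheckUserHandler( array $cucIps ) {
    return function ( $user, &$ips ) use ( $cucIps ) {
        if ( isset( $cucIps[$user] ) ) {
            $ips = array_merge( $ips, $cucIps[$user] );
        }
        return true;
    };
}

// Sample data (documentation-range addresses, purely illustrative).
$rcIps  = array( 'Alice' => array( '192.0.2.10' ) );
$cucIps = array( 'Alice' => array( '198.51.100.7' ) );

// Stock wiki, IPs kept in recentchanges.
$stock = getPastIPs( 'Alice', array(), true, $rcIps );
// Wiki with the extension installed and rc_ip recording switched off.
$withExtension = getPastIPs( 'Alice', array( makeCheckUserHandler( $cucIps ) ), false, array() );
// Stock wiki with rc_ip recording switched off and no extension.
$neither = getPastIPs( 'Alice', array(), false, array() );
```

In this sketch the third case returns an empty list: a wiki that turns off rc_ip recording without installing a handler has no data source left for retroactive autoblocks.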
-- [[cs:User:Mormegil | Petr Kadlec]]
On 17/06/2009, at 2:26 PM, John Doe wrote:
I have access to the toolserver and rc_ip is not available. Please note that the checkuser table is created by an extension and not by MediaWiki, so you cannot depend on extensions that may or may not be installed on all wikis. It may be that way on WMF wikis, but MediaWiki is not just used on WMF wikis. What you want cannot be done until CU is merged into trunk.
As Daniel explained, the proposal is to disable the functionality except for wikis with an extension providing the past-IP data installed. On these wikis, the data would be provided by means of a hook.
Approximate pseudocode:

Instead of selecting rc_ip in the Block class, use the following:

    $ips = array();
    wfRunHooks( 'UserGetPastIPs', array( $user, &$ips ) );

In the CheckUser extension, hook this:

    function onUserGetPastIPs( $user, &$ips ) {
        // Add IPs found by the CheckUser extension to the $ips array.
        return true; // let any other handlers run as well
    }
Of course, this is pointless if you can filter out rc_ip with a view on the toolserver...
-- Andrew Garrett Contract Developer, Wikimedia Foundation agarrett@wikimedia.org http://werdn.us
Andrew Garrett wrote:
On 17/06/2009, at 2:26 PM, John Doe wrote:
I have access to the toolserver and rc_ip is not available. Please note that the checkuser table is created by an extension and not by MediaWiki, so you cannot depend on extensions that may or may not be installed on all wikis. It may be that way on WMF wikis, but MediaWiki is not just used on WMF wikis. What you want cannot be done until CU is merged into trunk.
As Daniel explained, the proposal is to disable the functionality except for wikis with an extension providing the past-IP data installed. On these wikis, the data would be provided by means of a hook.
Yep. But it's a bit painful... hmpf.
Of course, this is pointless if you can filter out rc_ip with a view on the toolserver...
No - the problem is, if it is on the toolserver at all, WMDE might be sued into revealing it. So it would be nice to not copy it there. Hiding it from *users* is not the issue; we have been doing that for years.
-- daniel
On Wed, Jun 17, 2009 at 6:44 AM, Daniel Kinzler daniel@brightbyte.de wrote:
Andrew Garrett wrote:
On 17/06/2009, at 2:26 PM, John Doe wrote:
I have access to the toolserver and rc_ip is not available. Please note that the checkuser table is created by an extension and not by MediaWiki, so you cannot depend on extensions that may or may not be installed on all wikis. It may be that way on WMF wikis, but MediaWiki is not just used on WMF wikis. What you want cannot be done until CU is merged into trunk.
As Daniel explained, the proposal is to disable the functionality except for wikis with an extension providing the past-IP data installed. On these wikis, the data would be provided by means of a hook.
Yep. But it's a bit painful... hmpf.
Of course, this is pointless if you can filter out rc_ip with a view on the toolserver...
No - the problem is, if it is on the toolserver at all, WMDE might be sued into revealing it. So it would be nice to not copy it there. Hiding it from *users* is not the issue; we have been doing that for years.
If the current concern is the toolserver, and the toolserver never actually uses this info, shouldn't it be possible to set up a process that routinely wipes out those records? Not an ideal solution perhaps, but the contortions you seem to be suggesting for the core don't strike me as ideal either.
-Robert Rohde
Robert Rohde wrote:
If the current concern is the toolserver, and the toolserver never actually uses this info, shouldn't it be possible to set up a process that routinely wipes out those records? Not an ideal solution perhaps, but the contortions you seem to be suggesting for the core don't strike me as ideal either.
-Robert Rohde
Yes, that is basically the tradeoff that I'm considering. Running such a wipe process on nearly 800 databases is a bit annoying.
Also, I have no idea what the legal implications are of actively destroying the information vs. never receiving it.
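For a sense of what such a wipe process involves, here is a sketch that only generates the SQL, one statement per replicated database. It assumes rc_ip is a non-nullable string column; buildWipeStatements() and the database names are illustrative, and something would still have to run the statements on the replica on a schedule.

```php
<?php
// Build one UPDATE per replicated database that blanks stored rc_ip values.
function buildWipeStatements( array $dbNames ) {
    $statements = array();
    foreach ( $dbNames as $db ) {
        // Accept only plain identifiers so the name can be embedded safely.
        if ( !preg_match( '/^[A-Za-z0-9_]+$/', $db ) ) {
            throw new InvalidArgumentException( "Unexpected database name: $db" );
        }
        // Skip rows that are already blank to keep each pass cheap.
        $statements[] = "UPDATE `$db`.`recentchanges` SET rc_ip = '' WHERE rc_ip <> ''";
    }
    return $statements;
}

// Example with two hypothetical database names.
$statements = buildWipeStatements( array( 'dewiki', 'enwiki' ) );
```

Between runs the addresses would still sit on the replica, so this narrows the window rather than closing it.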
-- daniel
John Doe wrote:
I have access to the toolserver and rc_ip is not available.
It's not available to users via the xxx_p views. It is, however, replicated and available to admins, and thus accessible to WMDE.
Please note that the checkuser table is created by an extension and not by MediaWiki, so you cannot depend on extensions that may or may not be installed on all wikis.
As Petr explained, it wouldn't. The extension would simply override the default way to access the IP, making it unnecessary to keep the IP data in rc_ip *and* cuc_ip.
It may be that way on WMF wikis, but MediaWiki is not just used on WMF wikis.
I'm well aware of that.
What you want cannot be done until CU is merged into trunk.
Not true, as explained above. Using a hook the way I proposed is a bit painful though. The question is whether it's worth it, and whether there's a better way.
-- daniel
Platonides wrote:
Daniel Kinzler wrote:
Not true, as explained above. Using a hook the way I proposed is a bit painful though. The question is whether it's worth it, and whether there's a better way.
-- daniel
It could replace the Block class, overriding doRetroactiveAutoblock()
Right, but I don't see how that would be less painful :)
-- daniel
On Wed, Jun 17, 2009 at 5:39 AM, Daniel Kinzler daniel@brightbyte.de wrote:
As to disabling on Wikimedia projects: this would make things easier for the toolserver mainly. We don't need this info there, and having it might conceivably make WMDE the target of legal action. We can easily exclude cu_changes from replication, but we can't easily filter rc_ip from recentchanges. So it would be better if it wasn't there in the first place.
We could create a trigger to stop rc_ip from being populated, or maybe something fancy with inserting into views. Or even change rc_ip to a CHAR(0) or something so MySQL silently truncates the insert value. But the values would still be in the binlogs, so it's not perfect. On the other hand, we don't keep binlogs for very long, so this would be a large improvement over the current situation.
If we need to scrub the binlogs as well, we'd have to delete IP addresses from the binlogs as they're pulled in from the Wikipedia masters, before they get executed. I recall River discussing the possibility of doing something similar for a different purpose, but I assume it would be more involved.
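The trigger variant might look like the sketch below, which only builds the MySQL statement for one database. It assumes rc_ip is a non-nullable string column; buildBlankIpTrigger() and the trigger name rc_blank_ip are made up for illustration. Whether a trigger fires for replicated writes depends on the replication setup, so this would need testing on the actual replica.

```php
<?php
// Build a CREATE TRIGGER statement that blanks rc_ip on every insert
// into the given database's recentchanges table.
function buildBlankIpTrigger( $db ) {
    // Accept only plain identifiers so the name can be embedded safely.
    if ( !preg_match( '/^[A-Za-z0-9_]+$/', $db ) ) {
        throw new InvalidArgumentException( "Unexpected database name: $db" );
    }
    return "CREATE TRIGGER `$db`.`rc_blank_ip` BEFORE INSERT ON `$db`.`recentchanges` "
        . "FOR EACH ROW SET NEW.rc_ip = ''";
}

// Example with a hypothetical database name.
$sql = buildBlankIpTrigger( 'dewiki' );
```

As with the other options here, the original values would still pass through the binlogs before the trigger discards them.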
I don't think this change would be a good idea from the perspective of non-toolserver users.
Aryeh Gregor wrote:
On Wed, Jun 17, 2009 at 5:39 AM, Daniel Kinzler daniel@brightbyte.de wrote:
As to disabling on Wikimedia projects: this would make things easier for the toolserver mainly. We don't need this info there, and having it might conceivably make WMDE the target of legal action. We can easily exclude cu_changes from replication, but we can't easily filter rc_ip from recentchanges. So it would be better if it wasn't there in the first place.
We could create a trigger to stop rc_ip from being populated, or maybe something fancy with inserting into views. Or even change rc_ip to a CHAR(0) or something so MySQL silently truncates the insert value.
Hehe, nice hack :) But tricky when new databases get added.
Naw, in the end, I want to get rid of the entries in recentchanges altogether. One clean solution would be to simply say that retroactive autoblocks become tied to checkuser functionality. But I suppose that, too, is a bit awkward.
-- daniel
On 17/06/2009, at 8:33 PM, Daniel Kinzler wrote:
Naw, in the end, I want to get rid of the entries in recentchanges altogether. One clean solution would be to simply say that retroactive autoblocks become tied to checkuser functionality. But I suppose that, too, is a bit awkward.
I don't think the toolserver should really influence core MediaWiki development to this extent.
If replicating entire tables to the toolserver isn't working from a legal perspective, then maybe we should try something else. There are plenty of other problems with this approach, too.
-- Andrew Garrett Contract Developer, Wikimedia Foundation agarrett@wikimedia.org http://werdn.us
On Wed, Jun 17, 2009 at 12:37 PM, Andrew Garrett agarrett@wikimedia.org wrote:
If replicating entire tables to the toolserver isn't working from a legal perspective, then maybe we should try something else. There are plenty of other problems with this approach, too.
I must confess my understanding of replication is crude, but it seems like a lot of problems might be solved if WMF could establish some form of intermediate replication server that took existing databases, stripped out the private bits, and then offered the remainder for public replication. I'd imagine it could be useful not just for the toolserver, but also to academics, and others.
-Robert Rohde
Robert Rohde wrote:
I must confess my understanding of replication is crude, but it seems like a lot of problems might be solved if WMF could establish some form of intermediate replication server that took existing databases, stripped out the private bits, and then offered the remainder for public replication. I'd imagine it could be useful not just for the toolserver, but also to academics, and others.
Given the amount of trouble it is just to maintain the toolserver replica in the first place, supporting third-party users with replication doesn't sound like a lot of fun to me. Is that something that toolserver administrators think they would have the time and manpower to handle?
-- brion
Brion Vibber wrote:
Robert Rohde wrote:
I must confess my understanding of replication is crude, but it seems like a lot of problems might be solved if WMF could establish some form of intermediate replication server that took existing databases, stripped out the private bits, and then offered the remainder for public replication. I'd imagine it could be useful not just for the toolserver, but also to academics, and others.
Given the amount of trouble it is just to maintain the toolserver replica in the first place, supporting third-party users with replication doesn't sound like a lot of fun to me. Is that something that toolserver administrators think they would have the time and manpower to handle?
No, people can either use the XML dumps or get a TS account.
-- daniel
On Wed, Jun 17, 2009 at 3:33 PM, Daniel Kinzler daniel@brightbyte.de wrote:
Hehe, nice hack :) But tricky when new databases get added.
No, just alter them. It will truncate all existing entries too.
Naw, in the end, I want to get rid of the entries in recentchanges altogether.
Why, other than the toolserver?
One clean solution would be to simply say that retroactive autoblocks become tied to checkuser functionality. But I suppose that, too, is a bit awkward.
And it would mean IP autoblocks would fail in stock MediaWiki, which is a regression that would have to be justified seriously.