Moments after the straw poll at [[Wikipedia:Quick and dirty Checkuser policy/proposal]] passed, a new "Requests for checkuser rights" section was added to WP:RFA. As of this writing, two users have already created nomination pages for that section.
Although I am in favor of having more users use the checkuser tool, my biggest concern is that we are jumping the gun without first setting up a full privacy policy, with checks and balances, for those who will have access to checkuser. A few users, including myself, have indicated this on those nomination pages.
Zzyzx11 at en.wikipedia.org http://en.wikipedia.org/wiki/User:Zzyzx11 zzyzx11@hotmail.com
We don't need more than a minimal quota of checkuser-enabled people. If people apply for checkuser rights, refuse. It's as simple as that.
Zzyzx is right. A lot of people supported the idea, provided more detailed policy was created. We should do that before accepting any nominations.
--Mgm
Can someone clarify whether the software records your IP address every time you access a page, or only when you edit? If it's the former, does anyone have any ideas how to block this information without constantly logging in and out? I don't mind Wikimedia tracking what I edit, but I don't want you tracking what I read.
On 10/19/05, Anthony DiPierro wikispam@inbox.org wrote:
Can someone clarify whether the software records your IP address every time you access a page, or only when you edit? If it's the former, does anyone have any ideas how to block this information without constantly logging in and out? I don't mind Wikimedia tracking what I edit, but I don't want you tracking what I read.
see http://wikimediafoundation.org/wiki/Privacy_policy#Private_logging
On 10/19/05, Anthony DiPierro wikispam@inbox.org wrote:
Can someone clarify whether the software records your IP address every time you access a page, or only when you edit? If it's the former, does anyone have any ideas how to block this information without constantly logging in and out? I don't mind Wikimedia tracking what I edit, but I don't want you tracking what I read.
[[Special:CheckUser]] only shows what you've been editing. The server logs will show what your IP has been reading, but that isn't logged (afaik) by user name.
Angela.
On 10/19/05, Angela beesley@gmail.com wrote:
On 10/19/05, Anthony DiPierro wikispam@inbox.org wrote:
Can someone clarify whether the software records your IP address every time you access a page, or only when you edit? If it's the former, does anyone have any ideas how to block this information without constantly logging in and out? I don't mind Wikimedia tracking what I edit, but I don't want you tracking what I read.
[[Special:CheckUser]] only shows what you've been editing. The server logs will show what your IP has been reading, but that isn't logged (afaik) by user name.
So in theory a developer could match up the IP with the username and then look through the server logs, but that'd require a very intentional breach of ethics. Also, according to the link below, even these logs are only kept for 2 weeks.
If all of this is true, I think y'all have done a pretty good job of keeping this information private. I first became particularly concerned about this when Jimbo mentioned considering selling log data to researchers (implying to some extent that they were kept for longer than 2 weeks), and also I thought someone presented a sample log line which included username information.
I just looked back (gmail search is awesome), and here was the log line that was provided by Jerome Jamnicky: "1124167686.523 210 12.34.56.78 TCP_MISS/200 2962 GET http://en.wikipedia.org/wiki/Special:Search?search=Potato&go=Go - PARENT_HIT/207.142.131.200 text/html [Host: en.wikipedia.org\r\nUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6\r\nAccept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip,deflate\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nKeep-Alive: 300\r\nConnection: keep-alive\r\nReferer: http://en.wikipedia.org/wiki/Esoterica\r\n] [HTTP/1.0 200 OK\r\nDate: Tue, 16 Aug 2005 04:48:06 GMT\r\nServer: Apache\r\nX-Powered-By: PHP/4.3.11\r\nContent-language: en\r\nVary: Accept-Encoding,Cookie\r\nExpires: -1\r\nCache-Control: private, must-revalidate, max-age=0\r\nContent-Encoding: gzip\r\nConnection: close\r\nContent-Type: text/html; charset=utf-8\r\n\r]"
Now that I look at it again, it doesn't seem to have username information (not sure what 1124167686.523 is though, maybe a timestamp). Are these log files still thrown away after 2 weeks?
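The timestamp guess can be checked with a few lines of Python. This is only a sketch: it assumes the first field is Unix epoch seconds (with a millisecond fraction), and `SAMPLE_PREFIX` and `leading_timestamp` are names made up here, with the sample shortened to the leading fields of the line quoted above.

```python
from datetime import datetime, timezone

# Leading fields of the sample line quoted in this thread (abbreviated).
SAMPLE_PREFIX = "1124167686.523 210 12.34.56.78 TCP_MISS/200 2962 GET"


def leading_timestamp(line: str) -> datetime:
    """Interpret the first whitespace-separated field as Unix epoch seconds."""
    first_field = line.split()[0]
    return datetime.fromtimestamp(float(first_field), tz=timezone.utc)


ts = leading_timestamp(SAMPLE_PREFIX)
# Format the result the same way as the HTTP Date header for easy comparison.
formatted = ts.strftime("%Y-%m-%d %H:%M:%S") + " GMT"
print(formatted)  # 2005-08-16 04:48:06 GMT
```

The result lines up with the "Date: Tue, 16 Aug 2005 04:48:06 GMT" response header in the same log line, which supports reading the field as a request timestamp rather than any kind of user identifier.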
(For those following at home, the thread was entitled "Research access to logs", in September 2005 on the wikipedia-l mailing list.)
Angela.
And from Puddl Duk, "see http://wikimediafoundation.org/wiki/Privacy_policy#Private_logging"
Yes, I was going on my apparently bad memory of that previous thread and its contradiction with the privacy policy, which was last updated in May. I also noticed recently that cookies containing my username were kept even after I logged out.
The line from the log file given in that thread and the one in the privacy policy *are* different, too. Presumably one is the Apache log and the other is the cache log.
Anthony DiPierro wrote:
If all of this is true, I think y'all have done a pretty good job of keeping this information private. I first became particularly concerned about this when Jimbo mentioned considering selling log data to researchers (implying to some extent that they were kept for longer than 2 weeks), and also I thought someone presented a sample log line which included username information.
There was never any proposal to provide researchers with access logs that would allow identification of users. The idea was for researchers to write scripts which output either aggregate data or anonymised logs. We would run them on the logs and provide them with the results. It hasn't eventuated, though.
Any suggestion that Wikipedia might sell personal access pattern data to marketing firms in the way that the online advertising agencies do is pure paranoia.
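For illustration only, here is a rough sketch of the kind of script meant above, one that emits either aggregate counts or lines with the client address replaced. Nothing here is an actual Wikimedia script: the field positions assume the usual native Squid layout (time, elapsed, client, code/status, bytes, method, URL), the second sample line is made up, and the function names and the secret are hypothetical.

```python
import hashlib
import hmac
from collections import Counter

# Assumed field positions: time elapsed client code/status bytes method URL ...
CLIENT_FIELD = 2
URL_FIELD = 6


def anonymise(line: str, secret: bytes) -> str:
    """Replace the client address with a keyed hash (a pseudonym, not the IP)."""
    fields = line.split()
    digest = hmac.new(secret, fields[CLIENT_FIELD].encode(), hashlib.sha256)
    fields[CLIENT_FIELD] = digest.hexdigest()[:12]
    return " ".join(fields)


def hits_per_url(lines):
    """Aggregate output: request counts per URL, with no client data at all."""
    return Counter(line.split()[URL_FIELD] for line in lines)


SEARCH_URL = "http://en.wikipedia.org/wiki/Special:Search?search=Potato&go=Go"
lines = [
    # Shortened form of the line quoted in this thread.
    f"1124167686.523 210 12.34.56.78 TCP_MISS/200 2962 GET {SEARCH_URL} -",
    # Made-up second request, only here to give the counter something to add.
    f"1124167690.101 95 12.34.56.79 TCP_MISS/200 2962 GET {SEARCH_URL} -",
]

counts = hits_per_url(lines)
scrubbed = [anonymise(line, secret=b"hypothetical-secret") for line in lines]
print(counts[SEARCH_URL])
print(scrubbed[0])
```

A keyed hash only pseudonymises: whoever holds the secret can recompute it for a known IP, so the key would need to stay with whoever runs the script and be discarded afterwards. The aggregate form avoids the problem entirely.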
I just looked back (gmail search is awesome), and here was the log line that was provided by Jerome Jamnicky: "1124167686.523 210 12.34.56.78 TCP_MISS/200 2962 GET http://en.wikipedia.org/wiki/Special:Search?search=Potato&go=Go - PARENT_HIT/207.142.131.200 text/html [Host: en.wikipedia.org\r\nUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6\r\nAccept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip,deflate\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nKeep-Alive: 300\r\nConnection: keep-alive\r\nReferer: http://en.wikipedia.org/wiki/Esoterica\r\n] [HTTP/1.0 200 OK\r\nDate: Tue, 16 Aug 2005 04:48:06 GMT\r\nServer: Apache\r\nX-Powered-By: PHP/4.3.11\r\nContent-language: en\r\nVary: Accept-Encoding,Cookie\r\nExpires: -1\r\nCache-Control: private, must-revalidate, max-age=0\r\nContent-Encoding: gzip\r\nConnection: close\r\nContent-Type: text/html; charset=utf-8\r\n\r]"
Now that I look at it again, it doesn't seem to have username information (not sure what 1124167686.523 is though, maybe a timestamp). Are these log files still thrown away after 2 weeks?
(For those following at home, the thread was entitled "Research access to logs", in September 2005 on the wikipedia-l mailing list.)
Angela.
We only keep such logs for about an hour, and at the moment, they're not even merged and processed. They're spread over the hard drives of 24 squids.
At the time Jerome was writing, we had a log aggregation and processing system set up, to generate statistics, but the logs were still deleted after they were processed, rather than archived. If we wanted to provide such information to external researchers, we'd first have to start gathering it.
Of course, just because we're not gathering such information at the moment doesn't mean we won't gather it in the future, within the limits of the privacy policy. If you're worried about the cops finding out what you were reading yesterday, don't be. But if they asked us today to give them a list of everything you read tomorrow, well, we could probably sort that out.
-- Tim Starling