I'm afraid our Tatar is correct in some senses and others in this thread are in a failing or failed mode.
Each web server, of which the WMF has a few, collects details on the behaviour of IPs, in logs. Those logs can be and probably have been requested by certain government officials, most likely for the purpose of tracking down who is behind a certain "Bad" posting to a BLP.
In addition, courts can make such orders to identify an otherwise "John Doe" named in a suit, such as for libel, etc. It has happened and it will continue to happen; the WMF does keep such logs.
Knowing the IP, it can then be tracked back to that user's ISP and a log again requested to determine the exact person, or at least business or household, who used the IP at that exact time. So playing with words doesn't let us get around that point.
I'm still not clear why we would want to know the IP exactly for analytical purposes. Some intrepid programmer could write a program which would simply collect detailed analysis of a person's in-world behaviour and call them "Bob992" instead of 13.42.204.192 or whatever. Making the information packets anonymous. That would still allow any sort of analysis the Tatars want to make, and not reveal any private information.
W
On Sun, Nov 28, 2010 at 12:38 PM, Domas Mituzas midom.lists@gmail.com wrote:
Logs cannot be read by wikipedia owners or us government because they don't exist.
There aren't any raw logs?
On Sun, Nov 28, 2010 at 2:30 PM, WJhonson@aol.com wrote:
Each web server, of which the WMF has a few, collects details on the behaviour of IPs, in logs. Those logs can be and probably have been requested by certain government officials, most likely for the purpose of tracking down who is behind a certain "Bad" posting to a BLP.
Presumably they would usually just use CheckUser data for that.
On Sun, Nov 28, 2010 at 2:30 PM, WJhonson@aol.com wrote:
I'm still not clear why we would want to know the IP exactly for analytical purposes. Some intrepid programmer could write a program which would simply collect detailed analysis of a person's in-world behaviour and call them "Bob992" instead of 13.42.204.192 or whatever. Making the information packets anonymous. That would still allow any sort of analysis the Tatars want to make, and not reveal any private information.
It's a bit more complicated than that. Sometimes anonymous isn't anonymous enough: http://en.wikipedia.org/wiki/AOL_search_data_scandal
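For illustration, the relabelling idea quoted above (a name like "Bob992" in place of an IP) could be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not anything the WMF is known to run: the function name `pseudonymize`, the secret key, and the "user-" label format are all hypothetical. As the AOL case linked above suggests, a stable pseudonym on its own may still allow re-identification from behaviour.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would be random and kept private.
SECRET_KEY = b"example-secret-not-for-real-use"


def pseudonymize(ip: str, key: bytes = SECRET_KEY) -> str:
    """Map an IP address to a stable label without keeping the IP itself.

    A keyed hash (HMAC-SHA256) is used so that the label cannot be
    reproduced by someone who does not hold the key.
    """
    digest = hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()
    # The "user-" prefix and the 10-character length are arbitrary choices.
    return "user-" + digest[:10]


if __name__ == "__main__":
    # The same IP always yields the same label; different IPs differ.
    print(pseudonymize("13.42.204.192"))
    print(pseudonymize("13.42.204.193"))
```

The label is stable per IP, so per-reader analysis still works, which is also exactly why behaviour patterns can leak identity.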
2010/11/28 WJhonson@aol.com:
I'm still not clear why we would want to know the IP exactly for analytical purposes. Some intrepid programmer could write a program which would simply collect detailed analysis of a person's in-world behaviour and call them "Bob992" instead of 13.42.204.192 or whatever. Making the information packets anonymous. That would still allow any sort of analysis the Tatars want to make, and not reveal any private information.
i just had not thought about that as a threat. theoretically, ip addresses can be used to count how many wikipedia readers there are in the regions of russia. such statistics are produced by russian counters: liveinternet and maybe mail.ru. but i do not know whether any tatar can get such a database to make such a counter for wikipedia logs. probably some russian companies could make such an analysis for the russian, tatar, and other wikipedias in the languages of russia.
the Tatars want to make
on the one hand, i do not represent [all] tatars, and on the other hand, i think i also represent native speakers of other languages.
On Sun, Nov 28, 2010 at 3:21 PM, dinar qorbanof qdinar@gmail.com wrote:
i just had not thought about that as a threat. theoretically, ip addresses can be used to count how many wikipedia readers there are in the regions of russia. such statistics are produced by russian counters: liveinternet and maybe mail.ru. but i do not know whether any tatar can get such a database to make such a counter for wikipedia logs. probably some russian companies could make such an analysis for the russian, tatar, and other wikipedias in the languages of russia.
The sampled 1/1000 squid logs can be used for statistical purposes, such as page view stats. Someone more techy can answer better than I can whether the samples include IP addresses that could be used with geoip for geographic analysis. (I think perhaps not.)
Here are the page view stats generated from the squid sample logs:
For other analysis of readership, we do get stats from comScore, but that's survey data from panelists and nothing to do with logs.
http://meta.wikimedia.org/wiki/User:Stu/comScore_data_on_Wikimedia
-Katie (@aude)
_______________________________________________
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
The sampled 1/1000 squid logs can be used for statistical purposes, such as page view stats. Someone more techy can answer that better than I can, if the samples include IP addresses that could be used w/ geoip for geographic analysis. (I think perhaps not)
We do aggregations on the full sample, not 1/1000.
The 1/1000 gets saved to a file for post-mortems and "wtf is going on" type of analysis.
Domas
On 29 November 2010 10:11, Domas Mituzas midom.lists@gmail.com wrote:
The sampled 1/1000 squid logs can be used for statistical purposes, such as page view stats. Someone more techy can answer that better than I can, if the samples include IP addresses that could be used w/ geoip for geographic analysis. (I think perhaps not)
We do aggregations on the full sample, not 1/1000.
The 1/1000 gets saved to a file for post-mortems and "wtf is going on" type of analysis.
Ah, that explains it - I was wondering how we could get something as precise as "three views one day, five the next" out of a 1/1000 sample! So am I right in assuming that what happens is:
1) page request comes in and is served
2) every thousandth request is sent to a separate file and logged
3) the rest are stripped of all data bar "X page requested"
4) this is kept for the pageview statistics, which are very fine-grained
The end result: one file with 0.1% of requests logged in detail and another file with "hit counts" and no more.
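The flow described in this message could be sketched roughly as below. This is a hedged illustration only, not the actual squid logging setup: the function name `process_requests`, the request dictionaries, and the field names `page` and `ip` are assumptions. Following the earlier remark that aggregation runs on the full stream, the sketch counts every request toward the per-page totals and additionally keeps every thousandth request in detail.

```python
from collections import Counter

SAMPLE_EVERY = 1000  # 1/1000 sampling, as described in the thread


def process_requests(requests, sample_every=SAMPLE_EVERY):
    """Split a request stream into per-page hit counts and a detailed sample.

    `requests` is an iterable of dicts with at least a "page" key
    (a hypothetical record format used only for this sketch).
    Returns (hit_counts, detailed_sample).
    """
    hit_counts = Counter()
    detailed_sample = []
    for i, req in enumerate(requests, start=1):
        # Aggregate on the full stream: only the page name is retained.
        hit_counts[req["page"]] += 1
        # Every `sample_every`-th request is kept with all of its fields.
        if i % sample_every == 0:
            detailed_sample.append(dict(req))
    return hit_counts, detailed_sample


if __name__ == "__main__":
    stream = [
        {"page": "A" if i % 2 else "B", "ip": "192.0.2.1"}
        for i in range(2500)
    ]
    counts, sample = process_requests(stream)
    print(counts["A"], counts["B"], len(sample))
```

With this shape, the hit counts stay exact while the detailed file holds roughly 0.1% of requests.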
On Sun, Nov 28, 2010 at 2:30 PM, WJhonson@aol.com wrote:
I'm afraid our Tatar is correct in some senses and others in this thread are in a failing or failed mode.
Each web server, of which the WMF has a few, collects details on the behaviour of IPs, in logs. Those logs can be and probably have been requested by certain government officials, most likely for the purpose of tracking down who is behind a certain "Bad" posting to a BLP.
CheckUser data (IPs of editors) are kept for 3 months.
http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/CheckUser/CheckUs...
WMF does not keep apache logs which would track what pages people are reading.
http://noc.wikimedia.org/conf/httpd.conf (see CustomLog which is commented out, meaning that access logs are not kept)
There are some logs for the squid servers which are used to generate page view stats, but those take a 1/1000 sample. There are also full squid logs for click-throughs on the fundraising banners.
http://wikitech.wikimedia.org/view/Squid_logging
So, we do not have readership logs except for the sampled squid logs. For performance reasons, it's not desirable to collect more detailed logs, nor would we really want them.
-Katie (@aude)
:) ok then. thank you. i should have asked first whether wikipedia collects logs.
and i ask again: do you or anybody know why my messages are not published in the official mail archive? am i not formatting my messages correctly?
We should all be asking "Is there really a problem here that would justify creating a major exception to our privacy policies?" -- because I haven't seen one. Did anyone notice how some of the earlier posts suggested that it was OK because people can anonymize themselves with a proxy or some other option? That situation would require a user (possibly one with no understanding of the concept of open proxies) to take technical steps simply to "opt in" to privacy. Also, did anyone think to ask the tech team whether they'd be OK shouldering the burden of releasing these logs? Or the OTRS team whether they're OK with dealing with the email burden that would come with that? Or Communications, to see whether they are comfortable with the negative PR of this?
Any one of these above steps would probably have revealed that it is a bad idea. Just sayin.
-Dan
On Sun, Nov 28, 2010 at 3:41 PM, dinar qorbanof qdinar@gmail.com wrote:
and i ask again: do you or anybody know why my messages are not published in the official mail archive? am i not formatting my messages correctly?
I don't know. :/
-Katie (aude)
2010/11/28 dinar qorbanof qdinar@gmail.com:
i should have asked first whether wikipedia collects logs.
no, the way i asked is probably better, because it contains two questions at once: logs are usually collected, the privacy policy says they may be collected, and even if they are not collected now, they could be.
2010/11/28 dinar qorbanof qdinar@gmail.com:
2010/11/28 FastLizard4 fastlizard4@gmail.com:
Here in the U.S., ISPs keep records of who used what IP address at what time. So, let's say that I had a dynamic IP address that changed every day. If I got arrested and the courts ordered my ISP to give them a list of IP addresses I have used in the last month, they would do so, complete with the times I used each IP address.
it is the same in russia. i am speaking only about relative anonymity, not against the government but against other people.
i said "only about relative anonymity, not against the government but against other people" in reference to this earlier statement of mine:
i think that probably they intentionally use dynamic ip for some anonymity
that is why i said "some". it is also anonymous against the government temporarily, until they trace the source ip. (temporary anonymity of reading is a basic right: it makes the internet easier to trust, because readers can check that a server is not fooling users by sending different content to different people. for example, theoretically, a "closed" social network or forum can show a user his own post containing a link, hoping he will feel ok, while hiding it from other users; this could happen, for example, because the link points to a competitor's site and the owners of the closed site do some aggressive smo, blocking links to competitor sites. "closed" means that you cannot read it if you are not logged in. in this case the closed site also risks being "caught", if the user checks his post with several people through alternative channels, or publishes on an open site about what he posted on the closed site. but checking that way is hard. with anonymous reading available, checking is easy. and there should be no premoderation. in that case, even users who never see any link to competitor sites can be sure that they see what anybody else sees on the site, and that anybody can post useful links and other posts, at least temporarily.)
Hi!
Each web server, of which the WMF has a few, collects details on the behaviour of IPs, in logs. Those logs can be and probably have been requested by certain government officials, most likely for the purpose of tracking down who is behind a certain "Bad" posting to a BLP.
We log edits, not page views. These are not 'web server' logs, these are mediawiki logs.
Domas
wikimedia-l@lists.wikimedia.org