Hello..
..I think it could be useful to compute the probability that a reader reads an article B given that the same reader read another article A within a short timeframe before. Based on these probabilities, the reader of a specific article could be given suggestions for other articles that might also interest them (a kind of collaborative filtering, like Amazon's "Customers who bought this book also bought.."). Subscribed users could be offered personalized recommendations, based on a computation of how probable it is that an article is of interest to that specific user, given the articles they have already read. I'd be interested in implementing this idea, so as a first step I'd like a sample log file a few megabytes in size.
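A minimal sketch of the computation described above, assuming page views are already available as (reader, timestamp, article) tuples with a pseudonymous reader identifier, and assuming 30 minutes as "a short timeframe" (both are illustrative choices, not something specified in this thread). It estimates P(B | A) as the fraction of views of A that the same reader follows with a view of B inside the window, then ranks suggestions for an article by that estimate. The function names are hypothetical.

```python
from collections import defaultdict


def conditional_view_probabilities(views, window_seconds=1800):
    """Estimate P(B viewed within the window after A | A viewed).

    views: iterable of (reader, unix_timestamp, article) tuples, where
    `reader` is an assumed pseudonymous ID. window_seconds=1800 is an
    illustrative choice for "a short timeframe".
    """
    by_reader = defaultdict(list)
    for reader, ts, article in views:
        by_reader[reader].append((ts, article))

    views_of = defaultdict(int)     # A -> number of views of A
    followed_by = defaultdict(int)  # (A, B) -> views of A followed by B
    for events in by_reader.values():
        events.sort()  # chronological order per reader
        for i, (ts_a, a) in enumerate(events):
            views_of[a] += 1
            seen = set()  # count each B at most once per view of A
            for ts_b, b in events[i + 1:]:
                if ts_b - ts_a > window_seconds:
                    break  # later events are even further away
                if b != a and b not in seen:
                    seen.add(b)
                    followed_by[(a, b)] += 1

    return {(a, b): n / views_of[a] for (a, b), n in followed_by.items()}


def suggest(probabilities, article, k=3):
    """Return up to k articles B ranked by the estimated P(B | article)."""
    ranked = sorted(
        ((p, b) for (a, b), p in probabilities.items() if a == article),
        key=lambda pb: (-pb[0], pb[1]),
    )
    return [b for _, b in ranked[:k]]
```

For example, if three readers each view Potato and later Esoterica, but only two of them do so within the window, the estimate for P(Esoterica | Potato) is 2/3.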
Tobias
I'd better explain that. This is a person who approached me during Wikimania asking for access to logs. I was positive at the time, but I have been more guarded by email recently, given the tone of discussion in this thread. I suggested to him (and to the other person who approached me at Wikimania, who will hopefully post soon) that he state his case to the list, to allow for public debate. So please take him seriously, and ask him some questions.
-- Tim Starling
If you haven't already, please read the privacy policy carefully, and also this thread, where somebody made a similar request for a similar purpose: http://mail.wikipedia.org/pipermail/wikitech-l/2005-July/thread.html#30917
A single line from one of the squid server log files looks like this:
1124167686.523 210 12.34.56.78 TCP_MISS/200 2962 GET http://en.wikipedia.org/wiki/Special:Search?search=Potato&go=Go - PARENT_HIT/207.142.131.200 text/html [Host: en.wikipedia.org\r\nUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6\r\nAccept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\nAccept-Language: en-us,en;q=0.5\r\nAccept-Encoding: gzip,deflate\r\nAccept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7\r\nKeep-Alive: 300\r\nConnection: keep-alive\r\nReferer: http://en.wikipedia.org/wiki/Esoterica\r\n] [HTTP/1.0 200 OK\r\nDate: Tue, 16 Aug 2005 04:48:06 GMT\r\nServer: Apache\r\nX-Powered-By: PHP/4.3.11\r\nContent-language: en\r\nVary: Accept-Encoding,Cookie\r\nExpires: -1\r\nCache-Control: private, must-revalidate, max-age=0\r\nContent-Encoding: gzip\r\nConnection: close\r\nContent-Type: text/html; charset=utf-8\r\n\r]
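A rough sketch of pulling such a line apart, assuming exactly the layout of the sample: ten space-separated fields (named here after Squid's native log format), followed by a bracketed block of request headers and a bracketed block of response headers, with header lines separated by a literal backslash-r backslash-n sequence. The timestamp, client address and Referer are the pieces an "A then B" analysis would need. The function name is hypothetical, and since the format may change, the field list is an assumption to re-check against real logs.

```python
# Field names for the ten space-separated fields in the sample line
# (assumed to follow Squid's native access.log order).
FIELDS = ["timestamp", "elapsed_ms", "client", "result", "bytes",
          "method", "url", "ident", "hierarchy", "content_type"]


def parse_squid_line(line):
    """Split one log line into its fields plus the logged request headers.

    Assumes the layout of the sample: ten fields, then "[request headers]"
    and "[response headers]", with header lines separated by the literal
    two-character sequences backslash-r and backslash-n.
    """
    head, _, rest = line.partition(" [")
    record = dict(zip(FIELDS, head.split()))
    record["timestamp"] = float(record["timestamp"])

    request_block = rest.split("] [", 1)[0]  # drop the response headers
    headers = {}
    for header in request_block.split("\\r\\n"):
        name, sep, value = header.partition(": ")
        if sep:  # skip the empty piece after the final separator
            headers[name] = value
    record["request_headers"] = headers
    return record
```

Applied to the sample line, this would give the client `12.34.56.78`, the search URL, and a Referer of `http://en.wikipedia.org/wiki/Esoterica`.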
Note that this format may change in the future. Is there anything else you need to know?
-Jerome
---- Send instant messages to your online friends http://au.messenger.yahoo.com
I don't need direct access to the log file. I also don't want the statistics for myself; I just had the idea that a feature like the one I explained before could be useful for Wikipedia. All personal data could remain on a Wikipedia server, and personal recommendations would be visible only to the user concerned. I would just need a sample of the logfile to test scripts. But maybe we should first clarify whether we want to add this feature.
Tobias
I thought it would be obvious, but giving you a sample of the logfile would violate the (current) privacy policy. So your options are either to offer a script that alters the logfile so that giving you the resulting file would not violate the policy, or to get the policy changed.
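The first option could look roughly like this: a hypothetical script, run on a Wikimedia server, that keeps only the timestamp, a keyed pseudonym for the client address, and the requested URL. The function name, the choice of HMAC-SHA256 and the 16-hex-digit truncation are assumptions, and whether such output is acceptable is for the privacy policy to decide; note that URLs such as search queries can themselves be revealing. A keyed hash is used because a plain hash of an IPv4 address can be reversed by simply hashing every possible address.

```python
import hashlib
import hmac


def anonymize_line(line, secret_key):
    """Reduce one squid log line to "timestamp pseudonym url".

    The client IP (third field) is replaced by a truncated HMAC-SHA256
    of the address under `secret_key` (which would stay on the server),
    so requests from one address can still be grouped without revealing
    it. All other fields, including the logged headers such as
    User-Agent and Referer, are dropped.
    """
    fields = line.split()
    timestamp, client, url = fields[0], fields[2], fields[6]
    pseudonym = hmac.new(secret_key, client.encode("ascii"),
                         hashlib.sha256).hexdigest()[:16]
    return f"{timestamp} {pseudonym} {url}"
```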
-Jerome
Well, it was me who suggested that; we could just give him log files for a few IP addresses belonging to people who have given their consent. For example, we could give him logs of his own requests.
I'm now thinking, however, that log files are not what Tobias wants at all; instead, he wants something similar to the page view counter, which was disabled for efficiency reasons.
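Giving out logs only for consenting addresses amounts to a simple filter. A sketch, assuming the squid line layout shown earlier in the thread (client address as the third space-separated field); the function name and the set of consenting addresses are hypothetical.

```python
def filter_consenting(lines, consenting_ips):
    """Yield only the log lines whose client address (the third
    space-separated field) is in `consenting_ips`, e.g. the requester's
    own address."""
    for line in lines:
        fields = line.split(None, 3)
        if len(fields) >= 3 and fields[2] in consenting_ips:
            yield line
```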
-- Tim Starling
Tim Starling wrote:
> Well, it was me who suggested that; we could just give him log files for a few IP addresses belonging to people who have given their consent. For example, we could give him logs of his own requests.
Oh okay. Well, if that's what he wants, I'll help out.
> I'm now thinking, however, that log files are not what Tobias wants at all; instead, he wants something similar to the page view counter, which was disabled for efficiency reasons.
Logwood gives page counts, but it's broken for en: at the moment. See for example: http://www2.knams.wikimedia.org/logwood//archive/hr.wikipedia.org/2005-08/20...
foundation-l mailing list foundation-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/foundation-l
Tim Starling <t.starling@...> writes:
> I'm now thinking, however, that log files are not what Tobias wants at all; instead, he wants something similar to the page view counter, which was disabled for efficiency reasons.
A page view counter does not include any information on how often two articles are viewed within a short timeframe by the same user. This information is stored in the logfile, or it could be obtained by changing the MediaWiki software so that the relevant statistics are stored online in a database table with a structure like the following: (id_article_a, id_article_b, counter, timestamp). To prevent the table from growing with the square of the number of articles, the entries with the lowest counters could be deleted if they have not been changed for a specified period of time.
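The proposed table could be maintained roughly as follows. This is a sketch using Python's built-in sqlite3 module so that it runs on its own (MediaWiki itself runs on MySQL); the column names follow the tuple above, while the function names and the pruning rule (delete pairs whose counter is below `min_counter` and whose timestamp is older than `max_age_seconds`) are assumptions about how "lowest counter" and "a specified period of time" might be made concrete.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Create (if needed) the pair-count table described above."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS co_views (
        id_article_a INTEGER NOT NULL,
        id_article_b INTEGER NOT NULL,
        counter      INTEGER NOT NULL,
        timestamp    INTEGER NOT NULL,  -- last change, Unix seconds
        PRIMARY KEY (id_article_a, id_article_b))""")
    return db


def record_pair(db, a, b, now=None):
    """Count one view of article b shortly after article a by one user."""
    now = int(time.time()) if now is None else now
    cur = db.execute(
        "UPDATE co_views SET counter = counter + 1, timestamp = ? "
        "WHERE id_article_a = ? AND id_article_b = ?", (now, a, b))
    if cur.rowcount == 0:  # first time this pair is seen
        db.execute("INSERT INTO co_views VALUES (?, ?, 1, ?)", (a, b, now))


def prune(db, max_age_seconds, min_counter, now=None):
    """Delete low-count pairs unchanged for max_age_seconds; return count."""
    now = int(time.time()) if now is None else now
    cur = db.execute(
        "DELETE FROM co_views WHERE counter < ? AND timestamp < ?",
        (min_counter, now - max_age_seconds))
    return cur.rowcount
```

The UPDATE-then-INSERT pattern avoids relying on SQLite's newer `ON CONFLICT ... DO UPDATE` upsert syntax; with two concurrent writers it would need a transaction or a unique-key retry.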
wikimedia-l@lists.wikimedia.org