Wikipedia has become a popular data resource for research in information retrieval (IR), natural language processing, knowledge discovery, and related fields. It is especially attractive because of its open nature.
I was wondering whether search logs on Wikipedia and the corresponding clickthrough data (e.g. for disambiguation pages) could be made available to the research community in an anonymized form (e.g. with each IP address mapped to a unique number). The objective is to derive meaningful statistics from the combination of server log files and webpage content, e.g. correlations between search terms and accessed pages, temporal patterns, and navigation patterns. Such analysis could help to improve the user's search experience on Wikipedia.
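As a rough illustration of the kind of processing meant here, the following Python sketch replaces each IP address with a unique number and then counts how often a search term led to a given page. The record layout (IP, search term, clicked page) and the function names are assumptions made for this example only; they do not describe the actual Wikimedia log format.

```python
from collections import Counter


def anonymize_log(records):
    """Replace each IP address with a unique number (1, 2, 3, ...).

    `records` is an iterable of (ip, search_term, clicked_page) tuples.
    This layout is a hypothetical one chosen for the example.
    """
    ip_to_id = {}
    anonymized = []
    for ip, search_term, clicked_page in records:
        # The same IP always receives the same number; a new IP gets the next one.
        user_id = ip_to_id.setdefault(ip, len(ip_to_id) + 1)
        anonymized.append((user_id, search_term, clicked_page))
    return anonymized


def term_page_counts(anonymized):
    """Count how often each (search_term, clicked_page) pair occurs."""
    return Counter((term, page) for _, term, page in anonymized)


# Made-up sample records, for illustration only.
sample = [
    ("10.0.0.1", "mercury", "Mercury (planet)"),
    ("10.0.0.2", "mercury", "Mercury (element)"),
    ("10.0.0.1", "mercury", "Mercury (planet)"),
]

anonymized = anonymize_log(sample)
counts = term_page_counts(anonymized)
```

The mapping table from IP address to number would stay with the log owner and would not be released; only the anonymized records would be shared.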
--Wessel Kraaij
wikitech-l@lists.wikimedia.org