S. Nunes wrote:
Hi all,
I presume that Wikipedia keeps data about HTTP accesses to all articles. Can anybody inform me if this data is available for research purposes?
No. With the amount of traffic it has, space needs would be immense, and Wikimedia is not interested in logging all accesses.
You can use Domas' wikistats if they contain enough data for you. You may also be able to get a sampled feed for processing by contacting the Foundation.
I am particularly interested in HTTP referral information for each article. I suspect that this information could be used to estimate topical relevance for each document.
Just from Wikimedia referers, or from the whole web? How does knowing the page from which a reader reached Wikipedia help to estimate the document's relevance?
Access to this information poses no risk to users' privacy since no user information is made available - sessions' id, hour/minute timestamp data and IPs could be easily discarded.
What if your referer were your personal Facebook page, leaking your full real name? It may be possible to anonymize referers properly, but it's not trivial.
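
To make the concern concrete, here is a minimal sketch of one possible coarsening step: keep only the hostname of each referer and drop the path and query string, which is where a profile name would typically appear. The function name coarsen_referer and the sample URLs are hypothetical, and this is not a description of anything Wikimedia actually does. Note that a hostname alone can still identify someone (a personal domain, for instance), so this is only a first step and not proper anonymization.

```python
from urllib.parse import urlsplit


def coarsen_referer(referer):
    """Return only the lowercased hostname of a referer URL.

    Hypothetical helper for illustration. Returns None when the
    value has no hostname or cannot be parsed at all.
    """
    try:
        # .hostname drops the scheme, port, credentials, path and query.
        return urlsplit(referer).hostname
    except ValueError:
        # urlsplit raises ValueError on some malformed inputs,
        # e.g. an unterminated IPv6 bracket.
        return None


if __name__ == "__main__":
    samples = [
        "https://www.facebook.com/jane.doe?ref=profile",  # made-up URL
        "",
        "http://[bad",
    ]
    for sample in samples:
        print(repr(sample), "->", coarsen_referer(sample))
```

Even after this step, rare hostnames combined with article names and timestamps may be enough to single out a reader, which is why a sampled and aggregated feed is the safer thing to ask for.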