--- On Tue, 13/4/10, Martin Hellberg Olsson <martin.hellbergolsson(a)ugent.be> wrote:
From: Martin Hellberg Olsson <martin.hellbergolsson(a)ugent.be>
Subject: Re: [Wiki-research-l] Access to HTTP access logs for Wikipedia articles?
To: wiki-research-l(a)lists.wikimedia.org
Date: Tuesday, 13 April 2010 23:33
Felipe: You're mostly right, and I wouldn't expect to know more about this than you, but it's not only the main namespace. You can do things like:
On the contrary, thanks for pointing this out. I was only guessing from what I had seen in the content of those files, and I only had a quick look, so you're most probably right about that :-).
http://stats.grok.se/en/201003/Wikipedia%3AAbout
So it covers all wiki pages, but probably not page histories and similar things.
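The stats.grok.se link above appears to follow a `/<language>/<YYYYMM>/<title>` pattern, with the colon in "Wikipedia:About" percent-encoded as %3A. A minimal Python sketch for building such links for pages in any namespace, assuming that pattern (inferred from this single example, not from any documentation) holds in general; `grok_url` is a hypothetical helper:

```python
from urllib.parse import quote

def grok_url(lang, year, month, title):
    """Build a monthly stats.grok.se URL for one page (assumed URL pattern)."""
    # Spaces become underscores, as in MediaWiki page URLs; quote() with
    # safe="" also encodes ":" as %3A, matching the example link.
    encoded = quote(title.replace(" ", "_"), safe="")
    return "http://stats.grok.se/%s/%04d%02d/%s" % (lang, year, month, encoded)

print(grok_url("en", 2010, 3, "Wikipedia:About"))
# → http://stats.grok.se/en/201003/Wikipedia%3AAbout
```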
Also, I believed it was calculated on some kind of "full" data, while most of what had been discussed here was based on samples such as 1/100ths. Looking at the about page again, though, I'm not sure this was right.
Still, if Domas doesn't read this list (I have no idea), could it hurt to contact him?
Definitely. It would be great if the data were already available. Many people are interested in traffic information, and the more analyses we get, the more we'll eventually learn about Wikipedia's dynamics.
Domas is always quite busy, but it'd be great to know more about this.
Best,
Felipe.
Kind regards,
Martin
Quoting "Felipe Ortega" <glimmer_phoenix(a)yahoo.es>:
--- On Tue, 13/4/10, Martin Hellberg Olsson <Martin.HellbergOlsson(a)UGent.be> wrote:
From: Martin Hellberg Olsson <Martin.HellbergOlsson(a)UGent.be>
Subject: Re: [Wiki-research-l] Access to HTTP access logs for Wikipedia articles?
To: "Research into Wikimedia content and communities" <wiki-research-l(a)lists.wikimedia.org>
Date: Tuesday, 13 April 2010 19:14
A reply to the whole discussion, and to at least one other recent one, rather than to this last question.
This probably doesn't cover everything the people asking need, but it should be relevant. I'm a bit surprised it hasn't been mentioned yet; at the very least, the people behind these sites should be able to advise, even if the online data isn't usable.
Sorry, correct me if I'm wrong, but I think that Domas' dumps only contain information about articles (that is, pages in the main namespace) and a summary count of hits for each page visited (so you can tell which article is the most visited).
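If the dumps are per-page summary counts, finding the most visited article comes down to summing and sorting. The Python sketch below assumes each line of a dump file has four whitespace-separated fields (project code, page title, request count, bytes transferred), e.g. `en Main_Page 500 1000`; that layout is an assumption here and should be checked against the files at dammit.lt/wikistats before relying on it. `top_articles` is a hypothetical helper:

```python
from collections import Counter

def top_articles(lines, project="en", n=3):
    """Sum request counts per title for one project; return the n largest."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        proj, title, count, _bytes = fields
        if proj == project:
            totals[title] += int(count)
    return totals.most_common(n)

sample = ["en Main_Page 500 1000", "en Foo 20 300", "de Hauptseite 900 100", "en Foo 5 10"]
print(top_articles(sample, "en", 2))  # → [('Main_Page', 500), ('Foo', 25)]
```

Summing rather than taking the first value matters if the same title appears on several lines or across several files.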
We receive raw data for all namespaces of all Wikimedia projects, so you can extract more information by parsing the URLs (for example, the different actions requested: view, preview, save...).
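Parsing the URLs in the raw data could look roughly like the Python sketch below. It assumes the logged URLs take the two common MediaWiki forms, `/wiki/<Title>` and `/w/index.php?title=<Title>&action=<action>`, and uses a small hypothetical subset of English Wikipedia namespace names. A missing `action` parameter is treated as a plain view, which is MediaWiki's default. Telling a preview from a save probably needs more than the URL (both are typically sent as form submissions), so that distinction is left out here:

```python
from urllib.parse import urlsplit, parse_qs, unquote

# Hypothetical subset of English Wikipedia namespace prefixes; a real
# analysis would use the full list for each project.
NAMESPACES = {"Talk", "User", "User talk", "Wikipedia", "Wikipedia talk",
              "File", "Template", "Help", "Category", "Portal", "Special"}

def parse_request(url):
    """Return (host, namespace, title, action) for an assumed MediaWiki URL."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if parts.path.startswith("/wiki/"):
        title = unquote(parts.path[len("/wiki/"):])
    else:
        # parse_qs has already percent-decoded the query values.
        title = query.get("title", [""])[0]
    title = title.replace("_", " ")
    namespace = "(Main)"
    prefix, sep, rest = title.partition(":")
    if sep and prefix in NAMESPACES:
        namespace, title = prefix, rest
    # A missing action parameter means a plain page view.
    action = query.get("action", ["view"])[0]
    return parts.hostname, namespace, title, action

print(parse_request("http://en.wikipedia.org/w/index.php?title=Talk:Foo_bar&action=edit"))
# → ('en.wikipedia.org', 'Talk', 'Foo bar', 'edit')
```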
Best,
Felipe.
Wikipedia article traffic statistics:
http://stats.grok.se/
is a "mere visualizer" for the raw data available
here:
http://dammit.lt/wikistats/
as stated in the visualizer's FAQ:
http://stats.grok.se/about
"Domas Mituzas put together a system to gather access statistics from wikipedia's squid cluster and publishes it here. This site is a mere visualizer of that data."
Very happy if this can be of any help!
Martin
S. Nunes wrote:
Thanks for the quick feedback.
Can you tell me to whom this 'direct request' should be addressed?
A 1/100 sample or similar would be great. Is referral data included in this sample?
Regards,
--
Sérgio Nunes
On 13 April 2010 16:09, Felipe Ortega <glimmer_phoenix(a)yahoo.es> wrote:
Hi Sérgio,
Some universities (like ours) receive a 1/100 sample of the whole set of requests processed by the Wikimedia Squid servers. It is provided on direct request, however. As far as I know, the data is not consistently archived in a public repository anywhere (but I may be unaware of some system storing that info).
Some work has already been published on this topic:
* A. J. Reinoso, J. M. Gonzalez-Barahona, G. Robles, and F. Ortega, "A quantitative approach to the use of the Wikipedia," in 2009 IEEE Symposium on Computers and Communications. IEEE, July 2009, pp. 56-61. [Online]. Available: http://dx.doi.org/10.1109/ISCC.2009.5202401
Regards,
Felipe.
--- On Tue, 13/4/10, S. Nunes <snunes(a)gmail.com> wrote:
From: S. Nunes <snunes(a)gmail.com>
Subject: [Wiki-research-l] Access to HTTP access logs for Wikipedia articles?
To: "Wikipedia Research List" <wiki-research-l(a)lists.wikimedia.org>
Date: Tuesday, 13 April 2010 13:23
Hi all,
I presume that Wikipedia keeps data about HTTP accesses to all articles.
Can anybody tell me whether this data is available for research purposes?
I am particularly interested in HTTP referral information for each article. I suspect that this information could be used to estimate topical relevance for each document. Access to this information poses no risk to users' privacy, since no user information would need to be made available: session IDs, hour/minute timestamps and IPs could easily be discarded.
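As a rough illustration of that point, each record could be stripped of the identifying fields before any analysis, leaving only the article and referrer needed to count where traffic comes from. The field names (`ip`, `session_id`, `timestamp`, `article`, `referrer`) are hypothetical, since the actual log layout isn't described in this thread, and this sketch drops timestamps entirely rather than truncating them:

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical field names; the real log layout is not given here.
PRIVATE_FIELDS = ("ip", "session_id", "timestamp")

def anonymise(record):
    """Return a copy of a log record without the identifying fields."""
    return {k: v for k, v in record.items() if k not in PRIVATE_FIELDS}

def referrer_domains(records):
    """Count (article, referring host) pairs over anonymised records."""
    counts = Counter()
    for rec in map(anonymise, records):
        # Requests without a referrer are grouped under "(none)".
        host = urlsplit(rec.get("referrer", "")).hostname or "(none)"
        counts[(rec["article"], host)] += 1
    return counts
```

Grouping by referring host rather than full referrer URL keeps the counts small; keeping search query strings instead would be a separate design choice with its own privacy trade-offs.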
I am new to this list, so I really don't know if this has been discussed previously. I searched the archives and found no relevant results on this issue.
Thanks in advance for your feedback,
--
Sérgio Nunes
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
--
Assistant
Vakgroep Scandinavistiek en Noordeuropakunde
Universiteit Gent
Rozier 44
B-9000 Gent
+32 9 264 38 04
martin.hellbergolsson(a)ugent.be