Emily, I believe the pagecount data was never collected in a structured way
before 2007. See for example this discussion about some archive data that
took some pains to uncover:
https://phabricator.wikimedia.org/T232563
If edits per article would work as a proxy for attention, on their own or
combined with views to extrapolate from, we are in the process of vetting
and releasing a simple full history of editing on all wikis:
https://dumps.wikimedia.org/other/mediawiki_history/readme.html
On Thu, Jan 16, 2020 at 7:52 AM Ryan Kaldari <rkaldari(a)wikimedia.org> wrote:
Note that the definition of pageviews has changed several times over the
years. Only the data from 2015 to present is strictly comparable. I'm sure
some data analysts will chime in with more details. Good luck with your
project!
On Jan 15, 2020, at 6:59 PM, Emily Chen <echen920(a)usc.edu> wrote:
Hi,
My name is Emily Chen and I'm a Computer Science Ph.D. student at the
University of Southern California. I tried sending this email earlier
before I had joined the mailer, so apologies if this email was sent out
twice! I'm currently conducting research on collective attention decay in
Wikipedia articles that are more heavily cited by other Wikipedia articles
within the Wikipedia ecosystem. This work builds upon the observations made
in Candia et al.'s paper on "The universal decay of collective memory and
attention <https://www.nature.com/articles/s41562-018-0474-5/>",
and I have been using the number of page views articles receive as a proxy
for attention.
From what I can find, there is a maintained page view data set on
dumps.wikimedia.org <https://dumps.wikimedia.org> that spans 2011 to the
present, and statistics that Domas Mituzas collected from 2007 to 2016.
This data seems to capture the gradual decay in an
individual article's pageviews, but doesn't capture the initial growth of
an article's page views. Would you happen to know if there are article page
view statistics from the earlier years of Wikipedia (2001-2007) or if there
are any general page view statistics from that time frame? Or would you
happen to know who I could contact for such a dataset? It would be really
interesting to study the temporal page view dynamics over Wikipedia's
lifespan alongside my current work in collective attention.
Thank you so much for your time!
Best,
Emily Chen
--
Emily Chen (echen920 [at] usc [dot] edu)
Ph.D. Student | Computer Science
Viterbi School of Engineering & Information Sciences Institute
University of Southern California
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics