Sure, information gets lost, and the Long Tail is no doubt meaningful for
some research.
But my resources are finite.
Actually, I do store some all-inclusive counts in the compacted 24-hour file:
# Lines starting with an at sign (@) show totals per 'namespace' (including
# omitted counts for low-traffic articles)
# Since valid namespace strings are not known in the compression script, any
# string followed by a colon (:) counts as a possible namespace string
# Please reconcile with real namespace name strings later
# 'namespaces' with count < 5 are combined in 'Other' (on larger wikis these
# are surely false positives)
@ aa.z Category 9
@ aa.z File 20
@ aa.z Image 9
@ aa.z MediaWiki 20
@ aa.z NamespaceArticles 163
@ aa.z Special 97
@ aa.z Talk 17
@ aa.z User 35
@ aa.z Wikipedia 16
@ aa.z -other- 11
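
For anyone who wants to read these totals back, here is a minimal sketch in Python. It is not the script that produces the file. It assumes each '@' line is whitespace-separated as marker, project code, namespace, count, as in the sample above; the function name `parse_namespace_totals` is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def parse_namespace_totals(lines: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Collect per-namespace totals from '@' lines.

    Assumed layout (inferred from the sample, not from a spec):
        @ <project> <namespace> <count>
    Returns a dict keyed by (project, namespace).
    """
    totals: Dict[Tuple[str, str], int] = {}
    for line in lines:
        if not line.startswith("@"):
            continue  # skip '#' comments and per-article lines
        parts = line.split()
        if len(parts) < 4:
            continue  # too few fields for the assumed layout
        project = parts[1]
        namespace = " ".join(parts[2:-1])  # tolerate a namespace containing spaces
        try:
            totals[(project, namespace)] = int(parts[-1])
        except ValueError:
            continue  # last field is not a number, ignore the line
    return totals


sample = [
    "# Lines starting with an at sign (@) show totals per 'namespace'",
    "@ aa.z Category 9",
    "@ aa.z NamespaceArticles 163",
    "@ aa.z Special 97",
    "@ aa.z -other- 11",
]

totals = parse_namespace_totals(sample)
for (project, namespace), count in sorted(totals.items()):
    print(project, namespace, count)
```

Reconciling the parsed strings with real namespace names, as the file header asks, would still be a separate step.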
Erik Zachte
-----Original Message-----
From: wikitech-l-bounces(a)lists.wikimedia.org
[mailto:wikitech-l-bounces(a)lists.wikimedia.org] On Behalf Of Robert Rohde
Sent: Friday, September 18, 2009 02:33
To: Wikimedia developers
Cc: Mathias Schindler; Frédéric Schütz;
toolserver-l(a)lists.wikimedia.org
Subject: Re: [Wikitech-l] [Toolserver-l] Archive of visitor stats
2009/9/17 Erik Zachte <erikzachte(a)infodisiac.com>:
I think it is extremely important to keep these files for later analysis
by historians and others.
Mathias Schindler also keeps an archive, or at least did until April
(Berlin conference).
He even bought a dedicated external drive for it.
I collect files daily and merge 24 hourly files into one daily file.
That saves a lot of disk space and makes processing faster.
Titles with fewer than 10 requests per day are discarded, which also
saves a lot.
Careful, a recent analysis I did suggested that 15% of all page
requests for articles on Wikipedia are for topics requested less than
once per hour. There are a very large number of pages that rarely see
hits, but collectively the traffic to such topics is important. You
could end up biasing certain kinds of analysis if you always exclude
the rarely visited pages.
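
To make the trade-off concrete, here is a rough sketch in Python of the daily merge with a cutoff, which also reports how many requests the cutoff drops. It is an illustration only, not the actual compression script: the function name `merge_hourly`, the dict-per-hour input and the sample numbers are all made up for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def merge_hourly(
    hourly: Iterable[Dict[str, int]], min_daily: int = 10
) -> Tuple[Dict[str, int], int, int]:
    """Sum hourly per-title counts into one daily count per title.

    Titles whose daily total is below `min_daily` are dropped from the
    result, but their requests are still counted, so the caller can see
    how much traffic the cutoff removes.
    Returns (kept_counts, total_requests, dropped_requests).
    """
    daily: Counter = Counter()
    for counts in hourly:
        daily.update(counts)  # adds counts title by title
    total = sum(daily.values())
    kept = {title: n for title, n in daily.items() if n >= min_daily}
    dropped = total - sum(kept.values())
    return kept, total, dropped


# Invented sample: three "hours" instead of 24, to keep it short.
hours = [
    {"Main_Page": 40, "Rare_A": 1},
    {"Main_Page": 35, "Rare_B": 2},
    {"Main_Page": 20, "Rare_A": 2},
]

kept, total, dropped = merge_hourly(hours, min_daily=10)
print(kept)
print(f"dropped {dropped} of {total} requests ({dropped / total:.1%})")
```

Storing the dropped total next to the kept titles keeps the disk savings while letting later analysis correct for the missing long tail.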
-Robert Rohde