On Fri, Sep 18, 2009 at 10:02 AM, Platonides <Platonides(a)gmail.com> wrote:
Erik Zachte wrote:
Sure, info gets lost. And the Long Tail is meaningful for some research,
no doubt.
But my resources are finite.
Actually, I do store some all-inclusive counts in the compacted 24 hr file:
# Lines starting with at sign (@) show totals per 'namespace' (including omitted counts for low traffic articles)
# Since valid namespace strings are not known in the compression script any string followed by colon (:) counts as possible namespace string
# Please reconcile with real namespace name strings later
# 'namespaces' with count < 5 are combined in 'Other' (on larger wikis these are surely false positives)
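
For readers following along, here is a minimal Python sketch of the heuristic those header comments describe. It is not Erik's actual script: the function name, the (title, count) input shape, and the "(main)" bucket for titles without a colon are all assumptions made for illustration. Only the "anything before a colon is a candidate namespace" rule and the "count < 5 goes to 'Other'" rule come from the comments above.

```python
from collections import Counter


def namespace_totals(page_counts, min_count=5):
    """Sum counts per candidate namespace (hypothetical helper).

    page_counts: iterable of (title, count) pairs; this input shape is
    an assumption, not the format of the real compacted file.
    """
    totals = Counter()
    for title, count in page_counts:
        prefix, sep, _ = title.partition(":")
        # Any non-empty string followed by a colon counts as a possible
        # namespace; titles without one go to an assumed "(main)" bucket.
        key = prefix if sep and prefix else "(main)"
        totals[key] += count

    merged = Counter()
    for ns, total in totals.items():
        # Low-count candidates are most likely false positives, so they
        # are combined under 'Other'.
        if ns != "(main)" and total < min_count:
            merged["Other"] += total
        else:
            merged[ns] += total
    return dict(merged)


sample = [
    ("Talk:Foo", 7),
    ("Talk:Bar", 3),
    ("Star_Wars:_A_New_Hope", 2),  # colon in a title, not a real namespace
    ("Main_Page", 10),
]
totals = namespace_totals(sample)
```

The false positive "Star_Wars" ends up in 'Other' here only because its count is small; on a high-traffic title it would survive as a bogus namespace, which is why the comments ask for reconciliation against real namespace names later.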
Making the script aware of namespace names would be quite easy.
For English this is obviously true, but Erik writes scripts intended
to be language agnostic and work with all WMF projects. While it is
certainly possible to teach it about namespaces in the general sense,
it would take quite a bit of effort to call up the local namespace
names and all legitimate variants for every different project/language
in turn.
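
To make the per-project lookup concrete, here is a rough Python sketch of the reconciliation step. It assumes the names come from each wiki's MediaWiki API (action=query, meta=siteinfo, siprop=namespaces|namespacealiases) in the legacy JSON layout, where the local name sits under the "*" key. The function name and the sample response are hypothetical, and the space-to-underscore conversion assumes the count files use URL-style titles. It only parses an already-fetched response; repeating the fetch for every project and language is the effort described above.

```python
def known_namespace_names(siteinfo):
    """Collect local, canonical and alias namespace names (hypothetical helper).

    siteinfo: a decoded siteinfo response in the assumed legacy JSON layout.
    """
    query = siteinfo.get("query", {})
    names = set()
    for ns in query.get("namespaces", {}).values():
        # "*" holds the local name, "canonical" the English default.
        for key in ("*", "canonical"):
            value = ns.get(key)
            if value:  # the main namespace has an empty name
                names.add(value.replace(" ", "_"))
    for alias in query.get("namespacealiases", []):
        value = alias.get("*")
        if value:
            names.add(value.replace(" ", "_"))
    return names


# Hypothetical, heavily trimmed response for a German-language wiki.
sample_siteinfo = {
    "query": {
        "namespaces": {
            "0": {"id": 0, "*": ""},
            "1": {"id": 1, "*": "Diskussion", "canonical": "Talk"},
            "3": {"id": 3, "*": "Benutzer Diskussion", "canonical": "User talk"},
        },
        "namespacealiases": [{"id": 4, "*": "WP"}],
    }
}
names = known_namespace_names(sample_siteinfo)
```

A candidate prefix from the counts file could then be accepted only if it appears in this set. Case and first-letter normalization are left out of the sketch.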
-Robert Rohde