Erik Zachte wrote:
Sure, info gets lost. And the Long Tail is meaningful for some research no doubt.
But my resources are finite.
Actually I do store some all-inclusive counts in the compacted 24 hr file:
# Lines starting with at sign (@) show totals per 'namespace' (including omitted counts for low traffic articles)
# Since valid namespace strings are not known in the compression script any string followed by colon (:) counts as possible namespace string
# Please reconcile with real namespace name strings later
# 'namespaces' with count < 5 are combined in 'Other' (on larger wikis these are surely false positives)
Making the script aware of namespace names would be quite easy.
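
A rough sketch of the counting rule those header comments describe, and of what namespace awareness could look like. This is Python for illustration only, not the actual compression script. The function name, the (title, count) input shape, the optional known_namespaces set and the choice to file titles without a colon under an empty-string key are all assumptions.

```python
from collections import Counter

def namespace_totals(page_counts, min_count=5, known_namespaces=None):
    """Sum page counts per candidate namespace prefix.

    page_counts: iterable of (title, count) pairs (hypothetical input shape).
    known_namespaces: optional set of real namespace names; when given,
    any other colon prefix is treated as part of an ordinary title.
    """
    totals = Counter()
    for title, count in page_counts:
        prefix, sep, _ = title.partition(":")
        if not sep:
            prefix = ""  # no colon at all: assumed main namespace
        elif known_namespaces is not None and prefix not in known_namespaces:
            prefix = ""  # colon belongs to the title, not to a namespace
        totals[prefix] += count

    # Fold rarely seen prefixes into 'Other', as the header comment describes.
    # Keeping the main namespace out of 'Other' is an assumption.
    merged = Counter()
    for prefix, total in totals.items():
        key = prefix if prefix == "" or total >= min_count else "Other"
        merged[key] += total
    return dict(merged)

sample = [
    ("Talk:Foo", 3),
    ("Talk:Bar", 4),
    ("Foo", 10),
    ("Star Wars: Episode I", 2),  # false positive: 'Star Wars' is no namespace
    ("User:X", 1),
]
naive = namespace_totals(sample)
aware = namespace_totals(sample, known_namespaces={"Talk", "User"})
```

Without a namespace list the 'Star Wars' prefix ends up in 'Other'. With one, its count goes to the main namespace instead, which is the reconciliation step the header comment asks for.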