Erik Zachte wrote:
Sure, info gets lost, and the Long Tail is no doubt meaningful for some research. But my resources are finite.
Actually, I do store some all-inclusive counts in the compacted 24-hour file:
# Lines starting with ampersand (@) show totals per 'namespace' (including omitted counts for low traffic articles)
# Since valid namespace string are not known in the compression script any string followed by colon (:) counts as possible namespace string
# Please reconcile with real namespace name strings later
# 'namespaces' with count < 5 are combined in 'Other' (on larger wikis these are surely false positives)
Making the script aware of namespace names would be quite easy.
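To illustrate what that could look like, here is a rough sketch in Python. It is not the actual compression script: the function name, the `(title, count)` input shape, the `(main)` label and the short namespace list are all assumptions made for the example. A real version would load the full namespace list for each wiki.

```python
from collections import Counter

# Assumed, incomplete sample of namespace names; a real script would
# load the complete list for the wiki being processed.
KNOWN_NAMESPACES = {"Talk", "User", "User_talk", "File", "Help", "Category", "Template"}


def namespace_totals(page_counts, known=KNOWN_NAMESPACES):
    """Sum view counts per namespace for (title, count) pairs.

    A prefix before the first colon only counts as a namespace if it is
    in `known`; everything else is attributed to the main namespace.
    """
    totals = Counter()
    for title, count in page_counts:
        prefix, sep, _rest = title.partition(":")
        namespace = prefix if sep and prefix in known else "(main)"
        totals[namespace] += count
    return dict(totals)


sample = [
    ("Talk:Foo", 3),
    ("Star_Trek:_First_Contact", 10),  # colon in an article title, not a namespace
    ("Foo", 2),
    ("User:Bar", 1),
    ("Talk:Baz", 4),
]
print(namespace_totals(sample))
```

With a known list, article titles that merely contain a colon end up in the main namespace instead of showing up as false-positive 'namespaces', so the 'Other' bucket would no longer be needed for that purpose.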