On Fri, Sep 18, 2009 at 10:02 AM, Platonides Platonides@gmail.com wrote:
Erik Zachte wrote:
Sure, info gets lost. And the Long Tail is meaningful for some research no doubt. But my resources are finite.
Actually, I do store some all-inclusive counts in the compacted 24 hr file:
# Lines starting with at sign (@) show totals per 'namespace' (including omitted counts for low traffic articles)
# Since valid namespace strings are not known in the compression script any string followed by colon (:) counts as possible namespace string
# Please reconcile with real namespace name strings later
# 'namespaces' with count < 5 are combined in 'Other' (on larger wikis these are surely false positives)
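The tallying rule in those header comments can be sketched roughly as follows. This is only an illustration, not the actual compression script: the function names, the `(title, views)` input shape, the `@ name count` output layout, and the choice to skip titles without a colon are all assumptions made for the example.

```python
from collections import Counter


def namespace_totals(page_counts, min_count=5):
    """Tally views per candidate namespace and fold rare prefixes into 'Other'.

    page_counts: iterable of (title, views) pairs (assumed input shape).
    """
    totals = Counter()
    for title, views in page_counts:
        # Any string followed by a colon counts as a possible namespace.
        prefix, sep, _rest = title.partition(":")
        if sep:  # titles without a colon are skipped in this sketch
            totals[prefix] += views

    folded = Counter()
    for prefix, views in totals.items():
        # 'Namespaces' with a total below min_count are combined in 'Other'.
        folded[prefix if views >= min_count else "Other"] += views
    return folded


def format_totals(totals):
    """Render one '@' line per namespace (the exact layout is assumed)."""
    return [f"@ {name} {views}" for name, views in sorted(totals.items())]


sample = [
    ("Talk:Foo", 3),
    ("Talk:Bar", 4),
    ("User:X", 10),
    ("Star Wars: A New Hope", 2),  # false positive: not a real namespace
    ("Main_Page", 100),
]
lines = format_totals(namespace_totals(sample))
```

The "Star Wars" prefix shows why the counts need reconciling later: it is indistinguishable from a namespace until the real namespace names are known.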
Making the script aware of namespace names would be quite easy.
For English this is obviously true, but Erik writes scripts intended to be language-agnostic and to work with all WMF projects. While it is certainly possible to teach the script about namespaces in the general sense, it would take quite a bit of effort to call up the local namespace names and all legitimate variants for every project and language in turn.
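To make the reconciliation step concrete, here is a minimal sketch, assuming a per-wiki list of namespace names and aliases is already available. Obtaining that list for every project is the effortful part and is left open. The function name and the hand-written German examples are purely illustrative.

```python
def reconcile(totals, known_namespaces):
    """Split per-prefix totals into real namespaces and pooled false positives.

    totals: mapping of candidate prefix -> view count.
    known_namespaces: names and aliases valid on one specific wiki
    (hypothetical input; it has to be supplied per project and language).
    """

    def normalize(name):
        # Namespace names are matched case-insensitively here, with
        # spaces and underscores treated as equivalent.
        return name.casefold().replace(" ", "_")

    known = {normalize(name) for name in known_namespaces}
    real = {}
    false_positives = 0
    for prefix, views in totals.items():
        if normalize(prefix) in known:
            real[prefix] = real.get(prefix, 0) + views
        else:
            false_positives += views
    return real, false_positives


# Illustrative values only, not real traffic data.
dewiki_names = ["Benutzer", "Benutzer Diskussion", "Datei", "Bild"]
totals = {"Benutzer": 10, "benutzer_diskussion": 6, "Star Wars": 7}
real, false_positives = reconcile(totals, dewiki_names)
```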
-Robert Rohde