Hi!
> will still return the same results, wouldn't it make more sense to
> teach the stats logger to ignore both? Or is there a reason that we
> actually want to track one and not the other?
Pretty URLs are for being pretty URLs (e.g. in your address bar). That
leads to a very easy assumption: if there's a pretty URL, it
probably indicates a pageview :-) We quite like other pretty URLs for
Special pages, e.g. Watchlist or Recentchanges, since we do track their
accesses.
> It seems like an awful lot of trouble to teach every software author
> that they need to follow a particular convention just so the stats
> engine will work as intended. It would seem much simpler to teach
> the stats engine to detect and ignore this special case. Or is there
> a reason that doing so is not possible?
Heh, apparently stats have become a big deal lately, so anyone with the
power to change them can feel important! ;-)
Anyway, there are a few choices for resolving it on the stats side:
1) Implement pulling of a namespace map for each project, and build
an efficient rules engine (in C) to deal with this (do note, every
project will have a different namespace name for this URL). Also, make
it extensible, so each developer can declare which names are
not-a-pageview ;-) There's nothing as fun as writing that kind of
code, and do note, it won't be just five (or fifty) lines.
2) Add an additional internal header (X-Pageview: true!) that would be
logged by the squids inside the stream :) That probably calls for a
large review inside MediaWiki, as well as squid code changes (and, of
course, a rollout of the new binary). It would be a nice inter-group
effort.
3) Don't care about the inflated per-project numbers, or have people
adjust the numbers themselves, as the source data is there (they can
filter out the banner loader themselves!)
You can pick any of these; just make sure it gets into the strategy
plan, as we don't decide things on wikitech-l anymore :)
I prefer, hehehe, doing nothing and keeping pretty URLs just for
pageviews ;-)
Domas