I'm also interested....but I'm concerned that a vandal could find articles
that are never looked at and target those. RC patrol can't get everything.
On 7/25/06, Erik Garrison <erik.garrison(a)gmail.com> wrote:
On Thu, Jul 06, 2006 at 07:58:45AM +0200, Steve Bennett wrote:
On 7/6/06, Abigail Brady
<morwen(a)evilmagic.org> wrote:
And then the way to stop this is to abstract the
logs for the traffic
we want, and throw the raw logs away as quickly as possible.
Something at the level of data that Google Trends can provide to
the public is basically the type of thing we'd want to have (broad
numbers on only the most popular search terms/pages).
Yep. If the biggest concerns are disk space and privacy, then the
answer is obviously to collect logs for short periods of time that
look vaguely like:
[[George W Bush]] 130.158.1.4 1/4/2006 12:00
[[Bill Clinton]] 130.158.1.4 1/4/2006 12:01
[[George W Bush]] 200.0.0.4 1/4/2006 12:01
then every few hours or even minutes reprocess them into this sort of
format:
[[George W Bush]] 2 1/4/2006
[[Bill Clinton]] 1 1/4/2006
and discard the original log files. That means less disk space (entries
that receive fewer than N hits could even be discarded altogether from
the aggregate log) and no privacy concerns.
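The reprocessing step described above could be sketched roughly like this
(a minimal illustration, not anything actually deployed; the log fields and
the MIN_HITS threshold are assumptions based on the example entries):

```python
from collections import Counter

# Hypothetical raw log entries: (page title, client IP, date).
# In practice these would be parsed from the short-lived raw log files.
raw_log = [
    ("[[George W Bush]]", "130.158.1.4", "1/4/2006"),
    ("[[Bill Clinton]]", "130.158.1.4", "1/4/2006"),
    ("[[George W Bush]]", "200.0.0.4", "1/4/2006"),
]

MIN_HITS = 1  # entries with fewer than N hits could be dropped entirely


def aggregate(entries, min_hits=MIN_HITS):
    """Collapse (page, ip, date) entries into (page, date) -> hit count,
    discarding the IP addresses so no per-user data survives."""
    counts = Counter((page, date) for page, _ip, date in entries)
    return {key: n for key, n in counts.items() if n >= min_hits}


print(aggregate(raw_log))
```

After this runs, the raw entries (and with them the IPs) can be deleted,
leaving only the per-page, per-day totals.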
I understand if there's no one to actually implement this at the moment
though.
I'd like to revive this discussion.
I have time (several full workweeks, if needed) to implement this at the
moment. Is there anyone else who would be interested, would be capable of
helping, or would be capable of authorizing the work?
-Erik Garrison
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)wikimedia.org
http://mail.wikipedia.org/mailman/listinfo/wikitech-l