On Tue, Aug 15, 2006 at 03:50:25PM +0200, Jan Kulveit wrote:
On Tue, Aug 15, 2006 at 02:04:37PM +0200, Jens Frank wrote:
Then there must be some flaw in the implementation. Otherwise the method (running client-side JavaScript that contacts a statistics server) is the industry standard (e.g. Google Analytics works this way).
Moreover, it could easily be done with modest hardware and existing statistics software. It's unnecessary to process the whole huge dataset; it would be enough to process a random sample, which can easily be done in JS: the JS code would contact the statistics server only if, for example,
$random>0.999
i.e. for roughly 1 page view in 1000.
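A minimal Python sketch of the sampling idea, assuming a 1-in-1000 rate (drawing an integer in [0, 1000) and keeping only 0 is equivalent to the `$random>0.999` test). The names `simulate_sampled_beacons` and `estimate_page_views` are hypothetical; in a real deployment the check would run as JavaScript in the browser, and Python is used here only to show the statistics.

```python
import random

ONE_IN = 1000  # assumed sample rate: roughly 1 page view in 1000 sends a beacon

def simulate_sampled_beacons(page_views, one_in=ONE_IN, seed=None):
    """Count the beacons a statistics server would receive if each
    page view independently reports with probability 1/one_in."""
    rng = random.Random(seed)
    # Each page view draws a number in [0, one_in); only 0 triggers a beacon.
    return sum(1 for _ in range(page_views) if rng.randrange(one_in) == 0)

def estimate_page_views(beacons, one_in=ONE_IN):
    # Scale the sample back up: each received beacon stands for one_in views.
    return beacons * one_in

beacons = simulate_sampled_beacons(1_000_000, seed=42)
estimate = estimate_page_views(beacons)
print(beacons, estimate)
```

For a million simulated views the beacon count lands close to 1,000, so the statistics server only has to handle about a thousandth of the traffic while the scaled estimate stays near the true total.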
That's exactly how the JS for the German wiki works, and that's also why it doesn't work. With only a few requests you can fake the statistics, since every hit on the statistics server counts as 1000 page views.
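The weakness follows directly from the scaling step. In this sketch (same assumed 1-in-1000 rate; `estimated_views` and the hit counts are hypothetical), requests sent straight to the statistics server skip the random check, yet each one is still multiplied by 1000:

```python
ONE_IN = 1000  # assumed sample rate: each recorded hit stands for 1000 views

def estimated_views(recorded_hits, one_in=ONE_IN):
    """Scale recorded hits back up to an estimated page-view count."""
    return recorded_hits * one_in

genuine_hits = 3       # hypothetical: a quiet page's honest sampled hits
forged_requests = 50   # hypothetical: requests sent directly, bypassing the check

print(estimated_views(genuine_hits))                    # 3000
print(estimated_views(genuine_hits + forged_requests))  # 53000
```

Fifty hand-made requests add 50,000 "page views", which is why a handful of requests is enough to distort the numbers.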
Hmm... You'd have to deliberately hack the JavaScript; otherwise those "few requests" would have only a slim chance of being the 1 in 1000 that gets posted. So that must mean someone is maliciously messing with the statistics?

Okay, so instead, at only slightly higher cost: have every page view hit the counter, but let the counter do the randomization check, and only if it's the 1 in 1000 run the code that posts to the database. The counter script that runs on every request could be very, very slim: just a random number generator and an if statement, with an include only when the condition is met (so the rest of the code isn't even loaded into memory). You could also send back no response body, which saves some bandwidth too (doesn't it?)... So the server would get a zillion GET requests, but it wouldn't have to do much to handle each of them.
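A minimal sketch of this server-side variant, written as a Python WSGI app rather than the PHP counter script the thread implies; `make_counter_app` and the `record_hit` callback (standing in for the expensive database write) are hypothetical names. Every request costs one random draw and one comparison; only the sampled 1 in 1000 calls `record_hit`, and every response is an empty `204 No Content`.

```python
import random

ONE_IN = 1000  # record roughly 1 request in 1000, as proposed in the thread

def make_counter_app(record_hit, rng=None, one_in=ONE_IN):
    """Build a WSGI app that every page view hits.

    record_hit is a hypothetical callback standing in for the expensive
    database write; it runs only for the sampled 1-in-one_in requests.
    """
    rng = rng or random.Random()

    def app(environ, start_response):
        # Cheap path run on every request: one random draw, one comparison.
        if rng.randrange(one_in) == 0:
            # Only the sampled request pays for the heavy work; in PHP this
            # is where the conditional include would go.
            record_hit(environ.get("PATH_INFO", ""))
        # 204 No Content: there is no response body to send back.
        start_response("204 No Content", [])
        return []

    return app
```

The empty body saves bandwidth, but each request still costs a full HTTP round trip and connection handling on the server, so the per-request work shrinks while the request count does not.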
Also... unless they're determined enough to use a proxy to spoof the statistics, you could look for an unusual number of hits from specific IPs...
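A minimal sketch of the per-IP check, assuming the server logs the client IP of each recorded hit; `suspicious_ips` and the `max_hits_per_ip` threshold are hypothetical, and the sample addresses come from the documentation ranges.

```python
from collections import Counter

def suspicious_ips(hit_ips, max_hits_per_ip):
    """Return IPs whose hit count exceeds max_hits_per_ip, sorted.

    max_hits_per_ip is a hypothetical threshold; choosing it well (and
    allowing for many users behind one shared proxy or NAT) is the hard part.
    """
    counts = Counter(hit_ips)
    return sorted(ip for ip, n in counts.items() if n > max_hits_per_ip)

hits = ["192.0.2.10"] * 40 + ["198.51.100.7", "203.0.113.5", "198.51.100.7"]
print(suspicious_ips(hits, max_hits_per_ip=5))  # ['192.0.2.10']
```

As the message notes, this only catches a naive attacker; requests spread across many proxies stay under any per-IP threshold.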
Aerik
"Aerik Sylvan" wrote:
Okay, so instead, and only slightly more expensive, go ahead and hit the counter, but the counter does the randomization check - if it's the 1 in 1000, run the code that posts to the database.
But then we're back to the original problem: a server handling billions of connections dies.
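A back-of-the-envelope sketch of this objection, assuming a hypothetical round figure of one billion page views per day and the same 1-in-1000 rate: with server-side sampling every view still opens a connection, while client-side sampling only sends the sampled ones. `requests_per_second` is a hypothetical helper, and the results are daily averages; peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
ONE_IN = 1000                   # sample rate discussed in the thread

def requests_per_second(page_views_per_day, one_in=1):
    """Average request rate at the counter when 1 in one_in views report."""
    return page_views_per_day // one_in // SECONDS_PER_DAY

views = 1_000_000_000  # hypothetical: one billion page views per day
print(requests_per_second(views))          # 11574: server-side sampling, every view connects
print(requests_per_second(views, ONE_IN))  # 11: client-side sampling, 1 in 1000 connects
```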
wikitech-l@lists.wikimedia.org