> From: David Gerard <dgerard@gmail.com>
> The MW file cache might be fast, but Squid is always going to be faster.
> We're quite fond of our "This article has been viewed x times" at the bottom of the pages, though I'm quite aware that's an expensive affectation that we may well have outgrown.
I think it's just a simple database query. At least I have a "mw_hitcounter" MEMORY table in my database. But maybe it's leftover cruft from past upgrades...
----------------
:::: Some of the current attempts at energy accounting, like the triple bottom line, are an absolute joke. They're an insult to children even in terms of their intellectual content, because they try and compare vague abstractions of social and environmental values -- just dot-pointed -- against a completely econometric financial accounting system of an organization which is actually doing the work. -- David Holmgren
:::: Jan Steinman, EcoReality Co-op ::::
On Mon, Oct 15, 2012 at 10:17 AM, Jan Steinman <Jan@bytesmiths.com> wrote:
>> From: David Gerard <dgerard@gmail.com>
>> The MW file cache might be fast, but Squid is always going to be faster.
>> We're quite fond of our "This article has been viewed x times" at the bottom of the pages, though I'm quite aware that's an expensive affectation that we may well have outgrown.
> I think it's just a simple database query. At least I have a "mw_hitcounter" MEMORY table in my database. But maybe it's leftover cruft from past upgrades...
There's a lot more to it than that, and it is indeed a performance drain (which is why we don't use it at WMF). Each pageview writes to the hitcounter table. Every so often the page table is updated with the hits from hitcounter.
Sure, displaying the hits is a simple query to the page table, but updating hitcounter each pageview just does not scale.
-Chad
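For readers following along, the write path Chad describes can be sketched like this. It is a minimal illustration, not MediaWiki's actual code: the table and column names (hitcounter, hc_id, page, page_counter) are borrowed from its schema, and SQLite stands in for MySQL's MEMORY engine so the snippet is self-contained.

```python
import sqlite3

# Sketch of the two-tier counter: every pageview does one cheap, blind
# INSERT into a small hit table; a periodic job folds the rows into
# page.page_counter and empties the hit table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY,"
           " page_counter INTEGER NOT NULL DEFAULT 0)")
db.execute("CREATE TABLE hitcounter (hc_id INTEGER NOT NULL)")  # one row per view
db.executemany("INSERT INTO page (page_id, page_counter) VALUES (?, 0)",
               [(1,), (2,)])

def record_hit(page_id):
    """The per-pageview hot path: a single INSERT, no read of `page`."""
    db.execute("INSERT INTO hitcounter (hc_id) VALUES (?)", (page_id,))

def flush_hits():
    """The periodic job: aggregate the hits into `page`, then reset."""
    db.execute("""
        UPDATE page SET page_counter = page_counter +
            (SELECT COUNT(*) FROM hitcounter WHERE hc_id = page.page_id)
        WHERE page_id IN (SELECT hc_id FROM hitcounter)
    """)
    db.execute("DELETE FROM hitcounter")

for pid in (1, 1, 2, 1):
    record_hit(pid)
flush_hits()
print(db.execute("SELECT page_id, page_counter FROM page ORDER BY page_id").fetchall())
# → [(1, 3), (2, 1)]
```

The point of the split is that only the periodic flush ever touches the contended `page` table; the per-request cost is one insert into a small in-memory table.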
Another way to look at it is how many page requests Squid/Varnish can eliminate before they reach Apache. On our MediaWiki, Squid has an average cache hit rate of 85%, which means the number of page requests hitting Apache is about 6.5x smaller. For small wikis this isn't a big deal, but as you scale up, reducing Apache requests by a factor of 6 is huge. Since RationalWiki appears to be kind of in the middle, you just have to ask yourself whether the page access numbers or the Apache load is more important.
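For what it's worth, the arithmetic behind that factor: with a cache hit rate h, the origin still sees a fraction (1 - h) of the traffic, so origin requests shrink by 1/(1 - h).

```python
def origin_reduction(hit_rate):
    """Factor by which origin (Apache) requests shrink at a given cache hit rate."""
    return 1.0 / (1.0 - hit_rate)

print(round(origin_reduction(0.85), 1))   # 85% hit rate -> ~6.7x fewer origin requests
print(round(origin_reduction(0.846), 1))  # the quoted 6.5x corresponds to roughly 84.6%
```

So the quoted 6.5x is consistent with an average hit rate just under 85%.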
On 15 October 2012 10:32, Chad <innocentkiller@gmail.com> wrote:
> There's a lot more to it than that, and it is indeed a performance drain (which is why we don't use it at WMF). Each pageview writes to the hitcounter table. Every so often the page table is updated with the hits from hitcounter.
> Sure, displaying the hits is a simple query to the page table, but updating hitcounter each pageview just does not scale.
> -Chad
MediaWiki-l mailing list
MediaWiki-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
On Tue, Oct 16, 2012 at 7:32 AM, Dave Humphrey <dave@uesp.net> wrote:
> Another way to look at it is how many page requests Squid/Varnish can eliminate before they reach Apache. On our MediaWiki, Squid has an average cache hit rate of 85%, which means the number of page requests hitting Apache is about 6.5x smaller. For small wikis this isn't a big deal, but as you scale up, reducing Apache requests by a factor of 6 is huge. Since RationalWiki appears to be kind of in the middle, you just have to ask yourself whether the page access numbers or the Apache load is more important.
Indeed. If you've got caches in front of Apache, you won't even increment the hit counter on many requests. That's why our reports are pulled from the Squid logs themselves.
-Chad
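Counting views from the cache's access log rather than the database could look something like this sketch. The four-field line layout here is an assumption made for illustration; real Squid/Varnish log formats carry more fields and vary by configuration.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical access-log lines: "<client> <status> <method> <url>".
# Real Squid logs have more fields; adjust the split accordingly.
log_lines = [
    "10.0.0.1 TCP_HIT/200 GET http://example.org/wiki/Main_Page",
    "10.0.0.2 TCP_MISS/200 GET http://example.org/wiki/Squid",
    "10.0.0.3 TCP_HIT/200 GET http://example.org/wiki/Main_Page",
]

views = Counter()
for line in log_lines:
    client, status, method, url = line.split()
    if method == "GET" and "/200" in status:  # count only successful page fetches
        views[urlsplit(url).path] += 1

print(views.most_common())  # → [('/wiki/Main_Page', 2), ('/wiki/Squid', 1)]
```

Because the cache logs every request, hits and misses alike, this counts views the backend never sees.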
On 15/10/12 16:32, Chad wrote:
> There's a lot more to it than that, and it is indeed a performance drain (which is why we don't use it at WMF). Each pageview writes to the hitcounter table. Every so often the page table is updated with the hits from hitcounter.
> Sure, displaying the hits is a simple query to the page table, but updating hitcounter each pageview just does not scale.
> -Chad
I think the problem was that writes to the page table locked `page` against reads, which in turn slowed everything down, since reads of `page` are ubiquitous. That is clearly a problem with MyISAM; I would expect InnoDB not to have as much trouble with the writes. Does anyone know?

Also, the page view counts could be moved to a separate table to free `page` from those locks. Incrementing a counter doesn't even need to take a lock; the cost is probably just the journaling, which is likely the reason we aggregate in a MEMORY table first. I think it could scale (as long as we are still on a single webserver), although I'm not good at determining where the bottleneck would lie.
Regards
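Platonides' point that an increment needs no application-level lock comes down to letting the database do the arithmetic in a single statement. A small illustration (SQLite here purely so the snippet is self-contained; the same contrast applies to MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY,"
             " page_counter INTEGER NOT NULL)")
conn.execute("INSERT INTO page VALUES (1, 0)")

# Racy pattern: read-modify-write. Two interleaved "clients" both read 0,
# then both write back 1, so one hit is lost.
a = conn.execute("SELECT page_counter FROM page WHERE page_id = 1").fetchone()[0]
b = conn.execute("SELECT page_counter FROM page WHERE page_id = 1").fetchone()[0]
conn.execute("UPDATE page SET page_counter = ? WHERE page_id = 1", (a + 1,))
conn.execute("UPDATE page SET page_counter = ? WHERE page_id = 1", (b + 1,))
print(conn.execute("SELECT page_counter FROM page WHERE page_id = 1").fetchone()[0])
# → 1, not 2: one update was lost

# Lock-free pattern (from the application's point of view): a single
# statement increments whatever value is current, so nothing is lost.
conn.execute("UPDATE page SET page_counter = page_counter + 1 WHERE page_id = 1")
conn.execute("UPDATE page SET page_counter = page_counter + 1 WHERE page_id = 1")
print(conn.execute("SELECT page_counter FROM page WHERE page_id = 1").fetchone()[0])
# → 3
```

The single-statement form still takes whatever lock the storage engine needs internally, which is where the MyISAM table lock versus InnoDB row lock distinction in the paragraph above comes in.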
On 17/10/12 08:56, Platonides wrote:
> I think the problem was that writes to the page table locked `page` against reads, which in turn slowed everything down, since reads of `page` are ubiquitous. That is clearly a problem with MyISAM; I would expect InnoDB not to have as much trouble with the writes. Does anyone know?
Erik Moeller thought so:
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/1337
But Jimmy Wales suggested just getting rid of page view counters since they are mostly pointless:
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/2389
They were disabled in the UI, but it took until approximately September 2003 for someone to notice that the UPDATE queries were still being done:
https://www.mediawiki.org/wiki/Special:Code/MediaWiki/1711
And that was 8 months after the switch to InnoDB:
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/2279
So I don't think the rationale for r1711 could have been to work around MyISAM problems. In any case, it soon became a moot point, since we deployed Squid in early 2004 and the existing feature could not work with it. The hitcounter table was introduced in December 2003 by E23, but I don't think it was ever deployed to Wikimedia.
> Also, the page view counts could be moved to a separate table to free `page` from those locks. Incrementing a counter doesn't even need to take a lock; the cost is probably just the journaling, which is likely the reason we aggregate in a MEMORY table first.
It aggregates in a MEMORY table simply because MEMORY tables are faster; I don't think it was ever analysed further than that.
> I think it could scale (as long as we are still on a single webserver), although I'm not good at determining where the bottleneck would lie.
That's a funny definition of "scale".
-- Tim Starling