> From: David Gerard <dgerard@gmail.com>
> The MW file cache might be fast, but Squid is always going to be faster.
> We're quite fond of our "This article has been viewed x times" at the bottom of the pages, though I'm quite aware that's an expensive affectation that we may well have outgrown.
I think it's just a simple database query. At least I have a "mw_hitcounter" MEMORY table in my database. But maybe it's leftover cruft from past upgrades...
----------------
:::: Some of the current attempts at energy accounting, like the triple bottom line, are an absolute joke. They're an insult to children even in terms of their intellectual content, because they try and compare vague abstractions of social and environmental values -- just dot-pointed -- against a completely econometric financial accounting system of an organization which is actually doing the work. -- David Holmgren
:::: Jan Steinman, EcoReality Co-op ::::
On Mon, Oct 15, 2012 at 10:17 AM, Jan Steinman <Jan@bytesmiths.com> wrote:
>> From: David Gerard <dgerard@gmail.com>
>> The MW file cache might be fast, but Squid is always going to be faster.
>> We're quite fond of our "This article has been viewed x times" at the bottom of the pages, though I'm quite aware that's an expensive affectation that we may well have outgrown.
> I think it's just a simple database query. At least I have a "mw_hitcounter" MEMORY table in my database. But maybe it's leftover cruft from past upgrades...
There's a lot more to it than that, and it is indeed a performance drain (which is why we don't use it at WMF). Each pageview writes to the hitcounter table. Every so often the page table is updated with the hits from hitcounter.
Sure, displaying the hits is a simple query to the page table, but updating hitcounter each pageview just does not scale.
-Chad
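For readers following along, the write path Chad describes can be sketched like this. It is a minimal illustration, not MediaWiki's actual code: the table and column names (hitcounter, hc_id, page, page_counter) are borrowed from its schema, and SQLite stands in for MySQL's MEMORY engine so the snippet is self-contained.

```python
import sqlite3

# Sketch of the two-tier counter: every pageview does one cheap, blind
# INSERT into a small hit table; a periodic job folds the rows into
# page.page_counter and empties the hit table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY,"
           " page_counter INTEGER NOT NULL DEFAULT 0)")
db.execute("CREATE TABLE hitcounter (hc_id INTEGER NOT NULL)")  # one row per view
db.executemany("INSERT INTO page (page_id, page_counter) VALUES (?, 0)",
               [(1,), (2,)])

def record_hit(page_id):
    """The per-pageview hot path: a single INSERT, no read of `page`."""
    db.execute("INSERT INTO hitcounter (hc_id) VALUES (?)", (page_id,))

def flush_hits():
    """The periodic job: aggregate the hits into `page`, then reset."""
    db.execute("""
        UPDATE page SET page_counter = page_counter +
            (SELECT COUNT(*) FROM hitcounter WHERE hc_id = page.page_id)
        WHERE page_id IN (SELECT hc_id FROM hitcounter)
    """)
    db.execute("DELETE FROM hitcounter")

for pid in (1, 1, 2, 1):
    record_hit(pid)
flush_hits()
print(db.execute("SELECT page_id, page_counter FROM page ORDER BY page_id").fetchall())
# → [(1, 3), (2, 1)]
```

The point of the split is that only the periodic flush ever touches the contended `page` table; the per-request cost is one insert into a small in-memory table.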
Another way to look at it is how many page requests Squid/Varnish can eliminate before they reach Apache. On our MediaWiki, Squid has an average cache hit rate of 85%, which means the number of page requests hitting Apache is about 6.5x smaller. For small wikis this isn't a big deal, but as you scale up, reducing Apache requests by a factor of 6 is huge. Since RationalWiki appears to be kind of in the middle, you just have to ask yourself whether the page access numbers or the Apache load is more important.
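For what it's worth, the arithmetic behind that factor: with a cache hit rate h, the origin still sees a fraction (1 - h) of the traffic, so origin requests shrink by 1/(1 - h).

```python
def origin_reduction(hit_rate):
    """Factor by which origin (Apache) requests shrink at a given cache hit rate."""
    return 1.0 / (1.0 - hit_rate)

print(round(origin_reduction(0.85), 1))   # 85% hit rate -> ~6.7x fewer origin requests
print(round(origin_reduction(0.846), 1))  # the quoted 6.5x corresponds to roughly 84.6%
```

So the quoted 6.5x is consistent with an average hit rate just under 85%.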
On 15 October 2012 10:32, Chad <innocentkiller@gmail.com> wrote:
> There's a lot more to it than that, and it is indeed a performance drain (which is why we don't use it at WMF). Each pageview writes to the hitcounter table. Every so often the page table is updated with the hits from hitcounter.
> Sure, displaying the hits is a simple query to the page table, but updating hitcounter each pageview just does not scale.
> -Chad
MediaWiki-l mailing list
MediaWiki-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l
On Tue, Oct 16, 2012 at 7:32 AM, Dave Humphrey <dave@uesp.net> wrote:
> Another way to look at it is how many page requests Squid/Varnish can eliminate before they reach Apache. On our MediaWiki, Squid has an average cache hit rate of 85%, which means the number of page requests hitting Apache is about 6.5x smaller. For small wikis this isn't a big deal, but as you scale up, reducing Apache requests by a factor of 6 is huge. Since RationalWiki appears to be kind of in the middle, you just have to ask yourself whether the page access numbers or the Apache load is more important.
Indeed. If you've got caches in front of Apache, you won't even increment the hit counter on many requests. That's why our reports are pulled from the Squid logs themselves.
-Chad
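Counting views from the cache's access log rather than the database could look something like this sketch. The four-field line layout here is an assumption made for illustration; real Squid/Varnish log formats carry more fields and vary by configuration.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical access-log lines: "<client> <status> <method> <url>".
# Real Squid logs have more fields; adjust the split accordingly.
log_lines = [
    "10.0.0.1 TCP_HIT/200 GET http://example.org/wiki/Main_Page",
    "10.0.0.2 TCP_MISS/200 GET http://example.org/wiki/Squid",
    "10.0.0.3 TCP_HIT/200 GET http://example.org/wiki/Main_Page",
]

views = Counter()
for line in log_lines:
    client, status, method, url = line.split()
    if method == "GET" and "/200" in status:  # count only successful page fetches
        views[urlsplit(url).path] += 1

print(views.most_common())  # → [('/wiki/Main_Page', 2), ('/wiki/Squid', 1)]
```

Because the cache logs every request, hits and misses alike, this counts views the backend never sees.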
On 15/10/12 16:32, Chad wrote:
> There's a lot more to it than that, and it is indeed a performance drain (which is why we don't use it at WMF). Each pageview writes to the hitcounter table. Every so often the page table is updated with the hits from hitcounter.
> Sure, displaying the hits is a simple query to the page table, but updating hitcounter each pageview just does not scale.
> -Chad
I think the problem was that writes to the page table locked `page` against reads, which in turn slowed everything down, since reads of `page` are ubiquitous. That is clearly a problem with MyISAM; I would expect InnoDB not to have as much trouble with the writes. Does anyone know?

Also, the page view counts could be moved to a separate table to free `page` from those locks. Incrementing a counter doesn't even need to take a lock; the cost is probably just the journaling, which is likely the reason we aggregate in a MEMORY table first. I think it could scale (as long as we are still on a single webserver), although I'm not good at determining where the bottleneck would lie.
Regards
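Platonides' point that an increment needs no application-level lock comes down to letting the database do the arithmetic in a single statement. A small illustration (SQLite here purely so the snippet is self-contained; the same contrast applies to MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY,"
             " page_counter INTEGER NOT NULL)")
conn.execute("INSERT INTO page VALUES (1, 0)")

# Racy pattern: read-modify-write. Two interleaved "clients" both read 0,
# then both write back 1, so one hit is lost.
a = conn.execute("SELECT page_counter FROM page WHERE page_id = 1").fetchone()[0]
b = conn.execute("SELECT page_counter FROM page WHERE page_id = 1").fetchone()[0]
conn.execute("UPDATE page SET page_counter = ? WHERE page_id = 1", (a + 1,))
conn.execute("UPDATE page SET page_counter = ? WHERE page_id = 1", (b + 1,))
print(conn.execute("SELECT page_counter FROM page WHERE page_id = 1").fetchone()[0])
# → 1, not 2: one update was lost

# Lock-free pattern (from the application's point of view): a single
# statement increments whatever value is current, so nothing is lost.
conn.execute("UPDATE page SET page_counter = page_counter + 1 WHERE page_id = 1")
conn.execute("UPDATE page SET page_counter = page_counter + 1 WHERE page_id = 1")
print(conn.execute("SELECT page_counter FROM page WHERE page_id = 1").fetchone()[0])
# → 3
```

The single-statement form still takes whatever lock the storage engine needs internally, which is where the MyISAM table lock versus InnoDB row lock distinction in the paragraph above comes in.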
On 17/10/12 08:56, Platonides wrote:
> I think the problem was that writes to the page table locked `page` against reads, which in turn slowed everything down, since reads of `page` are ubiquitous. That is clearly a problem with MyISAM; I would expect InnoDB not to have as much trouble with the writes. Does anyone know?
Erik Moeller thought so:
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/1337
But Jimmy Wales suggested just getting rid of page view counters since they are mostly pointless:
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/2389
They were disabled in the UI, but it took until approximately September 2003 for someone to notice that the UPDATE queries were still being done:
https://www.mediawiki.org/wiki/Special:Code/MediaWiki/1711
And that was 8 months after the switch to InnoDB:
http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/2279
So I don't think the rationale for r1711 could have been to work around MyISAM problems. In any case, it soon became a moot point, since we deployed Squid in early 2004 and the existing feature could not work with it. The hitcounter table was introduced in December 2003 by E23, but I don't think it was ever deployed to Wikimedia.
> Also, the page view counts could be moved to a separate table to free `page` from those locks. Incrementing a counter doesn't even need to take a lock; the cost is probably just the journaling, which is likely the reason we aggregate in a MEMORY table first.
It aggregates in a MEMORY table simply because MEMORY tables are faster; I don't think it was ever analysed further than that.
> I think it could scale (as long as we are still on a single webserver), although I'm not good at determining where the bottleneck would lie.
That's a funny definition of "scale".
-- Tim Starling