Serving dynamic contents using Squid

howard chen

12 Feb 2008

1:08 p.m.

Hi Folks,

From Wikipedia's system diagram, it seems that Squid plays an important role in the system architecture.

But how does Squid handle user-customized pages?

E.g. a page containing the user's login name or IP etc. at the top of the page?

I think Squid can't handle this, right (unless you are using Squid 3 ESI)?

Regards, Howard

Platonides

12 Feb

3:20 p.m.

howard chen wrote:

Hi Folks,

From Wikipedia's system diagram, it seems that Squid plays an important role in the system architecture.

But how does Squid handle user-customized pages?

E.g. a page containing the user's login name or IP etc. at the top of the page?

I think Squid can't handle this, right (unless you are using Squid 3 ESI)?

The Squids handle anonymous requests. MediaWiki can show the IP at the top of the page, but that is disabled on Wikimedia sites precisely to keep pages cacheable.

Pages are obviously not cached at the Squid level for logged-in users (though things like rendered HTML are cached for everyone in memcached), although the Squids do serve images for everyone.

Finally, an interesting point about Wikimedia's Squid caching is that pages aren't cached by time, as there's no way to know how much time will elapse before the next edit. Instead they are cached sine die and then purged when there's an edit (or a purge, or a template is changed...). Squids at Tampa get purged with a UDP multicast notice, which is routed via TCP to the other Squid clusters, where it is converted again to UDP multicast.
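
To make the "cache indefinitely, purge on edit" idea concrete, here is a minimal in-process sketch. It is not how Squid or MediaWiki implement it; the class name, the render callback and the URL are all made up for illustration, and the multicast delivery of purge notices is left out entirely.

```python
class PurgeOnEditCache:
    """Toy cache whose entries never expire; only an explicit purge removes them."""

    def __init__(self, render):
        self._render = render    # hypothetical backend call that builds the page
        self._store = {}         # url -> cached body; no expiry time is kept
        self.backend_calls = 0

    def get(self, url):
        if url not in self._store:
            # Cache miss: ask the backend once and keep the result indefinitely.
            self.backend_calls += 1
            self._store[url] = self._render(url)
        return self._store[url]

    def purge(self, url):
        # Called when the page is edited or purged, or a template it uses changes.
        self._store.pop(url, None)


revisions = {"/wiki/Squid": 1}   # stand-in for the wiki database


def render(url):
    return f"{url} at revision {revisions[url]}"


cache = PurgeOnEditCache(render)
first = cache.get("/wiki/Squid")
second = cache.get("/wiki/Squid")   # served from cache, however much later
revisions["/wiki/Squid"] += 1       # an edit happens ...
cache.purge("/wiki/Squid")          # ... and triggers a purge
third = cache.get("/wiki/Squid")    # re-rendered by the backend
```

The point of the design is that freshness depends on the purge being delivered, not on a timer, so an unedited page can stay cached for as long as the cache has room for it.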

Gregory Maxwell

3:33 p.m.

On Feb 12, 2008 7:08 AM, howard chen howachen@gmail.com wrote:

Hi Folks, From Wikipedia's system diagram, it seems that Squid plays an important role in the system architecture. But how does Squid handle user-customized pages?

It doesn't. Squid passes those through, performing only the service of connection pooling. The images you view are all still cached.
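
As a rough illustration of the "cache for anonymous users, pass through for logged-in users" split, the decision can be sketched as a check on the request's cookies. The cookie names below are assumptions chosen for the example, not the names MediaWiki actually sets, and a real Squid makes this decision from its configuration and the response headers rather than from code like this.

```python
from http.cookies import SimpleCookie

# Hypothetical names of cookies that mark a logged-in session.
SESSION_COOKIE_NAMES = {"session", "UserID"}


def may_serve_from_cache(method, cookie_header):
    """Return True if the request looks anonymous and safe to answer from cache."""
    if method not in ("GET", "HEAD"):
        return False                  # edits, logins etc. always go to the backend
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    # Any session cookie means the page may be personalised: pass it through.
    return not any(name in cookies for name in SESSION_COOKIE_NAMES)


anonymous = may_serve_from_cache("GET", "")
logged_in = may_serve_from_cache("GET", "UserID=42; session=abc")
```

Requests that fail the check still benefit from the proxy's pooled connections to the backend; they just never get a cached body.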

From an overall performance perspective this isn't very relevant: logged-in users are a fairly small portion of the total page requests. Even though all logged-in pages are Squid misses, the Squids are still getting hit rates for text of 92%, which is only a little less than the ~99% from the image cache hierarchy.

In cases where the user is closer to a remote Squid cluster than to Tampa, the Squids' persistent connections towards the backend servers should be a significant performance improvement: clients don't tend to be too good about keeping persistent connections up, and the savings of a single transatlantic RTT (which would be lost to TCP setup) will nearly double the loading speed of many pages.
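
A back-of-the-envelope version of that claim, under assumed numbers: a 100 ms transatlantic round trip and 10 ms of backend work, both invented for the example. The model counts one round trip for TCP setup on a new connection and one for the request and response, and ignores the short hop between the client and its nearby Squid.

```python
def fetch_time_ms(rtt_ms, server_ms, new_connection):
    """Time for one small request over the long-haul link, in milliseconds."""
    handshake = rtt_ms if new_connection else 0   # TCP setup costs one round trip
    return handshake + rtt_ms + server_ms         # plus request/response and backend work


cold = fetch_time_ms(100, 10, new_connection=True)    # fresh connection to Tampa
warm = fetch_time_ms(100, 10, new_connection=False)   # reused persistent connection
speedup = cold / warm
```

With these assumed figures the reused connection comes out a bit under twice as fast, and the gain shrinks as backend time or page size grows relative to the round trip.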

While some fancy footwork at the caching level could possibly do simple tasks like pasting in your username, logged-in users can set preferences which substantially change how the page text is displayed, which could not reasonably be performed at that layer. There are other layers of caching on the backend (i.e. memcached used by MediaWiki) which help with requests from logged-in users.

Thomas Dalton

8:39 p.m.

Even though all logged-in pages are Squid misses, the Squids are still getting hit rates for text of 92%, which is only a little less than the ~99% from the image cache hierarchy.

That's actually quite a lot less. 8 times as many text requests aren't cached as image requests. You have to be careful when comparing percentages - the absolute difference between the two is a pretty meaningless number.
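
The arithmetic behind the "8 times" figure, spelled out with the hit rates quoted above. The one-million-request batch size is an arbitrary assumption, and the comparison is per equal number of requests, since the thread gives no actual traffic volumes for text or images.

```python
def misses(requests, hit_rate_pct):
    """Requests that fall through to the backend at a given hit rate (whole percent)."""
    return requests * (100 - hit_rate_pct) // 100


batch = 1_000_000                 # assumed batch size, only for illustration
text_misses = misses(batch, 92)   # text at a 92% hit rate
image_misses = misses(batch, 99)  # images at a ~99% hit rate
ratio = text_misses / image_misses
```

Seen from the backend, it is the miss rate (8% against 1%) that sets the load, which is why a seven-point gap in hit rate is a large difference.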

Gregory Maxwell

9:02 p.m.

On Feb 12, 2008 2:39 PM, Thomas Dalton thomas.dalton@gmail.com wrote:

Even though all logged-in pages are Squid misses, the Squids are still getting hit rates for text of 92%, which is only a little less than the ~99% from the image cache hierarchy.

That's actually quite a lot less. 8 times as many text requests aren't cached as image requests. You have to be careful when comparing percentages - the absolute difference between the two is a pretty meaningless number.

Indeed, but still both are "very effective". If you're going to go the 'absolute' route then even 99% is leaving a lot of requests uncached in absolute terms.

wikitech-l@lists.wikimedia.org