Hello,
I heard that Wikipedia is running on Apache/mod_php. Is there any reason not to use a FastCGI approach for performance (e.g. lighty / nginx + FastCGI)?
Thanks.
On Tue, Aug 26, 2008 at 10:44 AM, howard chen howachen@gmail.com wrote:
> I heard that Wikipedia is running on Apache/mod_php. Is there any reason not to use a FastCGI approach for performance (e.g. lighty / nginx + FastCGI)?
The reason that mod_php is slow is that an instance of PHP has to stay running while, much if not most of the time, Apache is just waiting to serve results to slow clients. This means that you have many more instances of PHP than you actually need, which means much more memory. In Wikipedia's case, however, Apache is serving data exclusively to Squid, which is reading the data at least as fast as Apache can write it. It wouldn't save anything to have FastCGI serve data to lighttpd or nginx instead; it would be no faster than mod_php serving it to Squid. So little to nothing would be saved by using FastCGI.
Hi,
On 8/26/08, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
> The reason that mod_php is slow is that an instance of PHP has to stay running while, much if not most of the time, Apache is just waiting to serve results to slow clients. This means that you have many more instances of PHP than you actually need, which means much more memory. In Wikipedia's case, however, Apache is serving data exclusively to Squid, which is reading the data at least as fast as Apache can write it. It wouldn't save anything to have FastCGI serve data to lighttpd or nginx instead; it would be no faster than mod_php serving it to Squid. So little to nothing would be saved by using FastCGI.
But there are still a number of pages which cannot be cached by Squid?
Thanks.
On Tue, Aug 26, 2008 at 11:28 AM, howard chen howachen@gmail.com wrote:
> But there are still a number of pages which cannot be cached by Squid?
Even pages that aren't cached by Squid are still proxied through it. It proxies all requests, and caches those that are cacheable. No one outside Wikimedia's internal network is ever communicating directly with an Apache or lighttpd server when using the Wikimedia projects, to my knowledge. (lighttpd is in fact used by the image servers, incidentally.)
No PHP benefit either way. There's no reason to go to nginx for performance, but there's also no reason not to use it. Personally, I've got a preference for nginx, so I'm using it myself.
And for smaller sites not using a front end cache or multiple domains, it is nice when you are serving out the /images folder with the same webserver.
So, no nginx performance benefit for MediaWiki. But there's no real benefit of Apache over other webservers either.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) of:
-The Nadir-Point Group (http://nadir-point.com)
--Its Wiki-Tools subgroup (http://wiki-tools.com)
--The ElectronicMe project (http://electronic-me.org)
--Games-G.P.S. (http://ggps.org)
-And Wikia ACG on Wikia.com (http://wikia.com/wiki/Wikia_ACG)
--Animepedia (http://anime.wikia.com)
--Narutopedia (http://naruto.wikia.com)
On Tue, Aug 26, 2008 at 12:44 PM, Daniel Friesen dan_the_man@telus.net wrote:
> No PHP benefit either way. There's no reason to go to nginx for performance, but there's also no reason not to use it. Personally, I've got a preference for nginx, so I'm using it myself.
The only reason Apache is still being used, AFAIK, is because it's what Wikimedia has always used. With no gain in switching, you may as well stick with whatever you have to avoid transition costs. lighttpd didn't even exist in 2001, let alone nginx.
Actually, is there any reason lighttpd is used for image serving? Just because it was trendy when the image servers got set up, or because it actually has concrete advantages of some kind?
As for lighttpd vs. Apache: lighttpd is a lightweight webserver, and does have a real performance benefit over serving out images with bloated Apache.
As for lighttpd vs. nginx... I believe the author of lighttpd actually had some contact with Wikimedia; perhaps a few changes were made to lighttpd for them... Dunno... So for anyone else choosing between lighttpd and nginx, there's no relevant benefit of one over the other.
~Daniel Friesen (Dantman, Nadir-Seen-Fire)
Daniel Friesen wrote:
> As for lighttpd vs. Apache: lighttpd is a lightweight webserver, and does have a real performance benefit over serving out images with bloated Apache.
We started using lighty to serve images in 2005: http://meta.wikimedia.org/wiki/November_2005_image_server
Domas did a fair amount of benchmarking and testing on this over the months prior to the switch, including working with the author on some fixes, and it was a pretty clear winner at the time.
> As for lighttpd vs. nginx... I believe the author of lighttpd actually had some contact with Wikimedia; perhaps a few changes were made to lighttpd for them... Dunno... So for anyone else choosing between lighttpd and nginx, there's no relevant benefit of one over the other.
Well, to be honest I never heard of nginx before today. :)
-- brion
Brion Vibber wrote:
> Well, to be honest I never heard of nginx before today. :)
Heh, that's amusing... Jae poked me about it all the time back on the old Wiki-Tools project. http://nginx.net/
http://survey.netcraft.com/Reports/200806/
http://hostingfu.com/article/nginx-vs-lighttpd-for-a-small-vps
http://www.joeandmotorboat.com/2008/02/28/apache-vs-nginx-web-server-perform...
I love how I have it set up for my own MediaWiki sites. I passed on nginx for the old wiki-tools project, as we couldn't find a single way to get nginx to work with short URLs. When the second wiki-tools project came around I made yet another attempt at it... I even attempted using real complex rewrites. I managed to get it to sorta work... well, short URLs worked, and ampersands didn't have an issue... well, up to a limit... heh, the regex I used worked, but not all too well... ^_^ after a certain number of ampersands in the title, everything just sorta... went white... heh

Some time after that I suddenly thought of a bit of a brilliant idea. Just came to me, ^_^ and personally I find it beautifully elegant. More beautiful than anything you could ever do with a rewrite...
location ~ /(index.php5?/|wiki|view|render|print|viewsource|purge|(form)?edit|submit|history|info|credits|(un)?watch|(un)?delete|revert|rollback|(un)?protect|markpatrolled|validate|deletetrackback|dublincore|creativecommons) {
    # Hand every matching request to MediaWiki's index.php over FastCGI.
    include fastcgi.conf;
    fastcgi_param SCRIPT_FILENAME $document_root/index.php;
    fastcgi_param SCRIPT_NAME /index.php;
    fastcgi_param PHP_SELF /index.php;
    fastcgi_param SCRIPT_URL /index.php;
    fastcgi_param PATH_INFO $fastcgi_script_name;
    fastcgi_pass php;
}

include /etc/nginx/default.conf;
include /etc/nginx/php.conf;
The one line is a bit long because I have an affinity for action paths. Honestly, that could be reduced to a single "location /wiki {" line if you just wanted short URLs (actually, that's basically what I listed on the English nginx wiki).
Quite simply... the regex matching "/wiki", "/edit", etc... whatever... matches as if it was a directory, and $fastcgi_script_name matches whatever's after it, i.e. for /wiki/Foobar, $fastcgi_script_name is /Foobar. Really, the PATH_INFO is just there out of a preference for completeness. MediaWiki uses the REQUEST_PATH, so it's not really used (actually that's a good thing, since I needed to add a / after the index.php5?). Basically, it takes all those requests and points them to /index.php, funneling them all through MediaWiki and using it as a web application.
No cruddy rewrite rules trying to rewrite some ugly path into something that vaguely matches a query to index.php... Instead all requests to the wiki are just funneled through the wiki letting the software handle all the paths like it was built to.
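For the short-URL-only case mentioned above, the reduced form might look roughly like the following. This is only a sketch: it reuses directives from the longer block, assumes the same fastcgi.conf include and "php" backend are available, and has not been tested as written. PATH_INFO is left out, since it was described above as not really used by MediaWiki.

```nginx
# Sketch only: short-URL variant, assuming the same fastcgi.conf
# and "php" backend as in the longer block above.
location /wiki {
    include fastcgi.conf;
    fastcgi_param SCRIPT_FILENAME $document_root/index.php;
    fastcgi_param SCRIPT_NAME /index.php;
    fastcgi_pass php;
}
```

A plain prefix location like this matches any URI beginning with /wiki, so article paths such as /wiki/Foobar all end up at index.php.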
Sorry bout the long code... It's just I get an indescribable feeling looking at this kind of elegant code, I was in awe for a fair bit of time after this idea hit me and worked...
~Daniel Friesen (Dantman, Nadir-Seen-Fire)
Brion Vibber wrote:
> Daniel Friesen wrote:
>> As for lighttpd vs. Apache: lighttpd is a lightweight webserver, and does have a real performance benefit over serving out images with bloated Apache.
> We started using lighty to serve images in 2005: http://meta.wikimedia.org/wiki/November_2005_image_server
> Domas did a fair amount of benchmarking and testing on this over the months prior to the switch, including working with the author on some fixes, and it was a pretty clear winner at the time.
Domas tested CPU performance, where lighttpd did much better than Apache, especially in the large file case, because at the time lighttpd used sendfile() and Apache didn't. Since 2.0.44, Apache has had sendfile support, and our storage servers use negligible CPU anyway; they're disk-bound. So it's pretty likely that we could put Apache on them with no significant performance loss.
-- Tim Starling
Hi!
> sendfile() and Apache didn't. Since 2.0.44, Apache has had sendfile support, and our storage servers use negligible CPU anyway; they're disk-bound. So it's pretty likely that we could put Apache on them with no significant performance loss.
Well, there's another side issue: the memory used. An event-based server doesn't have to spawn that many children, and even with all the copy-on-write efficiency, the memory footprint of Apache 2.0 would be way higher. Of course, 2.2 has an event-based model too. And memory wasted = memory not used for filesystem caches.
But yeah, at the time lighty was the champ, and now it probably doesn't make much difference.
I also liked lighttpd's configuration simplicity :)
On Tue, Aug 26, 2008 at 2:18 PM, Daniel Friesen dan_the_man@telus.net wrote:
> As for lighttpd vs. Apache: lighttpd is a lightweight webserver, and does have a real performance benefit over serving out images with bloated Apache.
By itself, certainly, but I wasn't sure if it had an advantage serving to Squid as well. If Domas said so, though, I'm happy to believe him. :)
Hi!
> By itself, certainly, but I wasn't sure if it had an advantage serving to Squid as well.
Squid may not buffer the entire result immediately, so writing to the Squids sometimes blocks. That is tolerable with lots and lots of app servers, but the few poor image servers had better be more efficient :)
> If Domas said so, though, I'm happy to believe him. :)
woot!
wikitech-l@lists.wikimedia.org