Timwi-
Oh, *that* kind of profiling. I have done that with C++ applications under Windows, but I never thought it'd be necessary for webserver applications... Clearly, at least our *current* problem - and also the current problem of LiveJournal (sorry I keep mentioning LiveJournal, but it's the first and only other major website I've contributed to significantly) - is database performance, not CPU usage.
Python has a profiling module. See an example of its output, which I used to find bottlenecks in the BitTorrent code: http://24.175.39.66/~hperes/btdh_upload_only_XP1800_20minutes_runtime.txt
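For readers who have not used it: the profiler in today's Python standard library is `cProfile`, with `pstats` for reports. A minimal sketch of profiling a single call follows; `render_page` is a hypothetical stand-in for real parser work, not code from any existing wiki.

```python
import cProfile
import io
import pstats


def render_page(text: str) -> str:
    """Hypothetical stand-in for a page renderer: a few repeated string passes."""
    out = text
    for _ in range(200):
        out = out.replace("''", "<i>")
    return out.upper()


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
    return result, buf.getvalue()


if __name__ == "__main__":
    _, report = profile_call(render_page, "''hello'' world " * 1000)
    print(report)
```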
Well, that's not necessarily true. CPU usage on Larousse (the webserver for En:) has been very high, and our page parser is very ugly and slow. Clearly we need optimizations on both fronts.
Python has an amazing JIT/code optimizer named Psyco (psyco.sf.net). Most text-processing operations are sped up to the level of pure C. I wrote a difflib module (almost identical to our difflib.php) in C using the Python/C API, and Psyco, run on the pure Python version, matched its performance.
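Python's standard library also has its own `difflib` module. A minimal sketch of producing a line diff between two revisions of an article with `difflib.unified_diff`; the revision texts and file labels are made up for illustration, and this says nothing about Psyco's speedups:

```python
import difflib


def revision_diff(old: str, new: str) -> list[str]:
    """Return a unified line diff between two revisions of an article."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="old revision",
            tofile="new revision",
            lineterm="",  # splitlines() already removed the newlines
        )
    )


if __name__ == "__main__":
    old = "Larousse is a webserver.\nIt serves En:.\n"
    new = "Larousse is a webserver.\nIt serves En: and is busy.\n"
    for line in revision_diff(old, new):
        print(line)
```

Identical revisions produce an empty list, which is a cheap way to detect null edits.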
The "reduced performance" argument is something I don't know anything about; it is possible that this is a good argument, but I'm not convinced.
Well, then test it. Throw some 10-megabyte files into the database and compare the reading performance with multiple threads against direct Apache server access.
YES. Test, test, test. This is the precept of extreme (and proper) programming.
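A rough sketch of that test, assuming SQLite (Python's built-in `sqlite3`) as a stand-in for the real database and a plain file as a stand-in for what Apache would serve directly. The file names, table layout, and thread/read counts are arbitrary assumptions, and numbers from a setup like this will not transfer to the production servers:

```python
import os
import sqlite3
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

SIZE = 10 * 1024 * 1024  # 10 MB, the size suggested above


def store_payload(workdir: str, payload: bytes) -> tuple[str, str]:
    """Write the same payload to a plain file and to a BLOB row in SQLite."""
    file_path = os.path.join(workdir, "payload.bin")
    with open(file_path, "wb") as f:
        f.write(payload)
    db_path = os.path.join(workdir, "payload.db")
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, data BLOB)")
        conn.execute("INSERT INTO blobs (id, data) VALUES (1, ?)", (payload,))
        conn.commit()
    finally:
        conn.close()
    return file_path, db_path


def read_from_file(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()


def read_from_db(db_path: str) -> bytes:
    # One connection per call: sqlite3 connections are not shared across threads by default.
    conn = sqlite3.connect(db_path)
    try:
        (data,) = conn.execute("SELECT data FROM blobs WHERE id = 1").fetchone()
        return data
    finally:
        conn.close()


def timed_reads(reader, target: str, threads: int = 4, reads: int = 20) -> float:
    """Call reader(target) `reads` times across `threads` workers; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for _ in pool.map(reader, [target] * reads):
            pass  # consume results so any exception in a worker is raised here
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        file_path, db_path = store_payload(workdir, os.urandom(SIZE))
        print(f"file:     {timed_reads(read_from_file, file_path):.3f} s")
        print(f"database: {timed_reads(read_from_db, db_path):.3f} s")
```

Repeated reads of the same data mostly measure the OS page cache on both paths, so a more realistic version would read many different files and rows.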
Perl-OOP is a bit ugly, but reasonably powerful. Check out perldoc perltoot for details. You'll want to look at "tie" especially, as this allows you to do some cool stuff with properties.
Mainly it's just ugly. Perl is a horrible language whose time has passed. Its number one bad feature is confusion: Perl allows more unmaintainable and entropic code than almost any language except Brainfuck. Its development over the last few years has been pathetic; the language and community have no focus.
Python is a language that is only one year younger than Perl (1990). It's had threads since the beginning, whereas Perl only got them last year, for example... Python has no bad features, in that it only allows for low-entropy code. It has a rich standard library and is widely accepted. Read this O'Reilly piece, http://www.onlamp.com/pub/wlg/3198, which covers eight stories from big industry (Industrial Light & Magic, for example) and how, after years of experience, Python is their number 1 language of choice over PHP, Perl, C, Java, and Lisp.
There is a great wiki written in Python called MoinMoin. It's even file-based, not database-based, and I believe that this avenue is more lightweight and deserves some consideration. Yes... I've followed all of Wikipedia's static-file "to cache or not to cache" arguments...
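To make the file-based idea concrete, a minimal sketch of a page store with one directory per page and one numbered file per revision. The layout, the `FilePageStore` class, and its title-escaping scheme are hypothetical illustrations, not how MoinMoin actually stores pages:

```python
import os
import re
import tempfile
from typing import Optional


class FilePageStore:
    """Hypothetical file-based wiki storage: one directory per page and one
    zero-padded, numbered file per revision."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _page_dir(self, title: str) -> str:
        if not title:
            raise ValueError("empty title")
        # Escape every character outside [A-Za-z0-9_-] as "(hex)" so a title
        # such as "../etc" cannot point outside the storage root.
        safe = re.sub(r"[^A-Za-z0-9_-]", lambda m: "(%x)" % ord(m.group()), title)
        return os.path.join(self.root, safe)

    def save(self, title: str, text: str) -> int:
        """Write text as a new revision and return its number (starting at 1)."""
        page_dir = self._page_dir(title)
        os.makedirs(page_dir, exist_ok=True)
        rev = len(os.listdir(page_dir)) + 1  # not safe against concurrent writers
        with open(os.path.join(page_dir, "%08d" % rev), "w", encoding="utf-8") as f:
            f.write(text)
        return rev

    def load(self, title: str) -> Optional[str]:
        """Return the newest revision's text, or None if the page does not exist."""
        page_dir = self._page_dir(title)
        if not os.path.isdir(page_dir):
            return None
        latest = max(os.listdir(page_dir))  # zero-padding makes string order numeric
        with open(os.path.join(page_dir, latest), encoding="utf-8") as f:
            return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = FilePageStore(root)
        store.save("Main Page", "first")
        store.save("Main Page", "second")
        print(store.load("Main Page"))
```

Revision numbering here relies on a directory listing, so two simultaneous saves of the same page could collide; a real implementation would need locking or atomic file creation.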
There are great idioms and frameworks out there to make content management and web publishing low-entropy and maximally effective.
Namely: Zope's CMF (http://cmf.zope.org/), twisted.woven (http://twistedmatrix.com/documents/howto/woven), and of course mod_python, which runs inside Apache.
Heck... the software behind this wikitech mailing list, Mailman (list.org), is written in Python.
Google, Yahoo, and various governments across the world all prefer Python. Dynamically typed languages like Perl, Python, and PHP are far more productive than anally typed Java/OCaml, or archaic languages like C++, C, and Fortran.
On Sat, Jul 26, 2003 at 01:33:14AM -0700, Hunter Peress wrote: [cut]
Please, no language propaganda here. If you want to help, write code. Patches to the current codebase are preferred.
If your only problem is that you don't like how Perl looks, write a module that will make it look like Python; it's perfectly possible in Perl. Or write some magic syntax highlighter for your editor.
Hunter Peress wrote: [cut]
I'm sorry, but as much as I dislike Perl, I would never advise replacing it with a language that has interpreted whitespace. That's just asking for a bunch of stupid little errors. Uninterpreted whitespace IMHO is the greatest feature of programming languages in general; I love those magic curly braces.... I don't have anything against Python, just that single particular aspect.
Lightning
tarquin wrote:
Lightning wrote:
I love those magic curly braces....
I love the way that in perl, you can put curlies *anywhere*, and declare local vars to that curly block :)
Yeah, I love that too... that's pretty slick. I always liked using namespaces and lots of curlies in C++. My teacher hated me because of that and my constant habit of running a search and replace on my code before I turned it in for grading... I used to search and replace all variable names and rename them with names like "sdfd", "frtb", "w4gyr"... lol. Oh, he never thought it was that funny, especially after I stripped all the comments and indentation..... I thought it was funny.
Lightning
On Saturday 26 July 2003 18:37, Lightning wrote:
I'm sorry, but as much as I dislike Perl, I would never advise replacing it with a language that has interpreted whitespace. That's just asking for a bunch of stupid little errors.
Have you ever tried to program anything in Python? I learned Python a couple of months ago and since then have _never_ made any error related to wrongly indented code. Since any serious programmer indents his code anyway, it is hard for me to believe that this could be a source of errors. In fact, after using Python for a while I found it pretty intuitive and would like more languages to adopt it :-)
I do not know PHP well enough to compare it to Python, but I am sure that Python would be an excellent choice, since it is very easy to learn, has some support for OOP, and is very intuitive.
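A small sketch of what actually happens: Python's built-in `compile()` rejects inconsistent indentation with an `IndentationError` before any code runs. The helper name `indentation_error` and the snippets are made up for illustration:

```python
from typing import Optional


def indentation_error(source: str) -> Optional[str]:
    """Compile `source`; return the IndentationError message, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as err:  # TabError is a subclass, so it is caught too
        return err.msg
    return None


# Consistent indentation: compiles.
good = "if True:\n    x = 1\n    y = 2\n"
# Dedent to a column that matches no enclosing block: rejected at compile time.
bad = "if True:\n    x = 1\n  y = 2\n"
# Valid indentation but (perhaps) the wrong block: still compiles.
misplaced = "total = 0\nfor i in range(3):\n    pass\ntotal += i\n"

if __name__ == "__main__":
    for name, src in [("good", good), ("bad", bad), ("misplaced", misplaced)]:
        print(name, indentation_error(src))
```

The third snippet shows the limit of this check: a statement indented consistently but at the wrong level still compiles, which is closer to the kind of error the objection is about.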
best regards, Marco
I'm sorry, but as a _personal_ preference I think interpreted whitespace is a terrible idea. Sorry if you don't agree. As for a code switch, I have no doubt that Python is a great language; it has many fans. But I see no reason to switch. Sure, it's easy to learn, but the current developers already know PHP. Not to mention PHP is really easy to learn and understand. I mean really, try and tell me PHP is hard or obfuscated or difficult to read or undocumented. You can't argue with that: the documentation and resources available for PHP are great, and the built-in functionality is great too. While it may not yet have as much free code available as Perl, there is lots of available code to reuse, especially for the kind of thing we are currently doing. Besides... really, I want someone to post what exactly we would gain by switching languages, and why it's worth the hundreds of man-hours it would take to port the code. Why can't we just spend that time improving the current code?
Lightning
On Sat, Jul 26, 2003 at 05:50:13PM -0400, Lightning wrote: [cut]
While I do think that Python is a cleaner language than PHP, with much better OO syntax and namespaces, I do not believe that it should be the language of choice for Wikipedia, for a couple of reasons. First and foremost, there is no way to get Python working as fast as PHP, and we need every ounce of performance that we can eke out of our current infrastructure. If we had a huge Sun box with 16GB of RAM and 8 CPUs, I wouldn't worry so much, but we don't. Furthermore, it would seem to be a terrible waste to throw away all the man-hours we have already put into the PHP code. I think we would be better off cleaning up the admittedly messy PHP code and possibly adding in my C parser (if I ever get around to getting the ugly list syntax working).
On Sat, Jul 26, 2003 at 05:00:50PM -0500, Nick Reinking wrote:
I think we would be better off cleaning up the admittedly messy PHP code and possibly adding in my C parser (if I ever get around to getting the ugly list syntax working).
If someone doesn't understand this yet: NO C CODE SHOULD BE USED ON WIKIPEDIA EVER
It's suicidally insecure.
On Sun, Jul 27, 2003 at 12:07:39AM +0200, Tomasz Wegrzanowski wrote: [cut]
That's ridiculous. Just because C can be insecure doesn't mean it has to be. We rely on PHP every day, which, surprise surprise, is written in C. What do you think PHP is? It's just a C program that interprets specially formatted HTML and runs a bunch of internal C functions. As such, it is just as susceptible to buffer overflows and whatnot as any custom-written C code that we might write.
On Sat, Jul 26, 2003 at 05:13:28PM -0500, Nick Reinking wrote: [cut]
Many people thought they were wise enough not to make any such mistake, and they have all been proven wrong, even such security paranoids as the OpenBSD people.
We are using C code all the time, but this code has been checked by thousands of people, and despite this, stack and heap overflows are found in it all the time.
Risk is too high, and benefit is too small.
Anyway, lex and yacc are available for almost all languages, that's no excuse for using C.
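As an illustration of tokenizing wiki markup without C, a minimal sketch using Python's standard `re` module (not a lex/yacc port). The token names and the tiny subset of syntax covered ('''bold''', ''italic'', [[links]]) are assumptions for illustration, not the actual Wikipedia grammar:

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    value: str


# Alternatives are tried left to right, so ''' must come before ''.
TOKEN_RE = re.compile(
    r"(?P<BOLD>''')"
    r"|(?P<ITALIC>'')"
    r"|(?P<LINK>\[\[[^\[\]|]+(?:\|[^\[\]]*)?\]\])"
    r"|(?P<TEXT>[^'\[]+|['\[])"  # plain runs, or a single stray ' or [
)


def tokenize(text: str) -> Iterator[Token]:
    """Yield tokens left to right; every character ends up in some token."""
    for match in TOKEN_RE.finditer(text):
        yield Token(match.lastgroup, match.group())


if __name__ == "__main__":
    for tok in tokenize("'''Bold''' and ''it'' see [[Main Page|home]]."):
        print(tok)
```

Because every character falls into some token, concatenating the token values gives back the input. A mistake in a pattern here produces wrong tokens rather than a buffer overflow, though its speed relative to a flex scanner would have to be measured.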
Tomasz Wegrzanowski wrote: [cut]
As I was saying, the code is already written in flex, which produces C anyway. In fact, I have gone out of my way not to allocate any string structures, instead relying on flex and its functions to jump around in the various strings. The only place where I actually store strings is the list code, which is why it is taking so long and is so damned annoying.
Tomasz Wegrzanowski wrote:
If someone doesn't understand this yet: NO C CODE SHOULD BE USED ON WIKIPEDIA EVER
It's suicidally insecure.
Hmm, I kind of agree. I like the idea of a C parser, and I love C, but I just can't see it being a good idea: too much risk of overflows, etc. I mean, I have no doubt it could be done securely... but I just don't know; there's too much room for errors...
Lightning
On Sat, Jul 26, 2003 at 06:17:46PM -0400, Lightning wrote: [cut]
I should say that my parser is written in flex, which in all likelihood is exactly how PHP's own parser is written. There is no real difference between how PHP works and how a custom-written C parser would work.
Nick Reinking wrote: [cut]
That sounds a whole lot more friendly... I like the idea more now. You know what would be REALLY cool? If you made your parser a PHP module, so we could compile it into PHP; then it would be amazingly fast... yeaahhh, I'm liking how that sounds.
Lightning
Hey. I was thinking: I hated seeing the short and long pages lists go. I have an idea for bringing them back. I don't feel like explaining it in English, so I hope you understand the code...
ALTER TABLE `cur` ADD `art_len` MEDIUMINT UNSIGNED;
ALTER TABLE `cur` ADD INDEX ( `art_len` );
/////ARTICLE.PHP -- insertNewArticle()
replace:
$sql = "UPDATE cur SET cur_text='" . wfStrencode( $text ) . "',cur_comment='" . wfStrencode( $summary ) . "',cur_minor_edit={$me2}, cur_user=" . $wgUser->getID() . ",cur_timestamp='{$now}',cur_user_text='" . wfStrencode( $wgUser->getName() ) . "',cur_is_redirect={$redir}, cur_is_new=0, cur_touched='{$now}', inverse_timestamp='{$won}' " . "WHERE cur_id=" . $this->getID();
with this:
$sql = "UPDATE cur SET cur_text='" . wfStrencode( $text ) . "',cur_comment='" . wfStrencode( $summary ) . "',cur_minor_edit={$me2}, cur_user=" . $wgUser->getID() . ",cur_timestamp='{$now}',cur_user_text='" . wfStrencode( $wgUser->getName() ) . "',art_len=" . strlen( $text ) . ",cur_is_redirect={$redir}, cur_is_new=0, cur_touched='{$now}', inverse_timestamp='{$won}' " . "WHERE cur_id=" . $this->getID();
/////SPECIALLONGPAGES.PHP ~line 21
from: $sql = "SELECT cur_title, LENGTH(cur_text) AS len FROM cur " . "WHERE cur_namespace=0 AND cur_is_redirect=0 ORDER BY " . "LENGTH(cur_text) DESC LIMIT {$offset}, {$limit}";
to: $sql = "SELECT cur_title, art_len AS len FROM cur " . "WHERE cur_namespace=0 AND cur_is_redirect=0 ORDER BY " . "art_len DESC LIMIT {$offset}, {$limit}";
What does everyone think? Yes, yes, I know it involves adding another column and an index, but those pages really were useful... I mean, at least I thought they were. And I know it's kind of hackish, but there's no reason not to try it. If we implement it early in the other Wikipedias, it saves the cost of doing a huge ALTER TABLE later, and it lets them keep those two special pages even if they grow as large as the English wiki.
Lightning
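A minimal sketch of the art_len idea, using Python's built-in `sqlite3` as a stand-in for MySQL, so `MEDIUMINT UNSIGNED` becomes `INTEGER` and the `cur` table is cut down to a hypothetical handful of columns. It also adds two things the proposal leaves implicit, so treat them as assumptions: a one-time backfill for rows that already exist, and query parameters instead of string concatenation. The length is counted in bytes to match PHP's `strlen()`:

```python
import sqlite3


def create_cur_table(conn: sqlite3.Connection) -> None:
    """Hypothetical, heavily trimmed stand-in for the `cur` table."""
    conn.execute(
        "CREATE TABLE cur (cur_id INTEGER PRIMARY KEY, cur_title TEXT, cur_text TEXT, "
        "cur_namespace INTEGER DEFAULT 0, cur_is_redirect INTEGER DEFAULT 0)"
    )


def add_art_len(conn: sqlite3.Connection) -> None:
    """Add the cached-length column and its index, then backfill existing rows once."""
    conn.execute("ALTER TABLE cur ADD COLUMN art_len INTEGER")
    conn.execute("CREATE INDEX cur_art_len ON cur (art_len)")
    # CAST to BLOB so LENGTH() counts bytes, like PHP's strlen(), not characters.
    conn.execute("UPDATE cur SET art_len = LENGTH(CAST(cur_text AS BLOB))")


def save_text(conn: sqlite3.Connection, cur_id: int, text: str) -> None:
    """Store new text and its byte length in the same UPDATE."""
    conn.execute(
        "UPDATE cur SET cur_text = ?, art_len = ? WHERE cur_id = ?",
        (text, len(text.encode("utf-8")), cur_id),
    )


def long_pages(conn: sqlite3.Connection, offset: int = 0, limit: int = 50) -> list:
    """Longest non-redirect articles, sorted on the indexed art_len column."""
    return conn.execute(
        "SELECT cur_title, art_len AS len FROM cur "
        "WHERE cur_namespace = 0 AND cur_is_redirect = 0 "
        "ORDER BY art_len DESC LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_cur_table(conn)
    conn.executemany(
        "INSERT INTO cur (cur_id, cur_title, cur_text) VALUES (?, ?, ?)",
        [(1, "Short", "abc"), (2, "Long", "x" * 100)],
    )
    add_art_len(conn)
    save_text(conn, 1, "abcdef")
    print(long_pages(conn))
```

The short pages list would be the same query with `ORDER BY art_len ASC`. Articles created through the INSERT path would need art_len set as well, or their art_len would stay NULL.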
On Saturday 26 July 2003 23:50, Lightning wrote:
I'm sorry, but as a _personal_ preference I think interpreted whitespace is a terrible idea. Sorry if you don't agree.
Thinking that something is not good and trying it are two different things. I thought that the Wikipedia idea could not work until I tried it...
As for a code switch, I have no doubt that Python is a great language; it has many fans. But I see no reason to switch. Sure, it's easy to learn, but the current developers already know PHP. Not to mention PHP is really easy to learn and understand. I mean really, try and tell me PHP is hard or obfuscated or difficult to read or undocumented. [...]
I neither said anything bad about PHP nor suggested changing the code base to Python. If you read my posting, you'll see that I just said I don't agree with your criticism of Python and that Python is a cool programming language, that's all :-)
best regards, Marco
wikitech-l@lists.wikimedia.org