On Thu, Jul 27, 2006 at 12:05:57AM -0600, Chad Perrin wrote:
Please don't construe this statement as a veiled snipe at PHP (or Java, or any other language). It's just an observation of fact.
I should help stab someone for choosing PHP, too, for that matter -- but we work with what we've got.
Unthreaded: in a clear field, Chad, what *would* you have implemented MediaWiki in? And why?
(Thread-kill is your friend, folks...)
Cheers, -- jra
Jay R. Ashworth wrote:
Unthreaded: in a clear field, Chad, what *would* you have implemented MediaWiki in? And why?
I can't speak for other people, but I would do the parser in C (using lex/yacc) and the rest in Perl.
For the parser, I'm sure the "why" is self-evident. For the rest, the answer to "why" unfortunately includes "I don't know any Python or Ruby". All I know is that PHP is unsuitable for anything larger than a quick hack.
Timwi
Timwi wrote:
For the parser, I'm sure the "why" is self-evident. For the rest, the answer to "why" unfortunately includes "I don't know any Python or Ruby". All I know is that PHP is unsuitable for anything larger than a quick hack.
Delighted to see that you've called Yahoo! a quick hack. ;-)
On Thu, Jul 27, 2006 at 08:40:35PM -0400, Edward Z. Yang wrote:
Timwi wrote:
For the parser, I'm sure the "why" is self-evident. For the rest, the answer to "why" unfortunately includes "I don't know any Python or Ruby". All I know is that PHP is unsuitable for anything larger than a quick hack.
Delighted to see that you've called Yahoo! a quick hack. ;-)
He said it was "unsuitable for a quick hack", not that people don't use it for anything more than a quick hack.
I (mostly) agree.
Chad Perrin wrote:
He said it was "unsuitable for a quick hack", not that people don't use it for anything more than a quick hack.
I (mostly) agree.
Not sure if I'm interpreting you correctly: while you're picking holes in my logic (correct on all counts), you agree with the basic gist?
On Fri, Jul 28, 2006 at 07:26:15PM -0400, Edward Z. Yang wrote:
Not sure if I'm interpreting you correctly: while you're picking holes in my logic (correct on all counts), you agree with the basic gist?
I mostly agree that PHP is unsuitable for anything larger than a quick hack. That doesn't mean it doesn't get used for things larger than a quick hack. That's the point I was trying to make.
On Fri, Sep 01, 2006 at 05:04:15PM -0700, Jeff Carr wrote:
On 07/28/06 16:33, Chad Perrin wrote:
I mostly agree that PHP is unsuitable for anything larger than a quick hack. That doesn't mean it doesn't get used for things larger than a quick hack. That's the point I was trying to make.
Unfortunately your point is wrong.
Your argument is compelling. I yield the field to your superior debate acumen.
. . . months later.
(Timwi timwi@gmx.net):
Unthreaded: in a clear field, Chad, what *would* you have implemented MediaWiki in? And why?
I can't speak for other people, but I would do the parser in C (using lex/yacc) and the rest in Perl.
For the parser, I'm sure the "why" is self-evident. For the rest, the answer to "why" unfortunately includes "I don't know any Python or Ruby". All I know is that PHP is unsuitable for anything larger than a quick hack.
Just for the record, I probably would have done the whole thing in Java except that PHP had two advantages I couldn't ignore: first, it let me get the whole thing up and running in 2-3 weeks at a time when the existing wiki was in serious meltdown and had to be fixed quickly; second, it allowed me to replace some of the existing code incrementally for easier testing.
On Sat, Aug 26, 2006 at 11:15:15PM -0500, Lee Daniel Crocker wrote:
Just for the record, I probably would have done the whole thing in Java except that PHP had two advantages I couldn't ignore: first, it let me get the whole thing up and running in 2-3 weeks at a time when the existing wiki was in serious meltdown and had to be fixed quickly; second, it allowed me to replace some of the existing code incrementally for easier testing.
Well let us all give thanks for meltdown, then. :-)
Not the person I expected a reply from, but welcome back, I guess.
Cheers, -- jr "if you were, y'know, gone" a
Unthreaded: in a clear field, Chad, what *would* you have implemented MediaWiki in?
Dunno about Chad, but I would have implemented it in Perl, of course. With perhaps bison and Inline::* where needed.
And why?
That's a loaded question. :) More experienced people familiar with the language available for development. Namespaces. A mature database API. No php.ini mess. Unicode. Lexical variables. Real hashes. "use strict". Consistent naming, use of case, and return values. The ability to use qq{}. Perldoc[1]. Real references and data structures. Good comparison operators. XS. True object orientation.
However, PHP is what we got, and MediaWiki is pretty well written and head and shoulders above 99% of the PHP apps out there. Once I finish Postgres support for MediaWiki, I'll be converting it to Perl. Just don't hold your breath. :)
[1] As I'm writing this, www.php.net appears to be down.
-- Greg Sabino Mullane greg@turnstep.com PGP Key: 0x14964AC8 200607272132 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Greg Sabino Mullane wrote:
[1] As I'm writing this, www.php.net appears to be down.
Down for me too, but the mirrors are working fine. Personally speaking, I think that PHP's manual is /a lot/ better than Perl's.
Everyone knows that Mediawiki should have been written in Javascript. The current implementation is a mess.
=D
On 7/27/06, Edward Z. Yang edwardzyang@thewritingpot.com wrote:
Greg Sabino Mullane wrote:
[1] As I'm writing this, www.php.net appears to be down.
Down for me too, but the mirrors are working fine. Personally speaking, I think that PHP's manual is /a lot/ better than Perl's.
Wikitech-l mailing list Wikitech-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikitech-l
On Thu, Jul 27, 2006 at 08:03:00PM -0700, mboverload wrote:
Everyone knows that Mediawiki should have been written in Javascript. The current implementation is a mess.
Frankly, if it was written end-to-end in Javascript, I (and millions of others) would probably use something else.
On 7/27/06, Chad Perrin perrin@apotheon.com wrote:
On Thu, Jul 27, 2006 at 08:03:00PM -0700, mboverload wrote:
Everyone knows that Mediawiki should have been written in Javascript. The current implementation is a mess.
Frankly, if it was written end-to-end in Javascript, I (and millions of others) would probably use something else.
Don't take this as an insult, because I'm really not sure, but I was really joking. Building MediaWiki in JavaScript is a joke; it's impossible... right? I mean, how would JavaScript write to the database? Maybe you thought I was talking about AJAX?
- mboverload
Don't take this as an insult, because I'm really not sure, but I was really joking. Building MediaWiki in JavaScript is a joke; it's impossible... right? I mean, how would JavaScript write to the database? Maybe you thought I was talking about AJAX?
You haven't heard about bittorrent? There will be no need for the database!!!
(BTW, server side javascript, and various mutations, exists too).
On Fri, Jul 28, 2006 at 01:42:45PM +0300, Domas Mituzas wrote:
Don't take this as an insult, because I'm really not sure, but I was really joking. Building MediaWiki in JavaScript is a joke; it's impossible... right? I mean, how would JavaScript write to the database? Maybe you thought I was talking about AJAX?
You haven't heard about bittorrent? There will be no need for the database!!!
(BTW, server side javascript, and various mutations, exists too).
Eww. After reading that, I think my brain needs a shower.
Chad Perrin wrote:
On Fri, Jul 28, 2006 at 01:42:45PM +0300, Domas Mituzas wrote:
(BTW, server side javascript, and various mutations, exists too).
Eww. After reading that, I think my brain needs a shower.
Modern JavaScript (okay, ECMAScript) is actually a pretty nice language. It's just the browser implementations that generally suck. But it's certainly no worse than PHP.
On Fri, Jul 28, 2006 at 06:05:31PM +0300, Ilmari Karonen wrote:
Chad Perrin wrote:
On Fri, Jul 28, 2006 at 01:42:45PM +0300, Domas Mituzas wrote:
(BTW, server side javascript, and various mutations, exists too).
Eww. After reading that, I think my brain needs a shower.
Modern JavaScript (okay, ECMAScript) is actually a pretty nice language. It's just the browser implementations that generally suck. But it's certainly no worse than PHP.
Oh, it's almost certainly better than PHP. My brain still needs a shower -- unless this means that all PHP will now magically be replaced with server-side Javascript, in which case we might have a net win.
"Domas Mituzas" wrote:
Don't take this as an insult, because I'm really not sure, but I was really joking. Building MediaWiki in JavaScript is a joke; it's impossible... right? I mean, how would JavaScript write to the database? Maybe you thought I was talking about AJAX?
You haven't heard about bittorrent? There will be no need for the database!!!
Well, this was meant as a joke, but it exists! It's a mix of wiki and peer-to-peer. You pass to your contacts the articles you think are good and block the bad ones. Complete anarchy, impractical for Wikipedia; we would lose our benevolent dictator ;-)
Platonides wrote:
Well, this was meant as a joke, but it exists! It's a mix of wiki and peer-to-peer. You pass to your contacts the articles you think are good and block the bad ones. Complete anarchy, impractical for Wikipedia; we would lose our benevolent dictator ;-)
Hoi, Actually, the software for Wikipedia in a peer-to-peer environment has already largely been written. The part that still needs doing is the modelling of the distribution given an evolving demand for content. To do this, a GRID network is available to emulate the network and the algorithms involved. Running this emulation requires traffic data, and there is a tool that can collect our traffic data in real time. What remains is the implementation of all this... It can be implemented once there is a decision that such an exercise is not only of great academic interest but also has great potential benefit.
PS this tool can also give us statistics of the traffic to our services ..
Thanks, GerardM
On 7/28/06, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, Actually, the software for Wikipedia in a peer to peer environment has already largely been written..
Really? Where can I download it?
On Fri, Jul 28, 2006 at 03:39:35AM -0700, mboverload wrote:
Don't take this as an insult, because I'm really not sure, but I was really joking. Building MediaWiki in JavaScript is a joke; it's impossible... right? I mean, how would JavaScript write to the database? Maybe you thought I was talking about AJAX?
I had the impression it was probably a joke, but decided to answer the hypothetical unasked question anyway -- and it would be "impossible" to write the whole thing as it currently exists in Javascript, with current implementations of Javascript, but it's not impossible to create an online encyclopedia without any server-side scripting for the website.
On Fri, Jul 28, 2006 at 01:34:31AM -0000, Greg Sabino Mullane wrote:
Unthreaded: in a clear field, Chad, what *would* you have implemented MediaWiki in?
Dunno about Chad, but I would have implemented it in Perl, of course. With perhaps bison and Inline::* where needed.
That's actually pretty much my answer.
And why?
That's a loaded question. :) More experienced people familiar with the language available for development. Namespaces. A mature database API. No php.ini mess. Unicode. Lexical variables. Real hashes. "use strict". Consistent naming, use of case, and return values. The ability to use qq{}. Perldoc[1]. Real references and data structures. Good comparison operators. XS. True object orientation.
Those are pretty much my reasons. Are you preplagiarizing me? (ahem)
However, PHP is what we got, and MediaWiki is pretty well written and head and shoulders above 99% of the PHP apps out there. Once I finish Postgres support for MediaWiki, I'll be converting it to Perl. Just don't hold your breath. :)
Ditto that first sentence. Cheers and applause for the second. I wasn't planning to, with regard to the third.
As for original content, rather than just metoos . . .
Another option, besides Perl, that appeals to me is Ruby -- and for many of the same reasons as Perl (though it lags in some areas, such as number of available developers and volume of existing code that could be used). It also has some benefits that distinguish it from Perl, such as a far better syntax for its object model and a tendency to encourage readable code more readily (that doesn't mean you can't write equally readable Perl, just that the language's syntax tends to "encourage" it more than Perl's). Both languages have thoroughly excellent regex engines, with Perl's having perhaps a slight advantage, easily made up by Ruby's facility with iteration. On the other hand, there's the simple fact that Perl execution performance kicks butt all over Ruby, and perhaps every other high-level, reasonably dynamic, comparable language, for most purposes.
If I knew enough of some Lisp to be functional (ha ha, 'scuse the pun), I might lean in that direction as well.
I like the proposed idea of writing one or two core, high-load components in C, or even (if we're really adventurous) something better performing like Ada, though that's probably really pushing it. Since I don't much enjoy looking at C code, though, I might just ask someone else to write the C, so I guess *I* in particular might not implement any of it in C. Eh.
Perl really strikes me as the clear winner, overall, with Ruby a close second about a hair's-breadth behind it.
Chad Perrin wrote:
Perl really strikes me as the clear winner
I'm really reluctant to get involved in this discussion, but I'll pitch in here. In practice, I've found that the only way to keep shared Perl codebases from turning into a heaping pile of craptitude is to have them hacked on exclusively by people with pretty deep Perl expertise, which tends to not be the case for (most) open source projects that aren't centered around the language. Your mileage may certainly vary.
FWIW, after reaching the second to last stage of Nat Torkington's 'Seven stages of a Perl programmer,' and using Perl as a primary language for more years than I care to count, I've personally found Zen in Python.
On 7/28/06, Ivan Krstic krstic@solarsail.hcs.harvard.edu wrote:
FWIW, after reaching the second to last stage of Nat Torkington's 'Seven stages of a Perl programmer,' and using Perl as a primary language for more years than I care to count, I've personally found Zen in Python.
I googled this and found references to a seven stages by Tom Christiansen...
http://prometheus.frii.com/~gnat/yapc/2000-stages/slide1.html
According to that, you have not yet rewritten major parts of the compiler, or become on first-name terms with Larry's wife.
Steve
Steve Bennett wrote:
I googled this and found references to a seven stages by Tom Christiansen...
Gnat gave the (relatively famous) YAPC talk, though the original 'seven stages' observation was by Tom, yes.
english
Ivan Krstic krstic@solarsail.hcs.harvard.edu wrote: Steve Bennett wrote:
I googled this and found references to a seven stages by Tom Christiansen...
Gnat gave the (relatively famous) YAPC talk, though the original 'seven stages' observation was by Tom, yes.
Hello, I would like to learn the English language for strictly professional reasons.
djafer3107 maatallah djafer3107@yahoo.fr wrote: english
Ivan Krstic wrote: Steve Bennett wrote:
I googled this and found references to a seven stages by Tom Christiansen...
Gnat gave the (relatively famous) YAPC talk, though the original 'seven stages' observation was by Tom, yes.
Moin,
On Friday 28 July 2006 08:44, Chad Perrin wrote: [snip quite a bit]
Perl really strikes me as the clear winner, overall, with Ruby a close second about a hair's-breadth behind it.
<offtopic>
I don't know much about Ruby, except that I heard its Unicode support is really lacking. Which pretty much rules it out for any serious text processing in this age :-D
However, the entire Ruby project always struck me as a me-too-lets-reinvent-the-wheel-and-this-time-make-it-rounder project, like so many others (*cough*Perl6*cough*).
Yes, Perl5 has some problems, like carrying baggage from a decade or two that nobody really needs anymore, but I am not sure that yet-another-interpreted-language (that is only 60..90% complete, undertested etc) is the real answer. It just fragments the coder base even more.
We have way too many programming languages already.
Best wishes,
Tels
-- Signed on Fri Jul 28 12:35:59 2006 with key 0x93B84C15. Visit my photo gallery at http://bloodgate.com/photos/ PGP key on http://bloodgate.com/tels.asc or per email.
"Wo die Schoschonen schön wohnen." ("Where the Shoshone live so finely.")
On 7/28/06, Carlos angus@quovadis.com.ar wrote:
Tels wrote:
I don't know much about Ruby, except that I heard its Unicode support is really lacking. Which pretty much rules it out for any serious text processing in this age :-D
Does MediaWiki do any "serious text processing"?
In case you didn't know, MediaWiki works with several non-English languages, which are more or less impossible to support unless your programming language understands Unicode.
henna
henna wrote:
On 7/28/06, Carlos angus@quovadis.com.ar wrote:
Tels wrote:
I don't know much about Ruby, except that I heard its Unicode support is really lacking. Which pretty much rules it out for any serious text processing in this age :-D
Does MediaWiki do any "serious text processing"?
In case you didn't know, MediaWiki works with several non-English languages, which are more or less impossible to support unless your programming language understands Unicode.
And how is it possible now?
Tip: investigate before answering (search for 'xc0' in the code tree).
"Carlos" wrote:
In case you didn't know, MediaWiki works with several non-English languages, which are more or less impossible to support unless your programming language understands Unicode.
And how is it possible now?
Tip: investigate before answering (search for 'xc0' in the code tree).
MediaWiki uses PHP's binary strings, in which you can store anything, Unicode or not.
Unicode support in PHP is also lacking; you can't save a PHP script as UTF-8 or even UTF-16. But a programming language more or less does not even need to support it, as Unicode is designed to be compatible with older charsets. You can also handle strings as binary data, no problem then.
For PHP there is also a multibyte extension, and there are some functions available which allow you to convert between ISO-8859-X and UTF-8 (for XML support, e.g.). I don't know how far MediaWiki makes use of such features, as it uses UTF-8 only, I guess, which is compatible, and therefore there is no need for PHP to be "compliant" or something like that.
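For the curious, the ISO-8859-X to UTF-8 conversion being described (what PHP exposes via iconv() or the mbstring functions) can be sketched in a few lines; Python here purely for illustration, since it handles both encodings in its standard library:

```python
# "café" in Latin-1 is a single 0xE9 byte for the accented character;
# the same character in UTF-8 becomes the two-byte sequence C3 A9.
latin1 = "café".encode("iso-8859-1")                # b'caf\xe9'
utf8 = latin1.decode("iso-8859-1").encode("utf-8")  # b'caf\xc3\xa9'
print(utf8.decode("utf-8"))                         # café
```

The round trip is lossless because every ISO-8859-1 code point has a Unicode equivalent, which is the compatibility the post refers to.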
On Mon, Jul 31, 2006 at 04:46:58PM +0200, Warhog (aja Julian Fleischer) wrote:
Unicode support in PHP is also lacking; you can't save a PHP script as UTF-8 or even UTF-16. But a programming language more or less does not even need to support it, as Unicode is designed to be compatible with older charsets. You can also handle strings as binary data, no problem then.
For PHP there is also a multibyte extension, and there are some functions available which allow you to convert between ISO-8859-X and UTF-8 (for XML support, e.g.). I don't know how far MediaWiki makes use of such features, as it uses UTF-8 only, I guess, which is compatible, and therefore there is no need for PHP to be "compliant" or something like that.
PHP is supposedly planning to incorporate Python's ICU, which has some reasonable Unicode support for regexen, at some point in the future. Ruby is reportedly integrating Oniguruma (a regular expression engine) by the end of the year, which will apparently provide substantial Unicode support -- though Oniguruma can be used now as an external library, of course, and someone started working on ICU support for Ruby a while ago too (as an external library -- though of course it's an external library in Python too). Perl, of course, probably has several dozen ways to support Unicode in CPAN.
. . . but as far as I'm aware, there's no such thing as a language that provides full native Unicode support. The best we could do is use an external library, which is something you can do with Ruby anyway.
On 7/31/06, Chad Perrin perrin@apotheon.com wrote:
. . . but as far as I'm aware, there's no such thing as a language that provides full native Unicode support.
It appears PHP 6 will: http://www.zend.com/zend/week/php-unicode-design.txt.
Chad Perrin wrote:
PHP is supposedly planning to incorporate Python's ICU, which has some reasonable Unicode support for regexen, at some point in the future.
PHP already has unicode regex support, because PCRE has had it for some time and PHP just bundles that. In fact, the simplest way to split a UTF-8 string by character in PHP 4-5 with no mbstring is to do preg_match_all('/./u',...). MediaWiki uses this on occasion.
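The same per-character split can also be done without a regex engine at all, relying on the UTF-8 property that continuation bytes always match the bit pattern 10xxxxxx (presumably what the 'xc0' masks mentioned earlier in the thread exploit). A minimal illustration in Python, working on raw bytes the way PHP's binary strings do:

```python
def split_utf8_chars(data: bytes) -> list[bytes]:
    # A byte starts a new character iff (b & 0xC0) != 0x80,
    # i.e. it is not a 10xxxxxx continuation byte.
    chars = []
    for b in data:
        if (b & 0xC0) != 0x80 or not chars:
            chars.append(bytes([b]))
        else:
            chars[-1] += bytes([b])  # glue continuation byte onto current char
    return chars

print(split_utf8_chars("naïve".encode("utf-8")))
# [b'n', b'a', b'\xc3\xaf', b'v', b'e']
```

This is exactly why UTF-8 can be processed by a language with no native Unicode type: character boundaries are recoverable from the bytes alone.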
In PHP 6, they are moving to a 16-bit character type (not sure if it's UTF-16 or UCS-2), with a distinct binary string type. If "unicode semantics" are enabled, string literals will be unicode by default, and all the usual string operations would be character-wise. I dare say this would cause some backwards compatibility problems for applications such as MediaWiki.
PHP 6 requires ICU for its internal unicode support, but I'm not sure to what extent they will be providing interfaces to ICU's more complex functions. Note that ICU is not "Python's ICU", it's a library written by IBM which is natively C, C++ and Java. There is a set of swig wrappers to bind the C++ API to Python.
-- Tim Starling
On Fri, Jul 28, 2006 at 12:40:00PM +0200, Tels wrote:
On Friday 28 July 2006 08:44, Chad Perrin wrote: [snip quite a bit]
Perl really strikes me as the clear winner, overall, with Ruby a close second about a hair's-breadth behind it.
<offtopic>
I don't know much about Ruby, except that I heard its Unicode support is really lacking. Which pretty much rules it out for any serious text processing in this age :-D
"Really lacking" seems to be a bit too vehement for the current state of affairs. As far as I'm aware, regex support is excellent. Some kernel methods don't handle Unicode transformations as well as they could, though the problem isn't in Unicode support so much as in localization (handling capitalization in German and Turkish, for instance, which are a touch quirky by English language standards). I'm pretty sure Ruby isn't the only language to run into localization issues from time to time, and it may be that Ruby runs into them more often because of its wealth of convenient string operation methods that are often lacking in other languages.
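The German and Turkish quirks mentioned above are easy to demonstrate concretely; Python is used here only for illustration, since the underlying Unicode casing rules are language-independent:

```python
# German: the sharp s uppercases to "SS", so case round-trips are lossy.
print("straße".upper())    # STRASSE
print("STRASSE".lower())   # strasse -- the ß does not come back
# casefold() exists precisely for caseless matching across such quirks:
print("straße".casefold() == "strasse")  # True
# Turkish: under Turkish rules "I" should lowercase to dotless "ı",
# but built-in case mapping is locale-independent, so you get:
print("I".lower())         # i  (wrong for Turkish text)
```

Handling these correctly needs locale-aware tailoring (e.g. an ICU-style library), which is the localization problem rather than a Unicode storage problem.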
However, the entire Ruby project always struck me as a me-too-lets-reinvent-the-wheel-and-this-time-make-it-rounder project, like so many others (*cough*Perl6*cough*).
1. That could be said of EVERY language. I suppose we could just rewrite MediaWiki in Assembly language, after all.
2. Ruby is quite NOT a "metoo" language. It's older than most realize: spending most of its formative years in Japan and only gaining Western recognition with the advent of Rails does not equate to a lifespan measurable entirely in this century. It also offers a lot of capabilities and conveniences not found in many other languages. It may be the closest thing we've got to a Lisp with an imperative/OO syntax, for instance.
3. Perl 6 is an attempt to address some very real problems, not just a "reinvent and make it rounder" effort as you so easily dismiss it. The fact it's vaporware doesn't change the fact that, if it ever completes, it's likely to be an excellent and needed language. On the other hand, I suppose all we really need is a hex editor so we're not reinventing wheels.
Yes, Perl5 has some problems, like carrying baggage from a decade or two that nobody really needs anymore, but I am not sure that yet-another-interpreted-language (that is only 60..90% complete, undertested etc) is the real answer. It just fragments the coder base even more.
Did you perhaps mean "99.60..99.90%" there?
We have way too many programming languages already.
Yeah, progress sucks. Ahem.
On Fri, Jul 28, 2006 at 12:44:33AM -0600, Chad Perrin wrote:
I like the proposed idea of writing one or two core, high-load components in C, or even (if we're really adventurous) something better performing like Ada, though that's probably really pushing it. Since I don't much enjoy looking at C code, though, I might just ask someone else to write the C, so I guess *I* in particular might not implement any of it in C. Eh.
We actually do this. The diff engine exists as a PHP plugin written in C (or C++, which is nearly the same). Tim is working on another plugin that will do much faster translations of UTF-8 strings from e.g. traditional to simplified Chinese.
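In case it helps picture the approach: script conversion of this sort generally amounts to longest-match replacement over a mapping table. A toy sketch in Python, with made-up table entries, purely to show the shape of the algorithm (the real plugin is C operating on UTF-8 byte strings, and MediaWiki's actual tables are far larger):

```python
# Illustrative traditional -> simplified entries; NOT MediaWiki's real tables.
T2S = {"體": "体", "計算機": "计算机"}

def convert(text: str, table: dict[str, str]) -> str:
    out, i = [], 0
    keys = sorted(table, key=len, reverse=True)  # prefer the longest match
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(table[k]); i += len(k); break
        else:
            out.append(text[i]); i += 1              # no mapping: copy as-is
    return "".join(out)

print(convert("計算機好", T2S))  # 计算机好
```

Multi-character keys are why longest-match matters: converting character by character would mangle compound words that map as a unit.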
Regards,
jens
On Fri, Jul 28, 2006 at 06:39:12PM +0200, Jens Frank wrote:
On Fri, Jul 28, 2006 at 12:44:33AM -0600, Chad Perrin wrote:
I like the proposed idea of writing one or two core, high-load components in C, or even (if we're really adventurous) something better performing like Ada, though that's probably really pushing it. Since I don't much enjoy looking at C code, though, I might just ask someone else to write the C, so I guess *I* in particular might not implement any of it in C. Eh.
We actually do this. The diff engine exists as a PHP plugin written in C (or C++, which is nearly the same).
. . . aside from the fact that it often exhibits an at least arithmetic increase in execution time as compared with straight C.
Sorry, minor, largely pointless quibble. There's a bit of a language execution time debate going on somewhere else that has absorbed me recently. The subject is sorta stuck in my head at the moment.
On Fri, Jul 28, 2006 at 10:59:40AM -0600, Chad Perrin wrote:
. . . aside from the fact that it often exhibits an at least arithmetic increase in execution time as compared with straight C.
Sorry, minor, largely pointless quibble.
It's not pointless when you're putting out >100Mbps continuously.
Cheers, -- jra
Chad Perrin wrote:
On Fri, Jul 28, 2006 at 06:39:12PM +0200, Jens Frank wrote:
[...]
We actually do this. The diff engine exists as a PHP plugin written in C (or C++, which is nearly the same).
. . . aside from the fact that it often exhibits an at least arithmetic increase in execution time as compared with straight C.
Sorry, minor, largely pointless quibble. There's a bit of a language execution time debate going on somewhere else that has absorbed me recently. The subject is sorta stuck in my head at the moment.
Through my aborted PhD, I have a couple of years experience writing highly efficient C++ programs. The aforementioned wikidiff2 extension does no memory allocation during the computational phase, instead of substring operations it passes around start and end iterators. There are no virtual functions. The only remaining performance hit is the need to pass "this" pointers and dereference them, although that's a feature of structured C programming as well, and provides a great deal of flexibility. The performance penalty can be controlled through the extensive use of inline functions.
There are certain programming styles which are popular in C++, which do produce a significant performance hit. That's not an issue when your main goal is to produce high-performance code -- in that case you can easily avoid the pitfalls, and the remaining optimisation issues are largely the same as in C.
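The iterator-passing style described above can be sketched in miniature; plain integer indices stand in for C++ iterators here (Python used only for illustration):

```python
# Find the length of the common prefix of two string ranges without
# creating any substring copies: only (start, end) bounds are passed,
# mirroring the begin/end iterator pairs wikidiff2 passes around.
def common_prefix_len(a: str, b: str, a_start: int, a_end: int,
                      b_start: int, b_end: int) -> int:
    n = 0
    while (a_start + n < a_end and b_start + n < b_end
           and a[a_start + n] == b[b_start + n]):
        n += 1
    return n

print(common_prefix_len("hello world", "help me", 0, 11, 0, 7))  # 3
```

In C++ the payoff is that no allocation happens in the hot loop; the same discipline is what keeps the computational phase of the diff allocation-free.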
-- Tim Starling
On the subject of diff efficiency, which button is more expensive, Show Preview or Show Changes? I have an ultraparanoid bot which "presses" one or both before actually submitting, and it occurs to me that it would be marginally easier on the servers if the bot used the cheaper one first. (Though of course only if substantial numbers of its doublechecks failed, which they typically don't.)
Show changes has to compare two versions, so I would guess that's the bigger hog
On 7/30/06, Steve Summit scs@eskimo.com wrote:
On the subject of diff efficiency, which button is more expensive, Show Preview or Show Changes? I have an ultraparanoid bot which "presses" one or both before actually submitting, and it occurs to me that it would be marginally easier on the servers if the bot used the cheaper one first. (Though of course only if substantial numbers of its doublechecks failed, which they typically don't.)
Steve Summit wrote:
On the subject of diff efficiency, which button is more expensive, Show Preview or Show Changes? I have an ultraparanoid bot which "presses" one or both before actually submitting, and it occurs to me that it would be marginally easier on the servers if the bot used the cheaper one first. (Though of course only if substantial numbers of its doublechecks failed, which they typically don't.)
For whole pages, generating a diff takes about 2ms plus 40ms to load the text to diff against, rendering a page takes about 800ms. So it's cheaper to use "show changes" by a factor of 20. In fact, we've been considering suppressing the "current version" display from diff pages by default, especially for large pages, since rendering the current version constitutes the vast majority of the time it takes to generate those pages. These figures are averages based on the latest profiling; some pages take significantly longer than 800ms.
-- Tim Starling
On 7/30/06, Tim Starling t.starling@physics.unimelb.edu.au wrote:
For whole pages, generating a diff takes about 2ms plus 40ms to load the text to diff against, rendering a page takes about 800ms.
800ms is consistent with what I measured when I was trying to parse all past revisions... but I thought I was doing something wrong at the time, because I couldn't imagine that it was that slow.
On 7/31/06, Tim Starling t.starling@physics.unimelb.edu.au wrote:
For whole pages, generating a diff takes about 2ms plus 40ms to load the text to diff against, rendering a page takes about 800ms. So it's cheaper to use "show changes" by a factor of 20. In fact, we've been considering suppressing the "current version" display from diff pages by default, especially for large pages, since rendering the current version constitutes the vast majority of the time it takes to generate those pages. These figures are averages based on latest profiling, some pages take significantly longer than 800ms.
Now, if only the rendering all took place in JavaScript...!
:)
Steve
Tim Starling wrote:
For whole pages, generating a diff takes about 2ms plus 40ms to load the text to diff against, rendering a page takes about 800ms.
I'd gotten the impression that rendering was slower. Thanks for confirming.
So it's cheaper to use "show changes" by a factor of 20.
That much!
In fact, we've been considering suppressing the "current version" display from diff pages...
But until you do, I guess the point (i.e. which button a conscientious bot or user should press first) is moot, since the slow preview button takes 800 ms, and the fast diff button takes 40+800ms.
On 7/31/06, Steve Summit scs@eskimo.com wrote:
But until you do, I guess the point (i.e. which button a conscientious bot or user should press first) is moot, since the slow preview button takes 800 ms, and the fast diff button takes 40+800ms.
Nope, because "show changes" doesn't render the page. It just shows you the diff.
Simetrical wrote:
On 7/31/06, Steve Summit scs@eskimo.com wrote:
Tim Starling wrote:
In fact, we've been considering suppressing the "current version" display from diff pages...
But until you do, I guess the point (i.e. which button a conscientious bot or user should press first) is moot, since the slow preview button takes 800 ms, and the fast diff button takes 40+800ms.
Nope, because "show changes" doesn't render the page. It just shows you the diff.
(smacks forehead)
I fell for Tim's sly misdirection utterly. :-) He was talking about, and just there I was thinking of, the diffs run off of the history page, not the edit page.
How 'bout we get Cray to donate a supercomputer and we just have that do the diffs. =D
On 7/31/06, Steve Summit scs@eskimo.com wrote:
Simetrical wrote:
On 7/31/06, Steve Summit scs@eskimo.com wrote:
Tim Starling wrote:
In fact, we've been considering suppressing the "current version" display from diff pages...
But until you do, I guess the point (i.e. which button a conscientious bot or user should press first) is moot, since the slow preview button takes 800 ms, and the fast diff button takes 40+800ms.
Nope, because "show changes" doesn't render the page. It just shows you the diff.
(smacks forehead)
I fell for Tim's sly misdirection utterly. :-) He was talking about, and just there I was thinking of, the diffs run off of the history page, not the edit page.
On 8/1/06, mboverload mboverload@gmail.com wrote:
How 'bout we get Cray to donate a supercomputer and we just have that do the diffs. =D
We'll get a few and give them typewriters, then they can go ahead and write the encyclopaedia. Easier to clean up after than all those monkeys.
Hi!
Dunno about Chad, but I would have implemented it in Perl, of course. With perhaps bison and Inline::* where needed.
Do I have still time to suggest ObjectPerlthonasskelisp++#.net?
Anyway, languages/environments used by bigger web shops:
PHP (Yahoo, Wikipedia!!!)
Python (parts of Google?)
Java (um, around)
ColdFusion (MySpace :-)
Amazon has SOA with stuff around implemented in nearly all languages... Quite nice approach is Mono with different languages inside (like.. Python engine by Microsoft ;-)

As for personal preferences, I'd run away from Perl (did drop it 5 years ago), and use Python (with psyco \o/ )
BR, Domas
On Fri, Jul 28, 2006 at 10:46:36AM +0300, Domas Mituzas wrote:
Hi!
Dunno about Chad, but I would have implemented it in Perl, of course. With perhaps bison and Inline::* where needed.
Do I have still time to suggest ObjectPerlthonasskelisp++#.net?
Anyway, languages/environments used by bigger web shops:
PHP (Yahoo, Wikipedia!!!)
Python (parts of Google?)
Java (um, around)
ColdFusion (MySpace :-)
Perl (Slashdot)
Don't forget where the term "the slashdot effect" originated.
Amazon has SOA with stuff around implemented in nearly all languages... Quite nice approach is Mono with different languages inside (like.. Python engine by Microsoft ;-) As for personal preferences, I'd run away from Perl (did drop it 5 years ago), and use Python (with psyco \o/ )
Python makes my eyes bleed. I'd be much happier with Perl -- ESPECIALLY for anything involving regexen. Holy mudder o' gob, but Python's regex syntax is a fork in the eye, comparable in heinousness to PHP's.
On 7/28/06, Domas Mituzas midom.lists@gmail.com wrote:
Hi!
Dunno about Chad, but I would have implemented it in Perl, of course. With perhaps bison and Inline::* where needed.
Do I have still time to suggest ObjectPerlthonasskelisp++#.net?
Anyway, languages/environments used by bigger web shops:
PHP (Yahoo, Wikipedia!!!)
Python (parts of Google?)
google does indeed use python for stuff, as well as C if I'm not mistaken
henna
henna wrote:
google does indeed use python for stuff, as well as C if I'm not mistaken
Java, Python, C++ are the three most common languages at Google, in that order, according to Greg Stein in '05.
On Fri, Jul 28, 2006 at 10:46:36AM +0300, Domas Mituzas wrote:
Do I have still time to suggest ObjectPerlthonasskelisp++#.net?
That's sorta like the language name version of Chriskwaanzukkah, right?
Python (parts of Google?)
And this is the dog that didn't bark; I was a bit surprised that we didn't hear any more comments about Python as a possibility.
Ok, second question: how much leverage would we *lose* if we weren't in PHP? (Isn't some of our caching PHP-specific?)
Cheers, -- jra
On Fri, Jul 28, 2006 at 11:41:42AM -0400, Jay R. Ashworth wrote:
On Fri, Jul 28, 2006 at 10:46:36AM +0300, Domas Mituzas wrote:
Do I have still time to suggest ObjectPerlthonasskelisp++#.net?
That's sorta like the language name version of Chriskwaanzukkah, right?
I think that's "Chrismahannukwanzaaka", actually.
Python (parts of Google?)
And this is the dog that didn't bark; I was a bit surprised that we didn't hear any more comments about Python as a possibility.
I'm glad we didn't.
By the way . . . When I mentioned Slashdot as a Perl example, I apparently shot too low. I'd forgotten about Amazon.com, LiveJournal, IMDB, and del.icio.us, among others.
On 7/27/06, Jay R. Ashworth jra@baylink.com wrote:
Unthreaded: in a clear field, Chad, what *would* you have implemented MediaWiki in? And why?
Much of recent development and administration has focused on caching, clustering, failover, load balancing, and so on. It seems to me that the decision to use a ready-made application server like JBoss, Resin or Zope (or WebSphere if you want proprietary) or a different database server would make a greater difference for long term deployment of a very large scale wiki farm like Wikimedia than the choice of a particular programming language (though of course one may imply the other).
That being said, from the rough numbers I've seen about similarly sized sites like eBay or Amazon.com, which typically use such application server architectures, we are running on a ridiculously small amount of hardware. For instance, eBay in 2004 was running with 200 database backend servers [1] -- I don't think you'll find detailed specs for these, but according to [2] we're talking about big Sun machines. Of course, we're not even close to Amazon.com's or eBay's reliability.
Nevertheless, it seems clear that our "roll your own" approach, while more intensive in developer work, can save significantly on hardware. It's also interesting to compare Flickr's technological evolution, which is quite similar to our own: http://www.ludicorp.com/flickr/zend-talk.ppt
Is similar information available about Yahoo!'s setup?
One major downside is that such "perpetually customized" setups can become very complex and hard to replicate quite quickly.
It's also important to emphasize that _Wikimedia_ (as opposed to MediaWiki) runs on more than just PHP. In fact, there's probably not a single mainstream programming language that hasn't been used somewhere on the Wikimedia servers. Brion seems to greatly enjoy experimenting with new languages, and even MediaWiki itself comes with an OCaml extension. :-) It's certainly a rich learning environment.
Erik
[1] http://www.eweek.com/article2/0,1895,1640310,00.asp [2] http://www.sun.com/service/about/success/ebay.xml
Hi!
Much of recent development and administration has focused on caching, clustering, failover, load balancing, and so on.
This is more a matter of architecture than of the software used.
It seems to me that the decision to use a ready-made application server like JBoss, Resin or Zope (or WebSphere if you want proprietary) or a different database server would make a greater difference for long term deployment of a very large scale wiki farm like Wikimedia than the choice of a particular programming language (though of course one may imply the other).
I'm not aware of any huge-scale environments, that would be running on Jboss, Resin or Zope (or Websphere). Application servers with all the integration magic are required mostly for complex applications that have hundreds of thousands of developers ;)
Regarding databases - here again, architecture imposes what you use. Right now our architecture consists of:
a) Small replicated core database sets (per-language)
b) Pools of replicated text storage nodes
c) A pool of lossy in-memory hash store nodes
We will probably also be adding a
d) Pool of fully clustered storage, for session objects.
What we can introduce is different storage paradigms for different objects; here we choose software that works and is easy to maintain.
That being said, from the rough numbers I've seen about similarly sized sites like eBay or Amazon.com, which typically use such application server architectures, we are running on a ridiculously
I'm not totally compatible with all enterprise software, but at least Amazon is using a pretty lightweight setup, with most of the stuff being routed to 'services' around, SOAP, WSDL, yadda yadda. I'm not sure they need a full-blown app server for this at the front.
Nevertheless, it seems clear that our "roll your own" approach, while more intensive in developer work, can save significantly on hardware. It's also interesting to compare Flickr's technological evolution, which is quite similar to our own: http://www.ludicorp.com/flickr/zend-talk.ppt
I'm not sure it takes us longer to roll our own stuff, than it would take to leverage all the other solutions.
Is similar information available about Yahoo!'s setup?
They're quite similar to us, it is just that they have more redundancy over multiple datacenters (and have practice of serving from multiple datacenters at the same time too).
with new languages, and even MediaWiki itself comes with an OCaml extension. :-) It's certainly a rich learning environment.
Well, yes, we even have direct PHP extensions written in C++/C, and there's various outside-of-mediawiki code in boo, python, C#, perl too ;-)
On 7/28/06, Domas Mituzas midom.lists@gmail.com wrote:
I'm not aware of any huge-scale environments, that would be running on Jboss, Resin or Zope (or Websphere).
The open source ones are fairly untested in _huge_-scale environments AFAICT (then again, so was PHP until recently). But for "enterprise level" scalability, see http://www.zope.com/customers/case_studies.html http://www.jboss.com/customers/index
eBay is "powered by WebSphere" and probably a good comparison as a dynamic web application. They're still getting more traffic than we do. ;-)
I'm not totally compatible with all enterprise software, but at least Amazon is using pretty lightweight setup with most of stuff being routed to 'services' around, SOAP, WSDL, yadda yadda. I'm not sure if they need full-blown app server for this at the front.
What I do know is that they're making a bloody mess out of all those aggregated web services. A little tagging here, a little wiki there, here's some recommendations, and a wishlist and a puppy, too ...
Erik
Erik Moeller wrote:
On 7/27/06, Jay R. Ashworth jra@baylink.com wrote:
Unthreaded: in a clear field, Chad, what *would* you have implemented MediaWiki in? And why?
Much of recent development and administration has focused on caching, clustering, failover, load balancing, and so on. It seems to me that the decision to use a ready-made application server like JBoss, Resin or Zope (or WebSphere if you want proprietary) or a different database server would make a greater difference for long term deployment of a very large scale wiki farm like Wikimedia than the choice of a particular programming language (though of course one may imply the other).
One downside to the application-server approach is that, unless carefully written so that use of the application server was optional, it would significantly complicate the reuse potential of MediaWiki by smaller-scale operations. Now, making MediaWiki maximally reusable open-source software is a secondary goal to using it ourselves, but insofar as it's the only reliable renderer of our content, it's not entirely irrelevant.
The current setup has the nice benefit that it is almost ludicrously easy to configure: The only prerequisites are the very standard MySQL and PHP, and then you just untar the source files and run config.php.
-Mark
On Fri, Jul 28, 2006 at 05:36:38PM -0400, Delirium wrote:
The current setup has the nice benefit that it is almost ludicrously easy to configure: The only prerequisites are the very standard MySQL and PHP, and then you just untar the source files and run config.php.
. . . and that is just awesome.
Moin,
On Saturday 29 July 2006 00:41, Chad Perrin wrote:
On Fri, Jul 28, 2006 at 05:36:38PM -0400, Delirium wrote:
The current setup has the nice benefit that it is almost ludicrously easy to configure: The only prerequisites are the very standard MySQL and PHP, and then you just untar the source files and run config.php.
. . . and that is just awesome.
Indeed. It made me painfully aware of how complicated most software is to install, especially the software written by me. Taking a slice out of that and making software easier to install and use is now one of my goals, and MediaWiki is a role model for that.
best wishes,
tels
- -- Signed on Sat Jul 29 02:10:52 2006 with key 0x93B84C15. Visit my photo gallery at http://bloodgate.com/photos/ PGP key on http://bloodgate.com/tels.asc or per email.
"Some spammers have this warped idea that their freedom of speech is guaranteed all the way into my hard drive, but it is my firm belief that their rights end at my firewall." -- Nigel Featherston