On 10/26/07, Steve Sanbeg ssanbeg@ask.com wrote:
On Fri, 26 Oct 2007 15:05:44 -0400, Simetrical wrote:
On 10/26/07, Steve Sanbeg ssanbeg@ask.com wrote:
That depends on a number of things. Twelve passes in C is certainly a *lot* faster than twelve passes in PHP. Remember that the difference engine used to be one of the slowest components of MediaWiki, until it was rewritten (using an identical algorithm) in C++ -- now it's far faster than rendering the exact same page.
My own experience with Perl and C hasn't shown such dramatic differences, though it has shown that some operations scale linearly with the number of passes. I was assuming PHP would be similar, although I haven't benchmarked differences in language or passes for this.
It really depends on what you're doing. If you're doing some simple regex of input data, almost all the heavy lifting is done in C anyway. But the Parser is 5000 lines of PHP code, the most troublesome parts of which are called repeatedly for complicated templates. Computation tends to be between ten and a hundred times faster in C than in interpreted languages, according to various benchmarks, depending on the exact task. The differences in performance when using wikidiff2 versus the built-in diff engine aren't made up.
Of course, there would be many other possible parser optimizations. If templates inserted HTML rather than wikitext, for instance, they could be cached separately from the including articles, so that a header or infobox template wouldn't need to be rerendered every time there was a change to article content. But that would be a major change to functionality, I suspect.
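To make the caching idea concrete, here is a rough sketch in Python (not PHP, and not anything MediaWiki actually does). The `FragmentCache` class, the `render` callback and the revision-based key are all hypothetical names invented for illustration. The only point is that a template rendered to HTML could be stored under a key that depends on the template and its arguments, not on the including article.

```python
import hashlib


class FragmentCache:
    """Hypothetical cache of rendered template HTML.

    Entries are keyed by template name, template revision and arguments,
    so an edit to the including article does not invalidate them.
    """

    def __init__(self, render):
        # render is an assumed callable: (name, args) -> HTML string
        self._render = render
        self._store = {}
        self.misses = 0

    def _key(self, name, revision, args):
        # Sort the arguments so that argument order does not change the key.
        raw = repr((name, revision, sorted(args.items())))
        return hashlib.sha1(raw.encode("utf-8")).hexdigest()

    def get(self, name, revision, args):
        key = self._key(name, revision, args)
        if key not in self._store:
            # Only render when this exact template/revision/arguments
            # combination has not been seen before.
            self.misses += 1
            self._store[key] = self._render(name, args)
        return self._store[key]
```

With something like this, an infobox would be rerendered only when the template's revision or its arguments change. As noted, that only works if templates produce HTML rather than wikitext, which is the major functional change.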
The number of individual characters that are significant to wiki markup is actually fairly small. Changing the parser to one pass would significantly alter the language in a lot of cases. But I still think if we could do it in three or so passes it would be faster, even if we did have to deal with dozens, or even hundreds, of individual characters.
So preg_split on every significant character, and iterate through each of those? Maybe. I'm really overstepping my expertise by venturing to comment much here.
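For what it's worth, the split-and-iterate idea might look roughly like the following. It is written in Python as an analogue of PHP's preg_split with PREG_SPLIT_DELIM_CAPTURE. The `SIGNIFICANT` character set and the `tokenize` name are assumptions made up for this sketch, not the real list of markup characters or any existing parser function.

```python
import re

# Assumed, illustrative set of markup-significant characters;
# the real set for wiki markup would differ.
SIGNIFICANT = "[]{}|='*#:<>"

# A capturing group makes re.split keep the delimiters in its result.
_SPLITTER = re.compile("([" + re.escape(SIGNIFICANT) + "])")


def tokenize(text):
    """Split text into runs of plain text and single significant characters."""
    # re.split yields empty strings between adjacent delimiters; drop them.
    return [tok for tok in _SPLITTER.split(text) if tok != ""]


def walk(text):
    """Iterate once over the tokens, tagging each as markup or plain text."""
    for tok in tokenize(text):
        kind = "markup" if len(tok) == 1 and tok in SIGNIFICANT else "text"
        yield kind, tok
```

Here `tokenize("{{foo|bar}}")` gives `['{', '{', 'foo', '|', 'bar', '}', '}']`, and a pass would then loop over `walk(text)` and act only on the markup tokens. Whether this beats several regex passes in PHP would still need benchmarking.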
The side effect might be that large classes of those spaghetti templates become inoperable.
Which is really the idea, isn't it? It's not what I'd call a side effect; the point is to kill them.