I just doubled the speed of the PHP parser.
In my test page ([[Anime]], ~60 links, about half of them broken), I cut the time for replaceInternalLinks from 800ms to 350ms, and the time for Article::view from 1310ms to 610ms.
This was achieved by eliminating redundant calls to secureAndSplit, using static variables for constants, and catering to PHP oddities such as the fact that === is slower than ==.
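To illustrate the first of those points, here is a minimal sketch of one way to avoid re-parsing the same link target: cache the result per string. It is written in Python rather than PHP, and `parse_title`, `count_parses` and `PARSE_CALLS` are hypothetical names. The actual patch may simply have restructured the callers rather than added a cache.

```python
from functools import lru_cache

PARSE_CALLS = 0  # counts how often the expensive parse really runs


@lru_cache(maxsize=None)
def parse_title(raw):
    """Stand-in for an expensive title parse; returns (prefix, name)."""
    global PARSE_CALLS
    PARSE_CALLS += 1
    prefix, sep, name = raw.partition(":")
    return (prefix, name) if sep else ("", raw)


def count_parses(links):
    """Parse every link and report how many real parses were needed."""
    global PARSE_CALLS
    PARSE_CALLS = 0
    parse_title.cache_clear()
    for link in links:
        parse_title(link)  # repeated targets are served from the cache
    return PARSE_CALLS
```

With four links of which three point at the same target, `count_parses(["Anime", "Anime", "Wikipedia:Oops", "Anime"])` performs only two real parses.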
Okay, you may shower me with praise now.
-- Tim Starling.
Tim-
Very nice. Can we get this into stable ASAP?
Regards,
Erik
Fantastic!
-- Neil
Cool! Keep going like this, and I'll throw my C++-parser away ;-)
Magnus
On Wednesday, Oct 22, 2003, at 16:57 US/Pacific, Tim Starling wrote:
This was achieved by eliminating redundant calls to secureAndSplit, using static variables for constants, and catering to PHP oddities such as the fact that === is slower than ==.
Okay, you may shower me with praise now.
*sprinkle sprinkle sprinkle* Three cheers for Tim!
I've backported at least part of the changes to stable; it does make a difference. On a copy of [[List of China-related topics]], with 1994 broken links and 1 live one, the new code gives about a 15% increase in total page load speed -- which is a big difference considering that the page takes about 3.5 seconds to render on my 2GHz Athlon with a single request as the sole load! 16 of those page views in 60 seconds vs 70 seconds is a definite improvement. I'm sure there's more tweaking to be done...
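As a rough check on those figures, the arithmetic can be worked through as below. This assumes that 70 seconds is the old code and 60 seconds the new code for the same 16 page views; the variable names are illustrative only.

```python
views = 16
old_seconds, new_seconds = 70.0, 60.0  # assumed: old code vs. new code

# Fraction of wall-clock time removed by the change.
time_saved = (old_seconds - new_seconds) / old_seconds
# Extra page views served per unit of time.
throughput_gain = old_seconds / new_seconds - 1.0
# Average seconds per page view with the new code.
per_view_new = new_seconds / views

print(f"time saved: {time_saved:.1%}")              # 14.3%
print(f"throughput gain: {throughput_gain:.1%}")    # 16.7%
print(f"seconds per view: {per_view_new:.2f}")      # 3.75
```

Either way of counting lands close to the "about 15%" quoted above.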
(A note: when you're into this many links, the time it takes to load the info out of the link tables can actually be a significant chunk of the render time. More aggressive caching will hopefully render all this moot soon, though...)
Since I'm insane, I've also rewritten the replaceInternalLinks loop, which has been pissing me off for a long time. It now lets secureAndSplit do the link parsing rather than trying to do some of it itself, so the code should be easier to maintain. The rewrite doesn't seem to have made a significant impact on speed either way compared with my initial backport of Tim's bits, but the code's IMHO cleaner and I fixed some bugs while I was in there:
* initial spaces in the title of a link with a namespace are now trimmed, so [[Wikipedia:_Oops]] now maps correctly to [[Wikipedia:Oops]] instead of generating a technically illegal title with initial whitespace in cur_title.
* 'media' is treated as a localizable pseudo-namespace like 'special', and can be adjusted in the language files
* media links should now go into the imagelinks table and show up in the image backlinks
* on the off chance someone makes a link in the form [[:media:foo.jpg]], the link won't turn into a misguided attempt to link to an article called "Media:foo.jpg".
* inline language links in the form [[:fr:lien interwiki]] finally work
* certain illegal links that were vanishing from the output completely are now rendered as plaintext (such as "[[ ]]")
Another behavior change that could be changed back if people don't think it's a good idea:
* on links with the initial colon, the colon now isn't displayed in the default link text
* 'class="internal"' removed from normal inline links, it just wastes bandwidth without doing anything useful
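The link-text behavior described above (an initial colon is not displayed, and an empty link falls back to plaintext) can be sketched roughly as follows. This is an illustration in Python, not the committed PHP, and `default_link_text` is a hypothetical helper name.

```python
def default_link_text(target):
    """Return the text shown for [[target]] when no piped label is given.

    Returns None for an empty target, signalling that the caller should
    emit the original brackets as plain text instead of a link.
    """
    if not target.strip(" _"):
        return None
    # A leading colon forces an inline link; it is not shown to the reader.
    return target[1:] if target.startswith(":") else target
```

So `default_link_text(":fr:lien interwiki")` gives `"fr:lien interwiki"`, while `default_link_text(" ")` gives `None` and the caller would print "[[ ]]" unchanged.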
This is all "bug fixes" in stable... I've committed to cvs but haven't installed it just yet, but unless someone turns up problems in testing I'll install it tomorrow or so when I've got time to look it over and babysit the server for a while after it's online.
The new fixes will need merging into the dev branch along with all the other fixes...
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
I've backported at least part of the changes to stable; it does make a difference. On a copy of [[List of China-related topics]], with 1994 broken links and 1 live one, the new code gives about a 15% increase in total page load speed -- which is a big difference considering that the page takes about 3.5 seconds to render on my 2GHz Athlon with a single request as the sole load! 16 of those page views in 60 seconds vs 70 seconds is a definite improvement. I'm sure there's more tweaking to be done...
(A note: when you're into this many links, the time it takes to load the info out of the link tables can actually be a significant chunk of the render time. More aggressive caching will hopefully render all this moot soon, though...)
Since I'm insane, I've also rewritten the replaceInternalLinks loop, which has been pissing me off for a long time. It now lets secureAndSplit do the link parsing rather than trying to do some of it itself, so the code should be easier to maintain.
That's good. I cut down the number of title-parsing operations from 4 or 5 to 2, and you got it from 2 to 1. It probably didn't impact on speed much because secureAndSplit is slower than the way replaceInternalLinks was doing it. Optimising secureAndSplit will now have a more pronounced effect.
Last night, I moved the first wfProfileIn to the top of Setup.php. It turns out loading code is taking about 30% of the profiled time (for [[Anime]] again). I was able to reduce that figure a little bit by making a few more files conditionally included. But my ISP is down so I couldn't commit it.
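The idea of conditional inclusion can be sketched as below: load an optional module only for the request types that need it, and time the load so it shows up in the profile. This is Python rather than PHP, and the `OPTIONAL_MODULES` table and `load_for_request` are hypothetical, chosen only to make the example runnable.

```python
import importlib
import time

# Hypothetical mapping from request type to the extra module it needs.
OPTIONAL_MODULES = {"diff": "difflib", "export": "csv"}


def load_for_request(kind):
    """Import the optional module for this request kind, timing the import.

    Returns (module, elapsed_ms); plain page views load nothing extra.
    """
    name = OPTIONAL_MODULES.get(kind)
    if name is None:
        return None, 0.0
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, (time.perf_counter() - start) * 1000.0
```

A plain view, `load_for_request("view")`, returns `(None, 0.0)` and pays no loading cost at all.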
Maybe we should try PHPA:
http://www.php-accelerator.co.uk/
The rewrite doesn't seem to have made a significant impact on speed either way compared with my initial backport of Tim's bits, but the code's IMHO cleaner and I fixed some bugs while I was in there:
- initial spaces in the title of a link with a namespace are now trimmed, so [[Wikipedia:_Oops]] now maps correctly to [[Wikipedia:Oops]] instead of generating a technically illegal title with initial whitespace in cur_title.
What about [[Wikipedia:__Oops]]?
- 'media' is treated as a localizable pseudo-namespace like 'special', and can be adjusted in the language files
- media links should now go into the imagelinks table and show up in the image backlinks
- on the off chance someone makes a link in the form [[:media:foo.jpg]], the link won't turn into a misguided attempt to link to an article called "Media:foo.jpg".
- inline language links in the form [[:fr:lien interwiki]] finally work
- certain illegal links that were vanishing from the output completely are now rendered as plaintext (such as "[[ ]]")
Another behavior change that could be changed back if people don't think it's a good idea:
- on links with the initial colon, the colon now isn't displayed in the default link text
- 'class="internal"' removed from normal inline links, it just wastes bandwidth without doing anything useful
All sounds good to me.
-- Tim Starling
On Fri, 24 Oct 2003, Tim Starling wrote:
Maybe we should try PHPA:
We're already using it. :)
- initial spaces in the title of a link with a namespace are now trimmed, so [[Wikipedia:_Oops]] now maps correctly to [[Wikipedia:Oops]] instead of generating a technically illegal title with initial whitespace in cur_title.
What about [[Wikipedia:__Oops]]?
If memory serves, a run of spaces is already collapsed into a single space, so it should be:
'Wikipedia:__Oops' -> 'Wikipedia:_Oops' -> (4, 'Oops')
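That two-step mapping can be sketched as below. This is a Python illustration, not the real secureAndSplit: `split_title` and the `NAMESPACES` table are hypothetical, and only the index 4 for "Wikipedia" is taken from the example above. Unprefixed titles are assumed to fall in namespace 0.

```python
import re

# Hypothetical namespace table; only the 4 for "Wikipedia" comes from the thread.
NAMESPACES = {"wikipedia": 4}


def split_title(raw):
    """Return (namespace_index, title) for a raw link target."""
    # Treat spaces and underscores alike and collapse each run into one underscore.
    text = re.sub(r"[ _]+", "_", raw).strip("_")
    prefix, sep, rest = text.partition(":")
    if sep and prefix.lower() in NAMESPACES:
        # Trim the underscore left over right after the namespace colon.
        return NAMESPACES[prefix.lower()], rest.lstrip("_")
    return 0, text  # assumed: no known prefix means the main namespace
```

Under these assumptions `split_title("Wikipedia:__Oops")` returns `(4, "Oops")`, matching the mapping shown above.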
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
On Fri, 24 Oct 2003, Tim Starling wrote:
Maybe we should try PHPA:
We're already using it. :)
Ah. I should probably get that on my test system then.
- initial spaces in the title of a link with a namespace are now trimmed, so [[Wikipedia:_Oops]] now maps correctly to [[Wikipedia:Oops]] instead of generating a technically illegal title with initial whitespace in cur_title.
What about [[Wikipedia:__Oops]]?
If memory serves, a run of spaces is already collapsed into a single space, so it should be:
'Wikipedia:__Oops' -> 'Wikipedia:_Oops' -> (4, 'Oops')
Okay, I just did a test and it looks like your memory serves you correctly. Currently, [[Wikipedia:_____Oops]] will take you to a page titled [[Wikipedia:_Oops]]. I imagine when your latest fix is implemented, it will become [[Wikipedia:Oops]].
-- ~~~~
Cool. Now that the algorithm is cleaned up, when do we rewrite it in assembler?
Louis (who is mostly kidding since the database dips wouldn't go any faster)
Tim Starling wrote:
Okay, you may shower me with praise now.
Dude, you rock.
I hereby decree, in my usual authoritarian and bossy manner, that today (10/31) shall forever be known as Tim Starling Day. Wikipedians of the distant future will marvel at the day when the new parsing algorithm dawned upon us. Tonight at dinner, every Wikipedian should say a toast to Tim and his many inventions.
In countries that celebrate Halloween, children will first say "Trick or Treat" and then, when they get the candy, they will say "Secure and Split" and run away, in honor of Tim's work in this area.
See also: http://en.wikipedia.org/wiki/Wikipedia%3AMagnus_Manske_Day
(Some may ask: but when is Brion Vibber day? Ah, but you should know by now: *every day* is Brion Vibber day!)
--Jimbo
wikitech-l@lists.wikimedia.org