I've been working, off and on, on a blogging script (in Perl), and I implemented some wikimarkup in it. Under stress tests, it became apparent that I was going about it the wrong way, as every time a page was viewed, the markup had to be parsed.
My first thought was naturally to use some sort of caching system, but that would take up an awful lot of space on my little server, and the speedup wouldn't really justify it in my case.
What I did instead was parse it when it was saved, and then if I wanted to go back and edit it, the script would simply "de-parse" it for presentation to me in the editing form. A couple lines out of my script as an example:
$body =~ s/<em><strong>(.*?)<\/strong><\/em>/'''''$1'''''/g;
$body =~ s/<a href="(.*?)">(.*?)<\/a>/[$1 $2]/g;
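For illustration only, the save-time direction might look something like the following, assuming the same two constructs as above (this is a sketch, not code from the actual script):

# Hypothetical save-time counterpart: turn the wikimarkup back into HTML.
$body =~ s{'''''(.*?)'''''}{<em><strong>$1</strong></em>}g;   # bold italic
$body =~ s{\[(\S+) ([^\]]+)\]}{<a href="$1">$2</a>}g;         # [url text] external link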
By now, I was wondering how the Wikipedia software was handling this. Turns out it's storing the wikimarkup, not the HTML, and parsing it in the viewing code.
By now it's probably obvious where I'm going with this. Could one of these methods (either storing a parsed and non-parsed version or the approach I took with "de-parsing") be used for some performance gain on Wikipedia's webserver?
Nicholas Knight wrote:
By now it's probably obvious where I'm going with this. Could one of these methods (either storing a parsed and non-parsed version or the approach I took with "de-parsing") be used for some performance gain on Wikipedia's webserver?
De-parsing strikes me as a rather odd way to do it. Furthermore, Jimbo has often remarked that disk space is not a problem. (he may come to regret that remark when we hit a million articles.... but hey! ;)
I would suggest we consider semi-parsing. Save two versions of the article: a) wikitext b) the wikitext parsed into HTML, with wikilinks still as [[link]]. Note that this would not be a fully-formed HTML document, just a fragment since it would not have a head section or enclosing tags.
Upon page read, it's b) that is inserted into the delivered page. Links are parsed live, since their status as existing / stub / ghost depends on the state of the database at that moment.
Upon page edit, a) is sent to the edit box of the edit page.
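To make the read-time step concrete, the link pass over stored version b) might look roughly like this (a sketch only; article_exists is a hypothetical database lookup, and the stub case is ignored for brevity):

# Rough sketch of expanding the [[...]] links left in the semi-parsed HTML fragment.
sub article_exists { my ($title) = @_; return 0; }   # hypothetical DB lookup, stubbed out here

sub expand_links {
    my ($html) = @_;
    $html =~ s{\[\[([^\]|]+)(?:\|([^\]]+))?\]\]}{
        my $target = $1;
        my $label  = defined $2 ? $2 : $1;    # [[Target|label]] or plain [[Target]]
        article_exists($target)
            ? qq{<a href="/wiki/$target">$label</a>}
            : qq{<a class="new" href="/wiki/$target?action=edit">$label</a>};
    }ge;
    return $html;
}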
On Wednesday 13 August 2003 09:18, tarquin wrote:
Nicholas Knight wrote:
By now it's probably obvious where I'm going with this. Could one of these methods (either storing a parsed and non-parsed version or the approach I took with "de-parsing") be used for some performance gain on Wikipedia's webserver?
De-parsing strikes me as a rather odd way to do it.
It's odd, no doubt about that; it just happens to be the best fit in my case. :)
Furthermore, Jimbo has often remarked that disk space is not a problem. (he may come to regret that remark when we hit a million articles.... but hey! ;)
In general I wouldn't expect it to be a problem. It's just a concern on my personal server, for which I don't have much in the way of funds available for upgrading. Thought I'd throw it out there anyway, as it struck me as a rather elegant solution for cases where disk space is a problem. :)
I would suggest we consider semi-parsing. Save two versions of the article: a) wikitext b) the wikitext parsed into HTML, with wikilinks still as [[link]]. Note that this would not be a fully-formed HTML document, just a fragment since it would not have a head section or enclosing tags.
upon page read, it's b) that is inserted into the delivered page. Links are parsed live, since their status as existing / stub / ghost depends on the state of the database at that moment.
Oops! Right, forgot about that since it's not applicable to my script ("all the world's a blog" syndrome? ;)). The 'semi-parsing' solution seems perfect to me.