Ivan Krstic wrote:
> Brion Vibber wrote:
> > The parser cache, IIRC, does have to parse _differently_ for users with different options, so I don't know how much duplication there is.
> This is probably worth looking into. With duplication and constant growth working against you, 40 GB isn't as much as it seems. I'm more interested in your opinion on whether a single parser hit on edit would solve the parser performance issues altogether, since it takes any guesswork out of the picture and scales very easily.
I'm sure it would help somewhat, though I can't say how much, since I don't have the cache hit-ratio numbers.
Making it parse _only_ on edit would require a number of specific changes, though, such as:
* Parsing has to be independent of user options and settings. Options such as math rendering preferences must affect only a stage of output later than the one that is cached.
* Variable substitutions, such as the current date and the number of articles, must be deferred until later. Note that people like to use the date variables in links for 'X of the day'-type features; this causes some niggling trouble with link consistency.
* Template substitution must similarly be delayed. If the templates are pre-parsed, though, it could be easy to just grab the template's parse tree and splice it into the appropriate place. Caveat: templates take parameters, which raise the same kinds of problems as variable substitutions (they are often used in links).
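The requirements above can be sketched as a two-stage pipeline. This is a minimal illustration, not MediaWiki's actual parser. It assumes a hypothetical intermediate representation in which the edit-time parse leaves viewer-dependent pieces (math), dynamic variables (date, article count), template calls and template parameters as unresolved nodes, and a cheap per-view pass fills them in. All names here (`Text`, `Variable`, `Math`, `Param`, `Template`, `ViewContext`, `render`) are invented for the example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List, Optional, Union


# Hypothetical node types for a parse tree built once, at edit time.
# Anything that depends on the viewer, the clock, or another page is
# left as an unresolved node instead of being expanded into the cache.
@dataclass
class Text:
    value: str


@dataclass
class Variable:
    name: str  # e.g. "CURRENTYEAR"; resolved at view time


@dataclass
class Math:
    tex: str  # rendered according to the viewing user's option


@dataclass
class Param:
    name: str  # a template parameter, filled from the caller's arguments


@dataclass
class Template:
    name: str
    args: Dict[str, str] = field(default_factory=dict)


Node = Union[Text, Variable, Math, Param, Template]


@dataclass
class ViewContext:
    user_opts: Dict[str, str]
    variables: Dict[str, Callable[[], str]]
    templates: Dict[str, List[Node]]  # pre-parsed template bodies


def render(nodes: List[Node], ctx: ViewContext,
           args: Optional[Dict[str, str]] = None, depth: int = 0) -> str:
    """Cheap per-view pass that fills in the deferred parts of a cached tree."""
    if depth > 20:
        raise RecursionError("template nesting too deep")
    args = args or {}
    out: List[str] = []
    for node in nodes:
        if isinstance(node, Text):
            out.append(node.value)
        elif isinstance(node, Variable):
            out.append(ctx.variables[node.name]())
        elif isinstance(node, Math):
            # The user's math option only affects this late stage.
            if ctx.user_opts.get("math") == "tex":
                out.append(f"<code>{node.tex}</code>")
            else:
                out.append(f'<span class="math">{node.tex}</span>')
        elif isinstance(node, Param):
            out.append(args.get(node.name, ""))
        elif isinstance(node, Template):
            # Splice in the template's pre-parsed tree rather than re-parsing it;
            # an unknown template falls back to a placeholder.
            body = ctx.templates.get(node.name, [Text(f"[[Template:{node.name}]]")])
            out.append(render(body, ctx, node.args, depth + 1))
    return "".join(out)


if __name__ == "__main__":
    ctx = ViewContext(
        user_opts={"math": "tex"},
        variables={"CURRENTYEAR": lambda: str(date.today().year)},
        templates={"Greet": [Text("Hello, "), Param("1"), Text("!")]},
    )
    cached = [Text("Year "), Variable("CURRENTYEAR"), Text(": "),
              Template("Greet", {"1": "world"}), Text(" "), Math("x^2")]
    print(render(cached, ctx))
```

Because template bodies are looked up at view time in this sketch, editing a template does not force a re-parse of every page that includes it. The link-consistency caveat is left unsolved: a variable or parameter that ends up inside a link target is still only known at view time.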
And of course, if we change the parsing rules, we need to be able to invalidate the cache, either through automatic versioning or some other scheme. I tend to favor versioning: check a cache entry for currentness at load time, and re-parse and save it then if necessary. That's how we deal with most such things.
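The versioning approach can be sketched as follows. Each cache entry records the parser version that produced it. On load, an entry from a different version is treated as a miss, then re-parsed and saved back. `VersionedParserCache`, `CacheEntry`, the `parse`/`fetch_source` callables and the dict-backed store are all hypothetical stand-ins for whatever parser and cache backend is actually in use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class CacheEntry:
    parser_version: int  # version of the parsing rules that produced `output`
    output: str


class VersionedParserCache:
    """Hypothetical parser cache that detects stale entries at load time.

    `parse` turns wikitext into output; `fetch_source` retrieves the wikitext
    for a (title, revision) pair from wherever revisions are stored.
    """

    def __init__(self, parse: Callable[[str], str],
                 fetch_source: Callable[[str, int], str],
                 parser_version: int) -> None:
        self.parse = parse
        self.fetch_source = fetch_source
        self.parser_version = parser_version  # bump when parsing rules change
        self.parse_count = 0  # how often the expensive parser actually ran
        self._store: Dict[Tuple[str, int], CacheEntry] = {}

    def save(self, title: str, revision: int, wikitext: str) -> str:
        """Parse once (on edit, or on a stale load) and cache the result."""
        self.parse_count += 1
        output = self.parse(wikitext)
        self._store[(title, revision)] = CacheEntry(self.parser_version, output)
        return output

    def load(self, title: str, revision: int) -> str:
        """Return cached output; re-parse and re-save if missing or outdated."""
        entry = self._store.get((title, revision))
        if entry is None or entry.parser_version != self.parser_version:
            return self.save(title, revision, self.fetch_source(title, revision))
        return entry.output
```

Bumping `parser_version` invalidates every entry lazily, so no mass purge is needed. The cost is that the first view of each page after a bump pays for one re-parse.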
-- brion vibber (brion @ pobox.com)