I'd like to know something more about template parsing/caching for performance issues.
My question is: when a template is called, its wikicode, I suppose, is parsed and translated into "something running" - I can't imagine what precisely, but I don't care much about it (so far :-) ). If a second call comes to the server for the same template, but with different parameters, is the template parsed again from scratch, or is something from the previous parsing reused, saving a bit of server load?
If the reply is "yes", i.e. if the "running code" of the whole template is somehow saved and cached, ready to be used again with new parameters, perhaps it could be a good idea to build templates as "libraries of different templates", using the name of the template as a "library name" and a parameter as the name of a "specific function"; a simple #switch could select the code of that "specific function".
On the contrary, if nothing is saved, there would be good reasons to keep the template code as simple as possible, and this idea of "libraries" would be a bad one.
Alex
On Fri, Apr 8, 2011 at 2:11 PM, Alex Brollo alex.brollo@gmail.com wrote:
I'd like to know something more about template parsing/caching for performance issues.
My question is: when a template is called, its wikicode, I suppose, is parsed and translated into "something running" - I can't imagine what precisely, but I don't care much about it (so far :-) ). If a second call comes to the server for the same template, but with different parameters, is the template parsed again from scratch, or is something from the previous parsing reused, saving a bit of server load?
Currently there's not really a solid intermediate parse structure in MediaWiki (something we hope to change; I'll be ramping up some documentation for the soon-to-begin mega parser redo project soon).
Approximately speaking... In the current system, the page is preprocessed into a partial preprocessor tree which identifies certain structure boundaries (for templates and function & tag-hook extensions); templates and some hooks get expanded in, then it's all basically flattened back to wikitext. Then the main parser takes over, turning the whole wikitext document into HTML output.
I believe we do locally (in-process) cache the preprocessor structure for pages and templates, so multiple use of the same template won't incur as much preprocessor work. But, the preprocessor parsing is usually one of the fastest parts of the whole parse.
If the reply is "yes", i.e. if the "running code" of the whole template is somehow saved and cached, ready to be used again with new parameters, perhaps it could be a good idea to build templates as "libraries of different templates", using the name of the template as a "library name" and a parameter as the name of a "specific function"; a simple #switch could select the code of that "specific function".
I think for the most part, it'll be preferable to only have to work with the functions that are needed, rather than fetching a large number of unneeded functions at once. Even if it's pre-parsed, loading unneeded stuff means more CPU used, more memory used, more network bandwidth used.
But being able to bundle together related things as a unit that can be distributed together would be very nice, and should be considered for future work on new templating and gadget systems.
-- brion
On 11-04-08 02:37 PM, Brion Vibber wrote:
On Fri, Apr 8, 2011 at 2:11 PM, Alex Brolloalex.brollo@gmail.com wrote:
I'd like to know something more about template parsing/caching for performance issues.
My question is: when a template is called, its wikicode, I suppose, is parsed and translated into "something running" - I can't imagine what precisely, but I don't care much about it (so far :-) ). If a second call comes to the server for the same template, but with different parameters, is the template parsed again from scratch, or is something from the previous parsing reused, saving a bit of server load?
Currently there's not really a solid intermediate parse structure in MediaWiki (something we hope to change; I'll be ramping up some documentation for the soon-to-begin mega parser redo project soon).
Approximately speaking... In the current system, the page is preprocessed into a partial preprocessor tree which identifies certain structure boundaries (for templates and function& tag-hook extensions); templates and some hooks get expanded in, then it's all basically flattened back to wikitext. Then the main parser takes over, turning the whole wikitext document into HTML output.
I believe we do locally (in-process) cache the preprocessor structure for pages and templates, so multiple use of the same template won't incur as much preprocessor work. But, the preprocessor parsing is usually one of the fastest parts of the whole parse.
-- brion
I could swear we locally cache template wikitext, and save preprocessed data to the object cache. At least I think that's what I gathered last time I read the code.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Daniel Friesen wrote:
I believe we do locally (in-process) cache the preprocessor structure for pages and templates, so multiple use of the same template won't incur as much preprocessor work. But, the preprocessor parsing is usually one of the fastest parts of the whole parse.
I could swear we locally cache template wikitext, and save preprocessed data to the object cache. Least I think thats what I gathered last time I read the code.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Yes. Calling a template twice will only fetch the text once, won't increase the 'used templates' counter... Preprocessing of wikitext over a threshold is cached serialized (it's easier to reprocess if it's too small).
On the original question: The tree will be reused, but it has to be expanded again. It's not clear that you gain by using a library since you will pay the library costs on all articles using it. Templates should be kept simple (yes, enwiki is particularly bad at that).
In early 2007, eswiki implemented a library template (Plantilla:Interproyecto) which was used for adding any interwikis to sister projects. It caused server problems and got disabled by the sysadmins.
2011/4/9 Platonides Platonides@gmail.com:
Yes. Calling a template twice will only fetch the text once, won't increase the 'used templates' counter... Preprocessing of wikitext over a threshold is cached serialized (it's easier to reprocess if it's too small).
To clarify: there's an in-process cache, like Brion said, so a template that is used twice on the same page is only fetched and preprocessed once. However, this only applies to templates called with no parameters. If the template is passed parameters, this in-process cache won't be used, even if the same set of parameters is used twice.
What we store in memcached is a serialized version of the preprocessor XML tree, keyed on the MD5 hash of the wikitext input, unless it's too small, like Platonides said. This means that if the exact same input is fed to the preprocessor twice, it will do part of the work only once and cache the intermediate result.
Roan Kattouw (Catrope)
Roan Kattouw wrote:
2011/4/9 Platonides Platonides@gmail.com:
Yes. Calling a template twice will only fetch the text once, won't increase the 'used templates' counter... Preprocessing of wikitext over a threshold is cached serialized (it's easier to reprocess if it's too small).
To clarify: there's an in-process cache, like Brion said, so a template that is used twice on the same page is only fetched and preprocessed once. However, this only applies to templates called with no parameters. If the template is passed parameters, this in-process cache won't be used, even if the same set of parameters is used twice.
I don't think so. The preprocess-to-tree is always the same, regardless of the parameters, and it is always used. It is the expansion where parameters change. I don't see that cache for parameterless templates - maybe it's the mTplExpandCache?
2011/4/10 Platonides Platonides@gmail.com:
I don't think so. The preprocess-to-tree is always the same, regardless of the parameters, and it is always used. It is the expansion where parameters change. I don't see that cache for parameterless templates - maybe it's the mTplExpandCache?
Could be, I don't know. This is just something Tim told me like two years ago, it might not even be accurate anymore.
Roan Kattouw (Catrope)
On 11/04/11 06:32, Platonides wrote:
Roan Kattouw wrote:
2011/4/9 PlatonidesPlatonides@gmail.com:
Yes. Calling a template twice will only fetch the text once, won't increase the 'used templates' counter... Preprocessing of wikitext over a threshold is cached serialized (it's easier to reprocess if it's too small).
To clarify: there's an in-process cache, like Brion said, so a template that is used twice on the same page is only fetched and preprocessed once. However, this only applies to templates called with no parameters. If the template is passed parameters, this in-process cache won't be used, even if the same set of parameters is used twice.
I don't think so. The preprocess-to-tree is always the same, regardless of the parameters, and it is always used. It is the expansion where parameters change. I don't see that cache for parameterless templates - maybe it's the mTplExpandCache?
The stages are basically preprocessToObj() -> expand() -> internalParse().
preprocessToObj() is the parsing stage of the preprocessor. It is fast and easily cachable. It produces an object-based representation of the parse tree of the text of single article or template. This object representation is stored in a cache ($wgParser->mTplDomCache) which exists for the duration of a single article parse operation. It depends only on a single input string, it does not expand templates.
There is a persistent cache which stores the result of preprocessToObj() across multiple requests, however this provides only a small benefit.
expand() is slow. Its function is to take the parse tree of an article, and to expand the template invocations and parser functions that it sees to produce preprocessed wikitext.
There is a cache of the expand() step which persists for the duration of a single parse operation ($wgParser->mTplExpandCache), but it only operates on template invocations with no arguments, like {{!}}. It's possible in theory to cache the expand() results for templates with arguments, but I didn't do it because it looked like it would be difficult to efficiently hash the parse tree of the arguments in order to retrieve the correct entry from the cache. This would be a good project for future development work.
I think it's fair to constrain parser functions to require that they return the same result for the same arguments, during a single parse operation. That's all you need to do to have an effective expand() cache.
However, the benefit would be limited due to the dominance of infoboxes and navboxes which appear only once in each article. It's not guaranteed that the result of expand() will be the same when done at different times or in different articles.
internalParse() takes preprocessed wikitext and produces HTML. The final output is cached by the parser cache.
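The three stages above could be sketched roughly as follows (a Python model for illustration only; the names echo Tim's description, but none of this is MediaWiki's actual code, and only zero-argument template expansion is modelled):

```python
import re

# Illustrative model of the preprocessToObj() -> expand() -> internalParse()
# pipeline; all names and data structures are hypothetical stand-ins.
TEMPLATES = {"sig": "~~sig~~"}  # pretend template store

def preprocess_to_obj(wikitext, dom_cache):
    # Fast stage: wikitext -> parse tree, cached per input string for the
    # duration of one article parse (cf. $wgParser->mTplDomCache).
    if wikitext not in dom_cache:
        dom_cache[wikitext] = {"tree": wikitext}  # placeholder "tree"
    return dom_cache[wikitext]

def expand(tree, expand_cache):
    # Slow stage: expand template invocations into preprocessed wikitext.
    # Only zero-argument invocations like {{sig}} are cached here, which
    # mirrors mTplExpandCache only handling argument-less templates.
    def repl(match):
        name = match.group(1)
        if name not in expand_cache:
            expand_cache[name] = TEMPLATES.get(name, "")
        return expand_cache[name]
    return re.sub(r"\{\{(\w+)\}\}", repl, tree["tree"])

def internal_parse(expanded):
    # Final stage: preprocessed wikitext -> HTML (cached by the parser cache).
    return "<p>%s</p>" % expanded

def parse_page(wikitext):
    dom_cache, expand_cache = {}, {}
    tree = preprocess_to_obj(wikitext, dom_cache)
    return internal_parse(expand(tree, expand_cache))
```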
-- Tim Starling
On 11-04-10 06:13 PM, Tim Starling wrote:
[...] I think it's fair to constrain parser functions to require that they return the same result for the same arguments, during a single parse operation. That's all you need to do to have an effective expand() cache. [...] -- Tim Starling
That /might/ work nicely for #ask. However Counter, ArrayExtension, Variables, Random, etc... won't play nicely with that.
Perhaps a way for parser functions to opt-in or opt-out, so we can exclude functions that don't return consistent results.
Side thought... why a #switch library? What happened to the old {{Foo/{{{1}}}|...}} trick?
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
2011/4/11 Daniel Friesen lists@nadir-seen-fire.com
Side thought... why a #switch library? What happened to the old {{Foo/{{{1}}}|...}} trick?
Simply, {{Foo/{{{1}}}|...}} links to different pages, while {{Foo|{{{1}}}|...}} points to the same page. I had been frustrated when I tried to use Labeled Section Transclusion to build template libraries :-) - that would be an excellent way to build a "collection of objects" into a wiki page, both of "methods" and "attributes"... but #lst doesn't parse raw wiki code "from scratch". If it did (i.e. if #lst read wiki code "as it is", before any parsing, ignoring all the code outside the labelled section: i.e. ignoring noinclude, html comment tags... anything), interesting scenarios would arise.
But if there's no performance gain with the {{Foo|{{{1}}}|...}} trick, I'll use {{Foo/{{{1}}}|...}} for sure. KISS is always a good guideline. :-)
Alex
On 11-04-11 12:05 AM, Alex Brollo wrote:
2011/4/11 Daniel Friesenlists@nadir-seen-fire.com
Side thought... why a #switch library? What happened to the old {{Foo/{{{1}}}|...}} trick?
Simply, {{Foo/{{{1}}}|...}} links to different pages, while {{Foo|{{{1}}}|...}} points to the same page. I had been frustrated when I tried to use Labeled Section Transclusion to build template libraries :-) - that would be an excellent way to build a "collection of objects" into a wiki page, both of "methods" and "attributes"... but #lst doesn't parse raw wiki code "from scratch". If it did (i.e. if #lst read wiki code "as it is", before any parsing, ignoring all the code outside the labelled section: i.e. ignoring noinclude, html comment tags... anything), interesting scenarios would arise.
But if there's no performance gain with the {{Foo|{{{1}}}|...}} trick, I'll use {{Foo/{{{1}}}|...}} for sure. KISS is always a good guideline. :-)
Alex
Pointing to different pages is essentially the point of the trick.

[[Template:Library]] = {{#ifexist:Library/{{{1}}}|{{Library/{{{1}}}|...}}|There is no library function by the name "{{{1}}}".}}
[[Template:Library/a]] = Do a
[[Template:Library/b]] = Do b

{{library|a}} => "Do a"
{{library|b}} => "Do b"

It essentially works the same as:

[[Template:Library]] = {{#switch:{{{1}}}|a=Do a|b=Do b|There is no library function by the name "{{{1}}}".}}
Except you don't create an obscenely large preprocessed hierarchy which is cloned in its entirety to multiple places and expanded multiple times just to get access to multiple pieces of the library.
Though, when we're talking about stuff this complex... that line about using a REAL programming language comes into play... Would be nice if there were some implemented-in-PHP scripting language we could use that would work on any wiki. I "had" a project playing around with that idea but it's dead.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
2011/4/11 Daniel Friesen lists@nadir-seen-fire.com
Though, when we're talking about stuff this complex... that line about using a REAL programming language comes into play... Would be nice if there were some implemented-in-PHP scripting language we could use that would work on any wiki. I "had" a project playing around with that idea but it's dead.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
Are you a wikisource contributor? If you are, I guess that you considered this syntax too, pointing to what I wrote in my last message: {{#section:Foo|{{{1}}}}} but... it refuses to run if section {{{1}}} of page Foo is a "method" (while it obviously runs if the section is an "attribute"). :-)
Alex
On 11/04/11 02:51, Daniel Friesen wrote:
On 11-04-10 06:13 PM, Tim Starling wrote:
[...] I think it's fair to constrain parser functions to require that they return the same result for the same arguments, during a single parse operation. That's all you need to do to have an effective expand() cache. [...] -- Tim Starling
That /might/ work nicely for #ask. However Counter, ArrayExtension, Variables, Random, etc... won't play nicely with that.
Perhaps a way for parser functions to opt-in or opt-out, so we can exclude functions that don't return consistent results.
Side thought... why a #switch library? What happened to the old {{Foo/{{{1}}}|...}} trick?
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
I would have thought this could be done automatically, by designating templates as being "pure" or "impure" (in the sense of pure and impure functions) -- something which could be done recursively all the way down to basic parser functions, magic words, etc., which would have to be designated pure or impure by hand as part of the software implementation.
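Neil's recursive purity idea could be sketched like this (a hypothetical Python model; the primitive purity labels and the template call graph below are invented for illustration and are not real MediaWiki data):

```python
# Sketch of recursively classifying templates as pure/impure: leaf parser
# functions and magic words are hand-labelled as part of the software, and
# a template is pure iff everything it invokes is pure.
PRIMITIVE_PURITY = {"#switch": True, "#expr": True,
                    "CURRENTTIME": False, "#counter": False}

TEMPLATE_CALLS = {  # template -> primitives and templates it invokes
    "Infobox": ["#switch", "#expr"],
    "Clock": ["CURRENTTIME"],
    "Outer": ["Infobox", "Clock"],
}

def is_pure(name, seen=None):
    seen = seen or set()
    if name in PRIMITIVE_PURITY:
        return PRIMITIVE_PURITY[name]   # hand-labelled leaf
    if name in seen:
        return False                    # recursive template: be conservative
    seen = seen | {name}
    # A template with no recorded calls is trivially pure in this model.
    return all(is_pure(callee, seen) for callee in TEMPLATE_CALLS.get(name, []))
```

Here "Infobox" comes out pure, while "Outer" is impure because it transitively uses CURRENTTIME.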
-- Neil
On Mon, Apr 11, 2011 at 5:59 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
What we store in memcached is a serialized version of the preprocessor XML tree, keyed on the MD5 hash of the wikitext input, unless it's too small, like Platonides said. This means that if the exact same input is fed to the preprocessor twice, it will do part of the work only once and cache the intermediate result.
Yes, I implemented this with Tim's help to try to cut down on the CPU load caused by lots of Cite templates. If I recall correctly, the performance benefit was not particularly substantial.
2011/4/11 Andrew Garrett agarrett@wikimedia.org
On Mon, Apr 11, 2011 at 5:59 AM, Roan Kattouw roan.kattouw@gmail.com wrote:
What we store in memcached is a serialized version of the preprocessor XML tree, keyed on the MD5 hash of the wikitext input, unless it's too small, like Platonides said. This means that if the exact same input is fed to the preprocessor twice, it will do part of the work only once and cache the intermediate result.
Yes, I implemented this with Tim's help to try to cut down on the CPU load caused by lots of Cite templates. If I recall correctly, the performance benefit was not particularly substantial.
Ok, coming back to my idea: building small "libraries of work-specific templates" inside a single template doesn't seem a particularly brilliant one; it's something to do only if the merged templates are simple and few, and only for contributors' comfort, if any. Thanks for your interest!
Alex
wikitech-l@lists.wikimedia.org