On 11/29/07, werdna@svn.wikimedia.org wrote:
Revision: 27946
Author: werdna
Date: 2007-11-29 10:42:48 +0000 (Thu, 29 Nov 2007)
Log Message:
Prevent the parser function #ifexist being used more than a certain number of times on one page. The maximum number of #ifexist queries can be set in $wgMaxIfExistCount, which is, by default, 100. Any uses over this limit will default to the "else" text. Done to discourage templates like Template:highrfc-loop on enwiki, which willingly does something like 50 database queries for a template that is used for many users on one page. Clearly suboptimal from a performance perspective.
Is there any way, instead, to defer these so they can be done all at once? Tim, do you have any thoughts on whether this is reasonable? (I would suspect not, but it doesn't hurt to ask.)
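For reference, the limit described above is an ordinary configuration variable, so a wiki wanting a different cap would set it in LocalSettings.php. A minimal sketch, using the variable name and default stated in the commit message:

    // LocalSettings.php: cap the number of #ifexist existence checks per page.
    // 100 is the default named in the commit message; adjust as needed.
    $wgMaxIfExistCount = 100;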
Simetrical wrote:
On 11/29/07, werdna@svn.wikimedia.org wrote:
Revision: 27946
Author: werdna
Date: 2007-11-29 10:42:48 +0000 (Thu, 29 Nov 2007)
Log Message:
Prevent the parser function #ifexist being used more than a certain number of times on one page. The maximum number of #ifexist queries can be set in $wgMaxIfExistCount, which is, by default, 100. Any uses over this limit will default to the "else" text. Done to discourage templates like Template:highrfc-loop on enwiki, which willingly does something like 50 database queries for a template that is used for many users on one page. Clearly suboptimal from a performance perspective.
Is there any way, instead, to defer these so they can be done all at once? Tim, do you have any thoughts on whether this is reasonable? (I would suspect not, but it doesn't hurt to ask.)
Wow, wow, wow, ParserFunctions has a DB query in a function that can potentially be called hundreds of times? Domas would go nuts if he knew that... Ifexist should do the following:
- Add all requested titles to a LinkBatch [1]
- Return a marker like marker-ifexist-0 and store the ifTrue and ifFalse values in an array
- In a ParserAfterTidy hook, run LinkBatch::execute() (checks for existence) and str_replace() all markers
Roan Kattouw (Catrope)
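A rough PHP sketch of this proposal, assuming 2007-era LinkBatch and hook APIs; the function and global names here are hypothetical, and as the follow-up below points out, ParserAfterTidy turns out to be too late a stage for this:

    // Hypothetical deferred #ifexist (sketch, not actual ParserFunctions code).
    $wgIfExistMarkers = array(); // marker => array( Title, ifTrue, ifFalse )
    $wgIfExistBatch = null;      // shared LinkBatch for all #ifexist titles

    // Called for each {{#ifexist:}}: queue the title, return a marker.
    function efDeferredIfExist( $parser, $titleText, $then = '', $else = '' ) {
        global $wgIfExistMarkers, $wgIfExistBatch;
        $title = Title::newFromText( $titleText );
        if ( !$title ) {
            return $else;
        }
        if ( $wgIfExistBatch === null ) {
            $wgIfExistBatch = new LinkBatch();
        }
        $wgIfExistBatch->addObj( $title );
        $marker = 'marker-ifexist-' . count( $wgIfExistMarkers );
        $wgIfExistMarkers[$marker] = array( $title, $then, $else );
        return $marker;
    }

    // ParserAfterTidy hook: one batched query, then swap in every marker's text.
    function efResolveIfExist( &$parser, &$text ) {
        global $wgIfExistMarkers, $wgIfExistBatch;
        if ( $wgIfExistBatch !== null ) {
            $wgIfExistBatch->execute(); // primes the LinkCache in one query
            foreach ( $wgIfExistMarkers as $marker => $branches ) {
                list( $title, $then, $else ) = $branches;
                $text = str_replace( $marker, $title->exists() ? $then : $else, $text );
            }
        }
        return true;
    }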
On 29/11/2007, Roan Kattouw <roan.kattouw@home.nl> wrote:
Wow, wow, wow, ParserFunctions has a DB query in a function that can potentially be called hundreds of times? Domas would go nuts if he knew that...
http://commons.wikimedia.org/wiki/Image:Server-kitty.jpg
- d.
On 11/29/07, Roan Kattouw <roan.kattouw@home.nl> wrote:
- Add all requested titles to a LinkBatch [1]
- Return a marker like marker-ifexist-0 and store the ifTrue and ifFalse
values in an array
- In a ParserAfterTidy hook, run LinkBatch::execute() (checks for
existence) and str_replace() all markers
Of course, it can't be ParserAfterTidy: the branches can contain arbitrary wikicode. Instead, the parser must
1) (in preprocessor stage) Add each title to a LinkBatch and replace with a marker,
2) (after preprocessor stage) Resolve the LinkBatch to figure out which path to follow for each,
3) (still just after preprocessor stage) Reinvoke the preprocessor on the path that was followed to produce wikitext for the main parsing pass.
Or alternatively,
1) (in preprocessor stage) Add each title to a LinkBatch and replace with a marker,
2) (still in preprocessor stage) Preprocess *both* paths,
3) (after preprocessor stage) Resolve the LinkBatch to figure out which path to follow for each,
4) (still after preprocessor stage) Insert the already-preprocessed wikitext for that path while discarding the other one,
whichever is more efficient. Nested {{#ifexist:}} calls have to be considered: they'd probably work a lot better with my second plan, with steps (1) and (2) being executed together and recursively, and step (4) being executed recursively later.
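To make the second plan concrete, a sketch with hypothetical names; the real preprocessor entry points and parser state handling would differ:

    $wgPendingIfExist = array(); // marker => array( Title, then-text, else-text )
    $wgIfExistBatch = new LinkBatch();

    // Steps (1) and (2), run recursively during preprocessing: queue the
    // title and preprocess *both* branches, which may emit nested markers.
    function queueIfExist( $parser, $title, $then, $else ) {
        global $wgPendingIfExist, $wgIfExistBatch;
        $wgIfExistBatch->addObj( $title );
        $marker = "\x7f-ifexist-" . count( $wgPendingIfExist ) . "\x7f";
        $wgPendingIfExist[$marker] = array(
            $title,
            $parser->preprocess( $then, $parser->mTitle, $parser->mOptions ),
            $parser->preprocess( $else, $parser->mTitle, $parser->mOptions ),
        );
        return $marker;
    }

    // Steps (3) and (4), run once afterwards: a single existence query, then
    // splice in each winning branch, looping until nested markers are gone.
    function resolveIfExist( &$text ) {
        global $wgPendingIfExist, $wgIfExistBatch;
        $wgIfExistBatch->execute();
        do {
            $replaced = false;
            foreach ( $wgPendingIfExist as $marker => $info ) {
                if ( strpos( $text, $marker ) !== false ) {
                    list( $title, $then, $else ) = $info;
                    $text = str_replace( $marker, $title->exists() ? $then : $else, $text );
                    $replaced = true;
                }
            }
        } while ( $replaced );
    }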
Simetrical wrote:
On 11/29/07, Roan Kattouw <roan.kattouw@home.nl> wrote:
- Add all requested titles to a LinkBatch [1]
- Return a marker like marker-ifexist-0 and store the ifTrue and ifFalse
values in an array
- In a ParserAfterTidy hook, run LinkBatch::execute() (checks for
existence) and str_replace() all markers
Of course, it can't be ParserAfterTidy: the branches can contain arbitrary wikicode. Instead, the parser must
1) (in preprocessor stage) Add each title to a LinkBatch and replace with a marker,
2) (after preprocessor stage) Resolve the LinkBatch to figure out which path to follow for each,
3) (still just after preprocessor stage) Reinvoke the preprocessor on the path that was followed to produce wikitext for the main parsing pass.
Or alternatively,
1) (in preprocessor stage) Add each title to a LinkBatch and replace with a marker,
2) (still in preprocessor stage) Preprocess *both* paths,
3) (after preprocessor stage) Resolve the LinkBatch to figure out which path to follow for each,
4) (still after preprocessor stage) Insert the already-preprocessed wikitext for that path while discarding the other one,
whichever is more efficient. Nested {{#ifexist:}} calls have to be considered: they'd probably work a lot better with my second plan, with steps (1) and (2) being executed together and recursively, and step (4) being executed recursively later.
Although your suggested implementation is still not quite how I'd do it, I think you're getting the idea of how complicated it is. I told the guy in #mediawiki who wanted to do 4000 #ifexist calls on a page that optimising this case would take days, and that he should just forget it and find some other way to do what he was doing. Unless someone comes up with some really, really valuable applications that can't be done any other way, I'm happier with just having a limit. If you have days to spend on optimisation work, I have some other ideas for where you should be spending them.
-- Tim Starling
On 11/30/07, Simetrical <Simetrical+wikilist@gmail.com> wrote:
Is there any way, instead, to defer these so they can be done all at once? Tim, do you have any thoughts on whether this is reasonable? (I would suspect not, but it doesn't hurt to ask.)
I thought of this, but "fix the potential denial of service" was a slightly bigger issue than "speed up the abuse of a parser function".
Wow, wow, wow, ParserFunctions has a DB query in a function that can potentially be called hundreds of times? Domas would go nuts if he knew that...
You should have heard Tim.
Ifexist should do the following:
- Add all requested titles to a LinkBatch [1]
- Return a marker like marker-ifexist-0 and store the ifTrue and ifFalse
values in an array
- In a ParserAfterTidy hook, run LinkBatch::execute() (checks for
existence) and str_replace() all markers
Sounds somewhat sensible. Assuming we do want to allow users to make "does this exist?" queries for more than 100 pages...
On 11/29/07, Roan Kattouw <roan.kattouw@home.nl> wrote:
- Add all requested titles to a LinkBatch [1]
- Return a marker like marker-ifexist-0 and store the ifTrue and ifFalse
values in an array
- In a ParserAfterTidy hook, run LinkBatch::execute() (checks for
existence) and str_replace() all markers
Ah, of course. Strip markers. That might not play so well with extensions that want to prematurely unstrip output, but it could probably be worked out somehow, if someone wanted to try.
On 11/29/07, Andrew Garrett <andrew@epstone.net> wrote:
I thought of this, but "fix the potential denial of service" was a slightly bigger issue than "speed up the abuse of a parser function".
You know, to be fair, there are multiple pages *in the software* that do existence checks for many Titles individually, without LinkBatches, meaning possibly thousands of queries per page. It's not the *end of the world*, performance-wise: the queries are all very small and fast ("const" queries, in MySQL EXPLAIN terms). But yes, it should be fixed, one way or another, and I can't argue that this quick fix is worse than nothing.
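For anyone following along, the difference being described is roughly this (a sketch, assuming $titles is an array of Title objects):

    // Unbatched: one SELECT on the page table per title.
    foreach ( $titles as $title ) {
        if ( $title->exists() ) {
            // render blue link, etc.
        }
    }

    // Batched: one query primes the LinkCache for every title at once,
    // so the exists() calls below are answered from the cache.
    $batch = new LinkBatch( $titles );
    $batch->execute();
    foreach ( $titles as $title ) {
        if ( $title->exists() ) {
            // render blue link, etc.
        }
    }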
Sounds somewhat sensible. Assuming we do want to allow users to make "does this exist?" queries for more than 100 pages...
Why would we ever not? There's nothing abusive about it, it's a perfectly logical and reasonable use of the tool. It just happens to be somewhat inefficient at present. If someone decides to spend the time fixing that at some point, it would be nice to be able to remove the limit.
Sounds somewhat sensible. Assuming we do want to allow users to make "does this exist?" queries for more than 100 pages...
What do you think happens when [[enwp:List of English monarchs]] is rendered? There are 526 links on that page. The parser has to check each page for existence so it knows whether to render a blue link or a red one. I think I can assume the parser uses a LinkBatch for that. (It should be noted that the parser also inserts link markers that are replaced later.)
Roan Kattouw (Catrope)