Browsing the HTML code of source pages, I found this statement in an HTML comment:
*Expensive parser function count: 0/500*
I'd like to use this statement to evaluate the "lightness" of a page, mainly to test how expensive the templates on the page are. In your opinion, given that the best value would be 0/500, what limits would mark a good, a moderately complex, and a complex page, just as a rough guide to work with? What is a really alarming value that needs fixing fast?
And wouldn't it be a good idea to display this figure on the page itself, just as a very small mark or string in a corner, allowing fast feedback?
Alex
On 1/5/2011 8:07 PM, Alex Brollo wrote:
Browsing the HTML code of source pages, I found this statement in an HTML comment:
*Expensive parser function count: 0/500*
I'd like to use this statement to evaluate the "lightness" of a page, mainly to test how expensive the templates on the page are. In your opinion, given that the best value would be 0/500, what limits would mark a good, a moderately complex, and a complex page, just as a rough guide to work with? What is a really alarming value that needs fixing fast?
And wouldn't it be a good idea to display this figure on the page itself, just as a very small mark or string in a corner, allowing fast feedback?
The expensive parser function count only counts the use of a few functions that do a DB query; PAGESINCATEGORY, PAGESIZE, and #ifexist are the only ones I know of. While a page that uses a lot of these would likely be slow, these aren't heavily used functions, and a page might be slow even if it uses zero.
The other three limits (Preprocessor node count, Post-expand include size, and Template argument size) are probably better measures of complexity, though I don't know what "typical" values for them might be.
2011/1/6 Alex mrzmanwiki@gmail.com
On 1/5/2011 8:07 PM, Alex Brollo wrote:
The expensive parser function count only counts the use of a few functions that do a DB query; PAGESINCATEGORY, PAGESIZE, and #ifexist are the only ones I know of. While a page that uses a lot of these would likely be slow, these aren't heavily used functions, and a page might be slow even if it uses zero.
The other three limits (Preprocessor node count, Post-expand include size, and Template argument size) are probably better measures of complexity, though I don't know what "typical" values for them might be.
Thanks. I would appreciate an algorithm to evaluate those parameters together, weighting them to produce a single, meaningful "heaviness index" for the page. Then, removing or adding code would make it easy to identify the critical code.
Alex ("the other one") :-)
On Wed, Jan 5, 2011 at 8:07 PM, Alex Brollo alex.brollo@gmail.com wrote:
Browsing the HTML code of source pages, I found this statement in an HTML comment:
*Expensive parser function count: 0/500*
I'd like to use this statement to evaluate the "lightness" of a page, mainly to test how expensive the templates on the page are. In your opinion, given that the best value would be 0/500, what limits would mark a good, a moderately complex, and a complex page, just as a rough guide to work with? What is a really alarming value that needs fixing fast?
A really alarming value that needs fast fixing would be, approximately speaking, 501 or higher. That's why the maximum is there. We don't leave fixing this kind of thing to users.
And wouldn't it be a good idea to display this figure on the page itself, just as a very small mark or string in a corner, allowing fast feedback?
No. It's only meant for debugging when you run over the limit and the page stops working. It can help you track down why the page isn't working, and isolate the templates that are causing the problem. The same goes for the other limits.
If you want to detect whether a page is rendering too slowly, just try action=purge and see how long it takes. If it takes more than a few seconds, you probably want to improve it, because that's how long it will take to render for a lot of logged-in users (parser cache hides this if you have default preferences, including for all anons). We're forced to use artificial metrics when imposing automatic limits on page rendering only because the time it takes to parse a page isn't reliable, and using it as an automatic limit would make parsing non-deterministic. For manual inspection, you should just use time to parse, not any artificial metrics.
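For instance, a rough browser-console sketch along these lines would give you a number to look at. This is only a sketch: it assumes the standard /w/index.php script path and that a POST to action=purge returns the freshly re-parsed page, and the figure includes network overhead, so treat it as a rough indicator rather than a measurement of parse time.

    // Rough sketch: time a forced re-parse of a page via action=purge.
    // '/w/index.php' is an assumed script path; adjust it for other wikis.
    function timePurge(title) {
        var start = Date.now();
        var xhr = new XMLHttpRequest();
        // POST avoids the confirmation form shown for anonymous GET purges
        xhr.open('POST', '/w/index.php?action=purge&title=' + encodeURIComponent(title), true);
        xhr.onload = function () {
            console.log(title + ': purged and re-parsed in ' +
                (Date.now() - start) / 1000 + ' s (HTTP ' + xhr.status + ')');
        };
        xhr.send();
    }
    timePurge('Wikipedia:Village_pump_(technical)'); // example title, substitute your own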
On 07/01/11 07:50, Aryeh Gregor wrote:
On Wed, Jan 5, 2011 at 8:07 PM, Alex Brollo alex.brollo@gmail.com wrote:
Browsing the HTML code of source pages, I found this statement in an HTML comment:
*Expensive parser function count: 0/500*
I'd like to use this statement to evaluate the "lightness" of a page, mainly to test how expensive the templates on the page are. In your opinion, given that the best value would be 0/500, what limits would mark a good, a moderately complex, and a complex page, just as a rough guide to work with? What is a really alarming value that needs fixing fast?
A really alarming value that needs fast fixing would be, approximately speaking, 501 or higher. That's why the maximum is there. We don't leave fixing this kind of thing to users.
I think the maximum was set to 100 initially, and raised to 500 due to user complaints. I'd be completely happy if users fixed all the templates that caused pages to use more than 100, then we could put the limit back down.
-- Tim
Tim Starling wrote:
On 07/01/11 07:50, Aryeh Gregor wrote:
On Wed, Jan 5, 2011 at 8:07 PM, Alex Brollo alex.brollo@gmail.com wrote:
Browsing the HTML code of source pages, I found this statement in an HTML comment:
*Expensive parser function count: 0/500*
I'd like to use this statement to evaluate the "lightness" of a page, mainly to test how expensive the templates on the page are. In your opinion, given that the best value would be 0/500, what limits would mark a good, a moderately complex, and a complex page, just as a rough guide to work with? What is a really alarming value that needs fixing fast?
A really alarming value that needs fast fixing would be, approximately speaking, 501 or higher. That's why the maximum is there. We don't leave fixing this kind of thing to users.
I think the maximum was set to 100 initially, and raised to 500 due to user complaints. I'd be completely happy if users fixed all the templates that caused pages to use more than 100, then we could put the limit back down.
Doesn't it make much more sense to fix the underlying problem instead? Users shouldn't have to be concerned with the number of #ifexists on a page.
MZMcBride
MZMcBride wrote:
Doesn't it make much more sense to fix the underlying problem instead? Users shouldn't have to be concerned with the number of #ifexists on a page.
MZMcBride
Well, if someone wants to change #ifexist, they should change the parser (braceSubstitution) so that the checks can be done in parallel. For instance, if you have:
{{#ifexist: File:Flag of {{{1}}}.svg|<td>[[File:Flag of {{{1}}}.svg]]</td>}}
{{#ifexist: File:Shield of {{{1}}}.svg|<td>[[File:Shield of {{{1}}}.svg]]</td>}}
then they would be performed in parallel, using one LinkBatch, instead of issuing two separate queries. Nested #ifexist calls and other cases would still need to be checked separately, but it would substantially reduce the "#ifexist load". I think most of them are even at the same "child level".
2011/1/12 Platonides Platonides@gmail.com
MZMcBride wrote:
Doesn't it make much more sense to fix the underlying problem instead? Users shouldn't have to be concerned with the number of #ifexists on a page.
MZMcBride
OK, now I feel much more comfortable. These are my conclusions:
# I can feel free to test anything, even exotic things.
# I will pay attention to HTML rendering time when trying something exotic.
# In the remote case that I really build something server-expensive, and such an exotic thing spreads widely across wiki projects (a very remote case!), some sysop will see the bad results of the bad idea and:
## will fix the parser code, if the idea is good but the software handles it inefficiently;
## will kill the idea, if it is server-expensive and simply useless or wrong.
Alex
2011/1/11 Tim Starling tstarling@wikimedia.org
On 07/01/11 07:50, Aryeh Gregor wrote:
On Wed, Jan 5, 2011 at 8:07 PM, Alex Brollo alex.brollo@gmail.com wrote:
Browsing the HTML code of source pages, I found this statement in an HTML comment:
*Expensive parser function count: 0/500*
I think the maximum was set to 100 initially, and raised to 500 due to user complaints. I'd be completely happy if users fixed all the templates that caused pages to use more than 100, then we could put the limit back down.
Thanks Tim. So, implementing a simple JS script to show that value (and the other three figures too) in small characters at a border of the page display is not a completely fuzzy idea. As I said, I hate to waste resources of any kind. It's a pity that those data are not saved into the XML dump. But I don't want to overload the servers just to get data about server overload. :-)
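A first rough sketch of such a script could look like this. It only assumes that the NewPP limit report is emitted as an HTML comment inside the page body; the function name, the element ID and the styling are just placeholders of mine, not anything MediaWiki provides.

    // Minimal sketch: find the "NewPP limit report" comment that the parser
    // leaves in the rendered HTML and show it in a small box in a corner.
    function showLimitReport() {
        var walker = document.createTreeWalker(document.body, NodeFilter.SHOW_COMMENT, null, false);
        var node, report = null;
        while ((node = walker.nextNode())) {
            if (node.nodeValue.indexOf('NewPP limit report') !== -1) {
                report = node.nodeValue;
                break;
            }
        }
        if (!report) {
            return; // nothing to show on this page
        }
        var box = document.createElement('pre');
        box.id = 'newpp-limit-report'; // hypothetical ID, not used by MediaWiki
        box.textContent = report;
        box.style.cssText = 'position:fixed; bottom:0; right:0; margin:0; padding:2px;' +
            ' font-size:9px; background:#ffffcc; border:1px solid #aaa; z-index:100;';
        document.body.appendChild(box);
    }
    showLimitReport();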
Just another question about resources: I can get the same result with an AJAX call or with a #lst (labeled section transclusion) call. Which one is lighter for the servers, in your opinion? Or are they more or less the same?
Alex
On Tue, Jan 11, 2011 at 3:04 AM, Alex Brollo alex.brollo@gmail.com wrote:
Just another question about resources: I can get the same result with an AJAX call or with a #lst (labeled section transclusion) call. Which one is lighter for the servers, in your opinion? Or are they more or less the same?
Fewer HTTP requests is better, all else being equal. I don't know how LST works, but I imagine it's more efficient than doing a whole API call. (Although maybe not, for instance if the API is caching things and LST isn't.)
Overall, I'd advise you to do whatever minimizes user-visible latency. That directly improves things for your users, and is a decent proxy for server resource use. So use whichever method takes less time to fully render. This is rather more practical than trying to consult MediaWiki developers about every detail of your program's implementation, which is unlikely to be used widely enough to greatly affect server load anyway, and even if it were we couldn't necessarily give intelligent answers without knowing exactly what the program is doing and why.
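As a crude way to act on that, you could time the two approaches from the browser console with something like the sketch below. The URLs are only placeholders for the API request and the #lst-based page you actually want to compare, and this measures end-to-end latency rather than server cost.

    // Time a GET request and log the elapsed wall-clock time.
    function timeRequest(label, url) {
        var start = Date.now();
        var xhr = new XMLHttpRequest();
        xhr.open('GET', url, true);
        xhr.onload = function () {
            console.log(label + ': ' + (Date.now() - start) + ' ms');
        };
        xhr.send();
    }
    // Hypothetical comparison: fetching content through the API vs. viewing
    // a page that transcludes the same content with #lst.
    timeRequest('API call', '/w/api.php?action=parse&page=Sandbox&format=json');
    timeRequest('#lst page', '/w/index.php?title=Sandbox/lst-demo&action=render');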
2011/1/11 Aryeh Gregor Simetrical+wikilist@gmail.com
Overall, I'd advise you to do whatever minimizes user-visible latency. That directly improves things for your users, and is a decent proxy for server resource use. So use whichever method takes less time to fully render. This is rather more practical than trying to consult MediaWiki developers about every detail of your program's implementation, which is unlikely to be used widely enough to greatly affect server load anyway, and even if it were we couldn't necessarily give intelligent answers without knowing exactly what the program is doing and why.
I'm already using your suggestion: today I removed a complex test template from our village pump (replacing it with a link to a subpage, visited only by interested users) and, really, the difference in rendering the village pump page was obvious.
Probably the best approach is to use all these tricks together, paying much more attention to widely used templates and frequently parsed pages than to exotic, rarely used ones. Unluckily for the servers, the worst, heaviest pages are often also the most frequently parsed and re-parsed, like the "village pump" pages; nevertheless they are the most useful to the community, so they deserve some server load.
Nevertheless... sometimes people tell me "don't use this hack, it overloads the servers"... and sometimes it doesn't, or it is simply an undocumented, unproven personal opinion.
Alex
On Tue, Jan 11, 2011 at 10:55 AM, Alex Brollo alex.brollo@gmail.com wrote:
I'm already using your suggestion: today I removed a complex test template from our village pump (replacing it with a link to a subpage, visited only by interested users) and, really, the difference in rendering the village pump page was obvious.
That's good, but also keep in mind that, generally, you shouldn't worry too much about performance: http://en.wikipedia.org/wiki/WP:PERF. (Had to throw in the little disclaimer here. ;-))
2011/1/11 Casey Brown lists@caseybrown.org
That's good, but also keep in mind that, generally, you shouldn't worry too much about performance: http://en.wikipedia.org/wiki/WP:PERF. (Had to throw in the little disclaimer here. ;-))
Yes, I got this suggestion... but when I try new tricks and new ideas, someone often tells me "Please stop! This overloads the servers! It's terribly heavy!"; and when I try to dig deeper into the server load details, others tell me "Don't worry so much about performance".
This is a little confusing for a poor DIY contributor ^__^
Nevertheless, some of my ideas will spread over the *whole* it.source project by imitation and by bot activity (so any mistake of mine could really have a significant effect; a small one, since it.source is nothing compared with the whole set of wiki projects...), and there's a risk that some of my ideas could spread to other projects too, so I try to be careful.
Alex
On 12/01/11 05:34, Casey Brown wrote:
On Tue, Jan 11, 2011 at 10:55 AM, Alex Brollo alex.brollo@gmail.com wrote:
I'm already using your suggestion: today I removed a complex test template from our village pump (replacing it with a link to a subpage, visited only by interested users) and, really, the difference in rendering the village pump page was obvious.
That's good, but also keep in mind that, generally, you shouldn't worry too much about performance: http://en.wikipedia.org/wiki/WP:PERF. (Had to throw in the little disclaimer here. ;-))
I've always been opposed to that policy.
-- Tim Starling
On Tue, Jan 11, 2011 at 5:21 PM, Tim Starling tstarling@wikimedia.org wrote:
I've always been opposed to that policy.
Are you aware of the completely insane things users have sometimes established as conventions or even policies based on nonsensical server-load grounds? Like on the Dutch Wikipedia, apparently new users were routinely being told to make as few edits as possible so that the servers wouldn't run out of disk space:
http://en.wikipedia.org/wiki/Wikipedia_talk:Don%27t_worry_about_performance#...
An English Wikipedia user tried to argue for a couple of years that Wikipedia was becoming slow because of too many links between pages, and that something terrible would happen if templates didn't have fewer links in them (fortunately no one listened to him that I know of):
http://en.wikipedia.org/wiki/Wikipedia_talk:Don%27t_worry_about_performance#...
There are probably even stupider things that I don't know about. Ideally users would understand the issues involved and make intelligent decisions about server load, but in practice the policy seems to prevent a lot more harm than it causes. Users are just not going to be able to figure out what causes server load without specific instruction by sysadmins.
On Tue, Jan 11, 2011 at 4:31 PM, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
On Tue, Jan 11, 2011 at 5:21 PM, Tim Starling tstarling@wikimedia.org wrote:
I've always been opposed to that policy.
Are you aware of the completely insane things users have sometimes established as conventions or even policies based on nonsensical server-load grounds?
*cough*
http://en.wikipedia.org/wiki/Template:Toolserver
24k revisions of pretty useless historical information.
http://en.wikipedia.org/wiki/Wikipedia:Open_proxy_detection 80k revisions
and so forth and so on.
On 12/01/11 09:31, Aryeh Gregor wrote:
On Tue, Jan 11, 2011 at 5:21 PM, Tim Starling tstarling@wikimedia.org wrote:
I've always been opposed to that policy.
Are you aware of the completely insane things users have sometimes established as conventions or even policies based on nonsensical server-load grounds?
Yes. I know that the main reason for the existence of the "don't worry about performance" page is to make such policy debates easier by elevating elitism to the status of a pseudo-policy. It means that sysadmins don't have to explain anything, they just have to say "what I say goes, see [[WP:PERF]]."
My issue with it is that it tends to discourage smart, capable users who are interested in improving server performance. Particularly in the area of template design, optimising server performance is important, and it's frequently done by users with a great amount of impact. It's not very hard. I've done it myself from time to time, but it's best done by people with a knowledge of the templates in question and the articles they serve.
Taking a few simple measures, like reducing the number of arguments in loop-style templates down to the minimum necessary, can have a huge impact on the parse time of very popular pages. I've given general tips in the past.
Users are just not going to be able to figure out what causes server load without specific instruction by sysadmins.
I think this is an exaggeration.
When I optimise the parse time of particular pages, I don't even use my sysadmin access. The best way to do it is to download the page with all its templates using Special:Export, and then to load it into a local wiki. Parsing large pages is typically CPU-dominated, so you can get a very good approximation without simulating the whole network. Once the page is in your local wiki, you can use whatever profiling tools you like: the MW profiler with extra sections, xdebug, gprof, etc. And you can modify the test cases very easily.
-- Tim Starling
On Wed, Jan 12, 2011 at 6:51 PM, Tim Starling tstarling@wikimedia.org wrote:
I think this is an exaggeration.
When I optimise the parse time of particular pages, I don't even use my sysadmin access. The best way to do it is to download the page with all its templates using Special:Export, and then to load it into a local wiki.
But how do you determine which templates are causing server load problems? If we could expose enough profiling info to users that they could figure out what's causing load so that they know their optimization work is having an effect, I'd be all for encouraging them to optimize. The problem is that left to their own devices, people who have no idea what they're talking about make up nonsensical server load problems, and there's no way for even fairly technical users to figure out that these people indeed have no idea what they're talking about. If we can expose clear metrics to users, like amount of CPU time used per template, then encouraging them to optimize those specific metrics is certainly a good idea.
Parsing large pages is typically CPU-dominated, so you can get a very good approximation without simulating the whole network.
Templates that use a lot of CPU time cause user-visible latency in addition to server load, and WP:PERF already says it's okay to try optimizing clear user-visible problems (although it could be less equivocal about it).
Just to give an example: I wrote a different algorithm for [[en:s:Template:Loop]], naming it [[en:s:Template:Loop!]], and I asked each of them for 100 and 101 dots in an empty sandbox preview.
These are the results:
Sandbox, empty, preview:
Preprocessor node count: 35/1000000
Post-expand include size: 1858/2048000 bytes
Template argument size: 450/2048000 bytes
Expensive parser function count: 0/500

Sandbox, 2 calls to loop to print 100 and 101 dots, preview:
Preprocessor node count: 1045/1000000
Post-expand include size: 2260/2048000 bytes
Template argument size: 1551/2048000 bytes
Expensive parser function count: 0/500

Sandbox, 2 calls to loop! to print the same dots, preview:
Preprocessor node count: 193/1000000
Post-expand include size: 2300/2048000 bytes
Template argument size: 680/2048000 bytes
Expensive parser function count: 0/500
Is there really no useful feedback in these data? Is there really no correlation with "server load"?
Alex
On 14/01/11 01:41, Alex Brollo wrote:
Just to give an example: I wrote a different algorithm for [[en:s:Template:Loop]], naming it [[en:s:Template:Loop!]], and I asked each of them for 100 and 101 dots in an empty sandbox preview.
These are the results:
Sandbox, empty, preview:
Preprocessor node count: 35/1000000
Post-expand include size: 1858/2048000 bytes
Template argument size: 450/2048000 bytes
Expensive parser function count: 0/500

Sandbox, 2 calls to loop to print 100 and 101 dots, preview:
Preprocessor node count: 1045/1000000
Post-expand include size: 2260/2048000 bytes
Template argument size: 1551/2048000 bytes
Expensive parser function count: 0/500

Sandbox, 2 calls to loop! to print the same dots, preview:
Preprocessor node count: 193/1000000
Post-expand include size: 2300/2048000 bytes
Template argument size: 680/2048000 bytes
Expensive parser function count: 0/500
Is there really no useful feedback in these data? Is there really no correlation with "server load"?
Yes, there is a correlation between all of those values and server CPU time. In particular, the much smaller preprocessor node count implies that your version will be much faster.
However, I'm not sure how you obtained that result, since {{loop!|100|x}} just expands to {{loop|100|x}} (it hits the default case of the #switch). When I try it, I get a preprocessor node count of 1069, not 193.
Your version is certainly much faster for a count of 5, rather than 100. For 100 repetitions of {{loop!|5|x}}, I get:
<!-- NewPP limit report
Preprocessor node count: 2201/1000000
Post-expand include size: 1000/2048000 bytes
Template argument size: 600/2048000 bytes
Expensive parser function count: 0/500
-->
<!-- Served by srv199 in 0.646 secs. -->
For 100 repetitions of {{loop|5|x}}, I get:
<!-- NewPP limit report
Preprocessor node count: 31401/1000000
Post-expand include size: 1000/2048000 bytes
Template argument size: 15500/2048000 bytes
Expensive parser function count: 0/500
-->
<!-- Served by srv245 in 5.043 secs. -->
An empty preview takes 0.368s, so that implies that your version is about 17 times faster for a count of 5. That's very close to the ratio of preprocessor node counts.
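Spelling out that arithmetic: the extra parse time is 5.043 - 0.368 = 4.675 s for loop against 0.646 - 0.368 = 0.278 s for loop!, and 4.675 / 0.278 ≈ 16.8, while the raw preprocessor node counts give 31401 / 2201 ≈ 14.3.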
This is exactly the kind of optimisation that competent editors like you can and should be doing.
-- Tim Starling
2011/1/14 Tim Starling tstarling@wikimedia.org
However, I'm not sure how you obtained that result, since {{loop!|100|x}} just expands to {{loop|100|x}} (it hits the default case of the #switch). When I try it, I get a preprocessor node count of 1069, not 193.
:-)
The {{loop!|100|x}} call is deprecated; loop! only handles numbers between 1 and 10 by itself, and larger numbers such as 100 should be obtained by nesting or appending, as suggested in the template's documentation. For backward compatibility, {{loop!|100|x}} simply calls {{loop|100|x}} instead of raising an error. So it's not surprising that the optimization only shows up in the range 1..10.
The preprocessor node count of 193 comes from the suggested syntax for 100 dots (one nesting) plus 101 dots (one nesting and one appended call):
{{loop!|10|{{loop!|10|x}}}}
{{loop!|10|{{loop!|10|x}}}}{{loop!|1|x}}
This call is running in en:s:Wikisource:Sandbox now, and I got the same metrics from that page's HTML:
<!-- NewPP limit report
Preprocessor node count: 193/1000000
Post-expand include size: 2300/2048000 bytes
Template argument size: 680/2048000 bytes
Expensive parser function count: 0/500
-->
Nevertheless, loop is mainly an example and a test, not such a useful template. Dealing with the problem of metadata consistency in Wikisource, which is deeply undermined by redundancy, our way of fixing things really does produce higher metrics compared with other projects' results... but now I have a small "toolbox" of tricks to evaluate such differences (rendering time plus the existing metrics).
Obviously it would be great to have better metrics for good, consistent performance comparison; but, as I said, I don't want to overload the servers just to produce new metrics for evaluating server overload. ;-)
Alex
On Wed, Jan 12, 2011 at 6:51 PM, Tim Starling tstarling@wikimedia.org wrote: [snip]
When I optimise the parse time of particular pages, I don't even use my sysadmin access. The best way to do it is to download the page with all its templates using Special:Export, and then to load it into a local wiki. Parsing large pages is typically CPU-dominated, so you can get a very good approximation without simulating the whole network. Once the page is in your local wiki, you can use whatever profiling tools you like: the MW profiler with extra sections, xdebug, gprof, etc. And you can modify the test cases very easily.
Well, that's the entire point of WP:PERF, at least before it was elevated to acronym apotheosis. One might reword it as "optimize through science, not superstition".
You're exactly the sort of person who can and should worry about performance: you have well-developed debugging skills and significant knowledge of the system internals. By following well-understood logical processes you can very effectively identify performance bottlenecks and find either workarounds (do your template like THIS and it's faster) or fixes (if we make THIS change to the parser or database lookup code, it goes faster).
I'm going to go out on a limb though and say that most people don't themselves have the tools or skills to do that. It's not rocket science, but these are not standard-issue skills. (Maybe they should be, but that's a story for the educational system!)
The next step from thinking there's a problem is to investigate it knowledgeably, which means either *having* those skills already, *developing* them, or *finding* someone else who does.
-- brion
Tim Starling wrote:
On 12/01/11 05:34, Casey Brown wrote:
On Tue, Jan 11, 2011 at 10:55 AM, Alex Brollo alex.brollo@gmail.com wrote:
I'm already using your suggestion: today I removed a complex test template from our village pump (replacing it with a link to a subpage, visited only by interested users) and, really, the difference in rendering the village pump page was obvious.
That's good, but also keep in mind that, generally, you shouldn't worry too much about performance: http://en.wikipedia.org/wiki/WP:PERF. (Had to throw in the little disclaimer here. ;-))
I've always been opposed to that policy.
Given that you're the person who implemented the expensive parser function count, I don't imagine anyone on this list finds your opposition surprising. I do find the view that users ought to be concerned about accidentally using too many {{#ifexist:}}s or {{PAGESIZE:}}s on a page (for example) to be a horrible approach to user experience, though.
MZMcBride
On Tue, Jan 11, 2011 at 10:55 AM, Alex Brollo alex.brollo@gmail.com wrote:
Nevertheless... sometimes people tell me "don't use this hack, it overloads the servers"... and sometimes it doesn't, or it is simply an undocumented, unproven personal opinion.
Ignore them. Server overload is not a problem that users are in a position to evaluate, and a lot of users get completely insane ideas about performance. There have been cases of wikis making up entire policies that were completely groundless. The performance issues you should be paying attention to are the ones that are visible to the front-end, i.e., ones that produce slowness or error messages that you can personally see. If anyone tries to tell you that you should or should not do something because of server load, point them to http://en.wikipedia.org/wiki/WP:PERF and ignore them.
(Except if they're a sysadmin. But if a performance issue becomes important enough that a sysadmin intervenes, they're not going to give you the option of ignoring them.)