---------- Forwarded message ----------
From: Brion Vibber <brion@wikimedia.org>
Date: Tue, Sep 1, 2009 at 11:48 AM
Subject: Re: [Wikitech-l] how to chang {{SITENAME}}
To: Wikimedia developers <wikitech-l@lists.wikimedia.org>
[snip]
I'd like to ask that folks leave this thread aside for the moment other than useful replies to the original poster's request about how and where to propose changing $wgSitename for ja.wikipedia.org.
If the code paths for setting up and running the parser to do brace substitution in messages were significantly faster, we wouldn't bother optimizing a few '$1 - {{SITENAME}}'s to '$1 - Wikipedia' on a few of our high-traffic sites with stable names.
Brace substitution in messages is done using a limited mode of the parser, much as it is in templates and wiki pages; avoiding braces in messages that appear on parser-cached pages means that we avoid having to initialize the parser, which can be a very noticeable win on a high-traffic site.
If anyone is interested in actually looking into the costs of Parser setup and invocation for brace replacement in messages and optimizing this code path, that would be great, but please follow up in a new thread and only post _new_ information or questions, not repeats of what's already been said.
Thanks.
-- brion vibber (brion @ wikimedia.org)
_______________________________________________ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Let me be the one to fork this into a new thread. We obviously need to speed up these kinds of things and a nice roundtable discussion is a great place to start.
-Chad
Duplicate thread of http://lists.wikimedia.org/pipermail/wikitech-l/2009-September/044984.html ?
Roan Kattouw (Catrope)
Ah oops, hadn't noticed it'd already been forked. Ignore me then :)
-Chad
Roan Kattouw wrote:
Duplicate thread of http://lists.wikimedia.org/pipermail/wikitech-l/2009-September/044984.html ?
Not exactly, I think. Folding constant magic words when setting up the localization cache and speeding up the parser in general would be fairly orthogonal improvements, even if both would help to accomplish the same goal: less time spent in parsing system messages.
Hello!!!
If anyone is interested in actually looking into the costs of Parser setup and invocation for brace replacement in messages and optimizing this code path, that would be great, but please follow up in a new thread and only post _new_ information or questions, not repeats of what's already been said.
has anyone commenting in this or previous thread looked at profiling data for the wfMsg parser expansions (especially with WMF message arrays ;-), or are we inventing our arguments and positions here?
Domas
I personally haven't, but if you've got some good profiling data from WMF usage of this stuff it would certainly be helpful :)
-Chad
Chad,
I personally haven't, but if you've got some good profiling data from WMF usage of this stuff it would certainly be helpful :)
Well, in simple micro-benchmarking, wfMsg("pagetitle") took 0.5ms on enwiki and 5ms on frwiki (that's _without_ any initialization overhead). The first call for a message (initialization of the message cache, etc.) took ~4ms on enwiki and 14ms on frwiki.
Tight runs of 1000 wfMsg("pagetitle") calls ended up running for 5s on frwiki and 0.5s on enwiki.
So, now that I somewhat debunked that it is initialization-cost-only, let's move on to actual work that is being done.
For comparison, I made two profiles of a 10-call run (unfortunately, an xdebug profile would take way too much space if I attempted a longer run). One is for the optimized, no-{{ execution path: http://noc.wikimedia.org/~midom/messages-codepath/en.png
And 10x-more-expensive one: http://noc.wikimedia.org/~midom/messages-codepath/fr.png
So, apparently, if we want dynamic messages, we have to spend not just 'additional cycles' ;-)
It is much easier to read those graphs if you take a look at call counts (anything that is 10x or more is usually being executed for each wfMsg() call, whereas other stuff is initialization-only). Also note that, as I was running it from the command line, AutoLoader is way higher than it should be, so setup costs are quite inflated.
Also, the base is 55%; the remaining part is MediaWiki framework setup (configuration, databases, etc.).
Anyway, as one can see, the lion's share is Parser->preprocess, which doesn't do any initialization here; it is pure parser magic (though it probably has some revisits to magic words and Title code that could be removed). So yes, parser init adds about 5ms, and so does the message cache, but then every message that has {{'s adds another 5ms. Message cache init will be much cheaper once the new cdb-based code goes live.
Anyway, if anyone still thinks that individual interface messages on every page should take 5ms each to render, let me know, I can unsubscribe you from this list myself, you won't have to worry about that.
Cheers, Domas
P.S. If anyone wants to look at raw profile files, they're at http://noc.wikimedia.org/~midom/messages-codepath/
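A minimal sketch of the measurement described in the message above, written in Python rather than MediaWiki's PHP. It is not the harness Domas used; `time_calls` and `total_seconds` are hypothetical helpers that only illustrate separating the first (initializing) call from the steady-state cost, and how a per-call cost scales over a tight run.

```python
import time


def time_calls(fn, count=1000):
    """Return (first_call_seconds, mean_seconds_per_later_call).

    The first call is timed on its own because it also pays for
    one-off initialization (caches, autoloading, and so on).
    """
    start = time.perf_counter()
    fn()
    first = time.perf_counter() - start

    start = time.perf_counter()
    for _ in range(count):
        fn()
    mean = (time.perf_counter() - start) / count
    return first, mean


def total_seconds(per_call_ms, calls):
    """Total run time in seconds for `calls` calls at `per_call_ms` each."""
    return per_call_ms * calls / 1000.0
```

With the per-call figures quoted above, `total_seconds(5, 1000)` gives 5.0 and `total_seconds(0.5, 1000)` gives 0.5, matching the reported 5s and 0.5s for 1000-call runs.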
Domas Mituzas wrote:
Anyway, as one can see, the lion's share is Parser->preprocess, which doesn't do any initialization here; it is pure parser magic (though it probably has some revisits to magic words and Title code that could be removed). So yes, parser init adds about 5ms, and so does the message cache, but then every message that has {{'s adds another 5ms. Message cache init will be much cheaper once the new cdb-based code goes live.
Anyway, if anyone still thinks that individual interface messages on every page should take 5ms each to render, let me know, I can unsubscribe you from this list myself, you won't have to worry about that.
Nobody said /that/. What other people have objected to is your position that the only way to regain them is to manually replace {{SITENAME}} in all messages.
Parser->transformMsg() could replace many well-known "{{something" and only call preprocess if there's still some "{{". It probably means refactoring getVariableValue() and some change at CoreParserFunctions or mFunctionHooks, to not make it too ugly, but it can be done.
The parser initialization is also quite expensive, but since the parser will end up being used, I think we can omit it. Otherwise, it could be moved to a separate parser class.
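The fast path proposed in the message above could look roughly like the following. This is a sketch in Python, not MediaWiki code: `CONSTANTS`, `transform_msg` and `slow_preprocess` are hypothetical names, and `slow_preprocess` merely stands in for the full Parser->preprocess() call.

```python
import re

# Hypothetical table of site-wide constants; real values would come
# from the wiki's configuration.
CONSTANTS = {"SITENAME": "Wikipedia"}

# Matches only bare variables such as {{SITENAME}}; anything with
# parameters (e.g. {{PLURAL:...}}) is deliberately left alone.
SIMPLE_VARIABLE = re.compile(r"\{\{([A-Z]+)\}\}")


def slow_preprocess(text):
    """Stand-in for the full parser; it only counts how often it runs."""
    slow_preprocess.calls += 1
    return text


slow_preprocess.calls = 0


def transform_msg(text):
    """Replace well-known variables, falling back to the slow path."""
    if "{{" not in text:
        return text

    def substitute(match):
        # Unknown names are kept verbatim so the fallback can see them.
        return CONSTANTS.get(match.group(1), match.group(0))

    text = SIMPLE_VARIABLE.sub(substitute, text)
    if "{{" in text:
        return slow_preprocess(text)
    return text
```

Anything the table does not recognize, including localized magic-word aliases, still contains "{{" after substitution and therefore goes through the slow path unchanged.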
Hello,
What other people have objected to is your position that the only way to regain them is to manually replace {{SITENAME}} in all messages.
I didn't say "manually" ;-) One could have an automated solution :)
Parser->transformMsg() could replace many well-known "{{something" and only call preprocess if there's still some "{{". It probably means refactoring getVariableValue() and some change at CoreParserFunctions or mFunctionHooks, to not make it too ugly, but it can be done.
I find it remarkable that you want to introduce that much obfuscation for something that has been resolved for the past few years ;-) Do note that such code would need to be maintained for all the possible cases, would probably fail with certain grammar issues in some messages, and, dear oh dear, may not really work with internationalized magic words ;-) Is that code going to be much faster?
The parser initialization is also quite expensive, but since the parser will end up being used, I think we can omit it.
This is a very wrong assumption. The parser is not used in most article pageviews (e.g. article text is parsed on about 7% of requests to the backend; of course, there are quite a few Parser calls by search, diffs, etc.).
BR, Domas
Domas Mituzas wrote:
Hello,
What other people have objected to is your position that the only way to regain them is to manually replace {{SITENAME}} in all messages.
I didn't say "manually" ;-) One could have an automated solution :)
An admin bot "template" subster
Parser->transformMsg() could replace many well-known "{{something" and only call preprocess if there's still some "{{". It probably means refactoring getVariableValue() and some change at CoreParserFunctions or mFunctionHooks, to not make it too ugly, but it can be done.
I find it remarkable that you want to introduce that much obfuscation for something
The reason for moving things around is precisely to avoid duplicating logic along the already-obfuscated code path.
that has been resolved for the past few years ;-)
Last year's solutions aren't the solutions of today.
Do note that such code would need to be maintained for all the possible cases, would probably fail with certain grammar issues in some messages, and, dear oh dear, may not really work with internationalized magic words ;-)
If it can't completely parse the message, fall back to the slower preprocess().
The messages should use the canonical namespaces. How much more magic than namespaces do those words have? ;-)
Is that code going to be much faster?
Avoiding calling the parser? It'd surprise me if it weren't. But before profiling the results, you won't trick me into giving out exact numbers. :-)
The parser initialization is also quite expensive, but since the parser will end up being used, I think we can omit it.
This is a very wrong assumption. The parser is not used in most article pageviews (e.g. article text is parsed on about 7% of requests to the backend; of course, there are quite a few Parser calls by search, diffs, etc.).
Ouch. Of course, you're right. I failed to find the most obvious case.
As I suggested yesterday, perhaps not very clearly, I think the sensible thing to do is to bypass the Parser on most calls to messages like "Welcome to {{SITENAME}}" by caching the post-transformed version of the message, e.g. "Welcome to Wikipedia", in the MessageCache instead. This should be possible if the message has no context-dependent substitutions (e.g. if the message avoids things like substitution variables and context-dependent magic words like {{NAMESPACE}}).
It would mean teaching the parser to look for context dependent elements and having some way to communicate that to the MessageCache to allow it to decide whether to cache the pre-transformed or post-transformed version. It would also mean categorizing magic words to identify whether caching is allowed, but that seems like a straightforward extension of the TTL hinting already done in MagicWord. The magic word caching hints could also be used to help decide how long the post-transformed version is likely to be good for.
More importantly, it avoids the pitfalls of trying to reintroduce parser logic in transformMsg or some other preliminary step.
-Robert Rohde
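A rough sketch of the caching idea in the message above, in Python rather than PHP. `MessageCacheSketch`, `SITE_CONSTANTS` and the `context` dictionary are hypothetical; the sketch conservatively treats any message containing "$" (substitution variables) or any variable that is not a known constant as context-dependent, which is an assumption and not a description of how MediaWiki classifies magic words.

```python
import re

# Hypothetical set of variables whose value never depends on the request.
SITE_CONSTANTS = {"SITENAME": "Wikipedia"}

VARIABLE = re.compile(r"\{\{([A-Z]+)\}\}")


class MessageCacheSketch:
    def __init__(self, raw_messages):
        self.raw = raw_messages   # pre-transformed message text
        self.transformed = {}     # post-transformed, context-free messages
        self.parser_calls = 0     # how often the "parser" had to run

    def _parse(self, text, context):
        """Expand variables and report whether the result is cacheable."""
        self.parser_calls += 1
        cacheable = "$" not in text

        def substitute(match):
            nonlocal cacheable
            name = match.group(1)
            if name in SITE_CONSTANTS:
                return SITE_CONSTANTS[name]
            # Anything else is assumed to depend on the current request.
            cacheable = False
            return context.get(name, match.group(0))

        return VARIABLE.sub(substitute, text), cacheable

    def get(self, key, context):
        if key in self.transformed:
            return self.transformed[key]
        result, cacheable = self._parse(self.raw[key], context)
        if cacheable:
            self.transformed[key] = result
        return result
```

In this sketch "Welcome to {{SITENAME}}" is parsed once and then served from the cache, while a message using {{NAMESPACE}} is reparsed on every call because its output depends on the page being viewed.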
Robert Rohde wrote:
As I suggested yesterday, perhaps not very clearly, I think the sensible thing to do is to bypass the Parser on most calls to messages like "Welcome to {{SITENAME}}" by caching the post-transformed version of the message, e.g. "Welcome to Wikipedia", in the MessageCache instead. This should be possible if the message has no context-dependent substitutions (e.g. if the message avoids things like substitution variables and context-dependent magic words like {{NAMESPACE}}).
It would mean teaching the parser to look for context dependent elements and having some way to communicate that to the MessageCache to allow it to decide whether to cache the pre-transformed or post-transformed version. It would also mean categorizing magic words to identify whether caching is allowed, but that seems like a straightforward extension of the TTL hinting already done in MagicWord. The magic word caching hints could also be used to help decide how long the post-transformed version is likely to be good for.
My (somewhat vague) idea of how this could be done would be to add a new "constant-folding" mode to the parser which only does brace substitution for magic words that it knows to be essentially constant (and whose parameters, if any, can be folded down to something that contains no braces, numbered message parameters nor anything else potentially variable) and leaves everything else alone.
It's been quite some time since I last looked at the parser code in any detail, but surely that can't be *that* hard -- after all, we already have the "pre-save transform" which does something quite similar.
The localization cache can just run everything through the parser in constant-folding mode, cache the output, and then use the current check (which I think simply checks for the presence of "{{") to determine if the message needs to be reparsed when it's actually used.
(Although Roan's suggestion of also folding things like {{CURRENTYEAR}} and passing their expiration time to the cache may also be worth considering. We could then just treat e.g. {{SITENAME}} as having an infinite expiration time, and any truly uncacheable magic words as expiring immediately.)
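The constant-folding pass with expiry hints described above might be sketched as follows. This is Python, not parser code; `FOLDABLE`, `fold_constants` and the two handler functions are hypothetical, and only bare variables without parameters are handled.

```python
import calendar
import math
import re
import time

VARIABLE = re.compile(r"\{\{([A-Z]+)\}\}")


def site_name(now):
    """Essentially constant: never expires."""
    return "Wikipedia", math.inf


def current_year(now):
    """Valid until the start of the next (UTC) year."""
    year = time.gmtime(now).tm_year
    next_year_start = calendar.timegm((year + 1, 1, 1, 0, 0, 0))
    return str(year), next_year_start


# Hypothetical registry: name -> function returning (value, expires_at).
FOLDABLE = {"SITENAME": site_name, "CURRENTYEAR": current_year}


def fold_constants(text, now):
    """Fold known variables; return (text, expires_at, needs_parser)."""
    expires_at = math.inf

    def substitute(match):
        nonlocal expires_at
        handler = FOLDABLE.get(match.group(1))
        if handler is None:
            return match.group(0)  # leave everything else alone
        value, expiry = handler(now)
        expires_at = min(expires_at, expiry)
        return value

    folded = VARIABLE.sub(substitute, text)
    # Same check as described above: any remaining "{{" means the
    # message still has to go through the parser when it is used.
    return folded, expires_at, "{{" in folded
```

A truly uncacheable magic word could be modelled in the same registry by a handler that returns `now` as its expiry time.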
2009/9/2 Ilmari Karonen <nospam@vyznev.net>:
(Although Roan's suggestion of also folding things like {{CURRENTYEAR}} and passing their expiration time to the cache may also be worth considering. We could then just treat e.g. {{SITENAME}} as having an infinite expiration time, and any truly uncacheable magic words as expiring immediately.)
When using CDB, the cache is constant. You can't do incremental updates. And what if WMF uses a *single* cache for all projects? Then {{SITENAME}} wouldn't be constant anymore, and we would not get better performance where it is needed most. It could help other MediaWiki installations a bit, just how much... I don't know.
Niklas Laxström wrote:
When using CDB, the cache is constant. You can't do incremental updates. And what if WMF uses a *single* cache for all projects? Then {{SITENAME}} wouldn't be constant anymore, and we would not get better performance where it is needed most. It could help other MediaWiki installations a bit, just how much... I don't know.
How could it use a single cache for all projects? Each project has its own messages...
2009/9/2 Platonides <Platonides@gmail.com>:
Niklas Laxström wrote:
When using CDB, the cache is constant. You can't do incremental updates. And what if WMF uses a *single* cache for all projects? Then {{SITENAME}} wouldn't be constant anymore, and we would not get better performance where it is needed most. It could help other MediaWiki installations a bit, just how much... I don't know.
How could it use a single cache for all projects? Each project has its own messages...
The localisation cache caches only static content, not in-wiki customisations. On the other hand, it needs some trickery if the set of extensions differs between wikis, but it should still be possible. But I'm just guessing; I don't know how they are going to set it up.
Niklas Laxström wrote:
The localisation cache caches only static content, not in-wiki customisations. On the other hand, it needs some trickery if the set of extensions differs between wikis, but it should still be possible. But I'm just guessing; I don't know how they are going to set it up.
How is it going to improve the current message files?
On Wed, Sep 2, 2009 at 8:34 AM, Niklas Laxström <niklas.laxstrom@gmail.com> wrote:
When using CDB, the cache is constant. You can't do incremental updates.
You can, you just have to write an entirely new database every time. This will already have to be done every time the messages change. Changes to {{SITENAME}} should be much rarer, so it shouldn't be a big burden.
And what if WMF uses a *single* cache for all projects?
If the cache is only for default messages (is it?), then of course it doesn't help here. We could still use memcached.
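The "write an entirely new database every time" approach mentioned above can be sketched as a rebuild followed by an atomic swap. This is an illustration in Python only: a JSON file stands in for the CDB file, and `rebuild_cache` and `read_message` are hypothetical helpers, not part of MediaWiki.

```python
import json
import os
import tempfile


def rebuild_cache(path, messages):
    """Write a complete new cache file, then swap it into place.

    Readers see either the old file or the new one, never a partial
    write, because os.replace() is atomic on the same filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(messages, handle)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the swap.
        os.unlink(tmp_path)
        raise


def read_message(path, key):
    """Look up a single message from the current cache file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)[key]
```

A change to {{SITENAME}} would then simply trigger another full `rebuild_cache` call, the same as any change to the messages themselves.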
On Wed, Sep 2, 2009 at 6:51 PM, Platonides <Platonides@gmail.com> wrote:
How is it going to improve the current message files?
Because it would be able to read only the messages it needed and not have to execute megabytes of useless PHP code on every request, I assume.