(re-sending from the "right" account for this list)
Hi.
I (rather urgently) need some input from someone who understands how parser caching works. (Rob: please forward as appropriate).
tl;dr:
what is the intention behind the current implementation of ParserCache::getOptionsKey()? It's based on the page ID only, not taking into account any options. This seems to imply that all users share the same parser cache key, ignoring all options that may impact cached content. Is that correct/intended? If so, why all the trouble with ParserOutput::recordOption, etc?
Background:
We just tried to enable the use of the parser cache for wikidata, and it failed, resulting in page content being shown in random languages.
I tried to split the parser cache by user language using ParserOutput::recordOption to include userlang in the cache key. When tested locally, and also on our test system, that seemed to work fine (which seems strange now, looking at the code of getOptionsKey()).
On the live site, however, it failed.
Judging by its name, getOptionsKey should generate a key that includes all options relevant to caching page content in the parser cache. But it seems it forces the same parser cache entry for all users. Is this intended?
Possible fix:
ParserCache::getOptionsKey could delegate to ContentHandler::getOptionsKey, which could then be used to override the default behavior. Would that be a sensible approach?
And if so, would it be feasible to push out such a change before the holidays?
Thanks, Daniel
On Tue, Dec 10, 2013 at 4:22 PM, Daniel Kinzler daniel@brightbyte.de wrote:
what is the intention behind the current implementation of ParserCache::getOptionsKey()? It's based on the page ID only, not taking into account any options.
Looking at the code, ParserCache::getOptionsKey() is used to get the memc key which has a list of parser option names actually used when parsing the page. So for example, if a page uses only math and thumbsize while being parsed, the value would be array( 'math', 'thumbsize' ).
Then ParserOptions::optionsHash is used to construct a key corresponding to the actual ParserOptions object, for storing the actual parser output for that page+ParserOptions combination. In the example above, it would only use the 'math' and 'thumbsize' options to vary the key; users having the same 'math' and 'thumbsize' would get the same cached parser output even if they have different options for stubthreshold, dateformat, numberheadings, userlang, editsection, and so on. This reduces cache fragmentation.
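The two-step lookup described above can be sketched roughly as follows. This is a toy model in Python, not MediaWiki's PHP code: the class name, the key formats, the dictionary standing in for memcached, and the choice of hash are all assumptions made for illustration.

```python
import hashlib


class TwoLevelParserCache:
    """Toy model of the two-step lookup; not MediaWiki's implementation."""

    def __init__(self):
        self.store = {}  # hypothetical stand-in for memcached

    def options_key(self, page_id):
        # Depends on the page only, so every user reads the same entry.
        return f"options:{page_id}"

    def output_key(self, page_id, used_options, options):
        # Hash only the options that were actually used during the parse.
        relevant = "!".join(
            f"{name}={options[name]}" for name in sorted(used_options)
        )
        digest = hashlib.sha256(relevant.encode("utf-8")).hexdigest()
        return f"output:{page_id}:{digest}"

    def save(self, page_id, options, used_options, html):
        # First entry: the list of option names that vary the cache.
        self.store[self.options_key(page_id)] = sorted(used_options)
        # Second entry: the output for this particular option combination.
        self.store[self.output_key(page_id, used_options, options)] = html

    def get(self, page_id, options):
        used_options = self.store.get(self.options_key(page_id))
        if used_options is None:
            return None
        return self.store.get(self.output_key(page_id, used_options, options))


cache = TwoLevelParserCache()
alice = {"math": "png", "thumbsize": 2, "userlang": "en", "dateformat": "dmy"}
bob = {"math": "png", "thumbsize": 2, "userlang": "de", "dateformat": "mdy"}
carol = {"math": "mathjax", "thumbsize": 2, "userlang": "en", "dateformat": "dmy"}

# The page used only 'math' and 'thumbsize' while being parsed for alice.
cache.save(42, alice, {"math", "thumbsize"}, "<p>rendered</p>")
hit_for_bob = cache.get(42, bob)       # same math and thumbsize as alice
miss_for_carol = cache.get(42, carol)  # different math setting
```

In this model bob shares alice's entry although his userlang and dateformat differ, while carol misses because her 'math' value produces a different output key.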
I doubt that the ContentHandler is really going to need to override getOptionsKey; the ParserOptions options used to parse the page really shouldn't vary depending on user language or other stuff like that.
On 11/12/13 08:22, Daniel Kinzler wrote:
what is the intention behind the current implementation of ParserCache::getOptionsKey()? It's based on the page ID only, not taking into account any options. This seems to imply that all users share the same parser cache key, ignoring all options that may impact cached content. Is that correct/intended?
No, the set of options which fragment the cache is the same for all users. So if the user language is included in that set of options, then users with different languages will get different parser cache objects.
That is to say, the options key stores the list of options which vary the cache. ParserOptions::optionsHash() uses this list to form a parser output key (as in ParserCache::getParserOutputKey()) which is specific to the actual options requested.
If the parser output varies by language for some users, and not others, then you may possibly have a problem, but it doesn't sound like that is what you are doing.
We just tried to enable the use of the parser cache for wikidata, and it failed, resulting in page content being shown in random languages.
That's probably because you incorrectly used $wgLang or RequestContext::getLanguage(). The user language for the parser is the one you get from ParserOptions::getUserLangObj().
During page save, a default ParserOptions is used, with the default user language, for the purposes of link table construction. The ParserOutput thus generated will be saved into the ParserCache. So it's not correct to use the context user language during the parse; doing so will pollute the parser cache.
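A rough illustration of that pollution, as a Python toy model rather than MediaWiki code. The function names, the plain dictionary used as a cache, and "en" as the default user language are assumptions made for the example.

```python
def render_from_context(options, context_lang):
    # Problematic: ignores the parser options and uses the request's language.
    return f"label[{context_lang}]"


def render_from_options(options, context_lang):
    # Uses the language carried by the parser options.
    return f"label[{options['userlang']}]"


def save_page(cache, render, context_lang):
    # On save, the parse runs with default options (assumed default: "en"),
    # and the result is stored under the key for those default options.
    default_options = {"userlang": "en"}
    cache[default_options["userlang"]] = render(default_options, context_lang)


polluted, clean = {}, {}
# A user whose interface language is German saves the page.
save_page(polluted, render_from_context, context_lang="de")
save_page(clean, render_from_options, context_lang="de")
```

After the save, the polluted cache holds German content under the "en" key, so English readers would be served the wrong language; the clean cache holds English content under the same key.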
I tried to split the parser cache by user language using ParserOutput::recordOption to include userlang in the cache key. When tested locally, and also on our test system, that seemed to work fine (which seems strange now, looking at the code of getOptionsKey()).
It's not necessary to call ParserOutput::recordOption(). ParserOptions::getUserLangObj() will call it for you (via onAccessCallback).
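The access-callback idea can be modelled roughly like this. It is a simplified Python sketch, not the actual ParserOptions or ParserOutput classes; every class, method, and attribute name below is hypothetical.

```python
class Options:
    """Toy options object that reports each option read to a callback."""

    def __init__(self, values):
        self._values = dict(values)
        self.on_access = None  # called with the option name on every read

    def get(self, name):
        if self.on_access is not None:
            self.on_access(name)
        return self._values[name]


class Output:
    """Toy output object that remembers which options influenced it."""

    def __init__(self):
        self.used_options = set()
        self.text = ""

    def record_option(self, name):
        self.used_options.add(name)


def parse(options):
    output = Output()
    # Wire the callback so that reading an option records it on the output.
    options.on_access = output.record_option
    output.text = "Hello" if options.get("userlang") == "en" else "Hallo"
    options.on_access = None
    return output


out = parse(Options({"userlang": "de", "thumbsize": 2}))
```

Here the parse never calls record_option directly, yet out.used_options ends up containing only "userlang", because that was the only option read; "thumbsize" was never touched and so does not vary the cache.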
ParserCache::getOptionsKey could delegate to ContentHandler::getOptionsKey, which could then be used to override the default behavior. Would that be a sensible approach?
No.
-- Tim Starling
On 10.12.2013 22:38, Brad Jorsch (Anomie) wrote:
Looking at the code, ParserCache::getOptionsKey() is used to get the memc key which has a list of parser option names actually used when parsing the page. So for example, if a page uses only math and thumbsize while being parsed, the value would be array( 'math', 'thumbsize' ).
On 11.12.2013 02:35, Tim Starling wrote:
No, the set of options which fragment the cache is the same for all users. So if the user language is included in that set of options, then users with different languages will get different parser cache objects.
Ah, right, thanks! Got myself confused there.
The thing is: we are changing what's in the list of relevant options. Before the deployment, there was nothing in it, while with the new code, the user language should be there. I suppose that means we need to purge these "pointers".
Would bumping wgCacheEpoch be sufficient for that? Note that we don't care much about purging the actual parser cache entries; we want to purge the "pointer" entries in the cache.
We just tried to enable the use of the parser cache for wikidata, and it failed, resulting in page content being shown in random languages.
That's probably because you incorrectly used $wgLang or RequestContext::getLanguage(). The user language for the parser is the one you get from ParserOptions::getUserLangObj().
Oh, thanks for that hint! Seems our code is inconsistent about this, using the language from the parser options in some places, the one from the context in others. Need to fix that!
It's not necessary to call ParserOutput::recordOption(). ParserOptions::getUserLangObj() will call it for you (via onAccessCallback).
Oh great, magic hidden information flow :)
Thanks for the info, I'll get hacking on it!
-- daniel
wikitech-l@lists.wikimedia.org