On Fri, Jul 30, 2010 at 1:32 PM, John Vandenberg jayvdb@gmail.com wrote:
So you're telling my theoretical logged-in-reader to use default prefs, or log out, when the reason they are a logged-in-reader is so they can control their preferences..!
Yep. You want features, you often pay a performance penalty. In this case the performance penalty should be reducible, or at least clearly marked, but that's a general rule anyway.
Surely there are a few common 'preference sets' which large numbers of readers use?
Changing any parser-related preference will kill page load times.
There are plenty of pages which change more than once per minute,
No pages change once per minute on average. That would be 1440 edits per day, or more than 500,000 per year. Only one page on enwiki (WP:AIAV) has more than 500,000 edits *total*, let alone per year. There were only 18 edits to WP:ANI between 17:00 and 18:00 today, just for example, which is less than one edit every three minutes. There are some times when a particular page changes many times in a minute -- like when a major event occurs and everyone rushes to update an article -- but these are rare and don't last long.
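For anyone who wants to check the arithmetic above, here is a quick Python sketch. The only inputs are the figures already quoted (one edit per minute, 18 edits to WP:ANI in one hour); the variable names are mine.

```python
# Back-of-the-envelope check of the edit-rate figures quoted above.
MINUTES_PER_DAY = 24 * 60

# A page that changes once per minute on average:
edits_per_day = MINUTES_PER_DAY
edits_per_year = edits_per_day * 365

# WP:ANI had 18 edits in one hour; average gap between edits, in minutes:
ani_edits_in_hour = 18
minutes_between_ani_edits = 60 / ani_edits_in_hour

print(edits_per_day, edits_per_year, round(minutes_between_ani_edits, 2))
```

So once per minute really does mean more than half a million edits per year, and 18 edits in an hour is an average gap of a bit over three minutes.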
You also seem to be missing how many different possible parser cache keys there are. It's not like there are only five or ten possible versions. As I said before -- if you change your parser-related settings around a bunch, you will probably rarely or never hit parser cache except when you yourself viewed the page since it last changed. There are too many possible permutations of settings here.
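To get a feel for how quickly the permutations multiply, here is a rough Python sketch. The option names are the ones that feed the parser cache key, but the number of distinct values per option is purely an illustrative assumption on my part, not a measured figure from any wiki.

```python
import math

# ASSUMPTION: the counts below are made-up illustrative values, chosen only
# to show how the key space grows; they are not real MediaWiki statistics.
hypothetical_values_per_option = {
    "math": 7,            # rendering modes
    "date": 5,            # date format preferences
    "numberheadings": 2,  # on / off
    "lang": 250,          # interface languages
    "thumbsize": 6,       # thumbnail size choices
}

# Each distinct combination of values yields a distinct parser cache key,
# so the number of possible cached renderings per page is the product.
possible_keys_per_page = math.prod(hypothetical_values_per_option.values())

print(possible_keys_per_page)
```

Even with modest guesses for each option, the product is far too large for any one non-default combination to be shared by many readers of a given page.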
however I'd expect a much higher threshold, variable based on the volume of page activity, or some other mechanism to determine whether the cached version is acceptably stale for the logged-in-reader.
There is no infrastructure required for extra stale entries. If the viewer is happy to accept the slightly stale revision for their chosen prefs, serve it. If not, reparse.
Look, this is just not a useful solution, period. It would be extremely ineffective. If you extended the permitted staleness level so much that it would be moderately effective, it would be useless, because you'd be seeing hours- or days-old articles. On the other hand, for a comparable amount of effort you could implement a solution that actually is effective, like adding an extra postprocessing stage.
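A minimal sketch of what an extra postprocessing stage could look like, in Python for brevity. Everything here is hypothetical (the function names, the fake parser output, the regex-based heading numbering); it is not how MediaWiki does or would implement it. The point is only that the expensive parse is cached once per page, independent of preferences, and the per-user tweak is applied cheaply afterwards.

```python
import re

parse_calls = 0   # counts how often the expensive step actually runs
_cache = {}       # page_id -> preference-independent HTML


def expensive_parse(page_id):
    """Stand-in for the real parser; returns hypothetical HTML."""
    global parse_calls
    parse_calls += 1
    return "<h2>History</h2><p>...</p><h2>Usage</h2><p>...</p>"


def cached_html(page_id):
    """One cache entry per page, shared by all users whatever their prefs."""
    if page_id not in _cache:
        _cache[page_id] = expensive_parse(page_id)
    return _cache[page_id]


def postprocess(html, prefs):
    """Cheap per-user pass; here it only handles 'numberheadings'."""
    if not prefs.get("numberheadings"):
        return html
    counter = 0

    def number(match):
        nonlocal counter
        counter += 1
        return f"<h2>{counter} {match.group(1)}</h2>"

    return re.sub(r"<h2>(.*?)</h2>", number, html)


def render(page_id, prefs):
    return postprocess(cached_html(page_id), prefs)


numbered = render(1, {"numberheadings": True})
plain = render(1, {})
```

Both calls to render() are served from the same cache entry, so expensive_parse() runs once even though the two users see different output.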
On Fri, Jul 30, 2010 at 8:22 PM, jidanni@jidanni.org wrote:
Hmmm, maybe they're there amongst the "!"s below.
$ lynx --source http://en.wikipedia.org/wiki/Main_Page | grep parser
Expensive parser function count: 44/500
<!-- Saved in parser cache with key enwiki:pcache:idhash:15580374-0!3!0!default!1!en!4!edit=0 and timestamp 20100731001330 -->
Yes. That key is generated by the following line in includes/parser/ParserCache.php:
$key = wfMemcKey( 'pcache', 'idhash', "{$pageid}-{$renderkey}!{$hash}{$edit}{$printable}" );
The relevant bit of that, for us, is $hash, which is generated by getPageRenderingHash() in includes/User.php:
// stubthreshold is only included below for completeness,
// it will always be 0 when this function is called by parsercache.

$confstr = $this->getOption( 'math' );
$confstr .= '!' . $this->getOption( 'stubthreshold' );
if ( $wgUseDynamicDates ) {
	$confstr .= '!' . $this->getDatePreference();
}
$confstr .= '!' . ( $this->getOption( 'numberheadings' ) ? '1' : '' );
$confstr .= '!' . $wgLang->getCode();
$confstr .= '!' . $this->getOption( 'thumbsize' );
// add in language specific options, if any
$extra = $wgContLang->getExtraHashOptions();
$confstr .= $extra;
So anonymous users on enwiki have math=3, stubthreshold=0 (although the comment indicates this is irrelevant somehow), date preferences = 'default', numberheadings = 1, language = 'en', thumbsize = 4. Changing any of those from the default will make you miss the parser cache on enwiki.
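To make that concrete, here is a rough Python rendition of the two PHP snippets above. It is a sketch, not the real code. The dictionary keys and function names are mine, the "enwiki" prefix and ":" separator are inferred from the key in the page comment rather than from reading wfMemcKey(), and the "!edit=0" suffix is simply copied from that same key.

```python
def page_rendering_hash(opts, use_dynamic_dates=True, extra=""):
    """Mirrors the $confstr construction quoted above."""
    parts = [str(opts["math"]), str(opts["stubthreshold"])]
    if use_dynamic_dates:
        parts.append(opts["date"])
    parts.append("1" if opts["numberheadings"] else "")
    parts.append(opts["lang"])
    parts.append(str(opts["thumbsize"]))
    return "!".join(parts) + extra


def parser_cache_key(wiki, page_id, render_key, rendering_hash,
                     edit="!edit=0", printable=""):
    """ASSUMPTION: key layout inferred from the HTML comment on Main_Page."""
    tail = f"{page_id}-{render_key}!{rendering_hash}{edit}{printable}"
    return ":".join([wiki, "pcache", "idhash", tail])


# Values an anonymous enwiki reader gets, per the key quoted above.
anon_opts = {
    "math": 3,
    "stubthreshold": 0,
    "date": "default",
    "numberheadings": True,
    "lang": "en",
    "thumbsize": 4,
}

anon_key = parser_cache_key("enwiki", 15580374, 0,
                            page_rendering_hash(anon_opts))

# Same reader, but with a different thumbnail size preference.
custom_opts = dict(anon_opts, thumbsize=5)
custom_key = parser_cache_key("enwiki", 15580374, 0,
                              page_rendering_hash(custom_opts))

print(anon_key)
print(custom_key)
```

The first key matches the one in the Main_Page comment. The second differs in a single field, which is enough to make it a separate cache entry that anonymous traffic never populates.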
On Sat, Jul 31, 2010 at 12:58 PM, Daniel Kinzler daniel@brightbyte.de wrote:
This is a few years old, but I guess it's still relevant: http://brightbyte.de/page/Client-side_skins_with_XSLT I experimented a bit with ways to do all the per-user preference stuff on the client side, with XSLT.
XSLT seems a bit baroque. If the goal is to use script to avoid cache misses, why not just use plain old JavaScript? A lot more people know it, it supports progressive rendering (does XSLT?), and it's much better supported. In particular, your approach of serving something other than HTML and relying on XSLT support to transform it will seriously confuse text browsers, search engines, etc.