Hi everyone,
As you may know, we're working on an update to Pending Changes for next month. One thing we're trying to do is to improve the perceived performance: https://bugzilla.wikimedia.org/show_bug.cgi?id=25289 ("Make review load faster by speeding up display of old revisions")
The problem, in a nutshell, is that the diff page includes two parts: the diff itself, and a parsed version of the later revision. If the later revision is not the latest revision, that revision isn't cached. For complicated pages, parsing can take 20 seconds or longer.
Priyanka is the dev working on this, and the strategy she is pursuing now is to switch from immediately parsing the later revision, and instead calling the API to get a parsed version of the latest revision. What that means is that we can display the diff immediately, and then inject the diff via Javascript.
In talking to Sam this morning about this, a couple things became clear:
1. This requires action=parse, which is already on the list of APIs that generate healthy load
2. While this move should be net-neutral in CPU load, it does shift the load from general purpose Apaches to the API servers, the latter of which are more heavily loaded
Additionally, it's theoretically possible that this will actually generate more load, since it'll be easier for people to skim the history, unintentionally generating lots of (never completed) API calls.
How big of a problem is this?
Rob
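A minimal sketch of the request described above, assuming the later revision is fetched by revision ID through action=parse with format=json. The helper names (ParseRequest, buildParseUrl, extractParsedHtml) and the example endpoint are hypothetical and are not taken from the Pending Changes code; the response shape assumed here is the classic JSON layout, where the rendered HTML sits under parse.text["*"].

```typescript
// Sketch only: all names below are hypothetical illustrations.

interface ParseRequest {
  apiBase: string; // assumed endpoint, e.g. "https://example.org/w/api.php"
  oldid: number;   // revision ID of the later revision shown in the diff
}

// Build the action=parse query string for one specific revision.
function buildParseUrl(req: ParseRequest): string {
  const params: Array<[string, string]> = [
    ["action", "parse"],
    ["oldid", String(req.oldid)],
    ["prop", "text"],
    ["format", "json"],
  ];
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${req.apiBase}?${query}`;
}

// Pull the rendered HTML out of a format=json response.
// Returns null for error responses or any unexpected shape.
function extractParsedHtml(response: unknown): string | null {
  if (typeof response !== "object" || response === null) {
    return null;
  }
  const parse = (response as { parse?: { text?: { "*"?: unknown } } }).parse;
  const html = parse?.text?.["*"];
  return typeof html === "string" ? html : null;
}
```

In a browser, the diff would be rendered first; the URL from buildParseUrl would then be requested in the background, and the string returned by extractParsedHtml inserted into a placeholder element below the diff.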
2010/10/6 Rob Lanphier robla@wikimedia.org:
Priyanka is the dev working on this, and the strategy she is pursuing now is to switch from immediately parsing the later revision, and instead calling the API to get a parsed version of the latest revision. What that means is that we can display the diff immediately, and then inject the diff via Javascript.
Presumably you mean to inject the /parsed contents/ with JS?
In talking to Sam this morning about this, a couple things became clear:
- This requires action=parse, which is already on the list of APIs that generate healthy load
For parser cache misses, yes. From your post I understand that you'd only be doing this for parser cache misses, correct?
- While this move should be net-neutral in CPU load, it does shift the load from general purpose Apaches to the API servers, the latter of which are more heavily loaded
That could cause problems, depending on the volume. If we (and by that I mean Mark, CCed) are prepared so we can move Apaches from the general cluster to the API cluster relatively quickly, I think we should be able to rebalance the load that way.
Additionally, it's theoretically possible that this will actually generate more load, since it'll be easier for people to skim the history, unintentionally generating lots of (never completed) API calls.
This is the greater of my worries. Would it be terribly intrusive to only load the parsed page upon user interaction (e.g. using a link or button) by default, possibly with a user preference to always load it?
Roan Kattouw (Catrope)
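The on-demand loading suggested above could be sketched as follows, assuming a click handler on a link or button and a boolean user preference for always loading. The names createLazyPreview, LazyPreview and alwaysLoad are hypothetical; the load callback stands in for whatever code issues the action=parse request.

```typescript
// Sketch only: defer an expensive load until the reader asks for it.
// All names here are hypothetical illustrations.

interface LazyPreview {
  activate(): void;     // call from the link or button's click handler
  hasLoaded(): boolean; // true once the load has been triggered
}

function createLazyPreview(load: () => void, alwaysLoad: boolean): LazyPreview {
  let loaded = false;

  const activate = (): void => {
    if (loaded) {
      return; // repeated clicks must not issue repeated requests
    }
    loaded = true;
    load();
  };

  // Readers who opted in get the parsed page without clicking.
  if (alwaysLoad) {
    activate();
  }

  return { activate, hasLoaded: () => loaded };
}
```

With alwaysLoad set to false, skimming through a page history triggers no parse requests at all; only an explicit click does, and at most once per diff view.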
Roan Kattouw wrote:
2010/10/6 Rob Lanphier robla@wikimedia.org:
Priyanka is the dev working on this, and the strategy she is pursuing now is to switch from immediately parsing the later revision, and instead calling the API to get a parsed version of the latest revision. What that means is that we can display the diff immediately, and then inject the diff via Javascript.
Presumably you mean to inject the /parsed contents/ with JS?
Yes.
In talking to Sam this morning about this, a couple things became clear:
- This requires action=parse, which is already on the list of APIs that generate healthy load
For parser cache misses, yes. From your post I understand that you'd only be doing this for parser cache misses, correct?
For consistency, it may also be done for all of them. I don't know which one they will implement in the end.
- While this move should be net-neutral in CPU load, it does shift the load from general purpose Apaches to the API servers, the latter of which are more heavily loaded
That could cause problems, depending on the volume. If we (and by that I mean Mark, CCed) are prepared so we can move Apaches from the general cluster to the API cluster relatively quickly, I think we should be able to rebalance the load that way.
Additionally, it's theoretically possible that this will actually generate more load, since it'll be easier for people to skim the history, unintentionally generating lots of (never completed) API calls.
Shouldn't an incomplete API call abort the script?
This is the greater of my worries. Would it be terribly intrusive to only load the parsed page upon user interaction (e.g. using a link or button) by default, possibly with a user preference to always load it?
Roan Kattouw (Catrope)
There's already a preference "don't show content below diffs", so once content-less diffs get that link, the default could be changed (if diffs are such a big issue, I don't know why it hasn't been already changed).
2010/10/7 Platonides platonides@gmail.com:
For consistency, it may also be done for all of them. I don't know which one they will implement in the end.
It doesn't matter much either way, because action=parse is fast for parser cache hits.
Shouldn't an incomplete API call abort the script?
I don't think it does, but I'm not sure.
There's already a preference "don't show content below diffs", so once content-less diffs get that link, the default could be changed (if diffs are such a big issue, I don't know why it hasn't been already changed).
I'm not sure that it's gonna be an issue (no way to tell without trying it out), but it makes sense to avoid wasteful requests. Changing the default of the existing preference makes sense to me.
Roan Kattouw (Catrope)
On Thu, Oct 7, 2010 at 11:16 AM, Platonides platonides@gmail.com wrote:
... (if diffs are such a big issue, I don't know why it hasn't been already changed).
Diffs aren't the problem. It's the showing of uncached parse that's the issue. And it's always been a problem; this is nothing new. IMHO, diffonly should probably be set to true by default, but it would be nice to get some pref stats / get other feedback from end-users.
I'm pretty sure that it's become a bit more obvious since the workflow of Pending Changes has people looking at more diffs than before. I could be wrong...
-Chad
mediawiki-api@lists.wikimedia.org