If this is the problem, we are in luck because there have been a lot of good improvement suggestions. But they all add complexity to the code (or database setup) and "premature optimization is the root of all kinds of evil," so if link checking isn't a bottleneck it would be counterproductive to spend a lot of time to try to optimize it.
I think it's a bit premature for that yet. I think the differently-rendered missing links feature is pretty critical, not just a frill. In the short term the hardware will bail us out until we can find a solution.
What I was suggesting is just that we '''test''' the performance with differently rendered links turned off, so that we know if this is a place where optimization is required. I expect it is, but I think it makes sense to check.
Even a simple hash in shared means that we are maintaining data in two places all the time, and thus makes things somewhat more complex to maintain.
I think this is a good idea -- if link checking is creating the current bottleneck. And that's why I voiced support for turning off the differently rendered links to see the performance impact before anybody spends time coding an optimized link checking routine.
--Mark Christensen
On Thu, May 01, 2003 at 04:10:27PM -0400, Mark Christensen wrote:
If this is the problem, we are in luck because there have been a lot of good improvement suggestions. But they all add complexity to the code (or database setup) and "premature optimization is the root of all kinds of evil," so if link checking isn't a bottleneck it would be counterproductive to spend a lot of time to try to optimize it.
I think it's a bit premature for that yet. I think the differently-rendered missing links feature is pretty critical, not just a frill. In the short term the hardware will bail us out until we can find a solution.
What I was suggesting is just that we '''test''' the performance with differently rendered links turned off, so that we know if this is a place where optimization is required. I expect it is, but I think it makes sense to check.
OK, I did a little testing. The backend used was wikipedia.org, so the data contains some background noise caused by other users. Tests were performed both "logged in" as JeLuF and without being logged in. For the tests without login, only the "Cookie:" request header was removed. Everything else (Accept: etc.) was the same.
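For anyone who wants to repeat this kind of measurement, here is a minimal sketch of such a timing run in Python. It is not the script used for the numbers in this mail; the function names (build_headers, fetch_page, time_fetches) are made up for illustration, and it simply assumes that the logged-in and anonymous requests differ only in the Cookie header.

```python
import time
import urllib.request
from typing import Callable, Dict, List, Optional


def build_headers(common: Dict[str, str], cookie: Optional[str]) -> Dict[str, str]:
    """Return the request headers; only the Cookie header differs between runs."""
    headers = dict(common)
    if cookie is not None:
        headers["Cookie"] = cookie  # logged-in run
    return headers  # anonymous run: no Cookie header at all


def fetch_page(url: str, headers: Dict[str, str]) -> bytes:
    """Fetch the full page body (needs network access)."""
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return response.read()


def time_fetches(do_fetch: Callable[[], object], samples: int) -> List[float]:
    """Call do_fetch `samples` times and return the wall-clock time of each call in seconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        do_fetch()
        timings.append(time.perf_counter() - start)
    return timings
```

A run would then look like time_fetches(lambda: fetch_page(url, build_headers(common, cookie)), 20), once with a session cookie and once with cookie set to None.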
The following User Preferences were common to all "logged in" tests:
Quickbar: floating left
Rendering Math: HTML if simple else PNG
Textbox: 20x60
Local time: +2
[ ] Underline Links
[X] Format broken links like this
[ ] Justify paragraphs
[ ] Hide minor edits in RC
[X] Enhanced RC
[ ] Auto-number Headings
[X] Remember password across sessions
[X] Edit box has full width
[X] Edit on double click
[ ] Watch new and modified articles
[ ] Mark all edits minor
[ ] Show preview before edit box
Nickname: empty
# titles in RC: 150
Search settings
  Hits p. page: 40
  Lines p. hit: 5
  character: 50
JeLuF is an administrator.
The page fetched was [[Pope]], which has a medium number of links to other pages.
The other parameters were varied as follows:
Configuration                                   Average  Std dev  95% conf. int.  Samples
Anonymous                                       0,81 s   0,65 s   0,65 - 0,98 s   60
Logged in, stub threshold 500, Nostalgia skin   3,57 s   1,94 s   2,97 - 4,17 s   40
Logged in, stub threshold 0, Nostalgia skin     3,26 s   2,29 s   2,68 - 3,83 s   60
Logged in, stub threshold 500, Standard skin    2,05 s   1,07 s   1,59 - 2,52 s   20
Logged in, stub threshold 0, Standard skin      2,62 s   1,39 s   2,19 - 3,05 s   40
Logged in, stub threshold 0, Standard skin,
  [ ] Show hover over wiki links (disabled)     1,98 s   1,21 s   1,45 - 2,51 s   20
The tests were performed in chunks of 20. The anonymous tests were run at the start, in the middle and at the end of the test series. All three runs resulted in similar datasets.
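The mail does not say how the 95% confidence intervals were calculated. The figures are consistent with the usual normal approximation, mean ± 1.96 · std dev / sqrt(samples), so the sketch below assumes that formula. The function name is made up for illustration, and the decimal commas from the results are written as decimal points.

```python
import math
from typing import Tuple


def confidence_interval_95(mean: float, std_dev: float, samples: int) -> Tuple[float, float]:
    """95% confidence interval for the mean, using the normal approximation.

    Assumption: half width = 1.96 * std_dev / sqrt(samples).
    """
    half_width = 1.96 * std_dev / math.sqrt(samples)
    return mean - half_width, mean + half_width


# Logged in, stub threshold 500, first skin tested: 3.57 s average, 1.94 s std dev, 40 samples
low, high = confidence_interval_95(3.57, 1.94, 40)
print(f"{low:.2f} - {high:.2f} s")  # prints "2.97 - 4.17 s"
```

With only 20 to 60 samples per configuration, a Student t interval would come out slightly wider, so these intervals are best read as rough indications.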
Conclusion:
Apparently, anonymous users are served much faster than logged-in ones. The stub threshold seems to have only a minor effect: the 95% confidence intervals with and without the stub threshold largely overlap.
The Standard skin seems to be faster than the Nostalgia skin.
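The overlap argument for the stub threshold can be checked mechanically. The sketch below uses the 95% confidence intervals from the results (decimal commas written as points); the helper name and variable names are made up for illustration. Overlapping intervals are only a rough hint that a difference may not be significant, not a formal test.

```python
from typing import Tuple

Interval = Tuple[float, float]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# 95% confidence intervals in seconds, copied from the results
anonymous = (0.65, 0.98)
nostalgia_stub_500 = (2.97, 4.17)
nostalgia_stub_0 = (2.68, 3.83)
standard_stub_500 = (1.59, 2.52)
standard_stub_0 = (2.19, 3.05)

# Stub threshold 500 vs. 0, same skin: the intervals overlap in both cases
print(intervals_overlap(nostalgia_stub_500, nostalgia_stub_0))  # True
print(intervals_overlap(standard_stub_500, standard_stub_0))    # True

# Anonymous vs. logged in: no overlap with any logged-in configuration
print(intervals_overlap(anonymous, standard_stub_500))          # False
```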
----
Open questions: Why are anonymous users served so much faster than logged-in ones? The existence of link targets is checked for both anonymous and logged-in users, so this can't be the difference. Are there any additional database requests? Obviously there are:
- User profile
- Checks for stubs, only for logged-in users
The latter apparently has only a minor influence. But loading the user profile can't take that long, I'm sure.
Regards,
JeLuF
wikitech-l@lists.wikimedia.org