Poor, Edmund W wrote:
Anyone know how to profile MySQL code?
Considering the amount of time Snok and I have spent writing and using
MediaWiki's profiling code, and considering the amount of time Brion has
spent on database optimisation (to great effect), you'll forgive me for
being slightly offended.
We've profiled, we've cached, we've optimised. We've discovered that CPU
load is generally the bottleneck when viewing pages, and hence spent
time on parser optimisation and several kinds of caching. We have 4 web
servers and one database server and still the web servers are heavily
loaded.
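For what it's worth, the technique is nothing exotic: wrap suspect
sections of PHP in paired timers and aggregate the results per request,
and turn on MySQL's slow query log for the database side. A rough
sketch of the idea (made-up function names, not the actual profiling
code):

    $wgSketchTimes = array();
    $wgSketchStack = array();

    // Start timing a named section.
    function wfSketchProfileIn( $section ) {
        global $wgSketchStack;
        $wgSketchStack[] = array( $section, microtime( true ) );
    }

    // Stop timing the most recently started section and accumulate.
    function wfSketchProfileOut() {
        global $wgSketchStack, $wgSketchTimes;
        list( $section, $start ) = array_pop( $wgSketchStack );
        if ( !isset( $wgSketchTimes[$section] ) ) {
            $wgSketchTimes[$section] = 0.0;
        }
        $wgSketchTimes[$section] += microtime( true ) - $start;
    }

    // Usage: wrap anything you suspect, then dump $wgSketchTimes
    // (sorted descending) at the end of the request.
    wfSketchProfileIn( 'parse' );
    // ... expensive work ...
    wfSketchProfileOut();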
Timwi wrote:
My guess is that the slowest part of it is checking whether a page
exists, and if it does, checking its size (if the user has set the
preference that shows stubs in a different colour), because both of
these require a database query.
What, even with the linkscc cache and the memcached link cache? If you
say so.
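For the curious, the point of those caches is that the common case
never touches the database at all. A simplified sketch of the idea
(the names are invented, $memc stands for a PECL memcache client, and
the cur_* column names are from memory, not the real LinkCache code):

    // Return the stored size of a page, consulting memcached first.
    function wfSketchPageSize( $memc, $title ) {
        $key = 'sketch:pagesize:' . md5( $title );
        $size = $memc->get( $key );
        if ( $size !== false ) {
            return $size;   // cache hit: no database query at all
        }
        // Cache miss: one indexed query, then remember the answer.
        $sql = "SELECT LENGTH(cur_text) FROM cur WHERE cur_title='"
            . mysql_real_escape_string( $title ) . "'";
        $res = mysql_query( $sql );
        $row = $res ? mysql_fetch_row( $res ) : false;
        $size = $row ? intval( $row[0] ) : 0;   // 0 means "no such page"
        $memc->set( $key, $size, 0, 3600 );     // cache for an hour
        return $size;
    }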
Nick Pisarro wrote:
The current parser, which performs dozens of passes, probably degrades
by the square of the file size.
Really? All the regular expressions I've seen should be possible in O(N)
time. There are no PHP loops that run through every character, just
through certain kinds of entities such as every link. I would have
thought that 14 passes at O(N) each still produce O(N). Oh well, I'm not
a computer scientist, what would I know.
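To illustrate what I mean: each pass is a single scan with the work
done inside a callback, so k passes cost roughly k*c*N, which is still
linear. Something along these lines (a toy example, not the real
parser code):

    // Turn every [[target|caption]] into an HTML link in one regex pass.
    // The pattern is scanned left to right, so one pass is roughly O(N).
    function wfSketchLinkCallback( $m ) {
        $target  = htmlspecialchars( $m[1] );
        $caption = htmlspecialchars( isset( $m[2] ) ? $m[2] : $m[1] );
        return '<a href="/wiki/' . $target . '">' . $caption . '</a>';
    }

    function wfSketchRenderLinks( $text ) {
        return preg_replace_callback(
            '/\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/',
            'wfSketchLinkCallback',
            $text );
    }

    // Fourteen passes like this one cost 14 * O(N), which is still O(N),
    // as long as no pass blows the text up to a much larger size.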
Storing diffs in the 'old' table. This would not affect performance,
except when loading or comparing old revisions, but could drastically
reduce the size of the database, which has to benefit how manageable
it is. There already is a differencing engine in the source, though
I'm not sure how reliable it is--it may also degrade by the square of
the file difference. Here too, a sequence of diffs can be merged in
one pass. Having written such code in the past, I plan to create a
write-up exploring this idea. Has this idea been discussed amongst the
developers? What are the gotchas?
See
http://meta.wikipedia.org/wiki/History_compression
Diffs have not been extensively speed tested, but seemed to be
reasonably fast when I was running my space tests. They produce good
compression in the talk page or incremental improvement situation, but
as expected, perform poorly in the edit war situation. As noted by Rob
Hooft in this post:
http://mail.wikipedia.org/pipermail/wikitech-l/2004-February/008385.html
it should be possible to improve compression in this case by trying
diffs with a few different revisions. As long as the time required per
diff is in the tens of milliseconds or less, this should be feasible.
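A sketch of that "try a few parents" idea, assuming the PECL xdiff
extension for the actual diffing (the real code would use whatever
diff engine we settle on):

    // Diff the new text against each of the last few stored revisions
    // and keep the smallest result. In an edit war one of the older
    // revisions is usually a near-exact match, so its diff is tiny.
    function wfSketchBestDiff( $newText, $recentRevisions ) {
        $bestId = null;
        $bestDiff = null;
        foreach ( $recentRevisions as $revId => $oldText ) {
            $diff = xdiff_string_diff( $oldText, $newText );
            if ( $bestDiff === null || strlen( $diff ) < strlen( $bestDiff ) ) {
                $bestDiff = $diff;
                $bestId = $revId;
            }
        }
        // Store $bestDiff and a pointer to $bestId instead of the full text.
        return array( $bestId, $bestDiff );
    }

With, say, five candidate revisions at a few tens of milliseconds per
diff, that's well under half a second of extra work per save.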
-- Tim Starling