Tim Starling wrote:
Timwi wrote:
My guess is that the slowest part of it is checking whether a page exists and, if it does, checking its size (if the user has set the preference that shows stubs in a different colour), because both of these require a database query.
What, even with the linkscc cache and the memcached link cache? If you say so.
I apologise if my comment was in any way offensive to you, but please do take note of the fact that (a) I said it was a guess; (b) I did mention somewhere else that I have no real idea to what extent memcached is already being used; (c) I have not attacked you, or even addressed you at all.
With that said, please may I humbly ask what "the linkscc cache" actually caches? What exactly is stored in each memcache key here?
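To make the question concrete, here is roughly what I would naively imagine such a cache to hold: one entry per linked title, recording whether the page exists and how large it is, filled by a single batched query. This is only a sketch in Python to show what I mean; the table, column and key names are my guesses, not the actual linkscc implementation, and the plain dict merely stands in for memcached:

    # Rough guess, not MediaWiki's actual code: resolve existence and size
    # for all titles linked from a page in one batched query, then cache it.
    def build_link_cache(db, titles, cache):
        placeholders = ", ".join(["%s"] * len(titles))
        rows = db.query(
            "SELECT cur_title, LENGTH(cur_text) FROM cur"
            " WHERE cur_title IN (%s)" % placeholders,
            titles,
        )
        sizes = {title: size for title, size in rows}
        for title in titles:
            # One cache entry per linked title: (does the page exist?,
            # size for the stub-threshold colouring preference).
            cache["link:" + title] = (title in sizes, sizes.get(title, 0))

If that is more or less what is stored, then my guess about the database query was simply wrong for the cached case, and I would be happy to know it.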
Nick Pisarro wrote:
The current parser, which performs dozens of passes, probably degrades quadratically with the file size.
Really? All the regular expressions I've seen should be possible in O(N) time. There are no PHP loops that go through every character, just through certain kinds of entities, such as every link. I would have thought that 14 passes at O(N) still produces O(N). Oh well, I'm not a computer scientist, what would I know?
Our current parser is most probably O(N), but with a high constant factor. The asymptotic time complexity of an algorithm is rarely a useful measure of its efficiency, especially on data of approximately constant size.
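To spell that out: fourteen passes, each a single linear scan over the text, cost on the order of 14 * N, which is still O(N), just with a constant factor of fourteen. A toy illustration in Python (made-up patterns, nothing like the real parser):

    import re

    # Toy stand-in for a multi-pass parser: each pass is one linear scan
    # over the whole text, so total work is roughly (number of passes) * N.
    # Asymptotically that is still O(N); the pain is the constant factor.
    PASSES = [
        (re.compile(r"'''(.*?)'''"), r"<b>\1</b>"),               # bold
        (re.compile(r"''(.*?)''"), r"<i>\1</i>"),                 # italics
        (re.compile(r"^== *(.*?) *==$", re.M), r"<h2>\1</h2>"),   # headings
        # ... imagine eleven more passes like these ...
    ]

    def toy_parse(text):
        for pattern, replacement in PASSES:
            text = pattern.sub(replacement, text)
        return text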
Isn't history compression going to be detrimental to CPU usage rather than beneficial? I am still finding it hard to understand why so many people here feel that history compression is necessary.
Timwi