Timwi wrote:
Brion Vibber wrote:
Jeremy Dunck wrote:
According to the hardware orders page[1], a major bottleneck is page rendering.
Does any phase of this stand out? DB fetch? Text parse? Request/Response bandwidth?
Depends on the page. On most pages the biggest chunks of rendering time are in handling links. (On most pages most of the markup is links, so...)
Do we still query the database for all these links to decide whether they exist and (depending on user settings) whether they are stubs? As I recall, someone once had the idea of writing a simple, lightweight daemon for this, backed by a dictionary data structure; I presume they were thinking of a DAWG (directed acyclic word graph).
There are many algorithms available to generate DAWGs; I wrote one in Java once, but it's slow, and you could probably find a faster one in a faster language. Or has this idea been scrapped for some reason?
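For the curious, the standard incremental construction for a sorted word list (after Daciuk et al.) is compact enough to sketch. The Java below is purely illustrative, not the implementation I mentioned; every class and method name here is invented:

import java.util.*;

final class Dawg {
    private static final class Node {
        static int nextId = 0;
        final int id = nextId++;
        final TreeMap<Character, Node> edges = new TreeMap<>();
        boolean terminal;

        // Two nodes with equal signatures accept the same suffixes,
        // because children are always minimized (hence unique) first.
        String signature() {
            StringBuilder sb = new StringBuilder(terminal ? "1" : "0");
            for (Map.Entry<Character, Node> e : edges.entrySet())
                sb.append('|').append(e.getKey()).append(e.getValue().id);
            return sb.toString();
        }
    }

    private final Node root = new Node();
    private final Map<String, Node> minimized = new HashMap<>();
    // Nodes added for the previous word, not yet minimized:
    // each entry is { parent, edge label, child }.
    private final Deque<Object[]> unchecked = new ArrayDeque<>();
    private String previous = "";

    /** Insert words in strict lexicographic order. */
    public void add(String word) {
        if (word.compareTo(previous) <= 0)
            throw new IllegalArgumentException("input must be sorted and unique");
        int common = 0;
        while (common < word.length() && common < previous.length()
                && word.charAt(common) == previous.charAt(common)) common++;
        minimize(common);
        Node node = unchecked.isEmpty() ? root : (Node) unchecked.peekLast()[2];
        for (int i = common; i < word.length(); i++) {
            Node child = new Node();
            node.edges.put(word.charAt(i), child);
            unchecked.addLast(new Object[] { node, word.charAt(i), child });
            node = child;
        }
        node.terminal = true;
        previous = word;
    }

    /** Call once after the last add(). */
    public void finish() { minimize(0); }

    public boolean contains(String word) {
        Node node = root;
        for (int i = 0; i < word.length(); i++) {
            node = node.edges.get(word.charAt(i));
            if (node == null) return false;
        }
        return node.terminal;
    }

    // Merge the unminimized tail of the previous word, bottom-up,
    // until only `downTo` unchecked nodes remain.
    private void minimize(int downTo) {
        while (unchecked.size() > downTo) {
            Object[] top = unchecked.removeLast();
            Node parent = (Node) top[0], child = (Node) top[2];
            Node existing = minimized.get(child.signature());
            if (existing != null) parent.edges.put((Character) top[1], existing);
            else minimized.put(child.signature(), child);
        }
    }
}

Feeding it every page title in sorted order yields a minimal automaton; contains() would then answer link existence without touching the database.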
I considered implementing a cache for link existence, but once we realised that quite satisfactory speed could be achieved using the cur (now page) table alone, the robustness gains convinced me that querying the database directly was the best solution. External daemons necessarily bring a higher system administration overhead, and caches, especially ones which don't respect database transactions, tend to develop inconsistencies.
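For concreteness, the per-link check amounts to something like the following. The page_namespace, page_title and page_len columns are the real schema; the surrounding Java and the method name are invented for illustration:

import java.sql.*;

final class LinkLookup {
    /** Returns the page length if the title exists, or -1 if it does not.
     *  Comparing the length against the user's stub threshold decides
     *  whether the link is styled as a stub. */
    static int pageLength(Connection db, int namespace, String title)
            throws SQLException {
        String sql = "SELECT page_len FROM page"
                   + " WHERE page_namespace = ? AND page_title = ?";
        try (PreparedStatement st = db.prepareStatement(sql)) {
            st.setInt(1, namespace);
            st.setString(2, title);
            try (ResultSet rs = st.executeQuery()) {
                return rs.next() ? rs.getInt(1) : -1;
            }
        }
    }
}

A miss (-1) is a red link; anything else is an existing page.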
I think Brion meant replaceInternalLinks (7.9%) and the CPU component of replaceLinkHolders (3.2%), not the 0.75% of execution time now required for link lookup within the parser.
It's true that the addLinkObj query still takes 5.5%. In my opinion, that can be reduced to near zero by implementing batch lookup techniques ubiquitously.
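To make that concrete: batching simply means collecting every title seen during a parse and resolving them in one query, rather than issuing one query per link. Again an illustrative Java sketch with invented names, not the parser's actual code:

import java.sql.*;
import java.util.*;

final class BatchLinkLookup {
    /** Maps each existing title to its page_len; absent keys are red links. */
    static Map<String, Integer> resolve(Connection db, int namespace,
            List<String> titles) throws SQLException {
        Map<String, Integer> found = new HashMap<>();
        if (titles.isEmpty()) return found;
        // Build one placeholder per title for the IN clause.
        StringBuilder in = new StringBuilder();
        for (int i = 0; i < titles.size(); i++) in.append(i == 0 ? "?" : ", ?");
        String sql = "SELECT page_title, page_len FROM page"
                   + " WHERE page_namespace = ? AND page_title IN (" + in + ")";
        try (PreparedStatement st = db.prepareStatement(sql)) {
            st.setInt(1, namespace);
            for (int i = 0; i < titles.size(); i++)
                st.setString(i + 2, titles.get(i));
            try (ResultSet rs = st.executeQuery()) {
                while (rs.next()) found.put(rs.getString(1), rs.getInt(2));
            }
        }
        return found;
    }
}

One round trip per namespace replaces hundreds of per-link queries, which is how the 5.5% could be pushed toward zero.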
-- Tim Starling