Hello Brion,
this is a very good overview of the SQL needed to show a page.
You suggest to merge several of these queries, which is an idea
to improve performance without to much changing of code.
But this idea could be moved even further: Most of our users not
only want to read one page, they read lots of them. It is possible
to make several queries only once in the lifetime of a user session
or even the lifetime of the server process.
On Sat, Nov 16, 2002 at 11:53:58PM -0800, Brion Vibber wrote:
Simply viewing an article currently involves
performing a lot of
queries. Since it's our most common event, we should perhaps be
directing optimizing effort in that direction.
(if logged in) check user info:
User::loadFromDatabase:SQL: SELECT user_name,user_password,
user_newpassword,user_email,user_options,user_rights,user_newtalk
FROM user WHERE user_id=1
This could be done once the user session starts and be kept in a session
variable. This would require us to keep a persistent session which would
imply use of additional cookies. Since we already use cookies this
should not cause any headache.
Check if page exists:
LinkCache::addLink:SQL: SELECT cur_id FROM cur
WHERE (cur_namespace=0 AND cur_title='Test:stuff')
Checking for a pages existance is a very common task. The information
needed is
- Does the page exist?
- Is it a stub?
- Is it locked?
This could be stored in shared memory. A rough estimate: there are
about 150,000 pages at the moment. If we store 64 bytes of information
per article (its name, size and lock. Something else?) this is an amount
of 9.6 MB.
Load the content:
Article::loadContent:SQL: SELECT cur_text,cur_timestamp,cur_user,
cur_counter FROM cur WHERE cur_id=19
[These last two can be combined into one query.]
Now, separately for each and every linked page:
Check if it exists:
LinkCache::addLink:SQL: SELECT cur_id FROM cur
WHERE (cur_namespace=0 AND cur_title='A_link')
(if stub threshold set) check its size and redirect status:
SQL: SELECT length(cur_text) AS x, cur_namespace, cur_is_redirect
FROM cur WHERE cur_id='96'
[Checking the size can be done with checking existence, and it may be
better to chalk up a list of all linked pages during initial parsing,
then check them all in one big query.]
This information can be fetched from the shared memory table proposed
above, too.
Check if current page is restricted for editing:
wfGetSQL:SQL: SELECT cur_restrictions FROM cur WHERE (cur_id=19)
[We should get cur_restrictions along with the rest in the initial
query.]
(if logged in) check if on watchlist:
wfGetSQL:SQL: SELECT user_watch FROM user WHERE (user_id=1)
[For users with big watchlists like me, it may be inefficient to load
and parse the entire list on every single page view. It should perhaps
either be grabbed along with the rest of the user info, or stored in a
separate table where the database can parse the gobs of strings for us.]
Check for existence of talk page (twice - sidebar and bottom bar):
LinkCache::addLink:SQL: SELECT cur_id FROM cur
WHERE (cur_namespace=1 AND cur_title='Test:stuff')
LinkCache::addLink:SQL: SELECT cur_id FROM cur
WHERE (cur_namespace=1 AND cur_title='Test:stuff')
[Should only need to do this once and store the result...]
Could be fetched from the shared memory table proposed above, too.
The code for creating and changing an article would have to update the
shared memory segment. Care has to be taken that no two users change
the memory at the same time. Since the operations to perform on the
memory are not that time-intensive a global semaphore locking
write access should be sufficient.
Regards,
JeLuF