Re: [Wikitech-l] Chat about Wikipedia performance?

30 Apr 2003

On Tue, 2003-04-29 at 23:33, Lee Daniel Crocker wrote:
...
...
(David A. Wheeler dwheeler@dwheeler.com):

Perhaps for simple reads of the current article (cur), you
could completely skip using MySQL and use the filesystem instead.
In other words, caching.
Not necessarily; it would also be possible to keep the wiki text in
files. But I'm not sure what great benefit this would have, as you still
have to go looking up various information to render it.
...
Yes, various versions of that have been
tried and proposed, and more will be. The major hassles are (1) links,
which are displayed differently when they point to existing pages, so
a page may appear differently from one view to the next depending on
the existence of other pages,
That's not a problem; one simply invalidates the caches of all linking
pages when creating/deleting.
This is already done in order to handle browser-side caching; each
page's cur_touched timestamp is updated whenever a linked page is
created or deleted. Simply regenerate the page if cur_touched is more
recent than the cached HTML.
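
The check described above might be sketched roughly as follows. This is a minimal illustration in Python rather than the site's PHP; the `html_cache` dictionary, the `get_page_html` function, and the `render` callback are hypothetical names, and only `cur_touched` comes from the discussion.

```python
from datetime import datetime
from typing import Callable, Dict, Tuple

# Hypothetical in-memory cache: title -> (time the HTML was rendered, HTML).
html_cache: Dict[str, Tuple[datetime, str]] = {}


def get_page_html(title: str, cur_touched: datetime,
                  render: Callable[[str], str],
                  now: Callable[[], datetime] = datetime.now) -> str:
    """Return cached HTML unless cur_touched is newer than the cached copy."""
    cached = html_cache.get(title)
    if cached is not None:
        rendered_at, html = cached
        if cur_touched <= rendered_at:
            return html  # cached copy is still fresh
    # Missing or stale: re-render and record when we did it.
    html = render(title)
    html_cache[title] = (now(), html)
    return html
```

Because creating or deleting a linked page bumps cur_touched on every page that links to it, this single comparison would also cover the link-display case without a separate invalidation pass.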
...
...

You could start sending out text ASAP, instead of batching it.

Many browsers start displaying text as it's available, so to
users it might _feel_ faster.
A few things (like language links) currently require parsing the entire
wikitext before we output the topbar. Hypothetically we could output the
topbar after the text and let CSS take care of its location as we do for
the sidebar, but this may be problematic (i.e., in case of varying vertical
size due to word wrap) and would leave users navigationally stranded
while loading.
...
...
Also, holding text in-memory
may create memory pressure that forces more useful stuff out of
memory.
Not an issue. HTML is sent out immediately after it's rendered.
Well... many passes of processing are done over the wikitext on its way
to HTML, then the whole bunch is dumped out in a chunk.
...
Things like database updates are deferred until after sending;
I'm not 100% sure how safe this is; if the user closes the connection
from their browser deliberately (after all, the page _seems_ to be done
loading, why is the icon still spinning?) or due to an automatic
timeout, does the script keep running through the end or is it halted in
between queries?
...
One thing that would be nice is if the HTTP connection could be
dropped immediately after sending and before those database updates.
That's easy to do with threads in Java Servlets, but I haven't
found any way to do it with Apache/PHP.
For some things (search index updates) we use INSERT/REPLACE DELAYED
queries, whose actual action will happen at some point in the future,
taken care of for us by the database. There doesn't seem to be an
equivalent for UPDATE queries.
Hypothetically we could have an entirely separate process to perform
asynchronous updates and just shove commands at it via a pipe or shared
memory, but that's probably more trouble than it's worth.
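
For illustration only, the hand-off idea might look something like the sketch below. It is written in Python, and it swaps in a background thread and an in-process queue for the separate process and pipe mentioned above, purely to keep the example short. `DeferredUpdater`, `submit`, `close`, and the `apply_update` callback are hypothetical names, not anything in the existing code.

```python
import queue
import threading
from typing import Callable


class DeferredUpdater:
    """Accepts update commands and applies them later on a worker thread."""

    def __init__(self, apply_update: Callable[[str], None]) -> None:
        self._apply = apply_update          # hypothetical: runs one update
        self._commands = queue.Queue()      # stands in for the pipe
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, command: str) -> None:
        """Queue a command and return immediately, without waiting on it."""
        self._commands.put(command)

    def close(self) -> None:
        """Ask the worker to stop, then wait until the queue is drained."""
        self._commands.put(None)            # sentinel: no more commands
        self._worker.join()

    def _run(self) -> None:
        while True:
            command = self._commands.get()
            if command is None:
                break
            self._apply(command)            # commands run in FIFO order
```

The request handler would call `submit()` and finish the response right away; whether the queued update survives a crash of the worker is exactly the sort of trouble the paragraph above alludes to.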
-- brion vibber (brion @ pobox.com)
