[Wikipedia-l] Optimizing the Wiki parser

Uri Yanover uriyan_subscribe at yahoo.com
Sat Feb 9 07:29:04 UTC 2002


> How do you know that the current parser is bad?  Do you have numbers
> (measurements from the live website) that indicate that the regexp
> parsing is the bottleneck in the performance of today's Wikipedia
> website?  Or do you just want to write a parser?  (I know writing
> parsers can be great fun, seriously, but I think this discussion
> should focus on fixing the performance problems.)

Well, I got that impression from what Jan Hidders said.
Judging from the recent traffic on the list, though, the bigger
problem is database connection management, so you have a
point here.

I still believe, however, that the current Wiki parser is far
from perfect and will have to be rewritten some day. As a
first step, I'd personally like to see the Wiki markup
simplified and systematized (e.g. by making all links use
square brackets).
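
To illustrate why a uniform link syntax would help, here is a
minimal sketch in Python. It is not the actual Wikipedia code:
the syntax ([[Target]] or [[Target|label]]), the names LINK and
render_links, and the "/wiki/" base path are all assumptions
made for the example. The point is only that when every link
uses the same delimiters, a single pattern and a single pass
over the text can handle all of them.

```python
import html
import re

# Hypothetical unified syntax: [[Target]] or [[Target|label]].
# Group 1 is the link target, optional group 2 is the display label.
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def render_links(text, base="/wiki/"):
    """Replace every bracketed link in `text` with an HTML anchor."""

    def to_anchor(match):
        target = match.group(1).strip()
        # Fall back to the target itself when no label is given.
        label = (match.group(2) or target).strip()
        href = base + target.replace(" ", "_")
        return '<a href="%s">%s</a>' % (
            html.escape(href, quote=True),
            html.escape(label),
        )

    # One substitution pass covers every link in the text.
    return LINK.sub(to_anchor, text)


if __name__ == "__main__":
    print(render_links("See [[Biology]] and [[Cell biology|cells]]."))
```

Text without any brackets passes through unchanged, so the
cost of link handling is one scan of the page regardless of how
many link styles the markup used to support.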

> Sometimes, a simple access to http://www.wikipedia.com/wiki/Biology
> takes 11 seconds, sometimes it takes 2 seconds.  When it takes longer,
> is it because too much CPU time is spent in regexp parsing?  How can
> we know this?  From profiling the running server?  Or is my HTTP
> request put on hold for some other reason (database locks, swapping,
> I/O waits, network congestion, ..., or too much CPU time spent on some
> other task)?  If regexp parsing really is the bottleneck, how much
> more efficient can a new parser be?  Twice as fast?  Is it possible to
> add more hardware (multiprocessor) instead?

Again, I started from the assumption that the Wiki parser is
so inefficient that adding any further markup (the original
subject of this discussion) would be devastating to its
performance. Even though the current slowdown seems to
originate in database connectivity, I suspect that parser
speed will become the bottleneck at some point in the
future.

Sincerely yours,
            Uri Yanover



