Uri Yanover wrote:
Well, while ease of debugging is important, it is still possible to write a good parser that would [...]
How do you know that the current parser is bad? Do you have numbers (measurements from the live website) indicating that regexp parsing is the bottleneck in the performance of today's Wikipedia website? Or do you just want to write a parser? (I know writing parsers can be great fun, seriously, but I think this discussion should focus on fixing the performance problems.)
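One way to get such numbers would be to time the markup-to-HTML conversion in isolation, away from the database and the network. A minimal sketch in Python; the regexp rules below are invented stand-ins for illustration, not the real parser's rules, and Biology.txt is assumed to be a saved local copy of the article source:

    import re
    import time

    # Invented stand-ins for the wiki markup rules; the real parser
    # applies many more regexp substitutions than these three.
    RULES = [
        (re.compile(r"'''(.+?)'''"), r"<b>\1</b>"),                    # bold
        (re.compile(r"''(.+?)''"), r"<i>\1</i>"),                      # italics
        (re.compile(r"\[\[(.+?)\]\]"), r'<a href="/wiki/\1">\1</a>'),  # links
    ]

    def render(text):
        for pattern, replacement in RULES:
            text = pattern.sub(replacement, text)
        return text

    article = open("Biology.txt").read()  # assumed local copy of the article

    start = time.time()
    for _ in range(100):
        render(article)
    elapsed = (time.time() - start) / 100
    print("average render time: %.1f ms" % (elapsed * 1000))

If rendering an average article takes only milliseconds while serving the page takes seconds, the regexps are unlikely to be the main problem.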
Sometimes a simple request for http://www.wikipedia.com/wiki/Biology takes 11 seconds, sometimes it takes 2 seconds. When it takes longer, is it because too much CPU time is spent in regexp parsing? How can we know? By profiling the running server? Or is my HTTP request put on hold for some other reason (database locks, swapping, I/O waits, network congestion, ..., or too much CPU time spent on some other task)? If regexp parsing really is the bottleneck, how much faster could a new parser be? Twice as fast? Would it be possible to add more hardware (a multiprocessor machine) instead?
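To start separating those suspects, one could look at the latency spread from the outside: if parsing dominated, repeated requests for the same article should take roughly the same time, while a wide spread points at load-dependent causes. A rough sketch (the URL is the one above; these timings of course include network effects, so they can only rule things in or out, not replace profiling on the server itself):

    import statistics
    import time
    import urllib.request

    URL = "http://www.wikipedia.com/wiki/Biology"

    # Fetch the same page repeatedly and record wall-clock latency.
    samples = []
    for _ in range(20):
        start = time.time()
        urllib.request.urlopen(URL).read()
        samples.append(time.time() - start)
        time.sleep(1)  # be polite: don't hammer the live server

    print("min %.2fs  median %.2fs  max %.2fs"
          % (min(samples), statistics.median(samples), max(samples)))

A 2-to-11-second spread on identical requests would suggest contention (locks, swapping, I/O) rather than a constant per-request parsing cost.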