Uri Yanover wrote:
Well, while ease of debugging is important, it is still possible to write a good parser that would [...]
How do you know that the current parser is bad? Do you have numbers (measurements from the live website) indicating that regexp parsing is the bottleneck in the performance of today's Wikipedia website? Or do you just want to write a parser? (I know writing parsers can be great fun, seriously, but I think this discussion should focus on fixing the performance problems.)
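One way to get such numbers would be to time the markup-to-HTML conversion in isolation, away from the database and the network. A minimal sketch in Python; the regexp rules below are invented stand-ins for illustration, not the real parser's rules, and Biology.txt is assumed to be a saved local copy of the article source:

    import re
    import time

    # Invented stand-ins for the wiki markup rules; the real parser
    # applies many more regexp substitutions than these three.
    RULES = [
        (re.compile(r"'''(.+?)'''"), r"<b>\1</b>"),                    # bold
        (re.compile(r"''(.+?)''"), r"<i>\1</i>"),                      # italics
        (re.compile(r"\[\[(.+?)\]\]"), r'<a href="/wiki/\1">\1</a>'),  # links
    ]

    def render(text):
        for pattern, replacement in RULES:
            text = pattern.sub(replacement, text)
        return text

    article = open("Biology.txt").read()  # assumed local copy of the article

    start = time.time()
    for _ in range(100):
        render(article)
    elapsed = (time.time() - start) / 100
    print("average render time: %.1f ms" % (elapsed * 1000))

If rendering an average article takes only milliseconds while serving the page takes seconds, the regexps are unlikely to be the main problem.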
Sometimes a simple request for http://www.wikipedia.com/wiki/Biology takes 11 seconds, sometimes it takes 2 seconds. When it takes longer, is it because too much CPU time is spent in regexp parsing? How can we know? By profiling the running server? Or is my HTTP request put on hold for some other reason (database locks, swapping, I/O waits, network congestion, ..., or too much CPU time spent on some other task)? If regexp parsing really is the bottleneck, how much faster could a new parser be? Twice as fast? Would it be possible to add more hardware (a multiprocessor machine) instead?
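To start separating those suspects, one could look at the latency spread from the outside: if parsing dominated, repeated requests for the same article should take roughly the same time, while a wide spread points at load-dependent causes. A rough sketch (the URL is the one above; these timings of course include network effects, so they can only rule things in or out, not replace profiling on the server itself):

    import statistics
    import time
    import urllib.request

    URL = "http://www.wikipedia.com/wiki/Biology"

    # Fetch the same page repeatedly and record wall-clock latency.
    samples = []
    for _ in range(20):
        start = time.time()
        urllib.request.urlopen(URL).read()
        samples.append(time.time() - start)
        time.sleep(1)  # be polite: don't hammer the live server

    print("min %.2fs  median %.2fs  max %.2fs"
          % (min(samples), statistics.median(samples), max(samples)))

A 2-to-11-second spread on identical requests would suggest contention (locks, swapping, I/O) rather than a constant per-request parsing cost.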