[Wikipedia-l] A data point -- page parsing on beta.wikipedia.com
Neil Harris
usenet at tonal.clara.co.uk
Thu Jul 11 09:55:24 UTC 2002
I've been running two programs recently: the stressbot, and a new
program, 'postit', that creates random articles.
Each article contains a block of random alphabet soup, just to stress
the system.
The random characters are chosen from a range of characters, with a
weighting on special chars such as '&', '[', ']', '<', '>' etc. to stress
the parser.
I've just tweaked the program to use characters in the range 32-254,
rather than the previous ASCII-only random characters.
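The generator itself isn't shown here, but the idea can be sketched roughly like this (a hypothetical reconstruction, not the actual 'postit' code; the weighting factor and the exact set of special characters are assumptions):

```python
import random

# Parser-stressing special characters, over-weighted relative to the rest.
SPECIALS = list("&[]<>'\"=")

def alphabet_soup(length, lo=32, hi=254, special_weight=5):
    """Return a random string of code points in [lo, hi], with the
    SPECIALS repeated to make them more likely to be chosen."""
    pool = [chr(c) for c in range(lo, hi + 1)] + SPECIALS * special_weight
    return "".join(random.choice(pool) for _ in range(length))

# e.g. a block of soup for one test article:
print(alphabet_soup(200))
```

Switching `hi` from 126 to 254 is all it takes to move from ASCII-only soup to the full 32-254 range.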
I've noticed that this seems to speed up article posting a fair bit. The
only explanation I can think of is that the parser is taking significant CPU
time for these pages, and the non-ASCII chars are causing some regexps
to terminate early, saving time.
I'll do a couple more tests to try to confirm this.
Neil