[Wikipedia-l] A data point -- page parsing on beta.wikipedia.com
Neil Harris
usenet at tonal.clara.co.uk
Thu Jul 11 09:55:24 UTC 2002
I've been running two programs recently: the stressbot, and a new
program, 'postit', that creates random articles.
Each article contains a block of random alphabet soup, just to stress
the system.
The random characters are chosen from a range of characters, with a
weighting on special chars such as '&', '[', ']', '<', '>' etc. to stress
the parser.
I've just tweaked the program to use characters in the range 32-254,
rather than the previous ASCII-only random characters.
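The generator itself isn't shown here, but the idea can be sketched roughly like this (a hypothetical reconstruction, not the actual 'postit' code; the weighting factor and the exact set of special characters are assumptions):

```python
import random

# Parser-stressing special characters, over-weighted relative to the rest.
SPECIALS = list("&[]<>'\"=")

def alphabet_soup(length, lo=32, hi=254, special_weight=5):
    """Return a random string of code points in [lo, hi], with the
    SPECIALS repeated to make them more likely to be chosen."""
    pool = [chr(c) for c in range(lo, hi + 1)] + SPECIALS * special_weight
    return "".join(random.choice(pool) for _ in range(length))

# e.g. a block of soup for one test article:
print(alphabet_soup(200))
```

Switching `hi` from 126 to 254 is all it takes to move from ASCII-only soup to the full 32-254 range.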
I've noticed that this seems to speed up article posting a fair bit. The
only explanation I can think of is that the parser is taking significant CPU
time for these pages, and the non-ASCII chars are causing some regexps
to terminate early, saving time.
I'll do a couple more tests to try to confirm this.
Neil