Nick Reinking <nick@twoevils.org> wrote:
Couple quick questions: When Wikitext is pulled from the database, what are the newlines?
MySQL gives back whatever you give it. We generally give it Unix-style text with just \n, but a few browsers might add CRs.
Are they always \n? If so, I can clean up the parsing a bit and eke out a bit more performance (not a big deal).
It shouldn't hurt performance to just ignore and skip CRs. That can be done in the lexer. You should never encounter CR-only line ends.
Also, what format is the wikitext stored in the database as? UTF-8? UTF-16?
Some of the foreign-language wikis use UTF-8; the English one is ISO-8859-1.
As far as performance goes, running everything I handle now against all the .txt data files in the test suite (x256 = 492672 lines), I'm seeing parsing speeds of about 86600 lines/sec (in an 18KB executable).
So on a typical page of, say, 40-50 lines, that makes half a millisecond spent in parsing. If PHP were 100 times worse, it would account for 1/20th of a second per page fetch. Doesn't sound like much of a problem to me, and I doubt it's 1000 times worse.
Just curious: what does your parser do with Quotes.txt from the test suite?