Tim Starling wrote:
Timwi wrote:
My guess is that the slowest part of it is checking whether a page exists and, if it does, checking its size (if the user has set the preference that shows stubs in a different colour), because both of these require a database query.
What, even with the linkscc cache and the memcached link cache? If you say so.
I apologise if my comment was in any way offensive to you, but please do take note of the fact that (a) I said it was a guess; (b) I did mention somewhere else that I have no real idea to what extent memcached is already being used; (c) I have not attacked you, or even addressed you at all.
With that said, please may I humbly ask what "the linkscc cache" actually caches? What exactly is stored in each memcache key here?
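To make the question concrete, here is roughly what I would naively imagine such a cache to hold: one entry per linked title, recording whether the page exists and how large it is, filled by a single batched query. This is only a sketch in Python to show what I mean; the table, column and key names are my guesses, not the actual linkscc implementation, and the plain dict merely stands in for memcached:

    # Rough guess, not MediaWiki's actual code: resolve existence and size
    # for all titles linked from a page in one batched query, then cache it.
    def build_link_cache(db, titles, cache):
        placeholders = ", ".join(["%s"] * len(titles))
        rows = db.query(
            "SELECT cur_title, LENGTH(cur_text) FROM cur"
            " WHERE cur_title IN (%s)" % placeholders,
            titles,
        )
        sizes = {title: size for title, size in rows}
        for title in titles:
            # One cache entry per linked title: (does the page exist?,
            # size for the stub-threshold colouring preference).
            cache["link:" + title] = (title in sizes, sizes.get(title, 0))

If that is more or less what is stored, then my guess about the database query was simply wrong for the cached case, and I would be happy to know it.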
Nick Pisarro wrote:
The current parser, which performs dozens of passes, probably degrades quadratically with the file size.
Really? All the regular expressions I've seen should be possible in O(N) time. There are no PHP loops that go through every character, just through certain kinds of entities, such as every link. I would have thought that 14 passes at O(N) still produces O(N). Oh well, I'm not a computer scientist, what would I know?
Our current parser is most probably O(N), but with a high constant factor. The asymptotic time complexity of an algorithm is rarely a useful measure of its efficiency, especially on data of approximately constant size.
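To spell that out: fourteen passes, each a single linear scan over the text, cost on the order of 14 * N, which is still O(N), just with a constant factor of fourteen. A toy illustration in Python (made-up patterns, nothing like the real parser):

    import re

    # Toy stand-in for a multi-pass parser: each pass is one linear scan
    # over the whole text, so total work is roughly (number of passes) * N.
    # Asymptotically that is still O(N); the pain is the constant factor.
    PASSES = [
        (re.compile(r"'''(.*?)'''"), r"<b>\1</b>"),               # bold
        (re.compile(r"''(.*?)''"), r"<i>\1</i>"),                 # italics
        (re.compile(r"^== *(.*?) *==$", re.M), r"<h2>\1</h2>"),   # headings
        # ... imagine eleven more passes like these ...
    ]

    def toy_parse(text):
        for pattern, replacement in PASSES:
            text = pattern.sub(replacement, text)
        return text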
Isn't history compression going to be detrimental to CPU usage rather than beneficial? I am still finding it hard to understand why so many people here feel that history compression is necessary.
Timwi