On Wed, 2002-02-13 at 07:36, Magnus Manske wrote:
I just ran "ab n=10" for an article (with cache turned off) and
deactivated some functions to see where the slow parts are.
Full rendering : 4.99 sec
removeHTMLtags turned off : 3.319 sec
How much time does parseContents() take?
It seems removeHTMLtags is responsible for 1/3 of the *total* runtime,
which includes apache, php calling, and a thousand other things that
can't be avoided.
Well, it can be made much more efficient... As Jan has hinted, explode()
is a killer, and I can take that out.
So, if these HTML tags are *never* used anyway, why can't we replace
them with &lt; and &gt; just prior to saving an edited article?
I just have two objections:
First, it violates the principle of least surprise; the user doesn't get
the same thing upon a re-edit that they left during the last edit. This
is particularly annoying for people who are putting complicated tables
into articles (cf. [[Beryllium]] and [[Periodic Table]]) -- if they do
one thing wrong, POOF! Half their table <tags> suddenly turn into
&lt;tags&gt; and instead of fixing one tiny mistake, they fix one tiny
mistake AND change a lot of &lt;&gt;s back into <>s.
Conclusion: bad for users.
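To make the first objection concrete, here is a minimal sketch in Python (not the PHP of the actual code). The whitelist, the regex, and the function name escape_disallowed are all hypothetical and only stand in for whatever removeHTMLtags really does. It only checks tag names, so one typo escapes one tag; the point is just that escaping on save changes the text the user gets back, while escaping on display does not.

```python
import re

# Hypothetical whitelist; the real set of allowed tags is not given here.
ALLOWED_TAGS = {"table", "tr", "td", "th", "b", "i"}

# Matches an opening or closing tag and captures its name.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>")


def escape_disallowed(text):
    """Escape every tag whose name is not in ALLOWED_TAGS; leave the rest alone."""
    def fix(match):
        if match.group(2).lower() in ALLOWED_TAGS:
            return match.group(0)
        return match.group(0).replace("<", "&lt;").replace(">", "&gt;")
    return TAG_RE.sub(fix, text)


typed = "<table><tr><tdd>Be</td></tr></table>"  # one typo: <tdd>

# Escape on save: the stored text, which is what comes back on re-edit,
# no longer matches what was typed.
stored_on_save = escape_disallowed(typed)

# Escape on display: the stored text is untouched; only the output is escaped.
stored_on_display = typed
rendered = escape_disallowed(stored_on_display)
```

In the save-time variant the editor has to turn &lt;tdd&gt; back into a real tag by hand; in the display-time variant they just fix the typo in the text they originally wrote.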
Second, enforcing the limited subset HTML is just a part of the wiki
parsing. Doing that on save is fine, but is basically doing half the
parsing job and caching that, then doing the other half when we display
the page. Why stop there, when we could just parse the wiki-specific
code while we're at it and save the final result?
Conclusion: what exactly is our goal here? To save processing time on
page load? This is most effectively done by caching the completely
parsed version, both HTML and wiki -> HTML.
I'll be gone tomorrow until Saturday, and I doubt I can hack it today,
so it's up to you...
-- brion vibber (brion @ pobox.com)