On Wed, 2002-02-13 at 17:24, Jimmy Wales wrote:
Brion Vibber wrote:
I'm not sure what advantage we'd get from storing a version whose HTML tags have already been processed but which still needs the wiki code converted into HTML every time we load it. We get more speed by caching the completely parsed version, or more storage savings by reparsing it every time and storing nothing but the editable text.
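For illustration, the "cache the completely parsed version" option might look like the Python sketch below. It is hypothetical and not the wiki's actual PHP code. It assumes a `render` function that turns wikitext into HTML and keys each entry on a hash of the text, so an edited article is re-parsed automatically.

```python
import hashlib
from typing import Callable, Dict, Tuple


class RenderCache:
    """Minimal parsed-page cache (hypothetical, not the wiki's actual code).

    Stores the fully rendered HTML for each article together with a hash of
    the wikitext it was rendered from, so an edited article is re-parsed.
    """

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render                          # expensive wikitext -> HTML step
        self._store: Dict[str, Tuple[str, str]] = {}   # title -> (text hash, html)
        self.misses = 0                                # number of full parses done

    def get_html(self, title: str, wikitext: str) -> str:
        digest = hashlib.sha1(wikitext.encode("utf-8")).hexdigest()
        entry = self._store.get(title)
        if entry is not None and entry[0] == digest:
            return entry[1]                            # hit: no parsing at all
        self.misses += 1
        html = self._render(wikitext)                  # miss: parse once, keep result
        self._store[title] = (digest, html)
        return html

    def invalidate(self, title: str) -> None:
        """Drop an entry whose HTML depends on more than its own text,
        e.g. when a linked page is created and link styling must change."""
        self._store.pop(title, None)
```

Assuming reads far outnumber saves, most requests would take the hit branch and skip parsing entirely. The cost is storing the HTML alongside the editable text.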
It's worth noting that on the live server, I see no material difference when I turn caching on or off.
Interesting. I have to wonder whether this means caching is for some reason not working at all... It seems to be disabled and/or broken at the moment, unless someone sneaked in and fixed the other-languages bug while I wasn't looking.
I ran "ab -n 10" on a couple pages running on my test server with various states: caching on, caching off w/ no removeHTMLtags() call, caching off with the old removeHTMLtags() code, and caching off with my new as-yet unoptimized but more secure version of removeHTMLtags(). The pages per second figures from three trials each:
Beryllium (large HTML table, various other tags)
- cached 2.06 2.06 2.16
- none 0.94 0.95 0.95
- old 0.90 0.90 0.89
- new 0.47 0.48 0.48
Esperanto-wiki mainpage: (a few <b>, <i>, and <font> tags)
- cached 3.26 3.13 3.47
- none 1.84 1.83 1.76
- old 1.82 1.80 1.80
- new 1.58 1.62 1.58
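To compare these numbers, divide each configuration's mean by the uncached ("none") mean. The short Python sketch below does this using only the figures quoted above. The cached pages come out at roughly 2.2x (Beryllium) and 1.8x (Esperanto mainpage) the uncached rate. The new removeHTMLtags() costs about half of Beryllium's throughput but only about 12% on the mainpage.

```python
from statistics import mean

# Pages per second, three `ab -n 10` trials per configuration (figures above).
BERYLLIUM = {
    "cached": [2.06, 2.06, 2.16],
    "none": [0.94, 0.95, 0.95],
    "old": [0.90, 0.90, 0.89],
    "new": [0.47, 0.48, 0.48],
}
ESPERANTO_MAIN = {
    "cached": [3.26, 3.13, 3.47],
    "none": [1.84, 1.83, 1.76],
    "old": [1.82, 1.80, 1.80],
    "new": [1.58, 1.62, 1.58],
}


def relative_to_uncached(trials):
    """Mean throughput of each configuration divided by the mean for 'none'."""
    base = mean(trials["none"])
    return {name: round(mean(runs) / base, 2) for name, runs in trials.items()}


for page, trials in (("Beryllium", BERYLLIUM), ("Esperanto", ESPERANTO_MAIN)):
    print(page, relative_to_uncached(trials))
```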
Also, space is really cheap these days. And we're not in any immediate danger of running out of it.
Very true.
-- brion vibber (brion @ pobox.com)
On Wed, 2002-02-13 at 22:20, Brion Vibber wrote:
I ran "ab -n 10" on a couple pages running on my test server with various states: caching on, caching off w/ no removeHTMLtags() call, caching off with the old removeHTMLtags() code, and caching off with my new as-yet unoptimized but more secure version of removeHTMLtags(). The pages per second figures from three trials each:
Beryllium (large HTML table, various other tags)
- cached 2.06 2.06 2.16
- none 0.94 0.95 0.95
- old 0.90 0.90 0.89
- new 0.47 0.48 0.48
Esperanto-wiki mainpage: (a few <b>, <i>, and <font> tags)
- cached 3.26 3.13 3.47
- none 1.84 1.83 1.76
- old 1.82 1.80 1.80
- new 1.58 1.62 1.58
I've eliminated most of the loops from removeHTMLtags(), which is now only a couple of percent slower than the old function that let JavaScript through. That is, it still runs at >90% of the speed of doing no live HTML tag removal, though still below a completely parsed cache.
Beryllium
- cached 2.20 2.13 2.15
- none 1.00 0.98 0.98
- newest 0.92 0.92 0.93
Esperanto-wiki mainpage
- cached 3.50 3.25 3.49
- none 1.87 1.89 1.87
- newest 1.86 1.82 1.80
Still room to optimize though,
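The actual removeHTMLtags() (PHP) isn't shown in this thread. One way to avoid a loop per allowed tag is to make a single regex pass with a callback that whitelists each tag and attribute as it is found. The Python sketch below shows that approach. The tag and attribute lists are hypothetical, and this is a sketch under those assumptions, not a complete sanitizer.

```python
import html
import re

# Hypothetical whitelists; the actual lists in removeHTMLtags() are not shown here.
ALLOWED_TAGS = {"b", "i", "u", "font", "p", "br", "table", "tr", "td", "th"}
ALLOWED_ATTRS = {"align", "valign", "bgcolor", "border", "cellpadding",
                 "cellspacing", "color", "face", "size", "width",
                 "colspan", "rowspan"}

# A single pattern that matches either a complete tag or a stray angle bracket.
TOKEN_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>|[<>]")
ATTR_RE = re.compile(r"""([A-Za-z]+)\s*=\s*("[^"<>]*"|'[^'<>]*'|[^\s"'<>]+)""")


def _clean_attrs(raw: str) -> str:
    kept = []
    for name, value in ATTR_RE.findall(raw):
        name, value = name.lower(), value.strip("\"'")
        # Only whitelisted attributes survive, so onclick, onmouseover, style
        # and the like are dropped; javascript: values are refused as well.
        if name in ALLOWED_ATTRS and "javascript:" not in value.lower():
            kept.append(f' {name}="{html.escape(value)}"')
    return "".join(kept)


def _replace(m: re.Match) -> str:
    if m.group(2) is None:                  # a lone '<' or '>' in the text
        return "&lt;" if m.group(0) == "<" else "&gt;"
    slash, name = m.group(1), m.group(2).lower()
    if name not in ALLOWED_TAGS:
        return html.escape(m.group(0))      # show forbidden markup as plain text
    if slash:
        return f"</{name}>"
    return f"<{name}{_clean_attrs(m.group(3))}>"


def remove_html_tags(text: str) -> str:
    """Whitelist-filter HTML in one regex pass, without a loop per tag."""
    return TOKEN_RE.sub(_replace, text)
```

For example, `remove_html_tags('<b onmouseover="alert(1)">x</b>')` yields `<b>x</b>`, and a `<script>` tag comes out escaped as text. The regex engine does the scanning once per page instead of once per allowed tag.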
-- brion vibber (brion @ pobox.com)
From: "Brion Vibber" brion@pobox.com
Interesting. I have to wonder whether this means caching is for some reason not working at all... It seems to be disabled and/or broken at the moment, unless someone sneaked in and fixed the other-languages bug while I wasn't looking.
It works, otherwise we wouldn't have seen the language-link bug. I'm not the least surprised that caching doesn't help much. With large datasets like Wikipedia the bottlenecks are usually the database or programming that doesn't keep an eye on the big O time/space complexity of its behavior.
-- Jan Hidders
On Thu, 2002-02-14 at 01:33, Jan Hidders wrote:
From: "Brion Vibber" brion@pobox.com
Interesting. I have to wonder whether this means caching is for some reason not working at all... It seems to be disabled and/or broken at the moment, unless someone sneaked in and fixed the other-languages bug while I wasn't looking.
It works, otherwise we wouldn't have seen the language-link bug.
Well, I know it used to work on the live server (and the bug was visible there) and it works now on my test machine (and the bug is visible here), but right now I go to the live server and I *don't* see the language bug. Which means either Jimbo left it disabled, or something mysterious crept into the code that I don't know about.
Oh hey, I just noticed another bug! Wikilinks in the article change style (lose underline) after an other-language link has been parsed. That's probably my fault when I switched things over to the style-sheet code, I'll fix it.
I'm not the least surprised that caching doesn't help much. With large datasets like Wikipedia the bottlenecks are usually the database or programming that doesn't keep an eye on the big O time/space complexity of its behavior.
Very true. Which is why I don't think removeHTMLtags() is such a big barrier that we have to do it only on article save. :)
-- brion vibber (brion @ pobox.com)
On Thu, 2002-02-14 at 02:47, Brion Vibber wrote:
Oh hey, I just noticed another bug! Wikilinks in the article change style (lose underline) after an other-language link has been parsed. That's probably my fault when I switched things over to the style-sheet code, I'll fix it.
Got it. I also threw in the look-for-other-language-links-when-cached fix, and put the rest of the URL into $wikiInterwiki so that links to meta.wikipedia.com in the form [[m:Article Title]] reach the article instead of directing to the mainpage there.
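The $wikiInterwiki fix amounts to storing, for each interwiki prefix, the full URL up to the article title. The Python sketch below illustrates this under stated assumptions: a dict stands in for the PHP table, and the "/wiki/" path layout for meta.wikipedia.com is hypothetical. If the table held only the host, every [[m:...]] link would land on the mainpage, which matches the symptom described.

```python
import html
import re
from typing import Optional
from urllib.parse import quote

# Hypothetical interwiki table in the spirit of $wikiInterwiki. Each value is
# the whole URL up to the article title (the path layout is an assumption),
# so the title is appended instead of being dropped.
INTERWIKI = {
    "m": "http://meta.wikipedia.com/wiki/",
}

LINK_RE = re.compile(r"\[\[([A-Za-z]+):([^\]|]+)\]\]")


def interwiki_url(prefix: str, title: str) -> Optional[str]:
    """Return the target URL, or None if the prefix is not in the table."""
    base = INTERWIKI.get(prefix.lower())
    if base is None:
        return None
    return base + quote(title.strip().replace(" ", "_"))


def render_interwiki(text: str) -> str:
    def repl(m: re.Match) -> str:
        url = interwiki_url(m.group(1), m.group(2))
        if url is None:
            return m.group(0)  # leave ordinary [[Namespace:Title]] links alone
        return f'<a href="{html.escape(url)}">{html.escape(m.group(2))}</a>'
    return LINK_RE.sub(repl, text)
```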
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
Well, I know it used to work on the live server (and the bug was visible there) and it works now on my test machine (and the bug is visible here), but right now I go to the live server and I *don't* see the language bug. Which means either Jimbo left it disabled, or something mysterious crept into the code that I don't know about.
I left it disabled because it wasn't helping (didn't seem to be anyway) and because of the bug it introduced, although that's pretty minor. I can turn it back on.
But, like you, I'm wondering if maybe it wasn't _really_ on. I will test for that today by reproducing the bug to "prove" the caching is on.
--Jimbo
wikitech-l@lists.wikimedia.org