I've committed a feature which makes it possible to collapse long pages and load individual sections from the TOC. I've also improved section editing/viewing behavior to load subsections within a section instead of just the mother section. Long pages are collapsed when the user explicitly requests it ("Collapse page"), or when they are above a size threshold which can be set in the prefs (default 30K). I coded this primarily to address super-long pages like VfD.
The main problem I haven't been able to fix yet is that after saving a section (when coming from a section view), we return to the full article view (or the collapsed view if the page is above the size threshold). I've experimented with passing parameters to avoid that but haven't quite found the right solution yet. If someone else wants to fix this, be my guest.
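For the record, the kind of thing I've been experimenting with in EditPage looks roughly like this (an untested sketch; whether the section number reliably survives the save request is exactly the open question):

  # Untested sketch: after a successful section save, send the user
  # back to the section view instead of the full article. Assumes the
  # request still carries the section number at this point.
  $section = $wgRequest->getVal( 'section' );
  if ( $section !== null && $section !== '' ) {
      $wgOut->redirect( $this->mTitle->getFullURL(
          'action=view&section=' . urlencode( $section ) ) );
  } else {
      # full (or auto-collapsed) page view, as now
      $wgOut->redirect( $this->mTitle->getFullURL() );
  }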
Another issue is that saving a section in a very long page still takes a fair amount of time. I haven't checked but I suspect this is because the entire links cache is updated. Surely some optimization would be possible here.
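One obvious candidate: skip the links-table rewrite entirely when a section edit doesn't touch any links. A sketch, where getLinksFromText() is a made-up helper for whatever link list the parser can hand us:

  # Sketch: compare the links before and after the edit and only do
  # the expensive update when they differ. getLinksFromText() is a
  # hypothetical helper.
  function wfMaybeUpdateLinks( $id, $title, $oldText, $newText ) {
      if ( getLinksFromText( $oldText ) != getLinksFromText( $newText ) ) {
          # links changed: full update as now
          $u = new LinksUpdate( $id, $title );
          $u->doUpdate();
      }
      # otherwise leave the links cache alone
  }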
In terms of code, this isn't very clean yet as we have two quite redundant TOC generators now, one which generates from source and the other which generates from HTML. We probably should get rid of the one which generates from HTML and use the source generator for both types of TOC. This isn't entirely trivial as the TOC generation in OutputPage is entangled with other things like headline numbering and section edit links.
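The source-based generator really only needs the headline lines, so a single pass over the text should do. A sketch that deliberately ignores corner cases like <nowiki> and HTML headings:

  # Sketch: collect '== Headline ==' style lines from wikitext.
  # The level is the number of '=' characters.
  function getSourceToc( $text ) {
      $toc = array();
      preg_match_all( '/^(={1,6})(.+?)\1\s*$/m', $text, $m, PREG_SET_ORDER );
      foreach ( $m as $match ) {
          $toc[] = array( 'level' => strlen( $match[1] ),
                          'line'  => trim( $match[2] ) );
      }
      return $toc;
  }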
Obviously this feature would also benefit greatly from a more efficient way to store old revisions: right now a single edit to a single section of a 90K page leads to another 90K revision in the OLD table.
On the plus side, rendering of collapsed pages is very fast as indeed only the intro has to be rendered (the headlines are also converted to HTML to make stripping of wikitags easier, but this is concatenated into a single step and thus quite unnoticeable).
Regards,
Erik
On Wed, 2004-05-12 at 15:42 +0200, Erik Moeller wrote:
I've committed a feature which makes it possible to collapse long pages and load individual sections from the TOC. I've also improved section editing/viewing behavior to load subsections within a section instead of just the mother section. Long pages are collapsed when the user explicitly requests it ("Collapse page"), or when they are above a size threshold which can be set in the prefs (default 30K). I coded this primarily to address super-long pages like VfD.
Hm, how does this integrate with caching? I would also vote for switching this feature off by default, especially for anon users (otherwise there's no caching for the biggest pages).
In terms of code, this isn't very clean yet as we have two quite redundant TOC generators now, one which generates from source and the other which generates from HTML. We probably should get rid of the one which generates from HTML and use the source generator for both types of TOC. This isn't entirely trivial as the TOC generation in OutputPage is entangled with other things like headline numbering and section edit links.
On the plus side, rendering of collapsed pages is very fast as indeed only the intro has to be rendered (the headlines are also converted to HTML to make stripping of wikitags easier, but this is concatenated into a single step and thus quite unnoticeable).
Instead of re-rendering pages for everyone, I think it would make more sense to render the page only once and do the customizations in JS/CSS as far as possible. Hiding sections via JS or CSS for the browsers that support it, similar to what we do with the TOC currently, shouldn't be too hard, and would save many round trips to the server and a lot of repeated rendering.
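A rough sketch of the markup I mean, with each section wrapped in an identifiable block so that a toggle analogous to the current TOC show/hide can act on it (the id scheme and the toggleSection() JS function are made up):

  # Sketch: wrap each rendered section so client-side code can hide it.
  # toggleSection() would be a small JS function like the existing TOC
  # show/hide code; it does not exist yet.
  function wrapSection( $index, $sectionHtml ) {
      return "<div class=\"foldable\" id=\"section-$index\">\n" .
             "<a href=\"javascript:toggleSection($index)\">[hide]</a>\n" .
             $sectionHtml . "\n</div>\n";
  }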
Cheers
Gabriel-
Hm, how does this integrate with caching? I would also vote for switching this feature off by default, especially for anon users (otherwise there's no caching for the biggest pages).
I don't see how this affects caching on the Squid level. If you set the size threshold to 30K, then a page will be collapsed if it has more than this number of characters, but the URL will not show the &collapse=false parameter. Surely this should be treated like any other page?
Instead of re-rendering pages for everyone, I think it would make more sense to render the page only once and do the customizations in JS/CSS as far as possible. Hiding sections via JS or CSS for the browsers that support it, similar to what we do with the TOC currently, shouldn't be too hard, and would save many round trips to the server and a lot of repeated rendering.
The whole point of collapsing pages is not to have to send 90K to a dialup user whenever they view VfD. I fail to see how this can be achieved using JavaScript.
Regards,
Erik
On Wed, 2004-05-12 at 15:42 +0200, Erik Moeller wrote:
On the plus side, rendering of collapsed pages is very fast as indeed only the intro has to be rendered (the headlines are also converted to HTML to make stripping of wikitags easier, but this is concatenated into a single step and thus quite unnoticeable).
I've investigated the caching issue a bit: there's no purging for the sections and the full page view, so the code would need to be changed to send no-cache headers; otherwise anonymous users will get old versions of the page. I've done some benchmarks of a long test page (http://wikidev.net/Long_testpage, 267.8 KB raw, 57.8 KB gzipped for normal browsers), comparing section folding with no caching against section folding disabled with caching enabled. The (not surprising) result: section folding without caching is about 20 times slower.
Also, there are two TOCs on that test page; not sure if that's intentional.
Erik, can you move the section folding code to a separate branch for now? The plan is to get 1.3 ready soon, but this feature needs more work and discussion. It would certainly be a performance problem as it is.
Cheers
Gabriel Wicke
Detailed benchmark data:
Page: http://wikidev.net/Long_testpage (267.8 KB raw, 57.8 KB gzipped for normal browsers)
Section folding enabled, no caching, full page view:
ab -n 100 -H 'Accept-Encoding: gzip' -c 5 http://wikidev.net:8000/Long_testpage?action=view\&collapse=false
Requests per second: 0.13 [#/sec] (mean)
Connection Times (ms)
              min   mean[+/-sd] median    max
Connect:        0      9   48.9      0    357
Processing: 35941  37926 1048.5  37801  42632
Waiting:    35941  37922 1048.5  37772  42632
Total:      35941  37936 1051.8  37801  42632
No section folding, caching enabled (as on wp.org), full page view:
ab -n 100 -H 'Accept-Encoding: gzip' -c 5 http://wikidev.net/Long_testpage
Requests per second: 2.66 [#/sec] (mean)
Connection Times (ms)
              min   mean[+/-sd] median    max
Connect:        0     10   50.6      0    373
Processing:     3   1872 8157.5      9  37526
Waiting:        3   1868 8138.5      8  37422
Total:          3   1882 8199.2      9  37573
Gabriel-
I've investigated the caching issue a bit: there's no purging for the sections and the full page view, so the code would need to be changed to send no-cache headers; otherwise anonymous users will get old versions of the page.
?? When would anons get old versions of the page? I just edited a couple of sections on wikidev.net anonymously and all changes show instantly.
I've done some benchmarks of a long test page (http://wikidev.net/Long_testpage, 267.8 KB raw, 57.8 KB gzipped for normal browsers), comparing section folding with no caching against section folding disabled with caching enabled. The (not surprising) result: section folding without caching is about 20 times slower.
Actually, section *expanding* without caching is slower. You benchmarked with collapse=false; that only happens when an expansion is explicitly requested, which is a dynamic operation and may well be slow. How about benchmarking the auto-collapsing behavior (no URL parameters) against the old behavior (threshold set to 0)? Surely the massively smaller pages far outweigh any benefits of caching?
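Concretely, something like this, with the same ab options as your runs but no URL parameters, so that the auto-collapsed view is what gets served:

  ab -n 100 -H 'Accept-Encoding: gzip' -c 5 http://wikidev.net:8000/Long_testpage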
Re: the duplicate TOC, investigating.
Regards,
Erik
On Wed, 2004-05-12 at 22:36 +0200, Erik Moeller wrote:
?? When would anons get old versions of the page? I just edited a couple of sections on wikidev.net anonymously and all changes show instantly.
Try with another browser that has no cookies set. After editing you have a session cookie, which means no Squid caching so far. None of the 'content' URLs are purged, so all that would remain cacheable is the TOC. However, at the moment it's sending 'I'm cacheable' headers for the content URLs that are never purged, hence out-of-date content.
Benchmark for the TOC-only page (uncached, but this one is purged, so it could be cached):
Requests per second: 1.06 [#/sec] (mean)
Connection Times (ms)
              min   mean[+/-sd] median    max
Connect:        0     11   60.2      0    444
Processing:  3919   4612  316.1   4629   5247
Waiting:     3855   4606  317.7   4626   5247
Total:       3919   4623  311.8   4630   5247
A short section:
ab -n 100 -H 'Accept-Encoding: gzip' -c 5 http://wikidev.net:8000/Long_testpage?action=view\&section=24\&section...
Requests per second: 0.62 [#/sec] (mean)
Connection Times (ms)
              min   mean[+/-sd] median    max
Connect:        0     10   56.5      0    423
Processing:  3999   7930 2362.7   8804  12732
Waiting:     3999   7924 2371.0   8804  12731
Total:       3999   7940 2348.1   8804  12732
Just rendering a small section is more than four times slower than getting the full page from Squid (!). I don't see how this is a performance advantage if caching is available.
Was there any discussion of this feature beforehand? The only thing I found via Google was http://meta.wikipedia.org/wiki/List_of_feature_requests#Collapsable_tree_in_... which doesn't seem to be related to this implementation.
All the skin work I've done so far (heavy use of CSS instead of rendering another page for every user/skin, etc.) is aimed at a higher cache hit ratio. Increasing the cache hit ratio by just 15% means halving the load on the Apaches and the DB. This way of folding sections defeats that.
Cheers
Gabriel Wicke
Gabriel-
?? When would anons get old versions of the page? I just edited a couple of sections on wikidev.net anonymously and all changes show instantly.
Try with another browser that has no cookies set. After editing you have a session cookie, which means no Squid caching so far. None of the 'content' URLs are purged, so all that would remain cacheable is the TOC. However, at the moment it's sending 'I'm cacheable' headers for the content URLs that are never purged, hence out-of-date content.
I presume calling SquidUpdate::purge after the save (in EditPage) with the section viewing URL as parameter (multiple ones if the whole page is edited) would solve this problem? That doesn't sound very hard.
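Something like this after the save, I imagine (untested; I'm assuming SquidUpdate::purge() takes an array of URLs, and $numSections would be the number of headlines in the saved text):

  # Untested sketch: purge the cached section views after a save.
  $urls = array( $this->mTitle->getInternalURL() );
  for ( $i = 1; $i <= $numSections; $i++ ) {
      $urls[] = $this->mTitle->getInternalURL( "action=view&section=$i" );
  }
  SquidUpdate::purge( $urls );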
I don't see how this is a performance advantage if caching is available.
From the end user's point of view, rendering speed will be negligible if load is low, but transfer speed will always matter. So under good hardware conditions, this will make a big difference in user experience.
However, I agree that the caching should be optimized. Since you are the expert in that department, I would of course appreciate any assistance. Is the approach I described above correct? Can I test this without setting up Squid?
Regards,
Erik
On Thu, 2004-05-13 at 01:09 +0200, Erik Moeller wrote:
I presume calling SquidUpdate::purge after the save (in EditPage) with the section viewing URL as parameter (multiple ones if the whole page is edited) would solve this problem? That doesn't sound very hard.
How many threshold settings are there and how many headlines on a page? Even if you restricted the options to 'on' and 'off', it would be > 100 purge requests to each of the squids for a simple edit, plus it would already halve the cache hit ratio.
Separate pages, on the other hand, only require purging the (shorter) page that's actually edited, plus the page that links to them when they are added or deleted. A long overview page can still use the individual cached page content, either in a server-side version with esi:include tags or using JavaScript, reusing the same cached page content as in the single-page view. See http://www.esi.org/ if any information is missing here.
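For the esi:include variant, the overview page would just be a shell that pulls in the cached fragments, along these lines (a sketch; the /wiki/ URL layout is made up for illustration):

  # Sketch: build an overview page out of esi:include tags so the
  # squids assemble it from individually cached fragments.
  function wfEsiOverview( $pages ) {
      $html = '';
      foreach ( $pages as $page ) {
          $html .= '<esi:include src="/wiki/' . urlencode( $page ) .
                   '?action=render"/>' . "\n";
      }
      return $html;
  }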
The category feature needs some work too, maybe also a separate link table to improve efficiency. I'm looking forward to seeing your contributions (and to discussing them beforehand, if you want to save time).
Cheers
Gabriel Wicke
Gabriel-
How many threshold settings are there and how many headlines on a page?
The threshold setting only exists for logged-in users.
Even if you restricted the options to 'on' and 'off', it would be > 100 purge requests
How is sending a hundred URLs - perhaps 5000 bytes - to the Squid on page saves a problem?
Separate pages, on the other hand, only require purging the (shorter) page that's actually edited, plus the page that links to them when they are added or deleted.
Separate pages are not a good solution for this particular problem, see my responses to Brion.
Regards,
Erik
On Thu, 2004-05-13 at 13:20 +0200, Erik Moeller wrote:
Gabriel-
How many threshold settings are there and how many headlines on a page?
The threshold setting only exists for logged-in users.
So what do you think all this caching-for-logged-in-users-using-ESI stuff is about?
Cheers
Gabriel Wicke
Erik Moeller wrote:
Actually, section *expanding* without caching is slower. You benchmarked with collapse=false; that only happens when an expansion is explicitly requested, which is a dynamic operation and may well be slow. How about benchmarking the auto-collapsing behavior (no URL parameters) against the old behavior (threshold set to 0)? Surely the massively smaller pages far outweigh any benefits of caching?
With server-side collapsing, expansion will require a new request which will include re-transferring everything that's already there. That's an extra burden for the slow downloader, as well as requiring parsing and rendering time on the server.
With client-side collapsing, the page need be transferred only once, and sections can be opened and closed instantaneously. The data is sent compressed if the browser supports it, as most browsers do, so it will be smaller than the raw size. A large page should also compress better as a whole than several page fragments compressed separately (each with its own copy of the header, footer, sidebar markup, TOC, and other parts of the page). Also, the page need only be parsed and rendered to HTML once, and will be served identically from the Squid cache, reducing the latency before the page is transferred over the net.
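(The fragment overhead is easy to measure with PHP's zlib functions, by the way. A quick sketch, with $sections standing in for the rendered HTML of each section:)

  # Compare compressing the whole page against compressing each
  # fragment separately. $sections is assumed to hold the HTML.
  $whole = strlen( gzencode( implode( '', $sections ) ) );
  $parts = 0;
  foreach ( $sections as $s ) {
      $parts += strlen( gzencode( $s ) );
  }
  print "whole page: $whole bytes, separate fragments: $parts bytes\n";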
-- brion vibber (brion @ pobox.com)
Brion-
With server-side collapsing, expansion will require a new request which will include re-transferring everything that's already there. That's an extra burden for the slow downloader, as well as requiring parsing and rendering time on the server.
Have you tried the feature? When you click an individual section in collapsed mode, *only that section* is loaded. Thus, no retransferring (except for the page layout). Expanding the full page should only be done when you have to edit or print it as a whole. Right now the Edit link in collapsed mode goes to the top section; I thought that was a clever feature but it's probably rather annoying. If we change this you don't even need to expand to edit in full.
Regards,
Erik
Erik Moeller wrote:
With server-side collapsing, expansion will require a new request which will include re-transferring everything that's already there. That's an extra burden for the slow downloader, as well as requiring parsing and rendering time on the server.
Have you tried the feature? When you click an individual section in collapsed mode, *only that section* is loaded. Thus, no retransferring (except for the page layout).
...
Then I don't understand the point at all. Use separate pages!
-- brion vibber (brion @ pobox.com)
Brion-
Have you tried the feature? When you click an individual section in collapsed mode, *only that section* is loaded. Thus, no retransferring (except for the page layout).
Then I don't understand the point at all. Use separate pages!
The feature was inspired by the current situation on [[Wikipedia:Votes for deletion]] (en:). The page became so long that modem users complained that they could no longer effectively work with it. (Even though parts had already been split away to other pages.) So some people got clever and started using templates for each section. Now if you want to nominate an article for deletion, you have to create a template page and add an edit link to it. After the debate is over, all these templates keep floating around and cannot be easily refactored because they are spread across hundreds of pages. While the problem of slow editing has been solved, the problem of slow loading still exists, and people continue to complain about it.
What are the alternatives? It has been suggested to link to the deletion debates on Talk: subpages, but the problem with that approach is that it's no longer possible for those who *want to* quickly view all ongoing debates on a single page.
Similar problems exist on other very long pages. The dialup users complain that they can't work with the pages properly, and the high bandwidth users complain that splitting everything up makes it hard to find and refactor.
By having a threshold defined in the user preferences, we can satisfy both parties without sacrificing usability as the template approach does. Now of course I agree that in many if not most cases, long pages should simply be split up, but as the examples of VfD (and the Village pump, and FAC, and - occasionally - RFA, and ...) show, this is not always an option. And where it is, it often takes weeks or months until someone goes to the effort.
I therefore predict that this feature will be much loved by all. The only possible negative side effect I can see is that people might be discouraged from refactoring long pages, which will lead to larger revisions; but of course we should solve that by reorganizing revision storage, which is not urgent but should be done for 1.4 or 1.5 in any case, together with the schema redesign.
Regards,
Erik
Erik Moeller wrote:
Brion-
Then I don't understand the point at all. Use separate pages!
The feature was inspired by the current situation on [[Wikipedia:Votes for deletion]] (en:). The page became so long that modem users complained that they could no longer effectively work with it.
Ahhh, deletion. You're solving the wrong problem, then. When all you've got is a hammer...
A community deletion system with an automatically generated list linking to all deletion-nominated pages would seem to be the appropriate thing; this would eliminate the need to manually arrange a billion little debates; they could sit on the talk pages where they belonged, and the master list would manage itself.
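A minimal sketch of what the software side of that master list could look like (the nomination store is entirely hypothetical):

  # Sketch: render the auto-generated master list as wikitext.
  # $nominations would come from a hypothetical nomination table;
  # each debate sits on the nominated page's talk page.
  function wfDeletionQueueWikitext( $nominations ) {
      $text = '';
      foreach ( $nominations as $title ) {
          $text .= "* [[$title]] ([[Talk:$title|debate]])\n";
      }
      return $text;
  }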
-- brion vibber (brion @ pobox.com)
Brion-
A community deletion system with an automatically generated list linking to all deletion-nominated pages would seem to be the appropriate thing; this would eliminate the need to manually arrange a billion little debates; they could sit on the talk pages where they belonged, and the master list would manage itself.
Maybe, but you would still have to solve the problem of showing all linked debates on the page. You can hardcode such a solution, but it will not be applicable to other cases where things like refactoring are needed and you don't want a multitude of subpages (e.g. the Village pump). I think the collapse feature is a good general solution that may well be complemented by other, more specific functions later on.
Regards,
Erik
Erik Moeller wrote:
Brion-
Maybe, but you would still have to solve the problem of showing all linked debates on the page.
No, because that's not a problem that needs to be solved. Most of these "debates" won't have changed at any given time, and showing them all makes it very very difficult to work through what's going on.
-- brion vibber (brion @ pobox.com)
Brion-
Maybe, but you would still have to solve the problem of showing all linked debates on the page.
No, because that's not a problem that needs to be solved. Most of these "debates" won't have changed at any given time, and showing them all makes it very very difficult to work through what's going on.
When I go to VfD, I check for debates that are unusually long or very heterogeneous (many deletes, many keeps). So I systematically search for controversial pages to see if any issue needs to be clarified. Others will likely search for pages where they can add "Me too" votes to seal the deal, etc. This kind of systematic browsing of VfD, as a matter of housekeeping, is only possible if all debates are visible on one page.
Regards,
Erik
I don't understand what's so hard about making VfD usable.
It is clear that some people want a short page with a list of links, which all go to Talk pages where the deletion is discussed, while other people want everything on one huge page.
So why not provide both of these pages?
The huge page can easily be created/administered by making use of the next version's ability to transclude pages from any namespace.
So, [[Wikipedia:Votes for deletion]] would look like this:
* [[Page name]] ([[Talk:Page name/delete|talk]])
* [[Asdfsdfsdf]] ([[Talk:Asdfsdfsdf/delete|talk]])
* [[Bush is gay]] ([[Talk:Bush is gay/delete|talk]])
* [[User:Idiot/Wikisex]] ([[User talk:Idiot/Wikisex/delete|talk]])
and [[Wikipedia:Votes for deletion (full)]] would look like this:
== [[Page name]] ==
{{Talk:Page name/delete}}
== [[Asdfsdfsdf]] ==
{{Talk:Asdfsdfsdf/delete}}
== [[Bush is gay]] ==
{{Talk:Bush is gay/delete}}
== [[User:Idiot/Wikisex]] ==
{{User talk:Idiot/Wikisex/delete}}
Compared to the problems we have with our current design, the problem of keeping the two versions in sync seems marginal, and if someone cares enough, they can write a feature to automate it anyway.
Timwi
Brion Vibber wrote:
Erik Moeller wrote:
Brion-
Then I don't understand the point at all. Use separate pages!
The feature was inspired by the current situation on [[Wikipedia:Votes for deletion]] (en:). The page became so long that modem users complained that they could no longer effectively work with it.
Ahhh, deletion. You're solving the wrong problem, then. When all you've got is a hammer...
A community deletion system with an automatically generated list linking to all deletion-nominated pages would seem to be the appropriate thing; this would eliminate the need to manually arrange a billion little debates; they could sit on the talk pages where they belonged, and the master list would manage itself.
-- brion vibber (brion @ pobox.com)
I agree with Brion. The problem is that VfD in its current form is completely broken. The current system won't scale, and the plethora of transclusion pages is a worse cure than the original problem: at least when VfD got huge before, someone worked on it.
Brion's solution also sounds reasonable: a specialized solution for a (so far) specialized problem.
A generalization of this problem is all the other request queues for human action that will soon scale beyond the ability of people to cope with by mere editing, like cleanup, RC patrol, etc.
-- Neil
Neil Harris wrote:
A generalization of this problem is all the other request queues for human action that will soon scale beyond the ability of people to cope with by mere editing, like cleanup, RC patrol, etc.
I'll point out that "Votes for deletion" was originally a software feature. From any page you could go to the 'Vote for this page' special page, and add a comment which would be inserted in the lists on 'Votes for deletion', 'Votes for rewrite', 'Votes for NPOVing', and a couple others.
It was a little rough around the edges, but Lee didn't include the feature at all for phase 3 and the vote pages became utterly unmanageable manually. Votes for deletion survived into the present monstrosity, while the others eventually died of neglect.
-- brion vibber (brion @ pobox.com)
Brion Vibber wrote:
I'll point out that "Votes for deletion" was originally a software feature. From any page you could go to the 'Vote for this page' special page, and add a comment which would be inserted in the lists on 'Votes for deletion', 'Votes for rewrite', 'Votes for NPOVing', and a couple others.
It was a little rough around the edges, but Lee didn't include the feature at all for phase 3 and the vote pages became utterly unmanageable manually. Votes for deletion survived into the present monstrosity, while the others eventually died of neglect.
One of the other options allowed voting for a well-written article. It provided the opportunity to say something good about an article.
Ec
Erik Moeller wrote:
I've also improved section editing/viewing behavior to load subsections within a section instead of just the mother section.
Where was this discussed/voted on/announced/etc.? I don't want this. I don't see how that is an "improvement". It only makes editing the first part of a section with loads of sub-sections more difficult.
Timwi