I'm replying to this wikipedia-l post on wikitech-l, since it's more relevant here.
Brion Vibber wrote:
I'd been waiting on Tim's in-progress code to compare. Apparently there's not really much of that left (his work mostly transmogrified into the templatelinks table) so I'm poking at Magnus's code now.
Salvatore's moderation feature was implemented in a similar way to Magnus's, in that it used an extra revision ID field in the page table to point to the relevant version. Salvatore's code used parameters passed back to Revision to determine whether page_latest or page_verified should be used, whereas Magnus's code operated mainly at the UI level, redirecting to a page with an oldid parameter, IIRC.
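To make the difference between the two approaches concrete, here is a minimal sketch in Python (not the actual MediaWiki code, which is PHP). The `PageRow` class, both function names, the fallback to page_latest when no verified revision exists, and the `/w/index.php` path are assumptions for illustration; only the page_latest and page_verified fields and the oldid parameter come from the description above.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class PageRow:
    """Hypothetical stand-in for a row of the page table."""
    page_id: int
    page_latest: int                      # newest revision ID
    page_verified: Optional[int] = None   # extra revision ID field


def revision_to_load(page: PageRow, use_verified: bool) -> int:
    """Backend-style selection: a parameter decides which field is used.

    Falling back to page_latest when nothing is verified is an assumption.
    """
    if use_verified and page.page_verified is not None:
        return page.page_verified
    return page.page_latest


def oldid_redirect_url(title: str, page: PageRow) -> str:
    """UI-style selection: build a URL carrying an oldid parameter."""
    rev_id = revision_to_load(page, use_verified=True)
    return "/w/index.php?" + urlencode({"title": title, "oldid": rev_id})
```

In the first style the caller never sees a revision ID; in the second the chosen revision is visible in the URL, so the request ends up as an ordinary old-revision view.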
Neither of them had the structure required for efficient caching, that is, page/tag retrieval instead of page/revision retrieval. The basic problem is that tugela, which we are now using instead of memcached, has no efficient means for identifying and purging expired keys. In fact at the moment, this garbage collection is not done at all. To limit the growth of the cache under these circumstances, it's better to index the parser cache by page and tag, rather than page and revision ID. I thought that the best way to implement a tag concept, to merge Magnus's and Salvatore's features while minimising MySQL index space, would be to put the tag information in its own table.
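The effect of the keying choice on cache growth can be sketched as follows. This is an illustration in Python under stated assumptions: the key formats, the function names and the "current" tag are made up for the example and are not the real parser cache keys.

```python
def key_by_revision(page_id: int, rev_id: int) -> str:
    """Hypothetical key format: one entry per page and revision."""
    return f"pcache:{page_id}:rev:{rev_id}"


def key_by_tag(page_id: int, tag: str) -> str:
    """Hypothetical key format: one entry per page and tag."""
    return f"pcache:{page_id}:tag:{tag}"


def simulate_saves(page_id: int, rev_ids: list) -> tuple:
    """Store rendered output for each saved revision under both schemes.

    Returns the number of entries each cache ends up holding when
    nothing is ever purged.
    """
    cache_by_rev = {}
    cache_by_tag = {}
    for rev_id in rev_ids:
        html = f"<p>rendered r{rev_id}</p>"
        cache_by_rev[key_by_revision(page_id, rev_id)] = html
        # The same key is overwritten on every save, so stale output
        # is replaced instead of accumulating.
        cache_by_tag[key_by_tag(page_id, "current")] = html
    return len(cache_by_rev), len(cache_by_tag)
```

With page/revision keys every save leaves another entry behind that only a purge can remove; with page/tag keys the number of entries is bounded by pages times tags, whether or not garbage collection runs.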
Then there's the problem of template and link colour changes. I posted to wikitech-l about that before. Magnus's suggestion of storing the wikitext with the templates expanded at save time is a quite reasonable solution.
I stopped working on Salvatore's moderation feature when Brion implemented the semi-protection proposal put forward by the English Wikipedia. It was quite redundant -- Salvatore's feature was a form of semi-protection, a more complicated one than the one that the Wikipedians were supporting. I was even working on integrating it into the protection UI when Brion rewrote that part of the code. At that stage I still hadn't addressed the caching issue. So I salvaged what I could of my code branch (mostly the templatelinks table), and abandoned the feature. I wasn't interested enough in the stable version feature to keep working on the backend.
Perhaps the simplest solution at the moment is to put Magnus's feature live (after the necessary code cleanup), and put up with the lack of caching for a while. We've still got a bit of spare hardware capacity, haven't we? The request rate for stable versions should be lower than it would have been for verified revisions. If I understand it correctly, stable revisions are not displayed by default, whereas verified revisions would have been.
-- Tim Starling
Neither of them had the structure required for efficient caching, that is, page/tag retrieval instead of page/revision retrieval. The basic problem is that tugela, which we are now using instead of memcached, has no efficient means for identifying and purging expired keys. In fact at the moment, this garbage collection is not done at all.
It is actually done. I've already purged the tugela caches because of some weird behavior when more than several gigabytes of cache are in use. I'm wondering whether to go back to memcached or to make some improvements to tugela. We do not need gazilibytes of cache at the moment, and at 27GB or so we already have a really high hit rate.
It could be quite feasible to maintain a time index inside tugela and use it for some FIFO purging. The most important thing for site performance right now is to have good response times in all tiers.
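A rough sketch of what FIFO purging driven by a time index could look like, written in Python as a toy in-memory model. It is not tugela's code or API; the class name, the byte budget and the method names are assumptions for illustration.

```python
from collections import OrderedDict
from typing import Optional


class FifoCache:
    """Toy cache that evicts the oldest-written entries first.

    The OrderedDict plays the role of the time index: entries are kept
    in write order, so the oldest one is always at the front.
    """

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self._items = OrderedDict()

    def set(self, key: str, value: bytes) -> None:
        # Rewriting a key counts as a fresh write and moves it to the back.
        if key in self._items:
            self.used_bytes -= len(self._items.pop(key))
        self._items[key] = value
        self.used_bytes += len(value)
        # Purge from the front until the cache fits its budget again.
        while self.used_bytes > self.max_bytes and self._items:
            _, old_value = self._items.popitem(last=False)
            self.used_bytes -= len(old_value)

    def get(self, key: str) -> Optional[bytes]:
        return self._items.get(key)
```

Eviction here never needs to scan for expired keys: the cost of a purge is proportional to the number of entries removed.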
BR, Domas
Tim Starling wrote:
Salvatore's moderation feature was implemented in a similar way to Magnus's, in that it used an extra revision ID field in the page table to point to the relevant version. Salvatore's code used parameters passed back to Revision to determine whether page_latest or page_verified should be used, whereas Magnus's code operated mainly at the UI level, redirecting to a page with an oldid parameter, IIRC.
My current implementation does the template expansion at the moment of setting the current version, and stores that text in its own database table. It then checks if the requested version is a stable one, and replaces the article text with the stored text.
Note that this is not the fastest way to do it, but it allows for
* multiple stable versions (deactivated by default, allowing only one stable version per article)
* other types of "stability", like stable, reviewed, thoroughly reviewed, really really reviewed, etc. :-)
Both these properties were requested at some time.
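The approach described above (expand templates when a version is marked, store the result separately, serve the stored text when that version is requested) can be sketched roughly like this. It is a Python illustration under heavy assumptions: the in-memory dictionaries stand in for database tables, the regex handles only trivial `{{Name}}` templates, and every name in it is hypothetical rather than taken from the extension.

```python
import re

# Hypothetical stand-ins for the template, revision and stable-text tables.
TEMPLATES = {"Stub": "This article is a stub."}
REVISIONS = {
    (1, 10): "Intro. {{Stub}}",
    (1, 11): "Intro, vandalised. {{Stub}}",
}
STABLE_TEXT = {}  # (page_id, rev_id) -> (level, expanded text)


def expand_templates(wikitext: str) -> str:
    """Toy expansion: replace {{Name}} with the current template text."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: TEMPLATES.get(m.group(1), m.group(0)),
        wikitext,
    )


def mark_stable(page_id: int, rev_id: int, level: str = "stable") -> None:
    """Expand templates now and store the result with a stability level."""
    expanded = expand_templates(REVISIONS[(page_id, rev_id)])
    STABLE_TEXT[(page_id, rev_id)] = (level, expanded)


def text_for_display(page_id: int, rev_id: int) -> str:
    """Serve the stored text for marked versions, live text otherwise."""
    stored = STABLE_TEXT.get((page_id, rev_id))
    if stored is not None:
        return stored[1]
    return expand_templates(REVISIONS[(page_id, rev_id)])
```

Keying the stored text by page and revision is what leaves room for several marked versions per article, and the level field leaves room for more than one kind of stability. Later template edits do not change a version once it has been marked.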
Perhaps the simplest solution at the moment is to put Magnus's feature live (after the necessary code cleanup), and put up with the lack of caching for a while. We've still got a bit of spare hardware capacity, haven't we? The request rate for stable versions should be lower than it would have been for verified revisions. If I understand it correctly, stable revisions are not displayed by default, whereas verified revisions would have been.
After some hints from Brion yesterday, I changed a few things. There's probably lots more to fix, though. :-) Brion also reported a strange effect on the next/prev version links; I see that too, but it doesn't change when I turn the extension off, so the effect might not be related to the feature.
A few nice things are also still missing, like marking the stable version in the page history. Also, maybe the text should display in the sidebar instead of the sitenotice space. But the core functionality is there.
Magnus
Hi Tim, Hi Domas, Hi Magnus,
Thanks for your responses about the stable version feature. How about first activating it on the German Wikipedia only, to get some real-life testing? As with any new feature, it's not fully clear how the community will use it. On the German Wikipedia, at least, there is a strong need and a direction because of WP1.0. Categories were not perfect from the beginning either.
Greetings and happy coding :-) Jakob
wikitech-l@lists.wikimedia.org