I discussed some ideas a few days ago on #wikimedia-tech; here's a summary email about them.
overall goals:
- preserve all Wikipedia functionality
- make it more resistant to scaling forces
- highly modular
- reduce interdependence
- identify core "must have" functionality
- push everything else out to the edge as far as possible
- cache as much as possible at every level
- design caching into the system
Five logical components I'm focusing on:
- the article (all articles = the content)
- a content cache
- authentication server(s)
- UI server(s)
- squid cache(s)
As I understand the architecture today, the appservers perform several of these functions monolithically. So in part I'm proposing a refactoring where the core functions (article, content cache, authentication) are protected from traffic by a ring of "defenses" in the UI servers and squid caches. Here's a breakdown of each layer:
ARTICLE - for storing the content
The unit of content is a 3-tuple: {wikitext, red coloured links, templates}
- each time I am edited:
--- change my content
--- if I'm a new/moved article, change colour of links in articles that reference me
--- if I'm a template, change each article that uses me
- goal:
--- when I'm edited, propagate those changes as *efficiently* as possible to my fellow articles
--- insist that when I'm changed, directly or indirectly, I am only read *once* by the Content cache
--- insist that I'm only changed by the authentication server
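To make the propagation rules above concrete, here is a rough sketch of how an edit could be turned into the set of articles that need re-reading. Everything in it is an assumption for illustration: the `ArticleStore` class, its reverse indexes (`linked_from`, `used_by`) and the `save` signature are made up, it keeps everything in memory, and it handles new articles but not moves or removed links.

```python
from collections import defaultdict


class ArticleStore:
    """Toy in-memory article store; all names here are hypothetical."""

    def __init__(self):
        self.wikitext = {}                    # title -> wikitext
        self.linked_from = defaultdict(set)   # target title -> titles that link to it
        self.used_by = defaultdict(set)       # template title -> titles that use it

    def save(self, title, text, links=(), templates=()):
        """Store an edit and return the titles whose rendered form is now stale."""
        is_new = title not in self.wikitext
        self.wikitext[title] = text
        for target in links:
            self.linked_from[target].add(title)
        for template in templates:
            self.used_by[template].add(title)

        dirty = {title}
        if is_new:
            # Links that pointed at a missing article change colour.
            dirty |= self.linked_from[title]
        # If this title is used as a template, its users change too.
        dirty |= self._transitive_users(title)
        return dirty

    def _transitive_users(self, title):
        """Follow template usage through nested templates, visiting each title once."""
        seen, stack = set(), [title]
        while stack:
            for user in self.used_by[stack.pop()]:
                if user not in seen:
                    seen.add(user)
                    stack.append(user)
        return seen
```

The returned set is what would be handed to the content cache, so that each affected article is read once for the change and no more.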
CONTENT CACHE - for caching articles for browsing
- goal:
--- I only hit an article *once* for each change to that article
--- no one else ever *reads* from the articles but me
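A minimal sketch of the "read once per change" goal, assuming the cache is told explicitly which titles have changed. The `ContentCache` class, the `read_article` callable and the `reads` counter are hypothetical names used only to show the idea.

```python
class ContentCache:
    """Toy cache that reads each article at most once per invalidation."""

    def __init__(self, read_article):
        self._read_article = read_article   # hypothetical callable: title -> content
        self._entries = {}                  # title -> cached content
        self.reads = 0                      # how often the article store was hit

    def get(self, title):
        """Return cached content, reading from the article store only on a miss."""
        if title not in self._entries:
            self._entries[title] = self._read_article(title)
            self.reads += 1
        return self._entries[title]

    def invalidate(self, titles):
        """Drop entries for changed articles; the next get() re-reads each one once."""
        for title in titles:
            self._entries.pop(title, None)
```

With this shape, any number of browsing requests between two edits costs the article store exactly one read.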
AUTHENTICATION SERVER - for authenticating users for editing
- goal:
--- I'm only involved when you have to be *certain* of a user's ID
--- that is, first log-in, and when they submit an edit
--- no one else ever *writes* to the articles but me (once I've ID'd the user)
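Here is one way the "only writer" rule could look, as a sketch under assumptions rather than a design. The `AuthServer` class, its `register`, `login` and `submit_edit` methods, the session-token scheme and the `write_article` callable are all invented for illustration; registration in particular is not part of the proposal and is only there so the example runs.

```python
import hashlib
import hmac
import secrets


class AuthServer:
    """Toy authentication server that is the only path to writing an article."""

    def __init__(self, write_article):
        self._write_article = write_article   # hypothetical callable: (title, text, user)
        self._users = {}                      # name -> (salt, password hash)
        self._sessions = {}                   # session token -> name

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def register(self, name, password):
        salt = secrets.token_bytes(16)
        self._users[name] = (salt, self._hash(password, salt))

    def login(self, name, password):
        """First point of involvement: return a session token, or None on failure."""
        record = self._users.get(name)
        if record is None:
            return None
        salt, expected = record
        if not hmac.compare_digest(expected, self._hash(password, salt)):
            return None
        token = secrets.token_hex(16)
        self._sessions[token] = name
        return token

    def submit_edit(self, token, title, text):
        """Second point of involvement: write only once the user is identified."""
        user = self._sessions.get(token)
        if user is None:
            raise PermissionError("unknown or expired session")
        self._write_article(title, text, user)
```

Browsing never touches this class; only log-in and edit submission do.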
UI SERVER - for serving up HTML pages
- goal:
--- for browsing, I read from content cache, add user dressing, and serve
--- for submitting edits, I send them to authentication server
- tricks:
--- could get into tricks with javascript, IFRAMEs, whatever to push work farther to the edge
--- could create a distributed UI server system that can be replicated and run by universities, etc.
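The two UI server paths described above (browse and submit) can be sketched as follows. This is an assumption-heavy illustration: `UIServer`, `get_content` and `submit_edit` are hypothetical names, the HTML is a placeholder for the real "user dressing", and the cached content is assumed to be already rendered HTML.

```python
import html


class UIServer:
    """Toy UI server: reads come from the content cache, writes go to authentication."""

    def __init__(self, get_content, submit_edit):
        self._get_content = get_content   # hypothetical: title -> rendered HTML fragment
        self._submit_edit = submit_edit   # hypothetical: (token, title, text) -> None

    def browse(self, title, username=None):
        """Wrap cached content in per-user dressing; never touches the articles."""
        body = self._get_content(title)
        who = html.escape(username) if username else "not logged in"
        return (
            f"<html><body><div id='user'>{who}</div>"
            f"<h1>{html.escape(title)}</h1>{body}</body></html>"
        )

    def edit(self, token, title, text):
        """Forward the edit untouched; identity checks belong to the auth server."""
        self._submit_edit(token, title, text)
```

Because the class holds no state of its own, copies of it could in principle be run anywhere, which is what the replicated UI server idea relies on.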
SQUID CACHE - especially for non-logged-in users
- goal:
--- remove browsing load from the UI server
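The squid layer is ordinary HTTP caching, but the decision it makes can be shown in a few lines. The sketch below is not Squid configuration; `AnonymousPageCache`, `render_page` and `backend_hits` are hypothetical names, and the rule "cache only when there is no username" is an assumption standing in for whatever marks a request as logged in.

```python
class AnonymousPageCache:
    """Toy front cache: anonymous page views are served without hitting the UI server."""

    def __init__(self, render_page):
        self._render_page = render_page   # hypothetical: (title, username) -> HTML
        self._pages = {}                  # title -> HTML as served to anonymous users
        self.backend_hits = 0             # requests that reached the UI server

    def handle(self, title, username=None):
        if username is not None:
            # Logged-in pages carry user dressing, so they bypass the cache.
            self.backend_hits += 1
            return self._render_page(title, username)
        if title not in self._pages:
            self.backend_hits += 1
            self._pages[title] = self._render_page(title, None)
        return self._pages[title]

    def purge(self, title):
        """Forget a page after its article changes."""
        self._pages.pop(title, None)
```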
wikitech-l@lists.wikimedia.org