I discussed some ideas a few days ago on #wikimedia-tech; here's a
summary of that discussion.
overall goals:
- preserve all wikipedia functionality
- make it hold up better as traffic and content grow
- highly modular - reduce interdependence
- identify core "must have" functionality
- push everything else out to the edge as far as possible
- cache as much as possible at every level - design caching into the
system
Five logical components I'm focusing on:
- the article (all articles = the content)
- a content cache
- authentication server(s)
- UI server(s)
- squid cache(s)
As I understand today's architecture, several of these functions are
performed monolithically by the appservers. So in part I'm proposing
a refactoring in which the core functions (article, content cache,
authentication) are protected from traffic by a ring of "defenses":
the UI servers and squid caches. Here's a breakdown of each layer:
ARTICLE
- for storing the content
The unit of content is a 3-tuple: {wikitext, red coloured links,
templates}
- each time I am edited:
--- change my content
--- if I'm a new/moved article, change colour of links in articles
that reference me
--- if I'm a template, change each article that uses me
- goal:
--- when I'm edited, propagate those changes as *efficiently* as
possible to my fellow articles
--- insist that when I'm changed, directly or indirectly, I am only
read *once* by the Content cache
--- insist that I'm only changed by the authentication server
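To make the propagation rules above concrete, here is a minimal
sketch in Python. It is not how the current code works; the class
ArticleStore, its method names, and the "dirty" set are all made up
for illustration. It assumes reverse indexes (who links to me, who
uses me as a template) so that an edit touches only the affected
articles. Moves are not modelled.

```python
from collections import defaultdict


class ArticleStore:
    """Hypothetical in-memory model of the ARTICLE layer (names are made up)."""

    def __init__(self):
        self.articles = {}                      # title -> (wikitext, links, templates)
        self.linked_from = defaultdict(set)     # title -> articles that link to it
        self.transcluded_by = defaultdict(set)  # template -> articles that use it
        self.dirty = set()                      # titles the content cache must re-read

    def edit(self, title, wikitext, links=(), templates=()):
        is_new = title not in self.articles
        if not is_new:
            # Drop the reverse-index entries left over from the previous revision.
            _, old_links, old_templates = self.articles[title]
            for target in old_links:
                self.linked_from[target].discard(title)
            for template in old_templates:
                self.transcluded_by[template].discard(title)

        links, templates = frozenset(links), frozenset(templates)
        self.articles[title] = (wikitext, links, templates)
        for target in links:
            self.linked_from[target].add(title)
        for template in templates:
            self.transcluded_by[template].add(title)

        # The edited article always changes; so does everything that uses
        # it as a template.
        self.dirty.add(title)
        self.dirty |= self.transcluded_by[title]
        if is_new:
            # Links pointing at this title flip from red to blue.
            self.dirty |= self.linked_from[title]

    def red_links(self, title):
        """Links in `title` whose target article does not exist yet."""
        _, links, _ = self.articles[title]
        return {target for target in links if target not in self.articles}
```

The "dirty" set is the only thing the content cache would need to
look at: each title in it gets re-read once and then removed.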
CONTENT CACHE
- for caching articles for browsing
- goal:
--- I only hit an article *once* for each change to that article
--- no one else ever *reads* from the articles but me
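A rough sketch of the "read once per change" goal, again in Python
with made-up names (ContentCache, read_article, article_reads). It
assumes the article layer tells the cache when a title has changed,
so the cache can drop its copy and re-read on the next request.

```python
class ContentCache:
    """Hypothetical cache that reads each article once per change."""

    def __init__(self, read_article):
        self._read_article = read_article  # callable: title -> content
        self._entries = {}                 # title -> cached content
        self.article_reads = 0             # how often the article layer was hit

    def get(self, title):
        # Only go back to the article layer on a miss.
        if title not in self._entries:
            self.article_reads += 1
            self._entries[title] = self._read_article(title)
        return self._entries[title]

    def invalidate(self, title):
        # Called when the article (or something it depends on) changes.
        self._entries.pop(title, None)
```

However many times a page is browsed, article_reads only goes up
after an invalidate() for that title.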
AUTHENTICATION SERVER
- for authenticating users for editing
- goal:
--- I'm only involved when you have to be *certain* of a user's ID
--- that is, first log-in, and when they submit an edit
--- no one else ever *writes* to the articles but me (once I've ID'd
the user)
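One possible way to keep the authentication server out of the
browsing path is sketched below. This is an assumption on my part,
not a description of the existing login code: at log-in the server
hands out an HMAC-signed token, and it only checks that token again
when an edit is submitted. AuthServer, check_password and
write_article are hypothetical names.

```python
import hashlib
import hmac
import secrets


class AuthServer:
    """Hypothetical sole writer to the articles."""

    def __init__(self, check_password, write_article):
        self._check_password = check_password  # callable: (user, password) -> bool
        self._write_article = write_article    # callable: (title, wikitext, user)
        self._key = secrets.token_bytes(32)    # signing key known only to this server

    def _sign(self, user):
        return hmac.new(self._key, user.encode("utf-8"), hashlib.sha256).hexdigest()

    def login(self, user, password):
        # First point where the user's ID must be certain.
        if not self._check_password(user, password):
            raise PermissionError("bad credentials")
        return f"{user}:{self._sign(user)}"

    def submit_edit(self, token, title, wikitext):
        # Second point: verify the token before anything is written.
        user, _, signature = token.rpartition(":")
        expected = self._sign(user)
        if not user or not hmac.compare_digest(signature.encode("utf-8"),
                                               expected.encode("utf-8")):
            raise PermissionError("invalid token")
        self._write_article(title, wikitext, user)
        return user
```

A real token would also need an expiry and a way to revoke it; this
sketch leaves both out.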
UI SERVER
- for serving up HTML pages
- goal:
--- for browsing, I read from content cache, add user dressing, and
serve
--- for submitting edits, I send them to authentication server
tricks:
- could get into tricks with JavaScript, IFRAMEs, whatever to push
work farther to the edge
- could create a distributed UI server system that can be replicated
and run by universities, etc.
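For the browsing path, the UI server's job could be as small as the
following sketch. render_page and get_content are made-up names, and
the HTML is only a placeholder for the real skin; the point is that
the shared content comes from the content cache and only the user
dressing is generated per request.

```python
import html


def render_page(title, user, get_content):
    """Wrap cached article HTML in per-user dressing (hypothetical layout)."""
    body = get_content(title)  # already-rendered HTML from the content cache
    if user:
        greeting = f"Logged in as {html.escape(user)}"
    else:
        greeting = "Not logged in"
    return (
        f"<html><head><title>{html.escape(title)}</title></head>"
        f'<body><div id="user">{greeting}</div>{body}</body></html>'
    )
```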
SQUID CACHE
- especially for non-logged-in users
- goal:
--- remove browsing load from the UI server
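One way the UI server could tell the squids what they may keep is
through standard HTTP caching headers, as in this sketch. The
function name, the session-cookie test and the 300 second lifetime
are assumptions for illustration; Cache-Control and Vary are the
standard headers.

```python
def cache_headers(has_session_cookie, shared_max_age=300):
    """Pick response headers so shared caches only keep anonymous pages."""
    if has_session_cookie:
        # Logged-in pages carry user dressing: keep them out of shared caches.
        cache_control = "private"
    else:
        # Anonymous pages are identical for everyone: let the squids hold them.
        cache_control = f"public, s-maxage={shared_max_age}"
    # Vary on Cookie so an anonymous copy is never served to a logged-in user.
    return {"Cache-Control": cache_control, "Vary": "Cookie"}
```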
--
http://simonwoodside.com