Hi Rotem,
Rotem Dan wrote:
> (A single database may be dangerous: if it fails, the whole site
> breaks down.)
Well, I'm neither a webserver expert nor a database expert, but I know
how LiveJournal does it (and I've mentioned it here in the past). They
use a number of independent webservers and put a "load balancer" in
front of them. The load balancer receives the requests from the clients
and distributes them evenly over the webservers. This way, if one
webserver crashes, the site is only marginally slower until that server
is restarted. LiveJournal also has redundant load balancers in case one
of *those* crashes.
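To make the load-balancer idea concrete, here is a minimal Python sketch of round-robin distribution that skips crashed webservers. Round-robin is an assumption on my part (I don't know which algorithm LiveJournal's balancers actually use), and the class name and hostnames are hypothetical.

```python
import itertools

class RoundRobinBalancer:
    """Hypothetical balancer: hand each request to the next healthy webserver."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        # Called when a webserver crashes; it is skipped until marked up again.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        # Called once the crashed webserver has been restarted.
        self.healthy.add(backend)

    def pick(self):
        # Try at most one full rotation before giving up.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy webservers")

lb = RoundRobinBalancer(["web1", "web2", "web3"])
lb.mark_down("web2")
print([lb.pick() for _ in range(4)])  # web2 is skipped while it is down
```

A real balancer would also run health checks that call `mark_down`/`mark_up` automatically; that part is left out here.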
As for the database, they use MySQL replication. That means there are
several databases containing the same data; one of them is the "master"
(which is where data is written), and these writes are then broadcast
to the other DBs, the "slaves". This way DB reads can be evenly
distributed among several servers. This, in turn, means you don't need
to keep buying bigger and better hardware, only more of it (and the old,
crappy hardware can still handle some of the load).
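A minimal sketch of the read/write split described above, assuming a hypothetical `ReplicatedDB` router that only looks at the first SQL keyword; connection handles are represented by plain names.

```python
import random

class ReplicatedDB:
    """Hypothetical router: writes go to the master, reads to a random slave."""

    def __init__(self, master, slaves, rng=None):
        self.master = master
        self.slaves = list(slaves)
        self.rng = rng or random.Random()

    def handle_for(self, sql):
        # Replication only flows master -> slaves, so anything that is not
        # a plain SELECT must be sent to the master.
        first_word = sql.lstrip().split(None, 1)[0].upper()
        if first_word == "SELECT" and self.slaves:
            return self.rng.choice(self.slaves)
        return self.master

db = ReplicatedDB("db-master", ["db-slave1", "db-slave2"])
print(db.handle_for("UPDATE page SET text = 'x' WHERE id = 1"))  # db-master
print(db.handle_for("SELECT text FROM page WHERE id = 1"))      # a slave
```

One caveat: MySQL replication is asynchronous, so a read sent to a slave right after a write may not see that write yet.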
They (LiveJournal) have even written an independent Perl module
(DBI::Role) which handles all of this (distributing DB reads across
several slaves, and even weighting them according to their performance).
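The weighting that DBI::Role does can be approximated with a weighted random choice. This is not DBI::Role's API (I haven't checked how it implements weighting); `pick_weighted` and the weights below are hypothetical, with a higher weight meaning a faster box.

```python
import random

def pick_weighted(slaves, rng=random):
    """Choose a slave with probability proportional to its weight.

    slaves: dict mapping a (hypothetical) slave name to a numeric weight.
    """
    names = list(slaves)
    weights = [slaves[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

# The new box should get roughly five times as many reads as the old one.
print(pick_weighted({"new-box": 10, "old-box": 2}))
```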
Now, LiveJournal actually also uses clustering, i.e. they distribute the
users' journals across several clusters (each of which has its own master
and several slaves). I'm still undecided whether it's a good idea to do
the same with Wikipedia languages. Doing so would make multi-language
watchlists difficult; not doing so, however, may leave the architecture
insufficiently scalable. I'm not knowledgeable enough to judge this.
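To show why clustering by language complicates multi-language watchlists, here is a sketch assuming a hypothetical static language-to-cluster map: a watchlist spanning several languages has to query every cluster involved and merge the results.

```python
# Hypothetical assignment of Wikipedia languages to DB clusters,
# each cluster having its own master and slaves.
CLUSTER_OF_LANGUAGE = {"en": 1, "de": 2, "fr": 2, "ja": 3}

def cluster_for(lang):
    return CLUSTER_OF_LANGUAGE[lang]

def clusters_for_watchlist(watched):
    """watched: iterable of (lang, title) pairs.

    Returns the sorted cluster ids that must all be queried (and their
    results merged) to build a single multi-language watchlist.
    """
    return sorted({cluster_for(lang) for lang, _title in watched})

print(clusters_for_watchlist([("en", "Paris"), ("de", "Berlin"), ("fr", "Paris")]))
```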
> As for the wiki syntax parser, I wouldn't want to rewrite that! (I
> hate parsers!)
This is actually the part I'm looking forward to! (Since, as mentioned
before, I love regexps ;-) )
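Since the plan is regexp-based, here is a minimal sketch of the approach for three constructs (bold, italics, links). The `/wiki/` URL prefix and the function names are assumptions, and a real parser has to cope with much more (nesting, `'''''` bold-italics, malformed markup).

```python
import html
import re

BOLD = re.compile(r"'''(.+?)'''")
ITALIC = re.compile(r"''(.+?)''")
# [[Target]] or [[Target|label]]
LINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def _link(match):
    target = match.group(1).strip()
    label = match.group(2) or target
    # Assumed URL scheme: spaces in titles become underscores.
    return '<a href="/wiki/{}">{}</a>'.format(target.replace(" ", "_"), label)

def render(wikitext):
    # Escape raw HTML first; quote=False keeps apostrophes intact so the
    # '' and ''' markup can still be matched.
    out = html.escape(wikitext, quote=False)
    out = BOLD.sub(r"<b>\1</b>", out)   # must run before ITALIC
    out = ITALIC.sub(r"<i>\1</i>", out)
    return LINK.sub(_link, out)

print(render("'''Paris''' is the ''capital'' of [[France]]."))
```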
> For a new interface, I suggest you look at my carefully thought-out
> example at
> http://www22.brinkster.com/rotemdan/phase4-demo-v1-1.htm
Well, no. I wasn't planning to design or integrate a new interface. I
was going to create a skin system that would allow people to create
skins freely without needing to write code. Then you could create your
skin yourself :-)
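One possible shape for such a skin system, assuming a skin is just markup with named `$placeholders` that the software fills in; the placeholder names and `apply_skin` are hypothetical.

```python
from string import Template

# Hypothetical skin written by a user: plain markup plus $placeholders.
USER_SKIN = Template(
    '<div class="page"><h1>$title</h1>$content<p>$edit_link</p></div>'
)

def apply_skin(skin, **fields):
    # safe_substitute leaves unknown $placeholders in place instead of
    # raising KeyError, so a typo in someone's skin can't break the page.
    return skin.safe_substitute(**fields)

print(apply_skin(USER_SKIN, title="Paris",
                 content="<p>Capital of France.</p>",
                 edit_link="Edit this page"))
```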
> Long live CSS!
Well, looking at what many sites have achieved with CSS (including but
not limited to LiveJournal's Xcolibur scheme, which isn't yet the
default), it does seem quite impressive. However, there are a few
(probably minor) reasons I hate CSS, including but not limited to the
inability to simply place something at the bottom of the browser window
regardless of the main text flow, or to center a table. Also, making a
site layout browser-independent seems to be even more difficult with CSS
than with outdated HTML 2 (in my experience, anyway).
> * New language files
Na-ah! No language files in /my/ implementation ;-) My translation
system keeps everything in the DB. Translators can change the
translatable texts using a web interface. (I specifically decided
against applying the Wiki philosophy to this because things like the
"Edit this page" link text apply to /every/ page on the /entire/ site,
so it would be too easy to upset a whole bunch of people by changing it
to something offensive.)
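A minimal sketch of DB-backed translatable strings, assuming a hypothetical `messages` table keyed by message key and language, with a fallback to English for untranslated strings. SQLite stands in for whatever database the site actually uses.

```python
import sqlite3

def open_messages(path=":memory:"):
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " msg_key TEXT NOT NULL,"
        " lang TEXT NOT NULL,"
        " text TEXT NOT NULL,"
        " PRIMARY KEY (msg_key, lang))"
    )
    return db

def set_message(db, key, lang, text):
    # What the translators' web interface would call when saving a string.
    db.execute(
        "INSERT OR REPLACE INTO messages (msg_key, lang, text) VALUES (?, ?, ?)",
        (key, lang, text),
    )
    db.commit()

def get_message(db, key, lang, fallback="en"):
    # Try the requested language, then the fallback; last resort is the key.
    for candidate in (lang, fallback):
        row = db.execute(
            "SELECT text FROM messages WHERE msg_key = ? AND lang = ?",
            (key, candidate),
        ).fetchone()
        if row:
            return row[0]
    return key

db = open_messages()
set_message(db, "edit_this_page", "en", "Edit this page")
print(get_message(db, "edit_this_page", "fr"))  # falls back to English
```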
> Another thing about language files: they shouldn't contain whole pages!
Yes, I thought the same -- and I've planned to make longer page contents
(like the explanation on "Upload file" you mentioned) wiki-like, so they
are actually the contents of the page titled "Special:Upload" (or
whatever it would be called in other languages).
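A sketch of how a special page could find its localized help page, assuming a hypothetical hand-maintained table of localized titles (the German title is just an illustration).

```python
# Hypothetical table: canonical special page -> localized title per language.
LOCALIZED_SPECIAL = {
    "Upload": {"en": "Special:Upload", "de": "Spezial:Hochladen"},
}

def help_page_title(special, lang):
    """Title of the wiki page holding the long help text for a special page."""
    titles = LOCALIZED_SPECIAL[special]
    return titles.get(lang, titles["en"])

def canonical_special(title):
    """Map a localized title back to the canonical special-page name, or None."""
    for special, titles in LOCALIZED_SPECIAL.items():
        if title in titles.values():
            return special
    return None

print(help_page_title("Upload", "de"))
```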
> and not even a bit of HTML formatting!
Well, actually, I'd really rather keep /some/ simple HTML formatting in
place. Sometimes you want to bold a single word in a sentence, but
bolding looks really ugly in Chinese and Japanese, so /they/ prefer
/not/ to do it. So I keep the simplest formatting in, so that
translators have at least /some/ control over it. (I am, at least,
planning to use HTML in translatable strings /much/ less than
LiveJournal does now; they have a /lot/ of HTML in their strings, and
sometimes even BML, their own markup language that nobody knows.)
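One way to allow /some/ simple HTML in translatable strings while keeping everything else inert: escape the whole string, then re-enable only an exact whitelist of bare tags. The whitelist and `sanitize_translation` are hypothetical.

```python
import html
import re

# Hypothetical whitelist: bare formatting tags only, no attributes.
ALLOWED = ("b", "i", "em", "strong")
_PAIRED = re.compile(r"&lt;(/?)(%s)&gt;" % "|".join(ALLOWED))
_BREAK = re.compile(r"&lt;br\s*/?&gt;")

def sanitize_translation(text):
    # Escape everything first, then turn back on only the exact whitelisted
    # tags, so attributes (onclick=..., style=...) and other tags stay inert.
    escaped = html.escape(text, quote=False)
    escaped = _PAIRED.sub(r"<\1\2>", escaped)
    return _BREAK.sub("<br>", escaped)

print(sanitize_translation("Click <b>here</b> to <script>upload</script>"))
```

Tags with attributes stay escaped; the sketch does not check that tags are balanced.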
> Also: switching the interface language using the preferences
> (dynamically), regardless of the language of the articles being
> read/edited.
Yep, planned. Also a URL parameter to force the interface in a
particular language.
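The precedence described above (URL parameter first, then the user preference, then the site default) could look like this; the `uselang` parameter name and the set of supported languages are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlsplit

SUPPORTED = {"en", "de", "ja", "zh"}  # hypothetical

def interface_language(url, user_pref=None, default="en"):
    """URL parameter beats the user preference, which beats the default."""
    params = parse_qs(urlsplit(url).query)
    # Unsupported or missing values are simply skipped.
    for candidate in params.get("uselang", []) + [user_pref]:
        if candidate in SUPPORTED:
            return candidate
    return default

print(interface_language("/wiki/Paris?uselang=de", user_pref="ja"))  # de
```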
> Warning: that will also require the whole site to move to UTF-8 or a
> similar encoding, so I don't know if that's possible with all
> languages.
I am not aware of any language with characters that are not included in
Unicode, and I'm pretty certain none of the Wikipedia languages has any.
Of course, my implementation would simply use UTF-8 all the way
through, thus avoiding all annoying encoding problems. (Well, not all of
them; I've already handled the case where someone requests a URL
containing invalid UTF-8: I assume Latin-1 in that case, convert, and
redirect. But, of course, articles and all interface stuff would be
entirely UTF-8.)
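The invalid-UTF-8 handling described above, sketched in Python: percent-decode the path, and if the bytes aren't valid UTF-8, reinterpret them as Latin-1 and re-encode as UTF-8 for the redirect target. The function name is hypothetical, only the path (not the query string) is handled, and issuing the actual HTTP redirect is left to the caller.

```python
from urllib.parse import quote, unquote_to_bytes

def normalize_path(raw_path):
    """Return (path, needs_redirect) for a percent-encoded request path."""
    raw = unquote_to_bytes(raw_path)
    try:
        raw.decode("utf-8")
        return raw_path, False          # already valid UTF-8: serve as-is
    except UnicodeDecodeError:
        text = raw.decode("latin-1")    # every byte is valid Latin-1
        # quote() percent-encodes the str as UTF-8 by default.
        return quote(text, safe="/:"), True

print(normalize_path("/wiki/K%F6ln"))  # Latin-1 o-umlaut -> UTF-8, redirect
```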
Greetings,
Timwi