It's quite evident that wikimedia's current network is a mess. We have three rather dumb (but cheap) netgear gigabit switches that offer some manageability features. However, no one seems to know what's on which port. The interfaces to manage them seem rather limited (I don't know - I don't have access to them), and the features they do have fall short of what's needed to build a large network. It's quite clear that nobody really likes these switches, and everyone would like to buy other, better ones now that we've run out of switch ports again.
As the projected server count for wikimedia in December 2005 is about 500 servers, it's time to start properly planning the design of the network. As the complexity of the network increases, remote manageability becomes more important. Most admin duties happen remotely, while only Jimbo has physical access to the actual hardware. This way, admins are restricted to telling him what to do, whenever he has time for it and is on location. This really delays and complicates things, so I think it would be good to make sure we build a network that makes REMOTE management as easy and flexible as possible, and keep required physical changes to a minimum.
This does require switches that are more expensive than the current ones, and it is rather hard to justify the cost for them. Technically, wikimedia projects CAN run on cheap, unmanaged switches, since they DO the most important part of the job: switching, at gigabit speeds. However, with every new server and extra switch, remote manageability is getting harder, and consequently, the network is becoming a mess. Admins don't know exactly what's on a port. It's barely documented, and they can't find out through the switch's management interfaces either. In case of network problems, there are hardly any graphs, logs or other sources of information to find out what's going on. The current setup is feasible when one has < 24 ports, but it gets really messy when the network grows...
I think we need at least layer 2 switches with basic manageability features. Basic as in what's basic in any medium to large company network these days. Some features we really could use are:
* serial ports, so we can manage them out of band through a console server even if the network is down
* SNMP, so we can properly graph statistics of the switch itself, and the individual ports. VERY helpful in case of problems... (see the sketch after this list)
* spanning tree (STP), especially helpful in large networks with remote management
* VLANs and 802.1Q support. Allows one set of switches to be used for multiple virtual LANs, and allows for more flexible and cost effective use of resources, remotely, WITHOUT changes to the physical network setup
* Diagnostic information from the switch's console - port descriptions, port statistics, port status, mac address information, vlan assignments, error rates, etc
* Syslog logging, so we notice what's going on
* centralized administration, so we don't have to manually copy everything to each and every switch
* upgradeable firmware with long term support
* Port trunking/aggregation, for high bandwidth or redundancy needs
* IGMP/multicast support, could be helpful on a large network too
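To give an idea of what the SNMP point buys us in practice, here is a minimal sketch of a poller that reads per-port traffic counters for graphing. It assumes the net-snmp command line tools and a read-only community string; the hostname, community and port numbers are made-up placeholders, not our actual setup.

#!/usr/bin/env python
# Minimal sketch: poll per-port traffic counters from a switch over SNMP,
# suitable for feeding into MRTG/RRDtool style graphs.
# Assumes the net-snmp "snmpget" tool is installed and the switch has a
# read-only community string -- the hostname, community and ifIndex values
# below are placeholders, not a description of the actual setup.
import subprocess
import time

SWITCH = "switch1.example.org"   # hypothetical switch hostname
COMMUNITY = "public"             # hypothetical read-only community
PORTS = [1, 2, 3]                # ifIndex values of the ports to graph

def if_in_octets(host, community, ifindex):
    """Return the ifInOctets counter for one port as an integer."""
    out = subprocess.check_output([
        "snmpget", "-v2c", "-c", community, "-Oqv", host,
        "IF-MIB::ifInOctets.%d" % ifindex,
    ])
    return int(out.strip())

if __name__ == "__main__":
    for port in PORTS:
        octets = if_in_octets(SWITCH, COMMUNITY, port)
        print("%s port %d: %d octets in (sampled %s)"
              % (SWITCH, port, octets, time.ctime()))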
While the current netgear switches do have a few of the features mentioned above, it's all too limited, too restricted, and too non standard to be useful in a large network.
Wikimedia also needs SOME layer 3 and layer 4 features, but these are less important, and generally MUCH more expensive, so I don't think we can really justify doing this using switch hardware:
Layer 3 routing. While we (intend to) have at least two different vlans/networks, an external and an internal one, some traffic needs to be routed/NATed between them. This does NOT involve actual wikimedia client traffic, but it does involve some traffic needed for management of the servers, like retrieving software updates, sending mail, etc. This won't be a lot of traffic, and we could do this using NAT on some server, or for example on an LVS load balancing box (more on this later).
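For what it's worth, the "NAT on some server" option needs nothing exotic. A rough sketch of what it amounts to on a Linux box, driven from Python for illustration; the interface name and internal subnet are assumptions, not our real addressing:

#!/usr/bin/env python
# Sketch of the "NAT on some server" option: masquerade outbound traffic
# from the internal vlan so those servers can fetch updates and send mail.
# The interface name and internal subnet are assumptions for illustration,
# not the real wikimedia addressing.
import subprocess

EXTERNAL_IF = "eth0"          # hypothetical interface on the external vlan
INTERNAL_NET = "10.0.0.0/24"  # hypothetical internal server network

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

if __name__ == "__main__":
    # Let the kernel forward packets between the two vlans.
    run(["sysctl", "-w", "net.ipv4.ip_forward=1"])
    # Rewrite the source address of internal traffic leaving via the
    # external interface (standard Linux masquerading).
    run(["iptables", "-t", "nat", "-A", "POSTROUTING",
         "-s", INTERNAL_NET, "-o", EXTERNAL_IF, "-j", "MASQUERADE"])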
Layer 4 load balancing. Currently, load balancing between the squid boxes happens through multiple DNS A records, and this clearly isn't optimal. A true load balancer would be a lot better. There are layer 4 switches that support load balancing to some extent, but these are generally VERY expensive, $10,000 and up. A cheaper, and probably more flexible, alternative is a setup using multiple redundant Linux LVS (Linux Virtual Server) boxes. Hashar and a friend of his who has experience using LVS for large clusters are preparing a presentation on this.
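To make the LVS option a bit more concrete, here is a minimal sketch of what configuring such a director could look like, using the standard ipvsadm tool with weighted round-robin and direct routing. The addresses and weights are invented placeholders; this is not the setup Hashar is preparing, just an illustration of the idea.

#!/usr/bin/env python
# Sketch of a Linux Virtual Server (LVS) director for the squid tier:
# one virtual service address, weighted round-robin across the real
# squid boxes. Addresses, weights and the direct-routing choice are
# placeholders, not the setup Hashar is preparing.
import subprocess

VIP = "198.51.100.10:80"   # hypothetical virtual (public) service address
SQUIDS = {                 # hypothetical real servers and their weights
    "10.0.0.11:80": 100,
    "10.0.0.12:80": 100,
    "10.0.0.13:80": 50,    # slower box gets a lower weight
}

def run(cmd):
    print("+ " + " ".join(cmd))
    subprocess.check_call(cmd)

if __name__ == "__main__":
    # Define the virtual service with weighted round-robin scheduling.
    run(["ipvsadm", "-A", "-t", VIP, "-s", "wrr"])
    # Add each squid as a real server, using direct routing (-g).
    for server, weight in sorted(SQUIDS.items()):
        run(["ipvsadm", "-a", "-t", VIP, "-r", server, "-g", "-w", str(weight)])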
Firewalling. I personally don't think we really need this, especially not once all fundamentally internal servers are on an internal vlan, but some think it could be useful to have a central firewall, and a layer 3/4 switch could do this.
Personally, I think it would be good to build the wikimedia network on proper layer 2 switches that can support switching in a large network with decent manageability. Layer 3 and up (routing/NAT, load balancing, firewalling if needed) we can do using a redundant LVS cluster, which Hashar is working on. This has the benefit that we don't have to spend excessive amounts of money on proprietary hardware, and we still get a very flexible and cost effective solution, with as much free software involved as reasonably possible.
Also important, I think, is that we choose a vendor that can offer us a full range of products, from low end to as high end as we will ever need. When you have a large network, it's not feasible to work with many different interfaces, command sets, features and terminology on each switch; you'd rather have them reasonably consistent across different switches and product ranges.
We could build the network out of a nice, decent core switch (possibly two for redundancy), and multiple, relatively cheap access switches to connect servers (for example, Cisco 2948G-GE-TX). Alternatively, we could build a large virtual switch by stacking multiple smaller ones (for example, Cisco 3750s), but we might run into a stacking limit there, and these switches are generally quite a bit more expensive than non-stackable ones. We could even build the entire network out of just one very big, and very expensive, modular switch with hundreds of ports, but that would be very hard to make redundant (although these switches are pretty redundant in themselves...), and it also involves a big initial investment.
Redundancy is something we need to think about. Of course we can buy one big and expensive switch, but what if it breaks? With multiple cheaper switches, it's more feasible to keep one or two as spares.
Some groups of servers have stronger low-latency and high-bandwidth dependencies between them than others. This obviously needs to be taken into account while designing the network.
This needs more discussion...
I mentioned Cisco examples here, but that's only because I personally have experience with them, they have a whole line of product ranges, and their prices are readily available. Of course, many other good switch vendors exist (Foundry, HP, Extreme, Nortel, etc...), and many could provide us with the equivalent products we need. We have to look at alternatives as well...
It would be especially helpful if someone could get one of the major network hardware vendors to donate network hardware to us, but I think that if that were to happen, it would have to be a long-term donor/partner relationship, and not just a single donation. We can't build a large and consistent network out of single, uncoordinated donations.
Comments, please!
(we could transfer this to a wiki if that's helpful...)
On Sun, 19 Dec 2004, Mark Bergsma wrote:
It's quite evident that wikimedia's current network is a mess.
It's always qualified as "a mess". Every network ends up looking like Wiki's at some point. No matter how well planned, documented, or managed, the wiring closet will eventually look like a hurricane of cobwebs.
... However, no one seems to know what's on which port.
$79 switch or $100,000 switch... if no one documents what's plugged into which port (and where all the cables go), you won't know what's plugged into which port. Comparing MAC addresses every time you need to know where something is attached is very time consuming and error prone. *Maintain* the documentation. That's pretty easy as there's only one monkey movin' cables.
... The interfaces to manage them seem rather limited (I don't know - I don't have access to them)
Having never used the interface (it's web based), you *really* have no room to talk about it. It works and allows people to do what they need to do. No, the switches are not "network managed" -- web interfaces alone don't count. However, they are cheap and get the job done.
... It's quite clear that nobody really likes these switches, and everyone would like to buy other, better ones now that we've run out of switch ports again.
Really? I've not seen anyone complaining about them. (I've not been sitting on IRC for awhile, tho') We're always going to run out of switch ports. *I* run out of ports in my own living room -- 'tho a $5,000 48 port 10/100/1000 managed switch would be nice, I'll stick with the $100 8 port ones from Linksys/Netgear/D-link.
... This way, admins are restricted to telling him what to do, whenever he has time for it and is on location. This really delays and complicates things, so I think it would be good to make sure we build a network that makes REMOTE management as easy and flexible as possible, and keep required physical changes to a minimum.
Remote management doesn't really help here. Once a machine has been installed (and thus cabled to the required networks), rarely does one need any cables moved. The ability to virtually redesign the network is "neat", but ultimately more costly than useful. And considering the largest switches wiki is likely to ever afford are 48 port models, there will always be an issue of machines connected to different switches. (trunking adds latency.)
This does require switches that are more expensive than the current ones, and it is rather hard to justify the cost for them.
And that's exactly why Wiki doesn't have a pair of Cisco 6513's. They are unnecessary.
However, with every new server and extra switch, remote manageability is getting harder, and consequently, the network is becoming a mess.
Remote manageability doesn't have anything to do with the mess. The mess is 100% human related -- near-constant, rapid, semi-haphazard planning. (Spiders tear down their web every day and build a new one. Networks are more like cities, where the new one is built into/over the old.)
... It's barely documented, and they can't find out through the switch's management interfaces either.
And whose fault is the documentation? That's right, us monkeys moving cables without updating documentation. (we're all guilty of not labeling and documenting what we did in the closet.) And as you've not used the management interface, you don't know if it can show the MACs known on each port. (Which comes back to people... do the admins know they can match up MAC addresses to tell what's on/behind each port?)
- serial ports, so we can manage them out of band through a console server even if the network is down
Ah yes... one more set of ports to run out of. (which, btw, we are.)
- SNMP, so we can properly graph statistics of the switch itself, and the individual ports. VERY helpful in case of problems...
SNMP... always a good thing, but it's not enough of a justification for the cost.
- spanning tree (STP), especially helpful in large networks with remote management
Do you even know what STP is and what it does? Yes, it's helpful in *physically* large networks where you may not know you've created a loop. If you cannot avoid creating loops in a network spanning two racks without spanning-tree, you need to put down the crimp tool, and go home; you're done.
You appear to be confusing "remote management" with "link redundancy." Spanning-tree allows one to have multiple physical paths between switches while maintaining a single logical path between any two points. In effect, STP breaks loops: A connected to B connected to C connected back to A. And it allows for parallel links: A connected to B via 2 or more ports. STP does this by *blocking* one or more ports.
STP does not speed up the network by load sharing on parallel links. (Cisco calls that Etherchannel.) STP does not reduce latency. It's not OSPF for layer 2. STP is engineered to prevent loops; it does not find the best path between any two points -- 'tho one can spend months tuning port costs to drive STP to prefer specific links (I wouldn't do that unless I was really bored and tired of playing video games :-))
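As a toy illustration of that blocking behaviour, the sketch below picks a loop-free subset of links from a small made-up topology and marks the rest as blocked. It is a plain breadth-first search from an arbitrary root, not the real STP algorithm (which elects a root bridge by bridge ID and exchanges BPDUs per port), but it shows the effect: redundant links exist physically, yet only one logical path is left forwarding.

# Toy illustration of the *effect* of spanning tree: given switches with
# redundant links, keep a loop-free subset forwarding and "block" the rest.
# This is a plain breadth-first search from an arbitrary root, not the real
# STP algorithm (which elects a root bridge and exchanges BPDUs per port).
from collections import deque

# Hypothetical topology: A-B-C-A form a loop, plus a parallel A-B link.
links = [("A", "B"), ("A", "B"), ("B", "C"), ("C", "A")]

def spanning_tree(links, root="A"):
    forwarding, blocked = [], []
    reached = {root}
    queue = deque([root])
    remaining = list(links)
    while queue:
        node = queue.popleft()
        for link in list(remaining):
            if node not in link:
                continue
            other = link[1] if node == link[0] else link[0]
            if other in reached:
                blocked.append(link)      # would close a loop: block it
            else:
                forwarding.append(link)   # extends the tree: keep forwarding
                reached.add(other)
                queue.append(other)
            remaining.remove(link)
    return forwarding, blocked

if __name__ == "__main__":
    forwarding, blocked = spanning_tree(links)
    print("forwarding:", forwarding)   # two links: a loop-free tree over A, B, C
    print("blocked:   ", blocked)      # the parallel A-B link and one loop link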
- VLANs and 802.1Q support
The current switches are vlan capable.
- Diagnostic information from the switch's console - port descriptions, port statistics, port status, mac address information, vlan assignments, error rates, etc
All (or almost all) of it is available from the web management interface.
- Syslog logging, so we notice what's going on
Such as? Switches generate almost no syslog traffic -- even when the network is coming apart. Very nasty things have to be happening for the switch to begin complaining. (aside from link up/down messages which are of some, but limited, value.)
- centralized administration, so we don't have to manually copy everything to each and every switch
??? Each switch is independent. They get configured by someone typing in the config. Switches don't automatically clone themselves because they're next to another switch *grin* What you are suggesting is a management system *cough*Ciscoworks*cough* that generally costs as much as the switches it manages. As a network admin who's managed networks with dozens of cisco switches, I can tell you no one uses the management system to manage the switches; we use the cli -- I even have scripts to send the same commands to a list of switches (and routers, and netblazers, anything with a telnet interface.) (And yes, I have shell scripts for SNMP archival of the running configs.)
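For illustration, a minimal sketch of that kind of "same commands to a list of switches" script. The hostnames, password, prompt and commands are made up, and real gear needs per-platform prompt handling; it uses the standard-library telnetlib (which has since been removed in Python 3.13).

#!/usr/bin/env python
# Sketch of the "same commands to a list of switches" idea: push one set of
# CLI commands to every device over telnet and collect the output.
# Hostnames, the password and the prompt are made up, and real gear needs
# per-platform prompt handling. Uses the standard-library telnetlib, which
# has since been removed in Python 3.13.
import telnetlib

SWITCHES = ["sw1.example.org", "sw2.example.org"]   # hypothetical hosts
PASSWORD = "change-me"                              # hypothetical password
COMMANDS = ["show version", "show interfaces status"]

def run_commands(host, password, commands, timeout=10):
    tn = telnetlib.Telnet(host, 23, timeout)
    tn.read_until(b"Password: ", timeout)
    tn.write(password.encode("ascii") + b"\n")
    for command in commands:
        tn.write(command.encode("ascii") + b"\n")
    tn.write(b"exit\n")
    return tn.read_all().decode("ascii", "replace")

if __name__ == "__main__":
    for switch in SWITCHES:
        print("=== %s ===" % switch)
        print(run_commands(switch, PASSWORD, COMMANDS))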
- upgradeable firmware with long term support
Anything with a management interface is upgradable. And define "long term"? I see EOL/EOS notices from Cisco all the time. As I recall, both the 4000 and 6000 lines are EOL. The 5000/5500 line is nearing the end of hardware and software support -- having been EOL and EOS years ago.
- Port trunking/aggregation, for high bandwidth or redundancy needs
Note: this is almost *always* proprietary and restrictive. Yes, there are standards for this sort of thing, but cross-vendor trunking is still muddy water. And for some *cough*Cabletron*cough*, the parallel links must be nearly the same distance (within 1m as I recall) -- yep, it's that damned sensitive.
- IGMP/multicast support, could be helpful on a large network too
multicast is 99.999% useless to Wikimedia.
Wikimedia also needs SOME layer 3 and layer 4 features, but these are less important, and generally MUCH more expensive, so I don't think we can really justify doing this using switch hardware:
...
We could build the network out of a nice, decent core switch (possibly two for redundancy), and multiple, relatively cheap access switches to connect servers (for example, Cisco 2948G-GE-TX).
"Relatively cheap"? 100$ is a cheap switch. 400$ is a relatively cheap switch. 5000$ is not cheap - period. Yes, a 2948 is cheap_er_ than a fully loaded 6500, but it's still not cheap.
What are the functional differences between the 2948 and the existing netgear switches... 48 vs 24 ports. And that's about it. Both have remote management via web interfaces. The cisco also has a cli (telnet, ssh, and serial.) The cisco has SNMP manageability and syslog capability, which is nice, but not necessary and not worth the extra 4000$.
could build a large virtual switch by stacking multiple smaller ones
We're already "stacking" switches. That stacking is not what you think it is -- what it used to be: a backplane extension. Today's stack is merely a set of switches hung off another switch -- which is cascading. The 3000 series stack you point out is, in fact, a cisco proprietary firewire interlink of the switches. (thus locking in cisco hardware for ever more.)
Redundancy is something we need to think about. Of course we can buy one big and expensive switch, but what if it breaks? With multiple cheaper switches, it's more feasible to keep one or two as spares.
If you buy a "big ass switch", you buy a "big ass support contract" to go with it. It breaks; they fix it, *quickly*. (I've never seen an entire catalyst switch die. I've seen or heard of every component short of the backplane failing... and everything except the backplane is easily and quickly replaceable.)
This needs more discussion...
this gets discussed all the time...
Large vendors (Cisco, et al.) are much more likely to donate gear to tax-deductible charities. Wiki isn't one, yet. I have suggested talking to Cisco about getting some hardware donated -- cisco has a lot of reclaimed hardware (from trade-ups) and refurbished goodies. I don't think anyone would balk at paying for a support contract (2-3k$) for a donated 100k$ switch. It's a good tradeoff.
--Ricky
Ricky Beam wrote:
It's quite evident that wikimedia's current network is a mess.
It's always qualified as "a mess". Every network ends up looking like Wiki's at some point. No matter how well planned, documented, or managed, the wiring closet will eventually look like a hurricane of cobwebs.
Oh? My networks must be exceptions then.
... However, no one seems to know what's on which port.
$79 switch or $100,000 switch... if no one documents what's plugged into which port (and where all the cables go), you won't know what's plugged into which port. Comparing MAC addresses every time you need to know where something is attached is very time consuming and error prone. *Maintain* the documentation. That's pretty easy as there's only one monkey movin' cables.
Agreed. However, certain switch features make documenting these things a lot easier, and therefore increase the likelihood of the responsible persons maintaining it.
... The interfaces to manage them seem rather limited (I don't know - I don't have access to them)
Having never used the interface (it's web based), you *really* have no room to talk about it. It works and allows people to do what they need to do. No, the switches are not "network managed" -- web interfaces alone don't count. However, they are cheap and get the job done.
Isn't that exactly what I stated?
Really? I've not seen anyone complaining about them. (I've not been sitting on IRC for awhile, tho')
Perhaps you should... I have.
We're always going to run out of switch ports. *I* run out of ports in my own living room -- 'tho a $5,000 48 port 10/100/1000 managed switch would be nice, I'll stick with the $100 8 port ones from Linksys/Netgear/D-link.
Yes, but you are not wikimedia, and you don't have tens of servers in your bedroom that you then *have* to manage from your living room without being able to access them physically.
Remote management doesn't really help here. Once a machine has been installed (and thus cabled to the required networks), rarely does one need any cables moved. The ability to virtually redesign the network is "neat", but ultimately more costly than useful. And considering the largest switches wiki is likely to ever afford are 48 port models, there will always be an issue of machines connected to different switches. (trunking adds latency.)
Indeed. That's why it's part of the discussion.
And I certainly think vlans are a lot more useful than they are costly.
This does require switches that are more expensive than the current ones, and it is rather hard to justify the cost for them.
And that's exactly why Wiki doesn't have a pair of Cisco 6513's. They are unnecessary.
No one ever seriously talked about buying a 6513. We are talking about 2948s, which can do a lot more and are only a bit more expensive than the dumb netgears that only you seem to like.
However, with every new server and extra switch, remote manageability is getting harder, and consequently, the network is becoming a mess.
Remote manageability doesn't have anything to do with the mess. The mess is 100% human related -- near-constant, rapid, semi-haphazard planning. (Spiders tear down their web every day and build a new one. Networks are more like cities, where the new one is built into/over the old.)
Exactly, that's exactly how it is now. Why not try to improve it? It clearly isn't working...
And whose fault is the documentation? That's right, us monkeys moving cables without updating documentation. (we're all guilty of not labeling and documenting what we did in the closet.) And as you've not used the management interface, you don't know if it can show the MACs known on each port. (Which comes back to people... do the admins know they can match up MAC addresses to tell what's on/behind each port?)
I beg your pardon. I may not have access myself, but I have talked to people who do (jeronim, kate, etc) a lot.
- serial ports, so we can manage them out of band through a console server even if the network is down
Ah yes... one more set of ports to run out of. (which, btw, we are.)
Indeed. Why not change THAT instead?!
SNMP... always a good thing, but it's not enough of a justification for the cost.
Not by itself, no. Together with the rest, yes.
- spanning tree (STP), especially helpful in large networks with remote management
Do you even know what STP is and what it does? Yes, it's helpful in *physically* large networks where you may not know you've created a loop. If you cannot avoid creating loops in a network spanning two racks without spanning-tree, you need to put down the crimp tool, and go home; you're done.
[snip]
You seem to underestimate me by a fair bit.
Spanning-tree is, aside from redundancy, /also/ useful because you can change parameters on one uplink while working off the other. A serial console is better, of course...
- VLANs and 802.1Q support
The current switches are vlan capable.
I never said they weren't. But we can't transition to using vlans remotely in a safe way.
- Diagnostic information from the switch's console - port descriptions, port statistics, port status, mac address information, vlan assignments, error rates, etc
All (or almost all) of it is available from the web management interface.
Then I wonder why you didn't discover the faulty network setup with the trunked links some time ago. Surely it was all available from the web interface?
- Syslog logging, so we notice what's going on
Such as? Switches generate almost no syslog traffic -- even when the network is coming apart. Very nasty things have to be happening for the switch to begin complaining. (aside from link up/down messages which are of some, but limited, value.)
Well I have solved quite a few nasty network problems just by looking at syslog messages from switches, but what do I know...
- centralized administration, so we don't have to manually copy everything to each and every switch
??? Each switch is independent. They get configured by someone typing in the config. Switches don't automatically clone themselves because they're next to another switch *grin* What you are suggesting is a management system *cough*Ciscoworks*cough* that generally costs as much as the switches it manages. As a network admin who's managed networks with dozens of cisco switches, I can tell you no one uses the management system to manage the switches; we use the cli -- I even have scripts to send the same commands to a list of switches (and routers, and netblazers, anything with a telnet interface.) (And yes, I have shell scripts for SNMP archival of the running configs.)
So you don't replicate things like accounts over RADIUS, vlan configs over VTP, etc? Your loss... I wasn't talking about Ciscoworks at all.
But, as I understand it, you're going to write the scripts to do this over the web interfaces of the current switches? :) They don't have SNMP, and they don't have a CLI either.
- upgradeable firmware with long term support
Anything with a management interface is upgradable. And define "long term"? I see EOL/EOS notices from Cisco all the time. As I recall, both the 4000 and 6000 lines are EOL. The 5000/5500 line is nearing the end of hardware and software support -- having been EOL and EOS years ago.
I never said it'd have to be Cisco.
- Port trunking/aggregation, for high bandwidth or redundancy needs
Note: this is almost *always* proprietary and restrictive. Yes, there are standards for this sort of thing, but cross-vendor trunking is still muddy water. And for some *cough*Cabletron*cough*, the parallel links must be nearly the same distance (within 1m as I recall) -- yep, it's that damned sensitive.
Which is just one of the reasons why I think we should stick with one vendor. Yes, I too would prefer open standards and compatibility, but that's just not where we are today.
And wikipedia *did* use trunked ports.
- IGMP/multicast support, could be helpful on a large network too
multicast is 99.999% useless to Wikimedia.
Almost, yes. Ganglia works over multicast, IIRC. But I agree, it's almost useless, and therefore mentioned last.
"Relatively cheap"? 100$ is a cheap switch. 400$ is a relatively cheap switch. 5000$ is not cheap - period. Yes, a 2948 is cheap_er_ than a fully loaded 6500, but it's still not cheap.
A 2948G-GE-TX is approx. $3,500, while 2 netgears cost ~ $1,500.
What are the functional differences between the 2948 and the existing netgear switches... 48 vs 24 ports. And that's about it. Both have remote management via web interfaces. The cisco also has a cli (telnet, ssh, and serial.) The cisco has SNMP manageability and syslog capability, which is nice, but not necessary and not worth the extra 4000$.
Extra $2,000, which makes it about twice as expensive. And yes, I certainly think that's worth it.
We're already "stacking" switches. That stacking is not what you think it is -- what it used to be: a backplane extension. Today's stack is merely a set of switches hung off another switch -- which is cascading.
I know the difference between cascading and stacking, thank you...
The 3000 series stack you point out is, in fact, a cisco proprietary firewire interlink of the switches. (thus locking in cisco hardware for ever more.)
Of course it's proprietary, but why is that a problem? It allows us to grow the "switch" along with the growing need for switch ports. And we can still cascade other vendor's switches, just like we do now.
If you buy a "big ass switch", you buy a "big ass support contract" to go with it. It breaks; they fix it, *quickly*. (I've never seen an entire catalyst switch die. I've seen or heard of every component short of the backplane failing... and everything except the backplane is easily and quickly replaceable.)
Fine with me, and that's why it's a listed option. I don't think it'll be chosen, however...
This needs more discussion...
this gets discussed all the time...
Yes, but nothing ever seems to happen. And that's what I'm trying to change now, by request.
Ricky Beam wrote:
On Sun, 19 Dec 2004, Mark Bergsma wrote:
It's quite evident that wikimedia's current network is a mess.
It's always qualified as "a mess". Every network ends up looking like Wiki's at some point. No matter how well planned, documented, or managed, the wiring closet will eventually look like a hurricane of cobwebs.
<snip>
Hello Ricky,
I do not really agree on this point. The company I worked for had a team dedicated to cable planning and installation.
* Different colors used for cables (IIRC: green for switch uplinks, red for consoles, white for the internal network, yellow for outside).
* Cables attached 8 by 8 on the border of the racks.
* Every single cable had a unique number.
The room had something like 80 servers on one side, about 50 more on another side, plus about 12 TNTs.
The best thing was (is) that we had a photo of each rack (front and back).
cheers,
Ricky Beam wrote:
Comparing MAC addresses every time you need to know where something is attached is very time consuming and error prone. *Maintain* the documentation. That's pretty easy as there's only one monkey movin' cables.
Well, one problem we have is that the monkey (me) gets called out of the country with some increasing regularity. Another problem we have is that another monkey (Aaron, at the colo) is going to untangle the current mess of wires and neatly tie everything into the rack properly, but is waiting at the moment for us to decide about a switch solution.
I certainly agree that maintaining documentation is critical here, but at the same time I think it's pretty important that we are able to do this maintenance *remotely*. Certainly, I don't think anyone is advising that we "compare MAC addresses everytime we need to know" -- rather it's just that for debugging/troubleshooting/oddities, it will be great for us to have the *ability* to figure out which mac address is plugged in where.
Really? I've not seen anyone complaining about them. (I've not been sitting on IRC for awhile, tho') We're always going to run out of switch ports. *I* run out of ports in my own living room -- 'tho a $5,000 48 port 10/100/1000 managed switch would be nice, I'll stick with the $100 8 port ones from Linksys/Netgear/D-link.
We're looking at a cost differential for 48 ports of roughly $1,500 versus $3,500. Even if we grow to 10x our current needs (480 ports added, let's say), the total cost differential would be "only" $20,000.
That's a large expense, to be sure, but not in the context of what our overall costs will be in such an environment.
Remote manageability doesn't have anything to do with the mess. The mess is 100% human related -- near-constant, rapid, semi-haphazard planning. (Spiders tear down their web every day and build a new one. Networks are more like cities, where the new one is built into/over the old.)
Yes, I agree with this -- near-constant, rapid, semi-haphazard planning. But at this point, it isn't clear that we have a *lot* of choice about that.
Additionally, it occurs to me that this current discussion should be viewed - in part - as an attempt to avoid haphazardness. The easy way forward is the blind way forward: "Oh well, out of ports, buy another couple of cheap switches". What I'm hearing from people, though, is that buying more capable switches will make it easier to do things rationally going forward.
And as you've not used the management interface, you don't know if it can show the MACs known on each port. (Which comes back to people... do the admins know they can match up MAC addresses to tell what's on/behind each port?)
My understanding is that the admins know that they *can't* do this with the current switches. They could be mistaken. Can we confirm this?
It seems that there are a lot of "would be nice" features, but that the real "killer" here is the ability for the switch to tell us which mac address is plugged in where, so that we can reconfigure vlans. (And yes, I say this knowing full well that there's an alternative, which is for Monkey Jimbo to go over there and document everything and do all the rewiring himself and instruct the colo never to touch anything. But realistically having an option that doesn't involve me personally as a bottleneck is always a good idea...)
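For what it's worth, on a switch that exposes the standard BRIDGE-MIB over SNMP, "which MAC is behind which port" is one table walk. A hedged sketch follows; the hostname and community string are placeholders, and whether a particular switch actually implements this table varies by vendor and firmware.

#!/usr/bin/env python
# Sketch: ask a switch which MAC address was learned on which bridge port,
# via the standard BRIDGE-MIB forwarding table (dot1dTpFdbPort).
# The hostname and community string are placeholders, and whether a given
# switch actually exposes this table depends on the vendor and firmware.
import subprocess

SWITCH = "switch1.example.org"   # hypothetical switch hostname
COMMUNITY = "public"             # hypothetical read-only community
DOT1D_TP_FDB_PORT = "1.3.6.1.2.1.17.4.3.1.2"

def mac_to_port(host, community):
    """Return a dict mapping 'aa:bb:cc:dd:ee:ff' -> bridge port number."""
    out = subprocess.check_output([
        "snmpwalk", "-v2c", "-c", community, "-On", host, DOT1D_TP_FDB_PORT,
    ]).decode("ascii", "replace")
    table = {}
    for line in out.splitlines():
        # Lines look like: .1.3.6.1.2.1.17.4.3.1.2.0.13.10.1.2.3 = INTEGER: 12
        # where the last six OID components are the MAC address octets.
        oid, _, value = line.partition(" = INTEGER: ")
        if not value:
            continue
        octets = oid.strip().lstrip(".").split(".")[-6:]
        table[":".join("%02x" % int(o) for o in octets)] = int(value)
    return table

if __name__ == "__main__":
    for mac, port in sorted(mac_to_port(SWITCH, COMMUNITY).items()):
        print("%s is behind port %d" % (mac, port))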
Large vendors (Cisco, et al.) are much more likely to donate gear to tax-deductible charities. Wiki isn't one, yet. I have suggested talking to Cisco about getting some hardware donated -- cisco has a lot of reclaimed hardware (from trade-ups) and refurbished goodies. I don't think anyone would balk at paying for a support contract (2-3k$) for a donated 100k$ switch. It's a good tradeoff.
I have no objection to us trying to get gear donated to us. This will be easier after the 501(c)(3) is confirmed by the IRS. But even then, it will take time.
We need more ports *now*, so we should go with a sensible solution that has a reasonable forward path.
--Jimbo
On Tue, 21 Dec 2004, Jimmy (Jimbo) Wales wrote:
Ricky Beam wrote:
Comparing MAC addresses every time you need to know where something is attached is very time consuming and error prone. *Maintain* the documentation. That's pretty easy as there's only one monkey movin' cables.
Well, one problem we have is that the monkey (me) gets called out of the country with some increasing regularity. Another problem we have is that another monkey (Aaron, at the colo) is going to untangle the current mess of wires and neatly tie everything into the rack properly, but is waiting at the moment for us to decide about a switch solution.
I certainly agree that maintaining documentation is critical here, but at the same time I think it's pretty important that we are able to do this maintenance *remotely*. Certainly, I don't think anyone is advising that we "compare MAC addresses everytime we need to know" -- rather it's just that for debugging/troubleshooting/oddities, it will be great for us to have the *ability* to figure out which mac address is plugged in where.
It's not my intent to shame Jimbo (and any other wire pluggin' monkeys.) What we're dancing around here is called "Change Management" in the Enterprise. Having been away from the "Enterprise" for a little over a year now, I've blocked out the nightmare of procedures and paperwork, but they exist for a very real reason... at some point, all the cooks in the kitchen need to know where everything is. Wiki passed that point some time ago.
It's very simple and easy to record what's plugged in where at the time it's plugged in. After the fact, chasing cables is No Fun (tm). I've worked in the Telco/ISP world long enough to have a degree in Not Fun -- even in regulated spaces, the cabling standards are not always followed (the tracer line doesn't help if it isn't plugged in, or no one ran power to the DSX panel, or some [censored] monkey steals the fuse for that panel. Those two who've replied w.r.t. their "exception" networks don't have the breadth of experience that they may think they do.)
As for virtual network redesign... I would like to go on record saying that it's a Bad Idea (tm), as it's very easy to break the network in ways one cannot undo remotely. (see also: "haphazard network growth") It takes a great deal of care, patience, and planning to execute without incident. Sitting here in my apartment ("warehouse"), I'm several hundred miles from the colo (and not on the approved list.) If I were to break something, it could take hours to get someone there to fix it. (It'd take about 13hrs for me to drive to the colo :-)) And let's face it, remotely, there's never just *one* person doing something.
Granted, I've done this sort of thing. However, I've always done so with extensive planning and people on-site available to undo what I'm about to do -- i.e. power cycle the hardware. And yes, even with planning, I've broken the internet more than once -- 'tho never for very long, unlike some of my former coworkers who once left the ISP's network split in half (read: "fux0r3d up") for ~4hrs waiting for the techs to show up @ 7am and reset the router... they didn't call the on-call tech, they didn't dial into the POP, nor did they bother to login to our side of any of the uplinks (ssh access was not filtered for just such emergencies.)
We're looking at a cost differential for 48 ports of roughly $1,500 versus $3,500. Even if we grow to 10x our current needs (480 ports added, let's say), the total cost differential would be "only" $20,000.
Wiki will need real Enterprise hardware long before that. In my opinion, at around 100 ports, it's time to stop hanging things off 79$ switches. The more switches in the mess, the greater the odds one of them will fail. I've seen about one 3com 10/100 hub fail every 2 months in the Enterprise Desktop world -- 20-30 per closet across 9 floors of the office building. (And I've seen one Acton hub *physically* catch fire. One of the Charlotte switch techs drove it back to Raleigh for me. It sat like a trophy outside my desk for months. One of the PC Techs took it when I left.)
I don't think Wiki wants to look like the mom-n-pop ISP in Mooresville, NC, I was told about by a USR Sales Engineer many years ago... Imagine, if you will, a garage with over 100 USR Sportster desktop modems layered on 4x8 ft window screens stacked 7-8 layers high with a 4ft diameter, roof mount attic fan on top of them. I wish Jim had taken a picture of that rig :-) It beats Interpath's (ok, it was called "Global Radio" at the time) "Rabbit Cage" circa 1993 -- a sheet metal frame to hold the circuit boards from 8 Microcom ES desktop modems running to a Mac II(?). When I started there in late '95, Charley showed me the Rabbit Cage -- "where we started". That had me laughing all week. (Yeah, I showed USR the cage in late '97; he was equally amused. I think Charley still has that thing.)
Yes, I agree with this -- near-constant, rapid, semi-haphazard planning. But at this point, it isn't clear that we have a *lot* of choice about that.
Nope, that soup is already on the floor. And given the rate of growth, planning an expandable network to last more than 6 months will be difficult. And wiki will outgrow the current mess about twice before the planning would be done. I'm talking about several months of planning with several more of begging vendors for hardware. Wiki will be 501(c)3 by then.
Additionally, it occurs to me that this current discussion should be viewed - in part - as an attempt to avoid haphazardness. The easy way forward is the blind way forward: "Oh well, out of ports, buy another couple of cheap switches". What I'm hearing from people, though, is that buying more capable switches will make it easier to do things rationally going forward.
At this point, a few more cheap switches is the best-cost option. When one needs ports *now*, it's too late to invest time in proper planning. Remember the 6P thing... Prior Planning Prevents Piss Poor Performance. (I've not heard that since high-school :-))
The need/requirement for gigE ports to most/all machines complicates matters by greatly increasing costs and limiting the market. I've built medium sized (~100 port) networks for about the cost of one 2948G. (also built an ISP POP for under 100k$, got a chunk of marble for saving the company a bank worth of money.) But that's a 10/100 network with only 3 gigE ports.
(And yes, I say this knowing full well that there's an alternative, which is for Monkey Jimbo to go over there and document everything and do all the rewiring himself and instruct the colo never to touch anything. But realistically having an option that doesn't involve me personally as a bottleneck is always a good idea...)
Just document the wiring when the wire is installed. How many people sit on IRC while you're at the colo? It's not hard to tell "us" what you've plugged in where, from which one of "us" can update the docs (both those in wiki and on the machines.) Unfortunately, this is part of the soup that's on the floor. It wasn't recorded when it was plugged in, so now, someone will have to physically trace the cables. (Have I mentioned how much I hate doing that?) There's no escaping it, it would appear.
--Ricky
Ricky Beam wrote:
Just document the wiring when the wire is installed. How many people sit on IRC while you're at the colo? It's not hard to tell "us" what you've plugged in where, from which one of "us" can update the docs (both those in wiki and on the machines.) Unfortunately, this is part of the soup that's on the floor. It wasn't recorded when it was plugged in, so now, someone will have to physically trace the cables. (Have I mentioned how much I hate doing that?) There's no escaping it, it would appear.
Oh, it isn't as awful as all of that: most of it is done: http://wp.wikidev.net/Switches
This is approximately correct.
It's the "approximately" part that's the real bother, of course.
--Jimbo