It's quite evident that Wikimedia's current network is a mess. We have
three rather dumb (but cheap) Netgear gigabit switches that offer some
manageability features. However, no one seems to know what's on which
ports. The interfaces to manage them seem rather limited (I don't know -
I don't have access to them), and they lack the features needed to
build a large network. It's quite clear that nobody really likes these
switches, and everyone would like to buy better ones now that we've run
out of switch ports again.
As the projected server count for Wikimedia in December 2005 is about
500 servers, it's time to start properly planning the design of the
network.
As the complexity of the network increases, remote manageability becomes
more important. Most admin duties happen remotely, while only Jimbo has
physical access to the actual hardware. This means admins are restricted
to telling him what to do, whenever he has time for it and is on
location. This really delays and complicates things, so I think it would
be good to make sure we build a network that makes REMOTE management as
easy and flexible as possible, and keep required physical changes to a
minimum.
This does require switches that are more expensive than the current
ones, and it is rather hard to justify the cost. Technically,
Wikimedia projects CAN run on cheap, unmanaged switches, since they
DO the most important part of the job: switching, at gigabit speeds.
However, with every new server and extra switch, remote manageability
gets harder, and consequently, the network is becoming a mess. Admins
don't know exactly what's on a port. It's barely documented, and they
can't find out through the switch's management interfaces either. In
case of network problems, there are hardly any graphs, logs or other
sources of information to find out what's going on. The current setup is
feasible when one has < 24 ports, but it gets really messy when the
network grows...
I think we need at least layer 2 switches with basic manageability
features. Basic as in what's basic in any medium to large company
network these days. Some features we really could use are:
* serial ports, so we can manage them out of band through a console
server even if the network is down
* SNMP, so we can properly graph statistics of the switch itself, and
the individual ports. VERY helpful in case of problems...
* spanning tree (STP), especially helpful in large networks with remote
management
* VLANs and 802.1Q support. Allows one set of switches to be used for
multiple virtual LANs, and allows for more flexible and cost effective
use of resources, remotely WITHOUT changes to the physical network setup
* Diagnostic information from the switch's console - port descriptions,
port statistics, port status, mac address information, vlan assignments,
error rates, etc
* Syslog logging, so we notice what's going on
* centralized administration, so we don't have to manually copy
everything to each and every switch
* upgradeable firmware with long term support
* Port trunking/aggregation, for high bandwidth or redundancy needs
* IGMP/multicast support, could be helpful on a large network too
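To make some of this concrete, here's a rough sketch of what a couple of these features buy us in day-to-day remote management. All addresses, the community string, and interface names below are made up for illustration, not our actual setup:

```shell
# Poll per-port traffic counters from a managed switch over SNMP
# (IF-MIB ifInOctets for port 12), e.g. to feed graphing tools:
snmpget -v2c -c public 10.0.0.2 IF-MIB::ifInOctets.12

# With 802.1Q support on the switches, a Linux server can join VLAN 2
# with a virtual interface -- no physical recabling needed:
ip link add link eth0 name eth0.2 type vlan id 2
ip addr add 10.0.2.10/24 dev eth0.2
ip link set eth0.2 up
```

The point is that all of this can be done (and graphed, and logged) from anywhere, without anyone touching the hardware.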
While the current Netgear switches do have a few of the features
mentioned above, it's all too limited, too restricted, and too
nonstandard to be useful in a large network.
Wikimedia also needs SOME layer 3 and layer 4 features, but these are
less important, and generally MUCH more expensive, so I don't think we
can really justify doing this with switch hardware:
Layer 3 routing. While we (intend to) have at least two different
vlans/networks, an external and an internal one, some traffic needs to
be routed/NATed between them. This does NOT involve actual Wikimedia
client traffic, but it does involve some traffic needed for management
of the servers, like retrieving software updates, sending mail etc.
This won't be a lot of traffic, and we could do it using NAT on some
server, or for example on an LVS load balancing box (more on this
later).
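As a sketch of how cheaply this can be done on any Linux box (the subnet and interface name are hypothetical, just for illustration):

```shell
# Enable IP forwarding on the gateway box
echo 1 > /proc/sys/net/ipv4/ip_forward

# NAT outbound traffic from the internal vlan (10.0.0.0/24) going out
# the external interface (eth1), so internal servers can fetch updates
# and send mail:
iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -o eth1 -j MASQUERADE
```

For the small amount of management traffic involved, this is more than enough, and it costs nothing.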
Layer 4 load balancing. Currently, load balancing between squid boxes
happens through multiple DNS A records, and this clearly isn't optimal.
A true load balancer would be a lot better. There are layer 4 switches
that support load balancing to some extent, but these are generally VERY
expensive, $10,000 and higher. A cheaper, and probably more flexible,
alternative is a setup using multiple redundant Linux LVS (Linux Virtual
Server) boxes. Hashar and a friend of his who has experience using LVS
for large clusters are preparing a presentation on this.
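I don't know the details of Hashar's setup yet; just to illustrate the idea, an LVS configuration balancing HTTP across two squids could look something like this (all addresses here are hypothetical):

```shell
# Define a virtual HTTP service, scheduled by weighted least-connections
ipvsadm -A -t 207.142.131.200:80 -s wlc

# Add two real squid servers behind it, using direct routing (-g)
ipvsadm -a -t 207.142.131.200:80 -r 10.0.0.11 -g
ipvsadm -a -t 207.142.131.200:80 -r 10.0.0.12 -g
```

Unlike DNS round-robin, the director notices dead or overloaded real servers and stops sending traffic to them.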
Firewalling. I personally don't think we really need this, especially
not once all fundamentally internal servers are on an internal vlan, but
some think it could be useful to have a central firewall, and a layer
3/4 switch could do this.
Personally, I think it would be good to build the Wikimedia network on
proper layer 2 switches that can support switching in a large network
with decent manageability. Layer 3 and up (routing/NAT, load balancing,
firewalling if needed) we can do using a redundant LVS cluster, which
Hashar is working on. This has the benefit that we don't have to spend
excessive amounts of money on proprietary hardware, and still get a
very flexible and cost effective solution, with as much free software
involved as reasonably possible.
Also important, I think, is that we choose a vendor that can offer us a
full range of products, from low end to as high end as we will ever
need. When you have a large network, it's not feasible to work with many
different interfaces, command sets, features and terminology on each
switch; you want them to be reasonably consistent across different
switches and product ranges.
We could build the network out of a nice, decent core switch (possibly
two for redundancy), and multiple, relatively cheap access switches to
connect servers (for example, Cisco 2948G-GE-TX). Alternatively, we
could build a large virtual switch by stacking multiple smaller ones
(for example, Cisco 3750s), but we might run into a stacking limit
there, and these switches are generally quite a bit more expensive
than non-stackable ones. We could even build the entire network out
of just one very big, and very expensive, modular switch with hundreds
of ports, but that would be very hard to make redundant (although these
switches are pretty redundant in themselves...), and it also involves a
big initial investment.
Redundancy is something we need to think about. Of course we can buy one
big and expensive switch, but what if it breaks? With multiple cheaper
switches, it's more feasible to keep one or two as spares.
Some servers have stronger dependencies between them, in terms of low
latency and high bandwidth, than others do. This obviously needs to be
taken into account while designing the network.
This needs more discussion...
I mentioned Cisco examples here, but that's only because I personally
have experience with them, they have a whole line of product ranges, and
prices are readily available. Of course, many other good switch vendors
exist (Foundry, HP, Extreme, Nortel, etc...), and many could provide us
with equivalent products. We have to look at alternatives as well...
It would be especially helpful if someone could get one of the major
network hardware vendors to donate network hardware to us, but I think
if that happened, it would have to be a long-term donor/partner
relationship, and not just a single donation. We can't build a large and
consistent network out of single, uncoordinated donations.
Comments, please!
(we could transfer this to a wiki if that's helpful...)
--
Mark
mark(a)nedworks.org