On Sun, 19 Dec 2004, Mark Bergsma wrote:
It's quite evident that wikimedia's current network is a mess.
It's always qualified as "a mess". Every network ends up looking like Wiki's at some point. No matter how well planned, documented, or managed, the wiring closet will eventually look like a hurricane of cobwebs.
... However, noone seems to know what's on which ports.
$79 switch or $100,000 switch... if no one documents what's plugged into which port (and where all the cables go), nobody will ever know what's plugged into which port. Comparing MAC addresses every time you need to know where something is attached is very time consuming and error prone. *Maintain* the documentation. That's pretty easy as there's only one monkey movin' cables.
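A low-tech fix that costs nothing: one plain-text map per switch, updated as part of plugging the cable in. Something like this (all names invented for illustration) answers "what's on port 12" faster than any management interface:

```
# switch1 (Netgear 24-port, rack A) -- update on EVERY cable move
# port  device     iface  note
1       apache1    eth0
2       apache2    eth0
12      db1        eth1   backup vlan
24      uplink     -      to switch2 port 24
```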
... The interfaces to manage them seem rather limited (I don't know - I don't have access to them)
Having never used the interface (it's web based), you *really* have no room to talk about it. It works and allows people to do what they need to do. No, the switches are not "network managed" -- web interfaces alone don't count. However, they are cheap and get the job done.
... It's quite clear that nobody really likes these switches, and would like to buy other, better ones now we've run out of switch ports again.
Really? I've not seen anyone complaining about them. (I've not been sitting on IRC for a while, tho') We're always going to run out of switch ports. *I* run out of ports in my own living room -- 'tho a $5,000 48-port 10/100/1000 managed switch would be nice, I'll stick with the $100 8-port ones from Linksys/Netgear/D-Link.
... This way, admins are restricted to telling him what to do, whenever he has time for it and is on location. This really delays and complicates things, so I think it would be good to make sure we build a network makes REMOTE management as easy and flexible as possible, and keep required physical changes to a minimum.
Remote management doesn't really help here. Once a machine has been installed (and thus cabled to the required networks), rarely does one need any cables moved. The ability to virtually redesign the network is "neat", but ultimately more costly than useful. And considering the largest switches wiki is likely to ever afford are 48-port models, there will always be an issue of machines connected to different switches. (Trunking adds latency.)
This does require switches that are more expensive than the current ones, and it is rather hard to justify the cost for them.
And that's exactly why Wiki doesn't have a pair of Cisco 6513's. They are unnecessary.
However, with every new server and extra switch, remote manageability is getting harder, and consequently, the network is getting a mess.
Remote manageability doesn't have anything to do with the mess. The mess is 100% human related -- near-constant, rapid, semi-haphazard planning. (Spiders tear down their web every day and build a new one. Networks are more like cities, where the new one is built into/over the old.)
... It's barely documented, and they can't find out through the switch's management interfaces either.
And whose fault is the documentation? That's right, us monkeys moving cables without updating documentation. (We're all guilty of not labeling and documenting what we did in the closet.) And as you've not used the management interface, you don't know whether it can show the MACs known on each port. (Which comes back to people... do the admins know they can match up MAC addresses to tell what's on/behind each port?)
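For what it's worth, any SNMP-capable switch exposes exactly this via BRIDGE-MIB: walking dot1dTpFdbPort (.1.3.6.1.2.1.17.4.3.1.2) gives one entry per learned MAC, where the last six OID components encode the address and the value is the bridge port that saw it. A rough sketch of turning such a walk into a MAC-to-port map (the sample output line is made up, not from our switches):

```python
def parse_fdb_walk(lines):
    """Build a MAC -> bridge-port map from a dot1dTpFdbPort snmpwalk dump.

    Each OID ends in six decimal octets (the MAC address); the INTEGER
    value is the bridge port that last saw that address.
    """
    table = {}
    for line in lines:
        oid, sep, value = line.partition(" = INTEGER: ")
        if not sep:
            continue  # skip anything that isn't a forwarding-table entry
        octets = oid.rsplit(".", 6)[1:]        # last six OID components
        mac = ":".join("%02x" % int(o) for o in octets)
        table[mac] = int(value)
    return table

# hypothetical snmpwalk output line
dump = ["SNMPv2-SMI::mib-2.17.4.3.1.2.0.12.41.171.205.239 = INTEGER: 3"]
print(parse_fdb_walk(dump))   # {'00:0c:29:ab:cd:ef': 3}
```

Cross-reference that against the servers' own MAC addresses and the documentation problem mostly solves itself.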
- serial ports, so we can manage them out of band through a console
server even if the network is down
Ah yes... one more set of ports to run out of. (which, btw, we are.)
- SNMP, so we can properly graph statistics of the switch itself, and
the individual ports. VERY helpful in case of problems...
SNMP... always a good thing, but it's not enough of a justification for the cost.
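To be fair, graphing is the one thing SNMP buys you cheaply, and the arithmetic the MRTG-style tools do is trivial: sample ifInOctets twice and divide, remembering that the standard Counter32 counters wrap at 2^32. A back-of-envelope sketch (function name is mine, not from any tool):

```python
def counter_rate(prev, curr, interval, width=32):
    """Bits/sec from two ifInOctets samples taken `interval` seconds apart.

    SNMP Counter32 values wrap at 2**32, so a second sample that is
    *smaller* means the counter rolled over, not that traffic reversed.
    """
    delta = curr - prev
    if delta < 0:                 # counter wrapped between samples
        delta += 2 ** width
    return delta * 8 / interval   # octets -> bits, per second

print(counter_rate(1000, 301000, 300))       # 8000.0 bits/sec
print(counter_rate(2**32 - 100, 200, 300))   # wrap handled: 8.0
```

On a busy gigabit port a 32-bit counter can wrap in under a minute, which is why you poll often (or use 64-bit ifHCInOctets where the switch supports it).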
- spanning tree (STP), especially helpful in large networks with remote
management
Do you even know what STP is and what it does? Yes, it's helpful in *physically* large networks where you may not know you've created a loop. If you cannot avoid creating loops in a network spanning two racks without spanning-tree, you need to put down the crimp tool, and go home; you're done.
You appear to be confusing "remote management" with "link redundancy." Spanning-tree allows one to have multiple physical paths between switches while maintaining a single logical path between any two points. In effect, STP breaks loops: A connected to B connected to C connected back to A. And it allows for parallel links: A connected to B via 2 or more ports. STP does this by *blocking* one or more ports.
STP does not speed up the network by load sharing on parallel links. (Cisco calls that EtherChannel.) STP does not reduce latency. It's not OSPF for layer 2. STP is engineered to prevent loops; it does not find the best path between any two points -- 'tho one can spend months tuning port costs to drive STP to prefer specific links (I wouldn't do that unless I was really bored and tired of playing video games :-))
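For the curious, the heart of STP is dead simple: every bridge has an ID (a 16-bit priority prepended to its MAC), the lowest ID wins the root election, and every non-root bridge then blocks all but its cheapest path toward the root. The election itself is just a minimum (sketch with hypothetical switches, not a full BPDU implementation):

```python
def elect_root(bridges):
    """STP root election: the lowest (priority, MAC) bridge ID wins.

    Ties on priority fall through to the numerically lowest MAC, which
    is why an untuned network tends to elect its *oldest* switch root.
    """
    return min(bridges, key=lambda b: (b["priority"], b["mac"]))

bridges = [                       # hypothetical switches
    {"name": "sw-a", "priority": 32768, "mac": "00:e0:1e:aa:00:01"},
    {"name": "sw-b", "priority": 32768, "mac": "00:02:4b:00:00:01"},
    {"name": "sw-c", "priority": 4096,  "mac": "00:d0:ba:00:00:01"},
]
print(elect_root(bridges)["name"])   # sw-c: lowest priority wins outright
```

That's all the "intelligence" you're paying for -- useful in a campus, overkill across two racks.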
- VLANs and 802.1Q support
The current switches are VLAN capable.
- Diagnostic information from the switch's console - port descriptions,
port statistics, port status, mac address information, vlan assignments, error rates, etc
All (or almost all) of that is available from the web management interface.
- Syslog logging, so we notice what's going on
Such as? Switches generate almost no syslog traffic -- even when the network is coming apart. Very nasty things have to be happening for the switch to begin complaining. (aside from link up/down messages which are of some, but limited, value.)
- centralized administration, so we don't have to manually copy
everything to each and every switch
??? Each switch is independent. They get configured by someone typing in the config. Switches don't automatically clone themselves because they're next to another switch *grin* What you are suggesting is a management system *cough*CiscoWorks*cough* that generally costs as much as the switches it manages. As a network admin who's managed networks with dozens of Cisco switches: no one uses the management system to manage the switches; we use the CLI -- I even have scripts to send the same commands to a list of switches (and routers, and NetBlazers, anything with a telnet interface). (And yes, I have shell scripts for SNMP archival of the running configs.)
- upgradeable firmware with long term support
Anything with a management interface is upgradable. And define "long term"? I see EOL/EOS notices from Cisco all the time. As I recall, both the 4000 and 6000 lines are EOL. The 5000/5500 line is nearing the end of hardware and software support -- having been EOL and EOS years ago.
- Port trunking/aggregation, for high bandwidth or redundancy needs
Note: this is almost *always* proprietary and restrictive. Yes, there are standards for this sort of thing, but cross-vendor trunking is still muddy water. And for some *cough*Cabletron*cough*, the parallel links must be nearly the same distance (within 1m as I recall) -- yep, it's that damned sensitive.
- IGMP/multicast support, could be helpful on a large network too
multicast is 99.999% useless to Wikimedia.
Wikimedia also needs SOME layer 3 and layer 4 features, but these are less important, and generally MUCH more expensive, so I don't think we can really justify to do this using switch hardware:
...
We could build the network out of a nice, decent core switch (possibly two for redundancy), and multiple, relatively cheap access switches to connect servers (for example, Cisco 2948G-GE-TX).
"Relatively cheap"? 100$ is a cheap switch. 400$ is a relatively cheap switch. 5000$ is not cheap - period. Yes, a 2948 is cheap_er_ than a fully loaded 6500, but it's still not cheap.
What are the functional differences between the 2948 and the existing Netgear switches... 48 vs. 24 ports. And that's about it. Both have remote management via web interfaces. The Cisco also has a CLI (telnet, ssh, and serial). The Cisco has SNMP manageability and syslog capability, which is nice, but not necessary and not worth the extra $4,000.
could build a large virtual switch by stacking multiple smaller ones
We're already "stacking" switches. That stacking is not what you think it is -- what it used to be: a backplane extension. Today's stack is merely a set of switches hung off another switch -- which is cascading. The 3000 series stack you point out is, in fact, a Cisco proprietary firewire interlink of the switches. (Thus locking in Cisco hardware forevermore.)
Redundancy is something we need to think about. Of course we can buy one big and expensive switch, but what if it breaks? With multiple cheaper switches, it's more feasible to have one or two on spare.
If you buy a "bis ass switch", you buy a "bug ass support contract" to go with it. It breaks; they fix it, *quickly*. (I've never seen an entire catalyst switch die. I've seen or heard of every component short of the backplane failing... and everything except the backplane is easily and quickly replacable.)
This needs more discussion...
this gets discussed all the time...
Large vendors (Cisco, et al.) are much more likely to donate gear to tax-deductible charities. Wiki isn't one, yet. I have suggested talking to Cisco about getting some hardware donated -- Cisco has a lot of reclaimed hardware (from trade-ups) and refurbished goodies. I don't think anyone would balk at paying for a support contract ($2-3k) for a donated $100k switch. It's a good tradeoff.
--Ricky