Re: [Wikitech-l] The big wikimedia network design/hardware discussion

19 Dec 2004

Ricky Beam wrote:

...
  It's quite evident that wikimedia's current network is a mess.
 It's always qualified as "a mess".  Every network ends up looking like
 Wiki's at some point.  No matter how well planned, documented, or
 managed, the wiring closet will eventually look like a hurricane of
 cobwebs.
Oh? My networks must be exceptions then.

...
  ... However, no one seems to know what's on which ports.
 $79 switch or $100,000 switch... if no one documents what's plugged
 into which port (and where all the cables go), you won't know what's
 plugged into which port.  Comparing MAC addresses every time you need
 to know where something is attached is very time consuming and error
 prone.  *Maintain* the documentation.  That's pretty easy as there's
 only one monkey movin' cables.
Agreed. However, certain switch features make documenting these things a
lot easier, and therefore increase the likelihood of the responsible
persons maintaining it.
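
To illustrate what maintained port documentation could look like, here is a minimal Python sketch. Everything in it is an assumption for illustration: the switch names, host names and MAC addresses are made up, and the `seen` table stands in for whatever per-port MAC listing a switch can export. It does not describe the actual wikimedia setup.

```python
# Hypothetical port map: (switch name, port number) -> documented host.
# All names and MAC addresses below are invented for illustration.
PORT_MAP = {
    ("sw1", 1): {"host": "db1", "mac": "00:11:22:33:44:01"},
    ("sw1", 2): {"host": "web1", "mac": "00:11:22:33:44:02"},
    ("sw2", 1): {"host": "web2", "mac": "00:11:22:33:44:03"},
}


def find_host(port_map, host):
    """Return the (switch, port) a host is documented on, or None."""
    for location, entry in port_map.items():
        if entry["host"] == host:
            return location
    return None


def undocumented(port_map, seen):
    """Return (switch, port, mac) triples that disagree with the port map.

    `seen` maps (switch, port) -> MAC address as reported by the switch.
    """
    problems = []
    for (switch, port), mac in sorted(seen.items()):
        entry = port_map.get((switch, port))
        if entry is None or entry["mac"].lower() != mac.lower():
            problems.append((switch, port, mac))
    return problems
```

With a table like this, answering "where is web1 plugged in?" is a lookup rather than a round of MAC comparisons, and a periodic `undocumented()` check flags ports where the cabling has drifted from the documentation.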

...
  ... The interfaces to manage them seem rather limited (I don't know -
  I don't have access to them)
 Having never used the interface (it's web based), you *really* have
 no room to talk about it.  It works and allows people to do what
 they need to do.  No, the switches are not "network managed" --
 web interfaces alone don't count.  However, they are cheap and get
 the job done. 
Isn't that exactly what I stated?

...
 Really?  I've not seen anyone complaining about them. (I've not been
 sitting on IRC for awhile, tho')
Perhaps you should... I have.

...
 We're always going to run out of switch ports.  *I* run out of ports in
 my own living room -- 'tho a $5,000 48 port 10/100/1000 managed switch
 would be nice, I'll stick with the $100 8 port ones from
 Linksys/Netgear/D-link.
Yes, but you are not wikimedia, and you don't have tens of servers in 
your bedroom, *AND* have to manage them from your living room without 
being able to access them physically.

...
 Remote management doesn't really help here.  Once a machine has been
 installed (and thus cabled to the required networks), rarely does one
 need any cables moved.  The ability to virtually redesign the network
 is "neat", but ultimately more costly than useful.  And considering
 the largest switches wiki is likely to ever afford are 48port models,
 there will always be an issue of machines connected to different
 switches. (trunking adds latency.)
Indeed. That's why it's part of the discussion.

And I certainly think vlans are a lot more useful than they are costly.

...
  This does require switches that are more expensive than the current
  ones, and it is rather hard to justify the cost for them.
 And that's exactly why Wiki doesn't have a pair of Cisco 6513's.  They
 are unnecessary. 
No one ever seriously talked about buying a 6513. We are talking about 
2948s, which can do a lot more and are only a bit more expensive than 
the dumb netgears only you seem to like.

...
  However, with every new server and extra switch, remote manageability is
  getting harder, and consequently, the network is becoming a mess.
 Remote manageability doesn't have anything to do with the mess.  The mess
 is 100% human related -- near-constant, rapid, semi-haphazard planning.
 (Spiders tear down their web every day and build a new one.  Networks
  are more like cities, where the new one is built into/over the old.)
Exactly, that's exactly how it is now. Why not try to improve it? It 
clearly isn't working...

...
 And whose fault is the documentation?  That's right, us monkeys moving
 cables without updating documentation. (we're all guilty of not labeling
 and documenting what we did in the closet.)  And as you've not used the
 management interface, you don't know if it can show the MACs known on each
 port. (Which comes back to people... do the admins know they can match
 up MAC addresses to tell what's on/behind each port?)
I beg your pardon. I may not have access myself, but I have talked to 
people who do (jeronim, kate, etc) a lot.

...
  * serial ports, so we can manage them out of band through a console
  server even if the network is down
 Ah yes... one more set of ports to run out of. (which, btw, we are.) 
Indeed. Why not change THAT instead?!

...
 SNMP... always a good thing, but it's not enough of a justification
 for the cost.
Not by itself, no. Together with the rest, yes.

...
  * spanning tree (STP), especially helpful in large networks with remote
  management

 Do you even know what STP is and what it does?  Yes, it's helpful in
 *physically* large networks where you may not know you've created a
 loop.  If you cannot avoid creating loops in a network spanning two
 racks without spanning-tree, you need to put down the crimp tool, and
 go home; you're done. 
[snip]

You seem to underestimate me by a fair bit.

Spanning-tree is, aside from redundancy, /also/ useful because you can 
change parameters on one uplink while working off the other. Serial 
console is better of course...

...
  * VLANs and 802.1Q support
 the current switches are vlan capable. 
I never said they weren't. But we can't safely transition to using vlans 
remotely.

...
  * Diagnostic information from the switch's console - port descriptions,
  port statistics, port status, mac address information, vlan assignments,
  error rates, etc
 all (or almost all) available from the web management interface. 
Then I wonder why you didn't discover the faulty network setup 
with the trunked links some time ago. Surely it was all available from 
the web interface?

...
  * Syslog logging, so we notice what's going on
 Such as?  Switches generate almost no syslog traffic -- even when the network
 is coming apart.  Very nasty things have to be happening for the switch to
 begin complaining. (aside from link up/down messages which are of some, but
 limited, value.) 
Well I have solved quite a few nasty network problems just by looking at 
syslog messages from switches, but what do I know...

...
  * centralized administration, so we don't have to manually copy
  everything to each and every switch
 ??? Each switch is independent.  They get configured from someone typing in
 the config.  Switches don't automatically clone themselves because they're
 next to another switch *grin*  What you are suggesting is a management
 system *cough*Ciscoworks*cough* that generally costs as much as the switches
 they manage.  As a network admin who's managed networks with dozens of
 cisco switches, no one uses the management system to manage the switches;
 we use the cli -- I even have scripts to send the same commands to a list
 of switches (and routers, and netblazers, anything with a telnet interface.)
 (And yes, I have shell scripts for SNMP archival of the running configs.) 
So you don't replicate things like accounts over RADIUS, vlan configs 
over VTP, etc? Your loss... I wasn't talking about Ciscoworks at all.

But, as I understand it, you're going to write the scripts to do this 
over the web interfaces on the current switches? :) They don't have 
SNMP, they don't have a CLI either.

...
  * upgradeable firmware with long term support

...
 Anything with a management interface is upgradable.  And define "long term"?
 I see EOL/EOS notices from Cisco all the time.  As I recall, both the 4000
 and 6000 lines are EOL.  The 5000/5500 line is nearing the end of hardware
 and software support -- having been EOL and EOS years ago. 
I never said it'd have to be Cisco.

...
  * Port trunking/aggregation, for high bandwidth or redundancy needs
 Note: this is almost *always* proprietary and restrictive.  Yes, there
 are standards for this sort of thing, but cross-vendor trunking is still
 muddy water.  And for some *cough*Cabletron*cough*, the parallel links
 must be nearly the same distance (within 1m as I recall) -- yep, it's
 that damned sensitive. 
Which is just one of the reasons why I think we should stick with one 
vendor. Yes, I too would prefer open standards and compatibility, but 
that's just not where we are today.

And wikipedia *did* use trunked ports.

...
  * IGMP/multicast support, could be helpful on a large network too
 multicast is 99.999% useless to Wikimedia. 
Almost, yes. Ganglia works over multicast IIRC. But I agree, almost 
useless, and therefore mentioned last.

...
  "Relatively cheap"?  100$ is a cheap switch.
 400$ is a relatively cheap
 switch.  5000$ is not cheap - period.  Yes, a 2948 is cheap_er_ than a
 fully loaded 6500, but it's still not cheap. 
A 2948G-GE-TX is approx. $3,500, while 2 netgears cost ~ $1,500.

...
 What are the functional differences between the 2948 and the existing
 netgear switches... 48 vs 24 ports.  And that's about it.  Both have remote
 management via web interfaces.  The cisco also has a cli (telnet, ssh, and
 serial.)  The cisco has SNMP manageability and syslog capability which is
 nice, but not necessary and not worth the extra 4000$.
Extra $2,000, which makes it about twice as expensive. And yes, I 
certainly think that's worth it.
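
As a quick check of that arithmetic, here is a small Python sketch using the approximate prices from this thread. The port count is an assumption: one 48-port 2948 compared against two 24-port netgears, per the port counts quoted above.

```python
# Approximate prices quoted in this thread (USD); not vendor list prices.
CISCO_2948_PRICE = 3500
TWO_NETGEARS_PRICE = 1500
PORTS = 48  # assumption: one 48-port switch vs. two 24-port switches


def compare(price_a, price_b, ports):
    """Return (extra cost, price ratio, per-port cost of a, per-port cost of b)."""
    return (
        price_a - price_b,
        round(price_a / price_b, 2),
        round(price_a / ports, 2),
        round(price_b / ports, 2),
    )


extra, ratio, per_port_a, per_port_b = compare(
    CISCO_2948_PRICE, TWO_NETGEARS_PRICE, PORTS
)
print(f"extra: ${extra}, ratio: {ratio}x, per port: ${per_port_a} vs ${per_port_b}")
```

Under these assumptions the difference comes to $2,000, a bit over twice the price for the same number of ports.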

...
 We're already "stacking" switches.  That stacking is not what you think
 it is -- what it used to be: a backplane extension.  Today's stack is
 merely a set of switches hung off another switch -- which is cascading.
I know the difference between cascading and stacking, thank you...

...
 The 3000 series stack you point out is, in fact, a cisco proprietary
 firewire interlink of the switches. (thus locking in cisco hardware for
 ever more.)
Of course it's proprietary, but why is that a problem? It allows us to 
grow the "switch" along with the growing need for switch ports. And we 
can still cascade other vendors' switches, just like we do now.

...
 If you buy a "big ass switch", you buy a "big ass support contract" to go
 with it.  It breaks; they fix it, *quickly*. (I've never seen an entire
 catalyst switch die.  I've seen or heard of every component short of the
 backplane failing... and everything except the backplane is easily and
 quickly replaceable.)
Fine with me, and that's why it's a listed option. I don't think it'll 
be chosen, however...

...
  This needs more discussion...
 this gets discussed all the time...
Yes, but nothing ever seems to happen. And that's what I'm trying to 
change now, by request.

-- 
Mark

mark(a)nedworks.org
