For over 20 years, network designers and administrators have been assembling networks that, more or less, look like the diagram below. You have workgroup or top of rack (ToR)/end of rack (EoR) switches at the “Access Tier,” aggregation switches at the “Distribution Tier,” and core switches that connect it all together. If you only have one core switch and it fails, the entire network might as well be down, because you can’t send any traffic outside of your workgroup (i.e. your department) or server rack.
This was all well and good when, in the server room/data centre, every server had its own connection to the nearest switch, which fed into an aggregation switch and then into a core switch, with server-to-server or server-to-workstation traffic travelling back down through the layers. Servers and workstations rarely moved, so network configurations were fairly static. But with server virtualization taking hold in companies all over the world, a lot of traffic no longer travels this way.
Virtual servers/machines (VMs) have increased the amount of traffic coming from, and travelling to, each server. Many switches rarely reached their peak throughput in the past, but that throughput is no longer enough to handle 15-20 VMs all running on a single physical server, plus the VM-to-VM traffic between hosts. A 1Gbps 24-port or 48-port switch isn’t enough anymore. It certainly doesn’t have the uplink capacity for the VM-to-VM traffic, let alone for migrating a VM to another physical server to get access to more resources (CPU, RAM, etc.).
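To see how quickly the old numbers break down, here’s a back-of-envelope calculation in Python. Every figure in it (VMs per host, traffic per VM, hosts per switch) is an illustrative assumption, not a measurement from any particular deployment:

```python
# Back-of-envelope oversubscription check for a legacy 1Gbps access switch.
# All traffic figures below are illustrative assumptions.

VMS_PER_HOST = 15        # assumed consolidation ratio
AVG_MBPS_PER_VM = 100    # assumed average demand per VM, in Mbps
HOSTS_PER_SWITCH = 20    # assumed virtualization hosts on one 24/48-port switch
UPLINK_MBPS = 1_000      # single 1Gbps uplink to the distribution tier

demand_per_host = VMS_PER_HOST * AVG_MBPS_PER_VM    # 1,500 Mbps per host
total_demand = demand_per_host * HOSTS_PER_SWITCH   # 30,000 Mbps per switch
oversubscription = total_demand / UPLINK_MBPS       # 30:1 on the uplink

print(f"Per-host demand:  {demand_per_host:,} Mbps")
print(f"Switch demand:    {total_demand:,} Mbps")
print(f"Oversubscription: {oversubscription:.0f}:1 on a 1Gbps uplink")
```

Even with these modest assumptions, a single virtualization host wants more bandwidth than the whole uplink can carry.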
If you really let loose and allow full automation and orchestration to happen on a three-tier architecture, you will quickly find bottlenecks that have very expensive solutions.
So, what to do?
Step 1 – Increase throughput capacity.
10Gbps ToR/EoR switches are quickly becoming the norm. Not only do they have greater capacity, but newer switches from the likes of Extreme Networks, Cisco, Juniper, Arista Networks, HP, and Brocade also have a lot more intelligence built into them. When a VM moves to a new physical server, its port profile automatically migrates to the switch that server is connected to. Virtual networks can be defined easily, without relying on physical port configurations.
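As a rough illustration of that follow-the-VM behaviour, here’s a minimal sketch in Python. It only models the idea; real switches implement it in vendor-specific profile/policy engines, and every class, method, and profile name below is hypothetical:

```python
# Minimal sketch of port-profile "follow the VM" behaviour.
# All names here are hypothetical; this is a model of the concept,
# not any vendor's actual API.

from dataclasses import dataclass, field

@dataclass
class PortProfile:
    name: str
    vlan: int
    qos_class: str

@dataclass
class Switch:
    name: str
    # Maps VM name -> the profile applied to the port facing its host.
    applied: dict = field(default_factory=dict)

def migrate_vm(vm: str, profile: PortProfile, old: Switch, new: Switch) -> None:
    """Re-apply the VM's port profile on the destination ToR switch."""
    old.applied.pop(vm, None)   # withdraw the config from the old edge port
    new.applied[vm] = profile   # push the identical config to the new one

tor_a, tor_b = Switch("tor-a"), Switch("tor-b")
web_profile = PortProfile("web-tier", vlan=120, qos_class="gold")
tor_a.applied["web-01"] = web_profile

migrate_vm("web-01", web_profile, tor_a, tor_b)
print(tor_b.applied)  # web-01's VLAN and QoS settings followed it to tor-b
```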
Step 2 – Turn on those features.
It’s no fun having toys you can’t play with. If you’ve just refreshed your data centre switches, whether that was just the ToR/EoR switches or the core switches as well, you should start taking advantage of all the new functionality available to you. Developers put hours and hours of effort into crafting beautiful code that helps reduce latency and deliver applications faster and more securely. Are you really going to let them down by not using that code?!
Step 3 – Eliminate a tier.
This is one way to dramatically cut down on network latency. The classic Distribution Tier is replaced with additional intelligence in what was the Access Tier, and the Access Tier is now connected directly to both (yes, both) core switches. That means core switches with far more ports than in the past, but it’s really for the best. You end up with a “flatter” network (two tiers instead of three), a 10Gbps or 40Gbps core, and greater resiliency at the physical layer, since each ToR/EoR switch has a connection to each core switch. Rack-to-rack traffic now crosses three switches (ToR, core, ToR) instead of five.
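To make that resiliency claim concrete, here’s a small Python sketch (switch names and counts are hypothetical) that builds the two-tier topology and checks that every pair of racks still has a path when one core switch fails:

```python
# Two-tier ("flatter") design: every ToR/EoR switch uplinks to both core
# switches, so losing one core still leaves a ToR -> core -> ToR path
# between any pair of racks. All switch names here are hypothetical.

from itertools import combinations
from typing import Optional

cores = ["core-1", "core-2"]
tors = [f"tor-{i}" for i in range(1, 5)]

# Each ToR has one uplink to every core switch.
links = {(tor, core) for tor in tors for core in cores}

def connected(tor_a: str, tor_b: str, failed_core: Optional[str] = None) -> bool:
    """True if a ToR -> core -> ToR path exists that avoids the failed core."""
    return any(
        (tor_a, core) in links and (tor_b, core) in links
        for core in cores
        if core != failed_core
    )

# With core-1 down, every rack pair can still reach each other via core-2.
assert all(connected(a, b, failed_core="core-1") for a, b in combinations(tors, 2))
print("Every ToR pair survives a single core switch failure.")
```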
What you end up with is a data centre network that looks a lot more like this:
Forgive me for using a vendor-specific image. I’m a huge fan of Extreme Networks, but am not being compensated in any way for using this image. In fact, it’s from one of Extreme’s British resellers.
So what about all those bells and whistles I alluded to? TRILL, DCB, virtual networks? I’m going to begin covering those in the next blog entry.