UCS Reverse Path Forwarding and Deja-Vu checks

UCS Fabric Interconnects are almost always run in end-host mode. At this point in the story there really aren’t many reasons to use switch mode on the Fabric Interconnects.

Two checks that make end-host mode possible are Reverse Path Forwarding (RPF) checks and Deja-Vu checks.

RPF and Deja-Vu (from Cisco.com)

Reverse Path Forwarding Checks

Each server in the chassis is pinned dynamically (or you can set up pin groups and do it statically, but I don’t recommend that) to an uplink on Fabric Interconnect A and Fabric Interconnect B. Let’s say you have 2 uplinks, on ports 31 and 32 of your Fabric Interconnect. Server 1/1 (chassis 1 / blade 1) may be pinned to port 31. If a unicast packet for server 1/1 is received on uplink port 31, it will go through. But if that same packet destined for server 1/1 is received on port 32, it will be dropped. That’s because RPF checks whether the destination of the unicast is actually forwarding its uplink traffic through that link.

Deja Vu Checks

The other check is called “Deja-Vu”. In the Cisco documentation it says: “Server traffic received on any uplink port, except its pinned uplink port is dropped”. That sounds a lot like RPF. Another presentation from Cisco Live states it this way: “Packet with source MAC belonging to a server received on an uplink port is dropped”.

An example to clear it up

VM A on server 1/1 wants to talk to VM B located somewhere else.  The Fabric Interconnects in this case are connected to a single Nexus 5500 switch.  The VM is pinned to one of the VNICs and that VNIC is pinned to go out port 31 of Fabric Interconnect A.  So what happens?

First the VM will send an ARP request.  An ARP request basically says:  I know the IP address but I want the MAC address.  (Obviously, this is in the same Layer 2 VLAN and subnet).  If Fabric Interconnect A doesn’t find the IP/MAC association in its CAM table, then it will not flood the server ports down stream.  That is something a switch would do.  The Fabric Interconnect is different.  The reason the Fabric Interconnect doesn’t send a broadcast down its server ports is because it is a source of truth and knows everyone connected on its server ports.

What it will do instead is forward the ARP request (unknown unicast) up the designated uplink (port 31). Now the Nexus switch is a switch. (And a very good one at that.) It will say: “Hey, I don’t have a CAM table entry for VM B IP/MAC so I will do what we switches do best: Flood all the ports!” (except the port that the unknown unicast/ARP request came in on).

Remember Fabric Interconnect A port 32 is connected to this same switch as port 31 where the unknown unicast (ARP request) went out.  The Nexus 5500 will send this unknown unicast to port 32 just like every other port.  But port 32 says:  Wait a minute, the source address originated from me.  Deja-vu!  So he drops the packet.

Fabric Interconnect B has two ports, 31 and 32, that will also receive the unknown unicast. If VM B is pinned to a vNIC that is pinned to port 31 on Fabric Interconnect B, port 31 will say: I got this! And the packet will go through. Port 32 on FI-B, however, will look at the destination MAC and say: This is not pinned to me, so I’ll drop the packet. That is the RPF check.

To sum it up

Deja-Vu check: don’t accept a packet from the upstream switch if it originated from one of my own servers.

Reverse Path Forwarding check: don’t accept a packet on an uplink unless its destination server is pinned to that uplink.
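
If it helps to see the two checks as code, here is a minimal Bash sketch of the summary above. It is only a toy model of the checks as described in this post for unicast frames, not how the Fabric Interconnect actually implements them. The pin table, the MAC addresses, and the function names (pinned_uplink, uplink_check) are all made up for illustration.

```shell
#!/bin/bash
# Hypothetical pin table for one Fabric Interconnect: "server-MAC uplink-port" pairs.
PIN_TABLE="00:25:b5:00:00:0a 31
00:25:b5:00:00:0b 32"

# Print the uplink port a server MAC is pinned to (prints nothing if the MAC is not local).
pinned_uplink() {
  echo "$PIN_TABLE" | awk -v mac="$1" '$1 == mac { print $2 }'
}

# Decide what to do with a frame received on an uplink port.
# Arguments: ingress uplink port, source MAC, destination MAC.
uplink_check() {
  local port="$1" src="$2" dst="$3"
  # Deja-Vu: the source MAC belongs to one of our own servers,
  # so this is our own traffic coming back from the upstream switch.
  if [ -n "$(pinned_uplink "$src")" ]; then
    echo "drop (deja-vu)"
    return
  fi
  # RPF: the destination server must be pinned to the uplink the frame arrived on.
  if [ "$(pinned_uplink "$dst")" != "$port" ]; then
    echo "drop (rpf)"
    return
  fi
  echo "forward"
}

uplink_check 31 00:50:56:aa:bb:cc 00:25:b5:00:00:0a   # arrives on the pinned uplink
uplink_check 32 00:50:56:aa:bb:cc 00:25:b5:00:00:0a   # same frame, wrong uplink
uplink_check 32 00:25:b5:00:00:0a 00:50:56:aa:bb:cc   # our own server's frame flooded back
```

The three calls print forward, drop (rpf), and drop (deja-vu), in that order. Broadcast handling is deliberately left out of this sketch.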

Backing up UCS

Backing up UCS can be a little confusing, especially since it presents you with a few options. You may be expecting something simple like a one-button “Back it up” option, but that is not the case. The nice thing is that there are lots of different things you can do with the backup files.
From the Admin tab under All in UCS Manager, under the General tab, you select “Backup Configuration”.

Now you create a backup operation, and we have a few choices as to how we set this up.

Then you are presented with the screen below, and things get a little bit complicated.

Let’s go through some of these seemingly confusing options:

Admin State

This is a bit confusing. But here’s how to think about it: If you want to run the backup right this second when you click “OK”, select “Enabled”. Most of the time this is what you want. If instead you just want to save this backup operation so that you can pick it from the Backup Operations list and run it later, select “Disabled”.

Type

There are 4 different configurations that can be backed up by UCS. All of them deal with data that lives in the Fabric Interconnect. They are illustrated in the diagram below.

 

The brim of the triangle is the Full State. This is a binary file that can be restored on any system to bring back the settings that this Fabric Interconnect has. It’s different from all the other types: it’s the only one that can be used for a system restore. This is a fun one to back up off your own system. I haven’t tried putting it into the platform emulator yet, but it might be fun to try.

The three other backups are just XML files.  They’re useful for importing into other systems.  The “All Configuration” is just a fancy way of saying “System Configuration” and “Logical Configuration”.  It does both.

The System Configuration is user names, roles, and locales. This is useful if you are installing another UCS somewhere and you want to keep the same users and locales (if you are using some type of multi-tenancy), but in that case, why aren’t you using UCS Central? Try it, it’s free for up to 5 domains. And you can do global service profiles.

The Logical Configuration is all the pools and policies, service profiles, service profile templates you would expect to be backed up.  This is pretty good to put inside the emulator to fool around with different settings you are using.  Or, if you don’t have your UCS yet and you’re waiting to order it, then you can just create the pools and policies in the emulator.  Then when the real thing comes, import the logical configuration in and you are ready to rock.

The tricky option that shows up when you select All Configuration or Logical Configuration is the one labeled Preserve Identities. It only appears for the logical and all configurations because it has to do with making service profiles that are already mapped to pools retain their mapping. This is good if you’re going to move some service profiles from one Fabric Interconnect domain to another and want to keep the same setup. Otherwise, it doesn’t really matter whether you keep those identities.

The other options presented for how you want to back up the system are pretty self-explanatory. You can either back this up to your local machine or to some other machine that is running a service like SSH, TFTP, etc.

After you’ve created a backup operation, the nice thing is that it saves it for you in a backup operations list. When you want to actually run it, just select it, then hit admin enable and it will perform the backup.

Performing Routine Periodic Backups

But wait, you say, what if I want it to back itself up periodically?

Well, that’s where you move to the next tab, which is Policy Backup & Export.

Here you have the option of backing up just the binary system restore file, or the all-configuration. The all-configuration is good for backing up XML files just in case some administrator accidentally changes a bunch of configs on you.

Here you can see that my XML and binary files will be backed up every day. (That may be a little more than you need, as things don’t usually change so much in most environments, but hey, now you have it, use it.)

When it saves to those remote files you’ll get a timestamp on the name:

full-backup.bin.2013-07-28T22-55-11.555
all-config.xml.2013-07-28T22-57-11.559
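
Because the timestamp is written year first, a plain text sort of the file names is also a chronological sort. Here is a minimal Bash sketch that uses this to pick out the newest backup on the machine receiving the files. The function name latest_backup and the directory you pass to it are assumptions for illustration, not anything UCS Manager provides.

```shell
#!/bin/bash
# Print the newest backup file in a directory for a given name prefix.
# Relies on the year-first timestamp suffix sorting chronologically as text.
# Arguments: directory holding the backups, file name prefix (e.g. full-backup.bin).
latest_backup() {
  local dir="$1" prefix="$2"
  # Keep only names that start with "<prefix>.", sort them, take the last one.
  ls "$dir" | awk -v p="$prefix." 'index($0, p) == 1' | sort | tail -n 1
}

# Example (hypothetical directory):
#   latest_backup /backups/ucs full-backup.bin
```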

So that’s backing up the system and all the ways it can be done. There are a few nerd knobs, but I wanted to make sure I understood it.

The last thing to cover is import operations. It’s important to understand that you can do two different types: a merge or a replace. With merge, if you have a MAC pool called A and it has 30 MACs already, a merge will add the new MACs to it. (So if there are 20 in the import, you will now have 50.) With replace, you’ll now just have 20. You can only merge XML files.
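
To make the merge versus replace arithmetic concrete, here is a toy Bash sketch that treats a MAC pool as a plain list of addresses in a text file. This is only a model of the example above. A real import works on the XML backup, and the helper names (make_macs, merge_pool, replace_pool) and the MAC prefix are made up for illustration.

```shell
#!/bin/bash
# Print made-up MAC addresses whose last byte runs from $1 to $2, one per line.
make_macs() {
  local i
  for i in $(seq "$1" "$2"); do
    printf '00:25:B5:00:00:%02X\n' "$i"
  done
}

# Merge: keep the existing entries and add the imported ones.
# Arguments: existing pool file, imported pool file.
merge_pool() {
  cat "$1" "$2" | sort -u
}

# Replace: the existing entries are discarded and only the import survives.
replace_pool() {
  sort -u "$2"
}

workdir=$(mktemp -d)
make_macs 1 30  > "$workdir/existing.txt"   # pool A already has 30 MACs
make_macs 31 50 > "$workdir/import.txt"     # the import carries 20 MACs

echo "merge:   $(merge_pool "$workdir/existing.txt" "$workdir/import.txt" | wc -l | tr -d ' ') MACs"
echo "replace: $(replace_pool "$workdir/existing.txt" "$workdir/import.txt" | wc -l | tr -d ' ') MACs"
rm -r "$workdir"
```

With 30 existing and 20 imported addresses that don’t overlap, the merge reports 50 MACs and the replace reports 20.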

Lastly, all of this information is found here in the latest UCS GUI Configuration Guide. It was nice to gain a more solid understanding of it. Backing up is something I go over briefly in some of the tech days I do, but this fleshes it out a little better if there are any further questions.

Thanks for reading!

 

 

Nexus 1000v – A kinder gentler approach

One of the issues skeptical server administrators have with the 1000v is that they don’t like the management interface being subject to a virtual machine. The 1000v can be configured so that if the VSM gets disconnected/powered off/blown up, traffic on the system ports is still forwarded, but that is voodoo. Most say: Give me a simple access port so I can do my business.

I’m totally on board with this level of thinking.  After all, we don’t want any Jr. Woodchuck network engineer to be taking down our virtual management layer.  So let’s keep it simple.

In fact, you may not want Jr. Woodchuck network engineer to be able to touch the production VLANs for your production VMs. Well, here’s a solution for you: You don’t want to do the networking, but you don’t want the networking guy to do the networking either. So how can we make things right? Why not just ease into it? The diagram below presents, at the NIC level, how you can configure your ESXi hosts:

Here is what is so great about this configuration. The VMware administrator can keep things “business as usual” with the first 6 NICs.

Management A/B teams up with vmknic0 with IP address 192.168.40.101.  This is the management interface and used to talk to vCenter.  This is not controlled by the Nexus 1000v.  Business as usual here.

IP Storage A/B teams up with vmknic1 with IP address 192.168.30.101. This is to communicate with storage devices (NFS, iSCSI).  Not controlled by Nexus 1000v.  Business as usual.

VM Traffic A/B team up.  This is a trunking interface and all kinds of VLANs pass through here.  This is controlled either by a virtual standard switch or using VMware’s distributed Virtual Switch.  Business as usual.  You as the VMware administrator don’t have to worry about anything a Jr. Woodchuck Nexus 1000v administrator might do.

Now, here’s where it’s all good. With UCS you can create another vmknic, vmknic2, with IP address 192.168.10.101. This is our link that is managed by the Nexus 1000v. In UCS we would configure this as a trunk port with all kinds of VLANs enabled over it. This can use the same vNIC template that the standard VM-A and VM-B used. Same VLANs, etc.

(Aside: Some people would be more comfortable with 8 vNICs. Then you can do vMotion over its own native VMware interface. In my lab this is 192.168.20.101.)

The difference is that this IP address 192.168.10.101 belongs on our Control & Packet VLAN.  This is a back end network that the VSM will communicate with the VEM over.  Now, the only VM kernel interface that we need to have controlled by the Nexus 1000v is the 192.168.10.101 IP address.  And this is isolated from the rest of the virtualization stack.  So if we want to move a machine over to the other virtual switch, we can do that with little problem.  A simple edit of the VMs configuration can change it back.

Now, the testing can coexist on a production environment because the VMs that are being tested are running over the 1000v.  Now you can install the VSG, DCNM, the ASA 1000v, and all that good vPath stuff, and test it out.

From the 1000v, I created a port profile called “uplink” that I assign to these two interfaces:

port-profile type ethernet uplink
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1,501-512
  channel-group auto mode on mac-pinning
  no shutdown
  system vlan 505
  state enabled

By making it a system VLAN, I make it so that this control/packet VLAN stays up. For the vmknic (192.168.10.101) I also created a port profile for control:

port-profile type vethernet L3-control
  capability l3control
  vmware port-group
  switchport mode access
  switchport access vlan 505
  no shutdown
  system vlan 505
  state enabled

This allows me to migrate the vmknic over from being managed by VMware to being managed by the Nexus 1000v. My VSM has an IP address on the same subnet as vCenter (even though it’s layer 3):

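n1kv221# sh interface mgmt 0 brief

--------------------------------------------------------------------------------
Port   VRF          Status IP Address                              Speed    MTU
--------------------------------------------------------------------------------
mgmt0  --           up     192.168.40.31                           1000     1500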

Interestingly enough, when I do the sh module vem command, it shows up with the management interface:

Mod  Server-IP        Server-UUID                           Server-Name
---  ---------------  ------------------------------------  --------------------
3    192.168.40.102   00000000-0000-0000-cafe-00000000000e  192.168.40.102
4    192.168.40.101   00000000-0000-0000-cafe-00000000000f  192.168.40.101

On the VMware side, too, it shows up with the management interface: 192.168.40.101

That’s even though I only migrated the 192.168.10.101 vmknic over.

This configuration works great.  It provides a nice opportunity for the networking team to get with it and start taking back control of the access layer.  And it provides the VMware/Server team a clear path to move VMs back to a network they’re more familiar with if they are not yet comfortable with the 1000v.

Let me know what you think about this set up.

Change a Fabric Interconnect into a Nexus Switch

I got a Nexus 5010 from our spare parts department. When I booted it up, lo and behold, it thought it was a UCS Fabric Interconnect 6120!

As most people know, the 6120XP is the same hardware as the Nexus 5010. The only difference is that it’s spray painted green. Well, this particular model I got was gray and said it was a Nexus 5010. So I was bound and determined to get it back. I got pretty close, and wanted to write down the steps I took.

I’m sad to say, however, that I didn’t get it to work all the way.

Here’s what I did:

Step 1. Get a TFTP server set up (this explains how to do it on a MacBook Pro)
I’m running OS X Mountain Lion. It turns out there is a default TFTP server installed with it. Getting it running is pretty easy. Just run:

sudo launchctl load -F /System/Library/LaunchDaemons/tftp.plist

Turning it off is done with:

sudo launchctl unload -F /System/Library/LaunchDaemons/tftp.plist

To see if it’s running, run:

sudo launchctl list | grep tftp

If you see output it’s running; if not, it’s not.

From there you need to put the files you need into /private/tftpboot/.

I went to Cisco’s support page and easily found two files:
n5000-uk9.5.2.1.N1.5.bin < the software file
and
n5000-uk9-kickstart.5.2.1.N1.5.bin < the kickstart file

I had to copy them with sudo since you’re going into a privileged directory.

You should test your tftp server to make sure it works. No use yelling at the Nexus 5000 for telling you it can’t access the file.

From the command prompt on the Mac:

cd ~/Desktop
tftp localhost
get n5000-uk9.5.2.1.N1.5.bin

If that works, you are in business.

Step 2. Load the Nexus 5000 (that thinks it’s a 6100) into the loader prompt.

When the machine started booting, I had to do

Ctrl+Shift+R

right as it was loading the UCS kickstart file. Doing this got me to the loader> prompt.

From here, we don’t have a lot of options. But all we need to do is set the mgmt0 interface and kickstart from our Nexus image that we have on tftp.
(Incidentally, at this point I ran the dir command to see if there were any Nexus images, and there weren’t! Only UCS images.)

Here’s how we set that:

loader> set ip 192.168.1.99 255.255.255.0

Then it confirmed that this was good. Now, to load up the kickstart file:

loader> boot tftp://192.168.1.234/private/tftpboot/n5000-uk9-kickstart.5.2.1.N1.5.bin
Address: 192.168.1.99
Netmask: 255.255.255.0
Server: 192.168.1.234
Gateway: 0.0.0.0
Booting: /private/tftpboot/n5000-uk9-kickstart.5.2.1.N1.5.bin console=ttyS0,9600n8nn

The system then boots up, does some image verification, and loads into a boot prompt:

Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2013, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and

http://www.opensource.org/licenses/lgpl-2.1.php

switch(boot)#

Step 3: Copy files and continue booting

Now we just need to get the files on the device.

switch(boot)# con t
switch(boot)(config)# inter mgmt 0
switch(boot)(config-if)# ip address 192.168.1.99 255.255.255.0
switch(boot)(config-if)# no shutdown
switch(boot)(config-if)# exit
switch(boot)(config)# exit
switch(boot)# copy tftp: bootflash:
Enter source filename: /private/tftpboot/n5000-uk9.5.2.1.N1.5.bin
Enter hostname for the tftp server: 192.168.1.234
Trying to connect to tftp server……
Connection to server Established. Copying Started…..

At this point I went downstairs and had some chips to eat. I got back and still had to wait like 15-20 min for it to copy. Sheesh! Finally, when I was about to cancel it, I saw:

TFTP get operation was successful
Copy complete, now saving to disk (please wait)…

Now we need to get the kickstart file:

switch(boot)# copy tftp://192.168.1.234/n5000-uk9-kickstart.5.2.1.N1.5.bin bootflash:

So I waited some more, this one didn’t take as long.

Then I deleted a bunch of UCS files:

switch(boot)# delete bootflash:ucs-6100-k9-system.4.0.1a.N2.1.0.1036.gbin
switch(boot)# delete bootflash:cisco_nexus_1000v_certificate.pem
switch(boot)# delete bootflash:ucs-6100-k9-kickstart.4.0.1a.N2.1.0.1036.gbin
switch(boot)# delete bootflash:ucs-6100-k9-kickstart.4.0.1a.N2.1.0.1056d.gbin
switch(boot)# delete bootflash:ucs-6100-k9-system.4.0.1a.N2.1.0.1056d.gbin
switch(boot)# delete bootflash:ucs-manager-k9.1.0.0.1036.gbin
switch(boot)# delete bootflash:ucs-manager-k9.1.0.0.1056d.gbin

Then I booted the image:

switch(boot)# load n5000-uk9.5.2.1.N1.5.bin

This sent me to the boot prompt again. So I hit exit:

switch(boot)# exit

It kept rebooting to stored images of UCS manager. So I found this command:

init system check-filesystem

From here, I repeated the operation of downloading the 2 Nexus images. At least now it didn’t boot up as a UCS Fabric Interconnect, but I could never get it to come up as a regular Cisco Nexus 5010. It may be that there was something wrong with the hardware. It certainly looks a little beat up. If nothing else, I learned a little more about the boot files in the Nexus 5000.

Cisco UCS East-West Traffic Performance.

The worst thing you can do in tech is claim something positive or negative about some technology without anything to back it up.  Ever since UCS was first brought to market, other blade vendors have been quick to point out any flaw they can find.  This is mostly because their market share of the x86 blade space has been threatened and in some cases (IBM & Dell) surpassed by UCS.

One of the claims that I’ve heard while presenting UCS is that a major flaw in the architecture makes switching between two blades inferior to the legacy architectures that other hardware vendors use. You see (they told me), in order for one UCS blade to communicate with another UCS blade you have to leave the chassis, go into the Fabric Interconnects (that could be all the way at the top of the rack, or even in another rack), and then come back into the chassis. This must take an eternity.

Network traffic from one blade to another in the same chassis is called “East-West” traffic because the traffic doesn’t leave the chassis (picture it going sideways), whereas “North-South” traffic is network traffic that leaves the chassis and goes out to some other endpoint that doesn’t reside in the chassis. The widely held belief was that UCS was at a huge disadvantage here.

After all, every other blade chassis on the market has network switches that sit inside the chassis and *must* be able to perform faster than UCS.  For a while now, I’ve wondered how much latency that adds.  Because, frankly, I thought the same way they did.  Surely the internal wires must be faster than twinax cables.

But science, that pesky disprover of legacy traditions and beliefs, has finally come to settle the argument.  And in fact has turned the argument on its head.  The east-west traffic inside UCS is faster than the legacy chassis.

The full blog can be read here. There’s a link to a few great papers on this site that show how the measurements were done.

Plus one for the scientific method!

OpenStack Summit 2013 Food Recommendations

I’m really looking forward to the OpenStack Summit 2013 conference next week.  I have my schedule blocked off to be able to soak in as much information as I can.

Since I live in Portland and some people asked me, I thought I’d put out a few recommendations of places I like to eat in case you’re around. Yes, I’m probably leaving off tons of stuff (the food carts, the grilled cheese grill), but hey, I just wanted to put together a quick list. Feel free to invite me.

Breakfast

Waffle Window – I’ll have ice cream on my waffles for breakfast.  Thanks

Quick Lunch

Por que No – Really good carne asada tacos.  Two locations.

Bunk Sandwiches – Super good. No space to eat inside but great for grabbing a sandwich.

Kenny and Zuke’s – People love this place. I think it’s pretty good. Big sandwich.

Dinner or Big lunches

Asian Style

Bamboo Sushi – Awesome sushi and kobe beef hamburgers.  Love this place.  Get both.

Lucky Strike – Never been here, but I hear it’s fantastic.

Italian & Mediterranean Style

Acena – Never been here but I hear it’s amazing.

Serrato – Looks good. Can’t remember if I ate here or not. I think I did.

Apizza Scholls – Probably the best pizza in Portland or the world.

Western Asianish

Marrakesh – Belly dancers?  Eating on the floor?  Eat with your fingers?  Yes.

East India Co. – Even my Indian friends admit you can’t even get Indian food this good in India.

Portlandish

Paley’s – Want to know the name of the chicken you are eating?  Where it grew up?

Screen Door – Southern Cuisine.  Loved it.  Don’t remember much more than that.

Castagna – Northwest Cuisine.  Good hamburgers.  Also Pigs feet if you like that too.

Up North

If you are staying in Vancouver and don’t mind a quick trip east, check out LaPella.  Pretty good.  My wife and I ate there 2 weeks ago.

Looking forward to seeing everyone!

Hacking UCS Manager to get pictures

I was reading the API for UCS Manager the other day (hey, everybody has a hobby, right?) and I found a pretty cool place where the Java UCS Manager downloads its picture files from. I still haven’t found all the files (like the Fabric Interconnects, the chassis, and the IOMs) but most of the server models are found this way. Substitute your UCS Manager IP address into the script below and it will download the pictures of the blades. I wish I had known this before I gathered pictures for UCS Tech Specs, as these are great pictures.

#!/bin/bash
# Substitute your UCS Manager IP address here.
IP=10.93.234.241
BASE="http://$IP/pictures"
# Blade pictures
wget "$BASE/blade/B230.png"
wget "$BASE/blade/B440.png"
wget "$BASE/blade/Blade_full_width_front.png"
wget "$BASE/blade/Blade_half_width_front.png"
wget "$BASE/blade/Blade_half_width_front_marin.png"
wget "$BASE/blade/SfBlade.png"
wget "$BASE/blade/sequoia_front.png"
wget "$BASE/blade/sequoia_top.png"
wget "$BASE/blade/silver_creek_front.png"
wget "$BASE/blade/silver_creek_top.png"
wget "$BASE/blade/ucs_b200_m3_front.png"
wget "$BASE/blade/ucs_b200_m3_top.png"
# Fabric Interconnect pictures
wget "$BASE/fi/switch_psu_DC.png"
# Rack server pictures
wget "$BASE/rack/Alameda_1_front.png"
wget "$BASE/rack/Alameda_1_top.png"
wget "$BASE/rack/Alameda_2_front.png"
wget "$BASE/rack/Alameda_2_top.png"
wget "$BASE/rack/Alpine_M2.png"
wget "$BASE/rack/Alpine_M2_front.png"
wget "$BASE/rack/C220M3_front_small.png"
wget "$BASE/rack/C220M3_top.png"
wget "$BASE/rack/C420_front.png"
wget "$BASE/rack/C420_internal.png"
wget "$BASE/rack/SD1_Gen2_front.png"
wget "$BASE/rack/SD1_Gen2_internal.png"
wget "$BASE/rack/san_mateo_front.png"
wget "$BASE/rack/san_mateo_internal.png"
wget "$BASE/rack/sl2_front.png"
wget "$BASE/rack/sl2_top.png"
wget "$BASE/rack/st_louis_1u_front.png"
wget "$BASE/rack/st_louis_1u_top.png"
wget "$BASE/rack/st_louis_2u_front.png"
wget "$BASE/rack/st_louis_2u_top.png"

Use Bash to teach kids to program

I’m going to try to teach my 9 year old to program this weekend. My first thought was to do Scratch, and that seems kind of good, but I think Bash might just be a great place to go as well.

Bash is found on the Mac and Linux, which covers every computer in our house. (We have a few Windows VMs here and there but nothing we use.) And Bash is fun because you can get familiar with the command line. We’ll start by using pico maybe? And do a simple hello world program first:

Program #1:

#!/bin/bash
echo "Hello World!"

That’s pretty easy, but the next fun thing is to make it ask and answer a question:

Program #2:

#!/bin/bash

echo "What is your favorite color?"
read color
echo "${color} is a nice color"

Then we can add an if so it reacts to the answer:

Program #3:

#!/bin/bash

echo "What is your favorite color?"
read color
echo "${color} is a nice color"
if [ "$color" == "red" ]
then
  echo "${color} is my favorite!"
fi

Program #4:

#!/bin/bash
echo "Can you guess my favorite color?"
while true
do
  read color
  if [ "$color" == "red" ]
  then
    echo "You guessed it! ${color} is my favorite color!"
    break
  else
    echo "Nope. $color is not my favorite color. Guess again!"
  fi
done

We’ll probably make it ask a few other questions and then some other cool things. Might be fun!

VIFS in a UCS environment

First of all, if you stumbled upon this page you may be asking: “What is a VIF?” A VIF is a virtual interface. In UCS, it’s a virtual NIC.

Let’s first examine a standard rack server. Usually you have 2 Ethernet ports on the motherboard itself. Nowadays, recent servers like the C240 M3 have 4 x 1GbE onboard interfaces. Some servers even have 2 x 10GbE onboard NICs. That’s all well and good and easy to understand because you can see it physically.

Now let’s look at a UCS blade. You can’t really see the interfaces because there are no RJ-45 cables that connect to the server. It’s all internal. If you could see it physically, you’d see that you could add up to 8 x 10Gb physical NICs per half-width blade. Just like a rack mount server comes with a fixed number of PCI slots, a blade has built-in limits as well. But Cisco blades work a little differently. Really, there are 2 sides, Side A and Side B, each with up to 4 x 10GbE physical connections. Those 4 x 10GbE connections are port channeled together, so each side looks like one big pipe whose size depends on what cards you put in there.

With these two big pipes (each somewhere between 10Gb and 40Gb) we create virtual interfaces that are presented to the operating system. That’s what a VIF is. These VIFs can be used for some really interesting things.

VIF Use Cases
  1. It can be used to present NICs to the operating system.  This makes it so that the operating system thinks it has a TON of real NICs.  The most I’ve ever seen though is 8 NICs and 2 Fibre Channel adapters.  (Did I mention that Fibre Channel counts as a VIF?)  So 10 is probably the most you would use with this configuration.
  2. It can be used to directly attach virtual machines with a UCS DVS.  This is also one version of VM-Fex.  Here, UCS Manager acts as the Virtual supervisor and the VMs get real hardware for their NICs.  They can do vMotion and all that good stuff and remain consistent.  I don’t see too many people using this, but the performance is supposed to be really good.
  3. It can be used for VMware DirectPath IO. This is where you tie the VM directly to the hardware using the VMware DirectPath IO bypass method. (Not the same as the UCS Distributed Virtual Switch I mentioned above.) Typically you cannot do vMotion when you use VMware DirectPath IO. The advantage UCS has is that with UCS, you can!
  4. usNIC (future!!!). User-space NIC is where we can present one of these virtual interfaces directly to user space and create a low latency connection in our application. This is something that will be enabled in the future on UCS, but it means we dynamically create these and can hopefully get latencies around 2-3 microseconds. This is great for HPC apps and I can’t wait to get performance data on this.
  5. usNIC in VMs (future!!!). This is where a user space application running in a VM will have the same latency as a physical machine. That’s right. This is where we really get VMs doing HPC low latency connections.
So now that we know the use cases, how can you tell how many virtual interfaces or VIFs you have for each server? Well, it depends on the hardware and the software. They all allow for growth, but some have limitations. That’s what I’m hoping to explain below.

UCS Manager Limitations and Operating Systems Limitations

For 2.1 this is found here. For other versions of UCS Manager, just search for “UCS 2.x configuration limits”.

The maximum number of VIFs per UCS domain today is 2,000.

The document above also shows that for ESX 5.1 it’s 116 per host. The document references UPT and PTS.

UPT – Uniform Pass Thru (this is configured in VMware with DirectPath IO, use case 3 as I mentioned above)
PTS – Pass Through Switching (this is UCS DVS, or use case 2 as I mentioned above)
Fabric Interconnect VIF Perspective

Let’s look at it from a hardware perspective. The ASICs used on the Fabric Interconnects determine the limits as well.

6200

The UCS Fabric Interconnect 6248 uses the “Carmel” Unified Port Controller. There is 1 Carmel port ASIC for every 8 ports, so ports 1-8 are part of the first Carmel ASIC, and so on. In general, you want all of a FEX’s (or IO Module’s) uplinks connected to the same Carmel.

Each Carmel ASIC allows 4096 VIFs, which are divided equally among its 8 switch ports: 512 VIFs per port. Since one of those VIFs is dedicated to the CIMC, that gives 511 VIFs per port. There are 8 slots in each chassis, so you further divide that among the 8 blade slots, which is 64 max per slot. Some are reserved, so it ends up being 63 VIFs per slot. That’s why the equation ends up being 63*n - 2, where n is the number of uplinks per FEX (2 are used for management).

Cisco Fabric Interconnect 6200

Uplinks per FEX    VIFs per slot
1                  61
2                  124
4                  250
8                  502
6100

The 6100 uses the Gatos port controller ASIC. There are 4 ports managed per Gatos ASIC.

Each Gatos ASIC allows 512 VIFs, or 128 VIFs per port (512 VIFs per ASIC / 4 ports). Each port’s share gets divided among the 8 slots, so 128 / 8 = 16. However, some of those are reserved, so it ends up being only 15 VIFs per slot. That’s why the equation for VIFs per server is 15*n - 2, where n is the number of uplinks per FEX (the 2 are used for management).

Cisco Fabric Interconnect 6100

Uplinks per FEX    VIFs per slot
1                  13
2                  28
4                  58
8                  118 (obviously requires a 2208)
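
The numbers in both tables come straight out of the two equations, so here is a small Bash sketch that recomputes them. The function names (per_uplink_share, vifs_per_slot) are made up for this example. The "minus 1 reserved" step is my reading of "some are reserved" above, not an official figure.

```shell
#!/bin/bash
# VIFs one uplink contributes to each blade slot:
# ASIC VIFs / ports per ASIC / 8 blade slots, minus 1 reserved (assumption).
per_uplink_share() {
  local asic_vifs="$1" ports_per_asic="$2"
  echo $(( asic_vifs / ports_per_asic / 8 - 1 ))
}

# VIFs per slot: (share per uplink * uplinks per FEX) - 2 used for management.
vifs_per_slot() {
  local share="$1" uplinks="$2"
  echo $(( share * uplinks - 2 ))
}

carmel=$(per_uplink_share 4096 8)   # 6200: 63
gatos=$(per_uplink_share 512 4)     # 6100: 15

for uplinks in 1 2 4 8; do
  echo "uplinks=$uplinks  6200=$(vifs_per_slot "$carmel" "$uplinks")  6100=$(vifs_per_slot "$gatos" "$uplinks")"
done
```

Running it reproduces the 61/124/250/502 and 13/28/58/118 columns from the tables.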
VIFs from the Mezz Card Perspective

The M81KR card supports up to 128 VIFs. So you can see from above that with the 6100 and a 2104/2204/2208 it’s not the bottleneck.

The VIC 1280, which can be placed into the M1 and M2 servers, can do up to 256 VIFs.

Hopefully that clarified VIFs a little and where the bottlenecks are. It’s important to note as well that I/O modules don’t limit VIFs. They’re just passthrough devices.
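
One way to think about the bottleneck is that a server gets whichever limit is smaller: the per-slot number from the Fabric Interconnect side or the number the mezz card supports. Here is a tiny Bash sketch of that idea, assuming the two limits combine as a simple minimum. The function name effective_vifs is made up for illustration.

```shell
#!/bin/bash
# The usable VIF count is the smaller of the fabric-side limit and the card limit.
# Arguments: VIFs per slot from the Fabric Interconnect tables, VIFs the mezz card supports.
effective_vifs() {
  local fabric_limit="$1" card_limit="$2"
  if [ "$fabric_limit" -lt "$card_limit" ]; then
    echo "$fabric_limit"
  else
    echo "$card_limit"
  fi
}

effective_vifs 118 128   # 6100 with 8 uplinks + M81KR: the fabric side limits you
effective_vifs 250 128   # 6200 with 4 uplinks + M81KR: the card limits you
effective_vifs 250 256   # 6200 with 4 uplinks + VIC 1280: the fabric side limits you
```

The three calls print 118, 128, and 250.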

Teaching Kids to Program

A lot of parents ask me about teaching their kids to write computer programs. “What is a good way to get started?” “How did you get into it?” As my oldest child is now 9, I’ve been frequently asking myself the same question. I feel it is very important that young people know how to write code. I feel that years from now people will look at those who can’t write basic computer programs the same way we look at those who can’t write a simple letter.

Much of my thinking has been confirmed and augmented by a TED Talk I watched this week by Mitch Resnick. In his talk, he affirms that just because people can code doesn’t mean we expect them all to be professional computer scientists or developers. We don’t expect all people who learn how to write to become novelists or journalists. It’s just a basic skill that is needed in our day and age.

With “Scratch”, the program that he and his team made, I think I’ve found the answer I was looking for. I got home last night and downloaded it onto our family iMac. It sits right in the kitchen, and I got my 9 year old and 6 year old started on it. We started out with a picture of a “sprite”, or in our case, the default picture of a kitten. We then created “controls” such as “When I press the spacebar”. Then underneath the control we did things like “change color” or “move 10”. (The 10 is 10 pixels, but kids don’t really know that yet.) My kids would then keep pressing the space bar. That’s when we introduced the “Forever” loop to them. Amazing! In just a quick 10 min, they understood loops and making things happen.

I’m hoping to do more with this and my kids.  I don’t want them to think of computer programming as dry and boring, but rather a creative medium for doing really cool things.  I am thankful for the people at MIT for making this possible.