QoS and Jumbo Frames on the Nexus 5500

Nexus 5548 UP

I’ve had the good fortune of having two Nexus 5548UPs in my lab to help test upgrade problems for one of my customers.  It’s been great to have some gear to play with and really try to understand how it all works together.

One of the issues I’ve run up against in the past (and you may have too) is configuring jumbo frames on network switches.  Jumbo frames let more bytes move through the data center more efficiently.  The default maximum transmission unit (MTU) on nearly all networks and server NICs is 1500 bytes.  That means if you want to send more data, you need to send more frames.  When you increase the MTU to 9000 bytes, you send fewer headers, fewer frames, and more data per frame.  The intended result is lower CPU load.  The downsides are that you may see higher latency and you may end up configuring a lot of endpoints.  That can get really complicated.
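
To put rough numbers on the "fewer headers, fewer frames" argument, here is a back-of-the-envelope sketch in Python. It assumes every packet carries a 20-byte IPv4 header and a 20-byte TCP header with no options, and it ignores Ethernet framing, so treat the results as approximate; the function names are just illustrative.

```python
import math

# Assumption: every packet carries a 20-byte IPv4 header and a 20-byte TCP
# header with no options; the Ethernet header and FCS are ignored.
IP_TCP_HEADERS = 40

def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Number of full-size packets needed to carry payload_bytes at this MTU."""
    data_per_packet = mtu - IP_TCP_HEADERS
    return math.ceil(payload_bytes / data_per_packet)

def header_overhead(mtu: int) -> float:
    """Fraction of each full-size packet spent on IP/TCP headers."""
    return IP_TCP_HEADERS / mtu

if __name__ == "__main__":
    one_gib = 1024 ** 3
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {frames_needed(one_gib, mtu):,} packets, "
              f"{header_overhead(mtu):.2%} of each packet is headers")
```

Under these assumptions a 1 GiB transfer needs about six times fewer packets at MTU 9000 than at 1500, and the header share of each packet drops from roughly 2.7% to under 0.5%, which is where the hoped-for CPU savings come from.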

As an example of configuring jumbo frames in a data center, consider all the endpoints that have to be configured:

  • Operating System:  The NIC must be set to MTU 9000.  On VMware, the vSwitch has to be set to this as well as the VMs if they will be supporting jumbo frames.
  • On UCS, the vNIC has to be mapped to a jumbo frames QoS policy.
  • The ports on the network switch must have jumbo frames enabled on the uplink.
  • The ports on the storage controller must have jumbo frames enabled.
  • The storage network interfaces must have jumbo frames configured.

So you can see there is a great deal of orchestration between several teams in the data center.  Everybody has to know what they are doing.

You can test if your jumbo frames are enabled on Linux by sending a simple ping:

ping -M do -s 8972 <node>

If that goes through without errors, then congratulations!  You have jumbo frames enabled from node to node.
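
Where does 8972 come from?  The -s value is the ICMP payload size, so it is the MTU minus a 20-byte IPv4 header (assuming no IP options) and an 8-byte ICMP echo header, and -M do forbids fragmentation so an oversized packet fails instead of being quietly split.  Here is a small Python helper that builds the same command for any MTU; the function names and the -c 3 count are illustrative choices, not part of the original test.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header

def ping_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP payload (ping -s) that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER

def jumbo_ping_command(node: str, mtu: int = 9000, count: int = 3) -> list[str]:
    """Build a Linux iputils ping with don't-fragment set (-M do)."""
    return ["ping", "-c", str(count), "-M", "do",
            "-s", str(ping_payload_for_mtu(mtu)), node]
```

You could hand the result to subprocess.run() and treat a nonzero exit status as a failed jumbo path.  The same helper gives 1472 for a standard 1500-byte MTU, which is a handy baseline test.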

Jumbo Frames on the Nexus 5500

There is a guide on Cisco’s web page that talks about enabling jumbo frames.  But to do it, you have to work with policy maps and class maps.  I’ve often thought:  Why is this so hard to do?  It seems like a single, simple command should be enough.

The reason goes back to the standard trade-off engineers make: “Make it Easy” vs. “Make it Flexible”.  You can’t really have flexibility without more nerd knobs to turn.  Most of these knobs are probably more applicable to application traffic.

The other question I always come back to:  Why can’t you just apply the MTU to the interface?  On the Nexus 5500 you don’t typically set it on the interface.  But can you?  Kind of.

QoS Class Map and Policy Maps

The architecture of the Nexus 5500 is a bit different from that of Catalyst switches.  Buffering is done mostly on the ingress ports (the ports where traffic enters).  As such, the first thing you’ll want to do is “tag” or “classify” the traffic that comes into the port.  You do this with a class-map of type qos.  How do you identify traffic?

The most common way to match traffic is with either an IP Access List, or by the protocol.  Let’s do it:

n5k-top# conf
Enter configuration commands, one per line. End with CNTL/Z.
n5k-top(config)# class-map myTraffic
n5k-top(config-cmap-qos)# match protocol iscsi

Here we’re just matching iSCSI traffic. That’s traffic we might want to run jumbo frames on. But we could also match on IP addresses. Let’s say that all hosts on the 172.20.0.0/24 network should have jumbo frames. That would make sense if this network was for storage (NFS, iSCSI or whatever).

To do that we would use an access list:

n5k-top(config)# ip access-list jumbo-list
n5k-top(config-acl)# permit ip 172.20.0.0/24 any
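
The intent here is to catch the 172.20.0.0/24 storage network.  To see exactly which sources such an entry selects, here is a small Python model using the standard ipaddress module.  It only models the source side of a permit ip 172.20.0.0/24 any entry (the destination is any), and the helper name is made up for illustration.

```python
import ipaddress

# Source network of the ACL entry "permit ip 172.20.0.0/24 any".
JUMBO_SOURCE = ipaddress.ip_network("172.20.0.0/24")

def matches_jumbo_list(src_ip: str) -> bool:
    """True if a packet from src_ip hits the permit entry (destination is 'any')."""
    return ipaddress.ip_address(src_ip) in JUMBO_SOURCE
```

A /24 only covers 172.20.0.0 through 172.20.0.255, so a host such as 172.20.1.1 would not match this class-map and would fall through to class-default unless the ACL is widened.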

Now we can put that in our QoS class-map:

n5k-top(config-acl)# class-map type qos myTraffic
n5k-top(config-cmap-qos)# no match protocol iscsi
n5k-top(config-cmap-qos)# match access-group name jumbo-list
n5k-top(config-cmap-qos)# sh class-map type qos myTraffic
Type qos class-maps
===================
class-map type qos match-all myTraffic
match access-group name jumbo-list

Great! Now we need to put this into one of the qos-groups that we can work on. There’s a lot more we can do, but for simplicity, we’ll just put this into qos-group 2.  That is a good place for it, since this traffic could be NFS or iSCSI and may be more important than normal traffic.  We need to put it in a qos-group because there are two other QoS types that we haven’t talked about, and those QoS types classify by qos-group number.  Here are the other two:

network-qos: This is the QoS type that handles jumbo frames (mtu), multicast, pause no-drop, and other settings that govern how a packet gets through the network: its shape and how it reacts.  The other two types deal more with what happens when a packet enters or leaves the switch.

queuing: This is the last type of QoS and does things like allocate how much bandwidth certain traffic gets as it goes through the network.  By default, 100 percent of the bandwidth goes to the default class.

So now we know:  We have three types of QoS settings, and each of them requires a class-map (to tag the traffic) and a policy-map (what to do with the traffic when it comes in).  In each policy-map, you associate multiple class-maps with behaviors.  Finally, once we have a policy-map for each of the three types, we apply them under system qos.

Let’s finish the qos type.  So far we’ve only made a class-map for it.  But now, we want a policy-map.  We may have three types of traffic:  Default traffic, myTraffic, and fcoe.  FCoE and the default traffic are there “by default”.  So let’s make a policy-map with those traffic types:

n5k-top(config-cmap-nq)# policy-map type qos myQoS-policy
n5k-top(config-pmap-qos)# class type qos myTraffic
n5k-top(config-pmap-c-qos)# set qos-group 2
n5k-top(config-pmap-c-qos)# class type qos class-fcoe
n5k-top(config-pmap-c-qos)# set qos-group 1

There, now we have three traffic lanes marked. The default was put in there for us. Check it out:

n5k-top(config-pmap-c-qos)# sh policy-map type qos myQoS-policy

Type qos policy-maps
====================

policy-map type qos myQoS-policy
class type qos myTraffic
set qos-group 2
class type qos class-fcoe
set qos-group 1
class type qos class-default
set qos-group 0

Ok, now we want to set our jumbo frames. That means we need to create a network-qos type of QoS. First we have to mark what we want. Our only options are to classify by qos-groups for the network-qos type of QoS. So let’s create a class-map:

n5k-top(config-cmap-que)# class-map type network-qos myTraffic
n5k-top(config-cmap-nq)# match qos-group 2

Notice that I kept the name the same in the network-qos type of QoS as I did for the qos type of QoS. This makes things a bit easier. Now we just need to create a policy-map using this class-map, as well as the fcoe class-map (the fcoe class-map is created by default):

n5k-top(config-cmap-nq)# policy-map type network-qos myNetwork-QoS-policy
n5k-top(config-pmap-nq)# class type network-qos myTraffic
n5k-top(config-pmap-nq-c)# mtu 9216
n5k-top(config-pmap-nq-c)# class type network-qos class-fcoe
n5k-top(config-pmap-nq-c)# pause no-drop
n5k-top(config-pmap-nq-c)# mtu 2158
n5k-top(config-pmap-nq-c)# sh policy-map type network-qos myNetwork-QoS-policy

Type network-qos policy-maps
===============================

policy-map type network-qos myNetwork-QoS-policy
class type network-qos myTraffic

mtu 9216
class type network-qos class-fcoe

pause no-drop
mtu 2158
class type network-qos class-default

mtu 1500
multicast-optimize

Ok! Two down, one to go: the queuing policy. We just need to split the load. First we need to mark the traffic. Again, we can only use qos-groups to do that. So this time we’ll create a class-map of type queuing, and we’ll give it the same name as the last two class-maps:

n5k-top(config-pmap-c-qos)# class-map type queuing myTraffic
n5k-top(config-cmap-que)# match qos-group 2

Now that we have that, let’s associate it with a policy-map, just like the others. Here we can just split it roughly in half. We don’t really know what our traffic will be (maybe you do in your network), so we’ll give half of the bandwidth to our jumbo frame traffic (50%) and 25% each to the other two classes (fcoe and default).

n5k-top(config-pmap-nq-c)# policy-map type queuing myQueuing-policy
n5k-top(config-pmap-que)# class type queuing myTraffic
n5k-top(config-pmap-c-que)# bandwidth percent 50
n5k-top(config-pmap-c-que)# class type queuing class-fcoe
n5k-top(config-pmap-c-que)# bandwidth percent 25
n5k-top(config-pmap-c-que)# class type queuing class-default
n5k-top(config-pmap-c-que)# bandwidth percent 25
n5k-top(config-pmap-c-que)# sh policy-map type queuing myQueuing-policy

Type queuing policy-maps
========================

policy-map type queuing myQueuing-policy
class type queuing myTraffic
bandwidth percent 50
class type queuing class-fcoe
bandwidth percent 25
class type queuing class-default
bandwidth percent 25
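
As a rough illustration of what those percentages mean, the sketch below converts them into the bandwidth each class can count on when a port is congested.  It assumes a 10 Gb/s port and treats bandwidth percent as a minimum share under congestion (a class that is idle leaves room for the others), so it is a planning aid, not a model of the switch scheduler.  The function name is hypothetical.

```python
LINK_GBPS = 10.0  # assumption: a 10 Gb/s port on the Nexus 5548UP

def guaranteed_gbps(shares: dict[str, int], link_gbps: float = LINK_GBPS) -> dict[str, float]:
    """Convert queuing 'bandwidth percent' values into Gb/s under congestion."""
    total = sum(shares.values())
    if total != 100:
        # This sketch insists on exactly 100%, matching the split used in this post.
        raise ValueError(f"bandwidth percentages add up to {total}, not 100")
    return {name: link_gbps * pct / 100 for name, pct in shares.items()}

if __name__ == "__main__":
    policy = {"myTraffic": 50, "class-fcoe": 25, "class-default": 25}
    for name, gbps in guaranteed_gbps(policy).items():
        print(f"{name}: {gbps} Gb/s")
```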

Boom! Just like that, we now have our three QoS policies created: one for type qos called myQoS-policy, one for type network-qos called myNetwork-QoS-policy, and finally the one we just created for type queuing called myQueuing-policy.

What’s left? Now we just need to apply these policies under system qos:

n5k-top(config-sys-qos)# service-policy type qos input myQoS-policy
n5k-top(config-sys-qos)# service-policy type network-qos myNetwork-QoS-policy
n5k-top(config-sys-qos)# service-policy type queuing input myQueuing-policy
n5k-top(config-sys-qos)# service-policy type queuing output myQueuing-policy

Notice that we applied the queuing policy twice: once to the input and once to the output. The qos type is only applied to the input, because that’s where traffic comes in and is marked. The network-qos policy affects both input and output but should be the same everywhere, so we only configure it once.
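
One way to keep the three policy types straight is to notice that the qos-group is the glue between them: the qos policy assigns a group, and the network-qos and queuing policies only ever look at that group.  The toy Python model below encodes the values from this post to show that lookup chain; it is an illustration of the concept, not how NX-OS implements it.

```python
# Values taken from the three policies built in this post.
QOS_POLICY = {          # type qos: class-map name -> qos-group
    "myTraffic": 2,
    "class-fcoe": 1,
    "class-default": 0,
}
NETWORK_QOS_POLICY = {  # type network-qos: qos-group -> MTU in bytes
    2: 9216,
    1: 2158,
    0: 1500,
}
QUEUING_POLICY = {      # type queuing: qos-group -> bandwidth percent
    2: 50,
    1: 25,
    0: 25,
}

def treatment(class_name: str) -> tuple[int, int, int]:
    """Return (qos_group, mtu, bandwidth_percent) for traffic classified as class_name.

    Anything that matches no class-map falls into class-default (qos-group 0).
    """
    group = QOS_POLICY.get(class_name, QOS_POLICY["class-default"])
    return group, NETWORK_QOS_POLICY[group], QUEUING_POLICY[group]
```

Only the qos policy sees the original match (ACL or protocol); that is why the network-qos and queuing class-maps can only match on qos-group.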

Well, hopefully that wasn’t too confusing.  You can obviously see the power in this.  If we had voice, video, and other types of traffic coming through here, we could add them to our policy-maps after we tag them.  With QoS, the settings should be the same on all switches in the data center.  If you are running vPC, you’ll have to make sure that the peer switch has the same setup; otherwise you’ll get a Type-2 consistency error:

n5k-top(config-sys-qos)# sh vpc
Legend:
(*) - local vPC is down, forwarding via vPC peer-link

vPC domain id : 1
Peer status : peer adjacency formed ok
vPC keep-alive status : peer is alive
Configuration consistency status : success
Per-vlan consistency status : success
Type-2 consistency status : failed
Type-2 inconsistency reason : QoSMgr Network QoS configuration incompatible
vPC role : primary
Number of vPCs configured : 3
Peer Gateway : Disabled
Dual-active excluded VLANs : -
Graceful Consistency Check : Enabled
Auto-recovery status : Disabled

vPC Peer-link status
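----------------------------------------------
id Port Status Active vlans
-- ---- ------ --------------------------------------------------
1 Po1 up 1,500

vPC status
----------------------------------------------------------------------------
id Port Status Consistency Reason Active vlans
------ ----------- ------ ----------- -------------------------- -----------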
10 Po10 up success success 1,100,500
20 Po20 up success success 1,100,500
80 Po80 up success success 1,100,500

See? Nobody wants a Type-2 error. Apply the settings throughout the data center!
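
Checking for this by eye is easy on one pair of switches but tedious across a whole data center.  The Python sketch below scans show vpc text (collected however you like, for example pasted from a terminal session) for the consistency lines and reports a Type-2 failure along with its reason.  The field names come from the output above; the helpers are hypothetical, and other NX-OS releases may word these lines differently.

```python
def vpc_consistency(show_vpc: str) -> dict[str, str]:
    """Collect the 'consistency' key/value lines from 'show vpc' text output."""
    fields: dict[str, str] = {}
    for line in show_vpc.splitlines():
        key, sep, value = line.partition(":")
        if sep and "consistency" in key.lower():
            fields[key.strip()] = value.strip()
    return fields

def type2_problems(show_vpc: str) -> list[str]:
    """Return the Type-2 inconsistency reason, or an empty list if Type-2 passed."""
    fields = vpc_consistency(show_vpc)
    if fields.get("Type-2 consistency status", "").lower() == "failed":
        return [fields.get("Type-2 inconsistency reason", "reason not reported")]
    return []
```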

Hope this helps someone struggling with getting jumbo frames on a Nexus 5k.

Configure VMware from scratch without Windows

One of the things that bugs me about vCenter is that it is still tightly tied to the Windows operating system.  You need Windows to set it up, and getting by without Windows is still somewhat difficult.  In my lab I’m trying to get away from Windows.  I have xCAT installed to PXE boot UCS blades to do what I want.  It’s great, and it’s automated.  But when I installed 8 nodes as ESXi hosts, I quickly realized I needed vCenter to demonstrate this and use it the way others would.

That requires vCenter.  VMware has had the vCenter appliance out for a few years now.  It runs on SLES and comes preconfigured.  The only problem is installing it when you have no vCenter client, because today those clients are only made for the Windows operating system.  How to get around this?

ovftool was the thing that did the job for me.  I found the link by reading the ever-prolific Virtual Ghetto post on deploying ovftool on ESXi.  Since I had Linux, installing ovftool on the ESXi host wasn’t necessary for me.  Instead I just installed it on my Linux server (with some trouble, since it deploys this stub and you have to make sure you don’t modify the file).

I ran the command:

ovftool -ds=NFS1 VMware-vCenter-Server-Appliance-5.0.5201-1476389_OVF10.ova vi://root:password@node01

After that, I watched my DHCP server and saw that it gave the vCenter appliance the IP address of 172.20.200.1.  Hopefully you have DHCP or you might be hosed.

Then, after finding the docs, I intuitively opened my web browser to https://172.20.200.1:5480 (everyone knows that port number, right?).  I then logged in with user ‘root’ and password ‘vmware’ and started the auto setup.  After changing the IP address and restarting the appliance, I was pretty golden.

Once configured, log into the appliance at https://172.20.1.101:9443/vsphere-client/ and then be stoked that you have Flash Player already installed and that it works.  Oh, you didn’t have Flash Player installed on your Linux server?  That sucks, I didn’t either.  Guess that’s another hoop we have to jump through.  But wait: then you find that Flash 11.2.0 is the last Flash version released for Linux.  Guess what?  VMware requires Flash version 11.5.  Nice.

https://communities.vmware.com/message/2319263

At this point I just copied a Windows VM that I had lying around and started managing it from there.  The moral of the story is that you can’t do a Windows-free VMware environment.  Sure, I could have done fancy scripting and managed it all remotely with some of their tools, but if I’m going to be doing all that, why should I pay for VMware?  I’d be better off just doing straight native KVM.  YMMV.

FCoE with UCS C-Series

I have in my lab a C210 that I want to turn into an FCoE storage target.  I’ll write more on that in another post.  The first challenge was to get it up with FCoE.  It’s attached to a pair of Nexus 5548s.  I installed RedHat Linux 6.5 on the C210 and booted up.  The big issue I had was that even though RedHat Linux 6.5 comes with the fnic and enic drivers, FCoE never came up.  It wasn’t until I installed the updated drivers from Cisco that I finally saw a FLOGI.  But there were other tricks needed to make the C210 actually work with FCoE.

C210 CIMC

The first step is to go into the CIMC (with the machine powered on) and configure the vHBAs. From the GUI go to:

Server -> Inventory

Then, in the work pane, select the ‘Network Adapters’ tab, and below that select vHBAs.  Here you will see two vHBAs by default.  From here you have to set the VLAN that each vHBA will use.  Click ‘Properties’ on the interface and select the VLAN.  I set the MAC address to ‘AUTO’ based on a TAC case I looked at, but this never persisted.  From there I entered the VLAN: VLAN 10 for the first interface and VLAN 20 for the second interface.  VLAN 10 matches the FCoE VLAN and VSAN that I created on the first Nexus 5548.  On the other Nexus I created VLAN 20 to match FCoE VLAN 20 and VSAN 20.

This then seemed to require a reboot of the Linux Server for the VLANs to take effect.  In hindsight this is something I probably should have done first.

RedHat Linux 6.5

The server needs the Cisco fnic drivers.  You might want to install the enic drivers as well.  I got these from cisco.com.  I used the B-Series drivers; it was a 1.2GB file that I had to download just to get a 656KB driver package.  I installed the kmod-fnic-1.6.0.6-1 RPM.  I had a customer who had updated to a later kernel, and he had to install the kernel-devel RPM and recompile the driver.  After that, it worked for him.

With the C210 I wanted to bond the 10Gb NICs into a vPC.  So I did an LACP bond with Linux.  This was done as follows:

Created file: /etc/modprobe.d/bond.conf

alias bond0 bonding
options bonding mode=4 miimon=100 lacp_rate=1

Created file: /etc/sysconfig/network-scripts/ifcfg-bond0

DEVICE=bond0
IPADDR=172.20.1.1
ONBOOT=yes
NETMASK=255.255.0.0
STARTMODE=onboot
MTU=9000

Edited the /etc/sysconfig/network-scripts/ifcfg-eth2

DEVICE=eth2
MASTER=bond0
SLAVE=yes
HWADDR=58:8D:09:0F:14:BE
TYPE=Ethernet
UUID=8bde8c1f-926f-4960-87ff-c0973f5ef921
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none

Edited the /etc/sysconfig/network-scripts/ifcfg-eth3

DEVICE=eth3
MASTER=bond0
SLAVE=yes
HWADDR=58:8D:09:0F:14:BF
TYPE=Ethernet
UUID=6e2e7493-c1a1-4164-9215-04f0584b338c
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none

Next, restart the network and you should have a bond. You may need to restart it again after you configure the Nexus 5548 side:

service network restart

Nexus 5548 Top
Log in and create the vPCs and so on.  Also, don’t forget to set the jumbo MTU on the system class (9216 in the config below).  I use this for jumbo frames in the data center:

policy-map type network-qos jumbo
class type network-qos class-default
mtu 9216
multicast-optimize
system qos
service-policy type network-qos jumbo

One thing that drives me crazy is that you can’t run sh int po 4 to see that the jumbo MTU is in effect. According to the documentation, you have to run

sh queuing int po 4

to see that your jumbo frames are enabled.

The C210 is attached to ethernet port 1 on each of the switches.  Here’s the Ethernet configuration:

The ethernet:

interface Ethernet1/1
switchport mode trunk
switchport trunk allowed vlan 1,10
spanning-tree port type edge trunk
channel-group 4

The port channel:

interface port-channel4
switchport mode trunk
switchport trunk allowed vlan 1,10
speed 10000
vpc 4

As you can see, VLAN 10 is the FCoE VLAN. We need to create the VSAN and map that VLAN to it:

feature fcoe
vsan database
vsan 10
vlan 10
fcoe vsan 10

Finally, we need to create the vfc for the interface:

interface vfc1
bind interface Ethernet1/1
switchport description Connection to NFS server FCoE
no shutdown
vsan database
vsan 10 interface vfc1

Nexus 5548 Bottom
The other Nexus has a similar configuration.  The difference is that instead of VSAN 10 and VLAN 10, we use VSAN 20 and VLAN 20, and bind the FCoE VLAN to VSAN 20.  In the SAN world, we don’t cross the streams, so the FCoE VLANs are not the same on the two switches.

Notice that in the output below, neither VLAN 10 nor VLAN 20 is carried over the peer link, so you’ll only see VLAN 1 allowed on the vPC:

N5k-bottom# sh vpc consistency-parameters interface po 4

Legend:
Type 1 : vPC will be suspended in case of mismatch

Name Type Local Value Peer Value
------------- ---- ---------------------- -----------------------
Shut Lan 1 No No
STP Port Type 1 Default Default
STP Port Guard 1 None None
STP MST Simulate PVST 1 Default Default
mode 1 on on
Speed 1 10 Gb/s 10 Gb/s
Duplex 1 full full
Port Mode 1 trunk trunk
Native Vlan 1 1 1
MTU 1 1500 1500
Admin port mode 1
lag-id 1
vPC card type 1 Empty Empty
Allowed VLANs - 1 1
Local suspended VLANs - - -

But on each individual switch you’ll see that the VLAN is active in the vPC. VLAN 10 is carrying storage traffic.

# sh vpc 4

vPC status
----------------------------------------------------------------------------
id Port Status Consistency Reason Active vlans
------ ----------- ------ ----------- -------------------------- -----------
4 Po4 up success success 1,10

Success?

How do you know you succeeded?

N5k-bottom# sh flogi database
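--------------------------------------------------------------------------------
INTERFACE VSAN FCID PORT NAME NODE NAME
--------------------------------------------------------------------------------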
vfc1 10 0x2d0000 20:00:58:8d:09:0f:14:c1 10:00:58:8d:09:0f:14:c1

Total number of flogi = 1.

You’ll see the login. If not, try restarting the interface on the Linux side. You should see a different WWPN on each Nexus. Another issue you might have is mismatched VLANs, so make sure you have the right node on the right server.
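
To check the “different WWPN on each Nexus” point without eyeballing it, here is a Python sketch that parses show flogi database text from each switch and confirms both sides have logins with no port WWN in common.  The regular expression is written against the output format shown above and is an assumption; other NX-OS releases may lay out the table differently.  The helper names are hypothetical.

```python
import re

WWN = r"(?:[0-9a-fA-F]{2}:){7}[0-9a-fA-F]{2}"
# One FLOGI row: interface, VSAN, FCID, port WWN, node WWN.
FLOGI_ROW = re.compile(
    rf"^(?P<intf>\S+)\s+(?P<vsan>\d+)\s+(?P<fcid>0x[0-9a-fA-F]+)\s+"
    rf"(?P<pwwn>{WWN})\s+(?P<nwwn>{WWN})"
)

def parse_flogi(text: str) -> list[dict[str, str]]:
    """Return one dict per FLOGI row found in 'show flogi database' output."""
    rows = []
    for line in text.splitlines():
        match = FLOGI_ROW.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows

def fabrics_look_separate(top: str, bottom: str) -> bool:
    """True if both switches have logins and no port WWN appears on both."""
    top_wwpns = {row["pwwn"].lower() for row in parse_flogi(top)}
    bottom_wwpns = {row["pwwn"].lower() for row in parse_flogi(bottom)}
    return bool(top_wwpns) and bool(bottom_wwpns) and top_wwpns.isdisjoint(bottom_wwpns)
```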

Let me know how it worked for you!

ACI Tech Specs

Cisco’s new Application Centric Infrastructure play and the introduction of the Nexus 9000 seems to have turned all this SDN talk in a new direction. Remember  Novell NetWare?  You see, before operating systems came with networking you would buy NetWare software so that your PCs could talk on the network.  But soon operating systems started including the network stack in the core operating systems.  So instead of buying Microsoft Windows 3.1 and NetWare, you just bought Windows 95 and you had all the networking built in.

That’s sort of what’s happened with the Nexus 9000.  Instead of buying a network switch and then buying an SDN component, you just buy the Nexus 9000 and it comes with SDN-like capability and so much more!  I’m pretty happy with what I’ve seen of the Nexus 9000 and what it promises to deliver.  It’s still a ways off.  The Nexus 9000s today run in “standalone” mode, which means the whole SDN portion of it is not there yet.  However, it’s still a very cool platform, and when the SDN portion arrives I’m hoping it will make deploying complex networks intuitive and simple.

But as cool as the Nexus 9000 series is, that’s not the point of this blog post.  The point is to talk about a new iOS app that I have just submitted to Apple for evaluation.  It’s called ACI Tech Specs.

Here are a few things about this app.  It’s a lot like UCS Tech Specs in terms of what it does.  But it’s written from the ground up for iOS 7.  It’s the most complete software project I’ve ever done in my spare time, and it uses modern programming techniques and libraries.  In fact, this will be the basis of the next iteration of UCS Tech Specs (which I’ll try to get out by the end of January).  I’m hoping it will be even more useful than any of my other projects.  It’s more flexible, more responsive, and looks cleaner than anywhere else you can get this information.

Why You’ll Love it

The size is tiny compared to UCS Tech Specs.  You’ll be able to download it over your mobile connection instead of needing Wi-Fi.  That was the trouble with UCS Tech Specs: the app was too big.  It was big because it had tons of pictures bundled in.  This one bundles no pictures; they’re downloaded on demand.

Another reason you’ll love it is that changes take place instantly.  No more updating your library by going to the obscure ‘i’ button and clicking ‘update library’.  The app checks for updates every time you open a page.  It’s demand driven.  This way, if you email me telling me I forgot something, misspelled something, or that you’d really like to get more information on a certain part of the products, then I can go to my back end and do it, and it will just show up.

If you are offline (e.g., you’re in airplane mode) and have already visited a page, the data will all be there.  The data is stored on the phone (including pictures) but downloaded as you use the app.

Why you might not like it

I’m trying to get better at figuring out how the app is being used.  As such, I am tracking you.  But not NSA style.  When you install my app, it puts a random ID inside your phone’s folder.  That random ID allows me to uniquely identify your device (not you, or anything else).  I’ll also be capturing what type of device you have and storing that.  I won’t be storing anything like names, passwords, etc.  I do this because I want to know how many unique devices are using the app and what types of devices they are.  If I find out more people are using iPads than iPhones, then I’ll redouble my efforts on the iPad.  (This first release will only support iPhone.)  So hopefully that doesn’t make you too nervous.  If I do lose the data, then all anyone will have are random strings and model types.  They’ll know nothing else about you.

You also might not like that an Internet connection is required to get the app going.  The first time you open each page, you’ll find that it updates.  If you’re on an airplane and you haven’t opened a certain page before, that page will appear blank until you open it again with an Internet connection (cellular or Wi-Fi).  I made this choice because I wanted more people to download it without requiring Wi-Fi (so the download wasn’t so huge), and because I’m usually always connected anyway and most of my users fit my type of user profile.  (But maybe people will hate it; we’ll see.)  I did spend a lot of time on caching the data and synchronizing seamlessly, so any updates I make on the core server will show up in the app.

I’m also curious to see what type of scaling problems I run into.  I’ve got one server running a Ruby on Rails application on the back end that serves up the JSON consumed by the iOS app.  If the server goes down, then nobody will get an update.  So if you open up the app and you are staring at a blank page, let me know and I’ll see what’s going on with my server.  I think I’m really going to have to scale this out, and that may be the next big task I tackle.  It’s running on a friend’s server at xmission, and I may need to migrate to AWS or something if I have issues with scaling.

What’s Next?

The back end still needs a lot more data.  I’ll be putting in more information about the platforms as they become available.  The nice thing is that in my role, it’s my job to be up to date on the latest Nexus 9000 products, so expect this app to be pretty much up to date along with the next UCS Tech Specs when it comes out.

Also, there will be a native Android application that I’ll release on Google Play, hopefully by mid-2014 at the latest.  This will be the first time I’ve written an Android application, so thanks for hanging in there.  That is part of the reason I spent so much time on the back end.  In fact, of the whole development process, from creating the back end to the iOS client, I spent 70% of the time on the back end.  The iOS client only took about 2 months to write, whereas I spent all summer on the back end and had to modify it as I wrote the app.

Finally, I would like to add a live Twitter feed showing all ACI-related posts and make the app more interactive, but I think the rest of 2014 will be focused on Android, scaling, and seeing if this thing floats.  I hope you love it!  If not, let me know what sucks: vallard@benincosa.com  I’m all ears.

Changing UCS IP addresses

I have a UCS lab machine that I sometimes take to different locations for proof of concept work.  One of the things I regularly have to do is change the IP addresses and hostname.  Here’s how you do it on the command line:

KCTest-A# scope fabric-interconnect a
KCTest-A /fabric-interconnect # set out-of-band ip 10.1.1.23 netmask 255.255.255.0 gw 10.1.1.1
Warning: When committed, this change may disconnect the current CLI session
KCTest-A /fabric-interconnect* # scope fabric-interconnect b
KCTest-A /fabric-interconnect* # set out-of-band ip 10.1.1.24 netmask 255.255.255.0 gw 10.1.1.1
Warning: When committed, this change may disconnect the current CLI session
KCTest-A /fabric-interconnect* # scope system
KCTest-A /system* # set virtual-ip 10.1.1.25
KCTest-A /system* # set name ccielab
KCTest-A /system* # commit-buffer

It’s great because you can change the IP addresses of both fabric interconnects, the virtual IP, and the system name in one shot with a single commit-buffer.
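
Because a typo in these values can cut off your CLI session as soon as you commit, it may be worth sanity-checking the plan first.  This Python sketch (the function is hypothetical, not part of UCS Manager) uses the standard ipaddress module to confirm that both fabric interconnect addresses and the virtual IP sit in the gateway’s subnet and don’t collide with each other or the gateway.

```python
import ipaddress

def check_ucs_mgmt(fi_a: str, fi_b: str, vip: str, netmask: str, gateway: str) -> list[str]:
    """Return problems with a planned UCS out-of-band addressing scheme (empty list = OK)."""
    subnet = ipaddress.ip_network(f"{gateway}/{netmask}", strict=False)
    planned = {"fabric-interconnect a": fi_a, "fabric-interconnect b": fi_b, "virtual-ip": vip}
    problems = []
    for name, addr in planned.items():
        if ipaddress.ip_address(addr) not in subnet:
            problems.append(f"{name} {addr} is outside {subnet}")
    # Three management addresses plus the gateway should be four distinct IPs.
    if len(set(planned.values()) | {gateway}) != 4:
        problems.append("addresses must be unique and must not reuse the gateway")
    return problems

if __name__ == "__main__":
    print(check_ucs_mgmt("10.1.1.23", "10.1.1.24", "10.1.1.25", "255.255.255.0", "10.1.1.1"))
```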

Source of docs

1000v in and out of vCenter

I was setting up the Nexus 1110 (aka: virtual service appliance, aka: VSA) with one of our best customers, and while we were doing it the appliance rebooted and never came up again until we completely reinstalled the firmware from remote media.  Most of this was probably my fault because I didn’t follow the docs exactly, and I think we can now move forward, but it made me realize I hadn’t written down an important procedure: how to reconnect to an orphaned 1000v from a new virtual supervisor module (VSM).
Here’s the situation:  When you lose the 1000v that is connecting into vCenter, there is no way to remove the virtual distributed switch (VDS or DVS) that the 1000v presented to vCenter.  You can remove hosts from the DVS but you can’t get rid of that switch.
In the above picture, there is my DVS.  If I try to remove it, I get the following error:
In my case, I didn’t want to get rid of it, I just wanted to reconnect a new VSM that I created with the same name.  But this operation can be used to remove the 1000v DVS from vCenter as well.
So here’s how you do it:
Adopt an  Orphaned Nexus 1000v DVS
Step 1.  Install a VSM
I usually install mine manually, so that it doesn’t try to register with vCenter or one of the hosts.  Don’t do any configuration other than an IP address.  Just get it so that you can log in.  Once you can log in, if you did create an SVS connection, you’ll need to disconnect it.  In mine, I made an SVS connection and called it vcenter.  To disconnect from vCenter and erase the SVS connection, run:
# config
# svs connection vcenter
# no connect
# exit
# no svs connection vcenter
Trivia: What does SVS stand for?  “Service Virtual Switch”
Step 2.  Change the hostname to match what is in vCenter
Looking at the error picture above, you can see there is a folder named nexus1000v with a DVS named nexus1000v.  To make vCenter think that this new 1000v is the same one, we need to change the hostname to match what is in vCenter:
nexus1000v-a# conf
nexus1000v-a(config)# hostname nexus1000v
nexus1000v(config)#
Step 3.  Build SVS Connection
Since we destroyed (or never built) the SVS connection in step 1, we’ll need to build one and try to connect.  The SVS connection should have the same name as the one you created when you first made your SVS.  So if you called your SVS ‘vCenter’, or ‘VCENTER’, or ‘VMware’, then you’ll need to name it the same thing.  I named mine ‘vcenter’, so that’s what I use.  Similarly, you’ll have to use the same datacenter-name as before.
nexus1000v(config)# svs connection vcenter
nexus1000v(config-svs-conn)# remote ip address 10.93.234.91 port 80
nexus1000v(config-svs-conn)# vmware dvs datacenter-name Lucky Lab
nexus1000v(config-svs-conn)# protocol vmware-vim
nexus1000v(config-svs-conn)# max-ports 8192
nexus1000v(config-svs-conn)# admin user n1kUser
nexus1000v(config-svs-conn)# connect
ERROR:  [VMware vCenter Server 5.0.0 build-455964] Cannot create a VDS of extension key Cisco_Nexus_1000V_1169242977 that is different than that of the login user session Cisco_Nexus_1000V_125266846. The extension key of the vSphere Distributed Switch (dvsExtensionKey) is not the same as the login session’s extension key (sessionExtensionKey)..
Notice that when I tried to connect, I got an error.  This is because the extension key in my new Nexus 1000v (which was created when it was installed) doesn’t match the old one.  The nice thing is that I can actually change it, and that is how I make this new 1000v take over from the old one.

Step 4.  Change the extension key to match what is in vCenter.
To see what the current extension-key is (or the offending key is) run the following command:
nexus1000v(config-svs-conn)# show vmware vc extension-key
Extension ID: Cisco_Nexus_1000V_125266846
That is the one we need to change.  You can see the extension key that vCenter wants in the error message from the previous step: ‘Cisco_Nexus_1000V_1169242977’.  So we need to make the extension key on the 1000v match that.  No problem:
nexus1000v(config-svs-conn)# no connect
nexus1000v(config-svs-conn)# exit
nexus1000v(config)# no svs connection vcenter
nexus1000v(config)# vmware vc extension-key Cisco_Nexus_1000V_1169242977
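
If you do this often, copying the expected key out of the error text by hand gets tedious.  The Python sketch below extracts both keys from the vCenter error message shown in Step 3 and builds the Step 4 command sequence.  The regular expression is written against that one message and is an assumption; other vCenter builds may word the error differently.  The function names are hypothetical.

```python
import re

# Written against the vCenter 5.0 error text shown in Step 3; other builds may differ.
MISMATCH = re.compile(
    r"extension key (?P<wanted>\w+) that is different than that of the "
    r"login user session (?P<current>\w+)"
)

def extension_keys(error: str) -> tuple[str, str]:
    """Return (key vCenter expects, key the new VSM is using) from the connect error."""
    match = MISMATCH.search(error)
    if match is None:
        raise ValueError("not an extension-key mismatch error")
    return match.group("wanted"), match.group("current")

def step4_commands(error: str, connection: str = "vcenter") -> list[str]:
    """Build the Step 4 command sequence that adopts the key vCenter expects."""
    wanted, _current = extension_keys(error)
    return [
        f"svs connection {connection}",
        "no connect",
        "exit",
        f"no svs connection {connection}",
        f"vmware vc extension-key {wanted}",
    ]
```

Paste the error into step4_commands() and you get the lines to type on the VSM; the first element of extension_keys() is also the key you would later unregister in the Managed Object Browser.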

Now recreate the SVS connection as in Step 3 and connect again; this time it should succeed, and things should run as before.

Step 5. (Optional) Remove the 1000v

If you’re just trying to remove the 1000v because you had that orphaned one sitting around, simply disconnect from vCenter now:

nexus1000v(config)# svs connection vcenter
nexus1000v(config-svs-conn)# no connect
nexus1000v(config-svs-conn)# connect
nexus1000v(config-svs-conn)# no vmware dvs
This will remove the DVS from the vCenter Server and any associated port-groups. Do you really want to proceed(yes/no)? [yes] yes

Now the orphaned Nexus 1000v is gone.  If you also want to remove it from your vCenter plugins, you will have to use the Managed Object Browser to unregister its extension key.  Not a big deal.  Open a web browser to the host that runs vCenter (e.g. http://10.93.234.91) and choose “Browse objects managed by vSphere”.  From there go to “content”, then “Extension Manager”.  To unregister the 1000v plugin, select “UnregisterExtension” and enter the vCenter extension key.  This is the same extension key that you used in step 4 (in our example: Cisco_Nexus_1000V_1169242977).
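The same unregister step can be scripted against the vSphere API instead of clicking through the browser.  The sketch below assumes the pyVmomi library, a release that still supports your vCenter version, and a lab where skipping certificate checks is acceptable.  The credentials passed to main() are placeholders.  The ExtensionManager calls (FindExtension, UnregisterExtension) are the same operations the browser exposes.

```python
def unregister_extension(content, key):
    """Unregister a vCenter extension (plugin) by key.

    `content` is a vSphere ServiceContent object.  Returns True if the key
    was registered and has been removed, False if it was not found.
    """
    manager = content.extensionManager
    if manager.FindExtension(key) is None:
        return False
    manager.UnregisterExtension(key)
    return True


def main(host, user, pwd, key):
    # Imported here so the helper above can be used without pyVmomi installed.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect

    ctx = ssl._create_unverified_context()  # lab only: skips certificate checks
    si = SmartConnect(host=host, user=user, pwd=pwd, sslContext=ctx)
    try:
        removed = unregister_extension(si.RetrieveContent(), key)
        print("removed" if removed else "extension not registered")
    finally:
        Disconnect(si)
```

For example: main("10.93.234.91", "<user>", "<password>", "Cisco_Nexus_1000V_1169242977").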

Hope that helps!

Cloud Computing: How Do I Get There?

This post comes from a talk that I’ll be presenting at the Pacific Northwest Digital Government Summit Conference on October 2nd, 2013.

History shows us that those who embrace technology and change survive, while those who resist and stick with “business as usual” get left behind.  If we have the technology and we don’t use it to make IT look like magic, then we’re probably doing it wrong.  (Read “The Innovator’s Dilemma” and Clarke’s Three Laws.)

I’ll be talking mainly about private cloud today, but many of these ideas can be taken into the public cloud as well.

Optimizing ROI on your Technology

My friend tells a story about when his wife first started using an iPhone.  To get directions on a map she’d open up Safari and go to http://maps.google.com.  To check Facebook she would open Safari and go to http://facebook.com.  To check her mail she’d open up Safari again and navigate to http://gmail.com.  You get the idea.

She was still getting great use of her iPhone.  She could now do things she could never do before.  But there was a big part she was missing out on.  She wasn’t using the App ecosystem that makes all of these things easier and delivers a richer experience.

Today, most organizations have virtualization in the data center.  Because of this, IT is able to do things it has never been able to do before.  They’re shrinking their server footprints to once unimaginable levels, saving money in capital and management costs.  I’ve been in many data centers where people proudly point to where rows of racks have been consolidated into one UCS domain with only a few blades.  It’s pretty cool and very impressive.

But they’re missing something as big as the App Store.  They’re missing out on the APIs.  This is where ROI is not being optimized in the data center in a big way.

IT is shifting (or has shifted) to a DevOps model. DevOps means that your IT infrastructure team is more tightly aligned with your developers/application people.  This is a management perspective.  But from a trenches perspective, the operations team is now turning into programmers.  Programmers of the data center.  The guy that manages the virtual environment, the guy who adds VLANs to switches, or the guy who creates another storage LUN: they’re all being told to automate and program what they do.

The group now treats the IT infrastructure like an application that is constantly adding features and doing bug fixes.

The programming of the IT infrastructure isn’t done in compiled languages like Java, C, or C++.  It’s done in interpreted languages like Python, Ruby, Bash, PowerShell, etc.  But the languages alone don’t get you there.  You need a framework.  This is where things like Puppet or Chef come into play.  You can even look at it as programming a data center operating system, and OpenStack provides a framework for developing that data center operating system.  It’s analogous to the web application development world: Twitter was originally developed in Ruby using a framework called Ruby on Rails.  (Twitter has since moved off Ruby on Rails.)

Making this shift gives you unprecedented speed, agility, and standardization.  Those that don’t do it will find their constituents looking elsewhere for IT services that can be delivered faster and cheaper.

The IT assembly line

It’s hard for people to think of their IT professionals as assembly line workers.  After all, they are doing complex things like installing servers, configuring networks, and updating firmware.  These are CCIEs, VCPs, and storage gurus.  But that’s actually what people in the trenches are: workers on the virtual assembly line.  IT managers should look at the way work enters the assembly line, understand the bottlenecks, and track how long it takes to get things through the line.  Naturally, exceptions crop up.  But for the most part, the work required to deliver applications to the business is made up of repetitive tasks.  They’re just complicated, multi-step, repetitive tasks.

To start with, we need to look at the common requests that come in:  Creating new servers, deploying new applications, delivering a new test environment.  Whatever it is, management really needs to understand how it gets done, and look at it like the manufacturing foreman sitting above the plant, looking down and watching a physical product make its way through.  Observe which processes are in place, where they are being side stepped, or where they don’t exist at all.

As an example, consider all the steps required to deploy a server.  It may look something like the flowchart below:

That sure looks like an assembly line to me.  If you can view work that enters the infrastructure like an assembly line, you can start measuring how long it takes for certain activities to get done.  Then you can figure out ways to optimize.
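Measuring the line doesn’t need special tooling to start.  If each job records when it entered and left each stage, the average time per stage points at the bottleneck.  Below is a minimal sketch under that assumption.  The stage names and timestamps in the example are made up for illustration, not taken from the flowchart.

```python
from collections import defaultdict
from datetime import datetime


def stage_durations(events):
    """Average hours spent in each stage.

    events: list of (job_id, stage, start_iso, end_iso) tuples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _job, stage, start, end in events:
        elapsed = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        totals[stage] += elapsed.total_seconds() / 3600
        counts[stage] += 1
    return {stage: totals[stage] / counts[stage] for stage in totals}


def bottleneck(events):
    """The stage with the longest average time is the first place to look."""
    averages = stage_durations(events)
    return max(averages, key=averages.get)


if __name__ == "__main__":
    # Hypothetical jobs moving through hypothetical stages.
    events = [
        ("srv-1", "rack and cable", "2013-09-02T09:00", "2013-09-02T13:00"),
        ("srv-1", "VLAN request", "2013-09-02T13:00", "2013-09-04T13:00"),
        ("srv-1", "OS install", "2013-09-04T13:00", "2013-09-04T15:00"),
        ("srv-2", "VLAN request", "2013-09-03T09:00", "2013-09-03T21:00"),
    ]
    print(stage_durations(events))
    print("bottleneck:", bottleneck(events))
```

Averages hide variation.  Once there is enough history, looking at the slowest jobs per stage is a reasonable next step.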

Standardization of the Infrastructure

Manufacturing lines optimize throughput by standardizing processes and equipment.  When I hear VMware tell everybody that “the hardware doesn’t matter”, I take exception.  It matters.  A lot.  Just like your virtualization software matters.  Cisco and other hardware vendors come at it from the opposite direction and say “the hypervisor doesn’t matter, we’ll support them all”.  What all parties are really telling you is that they want you to standardize on them.  Each is trying to prove its value in a private cloud.

What an organization will standardize on depends on a lot of things: Budget, skill set of Admins, Relationship with vendors and consultants, etc.  In short, when considering the holy trinity of the data center: Servers, Storage, & Networking it usually gets into a religious discussion.

But whatever you do, the infrastructure needs to be robust.  This is why Converged Infrastructures like Vblocks, FlexPods, and other reference architectures have become popular.  The “One-Piece-At-A-Time” accidental, cobbled-together architecture is not a good play.

Consider the analogy that a virtualized workload is cargo on a semi truck.  Do you want that truck running on a solid 6-lane government highway like I-5, or do you want that cargo traveling at 60mph over a rickety bridge?

This?

Or This?

Similarly, if your virtualization team doesn’t have strong Linux skills, you probably don’t want them running OpenStack on KVM.  That’s why VMware and Hyper-V are so popular: they are a better match for most teams’ skill levels.

What to Standardize On?

While the choice of infrastructure standardization is a religious one, there are role models we can look to when deciding.  Start by looking at the big boys, the people you aspire to be when you grow up.  Who are the big boys running a world-class IT-as-a-service infrastructure?  AWS, RackSpace, Yahoo, Google, Microsoft, Facebook, right?

What are they standardizing on?  Chances are it’s not what your organization is doing.  Instead of VMware, Cisco, IBM, HP, Dell, EMC, NetApp, etc., they’re using open source, building their own servers, and using their own distributed filesystems.  They can do this because they have made a large investment in DevOps teams that are able to put these things together.

A State organization that has already standardized on a FlexPod or Vblock with VMware is not going to throw away what they’ve done and start over just so they can match what the big boys do.  However, as they move forward, perhaps they can make future decisions based on emulating these guys.

Standardize Processes

The missing part is standardizing the processes once the infrastructure is in place.  Standardization is tedious because it involves looking at every detail of how things are done.  One of my customers has a repository of documentation they use every time they need to do something to their infrastructure.  For example, 2 weeks ago we added new blade servers to the UCS.  He pulled out the document and we walked through it.  We still modified a few things in the documentation, but for the most part the steps were exact.

Unfortunately, this was only one part of the process.  The Networking team had their own way of keeping notes (or not at all) on how to do things.  So the processes were documented in separate places.  What the IT manager needs to do is make sure they understand how the processes (or work centers) are put together and how long each one takes.

The manager should have their own master process plan to track work through the system.  (The system being the different individuals doing the work.)  This is what is meant by “work flow”.  Even if it is tracked by hand, or with a Gantt chart as is common, the manager should understand it.

Each job that comes in should get its own workflow (or Gantt chart) and be entered into something like a Kanban board.  Once you understand this for the common requests, you can see how many one-offs there are.

Whether these requests are for public cloud or private cloud, there is still a workflow.  It is an iterative process that may not be complete the first few times it is done, but it will get better over time.  There is a great book called “The Phoenix Project” about an IT staff that starts to standardize, and gets development and operations working together to improve their processes.  Its ideas are based on an earlier business classic called “The Goal”.

Automate the Processes

Once the processes are known, we turn our assembly line workers into programmers of the processes.  I used to work as a consulting engineer helping deploy High Performance Computing clusters.  On several occasions the RFPs required that the cluster could be deployed from scratch in less than 1 hour: from bare metal to running jobs.  We created scripts that would deploy the OS, customize the user libraries, and even set up a job queuing system.  It was pretty amazing to see 1,200 bare metal rack mount servers do that.  After we left, if the customer had a problem with a server, they could replace it, plug it in, and walk away.  The system would self-provision.

While that was (and still is) a complicated process, it is simpler than what virtualization has done to the management of the data center.  We never had to mess with the network once it was set up.  Workflows for a new development environment are pretty common and require provisioning several VMs with private networks and their own storage.  However, the same method of scripting the infrastructure can still be applied.  It just needs to be orchestrated.

Automate and Orchestrate with a Framework

Back when we did HPC systems, we used an open source management tool called xCAT.  That was the framework by which we managed the datacenter.  The tool had capabilities but really what it gave us was a framework to insert our customizations or our processes that were specific for each site.  The tool was an enabler of the solution, not the solution itself.

Today there are lots of “enterprise” private cloud management tools.  In fact, any company that wants to sell a “Private Cloud”  will have its own tool.  VMware vCloud Director, HP Cloud System, IBM Cloudburst, Cisco UCS Director, etc.  All of these products, regardless of how they are sold should be regarded as frameworks for automating your processes.

At a recent VMUG, the presenter asked “How many people are using vCloud Director or any other cloud orchestration tool?”  Nobody raised their hand.  Based on what I’ve seen, it’s because most organizations haven’t yet standardized their IT processes.  There is no need for orchestration if you don’t know what you’re orchestrating.

Usually each framework will come with a part or all of what Cisco calls the “10 domains of cloud” which may include: A self service portal, chargeback/showback, service catalog, security, etc.  If you are using a public cloud, you are using their framework.

Once you select one, you’ll need to get the operations teams (network, storage, compute, virtualization) to sign off on the tool and use it.  It’s not just a server thing.  Each part of the assembly line needs to use it.

Once the individual components are entered into the framework, orchestration comes into play.  To start with, codify the most common workloads: creating a VLAN, carving out a LUN, provisioning a VM, etc.

To orchestrate means to arrange or control the elements of something so as to achieve a desired overall effect.  With the framework, we are looking to automate all of the components to deliver a self-service model to our end customer.

Self Service and Chargeback

Once we have the processes codified in the framework, we can present a catalog to our users.  For a self-service portal, we recommend not making it completely automated at first.  With some frameworks, as a workload moves through the automated assembly line, it can send an email to the correct IT department to validate whether it can move on.  For example, if the user’s workflow includes a new VLAN for their VM environment, the networking administrator receives an email and can approve or deny it.  This way the workflow is monitored, the requester knows where they are in the queue, and once the step is approved, it is created automatically and passed along to the next station on the assembly line.
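The approval gate described above is essentially a list of steps, some of which wait on a team’s sign-off before they run.  Below is a minimal sketch of that idea, assuming approvals arrive as a set of approved step names.  Sending the email and recording the reply are left out, and the class and step names are hypothetical, not any particular framework’s API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    approver: Optional[str] = None  # team that must sign off first, if any


@dataclass
class Request:
    steps: List[Step]
    position: int = 0
    log: List[str] = field(default_factory=list)

    def advance(self, approvals):
        """Run steps in order until one needs an approval not yet granted.

        `approvals` is the set of step names that have been approved.
        Returns the name of the step being waited on, or None when done.
        """
        while self.position < len(self.steps):
            step = self.steps[self.position]
            if step.approver and step.name not in approvals:
                self.log.append(f"waiting on {step.approver}: {step.name}")
                return step.name
            step.action()
            self.log.append(f"done: {step.name}")
            self.position += 1
        return None
```

A real framework would persist the request and call advance() again whenever an approval comes in.  The requester can read the log to see where they are in the queue.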

For chargeback, the recommendation is to keep the menu small, and the price simple.

Security all throughout then Monitor, Rinse, and Repeat

More workflows will come into the system, and the catalog will need continuous updates and revisions.  This is the programmable data center.  Each iteration should be checked into a code repository, the same way application developers use systems like github.com to store code updates.  You will have to do bug fixes and patch up any exposed holes.  Virtualization also makes it possible to integrate more software security services like the ASA 1000v or the VSG.

Action Items

  • Realize that your IT infrastructure is a collection of APIs waiting to be harnessed and programmed.  Challenge the people you work with to learn to use those APIs to automate their respective areas of expertise.
  • Optimize the assembly line by understanding the workflows.  Any manufacturing manager can tell you the throughput of the system.  An IT manager should be able to tell you the same thing about their system.  Start by understanding the individual components, how long it takes, and where the bottlenecks in the system are.
  • Standardize your infrastructure with a solid architecture.  Converged architectures are popular for a reason.  Don’t reinvent the wheel.
  • Standardizing processes is the hardest part.  Start with the most common.  These are usually documented.  Take the documentation and think how you would change it into code.
  • Program the data center using a framework.  Most of the work will have to be done in house or with service contracts.  The framework could be a vendor’s cloud software or something free like OpenStack.


Quick SPAN with the Nexus 1000v

Today I thought I’d take a look at creating a SPAN session on the 1000v to monitor traffic.  I found it really easy to do!  SPAN is one of those things that takes longer to read about and understand than to actually configure.  I find that true with a lot of Cisco products: FabricPath, OTV, LISP, etc.

SPAN stands for “Switched Port Analyzer”.  It’s basically port monitoring: you capture the traffic going to and from one port and mirror it to another.  This is one of the benefits you get out of the box with the 1000v, and it keeps your VMs from being a big black box to the network administrator.

To follow the guide, I installed 3 VMs: iperf1, iperf2, and xcat.  The idea was to monitor traffic between iperf1 and iperf2 from the xcat virtual machine.

On the xcat virtual machine I created a new interface and put it in the same VLAN as the other VMs.  These were all on my port-profile called “VM Network”.  I created it like this:

conf
vlan 5
port-profile type vethernet "VM Network"
vmware port-group
switchport mode access
switchport access vlan 510
no shutdown
state enabled

Then, using vCenter, I edited the VMs to assign them to that port group.  (Remember: VMware Port-Group = Nexus 1000v Port-Profile.)

On the Nexus 1000v, run the following command:

# sh interface virtual

-------------------------------------------------------------------------------
Port     Adapter         Owner              Mod  Host
-------------------------------------------------------------------------------
Veth1    vmk3            VMware VMkernel    4    192.168.40.101
Veth2    vmk3            VMware VMkernel    3    192.168.40.102
Veth3    Net Adapter 1   xCAT2              3    192.168.40.102
Veth4    Net Adapter 2   iPerf2             3    192.168.40.102
Veth5    Net Adapter 3   xCAT               3    192.168.40.102
Veth6    Net Adapter 2   iPerf1             3    192.168.40.102

This shows which vethernet interface is assigned to which VM.  For this SPAN session, I wanted to monitor the traffic of iPerf1 (Veth6) from the xCAT VM (Veth5).
No problem:

Create The SPAN session

To do this, we just configure a SPAN session:

n1kv221(config)# monitor session 1
n1kv221(config-monitor)# source interface vethernet 6 both
n1kv221(config-monitor)# destination interface vethernet 5
n1kv221(config-monitor)# no shutdown

As you can see above, I’m monitoring both received and transmitted packets on vethernet 6 (iPerf1) and mirroring them to vethernet 5 (xCAT).  If you have an IP address on xCAT’s vethernet 5 interface, you’ll find you can no longer ping it, because the port is now a SPAN destination.  Notice also that the monitor session is shut down by default; you have to turn it on with no shutdown.
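If you set up SPAN sessions often, the lookup from VM name to vethernet number can be scripted from the show interface virtual output.  Below is a minimal sketch.  It assumes the column layout shown earlier (Port, Adapter, Owner, Mod, Host) and VM names without spaces, so the owner is the third field from the end.  The function names are hypothetical, and the result is just the command list used above.

```python
def veth_by_owner(show_output):
    """Map owner (VM name) -> list of vethernet numbers."""
    mapping = {}
    for line in show_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[0].startswith("Veth"):
            continue  # skip headers, separator lines, and blanks
        owner = fields[-3]
        mapping.setdefault(owner, []).append(int(fields[0][len("Veth"):]))
    return mapping


def span_commands(show_output, source_vm, dest_vm, session=1, direction="both"):
    """Build the monitor-session commands for a source VM and a capture VM."""
    mapping = veth_by_owner(show_output)
    ports = {}
    for vm in (source_vm, dest_vm):
        found = mapping.get(vm, [])
        if len(found) != 1:
            raise ValueError(f"{vm!r} matches {len(found)} vethernets; pick one by hand")
        ports[vm] = found[0]
    return [
        f"monitor session {session}",
        f"source interface vethernet {ports[source_vm]} {direction}",
        f"destination interface vethernet {ports[dest_vm]}",
        "no shutdown",
    ]
```

A VM with several vNICs shows up more than once in the table, which is why the sketch refuses to guess and raises an error instead.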

Now we want to check things out:

n1kv221(config-monitor)# sh monitor
Session  State        Reason                  Description
-------  -----------  ----------------------  --------------------------------
1        up           The session is up
n1kv221(config-monitor)# sh monitor session 1
session 1
---------------
type              : local
state             : up
source intf       :
    rx            : Veth6
    tx            : Veth6
    both          : Veth6
source VLANs      :
    rx            :
    tx            :
    both          :
source port-profile :
    rx            :
    tx            :
    both          :
filter VLANs      : filter not specified
destination ports : Veth5
destination port-profile :

Now you’ll probably want to actually watch the traffic on that port, right?  I installed Wireshark on my xcat VM (it’s Linux, so yum -y install wireshark and ride).  To watch from the command line, I first ran:

[root@xcat ~]# tshark -D
1. eth0
2. eth1
3. eth2
4. eth3
5. any (Pseudo-device that captures on all interfaces)
6. lo

This lists the capture interfaces.  By matching MAC addresses, I can see that eth2 (device 3 in the tshark output) is the interface connected to the Nexus 1000v.
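The MAC-matching step can be done with a few lines of code on the Linux guest, reading each interface’s MAC from sysfs.  NX-OS usually prints MACs in dotted form (0050.56xx.xxxx) while Linux uses colons, so the sketch compares only the hex digits.  It assumes you already have the MAC the 1000v reports for the destination veth.  The MACs in the usage example are made up.

```python
import os

HEX_DIGITS = set("0123456789abcdef")


def normalize_mac(mac):
    """'0050.569C.AA02', '00:50:56:9c:aa:02' and '00-50-...' compare equal."""
    return "".join(c for c in mac.lower() if c in HEX_DIGITS)


def local_macs(sys_net="/sys/class/net"):
    """Read interface -> MAC from sysfs on a Linux guest."""
    macs = {}
    for ifname in os.listdir(sys_net):
        try:
            with open(os.path.join(sys_net, ifname, "address")) as f:
                macs[ifname] = f.read().strip()
        except OSError:
            continue  # interface vanished or has no address file
    return macs


def interface_for_mac(target_mac, macs):
    """Return the interface whose MAC matches target_mac, or None."""
    want = normalize_mac(target_mac)
    for ifname, mac in sorted(macs.items()):
        if normalize_mac(mac) == want:
            return ifname
    return None
```

On the VM itself you would call interface_for_mac("<MAC from the 1000v>", local_macs()).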

From here I run:

[root@xcat ~]# tshark -i 3 -R "eth.dst eq 00:50:56:9C:3B:13"
0.000151 192.168.50.151 -> 192.168.50.152 ICMP Echo (ping) reply
1.000210 192.168.50.151 -> 192.168.50.152 ICMP Echo (ping) reply
2.000100 192.168.50.151 -> 192.168.50.152 ICMP Echo (ping) reply
...

Then I get a long list of fun stuff to monitor. By pinging between iperf1 and iperf2 I can see all the traffic that goes on. Since there was nothing else on this VLAN it was pretty easy to see. Hopefully this helps me or you troubleshoot down the road.

MediaWiki Installation on RedHat 5.5

In a modern data center, things like IPs, user accounts, passwords, and such that you used to keep in Excel spreadsheets should be rolled into the management tools.  That way, you always have the most current information.  Static Word and Excel documents are old news.  Today you can see those things start to get rolled up into vCloud Director, OpenStack, and others.  But for now, most people are still using Excel spreadsheets.

This is stupid.  Please, at least use a wiki.  Catch up to 2005.

MediaWiki is one that I’ve used for years.  It’s easy to install and use, and the syntax doesn’t take too long to learn.

Here’s how I set it up:

1.  Download MediaWiki on your Linux Server

Go to MediaWiki and download the latest stable release.

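cd /var/www/html
rm -rf *    # careful: this wipes everything already in the web root
wget http://download.wikimedia.org/mediawiki/1.21/mediawiki-1.21.1.tar.gz
tar zxvf mediawiki-1.21.1.tar.gz
mv mediawiki-1.21.1/* .    # move the wiki files up into the web root
rm -rf mediawiki-1.21.1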

2.  Installing the Linux Environment

Get PHP and MySQL installed on your server.  My server is a Red Hat 5.5 virtual machine (yes, old) that I’ve had for about 2 years, and I haven’t updated it to 6.x.  The easiest thing to do would be to install a new server; CentOS 6.4 might be good.  But a challenge every now and then is fun, yeah?  To get MediaWiki working, you need at least PHP 5.3.x, and to get that I had to update my OS.  Since I didn’t have my Red Hat subscription set up correctly, I used the CentOS repositories to update instead.  That was pretty easy.  I just did this:

wget http://mirror.centos.org/centos/5/os/x86_64/CentOS/centos-release-5-9.el5.centos.1.x86_64.rpm
wget http://mirror.centos.org/centos/5/os/x86_64/CentOS/centos-release-notes-5.9-0.x86_64.rpm
rpm -ql -p centos-release-5-9.el5.centos.1.x86_64.rpm # just to see what was in it; yep, it's got the repo!
rpm -Uvh centos-release-5-9.el5.centos.1.x86_64.rpm centos-release-notes-5.9-0.x86_64.rpm # install repos

From here, I removed my older versions of PHP.  To list the installed MySQL and PHP packages:

rpm -qa | grep mysql
rpm -qa | grep php

Then I removed the old ones with:

yum -y remove <package names from the lists above>

Then I updated everything:

yum -y update

This took a while. Finished, came back. Everything updated. Now I installed the right packages:

yum -y install php53 php53-mysql mysql-server php53-xml

There may be several other RPMs you’ll need as dependencies, but that should get you started.  That’s how I got it up and running.  Don’t forget to start MySQL, restart Apache, and enable both at boot:

service httpd restart
service mysqld restart
chkconfig --level 345 httpd on
chkconfig --level 345 mysqld on

3.  Configuring via the Web Interface

Once that’s done, go to http://<yourserver>/

You should see:

4.  Creating Content

Going to the next page, it’ll start asking you questions, and eventually you’ll have yourself a wiki set up.  The first thing I did was add a table for IP addresses.  It ended up looking like this:

This is good and helps us know where things are.  I started to create several pages for different VLANs.  They have to be kept up to date by hand, and I wish they updated in place.  Not the best, but OK for now.
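If your IP list currently lives in a spreadsheet, you don’t have to retype it.  Export it to CSV and convert it to MediaWiki table markup ({| ... |} with ! header cells and | data cells).  Below is a minimal sketch, assuming the first CSV row holds the column names.  The sample rows are made up, and cells containing a pipe character would need escaping that this sketch does not do.

```python
import csv
import io


def wiki_table(headers, rows, css_class="wikitable"):
    """Render headers and rows as MediaWiki table markup."""
    lines = [f'{{| class="{css_class}"', "! " + " !! ".join(headers)]
    for row in rows:
        lines.append("|-")  # new table row
        lines.append("| " + " || ".join(str(cell) for cell in row))
    lines.append("|}")
    return "\n".join(lines)


def csv_to_wiki(csv_text):
    """First CSV row becomes the header row."""
    headers, *rows = list(csv.reader(io.StringIO(csv_text)))
    return wiki_table(headers, rows)


if __name__ == "__main__":
    print(csv_to_wiki("IP,Hostname,VLAN\n10.1.1.10,esx01,10\n10.1.1.11,esx02,10\n"))
```

Paste the output into the page editor.  The wikitable class gives the bordered look most MediaWiki installs style by default.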


5.  Editing Help

Go to http://www.mediawiki.org/wiki/Help:Editing to see all the syntax for doing cool formatting.

Finally, now you have yourself a wiki to keep things in. Welcome to 2005.  You are awesome.  No shared Excel spreadsheet with multiple outdated copies.  Now you just have to get everyone to buy into using it.  To do that: Be the example.  Use it, refer people to it.  Pretty soon they’ll catch on.

But there is a better way right?  What could that be?  The truth is, to manage effectively, you really need to integrate the information into your management toolset.  Much in the way UCS keeps track of BIOS versions, settings, VLANs, etc, you need some kind of tool that does that.  Today you can do that with OpenStack, vCloud Director, and some others.  I’m still not sold on any of them at this point but as I start to play with OpenStack more, I hope to give more guidance and thoughts.

UCS Reverse Path Forwarding and Deja-Vu checks

UCS Fabric Interconnects are almost always run in end-host mode.  At this point there really aren’t many reasons to use switch mode on the Fabric Interconnects.

Two checks that make End Host Mode possible are Reverse Path Forwarding (RPF) checks and Deja-Vu checks.

RPF and Deja-Vu (from Cisco.com)

Reverse Path Forwarding Checks

Each server in the chassis is pinned dynamically to an uplink on Fabric Interconnect A and an uplink on Fabric Interconnect B.  (You can also set up pin groups and pin statically, but I don’t recommend that.)  Let’s say you have 2 uplinks on ports 31 and 32 of your Fabric Interconnect, and server 1/1 (chassis 1 / blade 1) is pinned to port 31.  If a unicast packet for server 1/1 is received on uplink port 31, it will go through.  But if that same packet destined for server 1/1 is received on port 32, it will be dropped.  That’s because the RPF check verifies that the destination server is actually pinned to (sends its uplink traffic through) the uplink the packet arrived on.

Deja Vu Checks

The other check is called “Deja-Vu”.  The Cisco documentation says: “Server traffic received on any uplink port, except its pinned uplink port is dropped”.  That sounds a lot like RPF.  A Cisco Live presentation states it more clearly: “Packet with source MAC belonging to a server received on an uplink port is dropped”.

An example to clear it up

VM A on server 1/1 wants to talk to VM B located somewhere else.  The Fabric Interconnects in this case are connected to a single Nexus 5500 switch.  The VM is pinned to one of the VNICs and that VNIC is pinned to go out port 31 of Fabric Interconnect A.  So what happens?

First the VM will send an ARP request.  An ARP request basically says: I know the IP address, but I want the MAC address.  (Obviously, this is within the same Layer 2 VLAN and subnet.)  If Fabric Interconnect A doesn’t find the IP/MAC association in its CAM table, it will not flood the server ports downstream.  Flooding is what a switch would do, and the Fabric Interconnect is different.  It doesn’t send the request down its server ports because it is a source of truth: it already knows everyone connected to its server ports.

What it will do instead is forward the ARP request (unknown unicast) up the designated uplink (port 31).  Now the Nexus switch is a switch (and a very good one at that).  It will say: “Hey, I don’t have a CAM table entry for VM B’s IP/MAC, so I will do what we switches do best: flood all the ports!” (except the port that the unknown unicast/ARP request came in on).

Remember that port 32 on Fabric Interconnect A is connected to the same switch as port 31, where the unknown unicast (ARP request) went out.  The Nexus 5500 will send this unknown unicast to port 32 just like every other port.  But port 32 says: wait a minute, the source address originated from me.  Deja-vu!  So it drops the packet.

Fabric Interconnect B has two ports, 31 and 32, that will also receive the unknown unicast.  If VM B is pinned to a VNIC that is pinned to port 31 on Fabric Interconnect B, port 31 says: I got this!  And the packet goes through.  Port 32 on FI-B, however, looks at the destination MAC and says: this is not pinned to me, so I’ll drop the packet.  That is the RPF check.

To sum it up

Deja-Vu check:  don’t receive a packet from the upstream switch that originated from me.

Reverse Path Forward Check:  don’t receive a packet if there’s no server pinned to this uplink.
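To see both rules working together, the example above can be replayed in a few lines of code.  This is a simplified model of the checks as described in this post, not Cisco’s implementation.  It handles unicast frames only, treats the ARP request as unknown unicast (as the example does), and collapses the VM to vNIC to uplink chain into a single “MAC is pinned to uplink” table.  The class and MAC names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str


@dataclass
class FabricInterconnect:
    name: str
    uplinks: Tuple[int, ...]
    pinning: Dict[str, int] = field(default_factory=dict)  # server MAC -> pinned uplink

    def receive_on_uplink(self, frame, uplink):
        # Deja-Vu: a frame sourced by one of my own servers came back in.
        if frame.src_mac in self.pinning:
            return "drop (deja-vu)"
        # RPF: the destination server is not pinned to this uplink.
        if self.pinning.get(frame.dst_mac) != uplink:
            return "drop (rpf)"
        return "forward"


def upstream_flood(frame, ingress, fabric_interconnects):
    """The upstream switch floods an unknown unicast out every port except ingress."""
    results = {}
    for fi in fabric_interconnects:
        for uplink in fi.uplinks:
            if (fi.name, uplink) == ingress:
                continue
            results[(fi.name, uplink)] = fi.receive_on_uplink(frame, uplink)
    return results


if __name__ == "__main__":
    fi_a = FabricInterconnect("A", (31, 32), {"vm-a": 31})
    fi_b = FabricInterconnect("B", (31, 32), {"vm-b": 31})
    for port, verdict in upstream_flood(Frame("vm-a", "vm-b"), ("A", 31), [fi_a, fi_b]).items():
        print(port, verdict)
```

Running it reproduces the walkthrough.  FI-A port 32 drops the frame on Deja-Vu, FI-B port 31 forwards it to VM B, and FI-B port 32 drops it on RPF.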