Category Archives: Cisco

The CCIE Data Center Certification Process

On July 9th, 2014 I passed the CCIE Data Center lab exam in San Jose, earning the CCIE certification. Hurray! When my team heard that I had done it, their response was: If Vallard can do it, so can I! Ha ha. So needless to say, a few more people have started down the path to certification, and I have no doubt they will get there.

I have to say it feels pretty great, and the process I went through to get it was very rewarding: it deepened my understanding of data center architectures and built the solid hands-on skills required to implement these solutions. With the CCIE certification, it’s the journey that makes it so worth it.

I thought I would write a bit about my experience of the process and how I approached it. To summarize, it took me five attempts to pass the written exam, and once I did that I passed the lab exam on my second try. I’m not saying my approach is the best, but it worked for me and I’m happy with the outcome. The funny thing is, even though I worked really hard and learned so much to get it, I still feel like there are many things I don’t know about the platforms. One of the drawbacks of my position is that I don’t do a lot of troubleshooting with my customers because most of the solutions Cisco offers work really well. Take UCS for example: I spend probably 2 hours a month at most troubleshooting issues with it, and that’s across the hundreds of UCS systems my customers have that I support!

In spite of that, I still know this stuff very well now. When a coworker asked me how to configure vPCs on the Nexus 5548s (he just wanted a quick and dirty config), I was able to spit it all out from memory and I knew it was right. I’ve done it so many times now I can do it in my sleep.

Need an OTV config? I got you covered there too. I can do OTV on a stick at light speed, setting it up with multicast or adjacency servers; I don’t even care. I can do it all. Boom. So yes, passing the CCIE exam gives you confidence because you learn a ton. That’s kind of how I felt when I graduated with my computer science degree from Berkeley. Even though the program kicked my trash and made me feel like a sorry sucker most of the time, it made me believe that armed with the skills I could do anything… given enough time.

So here’s my experience:

The Written Exam

The CCIE Data Center written exam topics are spelled out pretty clearly on the Cisco Learning Network page. I first took the test in its beta form, when I knew very little about the Nexus product line other than a few switches I had set up before. I took the test without studying. Zero prep. Didn’t even look to see what was on it. You see, I had to have humility beaten into me. Anyway, I failed miserably. Seriously. I thought I was the man at UCS. I got less than 20% right on it. I blamed it on the way the questions were worded, but in hindsight, there were very clear answers that stood out among the wrong ones. The thing was, it was hard.

After my first failure in August 2012, I gave up for about a year, not thinking it was for me. Then I learned that a few more friends had already passed the written and were working towards the lab. My pride made me think the same thing my teammates thought when I passed: “If they can do it, then I can do it.” My method, I thought, would be a brute force attack on the exam. So I took the exam again in July 2013 after really working specifically on Nexus and MDS. I felt that if given the beta exam again I could pass it. The problem was, the exam was much different than I remembered it and again I did poorly. When I failed, I rescheduled after realizing a few of my mistakes. I took it again in August and September, each time doing a little better, but each time not quite getting it. By the time my December test came I was solidly prepared, and just before Christmas, on the 23rd of December, I passed the written.

So what’s my advice on the written:

1.  If you already work in this field and have hands-on experience with Nexus, MDS, & UCS, take the exam to see what’s on it. CCIE is a total investment, and if it takes you a few hundred dollars to pass the written exam, it might be worth it.

2.  If you fail the first time, take it again as soon as you can. I think there are new rules going into effect that may limit how often you can take it. However, once you start down the road to CCIE certification, you can’t stop until you’ve reached the end. Otherwise you lose it. That year I spent off was a waste. I should have kept going.

3.  Once you pass the written exam, schedule the lab exam as soon as possible.  There are several months of waiting time right now and you don’t want this train to stop, so keep working towards it.

The CCIE Data Center Lab Exam

My entire IT career has been spent doing very hands-on things. I’m fortunate in that when I learn how to do something via the command line, my fingers seem to remember how to do it pretty well. In some ways that’s bad because I have a hard time explaining things (which means maybe I don’t know how it works in the first place?). But I can usually get things to work. Being a fast typist helps as well.

As soon as I passed my written exam, I scheduled the lab. The soonest I could get in was April 15, 2014. That’s right: 4 months out. I had very little to go by other than the blueprint and Brian McGahan’s excellent writeup. I flew in the night before and went to bed around 10PM, but then at 3AM I woke up and had trouble going back to sleep. I tossed and turned until about 5AM and then finally just got up, went for a 3 mile run, ate a good breakfast and showed up at Building C in San Jose 30 minutes early. I sat in the waiting room with 14 other nervous people. Man, I was tense. I hadn’t felt that way since finals in undergrad. We finally went in and I went to work.

As I was taking the exam, I tried to get that zen experience that Brian talked about in his blog, but it didn’t happen for me at all.  In fact, hardly anything happened for me.  For some reason, though, I thought I had done pretty well.  Wrong.  0% in multiple categories.

But I didn’t go into this thing blind the first time. How did I prepare? Hands on, baby. Stick time. Yeah!

I was fortunate enough to have a pretty decent lab.  My equipment was good, but not complete.  I had:

- UCS with the old 6120s. 6120 fabric interconnects were all that was available to me, but I’ve worked on plenty of 6248s and know all about unified ports, so I wasn’t worried if that’s what they would have in the lab.

- One Nexus 7010. I had 1 Sup1, but I upgraded it to 8GB of RAM so that I could do 4+1 VDCs. It didn’t matter; 4 would have been fine, since what I really needed was 8 VDCs. But I made do. I had one M1 line card and one F1 line card so that I could practice OTV, LISP, FabricPath, FCoE and layer 3 stuff.

- One Nexus 5548.  No line modules but I was fortunate enough to have layer 3 capabilities.  This helped me when I practiced OTV.  I also had several Nexus 2148s hanging around so I could do FEX things, but I could only do so much with a single Nexus 5548.

- One MDS 9148 Fibre Channel switch.  He worked pretty well.

I had a great base to get going on but in the end, I just couldn’t put it all together.  Why did I fail the first time?  Two reasons I think:

1.  Lack of confidence.  This is a big deal.  Nobody expected me to pass.  I’ve only been at Cisco for 3 years and I know people  who have been here a long time and haven’t earned the CCIE certification.  The second time I went in, I told my manager that I was getting it.  I was solidly prepared.

2.  Lack of equipment. This was the biggest reason in my mind. I’m cocky (conceited? immature? ignorant?) enough to think I can do these things. I have 4 young children, and I’ve watched them all alone for 4 days straight, so I’ve already faced huge challenges! I can do this! If you look at the lab information and the equipment they use, you can see that I was somewhat lacking. For example, I had no director-class Fibre Channel switch and not enough equipment to fully test things out. This is one of the biggest barriers to passing the CCIE Data Center exam: having the equipment. You are looking at several million dollars at least, which is probably why renting is such a good option and makes a ton of sense!

Anyway, here are my tips for the lab, when I passed, as well as for life in general:

Tip 1:  Higher is lower/ lower is higher?

I was also informed of a very cool trick. When you think about priorities of different protocols or features, there’s an easy way to remember them. This was taught to me by Ryan Boyd, a great guy I work with: If it’s a layer 2 protocol (LACP, Fibre Channel stuff, vPC, spanning tree), the lower the number, the higher the priority. If it’s a layer 3 protocol (OSPF, EIGRP, OTV, VRRP, etc.), the higher the number, the higher the priority. FabricPath is tricky, because it’s supposedly layer 2, but when you realize that it’s running IS-IS as the control plane then it makes more sense that it falls under the layer 3 rule: the higher the number, the higher the priority. Why didn’t anyone tell me this before?

Tip 2:  copy & paste

I had several people tell me they use Notepad: they write the command line stuff there and then just paste it in. One of my friends told me he did that, blew away his switch, and had to start from scratch. That takes away far too many precious minutes from your lab time. Lab day is one of the fastest days ever. I spent a lot of time trying to debug something in the lab the day I passed. When I looked at the clock, I realized that I had just burned 45 minutes of lab time. Bad form! (Fortunately, I had everything else done.) So I don’t copy and paste. I just type it out on the command line. I have really good typing skills. It’s the one thing in high school that I did on a typewriter that really helped and has stuck with me. Plus, writing all that code in college got me pretty good as well. So for me it was type away, even if I’m doing the same thing on multiple switches.

As an aside:  The other funny thing I noticed is that people that do Cisco switches don’t type in all the words.  They do things like

sh run or sh int brie

Since I have big Linux roots, I do a lot of tabbing.  So maybe I add one extra keystroke, but this works for me.

Tip 3:

Draw it out.  In Brian’s blog he shows how he spent the first hour drawing it out.  I didn’t do quite that much the second time when I passed, but I did read through each section before I started working on that section.  This helped me when I had to remember which interfaces were connected to where.  You get as much scratch paper as you want.  I used more than the average.

After I failed the first test, I scheduled the second lab attempt as soon as I could.  The problem was:  The next available time was in September!!  Wow.  So I checked every day, several times a day for an opening.  After 3 days of this, I got July 9th.  So my lesson of not getting off the train helped out.  I thought:  Let’s keep going.

My friends had recommended INE labs and those things are *really* good. I read through some of my friends’ labs, but didn’t use any of them. Instead, a colleague of mine was building a lab out of spare parts, so I joined forces with him and we built it together. I like this approach a lot because I like touching hardware. I like knowing how to set it up from scratch. I’ve always done this. We got a study group together of people that were going to take the lab exam and we hammered through all kinds of scenarios, really making sure we knew how to do it. I’ll never forget watching the USA play in the World Cup while trying to get all our components working.

I tore the lab up several times and the week before the test, I really went to town.  (UCS, N1kv, MDS, N7k, N5k on the 4th of July is super patriotic, so that’s how I celebrated!)  I was continuing to go through it all the way up until 11PM the night before the test.  By that point, I had had enough.  I felt super ready.  I slept all the way until 6AM, extremely thankful I didn’t wake up at 3AM again.  I was still really nervous.  I got to building C early.

This time I had experience and I blew through all the questions, keeping track of points and feeling like I got nearly everything. By lunch time I felt really good. By 2PM I was sure I was passing… if only I could get this one thing working… I got it working by 3PM by being calm and retracing my steps. I spent the remaining time going through the questions and making sure I had answered them right, tweaking things here and there and finding some things I had forgotten. I counted the points and even though there were some things I never got working, I felt pretty sure I had enough to make it happen.

I left San Jose and went to the airport. I called my wife and told her I felt good, but still wasn’t sure. What if I missed something? What if I didn’t save something? (But I remember saving at least 3 times on every item before I left, so I was pretty sure about that.) Before I boarded the plane an email came. I opened it up. Put my hands in the air and jumped for joy. The people in the airport probably thought I had just won the lottery. But this wasn’t luck, my friends, this was being prepared. I had passed. I texted my manager and a few good friends and thanked them for their support. It was a good day.

 

Distributed Data Centers

My thoughts on cloud computing and the future of the data center have changed a bit in the last 3 years. When I first started working on a cloud computing project for a large bank in America back in 2008, I was convinced that soon every enterprise would create their own private cloud and use xCAT (or something). Then I thought they would instead all use OpenStack. But I figured every organization would indeed build its own private cloud. This has not panned out. Not even close, and it’s 6 years later.

Eventually, I thought, all enterprises would migrate to one public cloud provider, and it never occurred to me that people would see fit to use more than one public cloud provider.   I did form a concept of the InterCloud back then so I’m not too far off the mark.  But my vision is evolving and becoming more clear.  I finally see where IT is going.  (Or at least I think I do)

In my small sector of the world hardly anybody has a private cloud.  And when I say private cloud, I mean self service portals with completely automated provisioning.  Yeah, that’s just not happening.    The truth is, I don’t think it will for most organizations.  There’s not enough need there.  The only people that need VMs in a self service portal for most organizations are the VMware admins themselves and they are savvy enough to right click and make that happen without all your bloated self provisioning tools, thank you very much.

What I am seeing is that more and more are going to the public cloud. This started out more as a shadow IT initiative, but more of the people I work with have in fact embraced it at central IT. But it’s managed as a one-off and people are still trying to figure it out. People aren’t ditching their own data centers, just like they’re not ditching their mainframes; in large enterprises there will always be some footprint on premise for IT services.

The other thing that seems completely obvious now is that people will want to use more than one public cloud provider. The reason is that different public clouds specialize in different things. For example: I might run Exchange/Office 365 on Azure, but I might run some development applications on AWS. Similarly, I might have a backup as a service contract with SunGard. But I may not trust my data to anyone but my own 6 node Oracle RAC cluster that’s sitting in my very own data center. Can you see where this leads us?

Central IT is now responsible for sourcing workloads.  The data center is distributed.  My organization’s data is all over the place.  My problem now is managing the sprawl.  Getting visibility to where the sprawl is and making sure I’m using it most effectively.

Another misconception I see is that people think using two or more public clouds means VMs move between data centers. Today, that’s pretty impractical. Migrating VMs between data centers takes too long, even if the network issues were solved. And besides, when you think that way, you are treating your workloads as pets instead of cattle, and cattle is where applications are headed. So forget about that right now.

Instead, focus on the real issue that needs to be solved.  And this is where I think Cisco can make big things happen.  That is:  How do you connect distributed data centers?

The Nexus 1000v InterCloud (or InterCloud Fabric, which I think is what Cisco is calling it now) starts down this road. It allows us to communicate with VMs in a public cloud from our own cloud using our same layer 2 address schema. This is pretty cool, and a good start, but we’ll need more. For example: We might have our database servers reside in our own data center. (No self service portal here.) Then we’ll develop apps that will be hosted in public clouds. The application servers will need to communicate with each other and with the database. The different applications may be in different clouds. The real issue is how they talk to each other effectively, securely, and seamlessly. That is the big issue that needs to be solved with distributed data centers.

Is this where you think we’re headed?  I feel like for the first time in five years I finally get what’s happening to IT.  So I’ll take comfort in that for now, until things change next month.

 

A few Nexus notes

I’ve been working with OTV and FabricPath and I thought I’d put down a few pointers on things that had me scratching my head for a while.

1.  To make sure that OTV is working, the join interfaces on each side must be able to ping each other. Let’s say that site A has join interface 192.168.101.1 and site B has join interface 192.168.102.1. Before OTV can work, you need to make sure, from the OTV VDC, that they can ping each other. This will usually mean that routing is set up properly.

2.  For FabricPath to work with OTV, I had my main aggregation VDC connected to the OTV VDC through an M1 interface. This doesn’t work. Instead, connect the OTV VDC to the aggregation VDC through an F1 interface. This makes it so FabricPath is terminated and the traffic is moved into Classical Ethernet. I scratched my head for probably 4 hours last night until I thought of trying that this morning. Lessons learned.

Hopefully that helps someone.

QoS and Jumbo Frames on the Nexus 5500

I’ve had the fortunate opportunity to have two Nexus 5548UPs in my lab to help test upgrade problems for one of my customers. It’s been great to have some gear to play with and really try to understand how it all works together.

One of the issues I’ve run up against in the past (and you may have too) is configuring jumbo frames on network switches. Jumbo frames enable more bytes to be sent more efficiently through the data center. The default maximum transmission unit (MTU) size on nearly all networks and server NICs is 1500 bytes. That means if you want to send more traffic, you need to send more frames. When you increase the MTU to 9000 bytes, you send fewer headers and fewer frames for the same amount of data. The result is that it is supposed to decrease CPU load. The adverse effects are that you may end up with higher latency and you may end up configuring a lot of end points. That can get really complicated.
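To put rough numbers on that, here is a small Python sketch of the frame-count arithmetic. It assumes plain TCP over IPv4 with no options (40 bytes of headers inside the MTU) and 38 bytes of per-frame Ethernet overhead with no VLAN tag. Those figures and the function name are my own assumptions for illustration, not measurements from this lab.

```python
IP_TCP_HEADERS = 40      # assumed: 20-byte IPv4 + 20-byte TCP header, no options
ETHERNET_OVERHEAD = 38   # assumed: 14 header + 4 FCS + 8 preamble + 12 inter-frame gap


def frames_needed(payload_bytes, mtu):
    """Return (frame count, total bytes on the wire) for one transfer."""
    per_frame_payload = mtu - IP_TCP_HEADERS
    # Integer ceiling division: a partly filled last frame still costs a frame.
    frames = -(-payload_bytes // per_frame_payload)
    wire_bytes = payload_bytes + frames * (IP_TCP_HEADERS + ETHERNET_OVERHEAD)
    return frames, wire_bytes


one_gib = 1024 ** 3
for mtu in (1500, 9000):
    frames, wire = frames_needed(one_gib, mtu)
    efficiency = 100 * one_gib / wire
    print(f"MTU {mtu}: {frames} frames, {efficiency:.1f}% of wire bytes are payload")
```

Under these assumptions the jumbo MTU needs roughly one sixth as many frames for the same transfer, which is where the CPU savings are supposed to come from.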

As an example of configuring jumbo frames in a data center, consider all the endpoints that have to be configured:

  • Operating System:  The NIC must be set to MTU 9000.  On VMware, the vSwitch has to be set to this as well as the VMs if they will be supporting jumbo frames.
  • On UCS, the vNIC has to be mapped to a jumbo frames QoS policy.
  • The ports on the network switch must have jumbo frames enabled on the uplink.
  • The ports on the storage controller must have jumbo frames enabled.
  • The storage network interfaces must have jumbo frames configured.

So you can see there is a great deal of orchestration between several teams in the data center.  Everybody has to know what they are doing.

You can test if your jumbo frames are enabled on Linux by sending a simple ping:

ping -M do -s 8972 <node>

If that goes through without errors, then congratulations!  You have jumbo frames enabled from node to node.
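If you are wondering where 8972 comes from: the -s value is only the ICMP payload, so the IPv4 and ICMP headers have to be subtracted from the MTU, and -M do tells ping not to fragment. A minimal sketch of that arithmetic, assuming IPv4 with no IP options (the function name is just for illustration):

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


for mtu in (1500, 9000):
    print(f"MTU {mtu}: ping -M do -s {max_ping_payload(mtu)} <node>")
```

The same calculation gives 1472 for a standard 1500-byte MTU, which is a handy baseline to confirm the path works at all before testing jumbo sizes.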

Jumbo Frames on the Nexus 5500

There is a guide on Cisco’s web page that talks about enabling jumbo frames. But to do it, you have to do things like policy maps and class maps. I’ve often thought: Why is this so hard to do? It seems like one easy command should be sufficient.

The reason goes back to the standard trade offs engineers make.  “Make it Easy” vs. “Make it Flexible”.   You can’t really have flexibility without more nerd knobs to turn.  Most of this is probably more applicable to application traffic.

The other problem I always think of:  Why can’t you just apply the MTU to the interface?  With the Nexus 5500 you don’t typically set it to the interface.  But can you?  Kind of.

QoS Class Map and Policy Maps

The architecture of Nexus 5500s is a bit different than that of Catalyst switches.  The buffering is done more on the ingress ports.  (The ports where traffic enters).  As such the first thing you’ll want to do is “tag” or “classify” the traffic that comes into the port.  You do this with a class-map command of type QoS.  How do you identify traffic?

The most common way to match traffic is with either an IP Access List, or by the protocol.  Let’s do it:

n5k-top# conf
Enter configuration commands, one per line. End with CNTL/Z.
n5k-top(config)# class-map myTraffic
n5k-top(config-cmap-qos)# match protocol iscsi

Here we’re just matching iSCSI traffic. That’s one that we might want to do Jumbo Frames on. But we could also do something for IP addresses. Let’s say that all hosts on the 172.20.0.0/24 network should have jumbo frames. That would make sense if this network was for storage (NFS, iSCSI or whatever).

To do that we would use an access list:

n5k-top(config)# ip access-list jumbo-list
n5k-top(config-acl)# permit ip 172.20.0.0/24 any
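It’s worth double checking which source addresses an entry for the 172.20.0.0/24 storage network will actually catch, since a one-digit slip in the prefix (127.20.0.0/24, say) matches none of your hosts. Here is a rough Python sketch using the standard ipaddress module. It only models the source-prefix match of a "permit ip <source> any" entry, and the host addresses are made up for illustration.

```python
import ipaddress

# Storage subnet named in the text above.
STORAGE_NET = ipaddress.ip_network("172.20.0.0/24")


def is_jumbo_source(address):
    """True if this source address falls inside the storage subnet."""
    return ipaddress.ip_address(address) in STORAGE_NET


# Hypothetical hosts: one inside the /24, one in a neighboring /24,
# and one showing what a mistyped first octet would look like.
for host in ("172.20.0.15", "172.20.1.1", "127.20.0.15"):
    print(host, is_jumbo_source(host))
```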

Now we can put that on our QoS group:

n5k-top(config-acl)# class-map type qos myTraffic
n5k-top(config-cmap-qos)# no match protocol iscsi
n5k-top(config-cmap-qos)# match access-group name jumbo-list
n5k-top(config-cmap-qos)# sh class-map type qos myTraffic
Type qos class-maps
===================
class-map type qos match-all myTraffic
match access-group name jumbo-list

 

Great! Now we need to put this into one of the qos groups that we can work on. There’s a lot more we can do, but for simplicity, we’ll just put this into qos group 2. That is a good place to be since this traffic could be NFS or iSCSI and may be more important than normal traffic. We need to put it in a qos group because there are 2 other QoS types that we haven’t talked about, and those QoS types classify by qos-group numbers. Here are the other two:

network-qos: This is the QoS type that allows for Jumbo Frames (mtu), multicast, pause-no drop, and other settings that show how a packet gets through the network.  The shape of it and how it reacts.  The others do more for what happens when a packet enters or leaves the switch.

queuing: This is the last type of QoS and does things like allocate how much bandwidth certain traffic gets as it goes through the network.  By default, 100 percent of the bandwidth goes to the default class.

So now we know:  We have 3 types of QoS settings and each of these settings requires a class-map (to tag the traffic) and a policy-map: what to do with all the traffic when it comes in.  With policy-maps on each of these classes, you’ll associate multiple class-maps with behaviors.  Finally, once we have these policy-maps for each of the three types of classes, we assign this to the system QoS.

Let’s finish the qos type.  So far we’ve only made a class-map for it.  But now, we want a policy-map.  We may have three types of traffic:  Default traffic, myTraffic, and fcoe.  FCoE and the default traffic are there “by default”.  So let’s make a policy-map with those traffic types:

n5k-top(config-cmap-nq)# policy-map type qos myQoS-policy
n5k-top(config-pmap-qos)# class type qos myTraffic
n5k-top(config-pmap-c-qos)# set qos-group 2
n5k-top(config-pmap-c-qos)# class type qos class-fcoe
n5k-top(config-pmap-c-qos)# set qos-group 1

There, now we have three traffic lanes marked. The default was put in there for us. Check it out:

n5k-top(config-pmap-c-qos)# sh policy-map type qos myQoS-policy

Type qos policy-maps
====================

policy-map type qos myQoS-policy
class type qos myTraffic
set qos-group 2
class type qos class-fcoe
set qos-group 1
class type qos class-default
set qos-group 0

Ok, now we want to set our jumbo frames. That means we need to create a network-qos type of QoS. First we have to mark what we want. Our only options are to classify by qos-groups for the network-qos type of QoS. So let’s create a class-map:

n5k-top(config-cmap-que)# class-map type network-qos myTraffic
n5k-top(config-cmap-nq)# match qos-group 2

Notice that I kept the name the same in the network-qos type of QoS as I did for the qos type of QoS. This makes things a bit easier. Now we just need to create a policy-map using this class-map, as well as the fcoe class-map (the fcoe class-map is created by default)

n5k-top(config-cmap-nq)# policy-map type network-qos myNetwork-QoS-policy
n5k-top(config-pmap-nq)# class type network-qos myTraffic
n5k-top(config-pmap-nq-c)# mtu 9216
n5k-top(config-pmap-nq-c)# class type network-qos class-fcoe
n5k-top(config-pmap-nq-c)# pause no-drop
n5k-top(config-pmap-nq-c)# mtu 2158
n5k-top(config-pmap-nq-c)# sh policy-map type network-qos myNetwork-QoS-policy

Type network-qos policy-maps
===============================

policy-map type network-qos myNetwork-QoS-policy
class type network-qos myTraffic

mtu 9216
class type network-qos class-fcoe

pause no-drop
mtu 2158
class type network-qos class-default

mtu 1500
multicast-optimize

Ok! 2 down, 1 to go: the queuing policy. We just need to split these loads. First we need to mark the traffic. Again, we can only use qos-groups to do that. So this time we’ll create a class-map of type queuing. We’ll give it the same name as the last two class-maps:

n5k-top(config-pmap-c-qos)# class-map type queuing myTraffic
n5k-top(config-cmap-que)# match qos-group 2

Now that we have that, let’s associate it with a policy-map (just like the others). We don’t really know what our traffic will be. (Maybe you do in your network.) So we’re just going to give half of the bandwidth to our jumbo frame traffic (50%), and then we’ll give 25% to each of the other two types (fcoe and default).

n5k-top(config-pmap-nq-c)# policy-map type queuing myQueuing-policy
n5k-top(config-pmap-que)# class type queuing myTraffic
n5k-top(config-pmap-c-que)# bandwidth percent 50
n5k-top(config-pmap-c-que)# class type queuing class-fcoe
n5k-top(config-pmap-c-que)# bandwidth percent 25
n5k-top(config-pmap-c-que)# class type queuing class-default
n5k-top(config-pmap-c-que)# bandwidth percent 25
n5k-top(config-pmap-c-que)# sh policy-map type queuing myQueuing-policy

Type queuing policy-maps
========================

policy-map type queuing myQueuing-policy
class type queuing myTraffic
bandwidth percent 50
class type queuing class-fcoe
bandwidth percent 25
class type queuing class-default
bandwidth percent 25

Boom! Just like that, we now have our three QoS policies created. One for type qos called myQoS-policy. One for type network-qos called myNetwork-QoS-policy. And finally, the one we just created for type queuing called myQueuing-policy.
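If it helps to see how the three policy types chain together, here is a small Python model of them. The values are copied from the policy-maps built above, but the dictionary layout and the function name are only my illustration; this is not how NX-OS stores or evaluates its configuration.

```python
# type qos: traffic class -> qos-group (the tagging step on ingress)
QOS_GROUP = {"myTraffic": 2, "class-fcoe": 1, "class-default": 0}
# type network-qos: qos-group -> MTU
MTU = {2: 9216, 1: 2158, 0: 1500}
# type queuing: qos-group -> share of bandwidth
BANDWIDTH_PERCENT = {2: 50, 1: 25, 0: 25}


def treatment(traffic_class):
    """Follow one traffic class through all three policy types."""
    group = QOS_GROUP[traffic_class]
    return {
        "qos_group": group,
        "mtu": MTU[group],
        "bandwidth_percent": BANDWIDTH_PERCENT[group],
    }


# The queuing shares in this walkthrough add up to the whole link.
assert sum(BANDWIDTH_PERCENT.values()) == 100

for name in QOS_GROUP:
    print(name, treatment(name))
```

The point of the model is the indirection: only the qos type looks at the traffic itself, and the other two types key off the qos-group number it assigned.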

What’s left? Now we just need to apply these to the system, from system qos configuration mode:

n5k-top(config-sys-qos)# service-policy type qos input myQoS-policy
n5k-top(config-sys-qos)# service-policy type network-qos myNetwork-QoS-policy
n5k-top(config-sys-qos)# service-policy type queuing input myQueuing-policy
n5k-top(config-sys-qos)# service-policy type queuing output myQueuing-policy

Notice that we applied the queuing policy twice: once to the input and once to the output. The qos type is only applied to the input, because that’s where traffic comes in and is marked. The network-qos affects input and output but should be the same for all, so we only configure it once.

Well, hopefully that wasn’t too confusing. You can obviously see the power in this. If we had voice, video, and other types of traffic coming through here, we could add it to our policy-maps after we tag it. With QoS, the settings should be the same on all switches in the data center. If you are running a vPC, you’ll have to make sure that the other switch has the same setup, otherwise you’ll get a type 2 inconsistency:

n5k-top(config-sys-qos)# sh vpc
Legend:
(*) - local vPC is down, forwarding via vPC peer-link

vPC domain id : 1
Peer status : peer adjacency formed ok
vPC keep-alive status : peer is alive
Configuration consistency status : success
Per-vlan consistency status : success
Type-2 consistency status : failed
Type-2 inconsistency reason : QoSMgr Network QoS configuration incompatible
vPC role : primary
Number of vPCs configured : 3
Peer Gateway : Disabled
Dual-active excluded VLANs : -
Graceful Consistency Check : Enabled
Auto-recovery status : Disabled

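vPC Peer-link status
---------------------------------------------------------------------
id   Port   Status Active vlans
--   ----   ------ --------------------------------------------------
1    Po1    up     1,500

vPC status
----------------------------------------------------------------------------
id     Port        Status Consistency Reason                     Active vlans
------ ----------- ------ ----------- -------------------------- -----------
10     Po10        up     success     success                    1,100,500
20     Po20        up     success     success                    1,100,500
80     Po80        up     success     success                    1,100,500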

See? Nobody wants a type 2 error. Apply settings throughout the data center!
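One low-tech way to find the mismatch is to write down the network-qos settings from each peer and diff them. Below is a rough Python sketch of that comparison. It does not talk to the switches; the dictionaries are filled in by hand, and the values for the second switch are hypothetical (a peer that never got the myTraffic class).

```python
def qos_mismatches(local, peer):
    """Return {(class, setting): (local value, peer value)} for every difference."""
    diffs = {}
    for cls in sorted(set(local) | set(peer)):
        mine, theirs = local.get(cls, {}), peer.get(cls, {})
        for key in sorted(set(mine) | set(theirs)):
            if mine.get(key) != theirs.get(key):
                diffs[(cls, key)] = (mine.get(key), theirs.get(key))
    return diffs


# n5k-top carries the policy built above; the peer values are hypothetical.
top = {
    "myTraffic": {"mtu": 9216},
    "class-fcoe": {"mtu": 2158},
    "class-default": {"mtu": 1500},
}
bottom = {
    "class-fcoe": {"mtu": 2158},
    "class-default": {"mtu": 1500},
}

for (cls, key), (mine, theirs) in qos_mismatches(top, bottom).items():
    print(f"{cls} {key}: local={mine} peer={theirs}")
```

An empty result means the two hand-collected configs agree, at least for the settings you wrote down.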

Hope this helps someone struggling with getting jumbo frames on a Nexus 5k.

FCoE with UCS C-Series

I have in my lab a C210 that I want to turn into an FCoE storage target. I’ll write more on that in another post. The first challenge was to get it up with FCoE. It’s attached to a pair of Nexus 5548s. I installed RedHat Linux 6.5 on the C210 and booted up. The big issue I had was that even though RedHat Linux 6.5 comes with the fnic and enic drivers, the FCoE never happened. It wasn’t until I installed the updated drivers from Cisco that I finally saw a FLOGI. But there were other tricks that I had to do to make the C210 actually work with FCoE.

C210 CIMC

The first step is to open the CIMC (with the machine powered on) and configure the vHBAs. From the GUI go to:

Server -> Inventory

Then in the work pane, open the ‘Network Adapters’ tab, and down below select vHBAs. Here you will see two vHBAs by default. From here you have to set the VLAN that each vHBA will go over: click ‘Properties’ on the interface and select the VLAN. I set the MAC address to ‘AUTO’ based on a TAC case I looked at, but this never persisted. From there I entered the VLAN: VLAN 10 for the first interface and VLAN 20 for the second interface. VLAN 10 matches the FCoE VLAN and VSAN that I created on the first Nexus 5548. On the other Nexus I created VLAN 20 to match FCoE VLAN 20 and VSAN 20.

This then seemed to require a reboot of the Linux Server for the VLANs to take effect.  In hindsight this is something I probably should have done first.

RedHat Linux 6.5

This needs to have the Cisco drivers for the fnic.  You might want to install the enic drivers as well.  I got these from cisco.com.  I used the B series drivers and it was a 1.2GB file that I had to download all to get a 656KB driver package.  I installed the kmod-fnic-1.6.0.6-1 RPM.  I had a customer who had updated to a later kernel and he had to install the kernel-devel rpm and recompile the driver.  After it came up, it worked for him.

I wanted the two 10Gb NICs on the C210 to connect to a vPC, so I set up an LACP bond (bonding mode 4, 802.3ad) in Linux.  This was done as follows:

Created file: /etc/modprobe.d/bond.conf

alias bond0 bonding
options bonding mode=4 miimon=100 lacp_rate=1

Created file: /etc/sysconfig/network-scripts/ifcfg-bond0

DEVICE=bond0
IPADDR=172.20.1.1
ONBOOT=yes
NETMASK=255.255.0.0
STARTMODE=onboot
MTU=9000

Edited the /etc/sysconfig/network-scripts/ifcfg-eth2

DEVICE=eth2
MASTER=bond0
SLAVE=yes
HWADDR=58:8D:09:0F:14:BE
TYPE=Ethernet
UUID=8bde8c1f-926f-4960-87ff-c0973f5ef921
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none

Edited the /etc/sysconfig/network-scripts/ifcfg-eth3

DEVICE=eth3
MASTER=bond0
SLAVE=yes
HWADDR=58:8D:09:0F:14:BF
TYPE=Ethernet
UUID=6e2e7493-c1a1-4164-9215-04f0584b338c
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none

Next restart the network and you should have a bond. You may need to restart this after you configure the Nexus 5548 side.

service network restart
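To confirm the bond actually formed after the restart, the kernel exposes its state in /proc/net/bonding/bond0. The following is a minimal Python sketch, not part of the original setup, that parses that text and checks that both slaves report link up. The sample text is a shortened, hypothetical excerpt; on the server you would read the real file instead.

```python
def parse_bond_status(text):
    """Summarize the text of /proc/net/bonding/<bond> into a small dict."""
    status = {"mode": None, "bond_mii": None, "slaves": {}}
    current_slave = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and lines without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            status["mode"] = value
        elif key == "Slave Interface":
            current_slave = value
            status["slaves"][current_slave] = None
        elif key == "MII Status":
            # The first MII Status belongs to the bond, later ones to each slave.
            if current_slave is None:
                status["bond_mii"] = value
            else:
                status["slaves"][current_slave] = value
    return status


def bond_is_healthy(status, expected_slaves=("eth2", "eth3")):
    """True when the bond runs 802.3ad and every expected slave has link."""
    return (
        status["mode"] is not None
        and "802.3ad" in status["mode"]
        and status["bond_mii"] == "up"
        and all(status["slaves"].get(name) == "up" for name in expected_slaves)
    )


# Hypothetical sample; on the C210 use open("/proc/net/bonding/bond0").read()
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth2
MII Status: up

Slave Interface: eth3
MII Status: up
"""

print(bond_is_healthy(parse_bond_status(SAMPLE)))
```

If this prints False right after the restart, it is worth checking again once the Nexus side is configured, since the bond may not come up cleanly until then.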

Nexus 5548 Top
Log in and create the vPCs.  Also don’t forget the jumbo-frame system class, since the Linux bond runs at MTU 9000.  This is the network-qos policy I use for jumbo frames in the data center.

policy-map type network-qos jumbo
class type network-qos class-default
mtu 9216
multicast-optimize
system qos
service-policy type network-qos jumbo

One thing that drives me crazy is that you can’t do sh int po 4 to see that the jumbo MTU is in effect. From the documentation, you have to do

sh queuing int po 4

to see that your jumbo frames are enabled.
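Another way to check jumbo frames is end to end from the Linux host, with a ping that is not allowed to fragment. The largest ICMP payload that fits is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. Here is a small Python sketch of that arithmetic; the target address 172.20.1.2 is a hypothetical peer on the bond0 subnet, and the flags are those of the Linux iputils ping.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_ping_payload(mtu):
    """Largest ICMP echo payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def jumbo_ping_command(target, mtu=9000):
    """Build a Linux ping command that fails if the path cannot carry `mtu`."""
    # -M do : set the don't-fragment bit
    # -s N  : ICMP payload size in bytes
    # -c 3  : send three probes
    return ["ping", "-M", "do", "-c", "3", "-s", str(max_ping_payload(mtu)), target]


# 172.20.1.2 is an assumed neighbor, not an address from the lab above.
print(" ".join(jumbo_ping_command("172.20.1.2")))
```

With the MTU of 9000 set on bond0 this works out to a payload of 8972 bytes. If that ping fails while a default-size ping succeeds, something in the path is still at 1500.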

The C210 is attached to ethernet port 1 on each of the switches.  Here’s the Ethernet configuration:

The ethernet:

interface Ethernet1/1
switchport mode trunk
switchport trunk allowed vlan 1,10
spanning-tree port type edge trunk
channel-group 4

The port channel:

interface port-channel4
switchport mode trunk
switchport trunk allowed vlan 1,10
speed 10000
vpc 4

As you can see, VLAN 10 is the FCoE VLAN, and it maps to VSAN 10. We need to create the VSAN and that mapping:

feature fcoe
vsan database
vsan 10
vlan 10
fcoe vsan 10

Finally, we need to create the vfc for the interface:

interface vfc1
bind interface Ethernet1/1
switchport description Connection to NFS server FCoE
no shutdown
vsan database
vsan 10 interface vfc1

Nexus 5548 Bottom
The other Nexus has a similar configuration.  The difference is that instead of VSAN 10 and VLAN 10, we use VSAN 20 and VLAN 20 and bind the FCoE VLAN to VSAN 20.  In the SAN world, we don’t cross the streams.  You’ll see that the FCoE VLANs are not the same on the two switches.
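Since the two fabrics differ only in the VSAN and VLAN numbers, the FCoE part of the config can be templated. This is a rough Python sketch that renders the same lines shown for the top switch with the numbers swapped in. The switch name N5k-top is an assumption (only N5k-bottom appears in the output), and the result is meant to be reviewed and pasted, not pushed blindly.

```python
def fcoe_config(vsan, vlan, vfc=1, ethernet="Ethernet1/1"):
    """Render the FCoE VLAN/VSAN mapping and vfc binding for one fabric."""
    return "\n".join([
        "feature fcoe",
        "vsan database",
        f"  vsan {vsan}",
        f"vlan {vlan}",
        f"  fcoe vsan {vsan}",
        f"interface vfc{vfc}",
        f"  bind interface {ethernet}",
        "  no shutdown",
        "vsan database",
        f"  vsan {vsan} interface vfc{vfc}",
    ])


# One (VSAN, VLAN) pair per fabric; the fabrics never share an FCoE VLAN.
FABRICS = {"N5k-top": (10, 10), "N5k-bottom": (20, 20)}

for name, (vsan, vlan) in FABRICS.items():
    print(f"! {name}")
    print(fcoe_config(vsan, vlan))
    print()
```

Keeping the pairs in one table makes it harder to accidentally put the same FCoE VLAN on both switches.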

Notice that in the output below, neither VLAN 10 nor VLAN 20 is carried across the peer link, so you’ll only see VLAN 1 in the allowed VLANs for the vPC:

N5k-bottom# sh vpc consistency-parameters interface po 4

Legend:
Type 1 : vPC will be suspended in case of mismatch

Name                        Type  Local Value            Peer Value
-------------               ----  ---------------------- -----------------------
Shut Lan                    1     No                     No
STP Port Type               1     Default                Default
STP Port Guard              1     None                   None
STP MST Simulate PVST       1     Default                Default
mode                        1     on                     on
Speed                       1     10 Gb/s                10 Gb/s
Duplex                      1     full                   full
Port Mode                   1     trunk                  trunk
Native Vlan                 1     1                      1
MTU                         1     1500                   1500
Admin port mode             1
lag-id                      1
vPC card type               1     Empty                  Empty
Allowed VLANs               -     1                      1
Local suspended VLANs       -     -                      -

But on the individual nodes you’ll see that the VLAN is enabled in the VPC. VLAN 10 is carrying storage traffic.

# sh vpc 4

vPC status
----------------------------------------------------------------------------
id     Port        Status Consistency Reason                     Active vlans
------ ----------- ------ ----------- -------------------------- -----------
4      Po4         up     success     success                    1,10

Success?

How do you know you succeeded?

N5k-bottom# sh flogi database
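--------------------------------------------------------------------------------
INTERFACE        VSAN    FCID           PORT NAME               NODE NAME
--------------------------------------------------------------------------------
vfc1             10      0x2d0000  20:00:58:8d:09:0f:14:c1 10:00:58:8d:09:0f:14:c1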

Total number of flogi = 1.

You’ll see the login. If not, try restarting the interface on the Linux side. You should see a different WWPN on each Nexus. Another issue you might have is mismatched VLANs, so make sure each vHBA is set to the FCoE VLAN of the switch it is cabled to.
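If you want to check the login from a script instead of by eye, the flogi table is easy to parse. Below is a minimal Python sketch that pulls the interface, VSAN, FCID, and WWNs out of the text of sh flogi database. How the text gets collected (SSH, copy and paste) is left out, and the column layout is assumed to match the output shown above.

```python
import re

WWN = r"(?:[0-9a-fA-F]{2}:){7}[0-9a-fA-F]{2}"
FLOGI_ROW = re.compile(
    rf"^(?P<interface>\S+)\s+(?P<vsan>\d+)\s+(?P<fcid>0x[0-9a-fA-F]+)\s+"
    rf"(?P<wwpn>{WWN})\s+(?P<wwnn>{WWN})$"
)


def parse_flogi(text):
    """Return one dict per login row; headers and rules are skipped."""
    rows = []
    for line in text.splitlines():
        match = FLOGI_ROW.match(line.strip())
        if match:
            row = match.groupdict()
            row["vsan"] = int(row["vsan"])
            rows.append(row)
    return rows


# The row below is the one from the output above.
SAMPLE = """\
INTERFACE VSAN FCID PORT NAME NODE NAME
vfc1 10 0x2d0000 20:00:58:8d:09:0f:14:c1 10:00:58:8d:09:0f:14:c1

Total number of flogi = 1.
"""

for row in parse_flogi(SAMPLE):
    print(row["interface"], row["vsan"], row["wwpn"])
```

Running the same check against both switches and comparing the WWPNs is a quick way to confirm that each vHBA logged in to its own fabric.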

Let me know how it worked for you!

Changing UCS IP addresses

I have a UCS lab machine that I sometimes take to different locations for proof of concept work.  One of the things I regularly have to do is change the management IP addresses and hostname.  Here’s how you do it on the command line:

KCTest-A# scope fabric-interconnect a
KCTest-A /fabric-interconnect # set out-of-band ip 10.1.1.23 netmask 255.255.255.0 gw 10.1.1.1
Warning: When committed, this change may disconnect the current CLI session
KCTest-A /fabric-interconnect* # scope fabric-interconnect b
KCTest-A /fabric-interconnect* # set out-of-band ip 10.1.1.24 netmask 255.255.255.0 gw 10.1.1.1
Warning: When committed, this change may disconnect the current CLI session
KCTest-A /fabric-interconnect* # scope system
KCTest-A /system* # set virtual-ip 10.1.1.25
KCTest-A /system* # set name ccielab
KCTest-A /system* # commit-buffer

 

It’s great because you can change the IP address of each fabric interconnect, the virtual IP, and the system name in one shot with a single commit-buffer.
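Because I do this at every new site, it helps to generate the command list from the site's addressing and catch typos before committing. Here is a small Python sketch using the standard ipaddress module. It only builds the same commands shown above and checks that every address sits in the gateway's subnet; it does not talk to UCS Manager.

```python
import ipaddress


def ucs_readdress_commands(ip_a, ip_b, virtual_ip, netmask, gateway, name):
    """Build the UCS Manager CLI lines for re-addressing both FIs."""
    network = ipaddress.ip_network(f"{gateway}/{netmask}", strict=False)
    for label, ip in (("FI-A", ip_a), ("FI-B", ip_b), ("virtual IP", virtual_ip)):
        if ipaddress.ip_address(ip) not in network:
            raise ValueError(f"{label} {ip} is not in {network}")
    if len({ip_a, ip_b, virtual_ip, gateway}) != 4:
        raise ValueError("addresses must all be different")
    return [
        "scope fabric-interconnect a",
        f"set out-of-band ip {ip_a} netmask {netmask} gw {gateway}",
        "scope fabric-interconnect b",
        f"set out-of-band ip {ip_b} netmask {netmask} gw {gateway}",
        "scope system",
        f"set virtual-ip {virtual_ip}",
        f"set name {name}",
        "commit-buffer",
    ]


for line in ucs_readdress_commands(
    "10.1.1.23", "10.1.1.24", "10.1.1.25", "255.255.255.0", "10.1.1.1", "ccielab"
):
    print(line)
```

The commit-buffer stays last on purpose: nothing changes until it runs, so a bad address is caught before the CLI session gets disconnected.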

Source of docs

 

1000v in and out of vCenter

I was setting up the Nexus 1110 (aka: virtual service appliance, aka: VSA) with one of our best customers and as we were doing it the appliance rebooted, never to come up again without completely reinstalling the firmware from remote media.  Most of this was probably my fault because I didn’t follow the docs exactly, and I think we can now move forward, but it made me realize I hadn’t written down an important procedure: how to reconnect to an orphaned 1000v from a new virtual supervisor module (VSM).

Here’s the situation:  When you lose the 1000v that is connected to vCenter, there is no way to remove the virtual distributed switch (VDS or DVS) that the 1000v presented to vCenter.  You can remove hosts from the DVS but you can’t get rid of that switch.

In the above picture, there is my DVS.  If I try to remove it, I get the following error:

In my case, I didn’t want to get rid of it, I just wanted to reconnect a new VSM that I created with the same name.  But this operation can be used to remove the 1000v DVS from vCenter as well.

So here’s how you do it:

Adopt an Orphaned Nexus 1000v DVS

Step 1.  Install a new VSM

I usually do mine manually, so that it doesn’t try to register with vCenter or one of the hosts.  Don’t do any configuration, other than an IP address.  Just get it so that you can log in.  Once you can log in, if you did create an SVS connection you’ll need to disconnect.  In mine, I made an svs connection and called it vcenter.  To disconnect from vCenter and erase the svs connection run:

# config
# svs connection vcenter
# no connect
# exit
# no svs connection vcenter

Trivia: What does SVS stand for?  “Server Virtualization Switch”

Step 2.  Change the hostname to match what is in vCenter

Looking at the error picture above, you can see there is a folder named nexus1000v with a DVS named nexus1000v.  To make vCenter think that this new 1000v is the same one, we need to change the name to match what is in vCenter:

nexus1000v-a# conf
nexus1000v-a(config)# hostname nexus1000v
nexus1000v(config)#

Step 3.  Build SVS Connection

Since we destroyed (or never built) the SVS connection in step 1, we’ll need to build one and try to connect.  The SVS connection should have the same name as the one you created when you first made your SVS.  So if you called your SVS ‘vCenter’, or ‘VCENTER’, or ‘VMware’ then you’ll need to name it the same thing.  I named mine ‘vcenter’ so that’s what I use.  Similarly, the datacenter-name has to be the same as what you had before.

nexus1000v(config)# svs connection vcenter
nexus1000v(config-svs-conn)# remote ip address 10.93.234.91 port 80
nexus1000v(config-svs-conn)# vmware dvs datacenter-name Lucky Lab
nexus1000v(config-svs-conn)# protocol vmware-vim
nexus1000v(config-svs-conn)# max-ports 8192
nexus1000v(config-svs-conn)# admin user n1kUser
nexus1000v(config-svs-conn)# connect
ERROR:  [VMware vCenter Server 5.0.0 build-455964] Cannot create a VDS of extension key Cisco_Nexus_1000V_1169242977 that is different than that of the login user session Cisco_Nexus_1000V_125266846. The extension key of the vSphere Distributed Switch (dvsExtensionKey) is not the same as the login session’s extension key (sessionExtensionKey)..

Notice that when I tried to connect I got an error.  This is because the extension key in my new Nexus 1000v (created when it was installed) doesn’t match the key of the old one.  The nice thing is that I can actually change that, and that is how I make this new 1000v take over from the other one.

Step 4.  Change the extension key to match what is in vCenter.

To see what the current extension-key is (or the offending key is) run the following command:

nexus1000v(config-svs-conn)# show vmware vc extension-key
Extension ID: Cisco_Nexus_1000V_125266846

That is the one we need to change.  You can see the extension key that vCenter wants in the error message from the previous step:  ‘Cisco_Nexus_1000V_1169242977’.  So we need to make the extension key on the 1000v match that.  No problem:

nexus1000v(config-svs-conn)# no connect
nexus1000v(config-svs-conn)# exit
nexus1000v(config)# no svs connection vcenter
nexus1000v(config)# vmware vc extension-key Cisco_Nexus_1000V_1169242977

Now rebuild the SVS connection exactly as in step 3 and connect.  This time we should be able to connect and run things as before.
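The key that vCenter wants is buried in a long error line, so instead of copying it by hand you can pull it out with a regular expression. This is a minimal Python sketch that assumes the error text has the same wording as the message in step 3. It extracts both keys and prints the commands from step 4 with the wanted key filled in.

```python
import re

KEY = r"Cisco_Nexus_1000V_\d+"
MISMATCH = re.compile(
    rf"Cannot create a VDS of extension key (?P<wanted>{KEY}) "
    rf"that is different than that of the login user session (?P<current>{KEY})"
)


def extension_keys(error_text):
    """Return (key vCenter wants, key the VSM has), or None if no match."""
    match = MISMATCH.search(error_text)
    if match is None:
        return None
    return match.group("wanted"), match.group("current")


# The error text from step 3.
ERROR = (
    "ERROR:  [VMware vCenter Server 5.0.0 build-455964] Cannot create a VDS of "
    "extension key Cisco_Nexus_1000V_1169242977 that is different than that of "
    "the login user session Cisco_Nexus_1000V_125266846."
)

wanted, current = extension_keys(ERROR)
print(f"VSM currently has: {current}")
print("no svs connection vcenter")
print(f"vmware vc extension-key {wanted}")
```

The first key in the message is the one already registered in vCenter for the DVS, which is the one the new VSM has to adopt.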

Step 5. (Optional) Remove the 1000v

If you’re just trying to remove the 1000v because you had that orphaned one sitting around, we simply disconnect now from vCenter

nexus1000v(config)# svs connection vcenter
nexus1000v(config-svs-conn)# no connect
nexus1000v(config-svs-conn)# connect
nexus1000v(config-svs-conn)# no vmware dvs
This will remove the DVS from the vCenter Server and any associated port-groups. Do you really want to proceed(yes/no)? [yes] yes

Now, the orphaned Nexus 1000v is gone. If you want to remove it from your vCenter plugins then you will have to navigate the managed object browser and remove the extension key. Not a big deal. By opening a web browser to the host that manages vCenter (e.g.: http://10.93.234.91 ) then you can “Browse objects managed by vSphere”. From there go to “content” then “Extension Manager”. To unregister the 1000v plugin, select “UnregisterExtension” and enter in the vCenter Extension key. This will be the same extension key that you used in step 4. (In our example: Cisco_Nexus_1000V_1169242977 )

Hope that helps!

Cloud Computing: How Do I Get There?

This post comes from a talk that I’ll be presenting on at the Pacific Northwest Digital Government Summit Conference on October 2nd, 2013.

History shows us that those that embrace technology and change survive while those that resist and stick with “business as usual” get left behind.  If we have the technology and we don’t use it to make IT look like magic, then we’re probably doing it wrong. (Read “The Innovator’s Dilemma” and Clarke’s Three Laws.)

I’ll be talking mainly about private cloud today, but many of these ideas can be taken into the public cloud as well.

Optimizing ROI on your Technology

My friend tells a story about when his wife first started using an iPhone.  To get directions on a map she’d open up Safari and go to http://maps.google.com.  To check Facebook she would open Safari and go to http://facebook.com.  To check her mail she’d open up Safari again and navigate to http://gmail.com.  You get the idea.

She was still getting great use of her iPhone.  She could now do things she could never do before.  But there was a big part she was missing out on.  She wasn’t using the App ecosystem that makes all of these things easier and delivers a richer experience.

Today, most organizations have virtualization in the data center.  Because of this, IT is able to do things it has never been able to do before.  They’re shrinking their server footprints to once unimaginable levels, saving money in capital and management costs.  I’ve been in many data centers where people proudly point to where rows of racks have been consolidated to one UCS domain with only a few blades.  It’s pretty cool and very impressive.

But they’re missing something as big as the App Store.  They’re missing out on the APIs.  This is where ROI is not being optimized in the data center in a big way.

IT is shifting (or has shifted) to a DevOps model. DevOps means that your IT infrastructure team is more tightly aligned with your developers/application people.  This is a management perspective.  But from a trenches perspective, the operations team is now turning into programmers.  Programmers of the data center.  The guy that manages the virtual environment, the guy who adds VLANs to switches, or the guy who creates another storage LUN: they’re all being told to automate and program what they do.

The group now treats the IT infrastructure like an application that is constantly adding features and doing bug fixes.

The programming of the IT infrastructure isn’t done in compiled languages like Java, C, or C++.  It’s done in interpreted languages like Python, Ruby, Bash, PowerShell, etc.  But the languages alone don’t get you there.  You need a framework.  This is where things like Puppet or Chef come into play.  In fact, you can even look at it like you’re programming a data center operating system.  This is where OpenStack provides you a framework to develop your data center operating system.  It’s analogous to the Web Application development world.  Twitter was originally developed in Ruby using a framework called Ruby on Rails.  (Twitter has since moved off Ruby on Rails.)

Making this shift gives you unprecedented speed, agility, and standardization.  Those that don’t do it will find their constituents looking elsewhere for IT services that can be delivered faster and cheaper.

The IT assembly line

It’s hard for people to think of their IT professionals as assembly line workers.  After all, they are doing complex things like installing servers, configuring networks, and updating firmware.  These are CCIEs, VCPs, and Storage Gurus.  But that’s actually what people in the trenches are:  Workers of the virtual Assembly line.  IT managers should look at the way work enters the assembly line, understand the bottlenecks, and track how long it takes to get things through the line.  Naturally, there are exceptions that crop up.  But for the most part, the work required to deliver applications to the business consists of repetitive tasks.  They’re just complicated, multi-step, repetitive tasks.

To start with, we need to look at the common requests that come in:  Creating new servers, deploying new applications, delivering a new test environment.  Whatever it is, management really needs to understand how it gets done, and look at it like the manufacturing foreman sitting above the plant, looking down and watching a physical product make its way through.  Observe which processes are in place, where they are being side stepped, or where they don’t exist at all.

As an example, consider all the steps required to deploy a server.  It may look something like the flowchart below:

That sure looks like an assembly line to me.  If you can view work that enters the infrastructure like an assembly line, you can start measuring how long it takes for certain activities to get done.  Then you can figure out ways to optimize.
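To make the measuring concrete, here is a small Python sketch of a server deployment modeled as stations on a line. Every step name and number below is made up for illustration. The point is only that once hands-on time and queue time are recorded per step, total lead time and the bottleneck fall out of simple arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    team: str
    work_hours: float  # hands-on time spent doing the step
    wait_hours: float  # time the request sat in that team's queue first


def lead_time(steps):
    """Total elapsed hours from request to delivery."""
    return sum(s.work_hours + s.wait_hours for s in steps)


def bottleneck(steps):
    """The step where a request spends the most elapsed time."""
    return max(steps, key=lambda s: s.work_hours + s.wait_hours)


def flow_efficiency(steps):
    """Fraction of the lead time that is actual work rather than waiting."""
    return sum(s.work_hours for s in steps) / lead_time(steps)


# Hypothetical measurements for one "deploy a server" request.
DEPLOY_SERVER = [
    Step("Rack and cable server", "facilities", 2.0, 24.0),
    Step("Configure VLANs", "network", 0.5, 40.0),
    Step("Carve out LUN", "storage", 1.0, 16.0),
    Step("Install OS", "compute", 1.5, 4.0),
]

print(f"lead time: {lead_time(DEPLOY_SERVER)} hours")
print(f"bottleneck: {bottleneck(DEPLOY_SERVER).name}")
print(f"flow efficiency: {flow_efficiency(DEPLOY_SERVER):.1%}")
```

In this made-up example the request takes 89 hours end to end but only 5 of them are actual work, which is typical of what you find once you start timing the queues between teams.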

Standardization of the Infrastructure

Manufacturing lines optimize throughput by standardizing processes and equipment.  When I hear VMware tell everybody that “the hardware doesn’t matter”, I take exception.  It matters.  A lot.  Just like your virtualization software matters.  Cisco and other hardware vendors come at it from the opposite direction and say “the hypervisor doesn’t matter, we’ll support them all”.  What all parties are really telling you is that they want you to standardize on them.  All parties are trying to prove their value in a private cloud situation.

What an organization will standardize on depends on a lot of things: Budget, skill set of Admins, Relationship with vendors and consultants, etc.  In short, when considering the holy trinity of the data center: Servers, Storage, & Networking it usually gets into a religious discussion.

But whatever you do, the infrastructure needs to be robust.  This is why Converged Infrastructures like Vblocks, FlexPods, and other reference architectures have become popular.  The “One-Piece-At-A-Time” accidental/cobbled architecture is not a good play.

Consider the analogy that a virtualized workload is cargo on a Semi Truck.  Do you want that truck running over a solid 6-lane government highway like I-5, or do you want that stuff traveling at 60mph down a rickety bridge?

This?

Or This?

Similarly, if your virtualization team doesn’t have strong Linux skills, you probably don’t want them running OpenStack on KVM.  That’s why VMware and Hyper-V are so popular.  They’re a lot easier for most people’s skill level.

What to Standardize On?

While the choice of infrastructure standardization is a religious one, there are role models we can look to when deciding.  Start out by looking at the big boys, or the people you aspire to be when you grow up.  Who are the big boys that are running a world class IT as a service infrastructure?  AWS, RackSpace, Yahoo, Google, Microsoft, Facebook, right?

What are they standardizing on?  Chances are it’s not what your organization is doing.  Instead of VMware, Cisco, IBM, HP, Dell, EMC, NetApp, etc, they’re using open source, building their own servers, and using their own distributed filesystems.  They do this because they have a large investment in their DevOps team that is able to put these things together.

A State organization that has already standardized on a FlexPod or Vblock with VMware is not going to throw away what they’ve done and start over just so they can match what the big boys do.  However, as they move forward, perhaps they can make future decisions based on emulating these guys.

Standardize Processes

The missing part is standardizing the processes once the infrastructure is in place.  Standardization is tedious because it involves looking at every detail of how things are done.  One of my customers has a repository of documentation they use every time they need to do something to their infrastructure.  For example, 2 weeks ago we added new blade servers to the UCS.  He pulled out the document and we walked through it.  There were still things we modified in the documentation, but for the most part the steps were exact.

Unfortunately, this was only one part of the process.  The Networking team had their own way of keeping notes (or not at all) on how to do things.  So the processes were documented in separate places.  What the IT manager needs to do is make sure they understand how the processes (or work centers) are put together and how long each one takes.

The manager should be able to have their own master process plan to be able to track work through the system.  (The system being the different individuals doing the work).  This is what is meant by “work flow”.  Even if they just do this by hand or as is commonly done with a Gantt chart, there should be some understanding.

Each job that comes in should get its own workflow, or Gantt chart, and be entered into something like a Kanban board.  Once you understand this for the common requests, you can see how many one-offs there are.

Whether these requests are for public cloud or private cloud, there is still a workflow.  It is an iterative process that may not be complete the first few times it is done, but over time will become better.  There is a great book called “The Phoenix Project” that talks about how the IT staff starts to standardize and work together between development and operations to get their processes better.  These ideas are based off an earlier business classic called “The Goal”.

Automate the Processes

Once the processes are known, we turn our assembly line workers into programmers of the processes.  I used to work as a consulting engineer helping deploy High Performance Computing clusters.  On several occasions the RFPs required that the cluster be able to be deployed from scratch in less than 1 hour.  From bare metal to running jobs.  We created scripts that would go through and deploy the OS, customize the user libraries, and even set up a job queuing system.  It was pretty amazing to see 1,200 bare metal rack mount servers do that.  After we left, if the customer had problems with a server they could replace it, plug it in, and walk away.  The system would self provision.

While that was a complicated process and still is, it is still simpler than what virtualization has done to the management of the data center.  We never had to mess with the network once it was set up.  Workflows for a new development environment are pretty common and require provisioning several VMs with private networks and their own storage.  However, the same method of scripting the infrastructure can still be applied.  It just needs to be orchestrated.

Automate and Orchestrate with a Framework

Back when we did HPC systems, we used an open source management tool called xCAT.  That was the framework by which we managed the datacenter.  The tool had capabilities but really what it gave us was a framework to insert our customizations or our processes that were specific for each site.  The tool was an enabler of the solution, not the solution itself.

Today there are lots of “enterprise” private cloud management tools.  In fact, any company that wants to sell a “Private Cloud”  will have its own tool.  VMware vCloud Director, HP Cloud System, IBM Cloudburst, Cisco UCS Director, etc.  All of these products, regardless of how they are sold should be regarded as frameworks for automating your processes.

At a recent VMUG, the presenter asked “How many people are using vCloud Director or any other cloud orchestration tool?”  Nobody raised their hand.  Based on what I’ve seen, it’s because most organizations haven’t yet standardized their IT processes.  There is no need for orchestration if you don’t know what you’re orchestrating.

Usually each framework will come with a part or all of what Cisco calls the “10 domains of cloud” which may include: A self service portal, chargeback/showback, service catalog, security, etc.  If you are using a public cloud, you are using their framework.

Once you select one, you’ll need to get the operations teams (network, storage, compute, virtualization) to sign off and use the tool.  It’s not just a server thing.  Each part of the assembly line needs to use it.

Once the individual components are entered into the framework, then the orchestration comes into play.  To start with, codify the most common workloads:  Creating a VLAN, Carving out a LUN, Provisioning a VM, etc.

To orchestrate means to arrange or control the elements of, as to achieve a desired overall effect.  With the Framework, we are looking to automate all of the components to deliver a self service model to our end customer.

Self Service and Chargeback

Once we have the processes codified in the framework, we can now present a catalog to our users.  With a self service portal, we recommend not making it completely automated to start out with.  With some frameworks, as a workload moves through the automated assembly line, it can send an email to the correct IT department to approve whether the workflow can proceed.  So for example, if the user wants a new VLAN for their VM environment as part of the workflow, the networking administrator will receive an email and will be able to approve or deny it.  This way, the workflow is monitored, the end requester knows where they are in the queue, and once the step is approved, it gets created automatically and then gets passed along to the next station in the assembly line.
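The approval gate described above can be sketched in a few lines of Python. This is a toy model, not any vendor's orchestration API: the class names, step names, and team names are all hypothetical, and a real framework would send the email and call the infrastructure APIs where this sketch only writes to a log.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowStep:
    name: str
    approver: str           # team that must sign off before the step runs
    approved: bool = False
    done: bool = False


@dataclass
class Workflow:
    requester: str
    steps: list
    log: list = field(default_factory=list)

    def approve(self, step_name, approver):
        """Record a sign-off; only the step's own approver may give it."""
        for step in self.steps:
            if step.name == step_name and step.approver == approver:
                step.approved = True
                return True
        return False

    def advance(self):
        """Run steps in order, stopping at the first one awaiting approval."""
        for step in self.steps:
            if step.done:
                continue
            if not step.approved:
                self.log.append(f"waiting on {step.approver} for '{step.name}'")
                return step
            step.done = True
            self.log.append(f"executed '{step.name}'")
        return None


request = Workflow("dev-team", [
    WorkflowStep("create VLAN", "network"),
    WorkflowStep("provision VM", "virtualization"),
])

request.advance()                        # stops: network has not approved yet
request.approve("create VLAN", "network")
request.advance()                        # runs the VLAN step, stops at the VM step
print("\n".join(request.log))
```

The log doubles as the status the requester sees, so they always know which station their request is sitting at.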

For chargeback, the recommendation is to keep the menu small, and the price simple.

Security all throughout then Monitor, Rinse, and Repeat

More workflows will come into the system, and the catalog will continuously need updates and revisions.  This is the programmable data center.  Iterations should be checked into a code repository, similar to how application developers use systems like github.com to store code updates.  You will have to do bug fixes and patch up any exposed holes.  With virtualization comes the ability to integrate more software security services like the ASA 1000v or the VSG.

Action Items

  • Realize that your IT infrastructure is a collection of APIs waiting to be harnessed and programmed.  Challenge the people you work with to learn to use those APIs to automate their respective areas of expertise.
  • Optimize the assembly line by understanding the workflows.  Any manufacturing manager can tell you the throughput of the system.  An IT manager should be able to tell you the same thing about their system.  Start by understanding the individual components, how long it takes, and where the bottlenecks in the system are.
  • Standardize your infrastructure with a solid architecture.  Converged architectures are popular for a reason.  Don’t reinvent the wheel.
  • Standardizing processes is the hardest part.  Start with the most common.  These are usually documented.  Take the documentation and think how you would change it into code.
  • Program the DataCenter using a Framework.  Most of the work will have to be done in house or with service contracts.  The framework could be something like a vendor’s cloud software or something free like OpenStack.

 

Quick SPAN with the Nexus 1000v

Today I thought I’d take a look at creating a SPAN session on the 1000v to monitor traffic.  I found it really easy to do!  SPAN is one of those things that takes you longer to read about and understand than to actually configure.  I find that true with a lot of Cisco features:  FabricPath, OTV, LISP, etc.

SPAN is “Switched Port Analyzer”.  It’s basically port monitoring.  You capture the traffic going through one port and then mirror it on another.  This is one of the benefits you get out of the box with the 1000v, so the network administrator doesn’t have to treat the VMs as one big black box.

To follow the guide, I installed 3 VMs.  iperf1, iperf2, and xcat.  The idea was I wanted to monitor traffic between iperf1 and iperf2 on the xcat virtual machine.

On the xcat virtual machine I created a new interface and put it in the same VLAN as the other VMs.  These were all on my port-profile called “VM Network”.  I created it like this:

conf
vlan 5
port-profile type vethernet "VM Network"
vmware port-group
switchport mode access
switchport access vlan 510
no shutdown
state enabled

Then, using vCenter I edited the VMs to assign them to that port group. (Remember: VMware Port-Group = Nexus 1000 Port-Profile)

On the Nexus 1000v, I ran the command:

# sh interface virtual

-------------------------------------------------------------------------------
Port        Adapter        Owner                    Mod Host
-------------------------------------------------------------------------------
Veth1       vmk3           VMware VMkernel          4   192.168.40.101
Veth2       vmk3           VMware VMkernel          3   192.168.40.102
Veth3       Net Adapter 1  xCAT2                    3   192.168.40.102
Veth4       Net Adapter 2  iPerf2                   3   192.168.40.102
Veth5       Net Adapter 3  xCAT                     3   192.168.40.102
Veth6       Net Adapter 2  iPerf1                   3   192.168.40.102

This lets me see which vethernet is assigned to which VM. In this SPAN session, I decided I wanted to monitor the traffic coming out of iPerf1 (Veth6) on the xCAT VM (Veth5).
No problem:

Create The SPAN session

To do this, we just configure a SPAN session:

n1kv221(config)# monitor session 1
n1kv221(config-monitor)# source interface vethernet 6 both
n1kv221(config-monitor)# destination interface vethernet 5
n1kv221(config-monitor)# no shutdown

As you can see from above, I’m monitoring both received and transmitted packets on vethernet 6 (iPerf1). Those packets are then mirrored to vethernet 5 (xCAT). If you have an IP address on xCAT (vethernet 5) you’ll find you can no longer ping it, because the port is in SPAN mode. Notice also that by default the monitoring session is shut down. You have to turn it on.
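If you set up SPAN sessions often, the handful of lines can be templated. This is a rough Python sketch that renders the session commands for a given source and destination vethernet. It assumes the session is opened with monitor session followed by the ID, and it refuses to build a session where the source and destination are the same port. It only produces text to paste into the VSM.

```python
def span_session(session_id, source_veth, dest_veth, direction="both"):
    """Render the config lines for a local SPAN session on the 1000v."""
    if direction not in ("rx", "tx", "both"):
        raise ValueError(f"direction must be rx, tx, or both, not {direction!r}")
    if source_veth == dest_veth:
        raise ValueError("source and destination must be different ports")
    return [
        f"monitor session {session_id}",
        f"  source interface vethernet {source_veth} {direction}",
        f"  destination interface vethernet {dest_veth}",
        "  no shutdown",  # sessions stay down until explicitly enabled
    ]


# Mirror iPerf1 (Veth6) to the xCAT VM (Veth5), as in the session above.
print("\n".join(span_session(1, 6, 5)))
```

Including no shutdown in the template avoids the most common surprise, which is a correctly configured session that never comes up.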

Now we want to check things out:

n1kv221(config-monitor)# sh monitor
Session  State        Reason                  Description
-------  -----------  ----------------------  --------------------------------
1        up           The session is up
n1kv221(config-monitor)# sh monitor session 1
session 1
---------------
type : local
state : up
source intf :
rx : Veth6
tx : Veth6
both : Veth6
source VLANs :
rx :
tx :
both :
source port-profile :
rx :
tx :
both :
filter VLANs : filter not specified
destination ports : Veth5
destination port-profile :

Now, you’ll probably want to monitor the port, right? I just installed wireshark on my xcat VM. (It’s Linux: yum -y install wireshark and ride.) To list the capture interfaces from the command line I ran:

[root@xcat ~]# tshark -D
1. eth0
2. eth1
3. eth2
4. eth3
5. any (Pseudo-device that captures on all interfaces)
6. lo

This gives me the interfaces. By matching the MAC addresses, I can see that eth2 (or device 3 from the wireshark output) is the one that I have on the Nexus 1000v.

From here I run:

[root@xcat ~]# tshark -i 3 -R "eth.dst eq 00:50:56:9C:3B:13"
0.000151 192.168.50.151 -> 192.168.50.152 ICMP Echo (ping) reply
1.000210 192.168.50.151 -> 192.168.50.152 ICMP Echo (ping) reply
2.000100 192.168.50.151 -> 192.168.50.152 ICMP Echo (ping) reply
..

Then I get a long list of fun stuff to monitor. By pinging between iperf1 and iperf2 I can see all the traffic that goes on. Since there was nothing else on this VLAN it was pretty easy to see. Hopefully this helps me or you troubleshoot down the road.

UCS Reverse Path Forwarding and Deja-Vu checks

UCS Fabric Interconnects are almost always run in end-host mode.  At this point in the story there really aren’t many reasons to use switch mode on the Fabric Interconnects.

Two checks, or features that make End Host Mode possible are Reverse Path Forwarding (RPF) checks and Deja-Vu checks.

RPF and Deja-Vu (from Cisco.com)

Reverse Path Forwarding Checks

Each server in the chassis is pinned dynamically (or you can set up pin groups and do it statically, but I don’t recommend that) to an uplink on Fabric Interconnect A and Fabric Interconnect B.  Let’s say you have 2 uplinks on port 31 and 32 of your Fabric Interconnect.  Server 1/1 (chassis 1 / blade 1)  may be pinned to port 31.  If a unicast packet is received for server 1/1 on uplink port 31, it will go through.  But if that same packet destined for server 1/1 is received on port 32, it will be dropped.  That’s because RPF checks to see if the destination for the unicast is actually forwarding its uplink traffic through that link.

Deja Vu Checks

The other check is called “Deja-Vu”.  In the Cisco documentation it says: “Server traffic received on any uplink port, except its pinned uplink port is dropped”.  That sounds a lot like RPF.  Another presentation from Cisco Live states it this way: “Packet with source MAC belonging to a server received on an uplink port is dropped”.

An example to clear it up

VM A on server 1/1 wants to talk to VM B located somewhere else.  The Fabric Interconnects in this case are connected to a single Nexus 5500 switch.  The VM is pinned to one of the VNICs and that VNIC is pinned to go out port 31 of Fabric Interconnect A.  So what happens?

First the VM will send an ARP request.  An ARP request basically says:  I know the IP address but I want the MAC address.  (Obviously, this is in the same Layer 2 VLAN and subnet).  If Fabric Interconnect A doesn’t find the IP/MAC association in its CAM table, then it will not flood the server ports down stream.  That is something a switch would do.  The Fabric Interconnect is different.  The reason the Fabric Interconnect doesn’t send a broadcast down its server ports is because it is a source of truth and knows everyone connected on its server ports.

What it will do instead is forward the ARP request (unknown unicast) up the designated uplink (port 31).  Now the Nexus switch is a switch.  (And a very good one at that).  It will say:  “Hey, I don’t have a CAM table entry for VM B IP/MAC so I will do what we switches do best:  Flood all the ports! (except the port that the unknown unicast/ARP request came in on)”

Remember Fabric Interconnect A port 32 is connected to this same switch as port 31 where the unknown unicast (ARP request) went out.  The Nexus 5500 will send this unknown unicast to port 32 just like every other port.  But port 32 says:  Wait a minute, the source address originated from me.  Deja-vu!  So he drops the packet.

Fabric Interconnect B has two ports 31 and 32 that will also receive the unknown unicast.  If VM B is pinned to a VNIC that is pinned to port 31 on Fabric Interconnect B, he will say:  I got this!  And the packet will go through.  Port 32, however on FI-B will look at the destination MAC and say:  This is not pinned to me, so I’ll drop the packet.  That is the RPF check.

To sum it up

Deja-Vu check:  don’t receive a packet from the upstream switch that originated from me.

Reverse Path Forward Check:  don’t receive a packet if there’s no server pinned to this uplink.
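The two checks are easy to mix up, so here is a toy Python model of what a Fabric Interconnect uplink does with a unicast frame arriving from the upstream switch. It is a simplification for illustration only: the MAC addresses and pinning table are made up, and it ignores broadcast and multicast handling, VLANs, and everything else the real hardware does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str


def uplink_verdict(frame, uplink, pinning):
    """Decide what one FI does with a unicast frame received on `uplink`.

    `pinning` maps the MAC of each server behind this FI to the uplink
    port that server is pinned to.
    """
    # Deja-Vu: the source is one of our own servers, so this frame went
    # out through us and has come back from the upstream switch.
    if frame.src_mac in pinning:
        return "drop (deja-vu)"
    # RPF: only accept the frame on the uplink the destination is pinned to.
    if pinning.get(frame.dst_mac) != uplink:
        return "drop (rpf)"
    return "forward"


VM_A, VM_B = "00:25:b5:00:00:0a", "00:25:b5:00:00:0b"  # hypothetical MACs
FI_A = {VM_A: 31}  # VM A is pinned to port 31 on Fabric Interconnect A
FI_B = {VM_B: 31}  # VM B is pinned to port 31 on Fabric Interconnect B

flooded = Frame(src_mac=VM_A, dst_mac=VM_B)  # flooded by the Nexus 5500

print("FI-A port 32:", uplink_verdict(flooded, 32, FI_A))
print("FI-B port 31:", uplink_verdict(flooded, 31, FI_B))
print("FI-B port 32:", uplink_verdict(flooded, 32, FI_B))
```

The three results line up with the walkthrough: FI-A drops its own frame coming back on port 32, FI-B accepts it on the pinned port 31, and FI-B drops the copy that shows up on port 32.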