
Installing Cisco DCNM on Red Hat Linux

DCNM is Cisco’s GUI for managing MDS and Nexus products.  It’s pretty great for getting a visual of how things are configured and performing.

I thought I would go into a little more detail than I’ve seen posted online about installing DCNM on Red Hat Linux.  In this example we’ll be installing two servers: one will be our app server and the other our Postgres database server.  You could do it all on a single server, but where is the fun in that?

1. Download binaries

From Cisco’s homepage, click Support.  In the ‘Downloads’ section start typing ‘Data Center Network’.  (‘DCNM’ showed no results when I tried it.)  The first entry is Cisco Prime DCNM.

We will be using DCNM 6.3.2 since it’s the latest and works great.  We need to download two files: the installer itself and the silent installer properties.

The installer is really all you need, but it’s kind of nice to use the silent installer properties to script the installation process.

2.  Initial VM installation

Using the release notes as our guide as well as other installation instructions we will be creating two VMs with the following characteristics:

Processors 2 x 2GHz cores
Memory 8GB (8192MB)
Storage 64-100GB

 

We’re just doing this as a test; in production you may need more space.  Also, notice that the release notes state that when doing both LAN and SAN monitoring with DCNM you need to use an Oracle database.  A Postgres database is supported for SAN only up to 2000 ports, or LAN only up to 1000 ports.

Create these VMs.  I’m using KVM but you can use vSphere or Hyper-V.

3.  Operating System Installation 

The installation guides show that RHEL 5.4/5.5/5.6/5.7/6.4 (32-bit and 64-bit) is supported.  I’m using RHEL 6.5 x86_64, which ships with PostgreSQL 8.4 by default.  So I might be living on the edge a little bit, but I had zero problems with the OS.

I installed two machines:

dcnm-app 10.1.1.17
dcnm-db 10.1.1.18

During the installation I accepted the defaults for nearly everything; besides setting up the network, the only thing I changed was the software selection on each server.

3.1 dcnm-app

I set this server up as a Desktop installation.

 

3.2 dcnm-db

Set this one up as a Database Server installation.

4. Operating System Configuration

There are several quick things to do to get this up and running.  You probably have OS hardening procedures at your organization, but this is how I did it to get up and running quickly.  Do the following on both servers.

4.1 Disable SELinux

Does anybody besides Federal agencies use this?  Edit /etc/sysconfig/selinux.

Change the line to be:

SELINUX=disabled

This then requires a reboot.
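If you’d rather script that change, a sed one-liner does it.  Here’s a sketch run against a scratch copy of the file; on the real server point sed at /etc/sysconfig/selinux (and the reboot is still required):

```shell
# Demo on a scratch copy; on the real box the target file is
# /etc/sysconfig/selinux and you still need the reboot afterwards
cfg=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$cfg"

sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"   # flip the setting
grep '^SELINUX=' "$cfg"                           # prints: SELINUX=disabled
```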

4.2 Disable iptables

Yeah, I’m just closing the firewall.  There are some ports pointed out in the installation guide you can use to create custom firewalls, but I’m just leaving things wide open.

service iptables stop
chkconfig iptables off

4.3 Enable YUM

If you set your server up with the Red Hat Network then you are ready to go.  I’m just going to keep it local, bro!  I do this by mounting expanded Red Hat installation media via NFS.  Here’s how I do it:

mkdir /media/rhel6.5
mount 10.1.1.100:/install/rhels6.5/x86_64 /media/rhel6.5

If you are cool then you can put it in /etc/fstab so it persists.
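The matching /etc/fstab entry would look something like this (server and path from the mount command above; the options are just my usual read-only defaults):

```
10.1.1.100:/install/rhels6.5/x86_64  /media/rhel6.5  nfs  ro,defaults  0 0
```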

I then created the file /etc/yum.repos.d/local.repo.  I edited it to look like the below:

[local]
name=Red Hat Enterprise Linux $releasever - $basearch - Source
baseurl=file:///media/rhel6.5
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

4.4 Install additional RPMs as needed

One that you will need on dcnm-app is glibc.i686

yum -y install glibc.i686

5. Database Installation on dcnm-db

This step is only needed on dcnm-db.  Per the database installation guide, we are using Postgres.  If you picked the Database Server set during the OS install like I did, you should see all the Postgres RPMs already installed.

If not, then you can install them all with

yum -y groupinstall 'PostgreSQL Database Server'

Next, initialize and start the database:

service postgresql initdb
service postgresql start

With the default installation of Postgres on Red Hat, a user named postgres is created who pretty much does everything.  We use that account to configure the database.

su - postgres
createdb dcmdb
createuser -P -s -e dcnmuser

5.1 Postgres Config

Postgres on RHEL6.5 doesn’t accept network connections by default.  That makes it more secure.  To enable our App server to connect to it, we need to change two files.

/var/lib/pgsql/data/postgresql.conf

Modify this file by adding the IP address for it to listen on.  By default it’s set to only listen for connections on ‘localhost’.
Change this line:

listen_addresses = 'localhost'         # what IP address(es) to listen on;

To look like this:

listen_addresses = '10.1.1.18,127.0.0.1'

Or you can just make it ‘*’, which says: listen on every interface.  In my case the line above works because my database server’s IP address is 10.1.1.18, so I’m listening on eth0 and the loopback.

/var/lib/pgsql/data/pg_hba.conf

Modify this file by adding in a line for our DCNM user.  At the bottom of the file I added this line:

host    dcmdb       dcnmuser    10.1.1.17/32          md5
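For reference, here’s how the columns of that line break down, following the format described in the comments at the top of the stock pg_hba.conf:

```
# TYPE  DATABASE  USER      ADDRESS       METHOD
host    dcmdb     dcnmuser  10.1.1.17/32  md5
```

‘host’ means a TCP/IP connection, the /32 mask limits it to just the app server, and md5 means password authentication.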

Once those two files are changed, restart postgres.

service postgresql restart

Now you should be ready to rock the database server.  We’ll check it in a minute.  Now let’s go over to the app server.

6.  Configure the App Server

You need to log in via either VNC or on the console for X Windows.  VNC is probably the easiest way to see it remotely.

yum -y install tigervnc-server

Start the VNC server and then you can VNC into it.

service vncserver start

You’ll then need to copy the dcnm installer that you downloaded from Cisco in step 1, as well as the properties file, over to the app server.  I put mine in the /tmp directory.  Make the installer executable by running:

chmod 755 /tmp/dcnm-installer*

6.1 Modify the installer.properties

The dcnm-silent-installer-properties file is a zip file.  When expanded it has a directory called Postgres+Linux; in this directory is the file we will use for our installation.  For the most part I left it alone and just changed a few of the entries (the backslashes in the JDBC URL are Java properties-file escaping, so leave them in):

USE_EXISTING_DB=FALSE  # ! make sure you add this!
#USE_EXISTING_DB=TRUE  # ! comment this out!

#------------Use Existing Postgres--------------
DCNM_DB_URL=jdbc\:postgresql\://10.1.1.18\:5432/dcmdb
DCNM_DB_NAME=dcmdb
SELECTED_DATABASE=postgresql
DCNM_DB_USERNAME=dcnmuser
DCNM_DB_USER_PASSWORD=Cisco.123

 

With that, we are ready to run!

7. Install DCNM

On the App server, we finally run:

/tmp/dcnm-installer-x64-linux.6.3.2.bin -i silent -f /tmp/installer.properties

If all goes well, you should be able to open a browser to dcnm-app and see the Cisco login screen.

Hurray!

The CCIE Data Center Certification Process

CCIEData_Center_UseLogo

 

On July 9th, 2014 I passed the CCIE Data Center lab exam in San Jose, earning me the CCIE certification.  Hurray!  When my team heard that I had done it, their response was:  “If Vallard can do it, so can I!”  Ha ha.  So needless to say, a few more people have started down the path to certification, and I have no doubt they’ll reach it.

I have to say it feels pretty great, and the process I went through to get it was very rewarding in that it deepened my understanding of data center architectures as well as the solid hands-on skills required to implement these solutions.  With the CCIE certification, it’s the journey that makes it so worth it.

I thought I would write a bit about my experience of the process and how I approached it.  To summarize: it took me five attempts to pass the written exam, and once I did that I passed the lab exam on my second try.  I’m not saying my approach is the best, but it worked for me and I’m happy with the outcome.  The funny thing is, even though I worked really hard and learned so much to get it, I still feel like there are many things I don’t know about the platforms.  One of the drawbacks of my position is I don’t do a lot of troubleshooting with my customers, because most of the solutions Cisco offers work really well.  Take UCS for example: I spend probably two hours a month at the most troubleshooting issues with it, and that’s with the hundreds of UCS systems that my customers have that I support!

In spite of that, I still know this stuff very well now.  When a coworker asked me how to configure vPCs on the Nexus 5548s (just a quick and dirty config), I was able to spit it all out from memory and I knew it was right.  I’ve done it so many times now I can do it in my sleep.

Need an OTV config?  I’ve got you covered there too.  I can do OTV on a stick at light speed, setting it up with multicast or adjacency servers, I don’t even care.  I can do it all.  Boom.  So yes, passing the CCIE exam gives you confidence, because you learn a ton.  That’s kind of how I felt when I graduated with my computer science degree from Berkeley.  Even though the program kicked my trash and made me feel like a sorry sucker most of the time, it made me believe that, armed with the skills, I could do anything… given enough time.

So here’s my experience:

The Written Exam

The CCIE Data Center written exam topics are spelled out pretty clearly on the Cisco Learning Network page.  I first took the test in its beta form, when I knew very little about the Nexus product line other than a few switches I had set up before.  I took the test without studying.  Zero prep.  I didn’t even look to see what was on it.  You see, I had to have humility beaten into me.  Anyway, I failed miserably.  Seriously.  I thought I was the man at UCS, and I got less than 20% right on it.  I blamed it on the way the questions were worded, but in hindsight there were very clear answers that stood out among the wrong ones.  The thing was, it was hard.

After my first failure in August 2012, I gave up for about a year, not thinking it was for me.  Then I learned that a few more friends had already passed the written and were working towards the lab.  My pride made me think the same thing my teammates thought when I passed:  “If they can do it, then I can do it.”  My method, I thought, would be a brute force attack on the exam.  So I took the exam again a year later in July 2013, after really working specifically on Nexus and MDS.  I felt that if given the beta exam again I could pass it.  The problem was, the exam was much different than I remembered, and again I did poorly.  When I failed, I rescheduled after realizing a few of my mistakes.  I took it again in August and September, each time doing a little better, but each time not quite getting it.  By the time my December test came I was solidly prepared, and just before Christmas, on the 23rd of December, I passed the written.

So what’s my advice on the written:

1.  If you already work in this field and have hands on with Nexus, MDS, & UCS, take the exam to see what’s on it.  CCIE is a total investment and if it takes you a few hundred dollars to pass the written exam, it might be worth it.

2.  If you fail the first time, take it again as soon as you can.  I think there are new rules going into effect that may keep you from taking it as often.  However, once you start down the road to CCIE certification, you can’t stop until you’ve reached the end.  Otherwise you lose it.  That year I spent off was a waste.  I should have kept going.

3.  Once you pass the written exam, schedule the lab exam as soon as possible.  There are several months of waiting time right now and you don’t want this train to stop, so keep working towards it.

The CCIE Data Center Lab Exam

My entire IT career has been spent doing very hands-on things.  I’m fortunate in that when I learn how to do something via the command line, my fingers seem to remember how to do it pretty well.  In some ways that’s bad, because I have a hard time explaining things (which means maybe I don’t know how it works in the first place?), but I can almost always get things to work.  Being a fast typist helps as well.

As soon as I passed my written exam, I scheduled the lab.  The soonest I could get in was April 15, 2014.  That’s right: 4 months out.  I had very little to go by other than the blueprint and Brian McGahan’s excellent writeup.  I flew in the night before and went to bed around 10PM, but at 3AM I had trouble going back to sleep.  I tossed and turned until about 5AM and then finally just got up, went for a 3 mile run, ate a good breakfast, and showed up at Building C in San Jose 30 minutes early.  I sat in the waiting room with 14 other nervous people.  Man, I was tense.  I hadn’t felt that way since finals in undergrad.  We finally went in and I went to work.

As I was taking the exam, I tried to get that zen experience that Brian talked about in his blog, but it didn’t happen for me at all.  In fact, hardly anything happened for me.  For some reason, though, I thought I had done pretty well.  Wrong.  0% in multiple categories.

But I didn’t go into this thing the first time blindly.  How did I prepare?  Hands on, baby.  Stick time.  Yeah!

I was fortunate enough to have a pretty decent lab.  My equipment was good, but not complete.  I had:

- UCS with the old 6120s (but I’ve worked on plenty of 6248s so I wasn’t worried if that’s what they would have in the lab since I know all about unified ports.).  But 6120 fabric interconnects was all that was available to me.

- One Nexus 7010.  I had one Sup1, but I upgraded it to 8GB of RAM so that I could do 4+1 VDCs.  It didn’t matter; 4 would have been fine, even though what I really needed was 8 VDCs.  I made do.  I had one M1 line card and one F1 line card so that I could practice OTV, LISP, FabricPath, FCoE, and layer 3 stuff.

- One Nexus 5548.  No line modules but I was fortunate enough to have layer 3 capabilities.  This helped me when I practiced OTV.  I also had several Nexus 2148s hanging around so I could do FEX things, but I could only do so much with a single Nexus 5548.

- One MDS 9148 Fibre Channel switch.  It worked pretty well.

I had a great base to get going on but in the end, I just couldn’t put it all together.  Why did I fail the first time?  Two reasons I think:

1.  Lack of confidence.  This is a big deal.  Nobody expected me to pass.  I’ve only been at Cisco for 3 years and I know people  who have been here a long time and haven’t earned the CCIE certification.  The second time I went in, I told my manager that I was getting it.  I was solidly prepared.

2.  Lack of equipment.  This was the biggest reason in my mind.  I’m cocky (conceited? immature? ignorant?) enough to think I can do these things.  I have 4 young children, and I’ve watched them all alone for 4 days straight, so I’ve already faced huge challenges!  I can do this!  If you look at the lab information and the equipment they use, you can see that I was somewhat lacking.  For example, I had no director-class Fibre Channel switch and not enough equipment to fully test things out.  This is one of the biggest barriers to passing the CCIE Data Center exam: having the equipment.  You are looking at several million dollars of gear, and that’s probably why renting is such a good option and makes a ton of sense!

Anyway, here are my tips for the lab, when I passed, as well as for life in general:

Tip 1:  Higher is lower/ lower is higher?

I was also informed of a very cool trick.  When you think about priorities of different protocols or features, there’s an easy way to remember it.  This was taught to me by Ryan Boyd, a great guy I work with: if it’s a layer 2 protocol (LACP, Fibre Channel stuff, vPC, spanning tree), the lower the number, the higher the priority.  If it’s a layer 3 protocol (OSPF, EIGRP, OTV, VRRP, etc.), the higher the number, the higher the priority.  FabricPath is tricky, because it’s supposedly layer 2, but when you realize that it runs IS-IS as its control plane, it makes more sense that it falls under the layer 3 rule: the higher the number, the higher the priority.  Why didn’t anyone tell me this before?
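To make it concrete, here’s a hypothetical NX-OS snippet (the VLAN and group numbers are made up): with spanning tree the switch with the lower priority wins the root bridge election, while with VRRP the router with the higher priority wins the master election.

```
! Layer 2: LOWER wins -- 4096 beats the default 32768 for root bridge
spanning-tree vlan 10 priority 4096

! Layer 3: HIGHER wins -- 200 beats the default 100 for VRRP master
interface Vlan10
  vrrp 1
    priority 200
```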

Tip 2:  copy & paste

I had several people tell me they use Notepad: copy the command line stuff into it and then just paste it in.  One of my friends told me he did that, blew away his switch, and had to start from scratch.  That takes away far too many precious minutes from your lab time.  Lab day is one of the fastest days ever.  I spent a lot of time trying to debug something in the lab the day I passed; when I looked at the clock, I realized that I had just spent 45 minutes burning away lab time.  Bad form!  (Fortunately, I had everything else done.)  So I don’t copy and paste.  I just type it out on the command line.  I have really good typing skills.  It’s the one thing from high school, learned on a typewriter, that has really stuck with me.  Plus, writing all that code in college got me pretty good as well.  So for me it was type away, even if I’m doing the same thing on multiple switches.

As an aside:  The other funny thing I noticed is that people that do Cisco switches don’t type in all the words.  They do things like

sh run or sh int brie

Since I have big Linux roots, I do a lot of tabbing.  So maybe I add one extra keystroke, but this works for me.

Tip 3:  Draw it out

In Brian’s blog he shows how he spent the first hour drawing it all out.  I didn’t do quite that much the second time, when I passed, but I did read through each section before I started working on it.  This helped me remember which interfaces were connected where.  You get as much scratch paper as you want.  I used more than the average.

After I failed the first test, I scheduled the second lab attempt as soon as I could.  The problem was:  The next available time was in September!!  Wow.  So I checked every day, several times a day for an opening.  After 3 days of this, I got July 9th.  So my lesson of not getting off the train helped out.  I thought:  Let’s keep going.

My friends had recommended INE labs, and those things are *really* good.  I read through some of my friends’ labs but didn’t use any of them.  Instead, a colleague of mine was building a lab out of spare parts, so I joined forces with him and we built it together.  I like this approach a lot because I like touching hardware.  I like knowing how to set it up from scratch.  I’ve always done this.  We got a study group together of people who were going to take the lab exam, and we hammered through all kinds of scenarios, really making sure we knew how to do it.  I’ll never forget watching the USA play in the World Cup while trying to get all our components working.

I tore the lab up several times and the week before the test, I really went to town.  (UCS, N1kv, MDS, N7k, N5k on the 4th of July is super patriotic, so that’s how I celebrated!)  I was continuing to go through it all the way up until 11PM the night before the test.  By that point, I had had enough.  I felt super ready.  I slept all the way until 6AM, extremely thankful I didn’t wake up at 3AM again.  I was still really nervous.  I got to building C early.

This time I had experience, and I blew through all the questions, keeping track of points, feeling like I got nearly everything.  By lunch time I felt really good.  By 2PM I was sure I was passing… if only I could get this one thing working… I got it working by 3PM by being calm and retracing my steps.  I spent the remaining time going through the questions and making sure I had answered them right, tweaking things here and there and finding some things I had forgotten.  I counted the points, and even though there were some things I never got working, I felt pretty sure I had enough to make it happen.

I left San Jose and went to the airport.  I called my wife and told her I felt good, but still wasn’t sure.  What if I missed something?  What if I didn’t save something?  (But I remembered saving at least 3 times on every item before I left, so I was pretty sure about that.)  Before I boarded the plane an email came.  I opened it up, put my hands in the air, and jumped for joy.  The people in the airport probably thought I had just won the lottery.  But this wasn’t luck, my friends; this was being prepared.  I had passed.  I texted my manager and a few good friends and thanked them for their support.  It was a good day.

 

1000v in and out of vCenter

I was setting up the Nexus 1110 (aka: virtual service appliance, aka: VSA) with one of our best customers and as we were doing it the appliance rebooted never to come up again without completely reinstalling the firmware from the remote media.  Most of this was probably my fault because I didn’t follow the docs exactly, and I think we can now move forward, but it made me realize I hadn’t written down an important way to reconnect to an orphaned 1000v from a new virtual supervisor module (VSM).
Here’s the situation:  When you lose the 1000v that connects into vCenter, there is no way to remove the virtual distributed switch (VDS or DVS) that the 1000v presented to vCenter.  You can remove hosts from the DVS, but you can’t get rid of the switch itself; trying to remove it just throws an error.
In my case, I didn’t want to get rid of it anyway.  I just wanted to reconnect a new VSM that I created with the same name.  But the same operation can be used to remove the 1000v DVS from vCenter as well.
So here’s how you do it:
Adopt an Orphaned Nexus 1000v DVS
Step 1.  Install a VSM.  I usually do mine manually, so that it doesn’t try to register with vCenter or one of the hosts.  Don’t do any configuration other than an IP address; just get it so that you can log in.  Once you can log in, if you did create an SVS connection you’ll need to disconnect.  In mine, I made an svs connection and called it vcenter.  To disconnect from vCenter and erase the svs connection, run:
# config
# svs connection vcenter
# no connect
# exit
# no svs connection vcenter
Trivia: What does SVS stand for?  “Service Virtual Switch”
Step 2.  Change the hostname to match what is in vCenter
In vCenter, the orphaned switch shows up as a folder named nexus1000v with a DVS named nexus1000v.  To make vCenter think that this new 1000v is the same one, we need to change the hostname to match what is in vCenter.
nexus1000v-a# conf
nexus1000v-a(config)# hostname nexus1000v
nexus1000v(config)#
Step 3.  Build SVS Connection
Since we destroyed (or never built) the SVS connection in step 1, we’ll need to build one and try to connect.  The SVS connection should have the same name as the one you created when you first made your SVS.  So if you called your SVS ‘vCenter’, or ‘VCENTER’, or ‘VMware’, then you’ll need to name it the same thing.  I named mine ‘vcenter’, so that’s what I use.  Similarly, you’ll have to create the datacenter-name the same as what you had before.
nexus1000v(config)# svs connection vcenter
nexus1000v(config-svs-conn)# remote ip address 10.93.234.91 port 80
nexus1000v(config-svs-conn)# vmware dvs datacenter-name Lucky Lab
nexus1000v(config-svs-conn)# protocol vmware-vim
nexus1000v(config-svs-conn)# max-ports 8192
nexus1000v(config-svs-conn)# admin user n1kUser
nexus1000v(config-svs-conn)# connect
ERROR:  [VMware vCenter Server 5.0.0 build-455964] Cannot create a VDS of extension key Cisco_Nexus_1000V_1169242977 that is different than that of the login user session Cisco_Nexus_1000V_125266846. The extension key of the vSphere Distributed Switch (dvsExtensionKey) is not the same as the login session’s extension key (sessionExtensionKey)..
Notice that when I tried to connect I got an error.  This is because the extension key in my new Nexus 1000v (created when it was installed) doesn’t match the old one.  The nice thing is I can actually change that, and that is how I make this new 1000v take over for the old one.

Step 4.  Change the extension key to match what is in vCenter.
To see what the current extension-key is (or the offending key is) run the following command:
nexus1000v(config-svs-conn)# show vmware vc extension-key
Extension ID: Cisco_Nexus_1000V_125266846
That is the one we need to change.  You can see the extension key that vCenter wants from the error message in the previous step: ‘Cisco_Nexus_1000V_1169242977’.  So we need to make the extension-key on our 1000v match that.  No problem:
nexus1000v(config-svs-conn)# no connect
nexus1000v(config-svs-conn)# exit
nexus1000v(config)# no svs connection vcenter
nexus1000v(config)# vmware vc extension-key Cisco_Nexus_1000V_1169242977

Now we should be able to connect and run things as before.
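One gotcha: since we just deleted the svs connection, that means rebuilding it first, with the same commands and names as step 3.  Something like:

```
nexus1000v(config)# svs connection vcenter
nexus1000v(config-svs-conn)# remote ip address 10.93.234.91 port 80
nexus1000v(config-svs-conn)# vmware dvs datacenter-name Lucky Lab
nexus1000v(config-svs-conn)# protocol vmware-vim
nexus1000v(config-svs-conn)# connect
```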

Step 5. (Optional) Remove the 1000v

If you’re just trying to remove the 1000v because you had that orphaned one sitting around, we simply disconnect now from vCenter:

nexus1000v(config)# svs connection vcenter
nexus1000v(config-svs-conn)# no connect
nexus1000v(config-svs-conn)# connect
nexus1000v(config-svs-conn)# no vmware dvs
This will remove the DVS from the vCenter Server and any associated port-groups. Do you really want to proceed(yes/no)? [yes] yes

Now, the orphaned Nexus 1000v is gone. If you want to remove it from your vCenter plugins then you will have to navigate the managed object browser and remove the extension key. Not a big deal. By opening a web browser to the host that manages vCenter (e.g.: http://10.93.234.91 ) then you can “Browse objects managed by vSphere”. From there go to “content” then “Extension Manager”. To unregister the 1000v plugin, select “UnregisterExtension” and enter in the vCenter Extension key. This will be the same extension key that you used in step 4. (In our example: Cisco_Nexus_1000V_1169242977 )

Hope that helps!

Nexus 1000v – A kinder gentler approach

One of the issues skeptical server administrators have with the 1000v is that they don’t like the management interface being subject to a virtual machine.  The 1000v can be configured so that even if the VSM gets disconnected, powered off, or blown up, the system ports still forward, but to them that is voodoo.  Most say:  Give me a simple access port so I can do my business.

I’m totally on board with this level of thinking.  After all, we don’t want any Jr. Woodchuck network engineer to be taking down our virtual management layer.  So let’s keep it simple.

In fact!  You may not want the Jr. Woodchuck networking engineer to be able to touch the production VLANs for your production VMs.  Well, here’s a solution for you.  You don’t want to do the networking, but you don’t want the networking guy to do the networking either.  So how can we make things right?  Why not just ease into it?  The diagram below presents the NIC level of how you can configure your ESXi hosts:

Here is what is so great about this configuration:  the VMware administrator can run things “business as usual” with the first 6 NICs.

Management A/B teams up with vmknic0 with IP address 192.168.40.101.  This is the management interface and used to talk to vCenter.  This is not controlled by the Nexus 1000v.  Business as usual here.

IP Storage A/B teams up with vmknic1 with IP address 192.168.30.101. This is to communicate with storage devices (NFS, iSCSI).  Not controlled by Nexus 1000v.  Business as usual.

VM Traffic A/B team up.  This is a trunking interface and all kinds of VLANs pass through here.  This is controlled either by a virtual standard switch or using VMware’s distributed Virtual Switch.  Business as usual.  You as the VMware administrator don’t have to worry about anything a Jr. Woodchuck Nexus 1000v administrator might do.

Now, here’s where it’s all good.  With UCS you can create another vmknic2 with IP address 192.168.10.101.  This is our link that is managed by the Nexus 1000v.  In UCS we would configure this as a trunk port with all kinds of VLANs enabled over it.  It can use the same vNIC template that the standard VM-A and VM-B used.  Same VLANs, etc.

(Aside:  Some people would be more comfortable with 8 vNICs; then you can do vMotion over its own native VMware interface.  In my lab this is 192.168.20.101.)

The difference is that this IP address 192.168.10.101 belongs on our Control & Packet VLAN.  This is a back end network that the VSM will communicate with the VEM over.  Now, the only VM kernel interface that we need to have controlled by the Nexus 1000v is the 192.168.10.101 IP address.  And this is isolated from the rest of the virtualization stack.  So if we want to move a machine over to the other virtual switch, we can do that with little problem.  A simple edit of the VMs configuration can change it back.

Now, the testing can coexist on a production environment because the VMs that are being tested are running over the 1000v.  Now you can install the VSG, DCNM, the ASA 1000v, and all that good vPath stuff, and test it out.

From the 1000v, I created a port profile called “uplink” that I assign to these two interfaces:

port-profile type ethernet uplink
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1,501-512
  channel-group auto mode on mac-pinning
  no shutdown
  system vlan 505
  state enabled

By making it a system VLAN, I make it so that this control/packet VLAN stays up. For the vmknic (192.168.10.101) I also created a port profile for control:

port-profile type vethernet L3-control
  capability l3control
  vmware port-group
  switchport mode access
  switchport access vlan 505
  no shutdown
  system vlan 505
  state enabled

This allows me to migrate the vmknic over from being managed by VMware to being managed by the Nexus 1000v.  My VSM has an IP address on the same subnet as vCenter (even though it’s layer 3 mode):

n1kv221# sh interface mgmt 0 brief

--------------------------------------------------------------------------------
Port        VRF          Status   IP Address       Speed     MTU
--------------------------------------------------------------------------------
mgmt0       --           up       192.168.40.31    1000      1500

Interestingly enough, when I do the sh module vem command, it shows up with the management interface:

Mod  Server-IP        Server-UUID                           Server-Name
---  ---------------  ------------------------------------  --------------------
3    192.168.40.102   00000000-0000-0000-cafe-00000000000e  192.168.40.102
4    192.168.40.101   00000000-0000-0000-cafe-00000000000f  192.168.40.101

On the VMware side, too, it shows up with the management interface: 192.168.40.101

Even though I only migrated the 192.168.10.101 vmknic over.

This configuration works great.  It provides a nice opportunity for the networking team to get with it and start taking back control of the access layer.  And it provides the VMware/Server team a clear path to move VMs back to a network they’re more familiar with if they are not yet comfortable with the 1000v.

Let me know what you think about this set up.

Teaching Kids to Program

I get asked a lot by different parents about teaching their kids to write computer programs.  “What is a good way to get started?”  “How did you get into it?”  As my oldest child is now 9, I’ve been frequently asking myself the same question.  I feel it is very important that young people know how to write code.  I feel that years from now, people will look back on those who can’t write basic computer programs the same way we look back on those who can’t write a simple letter.

Much of my thinking has been confirmed and augmented by a TED Talk I watched this week by Mitch Resnick.  In his talk, he affirms that just because people can code doesn’t mean we expect them all to be professional computer scientists or developers.  We don’t expect all people who learn how to write to become novelists or journalists.  It’s just a basic skill that is needed in our day and age.

With the program “Scratch” that he and his team have made, I think I’ve found the answer I was looking for.  I got home last night and downloaded it onto our family iMac, which sits right in the kitchen, and got my 9 year old and 6 year old started on it.  We started out with a picture of a “sprite”, or in our case the default picture of a kitten.  We then created “controls” such as “when I press the spacebar”.  Then underneath the control we did things like “change color” or “move 10”.  (The 10 is 10 pixels, but kids don’t really know that yet.)  My kids would then keep pressing the space bar.  That’s when we introduced the “forever” loop to them.  Amazing!  In just a quick 10 minutes, they understood loops and making things happen.

I’m hoping to do more with this and my kids.  I don’t want them to think of computer programming as dry and boring, but rather a creative medium for doing really cool things.  I am thankful for the people at MIT for making this possible.

 

Fabric Interconnect Failover tests

The default timeout for failover of a UCS fabric Interconnect is 5 seconds. Want to change that? Check this out.

If you fail over the primary fabric interconnect (which UCS Manager runs on), you’ll be logged out of UCS Manager.  No worries, just wait 5 seconds and log back in.  You’ll be on the new primary.

When you fail both of them over to test, make sure HA is back up and running before failing the next one over.  Just log in via SSH:

connect local-mgmt
show cluster stat
A: UP, PRIMARY
B: UP, SUBORDINATE
HA READY

This will tell you that the cluster is ready.  At this point you should be able to unplug one of the Fabric Interconnects to test that failover works.
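If you’d rather script that readiness check than eyeball it, a loop like the one below should do.  Fair warning: this is an untested sketch.  It assumes the cluster’s management address is ucs-cluster (hypothetical) and that your SSH client will accept the chained local-mgmt command; adjust to taste.

```
# poll the cluster (hypothetical address) until it reports HA READY
while ! ssh admin@ucs-cluster "connect local-mgmt ; show cluster stat" | grep -q "HA READY"
do
    sleep 10
done
echo "HA READY - safe to fail over"
```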

When they come back online, you may want to change which Fabric Interconnect is the primary.  To do this, once again SSH into the fabric interconnect:

connect local-mgmt
cluster lead a

Once, we didn’t let the HA get ready, and we had to run cluster force primary to make the subordinate (which hadn’t been synced yet) become the primary.

 

CCIE Data Center Exam

The CCIE Data Center exam was announced in March of this year.  The list of topics is quite comprehensive.  I for one was stoked to see it announced as I wasn’t even thinking about doing a CCIE until this came up.

After some prodding from my team mates, I signed up for the Beta written exam and I took it today.  120 questions covering UCS, Nexus 7000, 5000, 1000v, MDS.  I don’t know the results of the test because the exam is in beta form and they won’t give out a passing score until after the beta period ends.

My overall feeling about the written exam in its current incarnation is that it is passable.  The UCS material I know pretty well.  The other topics… well, I could use some work.  But having taken it (and after all, it’s only $50), I think I’m ready to get serious and go for the CCIE.  I’m setting a timeline of Fall 2013 to have it passed.  Guess we’ll see.

Cisco UCS Role Based Access Control

One of the cool things UCS allows you to do is create a place where users from different organizations can go to configure their own pools of resources.  It’s a common goal for many organizations to reduce duplication and allow agility and flexibility.  The multi-tenant solution that has been talked about so much can actually become a reality with UCS in the form of Role Based Access Control (RBAC).

Let’s suppose that a local county has decided it wants to consolidate its IT infrastructure into its IT department as opposed to every department having its own IT instances.  It can start off slowly, by say, starting with one or two organizations like the department of Superior Courts and the department of Executive Services.

Here’s how the main IT organization might configure RBAC for the Superior Courts and the Department of Executive Services.

1.  Create suborganizations

Log in as admin and navigate to the Servers tab.  From there you can expand the Service Profiles and see “root” and “Sub-Organizations”.  Right click on “root” and add an organization:

2.  Create Locale

A locale in UCSM is designed to reflect the location of a user in an organization.  By default all users are at the ‘root’ level locale, but if we are creating sub-organizations, we want them to use their own stuff and not modify existing resources that exist at the root level, or with other organizations.

Navigate to the Admin Tab in navigation pane, filter by User Management, expand User Services and right click on Locales.

From here we can create a locale named Superior_Court and bind it to the Superior_Court organization we created.

Next, to assign the organization, we just expand the Organizations menu and drag the Superior_Court into the pane on the right.

Clicking finish gives us our new locale bound to its sub organization.

3.  Create a User for the Organization

Now let’s create a user called sc-admin that has all rights in the Superior_Court locale, but can’t change things in the root locale or any other locales.

On the navigation pane in the same place you were on the previous step, right click Locally Authenticated Users and select ‘Create User’.

The first fields are pretty self-explanatory.  We created the user and password and left out some of the other information.  The important part is that the locale is set to Superior_Court.  This confines the powers of this user to Superior_Court.  We can then select all the roles except the following:

  • aaa: Authentication, Authorization, and Accounting.  This can only be granted in the root locale.
  • admin: can only be granted in the root locale.
  • operations: can only be granted in the root locale.

Now that sc-admin is created, hand him over to your friendly local Superior Court tenant and let them have access to the system.
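For what it’s worth, the same objects can be created from the UCSM CLI.  The transcript below is a hedged sketch rather than a verified recipe: the org-ref name sc-ref is made up, and you should double-check the exact scope/create syntax (and the server-profile role name) against your UCSM version before relying on it.

```
UCS-A# scope security
UCS-A /security # create locale Superior_Court
UCS-A /security/locale* # create org-ref sc-ref orgdn Superior_Court
UCS-A /security/locale* # commit-buffer
UCS-A /security/locale # exit
UCS-A /security # create local-user sc-admin
UCS-A /security/local-user* # set password
Enter a password:
UCS-A /security/local-user* # create locale Superior_Court
UCS-A /security/local-user* # create role server-profile
UCS-A /security/local-user* # commit-buffer
```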

Now then… What can sc-admin do?

If you now log in as sc-admin, you can see that he can create service profiles, pools, and policies, but only in his superior_court suborg.  If sc-admin tries to create a resource in the root organization, he is blocked from doing so because all of the options are greyed out:

 

Here’s what else he can do:

  • He can create sub organizations within his own Sub-organization.
  • He can create VLANs in the LAN and enable and disable network ports on the Fabric Interconnects.  (because he was given network access… if you don’t want this take away the network privilege)
  • He can create VSANs and disable and enable FC interfaces. (take away the storage privilege if you don’t want him to do this)

An interesting scenario I ran across: if you remove a role from a user while that user is still logged in, it doesn’t seem to take effect until the user logs back in.  For example, I disabled sc-admin’s network role and he was still able to create VLANs and turn ports off and on.  When I logged him out and back in again, the role behaved as it should.

One of the disadvantages of disabling the network role is that sc-admin can’t create vNIC templates.  This is something we might want to allow him to do in his own org.  We can change this by creating a new role in User Management named Network_SP.  For this role, we just check:

  • Service Profile Network
  • Service Profile Network-Policy
  • Service Profile Qos
  • Service Profile Qos Policy

Next, add this role into the sc-admin account (click on locally authenticated users and right click sc-admin and add a check to the Network_SP role we created)

Now sc-admin can create vNIC templates in his own sub org, but he isn’t allowed to create external VLANs or disable/enable ports on the Fabric Interconnect.  For this to take effect, have sc-admin log out and log back in again after you apply the role.

You can do something very similar on the Storage tab in order to allow a suborg to create and modify its own vHBA Templates but not be able to disable FC ports on the Fabric Interconnects.

Once this is in place, you can repeat the operation for the department of Executive Services.  As other departments join the consolidated data center their users are simply added to the locales and given roles.

App crazy

I’ve been going a little app crazy to start out this year, and I’m very pleased with the results.  With the help of others, I’ve released updates to the two Cisco based apps: UCS Tech Specs, and FlexPod Tech Specs.  And I’ve finally released the xCAT iOS client!  Hurray!

I’ve been doing all this for the past several months during those precious moments I have after the kids go to bed and before I drift off to sleep.  Lucky for me, my wife has enough interesting projects going on in her life that she doesn’t miss me… too much!  Don’t get me wrong:  We still find time to go out and have a great time. And for those times when my day job also becomes my night job, you can see why it takes a long time for many of these projects to get done.  Whew!

There are also many other projects cooking.  My coworker Tige Phillips at Cisco and I are slowly creating SiMU HD, an iPad version of SiMU pro that will manage UCS systems.  …Well, I should restate that: he’s doing most of the work and I’m lending a hand!

I’ve also thought about starting a little game development.  How about a game for managing clusters?  A game for managing UCS that gives you prizes for learning how to do certain cool features?  Ha!  Yes, I have a lot of bad ideas!  Hope you have a great February!

xCAT r* commands with UCS

xCAT out of the box works on UCS.  Or UCS out of the box works with xCAT? Whichever way you look at it, it works. All of the cool things you can do with xCAT like provision nodes, KVM, vSphere, stateless computing, etc, can all be done with UCS.  In fact, you can even run most of the r* commands on UCS.

Cisco UCS allows this through IPMI.  And configuring IPMI on UCS is easier than any other system I’ve ever used.  While I still plan on furthering my xCAT UCS plugin to get more capabilities into xCAT, most xCAT functions can be used with UCS managing the servers with IPMI.  For most people, this is good enough.

Using IPMI, this is what seems to work with xCAT 2.6.6 and UCSM 2.0(1) (see the end of this post for sample output):

  • rpower on|off|stat|boot
  • rbeacon on|off
  • reventlog [clear]
  • rvitals  (this is quite thorough)

rinv seems to hang on me.  I think this is due to the nature of service profiles, where UUIDs and MAC addresses are transient.  I’ll investigate this further.

So how do you do it?

Configuring an IPMI machine with xCAT has been well documented.  What I haven’t seen documented as much is configuring IPMI inside UCS.  This is surprisingly easy.  Here’s how it’s done:

1.  Create a Service Profile Template that you will apply to your blades.  This is documented very well in various places, so I won’t go into it here; creating a Service Profile Template is UCS 101.  After you’ve created your service profile template, assuming it’s an updating template, you can proceed to the next step.  (Don’t worry, the changes needed for IPMI don’t require a reboot.)

2.  From the Servers tab, filter by Service Profile Templates, and navigate to your service profile template.

3.  Click on the Policies tab and look at the IPMI Access Profile Policy.

4.  Create a new policy.  In this policy you’ll name a user and give it a password.  Make sure the user has admin privileges.  For simplicity, I just made the IPMI user and password the same as my UCSM user and password.

5.  Apply the setting and click save.

From here on out you can just run IPMI commands.  The only remaining issue is knowing which IP address corresponds to the IPMI interface of which blade.
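Nothing here is xCAT-specific, by the way: once the access profile is applied, plain ipmitool (my addition, not something the UCS docs prescribe) can talk to a blade directly.  The sketch below just prints the commands as a dry run; the host, user, and password are hypothetical placeholders for values from your own setup.

```shell
#!/bin/sh
# Hypothetical values: BMC_USER comes from the IPMI Access Profile created
# in step 4, BMC_HOST from the UCSM management IP pool.
BMC_HOST=192.168.100.20
BMC_USER=ipmiadmin
BMC_PASS=secret

IPMI="ipmitool -I lanplus -H $BMC_HOST -U $BMC_USER -P $BMC_PASS"

# Dry run: print the commands rather than hitting real hardware.
# Drop the echo to actually execute them.
echo $IPMI chassis power status
echo $IPMI sdr list
```

The -I lanplus interface is what you want for IPMI over the network; the default local interface only works on the host itself.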

This can be found in UCSM under the Admin tab, Communication Management, Management IP pool.  If you click on the IP addresses tab on the left hand side, you’ll see all the IP addresses.  
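On the xCAT side, the node definition is then just standard IPMI hardware management.  Here’s a hypothetical sketch: the node name matches the sample output below, but the IP and credentials are made up; mgt, bmc, bmcusername, and bmcpassword are xCAT’s usual IPMI attributes.

```
# define the blade as an IPMI-managed node (IP and credentials are examples)
mkdef -t node lucky01 groups=ipmi,all mgt=ipmi \
    bmc=192.168.100.20 bmcusername=ipmiadmin bmcpassword=secret

# now the r* commands work against it
rpower lucky01 stat
rbeacon lucky01 on
```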

Ok my friend, you now have it: xCAT running rpower commands.

And now, here is a sample output running rvitals on a UCS B200 M1:

# rvitals lucky01
lucky01: BIOSPOST_TIMEOUT: N/A
lucky01: BIOS_POST_CMPLT: 0
lucky01: CATERR_N: 0
lucky01: CPUS_PRCHT_N: 0
lucky01: DDR3_P1_A1_ECC: 0 error
lucky01: DDR3_P1_A1_PRS: 0
lucky01: DDR3_P1_A1_TMP: 26 C (79 F)
lucky01: DDR3_P1_A2_ECC: 0 error
lucky01: DDR3_P1_A2_PRS: 0
lucky01: DDR3_P1_A2_TMP: 25 C (77 F)
lucky01: DDR3_P1_B1_ECC: 0 error
lucky01: DDR3_P1_B1_PRS: 0
lucky01: DDR3_P1_B1_TMP: 26 C (79 F)
lucky01: DDR3_P1_B2_ECC: 0 error
lucky01: DDR3_P1_B2_PRS: 0
lucky01: DDR3_P1_B2_TMP: 27 C (81 F)
lucky01: DDR3_P1_C1_ECC: 0 error
lucky01: DDR3_P1_C1_PRS: 0
lucky01: DDR3_P1_C1_TMP: 24 C (75 F)
lucky01: DDR3_P1_C2_ECC: 0 error
lucky01: DDR3_P1_C2_PRS: 0
lucky01: DDR3_P1_C2_TMP: 25 C (77 F)
lucky01: DDR3_P2_D1_ECC: 0 error
lucky01: DDR3_P2_D1_PRS: 0
lucky01: DDR3_P2_D1_TMP: 22 C (72 F)
lucky01: DDR3_P2_D2_ECC: 0 error
lucky01: DDR3_P2_D2_PRS: 0
lucky01: DDR3_P2_D2_TMP: 22 C (72 F)
lucky01: DDR3_P2_E1_ECC: 0 error
lucky01: DDR3_P2_E1_PRS: 0
lucky01: DDR3_P2_E1_TMP: 22 C (72 F)
lucky01: DDR3_P2_E2_ECC: 0 error
lucky01: DDR3_P2_E2_PRS: 0
lucky01: DDR3_P2_E2_TMP: 22 C (72 F)
lucky01: DDR3_P2_F1_ECC: 0 error
lucky01: DDR3_P2_F1_PRS: 0
lucky01: DDR3_P2_F1_TMP: 21 C (70 F)
lucky01: DDR3_P2_F2_ECC: 0 error
lucky01: DDR3_P2_F2_PRS: 0
lucky01: DDR3_P2_F2_TMP: 22 C (72 F)
lucky01: ECC_STROM: 0
lucky01: FM_TEMP_SENS_IO: 21 C (70 F)
lucky01: FM_TEMP_SEN_REAR: 22 C (72 F)
lucky01: HDD0_PRS: 0
lucky01: HDD1_PRS: 0
lucky01: HDD_BP_PRS: 0
lucky01: IOH_THERMALERT_N: 0
lucky01: IOH_THERMTRIP_N: 0
lucky01: IRQ_P1_RDIM_EVNT: 0
lucky01: IRQ_P1_VRHOT: 0
lucky01: IRQ_P2_RDIM_EVNT: 0
lucky01: IRQ_P2_VRHOT: 0
lucky01: LED_BLADE_STATUS: 0
lucky01: LED_FPID: 0
lucky01: LED_MEZZ_FAULT: 0
lucky01: LED_MEZZ_TP_FLT: 0
lucky01: LED_SAS0_FAULT: 0
lucky01: LED_SAS1_FAULT: 0
lucky01: LED_SYS_ACT: 0
lucky01: MAIN_POWER: 0
lucky01: MEZZ_PRS: 0
lucky01: P0V75_DDR3_P1: 0.7644 Volts
lucky01: P0V75_DDR3_P2: 0.7644 Volts
lucky01: P12V_BP: 11.948 Volts
lucky01: P12V_CUR_SENS: 10.78 Amps
lucky01: P1V05_ICH: 1.0486 Volts
lucky01: P1V1_IOH: 1.078 Volts
lucky01: P1V1_VCCP_P1: 1.0192 Volts
lucky01: P1V1_VCCP_P2: 0.931 Volts
lucky01: P1V1_VTT_P1: 1.1368 Volts
lucky01: P1V1_VTT_P2: 1.1564 Volts
lucky01: P1V2_SAS: 1.2152 Volts
lucky01: P1V5_DDR3_P1: 1.5288 Volts
lucky01: P1V5_DDR3_P1_IMN: 5.13 Amps
lucky01: P1V5_DDR3_P2: 1.5386 Volts
lucky01: P1V5_DDR3_P2_IMN: 14.25 Amps
lucky01: P1V5_ICH: 1.5092 Volts
lucky01: P1V8_IOH: 1.813 Volts
lucky01: P1V8_P1: 1.7836 Volts
lucky01: P1V8_P2: 1.7836 Volts
lucky01: P1_PRESENT: 0
lucky01: P1_TEMP_SENS: 39.5 C (103 F)
lucky01: P1_THERMTRIP_N: 0
lucky01: P2_PRESENT: 0
lucky01: P2_TEMP_SENS: 37.5 C (100 F)
lucky01: P2_THERMTRIP_N: 0
lucky01: P3V3_SCALED: 3.2548 Volts
lucky01: P3V_BAT_SCALED: 3.102 Volts
lucky01: P5V_SCALED: 4.9405 Volts
lucky01: POWER_ON_FAIL: 0
lucky01: POWER_USAGE: 126 Watts (430 BTUs/hr)
lucky01: SAS0_FAULT: N/A
lucky01: SAS1_FAULT: N/A
lucky01: SEL_FULLNESS: 0
lucky01: VR_P1_IMON: 1.75 Amps
lucky01: VR_P2_IMON: 3.5 Amps