Cloud Computing: How Do I Get There?

This post comes from a talk that I’ll be presenting at the Pacific Northwest Digital Government Summit Conference on October 2nd, 2013.

History shows us that those that embrace technology and change survive while those that resist and stick with “business as usual” get left behind.  If we have the technology and we don’t use it to make IT look like magic, then we’re probably doing it wrong. (Read “The Innovator’s Dilemma” and Clarke’s Third Law.)

I’ll be talking mainly about private cloud today, but many of these ideas can be taken into the public cloud as well.

Optimizing ROI on your Technology

My friend tells a story about when his wife first started using an iPhone.  To get directions on a map she’d open up Safari and go to http://maps.google.com.  To check Facebook she would open Safari and go to http://facebook.com.  To check her mail she’d open up Safari again and navigate to http://gmail.com.  You get the idea.

She was still getting great use of her iPhone.  She could now do things she could never do before.  But there was a big part she was missing out on.  She wasn’t using the App ecosystem that makes all of these things easier and delivers a richer experience.

Today, most organizations have virtualization in the data center.  Because of this, IT is able to do things they’ve never been able to do before.  They’re shrinking their server footprints to once unimaginable levels, saving money in capital and management costs.  I’ve been in many data centers where people proudly point to where rows of racks have been consolidated to one UCS domain with only a few blades.  It’s pretty cool and very impressive.

But they’re missing something as big as the App Store.  They’re missing out on the APIs.  This is where ROI is not being optimized in the data center in a big way.

IT is shifting (or has shifted) to a DevOps model. DevOps means that your IT infrastructure team is more tightly aligned with your developers/application people.  This is a management perspective.  But from a trenches perspective, the operations team is now turning into programmers.  Programmers of the data center.  The guy that manages the virtual environment, the guy who adds VLANs to switches, or the guy who creates another storage LUN: they’re all being told to automate and program what they do.

The group now treats the IT infrastructure like an application that is constantly adding features and doing bug fixes.

The programming of the IT infrastructure isn’t done in compiled languages like Java, C, or C++.  It’s done in interpreted languages like Python, Ruby, Bash, PowerShell, etc.  But the languages alone don’t get you there.  You need a framework.  This is where things like Puppet or Chef come into play.  In fact, you can even look at it like you’re programming a data center operating system.  This is where OpenStack provides you a framework to develop your data center operating system.  It’s analogous to the web application development world: Twitter was originally developed in Ruby using a framework called Ruby on Rails.  (Twitter has since moved off Ruby on Rails.)
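As a minimal sketch (my example, not a prescription): the interpreted-language half of the story is a one-off script like the one below, which installs and starts a web server on a RHEL-era box.  A framework like Puppet or Chef expresses that same intent as declarative, repeatable state, and that’s the part the language alone doesn’t give you.

#!/bin/bash
# One-off "pet" script: install and start a web server on a RHEL/CentOS box.
# A framework like Puppet or Chef would declare this as desired state instead
# of a sequence of commands.
yum -y install httpd
chkconfig httpd on
service httpd start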

Making this shift gives you unprecedented speed, agility, and standardization.  Those that don’t do it will find their constituents looking elsewhere for IT services that can be delivered faster and cheaper.

The IT assembly line

It’s hard for people to think of their IT professionals as assembly line workers.  After all, they are doing complex things like installing servers, configuring networks, and updating firmware.  These are CCIEs, VCPs, and storage gurus.  But that’s actually what people in the trenches are: workers of the virtual assembly line.  IT managers should look at the way work enters the assembly line, understand the bottlenecks, and track how long it takes to get things through the line.  Naturally, there are exceptions that crop up.  But for the most part, the work required to deliver applications to the business is repetitive tasks.  They’re just complicated, multi-step, repetitive tasks.

To start with, we need to look at the common requests that come in: creating new servers, deploying new applications, delivering a new test environment.  Whatever it is, management really needs to understand how it gets done, and look at it like the manufacturing foreman sitting above the plant, looking down and watching a physical product make its way through.  Observe which processes are in place, where they are being sidestepped, or where they don’t exist at all.

As an example, consider all the steps required to deploy a server.  It may look something like the flowchart below:

That sure looks like an assembly line to me.  If you can view work that enters the infrastructure like an assembly line, you can start measuring how long it takes for certain activities to get done.  Then you can figure out ways to optimize.
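A rough sketch (my own, not from the summit material) of the simplest possible way to start measuring: timestamp each request as it enters and leaves the line, and look at the deltas later.  The log location and field names below are just placeholders.

#!/bin/bash
# Record when a request starts and finishes so cycle times can be measured.
LOG=/var/tmp/cycle-times.csv
request="new-server-0042"              # whatever ticket or request ID you track
echo "$(date +%s),$request,start" >> "$LOG"
# ... provision the VM, add the VLAN, carve the LUN ...
echo "$(date +%s),$request,done"  >> "$LOG"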

Standardization of the Infrastructure

Manufacturing lines optimize throughput by standardizing processes and equipment.  When I hear VMware tell everybody that “the hardware doesn’t matter”, I take exception.  It matters.  A lot.  Just like your virtualization software matters.  Cisco and other hardware vendors come at it from the opposite direction and say “the hypervisor doesn’t matter, we’ll support them all”.  What all parties are really telling you is that they want you to standardize on them.  All parties are trying to prove their value in a private cloud situation.

What an organization will standardize on depends on a lot of things: budget, the skill set of the admins, relationships with vendors and consultants, etc.  In short, when considering the holy trinity of the data center (servers, storage, and networking), it usually turns into a religious discussion.

But whatever you do, the infrastructure needs to be robust.  This is why converged infrastructures like Vblocks, FlexPods, and other reference architectures have become popular.  The “One-Piece-At-A-Time” accidental, cobbled-together architecture is not a good play.

Consider the analogy that a virtualized workload is cargo on a semi truck.  Do you want that truck running over a 6-lane solid government highway like I-5, or do you want that stuff traveling at 60 mph over a rickety bridge?

This?

Or This?

Similarly, if your virtualization team doesn’t have strong Linux skills, you probably don’t want them running OpenStack on KVM.  That’s why VMware and Hyper-V are so popular.  It’s a lot easier for most people’s skill level.

What to Standardize On?

While the choice of infrastructure standardization is a religious one, there are role models we can look to when deciding.  Start out by looking at the big boys, or the people you aspire to be when you grow up.  Who are the big boys that are running a world class IT as a service infrastructure?  AWS, RackSpace, Yahoo, Google, Microsoft, Facebook, right?

What are they standardizing on?  Chances are it’s not what your organization is doing.  Instead of VMware, Cisco, IBM, HP, Dell, EMC, NetApp, etc., they’re using open source, building their own servers, and using their own distributed filesystems.  They do this because they have a large investment in a DevOps team that is able to put these things together.

A State organization that has already standardized on a FlexPod or Vblock with VMware is not going to throw away what they’ve done and start over just so they can match what the big boys do.  However, as they move forward, perhaps they can make future decisions based on emulating these guys.

Standardize Processes

The missing part is standardizing the processes once the infrastructure is in place.  Standardization is tedious because it involves looking at every detail of how things are done.  One of my customers has a repository of documentation they use every time they need to do something to their infrastructure.  For example, 2 weeks ago we added new blade servers to the UCS.  He pulled out the document and we walked through it.  There were still things we modified in the documentation, but for the most part the steps were exact.

Unfortunately, this was only one part of the process.  The Networking team had their own way of keeping notes (or not at all) on how to do things.  So the processes were documented in separate places.  What the IT manager needs to do is make sure they understand how the processes (or work centers) are put together and how long each one takes.

The manager should have their own master process plan to be able to track work through the system (the system being the different individuals doing the work).  This is what is meant by “work flow”.  Even if they just do this by hand, or as is commonly done with a Gantt chart, there should be some understanding.

Each job that comes in should get its own workflow, or Gantt chart, and be entered into something like a Kanban board.  Once you understand this for the common requests, you can see how many one-offs there are.

Whether these requests are for public cloud or private cloud, there is still a workflow.  It is an iterative process that may not be complete the first few times it is done, but over time it will become better.  There is a great book called “The Phoenix Project” that talks about how an IT staff starts to standardize and work together between development and operations to get their processes better.  These ideas are based off an earlier business classic called “The Goal”.

Automate the Processes

Once the processes are known, we turn our assembly line workers into programmers of the processes.  I used to work as a consulting engineer helping deploy High Performance Computing clusters.  On several occasions the RFPs required that the cluster be able to be deployed from scratch in less than 1 hour: from bare metal to running jobs.  We created scripts that would go through and deploy the OS, customize the user libraries, and even set up a job queuing system.  It was pretty amazing to see 1,200 bare metal rack mount servers do that.  When we left, if the customer had problems with a server they could replace it, plug it in, and walk away.  The system would self-provision.

While that was (and still is) a complicated process, it is still simpler than what virtualization has done to the management of the data center.  We never had to mess with the network once it was set up.  Workflows for a new development environment are pretty common now and require provisioning several VMs with private networks and their own storage.  However, the same method of scripting the infrastructure can still be applied.  It just needs to be orchestrated.

Automate and Orchestrate with a Framework

Back when we did HPC systems, we used an open source management tool called xCAT.  That was the framework by which we managed the datacenter.  The tool had capabilities but really what it gave us was a framework to insert our customizations or our processes that were specific for each site.  The tool was an enabler of the solution, not the solution itself.

Today there are lots of “enterprise” private cloud management tools.  In fact, any company that wants to sell a “Private Cloud” will have its own tool: VMware vCloud Director, HP Cloud System, IBM Cloudburst, Cisco UCS Director, etc.  All of these products, regardless of how they are sold, should be regarded as frameworks for automating your processes.

At a recent VMUG, the presenter asked, “How many people are using vCloud Director or any other cloud orchestration tool?”  Nobody raised their hand.  Based on what I’ve seen, it’s because most organizations haven’t yet standardized their IT processes.  There is no need for orchestration if you don’t know what you’re orchestrating.

Usually each framework will come with a part or all of what Cisco calls the “10 domains of cloud” which may include: A self service portal, chargeback/showback, service catalog, security, etc.  If you are using a public cloud, you are using their framework.

Once you select one, you’ll need to get the operations teams (network, storage, compute, virtualization) to sign off on and use the tool.  It’s not just a server thing.  Each part of the assembly line needs to use it.

Once the individual components are entered into the framework, the orchestration comes into play.  To start with, codify the most common workloads: creating a VLAN, carving out a LUN, provisioning a VM, etc.

To orchestrate means to arrange or control elements so as to achieve a desired overall effect.  With the framework, we are looking to automate all of the components to deliver a self-service model to our end customer.

Self Service and Chargeback

Once we have the processes codified in the framework, we can present a catalog to our users.  With a self-service portal, we recommend not making it completely automated to start out with.  With some frameworks, as a workload moves through the automated assembly line, it can send an email to the correct IT department to validate whether a workflow can move forward.  So, for example, if the user requests a new VLAN for their VM environment as part of the workflow, the networking administrator will receive an email and will be able to approve or deny it.  This way the workflow is monitored, the end requester knows where they are in the queue, and once it is approved, it gets created automatically and passed along to the next item in the assembly line.

For chargeback, the recommendation is to keep the menu small, and the price simple.

Security all throughout then Monitor, Rinse, and Repeat

More workflows will come into the system, and the catalog will continuously need updating and revision.  This is the programmable data center.  Iterations should be checked into a code repository, similar to how application developers use systems like github.com to store code updates.  You will have to do bug fixes and patch up any exposed holes.  With virtualization comes the ability to integrate more software security services like the ASA 1000v or the VSG.
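As a rough sketch of what checking iterations into a repository can look like (the repository name and the script it copies in are placeholders, not anything tied to a product):

# Keep the data center's "code" (scripts, configs, workflow definitions)
# under version control just like application code.
git init datacenter-config && cd datacenter-config
cp ~/scripts/provision-vm.sh .    # whatever automation you already have (placeholder path)
git add provision-vm.sh
git commit -m "First pass at codifying the VM provisioning workflow"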

Action Items

  • Realize that your IT infrastructure is a collection of APIs waiting to be harnessed and programmed.  Challenge the people you work with to learn to use those APIs to automate their respective areas of expertise.  (There’s a small sketch of what touching one of those APIs looks like right after this list.)
  • Optimize the assembly line by understanding the workflows.  Any manufacturing manager can tell you the throughput of the system.  An IT manager should be able to tell you the same thing about their system.  Start by understanding the individual components, how long it takes, and where the bottlenecks in the system are.
  • Standardize your infrastructure with a solid architecture.  Converged architectures are popular for a reason.  Don’t reinvent the wheel.
  • Standardizing processes is the hardest part.  Start with the most common.  These are usually documented.  Take the documentation and think how you would change it into code.
  • Program the data center using a framework.  Most of the work will have to be done in house or with service contracts.  The framework could be something like a vendor’s cloud software or something free like OpenStack.
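To make that first action item concrete, here is a minimal, hedged sketch of touching one of those APIs by hand: logging in to the UCS Manager XML API with nothing but curl.  The address and credentials are placeholders from my lab, and real automation would live in your framework of choice rather than a throwaway script.

#!/bin/bash
# Minimal sketch: log in to the UCS Manager XML API and get a session cookie.
# UCSM, USER, and PASS are placeholders -- substitute your own values.
UCSM=192.168.40.10
USER=admin
PASS=password

# aaaLogin returns an outCookie that subsequent XML API calls would reuse
curl -sk "https://$UCSM/nuova" \
  -d "<aaaLogin inName='$USER' inPassword='$PASS' />"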

 

Quick SPAN with the Nexus 1000v

Today I thought I’d take a look at creating a SPAN session on the 1000v to monitor traffic.  I found it really easy to do!  SPAN is one of those things that takes you longer to read and understand than to actually configure.  I find that true with a lot of Cisco products:  Fabric Path, OTV, LISP, etc.

SPAN stands for “Switched Port Analyzer”.  It’s basically port monitoring: you capture the traffic going through one port and mirror it on another.  This is one of the benefits you get out of the box with the 1000v; it means the network administrator no longer has a big black box of VMs.

To follow the guide, I installed 3 VMs: iperf1, iperf2, and xcat.  The idea was that I wanted to monitor traffic between iperf1 and iperf2 on the xcat virtual machine.

On the xcat virtual machine I created a new interface and put it in the same VLAN as the other VMs.  These were all on my port-profile called “VM Network”.  I created it like this:

conf
vlan 510
port-profile type vethernet "VM Network"
vmware port-group
switchport mode access
switchport access vlan 510
no shutdown
state enabled

Then, using vCenter, I edited the VMs to assign them to that port group. (Remember: VMware Port-Group = Nexus 1000v Port-Profile)

On the Nexus 1000v, running the command:

# sh interface virtual

-------------------------------------------------------------------------------
Port Adapter Owner Mod Host
-------------------------------------------------------------------------------
Veth1 vmk3 VMware VMkernel 4 192.168.40.101
Veth2 vmk3 VMware VMkernel 3 192.168.40.102
Veth3 Net Adapter 1 xCAT2 3 192.168.40.102
Veth4 Net Adapter 2 iPerf2 3 192.168.40.102
Veth5 Net Adapter 3 xCAT 3 192.168.40.102
Veth6 Net Adapter 2 iPerf1 3 192.168.40.102

Allows me to see which vethernet is assigned to which VM. In this SPAN session, I decided I wanted to monitor the traffic coming out of iPerf1 (Veth6) on the xCAT VM (veth5).
No problem:

Create The SPAN session

To do this, we just configure a SPAN session:

n1kv221(config)# monitor session 1
n1kv221(config-monitor)# source interface vethernet 6 both
n1kv221(config-monitor)# destination interface vethernet 5
n1kv221(config-monitor)# no shutdown

As you can see from above, I’m monitoring both received and transmitted packets from vethernet 6 (iPerf1). Then those packets are mirrored to vethernet 5 (xCAT). If you have an IP address on xCAT (vethernet 5) you’ll find you can no longer ping it; the port is in SPAN mode. Notice also that by default the monitoring session is off. You have to turn it on (that’s what the no shutdown above does).

Now we want to check things out:

n1kv221(config-monitor)# sh monitor
Session State Reason Description
------- ----------- ---------------------- --------------------------------
1 up The session is up
n1kv221(config-monitor)# sh monitor session 1
session 1
---------------
type : local
state : up
source intf :
rx : Veth6
tx : Veth6
both : Veth6
source VLANs :
rx :
tx :
both :
source port-profile :
rx :
tx :
both :
filter VLANs : filter not specified
destination ports : Veth5
destination port-profile :

Now you’ll probably want to watch the mirrored port, right? I just installed Wireshark on my xcat VM. (It’s Linux: yum -y install wireshark and you’re ready to ride.) To watch from the command line I just ran the command:

[root@xcat ~]# tshark -D
1. eth0
2. eth1
3. eth2
4. eth3
5. any (Pseudo-device that captures on all interfaces)
6. lo

This gives me the interfaces. By matching the MAC addresses, I can see that eth2 (or device 3 from the wireshark output) is the one that I have on the Nexus 1000v.
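If you’re wondering how I matched them up: inside the guest, something like the line below shows the MAC address (eth2 is just what my VM happened to call the interface), and you can compare that against the MAC vCenter shows on the VM’s network adapter for that port group.

[root@xcat ~]# ifconfig eth2 | grep HWaddr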

From here I run:

[root@xcat ~]# tshark -i 3 -R "eth.dst eq 00:50:56:9C:3B:13"
0.000151 192.168.50.151 -> 192.168.50.152 ICMP Echo (ping) reply
1.000210 192.168.50.151 -> 192.168.50.152 ICMP Echo (ping) reply
2.000100 192.168.50.151 -> 192.168.50.152 ICMP Echo (ping) reply
..

Then I get a long list of fun stuff to monitor. By pinging between iperf1 and iperf2 I can see all the traffic that goes on. Since there was nothing else on this VLAN it was pretty easy to see. Hopefully this helps me or you troubleshoot down the road.

MediaWiki Installation on RedHat 5.5

In the modern data center, things like IPs, user accounts, passwords, and the other details you used to keep in Excel spreadsheets should be rolled into the management tools.  That way, you always have the most current information.  Static Word and Excel documents and the like are old news.  Today you can see those things starting to get rolled up into vCloud Director, OpenStack, and others.  But for now, most people are still doing Excel spreadsheets.

This is stupid.  Please, at least use a wiki.  Catch up to 2005.

MediaWiki is one that I’ve used for years.  It’s easy to install and work with, and the syntax doesn’t take too long to learn.

Here’s how I set it up:

1.  Download Media Wiki on your Linux Server

Go to Media Wiki and download the latest stable.

cd /var/www/html
rm -rf *
wget http://download.wikimedia.org/mediawiki/1.21/mediawiki-1.21.1.tar.gz
tar zxvf media*
mv mediawiki-1.21.1/* .
rm -rf mediawiki-1.21.1

2.  Installing the Linux Environment

Get PHP and MySQL installed on your server.  My server is a Red Hat 5.5 (yes, old) virtual machine that I’ve had for about 2 years.  I haven’t updated to 6.x.  The easiest thing to do would be to install a new server (CentOS 6.4 might be good), but a challenge every now and then is fun, yeah?  To get it working, you have to have at least PHP 5.3.x.  To update that, I had to update my OS.  Since I didn’t get my subscription set up right with Red Hat, I just figured I’d use CentOS to update.  That was pretty easy.  I just did this:

wget http://mirror.centos.org/centos/5/os/x86_64/CentOS/centos-release-5-9.el5.centos.1.x86_64.rpm
wget http://mirror.centos.org/centos/5/os/x86_64/CentOS/centos-release-notes-5.9-0.x86_64.rpm
rpm -ql -p centos-release-5-9.el5.centos.1.x86_64.rpm # just to see what was in it, yep, its got the repo!
rpm -Uvh centos-release-5-9.el5.centos.1.x86_64.rpm centos-release-notes-5.9-0.x86_64.rpm # install repos

From here, I removed my older versions of php. This is just:

rpm -qa | grep mysql
rpm -qa | grep php

Then I removed each of them with:

yum -y remove

Then I updated everything:

yum -y update

This took a while. Finished, came back. Everything updated. Now I installed the right packages:

yum -y install php53 php53-mysql mysql-server php53-xml

There may be several other RPMs that you’ll need as dependencies, but that should get you started.  That’s how we got it up.  Don’t forget to enable MySQL and restart Apache:

service httpd restart
service mysqld restart
chkconfig --level 345 httpd on
chkconfig --level 345 mysqld on

3.  Configuring via the Web Interface

Once there, go to http://<yourserver>/

You should see:

4.  Creating Content

Going to the next page, it’ll start asking you questions and eventually you’ll have yourself a wiki set up.  The first thing I started looking at doing was adding a table for IP addresses.  It ended up looking like this:
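Roughly speaking, the wikitext for that kind of IP table looks something like this (the addresses and columns here are just placeholders):

{| class="wikitable"
! IP Address !! Hostname !! VLAN !! Notes
|-
| 192.168.50.151 || iperf1 || 510 || test VM
|-
| 192.168.50.152 || iperf2 || 510 || test VM
|}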

This is good and helps us to know where things are.  I started to create several pages for different VLANs.  It can be updated by hand, but I wish it updated in place.  Not the best, but OK for now.

 

5.  Editing Help

Go here: http://www.mediawiki.org/wiki/Help:Editing to see all the syntax to use to do cool formatting.

Finally, now you have yourself a wiki to keep things in. Welcome to 2005.  You are awesome.  No shared Excel spreadsheet with multiple outdated copies.  Now you just have to get everyone to buy into using it.  To do that: Be the example.  Use it, refer people to it.  Pretty soon they’ll catch on.

But there is a better way right?  What could that be?  The truth is, to manage effectively, you really need to integrate the information into your management toolset.  Much in the way UCS keeps track of BIOS versions, settings, VLANs, etc, you need some kind of tool that does that.  Today you can do that with OpenStack, vCloud Director, and some others.  I’m still not sold on any of them at this point but as I start to play with OpenStack more, I hope to give more guidance and thoughts.

UCS Reverse Path Forwarding and Deja-Vu checks

UCS Fabric Interconnects are almost always run in end-host mode.  At this point in the story there really aren’t many reasons to use switch mode on the Fabric Interconnects.

Two checks, or features, that make end-host mode possible are Reverse Path Forwarding (RPF) checks and Deja-Vu checks.

RPF and Deja-Vu (from Cisco.com)

Reverse Path Forwarding Checks

Each server in the chassis is pinned dynamically (or you can set up pin groups and do it statically, but I don’t recommend that) to an uplink on Fabric Interconnect A and Fabric Interconnect B.  Let’s say you have 2 uplinks on port 31 and 32 of your Fabric Interconnect.  Server 1/1 (chassis 1 / blade 1)  may be pinned to port 31.  If a unicast packet is received for server 1/1 on uplink port 31, it will go through.  But if that same packet destined for server 1/1 is received on port 32, it will be dropped.  That’s because RPF checks to see if the destination for the unicast is actually forwarding its uplink traffic through that link.

Deja Vu Checks

The other check is called “Deja-Vu”.  In the Cisco documentation it says: “Server traffic received on any uplink port, except its pinned uplink port, is dropped”.  That sounds a lot like RPF.  Another presentation from Cisco Live states it this way: “Packet with source MAC belonging to a server received on an uplink port is dropped”.

An example to clear it up

VM A on server 1/1 wants to talk to VM B located somewhere else.  The Fabric Interconnects in this case are connected to a single Nexus 5500 switch.  The VM is pinned to one of the VNICs and that VNIC is pinned to go out port 31 of Fabric Interconnect A.  So what happens?

First the VM will send an ARP request.  An ARP request basically says: I know the IP address but I want the MAC address.  (Obviously, this is in the same Layer 2 VLAN and subnet.)  If Fabric Interconnect A doesn’t find the IP/MAC association in its CAM table, it will not flood the server ports downstream.  That is something a switch would do.  The Fabric Interconnect is different.  The reason the Fabric Interconnect doesn’t send a broadcast down its server ports is because it is a source of truth and knows everyone connected on its server ports.

What it will do instead is forward the ARP request (unknown unicast) up the designated uplink (port 31).  Now, the Nexus switch is a switch (and a very good one at that).  It will say: “Hey, I don’t have a CAM table entry for VM B’s IP/MAC, so I will do what we switches do best: flood all the ports! (except the port that the unknown unicast/ARP request came in on)”

Remember Fabric Interconnect A port 32 is connected to this same switch as port 31 where the unknown unicast (ARP request) went out.  The Nexus 5500 will send this unknown unicast to port 32 just like every other port.  But port 32 says:  Wait a minute, the source address originated from me.  Deja-vu!  So he drops the packet.

Fabric Interconnect B also has two ports, 31 and 32, that will receive the unknown unicast.  If VM B is pinned to a VNIC that is pinned to port 31 on Fabric Interconnect B, that port will say: I got this!  And the packet will go through.  Port 32 on FI-B, however, will look at the destination MAC and say: this is not pinned to me, so I’ll drop the packet.  That is the RPF check.

To sum it up

Deja-Vu check:  don’t accept a packet from the upstream switch that originated from me.

Reverse Path Forwarding check:  don’t accept a packet destined for a server that isn’t pinned to this uplink.

Backing up UCS

Backing up UCS can be a little confusing, especially since it presents you with a few options.  What you may be expecting is something simple like a one-button “Back it up” operation.  That is not the case.  The nice thing about it, though, is that there are lots of different things you can do with backup files.
From the Admin tab in UCS Manager, under All, on the General tab, you select “Backup Configuration”.

Now we have a few choices as to how we set this up.  First you create a backup operation.

Then you are presented with the screen below, and now things get a little bit complicated.

Let’s go through some of these seemingly confusing options:

Admin State

This is a bit confusing, but here’s how to think about it: if you want to run the backup right this second when you click “OK”, and don’t want to wait, select “Enabled”.  Most of the time this is what you want.  If instead you just want to save this backup operation so that you can run it later from the backup operations list, then set it to Disabled.

Type

There are 4 different configurations that can be backed up by UCS.  All of them deal with data that lives in the Fabric Interconnect.  They are illustrated in the diagram below.

 

The top of the triangle is the Full State.  This is a binary file that can be used on any system to restore the settings that this Fabric Interconnect has.  It’s different from all the other types: it’s the only one that can be used for a system restore.  This is usually fun to back up off your own system.  I haven’t tried putting it into the platform emulator yet, but it might be fun to try.

The three other backups are just XML files.  They’re useful for importing into other systems.  The “All Configuration” is just a fancy way of saying “System Configuration” and “Logical Configuration”.  It does both.

The System Configuration is user names, roles, and locales.  This is useful if you are installing another UCS somewhere and you want to keep the same users and locales (if you are using some type of multi-tenancy), but in that case, why aren’t you using UCS Central?  Try it; it’s free for up to 5 domains.  And you can do global service profiles.

The Logical Configuration is all the pools and policies, service profiles, service profile templates you would expect to be backed up.  This is pretty good to put inside the emulator to fool around with different settings you are using.  Or, if you don’t have your UCS yet and you’re waiting to order it, then you can just create the pools and policies in the emulator.  Then when the real thing comes, import the logical configuration in and you are ready to rock.

The tricky option that shows up when you select the All Configuration or the Logical Configuration is the Preserve Identities label.  It is only on the logical and all configurations because it has to do with making service profiles that are already mapped to pools retain their mappings.  This is good if you’re going to move some service profiles from one Fabric Interconnect domain to another and want to keep the same setup.  Otherwise, it doesn’t really matter to keep those identities.

The other options presented for how you want to back up the system are pretty self-explanatory.  You can either back up to your local machine or to some other machine that is running a service like SSH, TFTP, etc.

After you’ve created a backup operation, the nice thing is that it saves it for you in a backup operations list.  When you want to actually do it, just select it, then hit admin enable and it will perform the backup.

Performing Routine Periodic Backups

But wait, you say, what if I want it to back itself up periodically?

Well, that’s where you move to the next tab, which is Policy Backup & Export.

Here you have the option of backing up just the binary Full State (system restore) file, or the All Configuration.  The All Configuration is good for backing up the XML files just in case some administrator accidentally changes a bunch of configs on you.

Here you can see my XML and binary files will be backed up every day.  (That may be a little more than you need, as things don’t usually change that much in most environments, but hey, now you have it, use it.)

When it saves to those remote files you’ll get a timestamp on the name:

full-backup.bin.2013-07-28T22-55-11.555
all-config.xml.2013-07-28T22-57-11.559

So that’s backing up the system and all the ways it can be done.  There are a few nerd knobs, but I wanted to make sure I understood it.

The last thing to cover is import operations.  It’s important to understand that you can do two different types: a merge or a replace.  With merge, if you have a MAC pool called A and it already has 30 MACs, a merge will add the new MACs to it.  (So if there are 20 in the import, you will now have 50.)  With replace, you’ll now just have 20.  You can only merge XML files.

Lastly, all of this information is found here in the latest UCS GUI Configuration Guide.  It was nice to gain a more solid understanding of it.  Backing up is something I go over briefly in some of the tech days I do, but this fleshes it out a little better if there are any further questions.

Thanks for reading!

 

 

Nexus 1000v – A kinder gentler approach

One of the issues skeptical server administrators have with the 1000v is that they don’t like the management interface being subject to a virtual machine.  Even though the 1000v can be configured so that system ports keep forwarding if the VSM gets disconnected, powered off, or blown up, to them that is voodoo.  Most say: give me a simple access port so I can do my business.

I’m totally on board with this level of thinking.  After all, we don’t want any Jr. Woodchuck network engineer to be taking down our virtual management layer.  So let’s keep it simple.

In fact!  You may not want a Jr. Woodchuck networking engineer to be able to touch the production VLANs for your production VMs.  Well, here’s a solution for you.  You don’t want to do the networking, but you don’t want the networking guy to do the networking either.  So how can we make things right?  Why not just ease into it?  The diagram below presents, at the NIC level, how you can configure your ESXi hosts:

Here is what is so great about this configuration: the VMware administrator can keep things “business as usual” with the first 6 NICs.

Management A/B teams up with vmknic0 with IP address 192.168.40.101.  This is the management interface and is used to talk to vCenter.  This is not controlled by the Nexus 1000v.  Business as usual here.

IP Storage A/B teams up with vmknic1 with IP address 192.168.30.101. This is to communicate with storage devices (NFS, iSCSI).  Not controlled by Nexus 1000v.  Business as usual.

VM Traffic A/B team up.  This is a trunking interface and all kinds of VLANs pass through here.  This is controlled either by a virtual standard switch or using VMware’s distributed Virtual Switch.  Business as usual.  You as the VMware administrator don’t have to worry about anything a Jr. Woodchuck Nexus 1000v administrator might do.

Now, here’s where it’s all good.  With UCS you can create another vmknic2 with IP address 192.168.10.101.  This is our link that is managed by the Nexus 1000v.  In UCS we would configure this as a trunk port with all kinds of VLANs enabled over it.  This can use the same vNIC template that the standard VM-A and VM-B used.  Same VLANs, etc.

(Aside: some people would be more comfortable with 8 vNICs; then you can do vMotion over its own native VMware interface.  In my lab this is 192.168.20.101.)

The difference is that this IP address, 192.168.10.101, belongs on our Control & Packet VLAN.  This is a back-end network that the VSM communicates with the VEM over.  Now, the only VM kernel interface that we need to have controlled by the Nexus 1000v is the 192.168.10.101 address, and this is isolated from the rest of the virtualization stack.  So if we want to move a machine over to the other virtual switch, we can do that with little problem.  A simple edit of the VM’s configuration can change it back.

Now testing can coexist with a production environment, because the VMs that are being tested are running over the 1000v.  You can install the VSG, DCNM, the ASA 1000v, and all that good vPath stuff, and test it out.

From the 1000v, I created a port profile called “uplink” that I assign to these two interfaces:

port-profile type ethernet uplink
vmware port-group
switchport mode trunk
switchport trunk allowed vlan 1,501-512
channel-group auto mode on mac-pinning
no shutdown
system vlan 505
state enabled

By making it a system VLAN, I make it so that this control/packet VLAN stays up. For the vmknic (192.168.10.101) I also created a port profile for control:

port-profile type vethernet L3-control
capability l3control
vmware port-group
switchport mode access
switchport access vlan 505
no shutdown
system vlan 505
state enabled

This allows me to migrate the vmknic over from being managed by VMware to being managed by the Nexus 1000v.  My VSM has an IP address on the same subnet as vCenter (even though it’s Layer 3):

n1kv221# sh interface mgmt 0 brief

--------------------------------------------------------------------------------
Port VRF Status IP Address Speed MTU
--------------------------------------------------------------------------------
mgmt0 -- up 192.168.40.31 1000 1500

Interestingly enough, when I do the sh module vem command, it shows up with the management interface:

Mod Server-IP Server-UUID Server-Name
--- --------------- ------------------------------------ --------------------
3 192.168.40.102 00000000-0000-0000-cafe-00000000000e 192.168.40.102
4 192.168.40.101 00000000-0000-0000-cafe-00000000000f 192.168.40.101

On the VMware side, too, it shows up with the management interface: 192.168.40.101

Even though I only migrated the 192.168.10.101 vmknic over.
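As an extra sanity check (this part is from memory, so treat it as a hint rather than gospel): the VEM module on the ESXi host ships a vemcmd utility, and listing its ports is a quick way to confirm that only the 192.168.10.101 vmknic actually landed on the 1000v.

~ # vemcmd show port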

This configuration works great.  It provides a nice opportunity for the networking team to get with it and start taking back control of the access layer.  And it provides the VMware/Server team a clear path to move VMs back to a network they’re more familiar with if they are not yet comfortable with the 1000v.

Let me know what you think about this set up.

Change a Fabric Interconnect into a Nexus Switch

I got a Nexus 5010 from our spare parts department.  When I booted it up, lo and behold, it thought it was a UCS 6120 Fabric Interconnect!

As most people know, the 6120XP is the same hardware as the Nexus 5010.  The only difference is that it’s spray painted green.  Well, this particular model I got was gray and said it was a Nexus 5010.  So I was bound and determined to get it back.  I got pretty close, and wanted to write down the steps I took.

I’m sad to say, however, that I didn’t get it to work all the way.

Here’s what I did:

Step 1. Get a TFTP server set up (this explains how to do it on a MacBook Pro)
I’m running OS X Mountain Lion.  It turns out there is a default TFTP server installed with it.  Getting it running is pretty easy.  Just run:

sudo launchctl load -F /System/Library/LaunchDaemons/tftp.plist

(Turning it off is done with:

sudo launchctl unload -F /System/Library/LaunchDaemons/tftp.plist

)
(To see if it’s running, run:

sudo launchctl list | grep tftp

If you see output it’s running; if not, it’s not.)

From there you need to put the files you need into /private/tftpboot/.

I went to Cisco’s support page and easily found two files:
n5000-uk9.5.2.1.N1.5.bin < the software file
and
n5000-uk9-kickstart.5.2.1.N1.5.bin < the kickstart file

I had to copy them with sudo since you’re going into a privileged directory.

You should test your tftp server to make sure it works. No use yelling at the Nexus 5000 for telling you it can’t access the file.

From the command prompt on the mac:

cd ~/Desktop
tftp localhost
get n5000-uk9.5.2.1.N1.5.bin

If that works, you are in business.

Step 2. Load the Nexus 5000 (that thinks it’s a 6100) into the loader prompt.

When the machine started booting, I had to do

Ctrl+Shift+R

right as it was loading the UCS kickstart file. Doing this got me to a
loader>

prompt.

From here, we don’t have a lot of options. But all we need to do is set the mgmt0 interface and kickstart from our Nexus image that we have on tftp.
(Incidentally, at this point I ran the dir command to see if there were any Nexus images, and there weren’t!  Only UCS images.)

Here’s how we set that:

loader> set ip 192.168.1.99 255.255.255.0

Then it confirmed that this was good. Now, to load up the kickstart file:

loader> boot tftp://192.168.1.234/private/tftpboot/n5000-uk9-kickstart.5.2.1.N1.5.bin
Address: 192.168.1.99
Netmask: 255.255.255.0
Server: 192.168.1.234
Gateway: 0.0.0.0
Booting: /private/tftpboot/n5000-uk9-kickstart.5.2.1.N1.5.bin console=ttyS0,9600n8

The system then boots up, does some image verification, and loads into a boot prompt:

Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2013, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and

http://www.opensource.org/licenses/lgpl-2.1.php

switch(boot)#

Step 3: Copy files and continue booting

Now we just need to get the files on the device.

switch(boot)# con t
switch(boot)(config)# inter mgmt 0
switch(boot)(config-if)# ip address 192.168.1.99 255.255.255.0
switch(boot)(config-if)# no shutdown
switch(boot)(config-if)# exit
switch(boot)(config)# exit
switch(boot)# copy tftp: bootflash:
Enter source filename: /private/tftpboot/n5000-uk9.5.2.1.N1.5.bin
Enter hostname for the tftp server: 192.168.1.234
Trying to connect to tftp server……
Connection to server Established. Copying Started…..

At this point I went downstairs and had some chips to eat. I got back and had to wait like 15-20 minutes for it to copy. Sheesh! Finally, when I was about to cancel it, I saw:

TFTP get operation was successful
Copy complete, now saving to disk (please wait)…

Now we need to get the kickstart file:

switch(boot)# copy tftp://192.168.1.234/n5000-uk9-kickstart.5.2.1.N1.5.bin bootflash:

So I waited some more, this one didn’t take as long.

Then I deleted a bunch of UCS files:

switch(boot)# delete bootflash:ucs-6100-k9-system.4.0.1a.N2.1.0.1036.gbin
switch(boot)# delete bootflash:cisco_nexus_1000v_certificate.pem
switch(boot)# delete bootflash:ucs-6100-k9-kickstart.4.0.1a.N2.1.0.1036.gbin
switch(boot)# delete bootflash:ucs-6100-k9-kickstart.4.0.1a.N2.1.0.1056d.gbin
switch(boot)# delete bootflash:ucs-6100-k9-system.4.0.1a.N2.1.0.1056d.gbin
switch(boot)# delete bootflash:ucs-manager-k9.1.0.0.1036.gbin
switch(boot)# delete bootflash:ucs-manager-k9.1.0.0.1056d.gbin

Then I booted the image:

switch(boot)# load n5000-uk9.5.2.1.N1.5.bin

This set me to the boot prompt again. So I hit exit:

switch(boot)# exit

It kept rebooting to stored images of UCS manager. So I found this command:

init system check-filesystem

From here, I repeated the operation of downloading the 2 Nexus images.  At least now it didn’t boot up into the UCS Fabric Interconnect image, but I could never get it to come all the way up as a regular Cisco Nexus 5010.  It may be that there was something wrong with the hardware; it certainly looks a little beat up.  If nothing else, I learned a little more about the boot files in the Nexus 5000.

Cisco UCS East-West Traffic Performance

The worst thing you can do in tech is claim something positive or negative about some technology without anything to back it up.  Ever since UCS was first brought to market, other blade vendors have been quick to point out any flaw they can find.  This is mostly because their market share of the x86 blade space has been threatened and in some cases (IBM & Dell) surpassed by UCS.

One of the claims that I’ve heard while presenting UCS is that the architecture has a major flaw that makes switching between two blades inferior to the legacy architectures that other hardware vendors use.  You see (they told me), in order for one UCS blade to communicate with another UCS blade you have to leave the chassis, go into the Fabric Interconnects (which could be all the way at the top of the rack, or even in another rack), and then come back into the chassis.  This must take an eternity.

Network traffic from one blade to another in the same chassis is called “East-West” traffic because the traffic doesn’t leave the chassis (picture it going sideways), whereas “North-South” traffic is network traffic that leaves the chassis and goes out to some other endpoint that doesn’t reside in the chassis.  The widely held belief was that UCS was at a huge disadvantage here.

After all, every other blade chassis on the market has network switches that sit inside the chassis and *must* be able to perform faster than UCS.  For a while now, I’ve wondered how much latency that adds.  Because, frankly, I thought the same way they did.  Surely the internal wires must be faster than twinax cables.

But science, that pesky disprover of legacy traditions and beliefs, has finally come to settle the argument.  And in fact it has turned the argument on its head: east-west traffic inside UCS is faster than in the legacy chassis.

The full blog can be read here.  There’s a link to a few great papers on that site that show how the measurements were done.

Plus one for the scientific method!

OpenStack Summit 2013 Food Recommendations

I’m really looking forward to the OpenStack Summit 2013 conference next week.  I have my schedule blocked off to be able to soak in as much information as I can.

Since I live in Portland and some people asked me, I thought I’d put out a few recommendations of places I like to eat in case you’re around.  Yes, I’m probably leaving off tons of stuff (the food carts, the Grilled Cheese Grill), but hey, I just wanted to put together a quick list.  Feel free to invite me.

Breakfast

Waffle Window – I’ll have ice cream on my waffles for breakfast.  Thanks

Quick Lunch

Por que No – Really good carne asada tacos.  Two locations.

Bunk Sandwiches – Super good.  No space to eat inside but great to grab a great Sandwich.

Kenny and Zuke’s – People love this place.  I think it’s pretty good.  Big sandwich.

Dinner or Big lunches

Asian Style

Bamboo Sushi – Awesome sushi and kobe beef hamburgers.  Love this place.  Get both.

Lucky Strike – Never been here, but I hear it’s fantastic.

Italian & Mediterranean Style

Acena – Never been here, but I hear it’s amazing.

Serrato - Looks good.  Can’t remember if I ate here or not.  I think I did.

Apizza Scholls – Probably the best pizza in Portland or the world.

Western Asianish

Marrakesh – Belly dancers?  Eating on the floor?  Eat with your fingers?  Yes.

East India Co. – Even my Indian friends admit you can’t even get Indian food this good in India.

Portlandish

Paley’s – Want to know the name of the chicken you are eating?  Where it grew up?

Screen Door – Southern Cuisine.  Loved it.  Don’t remember much more than that.

Castagna – Northwest cuisine.  Good hamburgers.  Also pigs’ feet if you like that too.

Up North

If you are staying in Vancouver and don’t mind a quick trip east, check out LaPella.  Pretty good.  My wife and I ate there 2 weeks ago.

Looking forward to seeing everyone!

Hacking UCS Manager to get pictures

I was reading the API for UCS Manager the other day (hey, everybody has a hobby, right?) and I found a pretty cool place where the Java UCS Manager client downloads its picture files.  I still haven’t found all the files (like the Fabric Interconnects, the chassis, and the IOMs), but most of the server models are found this way.  Substitute your UCS Manager IP address into the script below and it will download the pictures of the blades.  I wish I had known this before I gathered pictures for UCS Tech Specs, as these are great pictures.

#!/bin/bash
# Substitute your own UCS Manager IP address here; /pictures is the path the
# Java UCS Manager client pulls its images from.
IP=10.93.234.241/pictures
wget http://$IP/blade/B230.png
wget http://$IP/blade/B440.png
wget http://$IP/blade/Blade_full_width_front.png
wget http://$IP/blade/Blade_half_width_front.png
wget http://$IP/blade/Blade_half_width_front_marin.png
wget http://$IP/blade/SfBlade.png
wget http://$IP/blade/sequoia_front.png
wget http://$IP/blade/sequoia_top.png
wget http://$IP/blade/silver_creek_front.png
wget http://$IP/blade/silver_creek_top.png
wget http://$IP/blade/ucs_b200_m3_front.png
wget http://$IP/blade/ucs_b200_m3_top.png
wget http://$IP/fi/switch_psu_DC.png
wget http://$IP/rack/Alameda_1_front.png
wget http://$IP/rack/Alameda_1_top.png
wget http://$IP/rack/Alameda_2_front.png
wget http://$IP/rack/Alameda_2_top.png
wget http://$IP/rack/Alpine_M2.png
wget http://$IP/rack/Alpine_M2_front.png
wget http://$IP/rack/C220M3_front_small.png
wget http://$IP/rack/C220M3_top.png
wget http://$IP/rack/C420_front.png
wget http://$IP/rack/C420_internal.png
wget http://$IP/rack/SD1_Gen2_front.png
wget http://$IP/rack/SD1_Gen2_internal.png
wget http://$IP/rack/san_mateo_front.png
wget http://$IP/rack/san_mateo_internal.png
wget http://$IP/rack/sl2_front.png
wget http://$IP/rack/sl2_top.png
wget http://$IP/rack/st_louis_1u_front.png
wget http://$IP/rack/st_louis_1u_top.png
wget http://$IP/rack/st_louis_2u_front.png
wget http://$IP/rack/st_louis_2u_top.png