Category Archives: UCS

Cisco, Cluster on Die, and the E5-2600v3

I was in a meeting where someone noted that VMware does not support the Cluster on Die feature of the new Intel Haswell E5-2600v3 processors.  The question came: “What performance degradation do we get by not having this supported?  And would it be better to instead use the M3 servers”

This information they got from Cisco’s whitepaper on recommended BIOS settings for the new B200 M4, C240 M4, and C220 M4 servers.

The snarky answer is: You’re asking the wrong question.  The nice answer is: “M4 will give better performance than M3″.

Let’s understand a few things.

1.  Cluster on Die (CoD) is a new feature.  It wasn’t in the previous v2 processors.  So assuming all things equal, running the v3 without it just means that you don’t get whatever increased ‘goodness’ it will provide.

2.  Understand what CoD does.   This article does a good job pointing out what it does.  Look at it this way:  As each socket gets more and more cores in each iteration of Intel’s new chips, you have a lot of cores looking into the same memory bank.  It starts making it so latency goes up to find the correct data.  CoD carves up the regions so data to the cores stay somewhat more coherent.  Almost like giving each core its own bank of cache and memory.  This helps the latency go down as more cores are added.

To sum it up:  If I have a C240 M3 with 12 cores per processor and I compare it to a C240 M4 with 12 cores per processor with CoD disabled, then I still have the same memory contention problem with the M4 cores that I did with the M3s.  When VMware eventually supports CoD, then you just get a speed enhancement.

 

Unauthorized UCS Invicta usage

I had a UCS Invicta that I needed to change from a scaling appliance (where more than one invicta node is managed by several others) to a stand alone appliance.  I thought this could be done in the field but at this point I don’t think its an option still.  The below info is just some things I tried in a lab and I’m pretty sure are not supported by Cisco.

While the UCS Invicta appliance has had its share of negative press and setbacks, I still have pretty high hopes for this flash array.  Primarily because the operating system is so well designed for solid state media.  Its different, full featured, and I think offers a lot of value for anyone in need of fast storage. (everyone?)

First step was to take out the Infiniband card and put in the fibre channel cards.  I put it in the exact spot the IB card was in.

putting the FC card in the Invicta

CIMC

I had to change IP address.  Booted up.  Pressed F8, changed the IP address to 10.1.1.36/24 and didn’t touch the password.

The default CIMC password to login was admin / d3m0n0w! (source)

Console

It takes forever to boot up because there is no InfiniBand card in the appliance anymore.  I took it out and replaced with a fibre channel HBA that came in one of the SSRs that managed the nodes.

The default console user/password is  console / cisco1 (source)

This didn’t work for me I thought because whoever had the system before me changed it.  Fortunately for me, this is Linux.  Rebooted the system and added ‘single’ to the grub menu.   Yes, I securely compromised the system.  Such is the power of the Linux user.

IMG_6674

Once on the command line I entered the passwd command to change the root password into something I could log in with.  I took a look at the /etc/passwd table and noticed that there was no console user.  Hmm, guess that’s why I couldn’t log in.

I then rebooted.

Reformatting

There were three commands that looked pretty promising.  After digging around I found:

/usr/local/tools/invicta/factoryreset/

/usr/local/tools/invicta/menu  and the accompanying   menu.sh  command.

and

/usr/local/tools/invicta/support/secureWipe.sh 

I tried all these and then rebooted the system.  Nothing seemed to indicate any progress as to turning this node into a stand alone appliance.  I have a few emails out and if I figure it out, I’ll post an update. 

FCoE with UCS C-Series

I have in my lab a C210 that I want to turn into an FCoE target storage.  I’ll write more on that in another post.  The first challenge was to get it up with FCoE.  Its attached to a pair of Nexus 5548s.  I installed RedHat Linux 6.5 on the C210 and booted up.  The big issue I had was that even though RedHat Linux 6.5 comes with the fnic and enic drivers, the FCoE never happened.  It wasn’t until I installed the updated drivers from Cisco that I finally saw a flogi.  But there were other tricks that you had to do to make the C210 actually work with FCoE.

C210 CIMC

The first part to start is looking in the CIMC (with the machine powered on) and configure the vHBAs. From the GUI go to:

Server -> Inventory

Then on the work pane, the ‘Network Adapters’ tab, then down below select vHBAs.  Here you will see two vHBAs by default.  From here you have to set the VLAN that the vHBA will go over.  Clicking the ‘Properties’ on the interface you have to select the VLAN.  I set the MAC address to ‘AUTO’ based on a TAC case I looked at, but this never persisted.  From there I entered the VLAN.  VLAN 10 for the first interface and VLAN 20 for the second interface.  This VLAN 10 matches the FCoE VLAN and VSAN that I created on the Nexus 5548.  On the other Nexus I creed VLAN 20 to match FCoE VLAN 20 and VSAN 20.

This then seemed to require a reboot of the Linux Server for the VLANs to take effect.  In hindsight this is something I probably should have done first.

RedHat Linux 6.5

This needs to have the Cisco drivers for the fnic.  You might want to install the enic drivers as well.  I got these from cisco.com.  I used the B series drivers and it was a 1.2GB file that I had to download all to get a 656KB driver package.  I installed the kmod-fnic-1.6.0.6-1 RPM.  I had a customer who had updated to a later kernel and he had to install the kernel-devel rpm and recompile the driver.  After it came up, it worked for him.

With the C210 I wanted to bond the 10Gb NICs into a vPC.  So I did an LACP bond with Linux.  This was done as follows:

Created file: /etc/modprobe.d/bond.conf

alias bond0 bonding
options bonding mode=4 miimon=100 lacp_rate=1

Created file: /etc/sysconfig/network-scripts/ifcfg-bond0

DEVICE=bond0
IPADDR=172.20.1.1
ONBOOT=yes
NETMASK=255.255.0.0
STARTMODE=onboot
MTU=9000

Edited the /etc/sysconfig/network-scripts/ifcfg-eth2

DEVICE=eth2
MASTER=bond0
SLAVE=yes
HWADDR=58:8D:09:0F:14:BE
TYPE=Ethernet
UUID=8bde8c1f-926f-4960-87ff-c0973f5ef921
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none

Edited the /etc/sysconfig/network-scripts/ifcfg-eth3

DEVICE=eth3
MASTER=bond0
SLAVE=yes
HWADDR=58:8D:09:0F:14:BF
TYPE=Ethernet
UUID=6e2e7493-c1a1-4164-9215-04f0584b338c
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none

Next restart the network and you should have a bond. You may need to restart this after you configure the Nexus 5548 side.

service network restart

Nexus 5548 Top
Log in and create VPCs and stuff.  Also don’t forget to do the MTU 9000 system class.  I use this for jumbo frames in the data center.

policy-map type network-qos jumbo
class type network-qos class-default
mtu 9216
multicast-optimize
system qos
service-policy type network-qos jumbo

One thing that drives me crazy is that you can’t do sh int po 4 to see that the MTU is 9000. From the documents, you have to do

sh queuing int po 4

to see that your jumbo frames are enabled.

The C210 is attached to ethernet port 1 on each of the switches.  Here’s the Ethernet configuration:

The ethernet:

interface Ethernet1/1
switchport mode trunk
switchport trunk allowed vlan 1,10
spanning-tree port type edge trunk
channel-group 4

The port channel:

interface port-channel4
switchport mode trunk
switchport trunk allowed vlan 1,10
speed 10000
vpc 4

As you can see VLAN 10 is the VSAN. We need to create the VSAN info for that.

feature fcoe
vsan database
vsan 10
vlan 10
fcoe vsan 10

Finally, we need to create the vfc for the interface:

interface vfc1
bind interface Ethernet1/1
switchport description Connection to NFS server FCoE
no shutdown
vsan database
vsan 10 interface vfc1

Nexus 5548 Bottom
The other Nexus is similar configuration.  The difference is that instead of VSAN 10, VLAN 10, we use VSAN20, VLAN 20 and bind the FCoE to VSAN 20.  In the SAN world, we don’t cross the streams.  You’ll see that the VLANS are not the same in the two switches.

Notice that in the below configuration, VLAN 20 nor 10 is defined for through the peer link so you’ll only see VLAN 1 enabled on the vPC:

N5k-bottom# sh vpc consistency-parameters interface po 4

Legend:
Type 1 : vPC will be suspended in case of mismatch

Name Type Local Value Peer Value
————- —- ———————- ———————–
Shut Lan 1 No No
STP Port Type 1 Default Default
STP Port Guard 1 None None
STP MST Simulate PVST 1 Default Default
mode 1 on on
Speed 1 10 Gb/s 10 Gb/s
Duplex 1 full full
Port Mode 1 trunk trunk
Native Vlan 1 1 1
MTU 1 1500 1500
Admin port mode 1
lag-id 1
vPC card type 1 Empty Empty
Allowed VLANs – 1 1
Local suspended VLANs – – -

But on the individual nodes you’ll see that the VLAN is enabled in the VPC. VLAN 10 is carrying storage traffic.

# sh vpc 4

vPC status
—————————————————————————-
id Port Status Consistency Reason Active vlans
—— ———– —— ———– ————————– ———–
4 Po4 up success success 1,10

Success?

How do you know you succeeded?

N5k-bottom# sh flogi database
——————————————————————————–
INTERFACE VSAN FCID PORT NAME NODE NAME
——————————————————————————–
vfc1 10 0x2d0000 20:00:58:8d:09:0f:14:c1 10:00:58:8d:09:0f:14:c1

Total number of flogi = 1.

You’ll see the login. If not, then try restarting the interface on the Linux side. You should see a different WWPN in each Nexus. Another issue you might have is that the VLANS may be mismatched, so make sure you have the right node on the right server.

Let me know how it worked for you!

Changing UCS IP addresses

I have a UCS lab machine that I sometimes take to different locations for proof of concept work.  One of the things I regularly have to do is change the password and hostname.  Here’s how you do it on the command line:

KCTest-A# scope fabric-interconnect a
KCTest-A /fabric-interconnect # set out-of-band ip 10.1.1.23 netmask 255.255.255.0 gw 10.1.1.1
Warning: When committed, this change may disconnect the current CLI session
KCTest-A /fabric-interconnect* # scope fabric-interconnect b
KCTest-A /fabric-interconnect* # set out-of-band ip 10.1.1.24 netmask 255.255.255.0 gw 10.1.1.1
Warning: When committed, this change may disconnect the current CLI session
KCTest-A /fabric-interconnect* # scope system
KCTest-A /system* # set virtual-ip 10.1.1.25
KCTest-A /system* # set name ccielab
KCTest-A /system* # commit-buffer

 

It’s great because you can change all the IP addresses on each server, the virtual server, and the hostname in one shot.

Source of docs

 

Cloud Computing: How Do I Get There?

This post comes from a talk that I’ll be presenting on at the Pacific Northwest Digital Government Summit Conference on October 2nd, 2013.

History shows us that those that embrace technology and change survive while those that resist and stick with “business as usual” get left behind.  If we have the technology and we don’t use it to make IT look like magic, then we’re probably doing it wrong. (Read “The Innovator’s Dilemma” and Clarke’s Three Law.)

I’ll be talking mainly about private cloud today, but many of these ideas can be taken into the public cloud as well.

Optimizing ROI on your Technology

My friend tells a story about when his wife first started using an iPhone.  To get directions on a map she’d open up Safari and go to http://maps.google.com.  To check Facebook she would open Safari and go to http://facebook.com.  To check her mail she’d open up Safari again and navigate to http://gmail.com.  You get the idea.

She was still getting great use of her iPhone.  She could now do things she could never do before.  But there was a big part she was missing out on.  She wasn’t using the App ecosystem that makes all of these things easier and delivers a richer experience.

Today, most organizations have virtualization in the data center.  Because of this IT is able to do things they’ve never been able to do before.  They’re shrinking their server footprints to once unimaginable levels saving money in capital and management costs.  I’ve been in many data centers  where people proudly point to where rows of racks have been consolidated to one UCS domain with only a few blades.  Its pretty cool and very impressive.

But they’re missing something as big as the App Store.  They’re missing out on the APIs.  This is where ROI is not being optimized in the data center in a big way.

IT is shifting (or has shifted) to a DevOps model. DevOps means that your IT infrastructure team is more tightly aligned with your developers/application people.  This is a management perspective.  But from a trenches perspective, the operations team is now turning into programmers.  Programmers of the data center.  The guy that manages the virtual environment, the guy who adds VLANs to switches, or the guy who creates another storage LUN: they’re all being told to automate and program what they do.

The group now treats the IT infrastructure like an application that is constantly adding features and doing bug fixes.

The programming of the IT infrastructure isn’t done in compiled languages like Java, C, or C++.  Its done in interpreted languages like Python, Ruby, Bash,  Powershell, etc.  But the languages alone don’t get you there.  You need a framework.  This is where things like Puppet or Chef come into play.  In fact, you even can look at it like you’re programming a data center operating system.  This is where OpenStack provides you a framework to develop your data center operating system.  Its analogous to the Web Application development world.  Twitter was originally developed in Ruby using a framework called Ruby on Rails.  (Twitter has since moved off Ruby on Rails).

Making this shift gives you unprecedented speed, agility, and standardization.  Those that don’t do it, will find their constituents looking elsewhere for IT services that can be delivered faster and cheaper.

The IT assembly line

Its hard for people to think of their IT professionals as assembly line workers.  After all, they are doing complex things like installing servers, configuring networks, and updating firmware.  These are CCIEs, VCPs, and Storage Gurus.  But that’s actually what people in the trenches are:  Workers of the virtual Assembly line.  IT managers should look at the way work enters the assembly line, understand the bottlenecks, and track how long it takes to get things through the line.  Naturally, there are exceptions that crop up.  But for the most part, the work required to deliver applications to the business are repetitive tasks.  They’re just complicated, multi-step, repetitive tasks.

To start with, we need to look at the common requests that come in:  Creating new servers, deploying new applications, delivering a new test environment.  Whatever it is, management really needs to understand how it gets done, and look at it like the manufacturing foreman sitting above the plant, looking down and watching a physical product make its way through.  Observe which processes are in place, where they are being side stepped, or where they don’t exist at all.

As an example, consider all the steps required to deploy a server.  It may look something like the flowchart below:

That sure looks like an assembly line to me.  If you can view work that enters the infrastructure like an assembly line, you can start measuring how long it takes for certain activities to get done.  Then you can figure out ways to optimize.

Standardization of the Infrastructure

Manufacturing lines optimize throughput by standardizing processes and equipment.  When I hear VMware tell everybody that “the hardware doesn’t matter”, I take exception.  It matters.  A lot.  Just like your virtualization software matters.  Cisco and other hardware venders come from it the opposite direction and say “the hypervisor doesn’t matter, we’ll support them all”.  What all parties are really telling you is that they want you to standardize on them.  All parties are trying to prove their value in a private cloud situation.

What an organization will standardize on depends on a lot of things: Budget, skill set of Admins, Relationship with vendors and consultants, etc.  In short, when considering the holy trinity of the data center: Servers, Storage, & Networking it usually gets into a religious discussion.

But whatever you do, the infrastructure needs to be robust.  This is why the emergence of Converged Infrastructures like Vblocks, FlexPods, and other reference architectures have become popular.  The  “One-Piece-At-A-Time” accidental/cobbled architecture is not a good play.

Consider the analogy that a virtualized workload is cargo on a Semi Truck.  Do you want that truck running over a 6 lane solid government highway like I-5 or do you want that stuff traveling at 60mph down a rinky bridge?

This?

Or This?

Similarly, if your virtualization team doesn’t have strong Linux skills, you probably don’t want them running OpenStack on KVM.  That’s why VMware and Hyper-V are so popular.  Its a lot easier for most people’s skill level.

What to Standardize On?

While the choice of infrastructure standardization is a religious one, there are role models we can look to when deciding.  Start out by looking at the big boys, or the people you aspire to be when you grow up.  Who are the big boys that are running a world class IT as a service infrastructure?  AWS, RackSpace, Yahoo, Google, Microsoft, Facebook, right?

What are they standardizing on?  Chances are its not what your organization is doing.  Instead of VMware, Cisco, IBM, HP, Dell, EMC, NetApp, etc, they’re using open source, building their own servers, and using their own distributed filesystems.  They do this because they have a large investment in their DevOps team that is able to put these things together.

A State organization that has already standardized on a FlexPod or Vblock with VMware is not going to throw away what they’ve done and start over just so they can match what the big boys do.  However, as they move forward, perhaps they can make future decisions based on emulating these guys.

Standardize Processes

The missing part is standardizing the processes once the infrastrucutre is in place.  Standardization is tedious because it involves looking at every detail of how things are done.  One of my customers has a repository of documentation they use every time they need to do something to their infrastructure.  For example, 2 weeks ago we added new blade servers to the UCS.  He pulled out the document and we walked through it.  There were still things we modified in the documentation, but for the most part the steps were exact.

Unfortunately, this was only one part of the process.  The Networking team had their own way of keeping notes (or not at all) on how to do things.  So the processes were documented in separate places.  What the IT manager needs to do is make sure they understand how the processes (or work centers) are put together and how long each one takes.

The manager should be able to have their own master process plan to be able to track work through the system.  (The system being the different individuals doing the work).  This is what is meant by “work flow”.  Even if they just do this by hand or as is commonly done with a Gantt chart, there should be some understanding.

Each job that comes in, should get its own workflow, or Gantt Chart, and entered into something like a Kanban board.  Once you understand this for the common requests, you can see how many one offs there are.

Whether these requests are for public cloud or private cloud, there is still a workflow.  It is an iterative process that may not be complete the first few times it is done, but over time will become better.  There is a great book called “The Phoenix Project” that talks about how the IT staff starts to standardize and work together between development and operations to get their processes better.  These ideas are based off an earlier business classic called “The Goal”

Automate the Processes

Once the processes are known we turn our assembly line into programmers of the processes.  I used to worked as a consulting engineer to help deploy High Performance Computing clusters.  On several occasions the RFPs required that the cluster be able to be deployed from scratch in less than 1 hour.  From bare metal, to running jobs.  We created scripts that would go through and deploy the OS, customize the user libraries, and even set up a job queuing system.  It was pretty amazing to see 1,200 bare metal rack mount servers do that.  When we would leave, if the customer had problems with a server then they could replace it, plug it in, and walk away.  The system would self provision.

While that was a complicated process and still is, it is still simpler than what virtualization has done to the management of the data center.  We never had to mess with the network once it was set up.  Workflows for a new development environment are pretty common and require provisioning several VMs with private networks and their own storage.  However, the same method of scripting the infrastructure can still be applied.  It just needs to be orchestrated.

Automate and Orchestrate with a Framework

Back when we did HPC systems, we used an open source management tool called xCAT.  That was the framework by which we managed the datacenter.  The tool had capabilities but really what it gave us was a framework to insert our customizations or our processes that were specific for each site.  The tool was an enabler of the solution, not the solution itself.

Today there are lots of “enterprise” private cloud management tools.  In fact, any company that wants to sell a “Private Cloud”  will have its own tool.  VMware vCloud Director, HP Cloud System, IBM Cloudburst, Cisco UCS Director, etc.  All of these products, regardless of how they are sold should be regarded as frameworks for automating your processes.

At a recent VMUG, the presenter asked “How many people are using vCloud Director or any other cloud orchestration tool?”  Nobody raised their hand.  Based on what I’ve seen its because most organizations haven’t yet standardized their IT processes.  There is no need for orchestration if you don’t know what you’re orchestrating.

Usually each framework will come with a part or all of what Cisco calls the “10 domains of cloud” which may include: A self service portal, chargeback/showback, service catalog, security, etc.  If you are using a public cloud, you are using their framework.

Once you select one, you’ll need to get the operations teams (network, storage, compute, virtualization) to sign off and use the tool.  Its not just a server thing.  Each part of the assembly line needs to use it.

Once the individual components are entered into the framework, then the orchestration comes to play.  To start with, codify the most common workloads:  Creating VLAN, Carving out a LUN, Provisioning a VM, etc.

To orchestrate means to arrange or control the elements of, as to achieve a desired overall effect.  With the Framework, we are looking to automate all of the components to deliver a self service model to our end customer.

Self Service and Chargeback

Once we have the processes codified in the framework, we can now present a catalog to our users.  With a self service portal we recommend it not being completely automated to start out with.  With some frameworks, as a workload moves through the automated assembly line, it can send an email to the correct IT department to validate whether a workflow can move through.  So for example, if the user as part of the workflow wants a new VLAN for their VM environment, the networking administrator will receive an email and will be able to approve or deny.  This way, the workflow is monitored, the end requester knows where they are in the queue, and  once it is approved, it gets created automatically, then gets passed along to the next item in the assembly line.

For chargeback, the recommendation is to keep the menu small, and the price simple.

Security all throughout then Monitor, Rinse, and Repeat

More workflows will come into the system and the catalog will need to continuously need updating and revisions.  This is the programmable data center.  Iterations should be checked into a code repository similarly to how application developers use systems like github.com to store code updates.  You will have to do bug fixes and patch up any exposed holes.  With virtualization comes the ability to integrate more software security services like the ASA 1000v, or the VSG.

Action Items

  • Realize that your IT infrastructure is a collection of APIs waiting to be harnessed and programmed.  Challenge the people you work with to learn to use those APIs to automate their respective areas of expertise.
  • Optimize the assembly line by understanding the workflows.  Any manufacturing manager can tell you the throughput of the system.  An IT manager should be able to tell you the same thing about their system.  Start by understanding the individual components, how long it takes, and where the bottlenecks in the system are.
  • Standardize your infrastructure with a solid architecture.  Converged architectures are popular for a reason.  Don’t reinvent the wheel.
  • Standardizing processes is the hardest part.  Start with the most common.  These are usually documented.  Take the documentation and think how you would change it into code.
  • Program the DataCenter using a Framework.  Most of the work will have to be done in house or with service contracts.  The framework could be something like a vendors cloud software or something free like OpenStack.

 

UCS Reverse Path Forwarding and Deja-Vu checks

UCS Fabric Interconnects are usually always run in end-host mode.  At this point in the story there really isn’t that many reasons to use switch-mode on the Fabric Interconnects.

Two checks, or features that make End Host Mode possible are Reverse Path Forwarding (RPF) checks and Deja-Vu checks.

RPF and Deja-Vu (from Cisco.com)

Reverse Path Forwarding Checks

Each server in the chassis is pinned dynamically (or you can set up pin groups and do it statically, but I don’t recommend that) to an uplink on Fabric Interconnect A and Fabric Interconnect B.  Let’s say you have 2 uplinks on port 31 and 32 of your Fabric Interconnect.  Server 1/1 (chassis 1 / blade 1)  may be pinned to port 31.  If a unicast packet is received for server 1/1 on uplink port 31, it will go through.  But if that same packet destined for server 1/1 is received on port 32, it will be dropped.  That’s because RPF checks to see if the destination for the unicast is actually forwarding its uplink traffic through that link.

Deja Vu Checks

The other check is called “Deja-Vu” .  In the Cisco documentation it says: “Server traffic received on any uplink port, except its pinned uplink port is dropped“.  That sounds a lot like RPF.  Another presentation from Cisco live states it this way: “Packet with source MAC belonging to a server received on an uplink port is dropped

An example to clear it up

VM A on server 1/1 wants to talk to VM B located somewhere else.  The Fabric Interconnects in this case are connected to a single Nexus 5500 switch.  The VM is pinned to one of the VNICs and that VNIC is pinned to go out port 31 of Fabric Interconnect A.  So what happens?

First the VM will send an ARP request.  An ARP request basically says:  I know the IP address but I want the MAC address.  (Obviously, this is in the same Layer 2 VLAN and subnet).  If Fabric Interconnect A doesn’t find the IP/MAC association in its CAM table, then it will not flood the server ports down stream.  That is something a switch would do.  The Fabric Interconnect is different.  The reason the Fabric Interconnect doesn’t send a broadcast down its server ports is because it is a source of truth and knows everyone connected on its server ports.

What it will do instead is forward the ARP request (unknown unicast) up the designated uplink (port 31).  Now the Nexus switch is a switch.  (And a very good one at that).  It will say:  “Hey, I don’t have a CAM table entry for VM B IP/MAC so I will do what we switches do best:  Flood all the ports! (except the port that the unknown unicast/ARP request came in on)

Remember Fabric Interconnect A port 32 is connected to this same switch as port 31 where the unknown unicast (ARP request) went out.  The Nexus 5500 will send this unknown unicast to port 32 just like every other port.  But port 32 says:  Wait a minute, the source address originated from me.  Deja-vu!  So he drops the packet.

Fabric Interconnect B has two ports 31 and 32 that will also receive the unknown unicast.  If VM B is pinned to a VNIC that is pinned to port 31 on Fabric Interconnect B, he will say:  I got this!  And the packet will go through.  Port 32, however on FI-B will look at the destination MAC and say:  This is not pinned to me, so I’ll drop the packet.  That is the RPF check.

To sum it up

Deja-Vu check:  don’t receive a packet from the upstream switch that originated from me.

Reverse Path Forward Check:  don’t receive a packet if there’s no server pinned to this uplink.

Backing up UCS

Backing up UCS can be a little confusing especially since it presents you a few options.  What you may be expecting is something simple like a one button easy “Back it up” button.  But in fact, that is not the case.  And the nice thing about it is there are lots of different things you can do with backup files.
From the Admin Tab under All in UCS Manager, under the general tab, you select “Backup Configuration”

But now, we have a few choices as to how we set this up.  Now you create a backup operation

Then you are presented the below screen and now things get a little bit complicated.

Let’s go through some of these seemingly confusing options:

Admin State

This is a bit confusing.  But here’s how to think about it:  If you want to run the backup now, right this second, when you click “OK” and don’t want to wait, select “Enabled”.  Most of the time this is what you want.  If instead, you just want to save this backup operation, so that you can click it on the Backup operations list and do it, then set disabled.

Type

There are 4 different configurations that can be backed up by UCS.  All of them deal with data that lives in the Fabric Interconnect.  They are illustrated in the diagram below

 

The brim of the triangle is the Full State.  This is a binary file that can be used to backup on any system to restore the settings that this Fabric Interconnect has.  Its different than all the other types.  Its the only one that can be used for system restore.  This is usually fun to backup off your own system.  I haven’t tried putting it into the platform emulator yet, but it might be fun to try.

The three other backups are just XML files.  They’re useful for importing into other systems.  The “All Configuration” is just a fancy way of saying “System Configuration” and “Logical Configuration”.  It does both.

The System Configuration is user names, roles, and locals.  This is useful if you are installing another UCS somewhere and you want to keep the same users and locales (if you are using some type of multi-tenancy) but in that case, why aren’t you using UCS Central?  Try it, its free for up to 5 domains.  And you can do global service profiles.

The Logical Configuration is all the pools and policies, service profiles, service profile templates you would expect to be backed up.  This is pretty good to put inside the emulator to fool around with different settings you are using.  Or, if you don’t have your UCS yet and you’re waiting to order it, then you can just create the pools and policies in the emulator.  Then when the real thing comes, import the logical configuration in and you are ready to rock.

The tricky button that shows up when you select the All Configuration or the Logical Configuration is the label:  Preserve Identities. This is only on logical and all configurations because it has to do with making service profiles that are already mapped to pools retain their mapping.  This is good if you’re going to move some service profiles from one fabric interconnect domain to another and want to keep the same setup.  Otherwise, it doesn’t really matter to keep those identities.

The other options presented for how you want to back up the system is pretty self explanatory.  You can either back this up to your local machine or some other machine that has another service running like SSH, TFTP, etc.

After you’ve created a backup operation, the nice thing is that it saves it for you in a backup operations list.  When you want to actually do it, just select it, then hit admin enable and it will perform the backup.

Performing Routine Periodic Backups

But wait you say, what if I want it to periodically backup itself?

Well, that’s where you move to the next tab which is the Policy Backup & Export

Here you have the option of backing up just the binary system restore button, or the all-configuration.  The all configuration is good for backing up XML files just in case some administrator accidentally changes a bunch of configs on you.

Here you can see, My XML and binary files will be backed up every day.  (That may be a little more than you need, as things don’t usually change so much in most environments, but hey, now you have it, use it.)

When it saves to those remote files you’ll get a timestamp on the name:

full-backup.bin.2013-07-28T22-55-11.555
all-config.xml.2013-07-28T22-57-11.559

So that’s backing up the system and all the ways it can be done.  There’s a few nerd nobs, but I wanted to make sure I understood it.

The last thing to cover is import operations.  Its important to understand that you can do two different types:  A merge or replace.  With merge, if you have a MAC pool called A and it has 30 MACs already, a merge will add the new MACs to it.  (So if there are 20 in the import, you will now have 50).  With replace, you’ll now just have 20.  You can only merge XML files.

Lastly, all of this information is found here in the latest  UCS GUI Configuration Guide It was nice to gain a more solid understanding of it.  Backing up is something I go over briefly in some of my tech days I do, but this flushes it out a little better if there are any further questions.

Thanks for reading!

 

 

Cisco UCS East-West Traffic Performance.

The worst thing you can do in tech is claim something positive or negative about some technology without anything to back it up.  Ever since UCS was first brought to market, other blade vendors have been quick to point out any flaw they can find.  This is mostly because their market share of the x86 blade space has been threatened and in some cases (IBM & Dell) surpassed by UCS.

One of the claims that I’ve heard while presenting UCS is that the major flaw with the architecture makes switching between to blades inferior to the legacy architectures that other hardware vendors use.  You see, (they told me) in order for one UCS blade to communicate to another UCS blade you have to leave the chassis, go into the Fabric Interconnects (that could be all the way at the top of rack, or even in another rack), and then come back into the chassis.  This must take an eternity.

Network traffic from one blade to another in the same chassis is called “East-West” traffic because the traffic doesn’t leave the chassis.  (Picture it going sideways) where as “Nort-West” traffic is network traffic that leaves the chassis and goes out to some other end point that doesn’t reside in the chassis.  The widely held belief was that UCS was a a huge disadvantage here.

After all, every other blade chassis on the market has network switches that sit inside the chassis and *must* be able to perform faster than UCS.  For a while now, I’ve wondered how much latency that adds.  Because, frankly, I thought the same way they did.  Surely the internal wires must be faster than twinax cables.

But science, that pesky disprover of legacy traditions and beliefs, has finally come to settle the argument.  And in fact has turned the argument on its head.  The east-west traffic inside UCS is faster than the legacy chassis.

The full blog can be read here.  There’s a link to a few great papers on this site that show how the measurements done.

Plus one for the scientific method!

Hacking UCS Manager to get pictures

I was reading the API for UCS manager the other day (hey, everybody has a hobby right?) and I found out a pretty cool place where the Java UCS Manager downloads the picture files.  I still haven’t found all the files (like the Fabric Interconnects and the Chassis, and IOMs) but most of the server models are found this way.  Substitute your UCS Manager IP address into the script below and it will download the pictures of the blades.  I wish I would have known this before I gathered pictures for UCS Tech Specs as these are great pictures.

#!/bin/bash
IP=10.93.234.241/pictures
wget http://$IP/blade/B230.png
wget http://$IP/blade/B230.png
wget http://$IP/blade/B440.png
wget http://$IP/blade/Blade_full_width_front.png
wget http://$IP/blade/Blade_full_width_front.png
wget http://$IP/blade/Blade_half_width_front.png
wget http://$IP/blade/Blade_half_width_front.png
wget http://$IP/blade/Blade_half_width_front_marin.png
wget http://$IP/blade/Blade_half_width_front_marin.png
wget http://$IP/blade/SfBlade.png
wget http://$IP/blade/SfBlade.png
wget http://$IP/blade/sequoia_front.png
wget http://$IP/blade/sequoia_top.png
wget http://$IP/blade/silver_creek_front.png
wget http://$IP/blade/silver_creek_top.png
wget http://$IP/blade/ucs_b200_m3_front.png
wget http://$IP/blade/ucs_b200_m3_top.png
wget http://$IP/fi/switch_psu_DC.png
wget http://$IP/rack/Alameda_1_front.png
wget http://$IP/rack/Alameda_1_top.png
wget http://$IP/rack/Alameda_2_front.png
wget http://$IP/rack/Alameda_2_top.png
wget http://$IP/rack/Alpine_M2.png
wget http://$IP/rack/Alpine_M2_front.png
wget http://$IP/rack/C220M3_front_small.png
wget http://$IP/rack/C220M3_top.png
wget http://$IP/rack/C420_front.png
wget http://$IP/rack/C420_internal.png
wget http://$IP/rack/SD1_Gen2_front.png
wget http://$IP/rack/SD1_Gen2_front.png
wget http://$IP/rack/SD1_Gen2_internal.png
wget http://$IP/rack/SD1_Gen2_internal.png
wget http://$IP/rack/san_mateo_front.png
wget http://$IP/rack/san_mateo_internal.png
wget http://$IP/rack/sl2_front.png
wget http://$IP/rack/sl2_front.png
wget http://$IP/rack/sl2_top.png
wget http://$IP/rack/sl2_top.png
wget http://$IP/rack/st_louis_1u_front.png
wget http://$IP/rack/st_louis_1u_top.png
wget http://$IP/rack/st_louis_2u_front.png
wget http://$IP/rack/st_louis_2u_top.png

VIFS in a UCS environment

First of all you may be asking if you stumbled upon this page:  “What is a VIF?”.  A VIF is a Virtual interface.  In UCS, its a virtual NIC.
Let’s first examine a standard rack server.  Usually you have 2 ethernet ports on the mother board itself.  Now days, the recent servers like the C240 M3 have 4 x 1GbE onboard interfaces.  Some servers even have 2x10GbE onboard NICs.  That’s all well and good and easy to understand because you can see it physically.
Now let’s look at a UCS blade.  You can’t really see the interfaces because there are no RJ-45 cables that connect to the server.  Its all internal.  If you could see it physically, then you’d see that you could add up to 8x10Gb physical NICs per half width blade.  Just like a rack mount server comes with a fixed amount of PCI slots, a blade has built in limits as well.  But Cisco blades work a little different.  Really, there are 2 sides:  Side A and Side B, each with up to 4x10GbE physical connections.  And those 4x10GbE are port channeled together, so it looks like one big pipe depending on what cards you put in there.
With these two big pipes (that are between two 10Gb and two 40Gb) we create virtual interfaces over these that are presented to the operating system.  That’s what a VIF is.  These VIFs can be used for some really interesting things.
VIF Use Cases
  1. It can be used to present NICs to the operating system.  This makes it so that the operating system thinks it has a TON of real NICs.  The most I’ve ever seen though is 8 NICs and 2 Fibre Channel adapters.  (Did I mention that Fibre Channel counts as a VIF?)  So 10 is probably the most you would use with this configuration.
  2. It can be used to directly attach virtual machines with a UCS DVS.  This is also one version of VM-Fex.  Here, UCS Manager acts as the Virtual supervisor and the VMs get real hardware for their NICs.  They can do vMotion and all that good stuff and remain consistent.  I don’t see too many people using this, but the performance is supposed to be really good.
  3. It can be used for VMware DirectPath IO.  This is where you tie the VM directly to the hardware using VMware DirectPath IO bypass method.  (Not the same as the UCS Distributed Virtual Switch I mentioned above.)  The advantage UCS has is that  you typically cannot do vMotion when you do VMware DirectPath IO.  With UCS, you can!
  4. USNIC (future!!!)  Unified NIC is where we can present one of these virtual interfaces directly to user space and create a low latency connection in our application.  This is something that will be enabled in the future on UCS, but it means we dynamically create these and can hopefully get latencies around 2-3 microseconds.  This is great for HPC apps and I can’t wait to get performance data on this.
  5. USNIC in VMs.  (future!!!)  This is where a user space application running in a VM will have the same latency as a physical machine.  That’s right.  This is where we really get VMs doing HPC low latency connections.
So now that we know the use cases, how can you tell how many virtual interfaces or VIFs you have for each server?  Well, it depends on the hardware and the software.  You see, they all allow for growth, but some instances have limitations.  So that’s what I’m hoping to explain below.
UCS Manager Limitations and Operating Systems Limitations
For 2.1 this is found here.  For other versions of UCS manager, just search for “UCS 2.x configuration limits”.
The Maximum VIFS per UCS domain today is 2,000

The document above also shows that for ESX 5.1 its 116 per host.  The document references UPT and PTS.
UPT – Uniform Pass Thru (this is configured in VMware with direct Path IO, use case 3 as I mentioned above)
PTS – Pass through Switching (this is UCS DVS, or use case 2 as I mentioned above)
Fabric Interconnect VIF Perspective
Let’s look at it from a hardware perspective.  The ASICs used on the Fabric Interconnects determine the limits as well.
6200
The UCS Fabric Interconnect 6248 uses the “Carmel” Unified Port Controller.  There is 1 “Carmel” port ASIC for every 8 ports.  So ports 1-8 are part of the first Carmel ASIC, etc.  In general, you want the FEX (or IO Module) connected to the same Carmel.
Each Carmel ASIC allows 4096 VIFs which are equally divided into all 8 switch ports.  Therefore, 512 VIFS per port.  Since one of those VIFs is dedicated to the CIMC, that gives 511 VIFS per port.  Consider that there are 8 slots in each chassis, so you would further divide that up between the 8 blade slots, so that’s 64 max in each slot.  Some are reserved, so it ends up being 63 VIFs per slot. That’s why the equation ends up being 63*n – 2 (2 are used for management)
Cisco Fabric Interconnect 6200
Uplinks Per FEX Number of VIFs per slot
1 61
2 124
4 250
8 502
6100
The 6100 uses the Gatos port controller ASIC.  There are 4 ports managed per Gatos ASIC.
Each Gatos ASIC allows 512 VIFs or 128 VIFS per port.  (512 VIFs per ASIC / 4 ports).  Each of those 4 ports gets divided by the 8 slots.  So, 128 / 8 = 16.  However, some of those are reserved, so it ends up being only 15 VIFs per slot.   That’s why the equation of VIFs per server is 15*n – 2  (the 2 are used for management)
Cisco Fabric Interconnect 6100
Uplinks per FEX Number of VIFS per slot
1 13
2 28
4 58
8 118 (obviously requres 2208)
VIFs from the Mezz Card Perspective
The M81KR card supports up to 128 VIFs.  So you can see from above that with the 6100 and 2104/2204/2208 its not the bottle neck.
The VIC 1280 which can be placed into the M1 and M2 servers can do up to 256 VIFs.
Hopefully that clarified VIFs a little and where the bottle necks are.  Its important to note as well that I/O modules don’t limit VIFs.  They’re just passthrough devices.