Beyond Virtual Desktop Infrastructure

I wrote a blog post a few days ago that I wanted to revise because I didn't get it right.  First of all, please note that everything I write here reflects my own thoughts and not those of my employer.

This article is about Virtual Desktop Infrastructure (VDI), end user computing, Desktop as a Service (DaaS), or whatever you want to call it.  It's very relevant to many organizations today, and there are a lot of great solutions and people heavily invested in it.  Is this the year of the virtual desktop?  It is to some people!  To others, that was 4 years ago, so what's the big deal?  But for some organizations, it's never going to happen because there's no use case.

What problems VDI solves

Let's think about the problems VDI solves.  It gives us our enterprise environment remotely and allows Desktop Support to control the image that workers get.  But that's what it does, not the problem it solves.  The problem it solves is giving us our enterprise applications anywhere.  You see, many of us couldn't care less about having our mandated enterprise environment.  When I worked for my former employer, the first thing I did when I got my corporate-issued laptop was to promptly erase their blessed image and reinstall the whole thing from scratch.  Wipe it out, get rid of the employer stuff, and put Linux on it.  Then I was in control.  Then I'd worry about installing the apps I actually needed and used, and nothing else.

Desktop Support probably didn't like that, but I never called or used them and they never called or used me.  I got the apps I wanted, and whenever I gave a presentation, I never had a little window pop up at the bottom warning me that my computer would reboot in 10 minutes to install some extremely important updates.  We lived separate, happy lives.

Desktop Support is not evil.  They need to control the operating system image to ensure the applications can run, and run securely.  Plus, they aren't catering to people like me.  They're catering to people who just want to get things done and not mess with things like I do.  So when you look at what VDI is today, it's extending Desktop Support's control into a virtual image.  I think this is great!  I can run my own image, and whenever I need my corporate apps, I can log into a VDI image.  Perfect.

Why VDI is temporary for most Enterprises

But in most cases VDI is a patch, a temporary way of getting today's legacy applications to enterprise users.  Here's where it works very well: if you have an application that was written for Windows XP or Windows 7, then creating a virtual desktop to serve those apps can be very effective.  But applications have changed.  Most of the applications I use are web-based.  I still use Excel and PowerPoint, but I now store those files in Box, which my company provides as a secure place to put them.  (Think: Dropbox for corporations.)

My Desktop Support is now Application Support.  They make applications available to me, and I can use whatever device I want to access them.  They now have even greater control: when my corporate support team updates the configuration tool I use to create bills of materials (BoMs) for my customers, they control the upgrades and revisions.  I never have to do it on my laptop.  Even if I liked the old way better, I have no say.  Application Support now has more control than ever and ensures no one is running older apps.  It's great!  (You may have seen people complain about a new Facebook layout in the past.  There's nothing they can do, because it's not an app they run on their desktop.)

Applications continue to migrate this way.  No one is continuing to build the next great desktop application.  They’re looking to make applications that run anywhere on anything.  Even Microsoft Office runs on my iPad!

In fact, if you look at it: Desktop Support (newly rechristened as Application Support) is actually getting more control while I feel like I'm getting more control!  What a great arrangement for two type-A personalities.

Skipping VDI 

I'd thought about this a bit over the last few years, but it wasn't made super clear to me until about 2 weeks ago, when I happened to visit a little-known school district in the mountains of Utah.  Davis County School District is the most advanced public school district I have ever seen.  I was blown away.  We started out talking about their applications and data center plans.  Mark Reid, the IT director, and several of his coworkers have been at the school district for the last 30 years.  It's a testament to what vision and long-standing partnerships can achieve.  From the very beginning, they've been writing their own applications to deliver IT services to the district.

Unlike many of the IT shops I work with, Davis County employs a staff of developers that churn out their own applications for the school district.  From payroll, to finance, to grades, they are doing it.  In fact, they even have an application, myDSD, that lets you log in from the web or from your iPhone or Android to check grades, report that your child will be absent from school, and do pretty much anything else you might need from a school district.  Wow.  I bet your school district doesn't have anything like that.

The applications talk to each other through different software layers and protocols, but they all come back to an Oracle RAC cluster, where all the data is consolidated and backed up.  They already have Office 365 rolled out to the students.

During our meeting, one of the people in the room asked if Davis County School District was thinking about VDI.  Before Mark could answer, I already knew:  They didn’t have legacy apps.  There was no reason to deliver a virtual desktop.  All the applications could be accessed from the web or iOS/Android clients.  You see, if you already have apps that can live anywhere, you don’t need to serve a special desktop image.

The real problem they need to solve is a way to stitch together distributed data centers and develop a plan to source workloads to different clouds.

Where VDI will always be important

VDI is, and will remain, important to many organizations.  After all, it sure beats installing and managing a bunch of desktops in a computer lab.  People still have legacy apps, and some license restrictions may require running them on a blessed desktop image.

But one area where VDI use is really growing is sharing powerful GPUs for heavy graphics applications.  As data continues to explode, visualizing it will become ever more important, and I don't see an obvious replacement or better way to do this.

Implications 

How do you think this shift, from applications centered on the desktop to cloud-enabled applications, will affect the future?  Back around 2006, when I was a remote worker at IBM, they announced we could no longer expense our internet service.  They reasoned that most homes had it anyway, and besides, it was a great way for IBM to cut costs.  HP followed, and so did others.  Soon all the tech companies started to do it.  Most companies don't pay for remote access even though a significant number of employees work from home.  (Source: my friends.)

Today most companies issue laptops to their knowledge workers.  It's great, and they refresh them every few years.  But could there come a time when employers say: you already have a device (computer, iPad, etc.), so we don't need to pay for one anymore; just use our VPN service to get your applications and you are good?  Or perhaps they would instead give us an allowance to spend on a machine.

I don't think this will happen at my employer soon, because a nice laptop is a perk that makes employees happy.  But what about universities and schools?  Would they eventually just shut down the computer labs and require all students to bring their own?  Probably not the engineering and art labs I discussed above, where they need GPUs.  But 4 years ago, my friend's kid went to a private school where every student was required to get a MacBook.  The days may not be far off.

So next time you are evaluating whether or not this is the year of the virtual desktop, first look at your strategy for delivering applications anywhere.  Perhaps resources should be diverted towards new applications or BYOD initiatives to get off the legacy applications that are tying you down.  Remember:  Nobody wants or cares about your enterprise desktop image (nobody in their right mind).  They just want applications that work and allow them to get things done.

 

Distributed Data Centers

My thoughts on cloud computing and the future of the data center have changed a bit in the last 3 years.  When I first started working on a cloud computing project for a large American bank back in 2008, I was convinced that soon every enterprise would create its own private cloud and use xCAT (or something).  Then I thought they would all use OpenStack instead.  Either way, I figured every organization would build its own private cloud.  This has not panned out.  Not even close, and it's 6 years later.

Eventually, I thought, all enterprises would migrate to a single public cloud provider; it never occurred to me that people would see fit to use more than one.  I did form a concept of the InterCloud back then, so I'm not too far off the mark.  But my vision is evolving and becoming clearer.  I finally see where IT is going.  (Or at least I think I do.)

In my small sector of the world, hardly anybody has a private cloud.  And when I say private cloud, I mean self-service portals with completely automated provisioning.  Yeah, that's just not happening.  The truth is, I don't think it will for most organizations.  There's not enough need.  In most organizations, the only people who need VMs from a self-service portal are the VMware admins themselves, and they are savvy enough to right-click and make that happen without all your bloated self-provisioning tools, thank you very much.

What I am seeing is more and more organizations going to the public cloud.  This started out as a shadow IT initiative, but more of the central IT groups I work with have in fact embraced it.  But it's managed as a one-off, and people are still trying to figure it out.  People aren't ditching their own data centers; just as they haven't ditched their mainframes, large enterprises will always keep some on-premises footprint for IT services.

The other thing that seems completely obvious now is that people will want to use more than one public cloud provider, because different public clouds specialize in different things.  For example, I might run Exchange/Office 365 on Azure but run some development applications on AWS.  Similarly, I might have a backup-as-a-service contract with SunGard.  But I may not trust my data to anyone but my own 6-node Oracle RAC cluster sitting in my very own data center.  Can you see where this leads us?

Central IT is now responsible for sourcing workloads.  The data center is distributed.  My organization’s data is all over the place.  My problem now is managing the sprawl.  Getting visibility to where the sprawl is and making sure I’m using it most effectively.

Another misconception I see is that people think using two or more public clouds means VMs move between data centers.  Today, that's pretty impractical.  Migrating VMs between data centers takes too long, even setting aside the networking challenges.  And besides, when you think that way, you are treating applications as pets in your data center instead of the cattle that future applications will be.  So forget about that right now.

Instead, focus on the real issue that needs to be solved.  And this is where I think Cisco can make big things happen.  That is:  How do you connect distributed data centers?

The Nexus 1000V InterCloud (now called InterCloud Fabric, I think) starts down this road.  It allows VMs in a public cloud to communicate with VMs in our own cloud using the same layer 2 addressing scheme.  This is pretty cool and a good start, but we'll need more.  For example, our database servers might reside in our own data center (no self-service portal here), while the apps we develop are hosted in public clouds.  The application servers will need to communicate with each other and with the database, and different applications may live in different clouds.  The real issue is how they communicate effectively, securely, and seamlessly.  That is the big problem that needs to be solved with distributed data centers.

Is this where you think we’re headed?  I feel like for the first time in five years I finally get what’s happening to IT.  So I’ll take comfort in that for now, until things change next month.

 

A few Nexus notes

I've been working with OTV and FabricPath, and I thought I'd jot down a few lessons that had me scratching my head for a while.

1.  To make sure OTV is working, the join interfaces on each side must be able to ping each other.  Let's say site A has join interface 192.168.101.1 and site B has join interface 192.168.102.1.  Before OTV can work, you need to confirm from the OTV VDC that they can ping each other.  This usually comes down to having routing set up properly between the sites.

2.  For FabricPath to work with OTV: I had my main aggregation VDC connected to the OTV VDC through an M1 interface.  This doesn't work.  Instead, connect the OTV VDC to the aggregation VDC through an F1 interface.  That way FabricPath is terminated and the traffic is handed off as Classical Ethernet.  I scratched my head for probably 4 hours last night before I thought of trying that this morning.  Lesson learned.
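A quick pre-check for the first note: if the two join interfaces are not in the same subnet, nothing will answer the ping until a route exists between the sites. This sketch uses Python's standard `ipaddress` module; the /24 masks are my assumption, since the notes only give the addresses, and it does not replace the actual ping from the OTV VDC.

```python
import ipaddress


def same_subnet(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Return True if both addresses fall in the same subnet of the given length."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{prefix_len}").network
    return net_a == net_b


# Join interfaces from note 1; the /24 prefix length is an assumption.
SITE_A_JOIN = "192.168.101.1"
SITE_B_JOIN = "192.168.102.1"

if same_subnet(SITE_A_JOIN, SITE_B_JOIN, 24):
    print("Same subnet: the join interfaces should reach each other directly")
else:
    print("Different subnets: a route between the sites is needed before OTV can work")
```

With /24 masks these two addresses land in different networks, which is why routing has to be in place first.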

Hopefully that helps someone.

QoS and Jumbo Frames on the Nexus 5500

Nexus 5548 UP

 

I've been fortunate to have two Nexus 5548UPs in my lab to help test upgrade problems for one of my customers.  It's been great to have some gear to play with and really try to understand how it all works together.

One of the issues I've run up against in the past (and you may have too) is configuring jumbo frames on network switches.  Jumbo frames let you move more bytes per frame, which is more efficient.  The default maximum transmission unit (MTU) on nearly all networks and server NICs is 1500 bytes, so if you want to send more data, you need to send more frames.  When you increase the MTU to 9000 bytes, you send fewer headers, fewer frames, and more data per frame.  The result is supposed to be lower CPU load.  The downsides are that you may end up with higher latency, and you may end up configuring a lot of endpoints, which can get really complicated.
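To put rough numbers on the fewer-frames argument, here is a back-of-the-envelope sketch in Python. It assumes plain IPv4/TCP with no options (40 bytes of IP and TCP headers per packet) and counts 18 bytes of Ethernet header plus FCS per frame, ignoring the preamble and inter-frame gap; real traffic will differ.

```python
ETHERNET_OVERHEAD = 18  # 14-byte Ethernet header + 4-byte FCS (preamble/IFG ignored)
IP_TCP_HEADERS = 40     # 20-byte IPv4 header + 20-byte TCP header, no options


def frames_needed(payload_bytes: int, mtu: int) -> int:
    """Frames required to carry payload_bytes when each IP packet is at most mtu bytes."""
    per_frame_payload = mtu - IP_TCP_HEADERS
    # Integer ceiling division: a partial last frame still counts as a frame.
    return (payload_bytes + per_frame_payload - 1) // per_frame_payload


def header_bytes(payload_bytes: int, mtu: int) -> int:
    """Total Ethernet + IP + TCP header bytes spent carrying payload_bytes."""
    return frames_needed(payload_bytes, mtu) * (ETHERNET_OVERHEAD + IP_TCP_HEADERS)


if __name__ == "__main__":
    one_gib = 1024 ** 3
    for mtu in (1500, 9000):
        print(f"MTU {mtu}: {frames_needed(one_gib, mtu)} frames, "
              f"{header_bytes(one_gib, mtu)} header bytes")
```

Under these assumptions, moving 1 GiB takes roughly one-sixth as many frames (and header bytes) at MTU 9000 as at 1500, which is where the CPU savings are supposed to come from.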

As an example of configuring jumbo frames in a data center, consider all the endpoints that have to be configured:

  • Operating system: The NIC must be set to MTU 9000.  On VMware, the vSwitch has to be set to 9000 as well, along with the VMs if they will be using jumbo frames.
  • On UCS, the vNIC has to be mapped to a jumbo frames QoS policy.
  • The ports on the network switch must have jumbo frames enabled on the uplink.
  • The ports on the storage controller must have jumbo frames enabled.
  • The storage network interfaces must have jumbo frames configured.

So you can see there is a great deal of orchestration between several teams in the data center.  Everybody has to know what they are doing.
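One way to think about that checklist: the largest frame that can cross the path end to end is bounded by the smallest MTU along it, so a single forgotten hop quietly caps everything at 1500. The hop names and values below are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical MTU settings along one storage path; names are illustrative only.
STORAGE_PATH = {
    "esxi-host-nic": 9000,
    "vswitch": 9000,
    "ucs-vnic-qos-policy": 9000,
    "nexus-switch-port": 9216,
    "storage-controller-port": 1500,  # the hop someone forgot
    "storage-interface": 9000,
}


def effective_mtu(path: dict[str, int]) -> int:
    """The largest packet that fits end to end is limited by the smallest MTU."""
    return min(path.values())


def hops_below(path: dict[str, int], target: int = 9000) -> list[str]:
    """List the hops that would block target-sized packets."""
    return [hop for hop, mtu in path.items() if mtu < target]


if __name__ == "__main__":
    print("Effective MTU:", effective_mtu(STORAGE_PATH))
    print("Fix these hops:", hops_below(STORAGE_PATH))
```

The switch is set higher (9216) than the hosts (9000) so it never becomes the bottleneck; only hops below the host MTU are flagged.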

You can test if your jumbo frames are enabled on Linux by sending a simple ping:

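ping -M do -s 8972 <destination-ip>
If that goes through without errors, congratulations!  You have jumbo frames enabled from node to node.  (The -M do option sets the Don't Fragment bit, so an oversized packet fails instead of quietly being fragmented.)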
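The 8972 in that ping isn't arbitrary: it is the 9000-byte MTU minus the 20-byte IPv4 header and the 8-byte ICMP echo header. A small Python sketch that derives the payload for any MTU and builds the same command; the destination address is a placeholder, and `-c 3` is my addition so it stops after three pings.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes in an ICMP echo request header


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented IPv4 packet of size mtu."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def jumbo_ping_command(destination: str, mtu: int = 9000) -> list[str]:
    """Linux iputils ping with Don't Fragment set (-M do), sized to fill the MTU."""
    size = ping_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-s", str(size), "-c", "3", destination]


if __name__ == "__main__":
    # 172.20.1.1 is a placeholder; use the address of the far-end node.
    print(" ".join(jumbo_ping_command("172.20.1.1")))
```

You could hand the list to `subprocess.run(cmd, check=True)`; ping exits non-zero when no replies come back, so a failure raises instead of passing silently.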

Jumbo Frames on the Nexus 5500

There is a guide on Cisco's website that covers enabling jumbo frames.  But to do it, you have to work with things like policy maps and class maps.  I've often thought: why is this so hard?  It seems like a single simple command should suffice.

The reason goes back to the standard trade-off engineers make: "Make it easy" vs. "Make it flexible."  You can't really have flexibility without more nerd knobs to turn.  Most of that flexibility is probably more applicable to application traffic.

The other thing I always wonder: why can't you just apply the MTU to the interface?  On the Nexus 5500 you don't typically set it on the interface.  But can you?  Kind of.

QoS Class Map and Policy Maps

The architecture of the Nexus 5500 is a bit different from that of Catalyst switches: buffering is done mostly on the ingress ports (the ports where traffic enters).  So the first thing you'll want to do is "tag," or classify, the traffic that comes into the port.  You do this with a class-map of type qos.  How do you identify traffic?

The most common way to match traffic is with either an IP Access List, or by the protocol.  Let’s do it:

n5k-top# conf
Enter configuration commands, one per line. End with CNTL/Z.
n5k-top(config)# class-map myTraffic
n5k-top(config-cmap-qos)# match protocol iscsi

Here we're just matching iSCSI traffic. That's one we might want to use jumbo frames for. But we could also match on IP addresses. Let's say that all hosts on the 172.20.0.0/24 network should have jumbo frames. That would make sense if this network were for storage (NFS, iSCSI, or whatever).

To do that we would use an access list:


n5k-top(config)# ip access-list jumbo-list
n5k-top(config-acl)# permit ip 172.20.0.0/24 any

Now we can put that on our QoS group:

n5k-top(config-acl)# class-map type qos myTraffic
n5k-top(config-cmap-qos)# no match protocol iscsi
n5k-top(config-cmap-qos)# match access-group name jumbo-list
n5k-top(config-cmap-qos)# sh class-map type qos myTraffic


Type qos class-maps
===================
class-map type qos match-all myTraffic
  match access-group name jumbo-list
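Conceptually, the match-all class-map with jumbo-list decides, for each incoming packet, whether it belongs to myTraffic or falls through to class-default. Here is a simplified Python model of that decision, assuming the ACL permits 172.20.0.0/24 as the text intends and looking only at the source address. The switch does this in hardware; this is just a way to reason about which hosts are covered.

```python
import ipaddress

# Models "permit ip 172.20.0.0/24 any" in jumbo-list (source address only).
JUMBO_LIST = [ipaddress.ip_network("172.20.0.0/24")]


def classify(src_ip: str) -> str:
    """Return the type qos class a packet from src_ip would be placed in."""
    addr = ipaddress.ip_address(src_ip)
    if any(addr in network for network in JUMBO_LIST):
        return "myTraffic"
    return "class-default"  # anything the ACL doesn't permit


if __name__ == "__main__":
    for host in ("172.20.0.25", "172.20.1.1", "10.1.1.1"):
        print(host, "->", classify(host))
```

Note that 172.20.1.1 lands in class-default: a /24 covers only 172.20.0.0 through 172.20.0.255, so widen the ACL if your storage hosts use a larger range.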

 

Great! Now we need to put this traffic into one of the qos-groups that we can work with. There's a lot more we can do, but for simplicity we'll just put it into qos-group 2. That's a good place for it, since this traffic could be NFS or iSCSI and may be more important than normal traffic. We need to put it in a qos-group because there are 2 other QoS types we haven't talked about yet, and those types classify traffic by qos-group number. Here are the other two:

network-qos: This is the QoS type that controls jumbo frames (mtu), multicast, pause no-drop, and other settings that determine how a packet gets through the network: its shape and how it reacts. The other two types deal more with what happens when a packet enters or leaves the switch.

queuing: This is the last type of QoS and does things like allocate how much bandwidth certain traffic gets as it goes through the network.  By default, 100 percent of the bandwidth goes to the default class.

So now we know: we have 3 types of QoS settings, and each requires a class-map (to tag the traffic) and a policy-map (what to do with the traffic when it comes in). Within each policy-map, you associate multiple class-maps with behaviors. Finally, once we have policy-maps for all three types, we assign them to the system QoS.
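It helps me to see the three types as a pipeline keyed by qos-group: the qos policy maps a class to a qos-group, and the network-qos and queuing policies then look up that group. Here is a small Python model using the values the rest of this walkthrough configures. It is a mental model only, not how NX-OS stores anything.

```python
from dataclasses import dataclass

# type qos: class-map name -> qos-group (marking on ingress)
QOS_POLICY = {"myTraffic": 2, "class-fcoe": 1, "class-default": 0}

# type network-qos: qos-group -> MTU
NETWORK_QOS_POLICY = {2: 9216, 1: 2158, 0: 1500}

# type queuing: qos-group -> bandwidth percentage
QUEUING_POLICY = {2: 50, 1: 25, 0: 25}


@dataclass(frozen=True)
class Treatment:
    qos_group: int
    mtu: int
    bandwidth_percent: int


def treatment_for(traffic_class: str) -> Treatment:
    """Follow a class through all three policy types via its qos-group."""
    group = QOS_POLICY[traffic_class]
    return Treatment(group, NETWORK_QOS_POLICY[group], QUEUING_POLICY[group])


if __name__ == "__main__":
    for name in QOS_POLICY:
        print(name, treatment_for(name))
```

The design point it illustrates: only the qos type looks at the packet itself; the other two types can only see the qos-group number, which is why the marking step has to come first.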

Let’s finish the qos type.  So far we’ve only made a class-map for it.  But now, we want a policy-map.  We may have three types of traffic:  Default traffic, myTraffic, and fcoe.  FCoE and the default traffic are there “by default”.  So let’s make a policy-map with those traffic types:

n5k-top(config-cmap-nq)# policy-map type qos myQoS-policy
n5k-top(config-pmap-qos)# class type qos myTraffic
n5k-top(config-pmap-c-qos)# set qos-group 2
n5k-top(config-pmap-c-qos)# class type qos class-fcoe
n5k-top(config-pmap-c-qos)# set qos-group 1

There, now we have three traffic lanes marked. The default class was added for us. Check it out:

n5k-top(config-pmap-c-qos)# sh policy-map type qos myQoS-policy

Type qos policy-maps
====================

policy-map type qos myQoS-policy
  class type qos myTraffic
    set qos-group 2
  class type qos class-fcoe
    set qos-group 1
  class type qos class-default
    set qos-group 0

Ok, now we want to set our jumbo frames. That means we need to create a network-qos type of QoS. First we have to mark what we want. Our only options are to classify by qos-groups for the network-qos type of QoS. So let’s create a class-map:

n5k-top(config-cmap-que)# class-map type network-qos myTraffic
n5k-top(config-cmap-nq)# match qos-group 2

Notice that I kept the same name for the network-qos class-map as for the qos class-map. This makes things a bit easier. Now we just need to create a policy-map using this class-map, as well as the fcoe class-map (which is created by default):


n5k-top(config-cmap-nq)# policy-map type network-qos myNetwork-QoS-policy
n5k-top(config-pmap-nq)# class type network-qos myTraffic
n5k-top(config-pmap-nq-c)# mtu 9216
n5k-top(config-pmap-nq-c)# class type network-qos class-fcoe
n5k-top(config-pmap-nq-c)# pause no-drop
n5k-top(config-pmap-nq-c)# mtu 2158
n5k-top(config-pmap-nq-c)# sh policy-map type network-qos myNetwork-QoS-policy

Type network-qos policy-maps
===============================

policy-map type network-qos myNetwork-QoS-policy
  class type network-qos myTraffic
    mtu 9216
  class type network-qos class-fcoe
    pause no-drop
    mtu 2158
  class type network-qos class-default
    mtu 1500
    multicast-optimize

Ok! 2 down, 1 to go: the queuing policy. We just need to split the bandwidth between these traffic classes. First we need to mark the traffic. Again, we can only use qos-groups to do that, so this time we'll create a class-map of type queuing, with the same name as the last two class-maps:


n5k-top(config-pmap-c-qos)# class-map type queuing myTraffic
n5k-top(config-cmap-que)# match qos-group 2

Now that we have that, let's associate it with a policy-map (just like the others). We don't really know what our traffic will be (maybe you do in your network), so we'll give half of the bandwidth to our jumbo frame traffic (50%) and 25% to each of the other two classes (fcoe and default).


n5k-top(config-pmap-nq-c)# policy-map type queuing myQueuing-policy
n5k-top(config-pmap-que)# class type queuing myTraffic
n5k-top(config-pmap-c-que)# bandwidth percent 50
n5k-top(config-pmap-c-que)# class type queuing class-fcoe
n5k-top(config-pmap-c-que)# bandwidth percent 25
n5k-top(config-pmap-c-que)# class type queuing class-default
n5k-top(config-pmap-c-que)# bandwidth percent 25
n5k-top(config-pmap-c-que)# sh policy-map type queuing myQueuing-policy

Type queuing policy-maps
========================

policy-map type queuing myQueuing-policy
  class type queuing myTraffic
    bandwidth percent 50
  class type queuing class-fcoe
    bandwidth percent 25
  class type queuing class-default
    bandwidth percent 25
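Before applying a queuing policy, I like to sanity-check the numbers. This Python sketch treats the plan as valid only if the percentages add up to exactly 100 (my own rule for this plan, not a model of how NX-OS validates the policy) and shows each class's share of a 10 Gb/s link when all classes are competing for it.

```python
MY_QUEUING_POLICY = {"myTraffic": 50, "class-fcoe": 25, "class-default": 25}


def check_total(policy: dict[str, int]) -> None:
    """Raise if the bandwidth percentages in the plan don't add up to 100."""
    total = sum(policy.values())
    if total != 100:
        raise ValueError(f"bandwidth percentages add up to {total}, expected 100")


def share_gbps(policy: dict[str, int], link_gbps: float = 10.0) -> dict[str, float]:
    """Bandwidth each class gets when every class is competing for the link."""
    return {name: link_gbps * pct / 100 for name, pct in policy.items()}


if __name__ == "__main__":
    check_total(MY_QUEUING_POLICY)
    print(share_gbps(MY_QUEUING_POLICY))
```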

Boom! Just like that, we now have our three QoS policies created: one of type qos called myQoS-policy, one of type network-qos called myNetwork-QoS-policy, and finally the one we just created, of type queuing, called myQueuing-policy.

What’s left? Now we just need to apply this to the system:

n5k-top(config)# system qos
n5k-top(config-sys-qos)# service-policy type qos input myQoS-policy
n5k-top(config-sys-qos)# service-policy type network-qos myNetwork-QoS-policy
n5k-top(config-sys-qos)# service-policy type queuing input myQueuing-policy
n5k-top(config-sys-qos)# service-policy type queuing output myQueuing-policy

Notice that we applied the queuing policy twice: once for input and once for output. The qos type is applied only on input, because that's where traffic comes in and is marked. The network-qos policy affects both input and output but should be the same everywhere, so we configure it only once.

Well, hopefully that wasn't too confusing.  You can obviously see the power in this.  If we had voice, video, and other types of traffic coming through here, we could add them to our policy-maps after we tag them.  QoS settings should be the same on all switches in the data center.  If you are running vPC, you'll have to make sure the peer switch has the same setup; otherwise you'll get a Type-2 consistency failure:


n5k-top(config-sys-qos)# sh vpc
Legend:
(*) - local vPC is down, forwarding via vPC peer-link

vPC domain id : 1
Peer status : peer adjacency formed ok
vPC keep-alive status : peer is alive
Configuration consistency status : success
Per-vlan consistency status : success
Type-2 consistency status : failed
Type-2 inconsistency reason : QoSMgr Network QoS configuration incompatible
vPC role : primary
Number of vPCs configured : 3
Peer Gateway : Disabled
Dual-active excluded VLANs : -
Graceful Consistency Check : Enabled
Auto-recovery status : Disabled

vPC Peer-link status
---------------------------------------------------------------------
id   Port   Status Active vlans
--   ----   ------ --------------------------------------------------
1    Po1    up     1,500

vPC status
----------------------------------------------------------------------------
id     Port   Status Consistency Reason                     Active vlans
------ ------ ------ ----------- -------------------------- ------------
10     Po10   up     success     success                    1,100,500
20     Po20   up     success     success                    1,100,500
80     Po80   up     success     success                    1,100,500

See? Nobody wants a Type-2 inconsistency. Apply the settings throughout the data center!
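The switches run this consistency check themselves, but when you're tracking down which class differs, it can help to diff the two configs side by side. A hypothetical Python sketch comparing each peer's network-qos MTU per qos-group; the snapshots are made up to reproduce the failure above.

```python
from typing import Optional

# Hypothetical snapshots: qos-group -> MTU from each peer's network-qos policy.
N5K_TOP = {0: 1500, 1: 2158, 2: 9216}
N5K_BOTTOM = {0: 1500, 1: 2158}  # peer never got the myTraffic class


def mtu_mismatches(
    local: dict[int, int], peer: dict[int, int]
) -> dict[int, tuple[Optional[int], Optional[int]]]:
    """Return qos-groups whose MTU differs; None means the group is missing on that side."""
    return {
        group: (local.get(group), peer.get(group))
        for group in sorted(set(local) | set(peer))
        if local.get(group) != peer.get(group)
    }


if __name__ == "__main__":
    for group, (top, bottom) in mtu_mismatches(N5K_TOP, N5K_BOTTOM).items():
        print(f"qos-group {group}: n5k-top={top}, n5k-bottom={bottom}")
```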

Hope this helps someone struggling with getting jumbo frames on a Nexus 5k.

Configure VMware from scratch without Windows

One of the things that (still) bugs me about vCenter is that it is very tied to the Windows operating system.  You need Windows to set it up, and doing without Windows is still somewhat difficult.  In my lab I'm trying to get away from Windows.  I have xCAT installed to PXE boot UCS blades to do what I want.  It's great, and it's automated.  But when I installed 8 nodes as ESXi hosts, I quickly realized I needed vCenter to demonstrate this and use it the way others would.

VMware has had the vCenter Server Appliance out for a few years now.  It runs on SLES and comes preconfigured.  The only problem is installing it when you have no vSphere Client, because today that client is made only for the Windows operating system.  How do you get around this?

ovftool was what did the job for me.  I found the link by reading the ever-prolific virtuallyGhetto post on deploying ovftool on ESXi.  Since I had Linux, I didn't need to install ovftool on the ESXi host.  Instead I just installed it on my Linux server (with some trouble, since the installer deploys a stub and you have to make sure you don't modify the file).

I ran the command:

ovftool -ds=NFS1 VMware-vCenter-Server-Appliance-5.0.5201-1476389_OVF10.ova vi://root:password@node01
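If you script this, watch out for special characters in the password, since the credentials are embedded in the vi:// locator. My understanding is that ovftool expects them percent-encoded, URL style; the Python sketch below assumes that and simply builds the same argument list. The password value is a placeholder.

```python
from urllib.parse import quote


def ovftool_args(ova: str, datastore: str, host: str, user: str, password: str) -> list[str]:
    """Build the ovftool command line used above, percent-encoding the credentials."""
    locator = f"vi://{quote(user, safe='')}:{quote(password, safe='')}@{host}"
    return ["ovftool", f"-ds={datastore}", ova, locator]


if __name__ == "__main__":
    args = ovftool_args(
        "VMware-vCenter-Server-Appliance-5.0.5201-1476389_OVF10.ova",
        "NFS1",
        "node01",
        "root",
        "p@ss:word",  # placeholder password containing characters that need escaping
    )
    print(args)
```

Handing the list to `subprocess.run(args, check=True)` avoids shell quoting issues as well.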

After that, I watched my DHCP server and saw that it gave the vCenter appliance the IP address 172.20.200.1.  Hopefully you have DHCP, or you might be hosed.

Then after finding the docs, I intuitively opened my web browser to https://172.20.200.1:5480. (everyone knows that port number right?) I then logged in with user ‘root’ and password ‘vmware’ and started the auto setup.  After changing the IP address and restarting the appliance I was pretty golden.
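Rather than refreshing the browser while the appliance boots, you can poll the management port. A minimal Python sketch using only the standard `socket` module; it only tells you that something is listening on the port, not that setup is ready.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def wait_for_port(host: str, port: int, attempts: int = 60, delay: float = 10.0) -> bool:
    """Poll host:port until it accepts connections or the attempts run out."""
    for _ in range(attempts):
        if port_open(host, port):
            return True
        time.sleep(delay)
    return False
```

For the appliance in this example, `wait_for_port("172.20.200.1", 5480)` would keep checking for up to about ten minutes with the defaults.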

Once it's configured, log into the appliance at https://172.20.1.101:9443/vsphere-client/ and then be stoked that you already have Flash Player installed and working.  Oh, you didn't have Flash Player installed on your Linux server?  That sucks; I didn't either.  Guess that's another hoop to jump through.  But wait: then you find that Flash 11.2.0 is the last Flash Player released for Linux.  Guess what?  VMware requires Flash version 11.5.  Nice.

https://communities.vmware.com/message/2319263

At this point I just copied a Windows VM I had lying around and started managing everything from there.  The moral of the story is that you can't run a Windows-free VMware environment.  Sure, I could have done fancy scripting and managed it all remotely with some of their tools, but if I'm going to be doing all that, why should I pay for VMware?  I'd be better off just running native KVM.  YMMV.

FCoE with UCS C-Series

I have a C210 in my lab that I want to turn into an FCoE storage target.  I'll write more on that in another post.  The first challenge was getting it up with FCoE.  It's attached to a pair of Nexus 5548s.  I installed Red Hat Enterprise Linux 6.5 on the C210 and booted it up.  The big issue was that even though RHEL 6.5 ships with the fnic and enic drivers, FCoE never came up.  It wasn't until I installed the updated drivers from Cisco that I finally saw a FLOGI.  But there were other tricks needed to make the C210 actually work with FCoE.

C210 CIMC

Start in the CIMC (with the machine powered on) and configure the vHBAs. From the GUI, go to:

Server -> Inventory

Then in the work pane, open the 'Network Adapters' tab and select vHBAs down below.  Here you will see two vHBAs by default.  From here you have to set the VLAN that each vHBA will use.  Click 'Properties' on the interface and select the VLAN.  I set the MAC address to 'AUTO' based on a TAC case I looked at, but this never persisted.  Then I entered the VLAN: VLAN 10 for the first interface and VLAN 20 for the second.  VLAN 10 matches the FCoE VLAN and VSAN 10 that I created on the first Nexus 5548.  On the other Nexus I created VLAN 20 to match FCoE VLAN 20 and VSAN 20.

This then seemed to require a reboot of the Linux Server for the VLANs to take effect.  In hindsight this is something I probably should have done first.

RedHat Linux 6.5

This needs the Cisco fnic drivers.  You might want to install the enic drivers as well.  I got these from cisco.com.  I used the B-Series drivers, and I had to download the entire 1.2GB bundle just to get a 656KB driver package.  I installed the kmod-fnic-1.6.0.6-1 RPM.  I had a customer who had updated to a later kernel; he had to install the kernel-devel RPM and recompile the driver.  After that, it worked for him.

With the C210 I wanted to bond the 10Gb NICs into a vPC.  So I did an LACP bond with Linux.  This was done as follows:

Created file: /etc/modprobe.d/bond.conf

alias bond0 bonding
options bonding mode=4 miimon=100 lacp_rate=1

Created file: /etc/sysconfig/network-scripts/ifcfg-bond0

DEVICE=bond0
IPADDR=172.20.1.1
ONBOOT=yes
NETMASK=255.255.0.0
STARTMODE=onboot
MTU=9000

Edited the /etc/sysconfig/network-scripts/ifcfg-eth2

DEVICE=eth2
MASTER=bond0
SLAVE=yes
HWADDR=58:8D:09:0F:14:BE
TYPE=Ethernet
UUID=8bde8c1f-926f-4960-87ff-c0973f5ef921
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none

Edited the /etc/sysconfig/network-scripts/ifcfg-eth3

DEVICE=eth3
MASTER=bond0
SLAVE=yes
HWADDR=58:8D:09:0F:14:BF
TYPE=Ethernet
UUID=6e2e7493-c1a1-4164-9215-04f0584b338c
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
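With more than a couple of NICs, it's easy to typo one of these files. A Python sketch that generates slave ifcfg contents like the ones above from a device-to-MAC mapping; it leaves out the optional UUID line and writes to whatever directory you pass, so you can review the output before copying it into /etc/sysconfig/network-scripts.

```python
from pathlib import Path


def slave_config(device: str, hwaddr: str, master: str = "bond0") -> str:
    """Return ifcfg contents that enslave device to the given bond."""
    lines = [
        f"DEVICE={device}",
        f"MASTER={master}",
        "SLAVE=yes",
        f"HWADDR={hwaddr}",
        "TYPE=Ethernet",
        "ONBOOT=yes",
        "NM_CONTROLLED=no",
        "BOOTPROTO=none",
    ]
    return "\n".join(lines) + "\n"


def write_slave_configs(nics: dict[str, str], directory: Path) -> list[Path]:
    """Write one ifcfg-<device> file per NIC into directory and return the paths."""
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for device, hwaddr in nics.items():
        path = directory / f"ifcfg-{device}"
        path.write_text(slave_config(device, hwaddr))
        written.append(path)
    return written


if __name__ == "__main__":
    import tempfile

    # MAC addresses from the eth2/eth3 files above; output goes to a temp dir.
    nics = {"eth2": "58:8D:09:0F:14:BE", "eth3": "58:8D:09:0F:14:BF"}
    for path in write_slave_configs(nics, Path(tempfile.mkdtemp())):
        print(path)
        print(path.read_text())
```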

Next restart the network and you should have a bond. You may need to restart this after you configure the Nexus 5548 side.

service network restart

Nexus 5548 Top
Log in and create the vPCs and stuff.  Also, don't forget the jumbo frame system class (MTU 9216 on the switch, which leaves room for the 9000-byte MTU on the hosts).  I use this for jumbo frames in the data center.

policy-map type network-qos jumbo
  class type network-qos class-default
    mtu 9216
    multicast-optimize
system qos
  service-policy type network-qos jumbo

One thing that drives me crazy is that you can’t do sh int po 4 to see that the MTU is 9000. From the documents, you have to do

sh queuing int po 4

to see that your jumbo frames are enabled.

The C210 is attached to ethernet port 1 on each of the switches.  Here’s the Ethernet configuration:

The ethernet:

interface Ethernet1/1
  switchport mode trunk
  switchport trunk allowed vlan 1,10
  spanning-tree port type edge trunk
  channel-group 4

The port channel:

interface port-channel4
  switchport mode trunk
  switchport trunk allowed vlan 1,10
  speed 10000
  vpc 4

As you can see, VLAN 10 carries the FCoE traffic, so we need to create VSAN 10 and map VLAN 10 to it.

feature fcoe
vsan database
  vsan 10
vlan 10
  fcoe vsan 10

Finally, we need to create the vfc for the interface:

interface vfc1
  bind interface Ethernet1/1
  switchport description Connection to NFS server FCoE
  no shutdown
vsan database
  vsan 10 interface vfc1

Nexus 5548 Bottom
The other Nexus has a similar configuration. The difference is that instead of VSAN 10 and VLAN 10, we use VSAN 20 and VLAN 20 and bind the FCoE to VSAN 20. In the SAN world, we don’t cross the streams. That means the VLANs are not the same on the two switches.
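Since the two switches differ only in their VSAN/VLAN number, the FCoE pieces can be generated from one template. Here is a minimal Python sketch, assuming the same Ethernet1/1, port-channel4 and vfc1 names on both switches and the convention used in this setup that the FCoE VLAN number equals the VSAN number. The switch names N5k-top and N5k-bottom and the function names are illustrative. It refuses to render two fabrics that share a VSAN, which is the "don't cross the streams" rule expressed in code.

```python
def fcoe_fabric_config(vsan_id, eth_port="Ethernet1/1", port_channel=4, vfc="vfc1",
                       description="Connection to NFS server FCoE"):
    """Render the FCoE-related NX-OS lines for one fabric (one switch)."""
    vlan_id = vsan_id  # convention used here: FCoE VLAN number == VSAN number
    lines = [
        "feature fcoe",
        f"interface {eth_port}",
        f"  switchport trunk allowed vlan 1,{vlan_id}",
        f"interface port-channel{port_channel}",
        f"  switchport trunk allowed vlan 1,{vlan_id}",
        "vsan database",
        f"  vsan {vsan_id}",
        f"vlan {vlan_id}",
        f"  fcoe vsan {vsan_id}",
        f"interface {vfc}",
        f"  bind interface {eth_port}",
        f"  switchport description {description}",
        "  no shutdown",
        "vsan database",
        f"  vsan {vsan_id} interface {vfc}",
    ]
    return "\n".join(lines)


def render_fabrics(fabrics):
    """fabrics: switch name -> VSAN id. Each fabric must use its own VSAN/VLAN."""
    vsans = list(fabrics.values())
    if len(set(vsans)) != len(vsans):
        raise ValueError("two fabrics share a VSAN/VLAN: don't cross the streams")
    return {switch: fcoe_fabric_config(vsan) for switch, vsan in fabrics.items()}
```

For example, `render_fabrics({"N5k-top": 10, "N5k-bottom": 20})` returns one configuration per switch, ready to review and paste.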

Notice in the output below that neither VLAN 10 nor VLAN 20 is carried across the peer link, so you’ll only see VLAN 1 allowed on the vPC:

N5k-bottom# sh vpc consistency-parameters interface po 4

Legend:
    Type 1 : vPC will be suspended in case of mismatch

Name                        Type  Local Value            Peer Value
-------------               ----  ---------------------- -----------------------
Shut Lan                    1     No                     No
STP Port Type               1     Default                Default
STP Port Guard              1     None                   None
STP MST Simulate PVST       1     Default                Default
mode                        1     on                     on
Speed                       1     10 Gb/s                10 Gb/s
Duplex                      1     full                   full
Port Mode                   1     trunk                  trunk
Native Vlan                 1     1                      1
MTU                         1     1500                   1500
Admin port mode             1
lag-id                      1
vPC card type               1     Empty                  Empty
Allowed VLANs               -     1                      1
Local suspended VLANs       -     -                      -

But on each individual switch, you’ll see that the storage VLAN is active on the vPC. Here, VLAN 10 is carrying the storage traffic.

# sh vpc 4

vPC status
----------------------------------------------------------------------------
id     Port        Status Consistency Reason                     Active vlans
------ ----------- ------ ----------- -------------------------- ------------
4      Po4         up     success     success                    1,10

Success?

How do you know you succeeded?

N5k-bottom# sh flogi database
--------------------------------------------------------------------------------
INTERFACE        VSAN    FCID           PORT NAME               NODE NAME
--------------------------------------------------------------------------------
vfc1             10      0x2d0000       20:00:58:8d:09:0f:14:c1 10:00:58:8d:09:0f:14:c1

Total number of flogi = 1.

You should see the login. If not, try restarting the interface on the Linux side. You should see a different WWPN on each Nexus. Another possible issue is mismatched VLANs, so make sure each switch is cabled to the right port on the right server.
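With two fabrics, the checks in that paragraph are easy to script: each switch should show a login, and the WWPNs (and VSANs) should differ between them. Here is a Python sketch that parses the `sh flogi database` layout shown above. The column handling is an assumption based on that one sample, the function names are hypothetical, and how you collect the output from each switch (SSH, copy and paste) is left open.

```python
def parse_flogi(text):
    """Parse 'show flogi database' output into a list of login dicts."""
    logins = []
    for line in text.splitlines():
        fields = line.split()
        # Login rows have five columns and start with an interface like vfc1 or fc2/1.
        if len(fields) == 5 and fields[0].startswith(("vfc", "fc")):
            interface, vsan, fcid, port_name, node_name = fields
            logins.append({
                "interface": interface,
                "vsan": int(vsan),
                "fcid": fcid,
                "port_name": port_name,
                "node_name": node_name,
            })
    return logins


def check_fabrics(top_output, bottom_output):
    """Return a list of problems; an empty list means both fabrics look right."""
    top, bottom = parse_flogi(top_output), parse_flogi(bottom_output)
    problems = []
    if not top:
        problems.append("no FLOGI on the top switch")
    if not bottom:
        problems.append("no FLOGI on the bottom switch")
    shared = {x["port_name"] for x in top} & {x["port_name"] for x in bottom}
    if shared:
        problems.append(f"same WWPN logged in on both fabrics: {sorted(shared)}")
    if {x["vsan"] for x in top} & {x["vsan"] for x in bottom}:
        problems.append("both switches report the same VSAN")
    return problems
```

Parsing by whitespace-separated columns keeps the sketch short; if your NX-OS release prints extra columns or wraps long rows, adjust the five-field check accordingly.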

Let me know how it worked for you!

ACI Tech Specs

Cisco’s new Application Centric Infrastructure play and the introduction of the Nexus 9000 seem to have turned all this SDN talk in a new direction. Remember Novell NetWare? Before operating systems came with networking, you would buy NetWare software so that your PCs could talk on the network. But soon operating systems started including the network stack in the core operating system. So instead of buying Microsoft Windows 3.1 and NetWare, you just bought Windows 95 and had all the networking built in.

That’s sort of what’s happened with the Nexus 9000. Instead of buying a network switch and then buying an SDN component, you just buy the Nexus 9000 and it comes with SDN-like capability and so much more! I’m pretty happy with what I’ve seen of the Nexus 9000 and what it promises to deliver. It’s still a ways off. The Nexus 9000s today run in “stand alone” mode, which means the whole SDN portion of it is not there yet. However, it’s still a very cool platform, and when the SDN portion arrives, I’m hoping it will make deploying complex networks intuitive and simple.

But as cool as the Nexus 9000 series is, that’s not the point of this blog. The point is to talk about a new iOS app that I have just submitted to Apple for evaluation. It’s called ACI Tech Specs.

Here are a few things about this app. It’s a lot like UCS Tech Specs in terms of what it does, but it’s written from the ground up for iOS 7. It’s the most complete software project I’ve ever done in my spare time, and it uses modern programming techniques and libraries throughout. In fact, this will be the basis of the next iteration of UCS Tech Specs (which I’ll try to get out by the end of January). I’m hoping it will be even more useful than any of my other projects. It’s more flexible and more responsive, and it looks better and cleaner than anywhere else you can get this information.

Why You’ll Love it

The size is tiny compared to UCS Tech Specs. You’ll be able to download it over your mobile connection instead of needing Wi-Fi. That was the trouble with UCS Tech Specs: the app was too big, because it had tons of pictures bundled in. This one bundles no pictures; they’re downloaded on demand.

Another reason you’ll love it is that changes take effect instantly. No more updating your library by going to the obscure ‘i’ button and clicking ‘update library’. The app checks for updates every time you open a page; it’s demand driven. This way, if you email me telling me I forgot something, misspelled something, or that you’d really like more information on a certain part of the products, I can go to my back end and make the change, and it will just show up.

If you are offline (e.g., in airplane mode) and have already visited a page, the data will all be there. The data is stored on the phone (including pictures), but it is downloaded as you move through the app.
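The update-on-open, cache-for-offline behavior described above is a read-through cache with an offline fallback. The app itself is an iOS client, so the following is only a Python sketch of the idea, not the app's code: the URL, the cache directory and the `fetch_page` name are hypothetical. It tries the server first, saves whatever it gets, and falls back to the saved copy when the network is unavailable. A page that was never fetched comes back as None, which corresponds to the blank page mentioned under "Why you might not like it".

```python
import hashlib
import json
import os
import urllib.request


def fetch_page(url, cache_dir, opener=urllib.request.urlopen, timeout=5):
    """Return a page's JSON, preferring the server and falling back to the cache.

    - Online: fetch, save a copy under cache_dir, return the fresh data.
    - Offline, page seen before: return the saved copy.
    - Offline, page never seen: return None (the "blank page" case).
    """
    os.makedirs(cache_dir, exist_ok=True)
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".json"
    cache_file = os.path.join(cache_dir, name)
    try:
        with opener(url, timeout=timeout) as resp:
            data = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError and socket timeouts are OSErrors; bad JSON is a ValueError.
        if os.path.exists(cache_file):
            with open(cache_file, encoding="utf-8") as f:
                return json.load(f)
        return None
    with open(cache_file, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data
```

The `opener` parameter exists only so the network call can be swapped out, for example in tests; in normal use you would call `fetch_page(url, cache_dir)` and let it use `urllib.request.urlopen`.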

Why you might not like it

I’m trying to get better at figuring out how the app is being used. As such, I am tracking you, but not NSA style. When you install my app, it puts a random ID inside your phone’s app folder. That random ID allows me to uniquely identify your device (not you, nothing else). I’ll also capture what type of device you have and store that. I won’t be storing anything like names, passwords, etc. I do this because I want to know how many unique devices are using the app and what types of devices they are. If I find out more people are using iPads than iPhones, then I’ll redouble my efforts on the iPad. (This first release is iPhone only.) So hopefully that doesn’t make you too nervous. If the data were ever lost or stolen, all anyone would have are random strings and model types. They’d know nothing else about you.
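Here is a minimal Python sketch of that kind of anonymous install ID, under the assumption that a random UUID (version 4) is generated once and stored locally. The file path, function names and payload fields are hypothetical; the real app keeps its ID in its own folder on the phone. Because the ID is random rather than derived from hardware or account data, it identifies an install, not a person.

```python
import json
import os
import uuid


def load_or_create_device_id(path):
    """Return this install's ID, creating a random one on first use.

    uuid4() is purely random, so the ID says nothing about the person or device
    beyond "this is the same install as last time".
    """
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read().strip()
    device_id = str(uuid.uuid4())
    with open(path, "w", encoding="utf-8") as f:
        f.write(device_id)
    return device_id


def usage_record(device_id, model):
    """The whole payload: the random ID and the device model, nothing else."""
    return json.dumps({"device_id": device_id, "model": model})
```

Counting distinct `device_id` values on the server gives the number of unique installs, and grouping by `model` gives the iPhone versus iPad split.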

You also might not like that an Internet connection is required to get the app going. The first time you open each page, you’ll find that it updates. If you’re on an airplane and you haven’t opened a certain page before, that page will appear blank until you open it again with an Internet connection (cellular or Wi-Fi). I made this choice because I wanted more people to be able to download it without Wi-Fi (so the download wasn’t so huge), and because I’m almost always connected anyway and most of my users fit that same profile. (But maybe people will hate it; we’ll see.) I did spend a lot of time on caching the data and synchronizing seamlessly, so any updates I make on the core server will show up in the app.

I’m also curious to see what type of scaling problems I run into. I’ve got one server on the back end running a Ruby on Rails application that serves up the JSON consumed by the iOS app. If that server goes down, nobody will get updates. So if you open the app and are staring at a blank page, let me know and I’ll see what’s going on with my server. I think I’m really going to have to scale this out, and that may be the next big task I tackle. It’s running on a friend’s server at XMission, and I may need to migrate to AWS or something similar if I run into scaling issues.

What’s Next?

The back end still needs a lot more data. I’ll be adding more information about the platforms as they become available. The nice thing is that in my role, it’s my job to be up to date on the latest Nexus 9000 products, so expect this app to stay pretty much current, along with the next UCS Tech Specs when it comes out.

Also, there will be a native Android application that I’ll release on Google Play, hopefully by mid-2014 at the latest. This will be the first Android application I’ve written, so thanks for hanging in there. That is part of the reason I spent so much time on the back end. In fact, of the whole development process, from creating the back end to writing the iOS client, I spent 70% of the time on the back end. The iOS client only took about 2 months to write, whereas I spent all summer on the back end and had to keep modifying it as I wrote the app.

Finally, I would like to add a live Twitter feed to show all ACI-related posts and make the app more interactive, but I think the rest of 2014 will be focused on Android, scaling, and seeing if this thing floats. I hope you love it! If not, let me know what sucks: vallard@benincosa.com. I’m all ears.

Changing UCS IP addresses

I have a UCS lab machine that I sometimes take to different locations for proof-of-concept work. One of the things I regularly have to do is change its IP addresses and hostname. Here’s how you do it on the command line:

KCTest-A# scope fabric-interconnect a
KCTest-A /fabric-interconnect # set out-of-band ip 10.1.1.23 netmask 255.255.255.0 gw 10.1.1.1
Warning: When committed, this change may disconnect the current CLI session
KCTest-A /fabric-interconnect* # scope fabric-interconnect b
KCTest-A /fabric-interconnect* # set out-of-band ip 10.1.1.24 netmask 255.255.255.0 gw 10.1.1.1
Warning: When committed, this change may disconnect the current CLI session
KCTest-A /fabric-interconnect* # scope system
KCTest-A /system* # set virtual-ip 10.1.1.25
KCTest-A /system* # set name ccielab
KCTest-A /system* # commit-buffer


It’s great because you can change the out-of-band IP address of each fabric interconnect, the virtual (cluster) IP, and the system name in one shot.
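Because the lab moves around, the only things that change from site to site are the addresses and the name. A small Python helper can turn them into the same command sequence as the session above and catch obvious mistakes (such as an address outside the gateway's subnet) before you paste it into the CLI. This is a hypothetical convenience, not a UCS tool; the function name and the default /24 netmask are assumptions.

```python
import ipaddress


def ucs_readdress_commands(fi_a, fi_b, vip, gateway, name, netmask="255.255.255.0"):
    """Build UCS Manager CLI lines that re-address both fabric interconnects,
    set the cluster virtual IP, and rename the system, in one commit."""
    subnet = ipaddress.IPv4Network(f"{gateway}/{netmask}", strict=False)
    addresses = [fi_a, fi_b, vip, gateway]
    if len(set(addresses)) != len(addresses):
        raise ValueError("FI A, FI B, the virtual IP and the gateway must all differ")
    for addr in (fi_a, fi_b, vip):
        if ipaddress.IPv4Address(addr) not in subnet:
            raise ValueError(f"{addr} is not in {subnet}")
    return [
        "scope fabric-interconnect a",
        f"set out-of-band ip {fi_a} netmask {netmask} gw {gateway}",
        "scope fabric-interconnect b",
        f"set out-of-band ip {fi_b} netmask {netmask} gw {gateway}",
        "scope system",
        f"set virtual-ip {vip}",
        f"set name {name}",
        "commit-buffer",
    ]
```

Printing `"\n".join(ucs_readdress_commands("10.1.1.23", "10.1.1.24", "10.1.1.25", "10.1.1.1", "ccielab"))` reproduces the commands from the session above, minus the prompts and warnings.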

Source of docs


1000v in and out of vCenter

While I was setting up the Nexus 1110 (aka the Virtual Services Appliance, or VSA) with one of our best customers, the appliance rebooted and never came back up until we completely reinstalled the firmware from remote media. Most of this was probably my fault because I didn’t follow the docs exactly, and I think we can now move forward. But it made me realize I hadn’t written down an important procedure: how to reconnect to an orphaned 1000v from a new virtual supervisor module (VSM).
Here’s the situation:  When you lose the 1000v that is connecting into vCenter, there is no way to remove the virtual distributed switch (VDS or DVS) that the 1000v presented to vCenter.  You can remove hosts from the DVS but you can’t get rid of that switch.
In the above picture, there is my DVS.  If I try to remove it, I get the following error:
In my case, I didn’t want to get rid of it, I just wanted to reconnect a new VSM that I created with the same name.  But this operation can be used to remove the 1000v DVS from vCenter as well.
So here’s how you do it:
Adopt an Orphaned Nexus 1000v DVS
Step 1. Install a VSM
I usually install mine manually so that it doesn’t try to register with vCenter or one of the hosts. Don’t do any configuration other than an IP address; just get it to where you can log in. Once you can log in, if you did create an SVS connection, you’ll need to disconnect it. In mine, I made an SVS connection and called it vcenter. To disconnect from vCenter and erase the SVS connection, run:
# config
# svs connection vcenter
# no connect
# exit
# no svs connection vcenter
Trivia: What does SVS stand for? “Service Virtual Switch”
Step 2. Change the hostname to match what is in vCenter
Looking at the error picture above, you can see there is a folder named nexus1000v with a DVS named nexus1000v. To make vCenter think that this new 1000v is the same one, we need to change the hostname to match what is in vCenter:
nexus1000v-a# conf
nexus1000v-a(config)# hostname nexus1000v
nexus1000v(config)#
Step 3. Build the SVS Connection
Since we destroyed (or never built) the SVS connection in Step 1, we’ll need to build one and try to connect. The SVS connection should have the same name as the one you created when you first made your SVS. So if you called your SVS ‘vCenter’, ‘VCENTER’, or ‘VMware’, you’ll need to use that same name. I named mine ‘vcenter’, so that’s what I use. Similarly, the datacenter-name has to be the same as what you had before.
nexus1000v(config)# svs connection vcenter
nexus1000v(config-svs-conn)# remote ip address 10.93.234.91 port 80
nexus1000v(config-svs-conn)# vmware dvs datacenter-name Lucky Lab
nexus1000v(config-svs-conn)# protocol vmware-vim
nexus1000v(config-svs-conn)# max-ports 8192
nexus1000v(config-svs-conn)# admin user n1kUser
nexus1000v(config-svs-conn)# connect
ERROR:  [VMware vCenter Server 5.0.0 build-455964] Cannot create a VDS of extension key Cisco_Nexus_1000V_1169242977 that is different than that of the login user session Cisco_Nexus_1000V_125266846. The extension key of the vSphere Distributed Switch (dvsExtensionKey) is not the same as the login session’s extension key (sessionExtensionKey)..
Notice that when I tried to connect, I got an error. This is because the extension key in my new Nexus 1000v (generated when it was installed) doesn’t match the old one’s. The nice thing is that I can change it, and that is how I make this new 1000v take over the old one.

Step 4.  Change the extension key to match what is in vCenter.
To see the current (offending) extension key, run the following command:
nexus1000v(config-svs-conn)# show vmware vc extension-key
Extension ID: Cisco_Nexus_1000V_125266846
That is the one we need to change. The extension key that vCenter wants is in the error message from the previous step: ‘Cisco_Nexus_1000V_1169242977’. So we need to make the extension key on the 1000v match that. No problem:
nexus1000v(config-svs-conn)# no connect
nexus1000v(config-svs-conn)# exit
nexus1000v(config)# no svs connection vcenter
nexus1000v(config)# vmware vc extension-key Cisco_Nexus_1000V_1169242977
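Copying the key out of the error by hand works, but if you do this often, a small helper can pull both keys out of the vCenter error text and produce the Step 4 commands. Here is a Python sketch, assuming the error wording matches the vCenter 5.0 message shown above (other versions may phrase it differently). The function name and the default SVS connection name vcenter are illustrative.

```python
import re

# Matches the vCenter 5.0 wording shown above; other versions may differ.
_KEY_MISMATCH = re.compile(
    r"extension key (?P<wanted>\w+) that is different than that of the "
    r"login user session (?P<current>\w+)"
)


def extension_key_fix(error_text, svs_name="vcenter"):
    """Return (wanted_key, current_key, commands) for the extension-key mismatch.

    The commands start in (config-svs-conn) mode, as in Step 4.
    """
    match = _KEY_MISMATCH.search(error_text)
    if match is None:
        raise ValueError("not an extension-key mismatch error")
    wanted, current = match.group("wanted"), match.group("current")
    commands = [
        "no connect",
        "exit",
        f"no svs connection {svs_name}",
        f"vmware vc extension-key {wanted}",
    ]
    return wanted, current, commands
```

The first key in the message is the one vCenter expects (the old DVS's key); the second is the new VSM's own key, which should match what `show vmware vc extension-key` reports.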

Now rebuild the SVS connection as in Step 3 and connect again. This time it should succeed, and things run as before.

Step 5. (Optional) Remove the 1000v

If you’re just trying to remove the 1000v because you had that orphaned one sitting around, go into the SVS connection, make sure it is connected to vCenter, and remove the DVS:

nexus1000v(config)# svs connection vcenter
nexus1000v(config-svs-conn)# no connect
nexus1000v(config-svs-conn)# connect
nexus1000v(config-svs-conn)# no vmware dvs
This will remove the DVS from the vCenter Server and any associated port-groups. Do you really want to proceed(yes/no)? [yes] yes

Now the orphaned Nexus 1000v is gone. If you also want to remove it from your vCenter plugins, you will have to use the managed object browser and remove the extension key. Not a big deal. Open a web browser to the host that runs vCenter (e.g., http://10.93.234.91) and choose “Browse objects managed by vSphere”. From there go to “content” and then “Extension Manager”. To unregister the 1000v plugin, select “UnregisterExtension” and enter the vCenter extension key. This is the same extension key you used in Step 4 (in our example, Cisco_Nexus_1000V_1169242977).
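The same unregistration can be scripted against the vSphere API, where the managed object browser's Extension Manager corresponds to the ExtensionManager object and its FindExtension and UnregisterExtension methods. Below is a sketch using pyVmomi (VMware's Python SDK), assuming it is installed and that you're in a lab where skipping certificate verification is acceptable. The helper names are hypothetical, and the helper that does the work takes the ServiceContent object so it can be exercised without a live vCenter.

```python
import ssl


def unregister_extension(content, key):
    """Unregister a vCenter extension (plugin) by key.

    content: a vSphere ServiceContent object (si.RetrieveContent() in pyVmomi).
    Returns True if the key was registered and has been removed, else False.
    """
    ext_mgr = content.extensionManager
    if ext_mgr.FindExtension(key) is None:
        return False
    ext_mgr.UnregisterExtension(key)
    return True


def unregister_on_vcenter(host, user, pwd, key):
    """Connect to vCenter, remove the extension, and disconnect."""
    # Imported here so unregister_extension() can be used without pyVmomi installed.
    from pyVim.connect import SmartConnect, Disconnect

    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # lab only: accept a self-signed certificate
    ctx.verify_mode = ssl.CERT_NONE
    si = SmartConnect(host=host, user=user, pwd=pwd, sslContext=ctx)
    try:
        return unregister_extension(si.RetrieveContent(), key)
    finally:
        Disconnect(si)
```

Usage would look like `unregister_on_vcenter("10.93.234.91", "<user>", "<password>", "Cisco_Nexus_1000V_1169242977")`, with your own credentials. Checking FindExtension first makes the call safe to repeat.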

Hope that helps!

Cloud Computing: How Do I Get There?

This post comes from a talk that I’ll be presenting at the Pacific Northwest Digital Government Summit Conference on October 2nd, 2013.

History shows us that those who embrace technology and change survive, while those who resist and stick with “business as usual” get left behind. If we have the technology and we don’t use it to make IT look like magic, then we’re probably doing it wrong. (Read “The Innovator’s Dilemma” and Clarke’s Third Law.)

I’ll be talking mainly about private cloud today, but many of these ideas can be taken into the public cloud as well.

Optimizing ROI on your Technology

My friend tells a story about when his wife first started using an iPhone.  To get directions on a map she’d open up Safari and go to http://maps.google.com.  To check Facebook she would open Safari and go to http://facebook.com.  To check her mail she’d open up Safari again and navigate to http://gmail.com.  You get the idea.

She was still getting great use of her iPhone.  She could now do things she could never do before.  But there was a big part she was missing out on.  She wasn’t using the App ecosystem that makes all of these things easier and delivers a richer experience.

Today, most organizations have virtualization in the data center. Because of this, IT is able to do things it has never been able to do before. They’re shrinking their server footprints to once unimaginable levels, saving money in capital and management costs. I’ve been in many data centers where people proudly point to where rows of racks have been consolidated into one UCS domain with only a few blades. It’s pretty cool and very impressive.

But they’re missing something as big as the App Store.  They’re missing out on the APIs.  This is where ROI is not being optimized in the data center in a big way.

IT is shifting (or has shifted) to a DevOps model. DevOps means that your IT infrastructure team is more tightly aligned with your developers/application people.  This is a management perspective.  But from a trenches perspective, the operations team is now turning into programmers.  Programmers of the data center.  The guy that manages the virtual environment, the guy who adds VLANs to switches, or the guy who creates another storage LUN: they’re all being told to automate and program what they do.

The group now treats the IT infrastructure like an application that is constantly adding features and doing bug fixes.

The programming of the IT infrastructure isn’t done in compiled languages like Java, C, or C++. It’s done in interpreted languages like Python, Ruby, Bash, PowerShell, etc. But the languages alone don’t get you there. You need a framework. This is where things like Puppet or Chef come into play. In fact, you can even look at it as programming a data center operating system, and OpenStack provides a framework for developing that data center operating system. It’s analogous to the web application development world: Twitter was originally developed in Ruby using a framework called Ruby on Rails. (Twitter has since moved off Ruby on Rails.)

Making this shift gives you unprecedented speed, agility, and standardization. Those that don’t do it will find their constituents looking elsewhere for IT services that can be delivered faster and cheaper.

The IT assembly line

It’s hard for people to think of their IT professionals as assembly line workers. After all, they are doing complex things like installing servers, configuring networks, and updating firmware. These are CCIEs, VCPs, and storage gurus. But that’s actually what people in the trenches are: workers on the virtual assembly line. IT managers should look at the way work enters the assembly line, understand the bottlenecks, and track how long it takes to get things through the line. Naturally, exceptions crop up. But for the most part, the work required to deliver applications to the business consists of repetitive tasks. They’re just complicated, multi-step, repetitive tasks.

To start with, we need to look at the common requests that come in: creating new servers, deploying new applications, delivering a new test environment. Whatever it is, management really needs to understand how it gets done, and look at it like the manufacturing foreman sitting above the plant, watching a physical product make its way through. Observe which processes are in place, where they are being sidestepped, and where they don’t exist at all.

As an example, consider all the steps required to deploy a server.  It may look something like the flowchart below:

That sure looks like an assembly line to me.  If you can view work that enters the infrastructure like an assembly line, you can start measuring how long it takes for certain activities to get done.  Then you can figure out ways to optimize.
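As a concrete (and deliberately simple) example of that measurement, suppose each ticket records when each stage started and finished. The Python sketch below averages the hours spent per stage and names the slowest one. The record layout and stage names are hypothetical; in practice these timestamps would come from whatever ticketing or tracking system you use.

```python
from collections import defaultdict
from datetime import datetime


def stage_durations(events):
    """Average hours spent in each stage.

    events: iterable of (ticket_id, stage, start_iso, end_iso) tuples, one per
    stage a ticket passed through.
    """
    total_hours = defaultdict(float)
    count = defaultdict(int)
    for _ticket, stage, start, end in events:
        elapsed = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        total_hours[stage] += elapsed.total_seconds() / 3600
        count[stage] += 1
    return {stage: total_hours[stage] / count[stage] for stage in total_hours}


def bottleneck(avg_hours):
    """The stage with the longest average time: the first place to optimize."""
    return max(avg_hours, key=avg_hours.get)
```

Measuring elapsed (wall-clock) time per stage rather than hands-on effort is deliberate: time spent waiting in someone's queue is often where the bottleneck hides.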

Standardization of the Infrastructure

Manufacturing lines optimize throughput by standardizing processes and equipment. When I hear VMware tell everybody that “the hardware doesn’t matter”, I take exception. It matters. A lot. Just like your virtualization software matters. Cisco and other hardware vendors come at it from the opposite direction and say “the hypervisor doesn’t matter, we’ll support them all”. What all parties are really telling you is that they want you to standardize on them. All parties are trying to prove their value in a private cloud situation.

What an organization will standardize on depends on a lot of things: budget, the skill set of its admins, relationships with vendors and consultants, etc. In short, any discussion of the holy trinity of the data center (servers, storage, and networking) usually turns religious.

But whatever you do, the infrastructure needs to be robust. This is why converged infrastructures like Vblocks, FlexPods, and other reference architectures have become popular. The “One-Piece-At-A-Time” accidental, cobbled-together architecture is not a good play.

Consider the analogy that a virtualized workload is cargo on a semi truck. Do you want that truck running over a solid six-lane government highway like I-5, or do you want that cargo traveling at 60 mph over a rickety bridge?

This?

Or This?

Similarly, if your virtualization team doesn’t have strong Linux skills, you probably don’t want them running OpenStack on KVM. That’s why VMware and Hyper-V are so popular: they’re a lot easier for most people’s skill levels.

What to Standardize On?

While the choice of infrastructure standardization is a religious one, there are role models we can look to when deciding. Start out by looking at the big boys, or the people you aspire to be when you grow up. Who are the big boys running world-class IT-as-a-service infrastructure? AWS, Rackspace, Yahoo, Google, Microsoft, Facebook, right?

What are they standardizing on? Chances are it’s not what your organization is doing. Instead of VMware, Cisco, IBM, HP, Dell, EMC, NetApp, etc., they’re using open source, building their own servers, and using their own distributed filesystems. They can do this because they have a large investment in DevOps teams that are able to put these things together.

A State organization that has already standardized on a FlexPod or Vblock with VMware is not going to throw away what they’ve done and start over just so they can match what the big boys do.  However, as they move forward, perhaps they can make future decisions based on emulating these guys.

Standardize Processes

The missing part is standardizing the processes once the infrastructure is in place. Standardization is tedious because it involves looking at every detail of how things are done. One of my customers has a repository of documentation they use every time they need to do something to their infrastructure. For example, 2 weeks ago we added new blade servers to the UCS. He pulled out the document and we walked through it. There were still things we modified in the documentation, but for the most part the steps were exact.

Unfortunately, this was only one part of the process.  The Networking team had their own way of keeping notes (or not at all) on how to do things.  So the processes were documented in separate places.  What the IT manager needs to do is make sure they understand how the processes (or work centers) are put together and how long each one takes.

The manager should have a master process plan for tracking work through the system (the system being the different individuals doing the work). This is what is meant by “work flow”. Even if this is done by hand or, as is common, with a Gantt chart, there should be some shared understanding of it.

Each job that comes in should get its own workflow (or Gantt chart) and be entered into something like a Kanban board. Once you understand this for the common requests, you can see how many one-offs there are.

Whether these requests are for public cloud or private cloud, there is still a workflow. It is an iterative process that may not be complete the first few times it is done, but over time it will get better. There is a great book called “The Phoenix Project” that talks about how an IT staff starts to standardize, and how development and operations work together to improve their processes. These ideas are based on an earlier business classic called “The Goal”.

Automate the Processes

Once the processes are known, we turn our assembly line workers into programmers of the processes. I used to work as a consulting engineer helping deploy High Performance Computing clusters. On several occasions, the RFPs required that the cluster be deployable from scratch in less than 1 hour: from bare metal to running jobs. We created scripts that would deploy the OS, customize the user libraries, and even set up a job queuing system. It was pretty amazing to see 1,200 bare metal rack mount servers do that. After we left, if the customer had a problem with a server, they could replace it, plug it in, and walk away. The system would self-provision.

That was, and still is, a complicated process, but it is still simpler than what virtualization has done to data center management. We never had to mess with the network once it was set up. By contrast, workflows for a new development environment are pretty common and require provisioning several VMs with private networks and their own storage. However, the same method of scripting the infrastructure can still be applied. It just needs to be orchestrated.

Automate and Orchestrate with a Framework

Back when we did HPC systems, we used an open source management tool called xCAT.  That was the framework by which we managed the datacenter.  The tool had capabilities but really what it gave us was a framework to insert our customizations or our processes that were specific for each site.  The tool was an enabler of the solution, not the solution itself.

Today there are lots of “enterprise” private cloud management tools. In fact, any company that wants to sell a “Private Cloud” will have its own tool: VMware vCloud Director, HP Cloud System, IBM Cloudburst, Cisco UCS Director, etc. All of these products, regardless of how they are sold, should be regarded as frameworks for automating your processes.

At a recent VMUG, the presenter asked “How many people are using vCloud Director or any other cloud orchestration tool?”  Nobody raised their hand.  Based on what I’ve seen its because most organizations haven’t yet standardized their IT processes.  There is no need for orchestration if you don’t know what you’re orchestrating.

Usually each framework will come with a part or all of what Cisco calls the “10 domains of cloud” which may include: A self service portal, chargeback/showback, service catalog, security, etc.  If you are using a public cloud, you are using their framework.

Once you select one, you’ll need to get the operations teams (network, storage, compute, virtualization) to sign off on the tool and use it. It’s not just a server thing. Each part of the assembly line needs to use it.

Once the individual components are entered into the framework, orchestration comes into play. To start with, codify the most common workloads: creating a VLAN, carving out a LUN, provisioning a VM, etc.

To orchestrate means to arrange or control the elements of, as to achieve a desired overall effect.  With the Framework, we are looking to automate all of the components to deliver a self service model to our end customer.

Self Service and Chargeback

Once we have the processes codified in the framework, we can present a catalog to our users. With a self-service portal, we recommend not automating it completely to start with. With some frameworks, as a workload moves through the automated assembly line, the framework can send an email to the right IT department to validate whether the workflow can proceed. For example, if the user, as part of the workflow, wants a new VLAN for their VM environment, the networking administrator will receive an email and can approve or deny it. This way the workflow is monitored, the requester knows where they are in the queue, and once a step is approved, it is created automatically and passed along to the next station on the assembly line.
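The approve-then-automate flow just described can be modeled as an ordered list of steps, some of which wait for a team's approval. This is a minimal Python sketch, not any particular vendor's framework: the Step and Workflow classes and the approve callback (standing in for the email-and-wait round trip) are hypothetical. A denied step stops the workflow where it is, so the requester can see exactly which step it is waiting on, and a later run picks up from that step.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]   # does the automated work for this step
    approver: Optional[str] = None   # team that must approve first, if any


@dataclass
class Workflow:
    steps: List[Step]
    position: int = 0                # index of the next step to run
    log: List[str] = field(default_factory=list)

    def run(self, request: dict, approve: Callable[[str, str, dict], bool]) -> str:
        """Advance through the steps; stop at the first denied approval."""
        while self.position < len(self.steps):
            step = self.steps[self.position]
            if step.approver is not None:
                # A real framework would email the approver and wait here.
                if not approve(step.approver, step.name, request):
                    self.log.append(f"denied: {step.name} by {step.approver}")
                    return "denied"
                self.log.append(f"approved: {step.name} by {step.approver}")
            step.action(request)
            self.log.append(f"done: {step.name}")
            self.position += 1
        return "complete"

    def status(self) -> str:
        """Where the request is in the queue, for the requester."""
        if self.position >= len(self.steps):
            return "complete"
        step = self.steps[self.position]
        return f"waiting at step {self.position + 1} of {len(self.steps)}: {step.name}"
```

Keeping the approval as a callback means the same workflow can start with humans approving every gated step and later have some approvals replaced by automatic policy checks, which matches the advice to begin partly manual.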

For chargeback, the recommendation is to keep the menu small, and the price simple.

Security all throughout then Monitor, Rinse, and Repeat

More workflows will come into the system, and the catalog will need continuous updates and revisions. This is the programmable data center. Iterations should be checked into a code repository, similar to how application developers use systems like github.com to store code updates. You will have to do bug fixes and patch any exposed holes. With virtualization comes the ability to integrate more software security services like the ASA 1000v or the VSG.

Action Items

  • Realize that your IT infrastructure is a collection of APIs waiting to be harnessed and programmed.  Challenge the people you work with to learn to use those APIs to automate their respective areas of expertise.
  • Optimize the assembly line by understanding the workflows.  Any manufacturing manager can tell you the throughput of the system.  An IT manager should be able to tell you the same thing about their system.  Start by understanding the individual components, how long it takes, and where the bottlenecks in the system are.
  • Standardize your infrastructure with a solid architecture.  Converged architectures are popular for a reason.  Don’t reinvent the wheel.
  • Standardizing processes is the hardest part.  Start with the most common.  These are usually documented.  Take the documentation and think how you would change it into code.
  • Program the data center using a framework. Most of the work will have to be done in-house or with service contracts. The framework could be a vendor’s cloud software or something free like OpenStack.