Cloud Computing: How Do I Get There?

This post is based on a talk I’ll be presenting at the Pacific Northwest Digital Government Summit Conference on October 2nd, 2013.

History shows us that those who embrace technology and change survive, while those who resist and stick with “business as usual” get left behind. If we have the technology and don’t use it to make IT look like magic, then we’re probably doing it wrong. (Read “The Innovator’s Dilemma” and Clarke’s Three Laws.)

I’ll be talking mainly about private cloud today, but many of these ideas can be taken into the public cloud as well.

Optimizing ROI on your Technology

My friend tells a story about when his wife first started using an iPhone. To get directions on a map she’d open up Safari and go to http://maps.google.com. To check Facebook she would open Safari and go to http://facebook.com. To check her mail she’d open up Safari again and navigate to http://gmail.com. You get the idea.

She was still getting great use out of her iPhone. She could now do things she had never been able to do before. But she was missing out on a big part of it: the App ecosystem that makes all of these things easier and delivers a richer experience.

Today, most organizations have virtualization in the data center. Because of this, IT can do things it has never been able to do before. Organizations are shrinking their server footprints to once-unimaginable levels, saving money on both capital and management costs. I’ve been in many data centers where people proudly point to where rows of racks have been consolidated into one UCS domain with only a few blades. It’s pretty cool and very impressive.

But they’re missing something as big as the App Store: the APIs. This is the biggest place where data center ROI is not being optimized.

IT is shifting (or has shifted) to a DevOps model. DevOps means that your IT infrastructure team is more tightly aligned with your developers and application people. That is the management perspective. From the trenches, though, the operations team is turning into programmers: programmers of the data center. The guy who manages the virtual environment, the guy who adds VLANs to switches, and the guy who creates another storage LUN are all being told to automate and program what they do.

The group now treats the IT infrastructure like an application that is constantly adding features and doing bug fixes.

The programming of the IT infrastructure isn’t done in compiled languages like Java, C, or C++. It’s done in interpreted languages like Python, Ruby, Bash, PowerShell, etc. But the languages alone don’t get you there; you need a framework. This is where tools like Puppet or Chef come into play. You can even look at it as programming a data center operating system, and OpenStack provides a framework for developing that operating system. It’s analogous to the web application development world: Twitter was originally developed in Ruby using a framework called Ruby on Rails. (Twitter has since moved off Ruby on Rails.)
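To make the “framework” idea concrete, here is a minimal Python sketch of the desired-state, idempotent model that tools like Puppet and Chef are built around. It does not use either tool’s API: `Switch`, `plan()`, and `apply()` are hypothetical names, and the switch is just an in-memory dictionary of VLANs. You declare what should exist, the code works out the difference, and running it a second time changes nothing.

```python
from dataclasses import dataclass, field

# Hypothetical desired-state sketch (not Puppet's or Chef's API).
# The "switch" is an in-memory dict of VLAN ID -> VLAN name.


@dataclass
class Switch:
    vlans: dict[int, str] = field(default_factory=dict)


def plan(desired: dict[int, str], actual: dict[int, str]) -> list[tuple[str, int, str]]:
    """Compare desired and actual VLANs and return the actions needed to converge."""
    actions = []
    for vlan_id, name in sorted(desired.items()):
        if vlan_id not in actual:
            actions.append(("create", vlan_id, name))
        elif actual[vlan_id] != name:
            actions.append(("rename", vlan_id, name))
    # This sketch also purges VLANs that are not declared.
    for vlan_id in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", vlan_id, actual[vlan_id]))
    return actions


def apply(switch: Switch, desired: dict[int, str]) -> list[tuple[str, int, str]]:
    """Apply only the changes needed; a second run finds nothing to do."""
    actions = plan(desired, switch.vlans)
    for action, vlan_id, name in actions:
        if action == "delete":
            del switch.vlans[vlan_id]
        else:
            switch.vlans[vlan_id] = name
    return actions


sw = Switch({10: "mgmt", 20: "old-name", 99: "legacy"})
desired = {10: "mgmt", 20: "web", 30: "db"}
print(apply(sw, desired))  # first run converges the switch
print(apply(sw, desired))  # second run makes no changes
```

This sketch deletes VLANs that aren’t declared (a “purge”); real tools usually make that behavior opt-in, because removing unmanaged resources is risky.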

Making this shift gives you unprecedented speed, agility, and standardization. Those who don’t make it will find their constituents looking elsewhere for IT services that can be delivered faster and cheaper.

The IT Assembly Line

It’s hard for people to think of their IT professionals as assembly line workers. After all, they are doing complex things like installing servers, configuring networks, and updating firmware. These are CCIEs, VCPs, and storage gurus. But that’s exactly what the people in the trenches are: workers on a virtual assembly line. IT managers should look at the way work enters the assembly line, understand the bottlenecks, and track how long it takes to get things through the line. Naturally, exceptions crop up. But for the most part, the work required to deliver applications to the business consists of repetitive tasks. They’re just complicated, multi-step, repetitive tasks.

To start with, we need to look at the common requests that come in: creating new servers, deploying new applications, delivering a new test environment. Whatever the request, management needs to understand how it gets done and look at it the way a manufacturing foreman sitting above the plant watches a physical product make its way through. Observe which processes are in place, where they are being sidestepped, and where they don’t exist at all.

As an example, consider all the steps required to deploy a server. It may look something like the flowchart below:

That sure looks like an assembly line to me. If you can view work that enters the infrastructure like an assembly line, you can start measuring how long it takes for certain activities to get done. Then you can figure out ways to optimize.
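The following Python sketch models a hypothetical server deployment as a list of steps, each with the time it sat in a queue and the hands-on time to complete it. The step names, teams, and hours are placeholders, not measurements; the point is that once each station is timed, total lead time and the bottleneck fall out of simple arithmetic.

```python
from dataclasses import dataclass

# Hypothetical steps and durations (hours) for one server deployment;
# substitute the steps and timings from your own process.


@dataclass
class Step:
    name: str
    team: str
    wait_hours: float   # time the request sat in a queue before work started
    work_hours: float   # hands-on time once someone picked it up


def summarize(steps: list[Step]) -> dict:
    """Return total lead time, total hands-on time, efficiency, and the slowest step."""
    lead_time = sum(s.wait_hours + s.work_hours for s in steps)
    work_time = sum(s.work_hours for s in steps)
    bottleneck = max(steps, key=lambda s: s.wait_hours + s.work_hours)
    return {
        "lead_time_hours": lead_time,
        "work_time_hours": work_time,
        # Share of lead time spent actually working (the rest is waiting).
        "efficiency": work_time / lead_time if lead_time else 0.0,
        "bottleneck": bottleneck.name,
    }


deploy_server = [
    Step("Request approved", "Management", wait_hours=24, work_hours=0.5),
    Step("IP address and VLAN assigned", "Network", wait_hours=48, work_hours=1),
    Step("LUN carved and zoned", "Storage", wait_hours=24, work_hours=1),
    Step("VM built from template", "Virtualization", wait_hours=8, work_hours=0.5),
    Step("OS patched and handed off", "Server", wait_hours=4, work_hours=2),
]

print(summarize(deploy_server))
```

With these placeholder numbers, only 5 of the 113 hours of lead time are hands-on work; the rest is queue time, which is exactly what the assembly-line view is meant to expose.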

Standardization of the Infrastructure

Manufacturing lines optimize throughput by standardizing processes and equipment. When I hear VMware tell everybody that “the hardware doesn’t matter,” I take exception. It matters. A lot. Just like your virtualization software matters. Cisco and other hardware vendors come at it from the opposite direction and say “the hypervisor doesn’t matter, we’ll support them all.” What all parties are really telling you is that they want you to standardize on them. Each is trying to prove its value in a private cloud.

What an organization standardizes on depends on a lot of things: budget, the skill set of its admins, relationships with vendors and consultants, etc. In short, any discussion of the holy trinity of the data center (servers, storage, and networking) usually turns into a religious one.

But whatever you do, the infrastructure needs to be robust. This is why converged infrastructures like Vblocks, FlexPods, and other reference architectures have become popular. The “one-piece-at-a-time” accidental, cobbled-together architecture is not a good play.

Consider the analogy that a virtualized workload is cargo on a semi truck. Do you want that truck running over a solid six-lane interstate like I-5, or do you want that cargo traveling at 60 mph over a rickety bridge?

Similarly, if your virtualization team doesn’t have strong Linux skills, you probably don’t want them running OpenStack on KVM. That’s why VMware and Hyper-V are so popular: they are a much better match for most people’s skill levels.

What to Standardize On?

While the choice of infrastructure to standardize on is a religious one, there are role models we can look to when deciding. Start by looking at the big boys, the people you aspire to be when you grow up. Who is running a world-class IT-as-a-service infrastructure? AWS, Rackspace, Yahoo, Google, Microsoft, Facebook, right?

What are they standardizing on? Chances are it’s not what your organization is doing. Instead of VMware, Cisco, IBM, HP, Dell, EMC, NetApp, etc., they’re using open source, building their own servers, and running their own distributed filesystems. They can do this because they have invested heavily in DevOps teams that are able to put these pieces together.

A state organization that has already standardized on a FlexPod or Vblock with VMware is not going to throw away what it has done and start over just to match the big boys. However, as it moves forward, perhaps it can base future decisions on emulating them.

Standardize Processes

The missing part is standardizing the processes once the infrastructure is in place. Standardization is tedious because it involves looking at every detail of how things are done. One of my customers keeps a repository of documentation they use every time they need to change something in their infrastructure. For example, two weeks ago we added new blade servers to the UCS. He pulled out the document and we walked through it. We still modified a few things in the documentation, but for the most part the steps were exact.

Unfortunately, this was only one part of the process. The networking team had their own way of keeping notes on how to do things (or didn’t keep notes at all), so the processes were documented in separate places. The IT manager needs to understand how the processes (or work centers) fit together and how long each one takes.

The manager should have a master process plan for tracking work through the system (the system being the different individuals doing the work). This is what is meant by “workflow.” Even if it is tracked by hand or, as is common, with a Gantt chart, there should be some understanding of it.

Each job that comes in should get its own workflow (or Gantt chart) and be entered into something like a Kanban board. Once you understand this for the common requests, you can see how many one-offs there are.
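A Kanban board doesn’t need special software to start. This Python sketch, with hypothetical job titles and workflow names, groups jobs by column and counts how many requests don’t match any standard workflow (the one-offs).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical catalog of standardized workflows (the common requests).
KNOWN_WORKFLOWS = {"new-server", "new-application", "test-environment"}
COLUMNS = ("To Do", "In Progress", "Done")


@dataclass
class Job:
    title: str
    workflow: Optional[str]   # None (or an unknown name) means no standard workflow fits
    column: str = "To Do"


def board_view(jobs):
    """Group job titles by Kanban column."""
    board = {column: [] for column in COLUMNS}
    for job in jobs:
        board[job.column].append(job.title)
    return board


def workflow_counts(jobs):
    """Count jobs per standard workflow, lumping everything else into 'one-off'."""
    return Counter(j.workflow if j.workflow in KNOWN_WORKFLOWS else "one-off" for j in jobs)


def one_off_ratio(jobs):
    """Fraction of jobs that don't fit any known workflow."""
    return workflow_counts(jobs)["one-off"] / len(jobs) if jobs else 0.0


jobs = [
    Job("Web server for HR", "new-server", "Done"),
    Job("Payroll QA lab", "test-environment", "In Progress"),
    Job("Web server for Finance", "new-server"),
    Job("Special request", None),
]
print(board_view(jobs))
print(workflow_counts(jobs), one_off_ratio(jobs))
```

A rising one-off ratio is a hint that a new standard workflow is waiting to be written down.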

Whether these requests are for public cloud or private cloud, there is still a workflow. It is an iterative process that may not be complete the first few times through, but it will improve over time. There is a great book called “The Phoenix Project” about an IT staff that standardizes its processes and gets development and operations working together to improve them. Its ideas are based on an earlier business classic called “The Goal.”

Automate the Processes

Once the processes are known, we turn our assembly line workers into programmers of those processes. I used to work as a consulting engineer helping deploy High Performance Computing (HPC) clusters. On several occasions the RFPs required that the cluster be deployable from scratch in less than one hour: from bare metal to running jobs. We created scripts that would deploy the OS, customize the user libraries, and even set up a job queuing system. It was pretty amazing to see 1,200 bare-metal rack-mount servers do that. After we left, if the customer had a problem with a server, they could replace it, plug it in, and walk away. The system would self-provision.
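One way to get that self-provisioning behavior is to tie a node’s identity to where it is cabled rather than to its hardware, so a replacement server inherits its predecessor’s hostname, IP address, and build steps. This Python sketch assumes that approach; `RACK_MAP`, `on_node_boot()`, the switch and port names, and the step names are all hypothetical, and a real system would trigger this from network boot and switch discovery rather than a direct function call.

```python
# Hypothetical inventory: a node's identity is tied to where it is cabled
# (switch name + port), not to its hardware, so a replacement inherits it.
RACK_MAP = {
    ("sw-r01", 1): {"hostname": "node001", "ip": "10.0.1.1"},
    ("sw-r01", 2): {"hostname": "node002", "ip": "10.0.1.2"},
}

# Ordered build steps every compute node goes through (names are illustrative).
PROVISION_STEPS = ["install_os", "customize_user_libraries", "join_job_queue"]

mac_registry = {}  # MAC address -> hostname, learned when nodes boot


def on_node_boot(mac, switch, port):
    """Handle a node that boots on the provisioning network; return its build plan."""
    location = (switch, port)
    if location not in RACK_MAP:
        raise LookupError(f"No node defined for {switch} port {port}")
    identity = RACK_MAP[location]
    # Drop the MAC of any server that previously held this slot (e.g. a failed node).
    for old_mac, hostname in list(mac_registry.items()):
        if hostname == identity["hostname"] and old_mac != mac:
            del mac_registry[old_mac]
    mac_registry[mac] = identity["hostname"]
    return {**identity, "mac": mac, "steps": list(PROVISION_STEPS)}


on_node_boot("aa:aa:aa:aa:aa:01", "sw-r01", 2)                # original server
replacement = on_node_boot("bb:bb:bb:bb:bb:02", "sw-r01", 2)  # swapped-in server
print(replacement)
```

Because nothing in the plan depends on the old hardware, the replacement gets the same name, address, and build steps as the server it replaced.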

That process was (and still is) complicated, but it was simpler than what virtualization has done to the management of the data center. We never had to touch the network once it was set up. By contrast, a workflow for a new development environment, which is a common request, requires provisioning several VMs with private networks and their own storage. The same method of scripting the infrastructure still applies; it just needs to be orchestrated.
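Orchestration is largely about ordering: the VLAN and LUN have to exist before the VMs that use them. This Python sketch (3.9+) uses the standard library’s `graphlib.TopologicalSorter` to group a hypothetical development-environment request into waves of tasks that can run once their dependencies are done. The task names are illustrative; a real orchestrator would call the network, storage, and VM scripts at each step.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dev-environment request: each task maps to the tasks it depends on.
dev_environment = {
    "create_vlan": set(),
    "carve_lun": set(),
    "create_datastore": {"carve_lun"},
    "deploy_vm_web": {"create_vlan", "create_datastore"},
    "deploy_vm_db": {"create_vlan", "create_datastore"},
    "configure_firewall": {"deploy_vm_web", "deploy_vm_db"},
}


def plan_waves(graph):
    """Group tasks into waves; every task in a wave has all its dependencies done."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # A real orchestrator would run the scripts for these tasks here.
        sorter.done(*ready)
    return waves


for number, wave in enumerate(plan_waves(dev_environment), start=1):
    print(number, wave)
```

Tasks in the same wave have no dependencies on each other, so they could run in parallel.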

Automate and Orchestrate with a Framework

Back when we built HPC systems, we used an open source management tool called xCAT. It was the framework by which we managed the data center. The tool had its own capabilities, but what it really gave us was a framework into which we could insert the customizations and processes specific to each site. The tool was an enabler of the solution, not the solution itself.
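The split between what the tool provides and the site-specific processes plugged into it can be sketched as a small hook system. This is not how xCAT is implemented; `hook()`, `provision()`, and the event names `pre_install` and `post_install` are hypothetical. The framework owns the generic workflow, and each site registers its own customizations around it.

```python
from collections import defaultdict

# Hypothetical mini-framework: the framework owns the workflow, and each site
# registers its customizations as hooks on named events.
_hooks = defaultdict(list)


def hook(event):
    """Decorator that registers a function to run when `event` fires."""
    def register(func):
        _hooks[event].append(func)
        return func
    return register


def provision(node):
    """Framework-owned workflow with site-specific steps plugged in around it."""
    log = [func(node) for func in _hooks["pre_install"]]
    log.append(f"{node}: base OS installed")  # generic step the framework provides
    log.extend(func(node) for func in _hooks["post_install"])
    return log


# Site-specific customizations (examples only).
@hook("pre_install")
def set_bios_profile(node):
    return f"{node}: BIOS profile 'hpc' applied"


@hook("post_install")
def mount_scratch(node):
    return f"{node}: /scratch mounted"


print("\n".join(provision("node001")))
```

Another site would reuse `provision()` unchanged and register different hooks, which is what makes the tool an enabler rather than the solution.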

Today there are lots of “enterprise” private cloud management tools. In fact, any company that wants to sell a “private cloud” will have its own tool: VMware vCloud Director, HP CloudSystem, IBM CloudBurst, Cisco UCS Director, etc. All of these products, regardless of how they are sold, should be regarded as frameworks for automating your processes.

At a recent VMUG, the presenter asked, “How many people are using vCloud Director or any other cloud orchestration tool?” Nobody raised their hand. From what I’ve seen, that’s because most organizations haven’t yet standardized their IT processes. There is no need for orchestration if you don’t know what you’re orchestrating.

Each framework usually comes with some or all of what Cisco calls the “10 domains of cloud,” which may include a self-service portal, chargeback/showback, a service catalog, security, etc. If you are using a public cloud, you are using its provider’s framework.

Once you select one, you’ll need to get the operations teams (network, storage, compute, virtualization) to sign off on and use the tool. It’s not just a server thing. Each part of the assembly line needs to use it.

Once the individual components are entered into the framework, orchestration comes into play. To start with, codify the most common tasks: creating a VLAN, carving out a LUN, provisioning a VM, etc.
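Codifying a task mostly means replacing a free-form ticket with a function that takes explicit, validated parameters. In this Python sketch, `create_vlan()`, `carve_lun()`, `provision_vm()`, `CATALOG`, and `submit()` are hypothetical names, and the functions only return a description of the change rather than calling real switch, array, or hypervisor APIs. The one real-world rule encoded is the 802.1Q VLAN ID range (1 to 4094).

```python
# Hypothetical codified tasks: each common request becomes a function with
# explicit, validated parameters. They return change records, not real API calls.

def create_vlan(vlan_id: int, name: str) -> dict:
    # 802.1Q VLAN IDs run 1-4094; 0 and 4095 are reserved.
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID {vlan_id} is outside 1-4094")
    if not name.strip():
        raise ValueError("VLAN name is required")
    return {"task": "create_vlan", "vlan_id": vlan_id, "name": name}


def carve_lun(array: str, size_gb: int) -> dict:
    if size_gb <= 0:
        raise ValueError("LUN size must be a positive number of GB")
    return {"task": "carve_lun", "array": array, "size_gb": size_gb}


def provision_vm(name: str, template: str, vcpus: int = 2, memory_gb: int = 4) -> dict:
    if vcpus < 1 or memory_gb < 1:
        raise ValueError("A VM needs at least 1 vCPU and 1 GB of memory")
    return {"task": "provision_vm", "name": name, "template": template,
            "vcpus": vcpus, "memory_gb": memory_gb}


# The catalog the framework exposes: request type -> codified task.
CATALOG = {"vlan": create_vlan, "lun": carve_lun, "vm": provision_vm}


def submit(request_type: str, **params) -> dict:
    """Look up the codified task and validate parameters before anything runs."""
    if request_type not in CATALOG:
        raise KeyError(f"'{request_type}' is not in the catalog; treat it as a one-off")
    return CATALOG[request_type](**params)


print(submit("vlan", vlan_id=120, name="dev-web"))
print(submit("vm", name="dev-web-01", template="rhel6"))
```

Bad requests fail at submission time instead of halfway through someone’s manual checklist.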

To orchestrate means to arrange or control the elements of something so as to achieve a desired overall effect. With the framework, we are looking to automate all of the components in order to deliver a self-service model to our end customers.

Self Service and Chargeback

Once we have the processes codified in the framework, we can present a catalog to our users. With a self-service portal, we recommend not making it completely automated at first. With some frameworks, as a workload moves through the automated assembly line, the framework can email the appropriate IT team to approve whether the workflow can move on. For example, if the user’s workflow calls for a new VLAN for their VM environment, the network administrator receives an email and can approve or deny it. This way the workflow is monitored, the requester knows where they are in the queue, and once the step is approved, it is carried out automatically and passed along to the next station in the assembly line.
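The approval-gated flow described above can be sketched as a small state machine. In this Python sketch, the `outbox` list stands in for the framework’s email notification, and `WorkflowStep`, `Workflow`, `advance()`, and `decide()` are hypothetical names rather than any product’s API. A step that needs approval pauses the workflow; once approved, it and the following steps run on the next `advance()`.

```python
from dataclasses import dataclass, field

# Hypothetical approval-gated workflow. Steps flagged needs_approval pause until
# the owning team decides; the outbox list stands in for emailed notifications.


@dataclass
class WorkflowStep:
    name: str
    owner: str                  # team that must sign off, e.g. "network"
    needs_approval: bool = False
    status: str = "pending"     # pending -> awaiting_approval -> approved/denied -> done


@dataclass
class Workflow:
    requester: str
    steps: list
    outbox: list = field(default_factory=list)

    def advance(self) -> str:
        """Run steps in order until one needs a human decision; return the current status."""
        for step in self.steps:
            if step.status == "done":
                continue
            if step.status == "denied":
                return f"denied at '{step.name}'"
            if step.needs_approval and step.status == "pending":
                step.status = "awaiting_approval"
                self.outbox.append(f"To {step.owner}: approve '{step.name}' for {self.requester}?")
            if step.status == "awaiting_approval":
                return f"waiting on {step.owner} for '{step.name}'"
            # A real framework would call the codified task here.
            step.status = "done"
        return "complete"

    def decide(self, step_name: str, approved: bool) -> None:
        """Record the owning team's approve/deny decision for a paused step."""
        for step in self.steps:
            if step.name == step_name and step.status == "awaiting_approval":
                step.status = "approved" if approved else "denied"
                return
        raise ValueError(f"'{step_name}' is not awaiting approval")


wf = Workflow("dev team", [
    WorkflowStep("create VLAN 120", "network", needs_approval=True),
    WorkflowStep("provision 3 VMs", "virtualization"),
])
print(wf.advance())                    # pauses and notifies the network team
wf.decide("create VLAN 120", approved=True)
print(wf.advance())                    # runs the rest automatically
```

Because the requester can call `advance()` at any time and see which team holds the request, the queue position is visible without anyone having to chase an email thread.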

For chargeback, the recommendation is to keep the menu small, and the price simple.
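A small menu with simple prices keeps chargeback easy to explain. This Python sketch uses three hypothetical VM sizes with placeholder flat monthly prices (not recommendations); a department’s bill is just the count of each size times its price.

```python
# Hypothetical flat-rate menu: three sizes, one monthly price each.
# Sizes and prices are placeholders.
MENU = {
    "small":  {"vcpus": 1, "memory_gb": 2,  "disk_gb": 50,  "monthly_price": 50},
    "medium": {"vcpus": 2, "memory_gb": 8,  "disk_gb": 100, "monthly_price": 120},
    "large":  {"vcpus": 4, "memory_gb": 16, "disk_gb": 250, "monthly_price": 250},
}


def monthly_bill(vms_by_size: dict) -> int:
    """Chargeback for one department: VM count per size times the flat price."""
    unknown = set(vms_by_size) - set(MENU)
    if unknown:
        raise KeyError(f"Not on the menu: {sorted(unknown)}")
    return sum(MENU[size]["monthly_price"] * count for size, count in vms_by_size.items())


print(monthly_bill({"small": 4, "medium": 2, "large": 1}))
```

Flat per-size pricing gives up some precision compared with metering CPU-hours or storage I/O, but users can predict their bill before they order, which is the point of keeping the price simple.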

Security Throughout, then Monitor, Rinse, and Repeat

More workflows will come into the system, and the catalog will need continuous updates and revisions. This is the programmable data center. Iterations should be checked into a code repository, much as application developers use services like github.com to store code updates. You will have to fix bugs and patch any exposed holes. With virtualization comes the ability to integrate more software security services, like the ASA 1000v or the VSG.

Action Items

Realize that your IT infrastructure is a collection of APIs waiting to be harnessed and programmed. Challenge the people you work with to learn to use those APIs to automate their respective areas of expertise.
Optimize the assembly line by understanding the workflows. Any manufacturing manager can tell you the throughput of their system. An IT manager should be able to tell you the same thing about theirs. Start by understanding the individual components, how long each takes, and where the bottlenecks in the system are.
Standardize your infrastructure with a solid architecture. Converged architectures are popular for a reason. Don’t reinvent the wheel.
Standardizing processes is the hardest part. Start with the most common. These are usually documented. Take the documentation and think how you would change it into code.
Program the data center using a framework. Most of the work will have to be done in-house or with service contracts. The framework could be something like a vendor’s cloud software or something free like OpenStack.
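For the throughput question in the second action item, a rough first measurement needs only two dates per completed job. In this Python sketch the job dates are made up for illustration; `throughput_per_week()` counts jobs delivered in a date range per seven days, and `average_lead_time_days()` averages the time from request to delivery.

```python
from datetime import date
from statistics import mean

# Hypothetical completed jobs: (request received, delivered).
completed = [
    (date(2013, 9, 2), date(2013, 9, 9)),
    (date(2013, 9, 3), date(2013, 9, 6)),
    (date(2013, 9, 10), date(2013, 9, 20)),
    (date(2013, 9, 16), date(2013, 9, 20)),
]


def throughput_per_week(jobs, start: date, end: date) -> float:
    """Jobs delivered within [start, end] (inclusive), normalized to a 7-day week."""
    delivered = [done for _, done in jobs if start <= done <= end]
    days = (end - start).days + 1
    return len(delivered) * 7 / days


def average_lead_time_days(jobs) -> float:
    """Mean number of days from request to delivery."""
    return float(mean((done - received).days for received, done in jobs))


print(throughput_per_week(completed, date(2013, 9, 1), date(2013, 9, 28)))
print(average_lead_time_days(completed))
```

Tracked week over week, these two numbers show whether the automation work is actually moving more requests through the line, faster.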