{"id":622,"date":"2012-09-04T18:49:45","date_gmt":"2012-09-05T00:49:45","guid":{"rendered":"http:\/\/benincosa.com\/blog\/?p=622"},"modified":"2014-11-19T11:24:33","modified_gmt":"2014-11-19T17:24:33","slug":"ucs-monitoring-part-1-collecting-and-analyzing-ucsm-data","status":"publish","type":"post","link":"https:\/\/benincosa.com\/?p=622","title":{"rendered":"UCS Monitoring Part 1: Collecting and Analyzing UCSM Data"},"content":{"rendered":"<p>Whenever we discuss monitoring systems, we usually need to start by understanding what we mean by monitoring. Usually it has two related definitions: \u00a0Monitoring on one hand means looking at data, gaining visibility into what is happening on the system and being able to analyze it. \u00a0Monitoring also means alerting: \u00a0Let me know when something happens. \u00a0You may then respond to the event in some way.<\/p>\n<p>UCS can do both kinds of monitoring. \u00a0And since monitoring has two parts, this blog will have two parts. \u00a0In this part (part 1) we&#8217;ll examine how to look at UCS and understand what is happening in the system. \u00a0The next post (<a href=\"http:\/\/benincosa.com\/blog\/?p=626\">part 2<\/a>) will talk about how to be alerted.<\/p>\n<p>Let&#8217;s examine the data by answering one of the most common questions we run across with UCS: \u00a0How many connections do you need from the Fabric Extenders (aka: FEX aka IO Module aka 2104\/2204\/2208) to the Fabric Interconnects? \u00a0Mostly what I see is from 2 to 4 connections per FEX to Fabric Interconnect. \u00a0But it would be great if you could determine how much bandwidth is actually being used, so you can decide scientifically whether you need more or fewer cables. \u00a0And it turns out you can, free of charge, with UCS Manager. \u00a0Since we are trying to answer this question, we&#8217;ll be focusing on monitoring the network in UCS. 
\u00a0Keep in mind, however, that you can also monitor the power consumption, temperature, and error statistics of many of the other components.<\/p>\n<p>Answering this question takes a little math and a little poking around. \u00a0Steve McQuerry presented Cisco Live session BRKCOM-2004 in San Diego earlier this year. \u00a0This post is based on some of his slides, which you can get at <a href=\"http:\/\/ciscolive365.com\">http:\/\/ciscolive365.com<\/a> (free login required), but my math is daringly original, so please let me know if I&#8217;ve made errors.<\/p>\n<p>Let&#8217;s first look and see how UCS collects data. \u00a0In UCS Manager, navigate to Admin, then filter by Stats Management. \u00a0From here you will see the collection policies.\u00a0 By default each collection policy has a collection interval of 1 minute and a reporting interval of 15 minutes.<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon1.jpg\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-623\" title=\"ucs-mon1\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon1.jpg\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<p>So what does that actually mean?<\/p>\n<p><strong>Collection Interval<\/strong>: \u00a0How often the data will be collected. \u00a0We are encouraged to change the collection interval to 30 seconds to get more granular data. \u00a0This means that every 30 seconds, the device will be queried by the UCSM subprocess responsible for gathering statistics from the underlying NXOS.<\/p>\n<p><strong>Reporting Interval: <\/strong>How often the collected data will be stored as a record in UCS Manager. \u00a0Even with the collection interval set to 30 seconds, a record is only stored once per reporting interval. \u00a0So the first record might be stored at 9:11 AM, the next at 9:26, and then one every 15 minutes after that. \u00a0UCS can only hold up to 5 of these records. 
\u00a0That alone should tell you that UCS Manager is not suited to long-term trend analysis. \u00a0For that, another monitoring solution is recommended.<\/p>\n<p>Cisco recommends that you <strong><em>change the collection interval to 30<\/em><\/strong> seconds for the things you&#8217;re interested in. \u00a0The reporting interval doesn&#8217;t really matter for what we&#8217;re doing here.<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon2.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-623\" title=\"ucs-mon2\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon2.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<h2>Examining FEX bandwidth<\/h2>\n<p>I have a first-generation IOM, so the traffic is not trunked from blade to Fabric Interconnect. \u00a0It follows a defined path based on the number of uplinks. \u00a0(See this great post:\u00a0<a href=\"http:\/\/jeremywaldrop.wordpress.com\/2010\/06\/30\/cisco-ucs-ethernet-frame-flows\/\">http:\/\/jeremywaldrop.wordpress.com\/2010\/06\/30\/cisco-ucs-ethernet-frame-flows\/<\/a> for information on how it is connected internally.)<\/p>\n<p>I have 2 chassis, each connected with 2 ports. \u00a0Ports 1 &amp; 2 connect to chassis 2 and Ports 3 &amp; 4 connect to chassis 1. \u00a0(Yes, this is not good form, but hey, I inherited this lab so that&#8217;s just the way it is and I haven&#8217;t bothered to fix it). 
\u00a0To see how your chassis are connected to the Fabric Interconnect, click on the Equipment tab, select the Chassis, and then select Hybrid display from the work pane.<br \/>\n<a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon3.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-623\" title=\"ucs-mon3\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon3.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<p>That should tell you how the connections are made from FEX to Fabric Interconnect.<\/p>\n<p>Now let&#8217;s look at one of the FEX uplinks. \u00a0Navigate to the Equipment tab, filter by Fabric Interconnects and look at the server ports that are connected to Fabric Interconnect A:<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon4.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-623\" title=\"ucs-mon4\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon4.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<p>Select the first port and look at the Statistics tab in the work pane:<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon5.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-623\" title=\"ucs-mon5\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon5.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<p>To measure bandwidth, we are interested in the delta of total bytes received (Rx) and transmitted (Tx) on each of the FEX uplinks.\u00a0For this particular uplink, the Received and Transmitted Total Bytes deltas are\u00a0837,101 and 691,921 bytes, respectively.<\/p>\n<p>We typically measure I\/O in Gbps, Mbps, or Kbps. \u00a0So we need to translate these numbers. \u00a0This is where the math comes in. \u00a0First, remember that our collection interval is 30 seconds. \u00a0That means the number reported is <em>x bytes in 30 seconds<\/em>. 
\u00a0To get bytes per second, just divide that number by 30. \u00a0From there, do the type of multiplication you may have learned in your physics class when converting between different units of measurement. \u00a0Here are the formulas for Gbps and Mbps:<\/p>\n<p><strong>Bytes to Gbps from 30 second interval collection period<\/strong><\/p>\n<p>(x bytes \/ 30 seconds) * (8 bits \/ 1 byte) * (1 Gb \/\u00a01,073,741,824 bits) = x * 0.000000000248 Gbps<\/p>\n<p>** Note: \u00a0You could argue that there are only 1 billion (1,000,000,000) bits in a Gigabit; go ahead and use that if it makes you more comfortable. \u00a0The multiplier then becomes about 0.000000000267.<\/p>\n<p><strong>Bytes to Mbps from 30 second interval collection period<\/strong><\/p>\n<p>Probably easier to do this in Mbps:<\/p>\n<p>(x bytes \/ 30 seconds) * (8 bits \/ 1 byte) * (1 Mb \/ 1,048,576 bits) = x * 0.000000254 Mbps<\/p>\n<p>Just looking at those formulas (or multipliers as they really are), there are some simple rules we can follow:<\/p>\n<p><strong>Rule 1: \u00a0If the delta is not a 10-digit number or greater, then you are not even doing a Gigabit per second on a 10 Gigabit link.<\/strong><\/p>\n<p><strong>Rule 2: \u00a0If the delta is not a 7-digit number or greater, then you are not even doing a Megabit per second on a 10 Gigabit link.<\/strong><\/p>\n<p>Armed with this knowledge, we do our math:<\/p>\n<p>Rx: 837,101 * 0.000000254 = 0.2126 Mbps = 212.6 kbps<\/p>\n<p>Tx: 691,921 * 0.000000254 = 0.1757 Mbps = 175.7 kbps<\/p>\n<p>Not a lot going on in this link, is there?<\/p>\n<p>The rest of the links on the system were all in the same 6-figure range, with one exception: \u00a0One link (Fabric B, port 1) had Rx at 13,082,674 and Tx at 3,241,484, which is about 3.3 Mbps and 823 kbps.<\/p>\n<p>Now, how can I find out what server is generating all that traffic? 
\u00a0(Let&#8217;s just suppose, for pedagogical purposes, that 3.3 Mbps is a lot.)<\/p>\n<h2><strong>Examining Server vNIC bandwidth<\/strong><\/h2>\n<p>Since I have 2 cables per FEX, I know that Fabric B uplink 1 is connected to all the B-side uplinks on odd slots. \u00a0(Remember \u00a0\u00a0<a href=\"http:\/\/jeremywaldrop.wordpress.com\/2010\/06\/30\/cisco-ucs-ethernet-frame-flows\/\">this post?<\/a>)<\/p>\n<p>All the even slots are connected to the 2nd one. \u00a0So this has to be either blade 1, 3, 5, or 7. \u00a0So what I have to do is check which Service Profiles are in those slots. \u00a0From the Equipment tab I determine that I have:<\/p>\n<p>Slot 1: ESXi-1000v-02 -&gt; Slot 1<\/p>\n<p>Slot 3: Empty<\/p>\n<p>Slot 5: CIAC-ESXi4.1-02<\/p>\n<p>Slot 7: Empty<\/p>\n<p>I only have to check 2 servers. \u00a0On each server I have assigned a LAN connectivity connection so I know which vNIC is going out the B side. \u00a0From here it&#8217;s just a matter of finding the chatty one. \u00a0Here&#8217;s how I found my Most Chatty Server Port (MCSP):<\/p>\n<p>From the Servers tab, navigate to the service profile of the machine. \u00a0I have 6 vNICs in each one:<br \/>\n<a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon6.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-623\" title=\"ucs-mon6\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon6.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<p>Since I&#8217;ve labeled them, it&#8217;s pretty obvious which ones go out the B side. \u00a0Click on each vNIC and from the work pane, select Statistics. \u00a0We expand the statistics and see a familiar screen. 
\u00a0But this time, we look under vNIC stats:<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon7.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-623\" title=\"ucs-mon7\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon7.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<p>After examining each of them I can see that the chatty interface is my NFSB vNIC. \u00a0It&#8217;s doing a lot of work and accounts for most of the change in deltas. \u00a0This is one of the reasons I recommend creating more than just the two default vNICs on UCS. \u00a0You get to see in hardware what is happening. \u00a0We found our most chatty server port and gained a lot of insight as to what this idle system is doing.<\/p>\n<p>If you did not find any chatty vNICs, the traffic might be Fibre Channel. \u00a0Remember, we are doing FCoE from the Adapter to the Fabric Interconnects. \u00a0Try checking the counters there.<\/p>\n<h2>Examining UCS Uplink Bandwidth<\/h2>\n<p>To finish off this post, let&#8217;s look at the uplinks coming out of the Fabric Interconnect. \u00a0This works differently depending on whether you have a port-channel or standard uplinks. \u00a0For a port-channel, you would go to the LAN tab, select the port-channel from the LAN cloud and then look at the statistics there.<\/p>\n<p>If you do not have a port-channel configured, you can do it from the Equipment tab like we did before with the Server Ports (aka: FI to FEX ports). \u00a0From the Equipment tab, filter by Fabric Interconnect and select the uplink ports:<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon8.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-623\" title=\"ucs-mon8\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon8.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<p>From here, look at the Rx and Tx total bytes delta to get an idea of how things are changing. 
\u00a0Pretty simple, right? \u00a0Just look for deltas of 10 digits or more to find hot spots.<\/p>\n<h2>Part 1 Summary<\/h2>\n<p>The purpose of this post was to help you understand what total network traffic looks like inside your UCS environment. \u00a0There are 3 spots to consider when understanding traffic patterns: \u00a0The server adapters, the FEX, and the uplinks. \u00a0Knowing how to read the statistics and make sense of them can help you quickly find hot spots. \u00a0The basic rule is that any delta in the Total Bytes Rx or Tx that has 10 or more digits is worth looking at and multiplying by\u00a00.000000000248 to get the total Gbps.<\/p>\n<p>It is worth pointing out that you can also select the &#8216;Chart&#8217; option under any of the statistics tools to see a trend. \u00a0When dealing with Rx and Tx deltas, you&#8217;ll have to modify the range of the scale; otherwise it will seem that there is no data.<\/p>\n<p>Lastly, for long-term analysis a different tool is needed. \u00a0UCSM only gives you a brief snapshot, as there is not room to store it all in UCS Manager. \u00a0Open source tools like Cacti, Nagios, Zenoss, and Zabbix can help do this. \u00a0SolarWinds is also a popular commercial product that helps in performance tracking.<\/p>\n<p>In my next post, I&#8217;ll talk about monitoring thresholds so that you can have UCS generate an alarm if network traffic gets too high.<\/p>\n<p>Credits: \u00a0Steve McQuerry, Craig Schaff, David Nguyen, and Dan Hanson. \u00a0Thanks guys!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Whenever we discuss monitoring systems, we usually need to start by understanding what we mean by monitoring. Usually it has two related definitions: \u00a0Monitoring on one hand means looking at data, gaining visibility into what is happening on the system and being able to analyze it. 
\u00a0Monitoring also means alerting: \u00a0Let me know when something happens&#8230;.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[149,992],"tags":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/622"}],"collection":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=622"}],"version-history":[{"count":5,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/622\/revisions"}],"predecessor-version":[{"id":703,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/622\/revisions\/703"}],"wp:attachment":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=622"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=622"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=622"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}