{"id":626,"date":"2012-09-05T23:39:40","date_gmt":"2012-09-06T05:39:40","guid":{"rendered":"http:\/\/benincosa.com\/blog\/?p=626"},"modified":"2014-11-19T11:24:33","modified_gmt":"2014-11-19T17:24:33","slug":"ucs-monitoring-part-2-alerts-for-bandwidth","status":"publish","type":"post","link":"https:\/\/benincosa.com\/?p=626","title":{"rendered":"UCS Monitoring Part 2: Alerts for Bandwidth"},"content":{"rendered":"<p>This is part 2 of my two-part series on monitoring UCS. \u00a0<a href=\"http:\/\/benincosa.com\/blog\/?p=622\">Part one<\/a> dealt with analyzing data and making sense of what UCS Manager already collects and displays for you. \u00a0This part will focus on alerting. \u00a0In particular, our objective is to get a warning when bandwidth utilization goes above 80% and a critical alert when it goes above 90%.<\/p>\n<p>Once again I will be following the slides presented by\u00a0Steve McQuerry at Cisco Live in San Diego earlier this year (session ID BRKCOM-2004). \u00a0You can get those too by visiting <a href=\"http:\/\/ciscolive365.com\">http:\/\/ciscolive365.com<\/a> (login required).<\/p>\n<h2>First&#8230; some math<\/h2>\n<p>We will assume our links are simple 10GbE links. \u00a0Alerting at 80% and 90% means watching for bandwidth reaching 8Gbps and 9Gbps. \u00a0Easy math, right? \u00a0Unfortunately, UCS reports the new bytes collected every 30 seconds. \u00a0Therefore, we need to convert Gbps into bytes per 30 seconds and monitor for that number.<\/p>\n<p>The math is still simple but the concept of converting units can be a little frustrating. \u00a0Here is how we do it:<\/p>\n<p>x Gbps * (30 seconds) * (1,000,000,000 bits \/ 1 Gb) * (1 byte \/ 8 bits) = x * 3,750,000,000 bytes per 30 seconds<\/p>\n<p>or you\u00a0<a href=\"http:\/\/en.wikipedia.org\/wiki\/Data_rate_units\">could argue<\/a> there are 1,073,741,824 bits per gigabit. 
\u00a0In which case you would have:<\/p>\n<p>x Gbps * (30 seconds) * (1,073,741,824 bits \/ 1 Gb) * (1 byte \/ 8 bits) = x * 4,026,531,840 bytes per 30 seconds<\/p>\n<p>I&#8217;ve seen it both ways and I&#8217;m not going to argue with it. \u00a0To be consistent with the previous post I&#8217;ll use 4,026,531,840 as my multiplier. \u00a0So multiply the target speed in Gbps by 4,026,531,840.<\/p>\n<p>Here&#8217;s a table that takes the common speeds we&#8217;ll be interested in and converts them:<\/p>\n<table>\n<tbody>\n<tr>\n<td>Bytes\/30 second multiplier<\/td>\n<td>4,026,531,840<\/td>\n<\/tr>\n<tr>\n<td>Speed in Gbps<\/td>\n<td>Bytes \/ 30 seconds<\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>4,026,531,840<\/td>\n<\/tr>\n<tr>\n<td>5<\/td>\n<td>20,132,659,200<\/td>\n<\/tr>\n<tr>\n<td>7.5<\/td>\n<td>30,198,988,800<\/td>\n<\/tr>\n<tr>\n<td>8<\/td>\n<td>32,212,254,720<\/td>\n<\/tr>\n<tr>\n<td>8.5<\/td>\n<td>34,225,520,640<\/td>\n<\/tr>\n<tr>\n<td>9<\/td>\n<td>36,238,786,560<\/td>\n<\/tr>\n<tr>\n<td>10<\/td>\n<td>40,265,318,400<\/td>\n<\/tr>\n<tr>\n<td>16<\/td>\n<td>64,424,509,440<\/td>\n<\/tr>\n<tr>\n<td>18<\/td>\n<td>72,477,573,120<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h2>Creating Alerts<\/h2>\n<p>Now that we know what we are looking for, let&#8217;s create some alerts. \u00a0There are 3 hotspots to consider in UCS: the bits leaving the server adapter, the FEX to Fabric Interconnect links, and the Fabric Interconnect to upstream switch links. 
\u00a0Let&#8217;s start by looking at the server adapter.<\/p>\n<h3><strong>Step 1: Create the Threshold Policies<\/strong><\/h3>\n<p>From the LAN tab, filter by Policies and navigate to Threshold Policies.<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-aler-1.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-628\" title=\"ucs-aler-1\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-aler-1.png\" alt=\"\" width=\"307\" height=\"288\" \/><\/a><\/p>\n<p>Right-click Threshold Policies and select &#8220;Create Threshold Policy&#8221;. \u00a0We&#8217;re going to create a new Threshold Policy and call it 10Gb-Policy.<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-aler-2.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-629\" title=\"ucs-aler-2\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-aler-2.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<p>Select &#8216;Next&#8217; and add a Stat Class. \u00a0We&#8217;re going to add Vnic Stats:<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-aler-3.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-630\" title=\"ucs-aler-3\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-aler-3.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<p>The next screen is for creating our definitions. \u00a0We&#8217;re going to create 2 definitions: one for Rx Bytes Delta and one for Tx Bytes Delta. \u00a0In each we&#8217;ll create a major event (when network bandwidth hits 90% of 10Gbps) and a minor event (when network bandwidth hits 80% of 10Gbps). \u00a0We also need to set the value at which each alarm clears. \u00a0We can use 85% for the major alarm and 75% for the minor alarm. 
\u00a0This means that if network bandwidth hits 80%, we&#8217;ll trigger a warning, and that minor alarm won&#8217;t go away until network bandwidth drops to 75%. \u00a0Similarly, if network bandwidth hits 90%, we&#8217;ll trigger a major alarm that won&#8217;t subside until utilization goes below 85%, or 8.5Gbps in this case.<\/p>\n<p>Using our table from above we now fill in the blanks for the Tx Delta:<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-alar-4.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-631\" title=\"ucs-alar-4\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-alar-4.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<p>After saving this, we need to do the same for the Rx Delta. \u00a0It should look identical to the Tx Delta, with the Property Type being the only difference. \u00a0When we&#8217;re done we have a nice Threshold Policy:<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-aler-6.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-632\" title=\"ucs-aler-6\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-aler-6.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<h3><strong>Step 2: \u00a0Associate the Threshold Policy to a vNIC Template<\/strong><\/h3>\n<p>Since we use vNIC templates for LAN connectivity, we only need to modify the templates used by our nodes to include the new 10Gb-Policy. \u00a0If you don&#8217;t use templates, you&#8217;ll have to modify every vNIC on every service profile.<\/p>\n<p>From the LAN tab, filter by Policies, open vNIC Templates, and select the vNIC template used by your virtual machines. 
\u00a0Change the Stats threshold policy to the 10Gb-Policy we just created and save the changes:<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-aler-7.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-633\" title=\"ucs-aler-7\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-aler-7.jpg\" alt=\"\" width=\"515\" height=\"406\" \/><\/a><\/p>\n<p>Do this for all vNIC templates. \u00a0If you configured them as updating templates (hopefully you did), you shouldn&#8217;t have to do anything else and every vNIC will be monitored.<\/p>\n<h3><strong>Step 3: \u00a0Repeat for Uplinks<\/strong><\/h3>\n<p>From the LAN tab, filter by LAN Cloud. \u00a0Add the same definitions you created in Step 1 to the default policy. \u00a0You should have etherRxStats and etherTxStats when you are done. \u00a0The default policy applies to single uplinks only, not to port channels. \u00a0To deal with port channels, click on the port channel and edit it there.<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-aler-8.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-634\" title=\"ucs-aler-8\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-aler-8.jpg\" alt=\"\" width=\"296\" height=\"377\" \/><\/a><\/p>\n<h3><strong>Step 4: \u00a0Repeat for FEX connections<\/strong><\/h3>\n<p>From the LAN tab, filter by Internal LAN. \u00a0Add the definitions to the default policy (you won&#8217;t be able to create a new policy). 
\u00a0Use the same values as in the previous step.<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-aler-9.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-635\" title=\"ucs-aler-9\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-aler-9.png\" alt=\"\" width=\"290\" height=\"252\" \/><\/a><\/p>\n<p>Good! \u00a0That was a lot of typing. \u00a0You are now ready to be alerted!<\/p>\n<h2><strong>Testing Alerts<\/strong><\/h2>\n<p>To see if this really works we used the\u00a0<a href=\"http:\/\/sourceforge.net\/projects\/iperf\/\">iperf benchmark<\/a>. \u00a0For the Windows operating system you can use the\u00a0<a href=\"http:\/\/iperf.sourceforge.net\/\">jperf benchmark<\/a>. \u00a0In my lab I created 2 Red Hat Linux VMs named iperf1 and iperf2. \u00a0I then placed them on two different vSphere ESXi hosts. \u00a0I created an anti-affinity policy so that they would not be migrated to the same host. \u00a0The hosts were located at chassis 1 blade 1 and chassis 2 blade 1. \u00a0We made the traffic leave the Fabric Interconnects by tying one VM to the vNIC on the A side and the other VM to the vNIC on the B side. \u00a0This looks similar to the logical diagram below:<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/perf-route.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-636\" title=\"perf-route\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/perf-route.png\" alt=\"\" width=\"515\" \/><\/a><\/p>\n<p>On iperf1 I ran:<\/p>\n<pre>[root@iperf1 ~]# iperf -s -f m<\/pre>\n<p>That is the server. 
\u00a0Then on the other VM (iperf2) I ran:<\/p>\n<pre>[root@iperf2 ~]# while iperf -c 192.168.50.151; do true; done<\/pre>\n<p>It wasn&#8217;t long before we saw errors going all the way up through the stack:<\/p>\n<p><a href=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon2-last.png\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-637\" title=\"ucs-mon2-last\" src=\"http:\/\/benincosa.com\/blog\/wp-content\/uploads\/2012\/09\/ucs-mon2-last.png\" alt=\"\" width=\"500\" \/><\/a><\/p>\n<p>Looks like our alerting works!<\/p>\n<h2><strong>Conclusion<\/strong><\/h2>\n<p>In this post we showed how to get alerts when bandwidth gets too high. \u00a0We multiplied the speed in Gbps that we are interested in monitoring by a constant of\u00a04,026,531,840 to get bytes per 30 seconds. \u00a0We created threshold policies on the vNICs, the FEX connections and the Fabric Interconnect uplinks. \u00a0We then tested to see that errors were generated all the way through when the bandwidth got too high.<\/p>\n<p>Hopefully this helps you get a better idea of what is happening inside your UCS. \u00a0Now you can decide whether you really need all those uplinks or not. \u00a0If not, then you can use those ports for other things.<\/p>\n<p>I want to mention here that we only focused on the Ethernet side of things. The Fibre Channel network follows a very similar process. \u00a0When troubleshooting suspected bandwidth issues, be sure to examine your Fibre Channel traffic as well.<\/p>\n<p>Finally, I want to thank Steve McQuerry (the coolest last name any database guru could ever have) for helping me understand how UCS monitoring and alerting works. \u00a0He&#8217;s written some great slides, given great presentations, and has some other things in the works.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is part 2 of my two-part series on monitoring UCS. 
\u00a0Part one dealt with analyzing data and making sense of what UCS Manager already collects and displays for you. \u00a0This part will focus on alerting. \u00a0In particular, our objective is to give us a warning when the bandwidth utilization goes above 80% and a&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[149,150,992],"tags":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/626"}],"collection":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=626"}],"version-history":[{"count":7,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/626\/revisions"}],"predecessor-version":[{"id":2768,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/626\/revisions\/2768"}],"wp:attachment":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=626"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=626"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=626"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}