{"id":360,"date":"2011-04-11T15:28:14","date_gmt":"2011-04-11T21:28:14","guid":{"rendered":"http:\/\/benincosa.com\/blog\/?p=360"},"modified":"2014-11-19T11:25:16","modified_gmt":"2014-11-19T17:25:16","slug":"another-reason-for-xcat","status":"publish","type":"post","link":"https:\/\/benincosa.com\/?p=360","title":{"rendered":"Another Reason for xCAT"},"content":{"rendered":"<p>Another testament to the power of xCAT was shown to me today. \u00a0We had a machine with an amber light on it, meaning: \u00a0&#8220;Something is wrong with this server&#8221;. \u00a0The system engineer came out and reseated everything. \u00a0Then they went through and replaced the entire system board thinking that would help. \u00a0When that didn&#8217;t solve it, they replaced the power supplies. \u00a0When that didn&#8217;t solve it, I finally said: \u00a0OK, let me take a look. \u00a0This grasping at straws in the dark is quite frustrating when managing hardware.<\/p>\n<p>I ran the xCAT reventlog command and cleared the hardware log. \u00a0Then we ran it again after the amber light turned back on. \u00a0We then got the following:<\/p>\n<pre># reventlog n316\r\nn316: 04\/11\/2011 06:57:59 Event Logging Disabled, Log Area Reset\/Cleared (SEL Fullness)\r\nn316: 04\/11\/2011 06:58:05 System Firmware Progress, Unspecified (Progress)\r\nn316: 04\/11\/2011 07:01:02 Fan, Lower Critical - going low (Fan 3B Tach reading 0 RPM with threshold 1872 RPM)<\/pre>\n<p>So I said: \u00a0Replace that fan. \u00a0They did. \u00a0Problem solved. \u00a0Then I looked at my syslog and found:<\/p>\n<pre>Apr 11 07:01:19 xcat1 xCATMon Event: SNMP CRITICAL received from hs316-imm(UDP: [172.20.3.16]:623). CRITICAL: Fan, Lower Critical - going low (Sensor 0x45)<\/pre>\n<p>In other words, xCAT had already notified us that there was a problem!<\/p>\n<p>Most system management tools focus on deploying hardware and ignore the very real problem of managing your hardware as well. 
\u00a0xCAT does this on multiple levels and is extremely helpful for debugging hardware errors. \u00a0That SNMP trap had come as an SEL. \u00a0xCAT has built-in functionality to decode IPMI events into real, meaningful messages. \u00a0Like: \u00a0There&#8217;s a problem with the fan.<\/p>\n<p>Had we turned to xCAT before the system engineers got on site, we would have saved 1-2 man days. \u00a0In addition, we would have saved a perfectly good planar board. \u00a0Hardware vendors gain a lot by using xCAT. \u00a0Had I not been busy with other things, or had someone else known the power of xCAT, we could have saved about $5,000. \u00a0This is just one case; there are others as well.<\/p>\n<p>Our job at Sumavi is to make it so that the power of xCAT can be easily packaged, harnessed, and\u00a0digested\u00a0by everyone. \u00a0It&#8217;s a difficult task and one that we&#8217;re constantly working on, but we&#8217;re getting much better at it.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Another testament to the power of xCAT was shown to me today. \u00a0We had a machine with an amber light on it, meaning: \u00a0&#8220;Something is wrong with this server&#8221;. \u00a0The system engineer came out and reseated everything. \u00a0Then they went through and replaced the entire system board thinking that would help. 
\u00a0When that didn&#8217;t solve&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[59,916],"tags":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/360"}],"collection":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=360"}],"version-history":[{"count":2,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/360\/revisions"}],"predecessor-version":[{"id":362,"href":"https:\/\/benincosa.com\/index.php?rest_route=\/wp\/v2\/posts\/360\/revisions\/362"}],"wp:attachment":[{"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=360"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=360"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/benincosa.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=360"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}