Windows Server 2008 iSCSI on xCAT

I wrote a little last year on installing Windows over iSCSI with xCAT. It's a great trick, and Microsoft has since added more to its HPC product to do similar things.
The only problem with doing this on xCAT is that it's a minefield. On top of that, doing it on VMware makes it a slow process, so I thought I’d list all the things that can go wrong.

Here are the issues I ran into that took me quite a while to go through and debug:

1.  Corrupt ISO image.  A corrupt ISO image will copy with xCAT’s copycds and appear to expand just fine.  It isn’t until you get to setup.exe that you start seeing messages like:

“this application has failed to start because SPWIZENG.DLL
was not found. Re-installing the application may fix this problem.”

“the file ‘autorun.dll’ could not be loaded or is corrupt. setup
cannot continue. error code is [0x7E]”

These errors were both due to a bad ISO.  I downloaded a fresh Windows ISO from Microsoft’s website and the problem was solved.
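Since copycds won't complain about a damaged image, it can save time to check the ISO before copying it. Here is a minimal Python sketch of that idea. It assumes you have a published checksum for the ISO to compare against; the function names and the choice of SHA-1 as the default are mine, not anything xCAT provides.

```python
import hashlib


def digest_of_file(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks so a large ISO fits in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iso_matches(path, expected_hex, algorithm="sha1"):
    """Compare the file's digest with the published one, ignoring case and whitespace."""
    return digest_of_file(path, algorithm) == expected_hex.strip().lower()
```

If iso_matches returns False, download the image again before running copycds.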

2.  VMware DHCP server

You have to disable this!  Then you can let xCAT do all the DHCP work.  Even if xCAT answers the first DHCP request and you get the iSCSI connection, there are still some DHCP requests that happen after the install.  If xCAT isn’t the one answering those, you have problems.

3.  Wrong or bad WinPE file

I was using a WinPE file that I had made from a Windows 7 install to do Windows 2008.  They are supposed to be backwards compatible, but this one didn’t work for me.  It could be that I forgot to include the right drivers.  It worked just fine until all of a sudden it dropped the iSCSI connection.  (I saw this in my syslog, where tgtd would log an unexpected disconnect.)

4.  / file system full!

I didn’t realize that I had filled it up!  But apparently I did.  Total bummer.  So I had to clear out some data.  I found this out because the Samba server wasn’t making any connections.  When I trawled through my logs I saw that it was because there was no space left!  Yikes.  I should have been keeping a better eye on that.
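A quick check before starting an install would have caught this. Here is a small Python sketch using the standard library's shutil.disk_usage. The 10% threshold and the function names are arbitrary choices of mine for illustration.

```python
import shutil


def free_fraction(path="/"):
    """Return the fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_enough_space(path="/", min_free=0.10):
    """True if at least `min_free` (as a fraction) of the filesystem is free."""
    return free_fraction(path) >= min_free


if __name__ == "__main__":
    if not has_enough_space("/"):
        print("Warning: / is nearly full ({:.1%} free)".format(free_fraction("/")))
```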

5.  xCAT tables…

This is where you can really be thrown off, especially if your noderes.netboot is set to pxe.  It should be set to xnba for this to work properly!
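For reference, here is a small Python sketch that builds the nodech command to set noderes.netboot for a node. It assumes the usual nodech form of noderange followed by table.column=value; the node name winnode1 and the helper names are made up for illustration. By default it only returns the command so you can look it over before running anything.

```python
import subprocess


def netboot_command(noderange, method="xnba"):
    """Build the nodech argument list that sets noderes.netboot for a noderange."""
    return ["nodech", noderange, "noderes.netboot={}".format(method)]


def set_netboot(noderange, method="xnba", dry_run=True):
    """Return the command; only execute it when dry_run is False."""
    cmd = netboot_command(noderange, method)
    if not dry_run:
        # Requires xCAT to be installed and nodech to be on the PATH.
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # "winnode1" is a hypothetical node name.
    print(" ".join(set_netboot("winnode1")))
```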

6.  gPXE or xNBA?

I couldn’t tell if gPXE was the problem, so I tried to upgrade to 1.0.  This only made matters worse because of certain things xCAT does with DHCP.  (xNBA, by the way, stands for xCAT NetBoot Agent, which is gPXE with some patches.)

So after a day or so of hacking around, we’re back to having xCAT deploy Windows Server 2008 over iSCSI without any special hardware.  Still a pretty decent solution for anyone looking for Windows Stateless.  It’s as close as it gets right now.
