Fixing Problem with Fencing with IPMI-L
I noticed some errors with fencing on the cluster at a client’s site.
From /var/log/messages:
Feb 14 09:59:52 pdb01 fenced[4819]: fencing node "pdb-node2"
Feb 14 09:59:52 pdb01 fenced[4819]: agent "fence_ipmilan" reports: Rebooting machine @ IPMI:192.168.2.140...Failed
You can manually fence a node using the Red Hat cluster tool “fence_node”:
[root@pdb01~]# fence_node pdb-node2
agent "fence_ipmilan" reports: Rebooting machine @ IPMI:192.168.2.140...Failed
You can also call the fence_ipmilan program manually, with several “-v” flags to get additional debug messages:
$ fence_ipmilan -P imb -vvv -a 192.168.2.140 -l clusterpower -p XXX -o reboot
Rebooting machine @ IPMI:192.168.2.140...Spawning: '/usr/bin/ipmitool -I lanplus -H '192.168.2.140' -U 'clusterpower' -P 'p0w3r0ff' -v -v -v chassis power status'...
Spawned: '/usr/bin/ipmitool -I lanplus -H '192.168.2.140' -U 'clusterpower' -P 'p0w3r0ff' -v -v -v chassis power status' - PID 22333
Looking for:
'Password:', val = 1
'Unable to establish LAN', val = 11
'IPMI mutex', val = 14
'Unsupported cipher suite ID', val = 2048
'read_rakp2_message: no support for', val = 2048
'Chassis Power is off', val = 4096
'Chassis Power is on', val = 8192
ExpectToken returned -1
ExpectToken failed. Info returned:
>>>>>
IPMI LAN host 192.168.2.140 port 623

>> Sending IPMI command payload
>> netfn : 0x06
>> command : 0x38
>> data : 0x8e 0x04
BUILDING A v1.5 COMMAND
>> IPMI Request Session Header
>> Authtype : NONE
>> Sequence : 0x00000000
>> Session ID : 0x00000000
>> IPMI Request Message Header
>> Rs Addr : 20
>> NetFn : 06
>> Rs LUN : 0
>> Rq Addr : 81
>> Rq Seq : 00
>> Rq Lun : 0
>> Command : 38
<< IPMI Response Session Header
<< Authtype : NONE
<< Payload type : IPMI (0)
<< Session ID : 0x00000000
<< Sequence : 0x00000000
<< IPMI Msg/Payload Length : 16
<< IPMI Response Message Header
<< Rq Addr : 81
<< NetFn : 07
<< Rq LUN : 0
<< Rs Addr : 20
<< Rq Seq : 00
<< Rs Lun : 0
<< Command : 38
<< Compl Code : 0x00
>> SENDING AN OPEN SESSION REQUEST
>> Console generated random number (16 bytes)
87 44 95 28 f1 1d 00 ef e8 9f e3 74 0e 2a cb a6
>> SENDING A RAKP 1 MESSAGE
bmc_rand (16 bytes)
2c f7 2c 41 c6 91 c6 ea ae 79 ae 93 78 43 78 73
>> rakp2 mac input buffer (70 bytes)
a4 a3 a2 a0 03 67 00 03 87 44 95 28 f1 1d 00 ef
e8 9f e3 74 0e 2a cb a6 2c f7 2c 41 c6 91 c6 ea
ae 79 ae 93 78 43 78 73 ed ac 88 08 9d 28 11 e0
96 88 e4 1f 13 bc ca f0 14 0c 63 6c 75 73 74 65
72 70 6f 77 65 72
>> rakp2 mac key (20 bytes)
70 30 77 33 72 30 66 66 00 00 00 00 00 00 00 00
00 00 00 00
>> rakp2 mac as computed by the remote console (20 bytes)
b0 dc 36 53 e7 cd 31 3c f6 e5 e2 ef 11 8e 3f 1b
ce 43 04 4a
>> rakp3 mac input buffer (34 bytes)
2c f7 2c 41 c6 91 c6 ea ae 79 ae 93 78 43 78 73
a4 a3 a2 a0 14 0c 63 6c 75 73 74 65 72 70 6f 77
65 72
>> rakp3 mac key (20 bytes)
70 30 77 33 72 30 66 66 00 00 00 00 00 00 00 00
00 00 00 00
generated rakp3 mac (20 bytes)
74 b5 e4 9e fa 14 00 0d 38 4e b6 88 87 4f ad 00
9b 0a 99 c6
session integrity key input (46 bytes)
87 44 95 28 f1 1d 00 ef e8 9f e3 74 0e 2a cb a6
2c f7 2c 41 c6 91 c6 ea ae 79 ae 93 78 43 78 73
14 0c 63 6c 75 73 74 65 72 70 6f 77 65 72
Generated session integrity key (20 bytes)
6c b1 0b 77 c5 5a 12 87 5c 03 48 03 13 b5 bf a7
ad 15 0e 9a
Generated K1 (20 bytes)
b7 8d af 9a 90 9f 66 a3 6b 95 2d 84 82 35 37 0e
24 75 22 f1
Generated K2 (20 bytes)
32 47 ca fc 7f 01 4e 6e c7 26 02 ed 7a f2 4b 53
d6 9c 96 b6
>> SENDING A RAKP 3 MESSAGE
>> rakp4 mac input buffer (36 bytes)
87 44 95 28 f1 1d 00 ef e8 9f e3 74 0e 2a cb a6
03 67 00 03 ed ac 88 08 9d 28 11 e0 96 88 e4 1f
13 bc ca f0
>> rakp4 mac key (sik) (20 bytes)
6c b1 0b 77 c5 5a 12 87 5c 03 48 03 13 b5 bf a7
ad 15 0e 9a
>> rakp4 mac as computed by the BMC (20 bytes)
52 17 2e f7 50 7a 65 57 9a ef da f3 78 43 78 73
ed ac 88 08
>> rakp4 mac as computed by the remote console (20 bytes)
52 17 2e f7 50 7a 65 57 9a ef da f3 3a 4c 3f e5
9e e3 4b d7
IPMIv2 / RMCP+ SESSION OPENED SUCCESSFULLY

>> Sending IPMI command payload
>> netfn : 0x06
>> command : 0x3b
>> data : 0x04
BUILDING A v2 COMMAND
>> Initialization vector (16 bytes)
36 67 1a 57 a9 22 63 9e 06 d6 30 54 71 ce a7 80
authcode input (48 bytes)
06 c0 03 67 00 03 03 00 00 00 20 00 36 67 1a 57
a9 22 63 9e 06 d6 30 54 71 ce a7 80 48 e8 ed 98
b7 a7 be 06 83 04 4f f4 9a 09 e3 7f ff ff 02 07
authcode output (12 bytes)
94 dd 65 59 04 6b 90 fa ab d0 99 ba
<< IPMI Response Session Header
<< Authtype : Unknown (0x06)
<< Payload type : IPMI (0)
<< Session ID : 0xa0a2a3a4
<< Sequence : 0x00000001
<< IPMI Msg/Payload Length : 32
<< IPMI Response Message Header
<< Rq Addr : 81
<< NetFn : 07
<< Rq LUN : 0
<< Rs Addr : 20
<< Rq Seq : 01
<< Rs Lun : 0
<< Command : 3b
<< Compl Code : 0x81
Set Session Privilege Level to ADMINISTRATOR failed: Unknown (0x81)
Error: Unable to establish IPMI v2 / RMCP+ session
Unable to get Chassis Power Status
<<OPEN SESSION RESPONSE
<< Message tag : 0x00
<< RMCP+ status : no errors
<< Maximum privilege level : admin
<< Console Session ID : 0xa0a2a3a4
<< BMC Session ID : 0x03006703
<< Negotiated authenticatin algorithm : hmac_sha1
<< Negotiated integrity algorithm : hmac_sha1_96
<< Negotiated encryption algorithm : aes_cbc_128
<<RAKP 2 MESSAGE
<< Message tag : 0x00
<< RMCP+ status : no errors
<< Console Session ID : 0xa0a2a3a4
<< BMC random number : 0x2cf72c41c691c6eaae79ae9378437873
<< BMC GUID : 0xedac88089d2811e09688e41f13bccaf0
<< Key exchange auth code [sha1] : 0xb0dc3653e7cd313cf6e5e2ef118e3f1bce43044a
<<RAKP 4 MESSAGE
<< Message tag : 0x00
<< RMCP+ status : no errors
<< Console Session ID : 0xa0a2a3a4
<< Key exchange auth code [sha1] : 0x52172ef7507a65579aefdaf3
<<<<<
Error = 2 (No such file or directory)
Reaping pid 22333
Failed
The "No such file or directory" is a red herring, the real error is "Set Session Privilege Level to ADMINISTRATOR failed" earlier on.
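The real failure is easy to miss in output this long, so a small filter can help. The following is only a rough sketch: the function name first_ipmi_error is made up for illustration, and the patterns are simply the ones that matched the failure in the output above, so other agent or ipmitool versions may need different ones.

```shell
#!/bin/sh
# Hypothetical helper (not part of fence_ipmilan or ipmitool):
# read verbose output on stdin and print the first line that looks like
# a real failure. The final "Error = 2 (...)" errno line is not matched,
# because it has no colon after "Error" and no "failed:".
first_ipmi_error() {
    # Drop carriage returns (shown as ^M above), keep candidate lines,
    # and stop at the first one.
    tr -d '\r' \
        | grep -E 'failed:|^[[:space:]]*Error:' \
        | head -n 1
}

# Assumed usage, with the same options as in the post:
#   fence_ipmilan -P imb -vvv -a 192.168.2.140 -l clusterpower -p XXX -o reboot 2>&1 | first_ipmi_error
```

Run against the output above, this should print the "Set Session Privilege Level to ADMINISTRATOR failed" line rather than the misleading last one.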
I tried with a different password and it didn’t get as far as the command above. So the ID and password pair were correct. The ID also works if I manually telnet to the management interface and reboot the server:
Welcome to the server management network terminal!
login : clusterpower
Password:
Legacy CLI Application
system>reset
ok
Checked the documentation and there are two user IDs in the Management Interface, USERID and clusterpower. Tried “fence_ipmilan” again with user USERID and it rebooted node 2.
So … likely that clusterpower doesn’t have enough privilege to reboot the server.
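One way to probe this theory from the command line might be to ask ipmitool for each session privilege level in turn, using its -L option (the default is ADMINISTRATOR, which is the level that failed above). This is a sketch I did not run on the cluster: the function name is made up, it only prints the commands instead of running them, and it assumes the password is supplied through the IPMI_PASSWORD environment variable via -E rather than on the command line.

```shell
#!/bin/sh
# Sketch: print one ipmitool invocation per session privilege level so they
# can be reviewed and then run by hand. Nothing is executed here.
# -L selects the session privilege level; -E makes ipmitool read the
# password from the IPMI_PASSWORD environment variable.
print_privlvl_probes() {
    host=$1   # BMC address
    user=$2   # IPMI user ID
    for lvl in USER OPERATOR ADMINISTRATOR; do
        printf '%s\n' "ipmitool -I lanplus -H $host -U $user -E -L $lvl chassis power status"
    done
}

# Address and user ID as used in this post.
print_privlvl_probes 192.168.2.140 clusterpower
```

If the lower levels open a session and only ADMINISTRATOR fails, that would point at the user's privilege limit rather than at the password or the network.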
Browsed over to the Management Interface and looked up user “clusterpower”: it has “Remote Server Power/Restart Access” but not “Remote Console Access”. Ticked that, and now
[root@pdb01~]# fence_node pdb-node2
Reboots the other server.
Comments (2):
- 2013-03-01 12:42:16+0800 Namran Hussin Noted with thanks.
- 2014-08-09 12:03:03+0800 Saudkhan Masoodkhan Xxxv
This post was originally published publicly on Google+ at 2013-02-14 13:30:16+0800