
Wednesday, August 7, 2019

Useful Exadata InfiniBand (IB) Switches commands


In this article, I list some useful Exadata InfiniBand (IB) switch commands, summarized from the Datacenter InfiniBand Switch documentation. We use them when dealing with IB issues in our Exadata environments and when performing Exadata IB switch health checks.


1 - [ibswitches]

This InfiniBand command is a script that discovers the InfiniBand fabric topology or uses an existing topology file to extract the switch nodes.

The ibswitches command is available from the /SYS/Switch_Diag and /SYS/Fabric_Mgmt Linux shell targets of the Oracle ILOM CLI interface.


[root@exahostdbadm01 ~]# ibswitches
Switch  : <switch_gid> ports 36 "SUN DCS 36P QDR exahostibs01.example.com and IP" enhanced port 0 lid 2 lmc 0
Switch  : <switch_gid> ports 36 "SUN DCS 36P QDR exahostiba01.example.com and IP" enhanced port 0 lid 3 lmc 0
Switch  : <switch_gid> ports 36 "SUN DCS 36P QDR exahostibb01.example.com and IP" enhanced port 0 lid 1 lmc 0
[root@exahostdbadm01 ~]#
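If you want to loop over every switch in a script, the ibswitches output can be reduced to switch names and LIDs. The following is a minimal POSIX shell sketch, assuming the output format shown above. The function name list_ib_switches is hypothetical, and taking the host name from the fifth word of the quoted description is an assumption based on this sample output only.

```sh
# Hypothetical helper: read ibswitches output on stdin and print
# "<switch-host> <lid>" for every switch line.
# Assumption: the description is in double quotes and its 5th word is the
# switch host name, as in the sample output above.
list_ib_switches() {
  awk -F'"' '/^Switch/ {
    split($2, desc, " ")        # desc[5]: host name inside the quoted description
    n = split($3, rest, " ")    # words after the closing quote
    for (i = 1; i < n; i++)
      if (rest[i] == "lid") print desc[5], rest[i + 1]
  }'
}

# Usage on a database server:
# ibswitches | list_ib_switches
```

For the three switches above, this prints one line per switch, for example exahostibs01.example.com 2.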


2 - [version]

To check the IB switch firmware/software version, log in to the IB switch as the root user and run the version command:

[root@exahostibs01 ~]# version
SUN DCS 36p version: 2.2.11-2
Build time: Aug 27 2018 11:18:39
SP board info:
Manufacturing Date: N/A
Serial Number: "XXXXXXX"
Hardware Revision: 0x0100
Firmware Revision: 0x0000
BIOS version: NUP1R918
BIOS date: 01/19/2016
[root@exahostibs01 ~]#
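When comparing versions across switches (for example, before and after patching), it helps to print only the version string. The following is a minimal shell sketch, assuming the first line of the output has the form shown above. ib_switch_version is a hypothetical helper name.

```sh
# Hypothetical helper: read `version` output on stdin and print only the
# switch software version. Assumes a line of the form
# "SUN DCS 36p version: <version>", as in the output above.
ib_switch_version() {
  # Split on ": " and skip the "BIOS version:" line.
  awk -F': *' '/version:/ && !/BIOS/ { print $2; exit }'
}

# Usage on the switch:
# version | ib_switch_version
```

Given the output above, it prints 2.2.11-2.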


3 - [checkpower]

This hardware command checks the status of the power supplies. Output is a simplified OK.

The checkpower command is available from the /SYS/Switch_Diag and /SYS/Fabric_Mgmt Linux shell targets of the Oracle ILOM CLI interface.


[root@exahostibs01 ~]# checkpower
PSU 0 present OK
PSU 1 present OK
All PSUs OK
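In a scripted health check, the summary line is the part to test. The following is a minimal shell sketch, assuming the checkpower output format shown above. psus_ok is a hypothetical helper. When the "All PSUs OK" line is missing, it returns a non-zero status and prints the individual PSU lines.

```sh
# Hypothetical helper: succeed only if checkpower output (stdin) contains
# "All PSUs OK"; otherwise print the PSU lines and return 1.
psus_ok() {
  awk '/^PSU/ { psu[++n] = $0 }            # remember per-PSU lines
       /^All PSUs OK/ { ok = 1 }           # summary line seen
       END {
         if (!ok) for (i = 1; i <= n; i++) print psu[i]
         exit (ok ? 0 : 1)
       }'
}

# Usage on the switch:
# checkpower | psus_ok && echo "power OK"
```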



4 - [env_test]

This hardware command performs a series of hardware and environmental tests of the switch. This command is an amalgamation of the following commands:

checkpower

checkvoltages

showtemps

getfanspeed

connector

checkboot

The command output provides voltage and temperature values, pass-fail results, and error messages.

The env_test command is available from the /SYS/Switch_Diag and /SYS/Fabric_Mgmt Linux shell targets of the Oracle ILOM CLI interface.


[root@exahostibs01 ~]# env_test
Environment test started:
Starting Environment Daemon test:
Environment daemon running
Environment Daemon test returned OK <<<<<<<<<<<<<< ENV Daemon test is OK
Starting Voltage test:
Voltage ECB OK
Measured 3.3V Main = 3.28 V
Measured 3.3V Standby = 3.35 V
Measured 12V = 12.03 V
Measured 5V = 4.99 V
Measured VBAT = 3.03 V
Measured 2.5V = 2.47 V
Measured 1.8V = 1.78 V
Measured I4 1.2V = 1.22 V
Voltage test returned OK        <<<<<<<<<<<<<< Voltage test is OK
Starting PSU test:
PSU 0 present OK
PSU 1 present OK
PSU test returned OK            <<<<<<<<<<<<<< PSU test is OK
Starting Temperature test:
Back temperature 23
Front temperature 26
SP temperature 43
Switch temperature 43, maxtemperature 53
Temperature test returned OK    <<<<<<<<<<<<<< Temperature for IB switch is OK
Starting FAN test:
Fan 0 not present
Fan 1 running at rpm 9483
Fan 2 running at rpm 9483
Fan 3 running at rpm 9483
Fan 4 not present
FAN test returned OK             <<<<<<<<<<<<<< All fans test is OK
Starting Connector test:
Connector test returned OK
Starting Onboard ibdevice test:
Switch OK
All Internal ibdevices OK
Onboard ibdevice test returned OK <<<<<<<<<<<<<< all onboard IB devices are OK
Starting SSD test:
SSD test returned OK        <<<<<<<<<<<<<< SSD test is OK
Starting Auto-link-disable test:
Auto-link-disable test returned OK
Environment test PASSED     <<<<<<<<<<<<<< The overall environment test PASSED.
[root@exahostibs01 ~]#
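Because env_test prints one "... test returned ..." line per sub-test, a script can pick out the failing sub-tests instead of reading the whole report. The following is a minimal shell sketch, assuming the line formats shown above. The "<<<<<<" notes in that output are annotations and are not part of the real command output. env_test_check is a hypothetical helper name, and treating any result other than OK as a failure is an assumption.

```sh
# Hypothetical helper: print every "... test returned ..." line from
# env_test output (stdin) whose last word is not OK, and succeed only if
# "Environment test PASSED" is present and no sub-test failed.
env_test_check() {
  awk '
    / test returned / && $NF != "OK" { print "FAILED: " $0; bad = 1 }
    /^Environment test PASSED/       { passed = 1 }
    END { exit ((passed && !bad) ? 0 : 1) }
  '
}

# Usage on the switch:
# env_test | env_test_check || echo "env_test needs attention"
```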



5 - [fwverify]

The fwverify command verifies the installed switch software packages, files, and firmware image. We use it to validate the switch image after patching or upgrading.

[root@exahostibs01 ~]# fwverify

Checking all present packages:
.............................................................................................................................................................................................................................................. OK

Checking if any packages are missing:
............................................................................................................................................................................................................................................. OK

Verifying installed files:
............................................................................................................................................................................................................................................. OK

Checking FW Coreswitch:
  FW Version: 7.4.3002 OK
  PSID: SUN_NM2-36p_006 OK
  Verifying image integrity OK

[root@exahostibs01 ~]#
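After an upgrade, it can be handy to reduce the fwverify report to just the checks that did not pass. The following is a minimal shell sketch. It assumes that a passing check always ends in "OK" and that section headers end in ":", as in the output above. fwverify_check is a hypothetical helper name.

```sh
# Hypothetical helper: print fwverify result lines (stdin) that do not end
# in OK, and return 1 if any are found.
fwverify_check() {
  awk '
    NF == 0 || /:[ \t]*$/ { next }                 # blank lines and section headers
    $NF != "OK" { print "CHECK: " $0; bad = 1 }   # any result not ending in OK
    END { exit (bad ? 1 : 0) }
  '
}

# Usage on the switch:
# fwverify | fwverify_check
```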



6 - [getfanspeed]

This hardware command displays the speed of the fans. The command also indicates if the fan is not present or has stopped.

The getfanspeed command is available from the /SYS/Switch_Diag and /SYS/Fabric_Mgmt Linux shell targets of the Oracle ILOM CLI interface.


[root@exahostibs01 ~]# getfanspeed
Fan 0 not present
Fan 1 running at rpm 9483
Fan 2 running at rpm 9483
Fan 3 running at rpm 9483
Fan 4 not present
[root@exahostibs01 ~]#
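To watch fan speeds from a script, the getfanspeed output can be filtered for slow fans. The following is a minimal shell sketch, assuming the line formats shown above. slow_fans is a hypothetical helper name. The 5000 RPM default is an arbitrary example threshold, not an Oracle-documented limit. "not present" lines are ignored because the env_test output above reports the FAN test as OK with two empty fan slots. Any line that is neither "not present" nor "running at rpm" is printed, which is meant to catch a stopped fan whatever its exact wording (an assumption).

```sh
# Hypothetical helper: print fans from getfanspeed output (stdin) that run
# below a minimum RPM (first argument, default 5000: an example value only),
# plus any line in an unexpected format.
slow_fans() {
  awk -v min="${1:-5000}" '
    /running at rpm/ && $NF + 0 < min          { print }
    !/not present/ && !/running at rpm/ && NF  { print }
  '
}

# Usage on the switch:
# getfanspeed | slow_fans 5000
```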


7 - [getmaster -l]

This hardware command returns information about the node that hosts the primary (or master) Subnet Manager of the InfiniBand fabric. The -l option provides a short historical list of Subnet Manager activity.

The getmaster command is available from the /SYS/Switch_Diag and /SYS/Fabric_Mgmt Linux shell targets of the Oracle ILOM CLI interface.


[root@exahostibs01 ~]# getmaster -l
Local SM enabled and running, state STANDBY
Last change in Master SubnetManager status detected at: Wed Oct  3 00:03:40 GMT 2018
Master SubnetManager on sm lid 2 sm guid <#####>: SUN DCS 36P QDR exahostibs01 xxx.xxx.xxx.xxx
Master SubnetManager Activity Count: 159722821 Priority: 14
[root@exahostibs01 ~]#
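When checking several switches, a one-line summary of getmaster output makes it easy to compare them. In an InfiniBand fabric only one Subnet Manager is master at a time, so every switch should name the same master. The following is a minimal shell sketch, assuming the line formats shown above: the local state is the last word of the "Local SM" line, and the master switch name follows "QDR". sm_summary is a hypothetical helper name.

```sh
# Hypothetical helper: summarise getmaster output (stdin) as
# "local=<state> master=<switch>".
sm_summary() {
  awk '
    /^Local SM/ { state = $NF }                     # e.g. STANDBY or MASTER
    /^Master SubnetManager on sm lid/ {
      for (i = 1; i < NF; i++) if ($i == "QDR") host = $(i + 1)
    }
    END { printf "local=%s master=%s\n", state, host }
  '
}

# Usage on each switch:
# getmaster -l | sm_summary
```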



8 - [listlinkup]

This hardware command lists the presence of links and the up/down state of the associated ports on the switch chip.

The listlinkup command is available from the /SYS/Switch_Diag and /SYS/Fabric_Mgmt Linux shell targets of the Oracle ILOM CLI interface.

[root@exahostibs01 ~]# listlinkup
Connector  0A Not present
Connector  1A Not present
Connector  2A Not present
Connector  3A Not present
Connector  4A Not present
Connector  5A Not present
Connector  6A Not present
Connector  7A Not present
Connector  8A Present <-> Switch Port 31 is up (Enabled)
Connector  9A Present <-> Switch Port 14 is up (Enabled)
Connector 10A Present <-> Switch Port 16 is up (Enabled)
Connector 11A Present <-> Switch Port 18 is up (Enabled)
Connector 12A Not present
Connector 13A Present <-> Switch Port 09 is up (Enabled)
Connector 14A Present <-> Switch Port 07 is up (Enabled)
Connector 15A Present <-> Switch Port 05 is up (Enabled)
Connector 16A Present <-> Switch Port 03 is up (Enabled)
Connector 17A Present <-> Switch Port 01 is up (Enabled)
Connector  0B Not present
Connector  1B Not present
Connector  2B Not present
Connector  3B Not present
Connector  4B Not present
Connector  5B Not present
Connector  6B Not present
Connector  7B Not present
Connector  8B Present <-> Switch Port 32 is up (Enabled)
Connector  9B Present <-> Switch Port 13 is up (Enabled)
Connector 10B Present <-> Switch Port 15 is up (Enabled)
Connector 11B Present <-> Switch Port 17 is up (Enabled)
Connector 12B Present <-> Switch Port 12 is up (Enabled)
Connector 13B Present <-> Switch Port 10 is up (Enabled)
Connector 14B Present <-> Switch Port 08 is up (Enabled)
Connector 15B Present <-> Switch Port 06 is up (Enabled)
Connector 16B Present <-> Switch Port 04 is up (Enabled)
Connector 17B Present <-> Switch Port 02 is up (Enabled)
[root@exahostibs01 ~]#
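With 36 connector lines, it is easy to miss a single cabled port that is down. The following is a minimal shell sketch that counts the present connectors whose port "is up" and prints any present connector whose line does not say "is up". It assumes the line format shown above. link_summary is a hypothetical helper name.

```sh
# Hypothetical helper: summarise listlinkup output (stdin).
link_summary() {
  awk '
    / Present / && / is up / { up++; next }         # cabled and up
    / Present /              { print "NOT UP: " $0 } # cabled but not up
    END { printf "links up: %d\n", up }
  '
}

# Usage on the switch:
# listlinkup | link_summary
```

For the listlinkup output above, it would print links up: 19 and no NOT UP lines.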



9 - [ibhosts]

This InfiniBand command is a script that discovers the InfiniBand fabric topology or uses the existing topology file to extract the channel adapter nodes.


[root@exahostdbadm01 ~]# ibhosts


10 - [showunhealthy]

This hardware command shows a list of switch components that appear to have a problem. Unlike the env_test command, the showunhealthy command only displays messages for components that have failed testing.

The showunhealthy command is available from the /SYS/Switch_Diag and /SYS/Fabric_Mgmt Linux shell targets of the Oracle ILOM CLI interface.


[root@exahostibs01 ~]# showunhealthy
OK - No unhealthy sensors
[root@exahostibs01 ~]#
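Because showunhealthy prints a single line when everything is fine, it is a convenient check to run against every switch in one go. The following is a minimal shell sketch. It assumes root ssh access to each switch from the host where it runs, and that showunhealthy can be run over a non-interactive ssh session. check_unhealthy is a hypothetical helper name.

```sh
# Hypothetical wrapper: run showunhealthy on each switch given as an
# argument and print the output of any switch that does not report
# "OK - No unhealthy sensors". Returns 1 if any switch needs attention.
check_unhealthy() {
  local sw out rc
  rc=0
  for sw in "$@"; do
    out=$(ssh "root@$sw" showunhealthy 2>&1)
    case $out in
      *"OK - No unhealthy sensors"*) ;;               # healthy switch
      *) printf '%s:\n%s\n' "$sw" "$out"; rc=1 ;;    # show what it reported
    esac
  done
  return "$rc"
}

# Usage (switch names as in the ibswitches output above):
# check_unhealthy exahostibs01 exahostiba01 exahostibb01
```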


11 - [ibclearcounters]

This InfiniBand command is a script that clears the Performance Manager agent port counters by either discovering the InfiniBand fabric topology or using an existing topology file. The counters are:

XmtData
RcvData

Syntax:
--------
ibclearcounters [-h][topology|-C ca_name][-P ca_port][-t timeout]

where:
topology is the topology file.
ca_name is the channel adapter name.
ca_port is the channel adapter port.
timeout is the timeout in milliseconds.


[root@exahostibs01 ~]# ibclearcounters

## Summary: 14 nodes cleared 0 errors
[root@exahostibs01 ~]#
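When clearing counters from a script, the summary line shows whether every node was cleared. The following is a minimal shell sketch, assuming the summary format shown above ("## Summary: <n> nodes cleared <m> errors"). counters_cleared is a hypothetical helper name. One way to use it is to clear the counters, let the fabric run for a while, and then check the error counters again, so that only new errors are reported.

```sh
# Hypothetical helper: succeed only if ibclearcounters output (stdin)
# contains a summary line reporting 0 errors.
counters_cleared() {
  awk '/^## Summary:/ { found = 1; errors = $(NF - 1) + 0 }   # number before "errors"
       END { exit ((found && errors == 0) ? 0 : 1) }'
}

# Usage:
# ibclearcounters | counters_cleared && echo "counters cleared"
```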


12 - [connector]

This hardware command performs a pass-fail test to verify that an InfiniBand cable is connected to a particular connector and to the switch chip port that the link routes. The command can also read the data registers of the cable and report FRU ID information.

[root@exahostdbadm01 ~]# connector
[root@exahostdbadm01 ~]#

Syntax:
--------
connector name present|portstate|info|dump [-h]

where name is the name of the connector (0A–17B).
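To test every connector rather than one at a time, you need the full list of connector names (0A–17B). The following is a minimal shell sketch. connector_names is a hypothetical helper that prints the 36 names in the same order listlinkup uses. The commented usage line applies the connector syntax above to each name.

```sh
# Hypothetical helper: print the 36 connector names 0A..17A, then 0B..17B.
connector_names() {
  local side i
  for side in A B; do
    i=0
    while [ "$i" -le 17 ]; do
      echo "$i$side"
      i=$((i + 1))
    done
  done
}

# Usage on the switch:
# for c in $(connector_names); do connector "$c" present; done
```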


13 - [ibqueryerrors.pl]

Use the ibqueryerrors.pl command to report switch port error counters and port configuration information:


[root@exahostibs01 ~]# ibqueryerrors.pl -rR -s RcvSwRelayErrors,XmtDiscards,XmtWait,VL15Dropped

The -s option suppresses the listed error counters, so RcvSwRelayErrors, XmtDiscards, XmtWait, and VL15Dropped errors are ignored when using the preceding command.


14 - [checkboot]

This hardware command checks the boot status of the switch chip. Output is a simplified OK.

The checkboot command is available from the /SYS/Switch_Diag and /SYS/Fabric_Mgmt Linux shell targets of the Oracle ILOM CLI interface.


[root@exahostibs01 ~]# checkboot
Switch OK
All Internal ibdevices OK


15 - [checkvoltages]

This hardware command displays the internal voltages for the main board. On the left side of the equals sign is the expected voltage. On the right side of the equals sign is the measured voltage. If the difference between the expected voltage and the measured voltage is more than 10%, the cause should be investigated. The command also provides a summary of the voltage conditions.


[root@exahostibs01 ~]# checkvoltages
Voltage ECB OK
Measured 3.3V Main = 3.28 V
Measured 3.3V Standby = 3.39 V
Measured 12V = 11.97 V
Measured 5V = 5.02 V
Measured VBAT = 3.09 V
Measured 2.5V = 2.47 V
Measured 1.8V = 1.78 V
Measured I4 1.2V = 1.22 V
All voltages OK
[root@exahostibs01 ~]#
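The 10% rule described above can be checked automatically. The following is a minimal shell sketch. It takes the expected voltage from the label (for example, 3.3V in "Measured 3.3V Main"), compares it with the measured value, and prints any rail that deviates by more than a tolerance (default 10%). Lines whose label has no expected value, such as VBAT, are skipped. voltage_outliers is a hypothetical helper name, and the parsing assumes the line format shown above.

```sh
# Hypothetical helper: print checkvoltages lines (stdin) whose measured
# value deviates from the expected value in the label by more than the
# tolerance in percent (first argument, default 10).
voltage_outliers() {
  awk -v tol="${1:-10}" -F' = ' '/^Measured/ {
    # expected voltage: first word of the label such as 3.3V, 12V or 1.2V
    nominal = ""
    n = split($1, words, " ")
    for (i = 2; i <= n; i++)
      if (words[i] ~ /^[0-9.]+V$/) {
        nominal = substr(words[i], 1, length(words[i]) - 1) + 0
        break
      }
    if (nominal == "") next      # e.g. VBAT has no expected value in its label
    split($2, meas, " ")         # meas[1]: measured value
    dev = (meas[1] - nominal) / nominal * 100
    if (dev < 0) dev = -dev
    if (dev > tol) printf "%s: %.1f%% off\n", $0, dev
  }'
}

# Usage on the switch:
# checkvoltages | voltage_outliers 10
```

For the checkvoltages output above, nothing is printed. The largest deviation is 3.3V Standby at about 2.7%.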


16 - [ibnetdiscover]

Exadata Storage Servers are discovered using ibnetdiscover instead of kfod. I have trimmed the output of this command because it is very long and detailed, listing every switch and channel adapter in the IB fabric along with their port connections.

The /opt/oracle.SupportTools/ibdiagtools/verify-topology utility is used to validate the InfiniBand cabling within your Exadata Database Machine. It uses ibnetdiscover, ibhosts, and ibswitches commands to conduct its tests.

[root@exahostdbadm01 ~]# ibnetdiscover
#
# Topology file: generated on Sat Mar 30 01:07:29 2019


Stay connected for further updates and new releases....!!

Twitter : @rajsoft8899
Linkedin : https://www.linkedin.com/in/raj-kumar-kushwaha-5a289219/

