Today I was playing with a 2-node Oracle RAC cluster when the 1st node suddenly went down. I could see the following entries in the CRS log file.
CRS log file:
========
2016-05-17 09:58:18.944: [ CSSCLNT][3038205648]clssscConnect: gipc request failed with 29 (0x16)
2016-05-17 09:58:18.944: [ CSSCLNT][3038205648]clsssInitNative: connect failed, rc 29
2016-05-17 09:58:18.944: [ CRSRTI][3038205648] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
2016-05-17 09:58:19.945: [ CSSCLNT][3038205648]clssscConnect: gipc request failed with 29 (0x16)
2016-05-17 09:58:19.945: [ CSSCLNT][3038205648]clsssInitNative: connect failed, rc 29
2016-05-17 09:58:19.945: [ CRSRTI][3038205648] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
From OCSSD log file:
==============
==> ocssd.log <==
2016-05-17 10:27:16.996: [ CSSD][2998332304]clssgmclientlsnr: listening on clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_)(GIPCID
=ef2b0e79-00000000-5893))
2016-05-17 10:27:16.996: [ GPnP][3038959296]clsgpnp_Init: [at clsgpnp0.c:404] gpnp tracelevel 3, component tracelevel 0
2016-05-17 10:27:16.996: [ GPnP][3038959296]clsgpnp_Init: [at clsgpnp0.c:534] '/u01/app/11.2.0/grid' in effect as GPnP home base.
2016-05-17 10:27:17.079: [GIPCCLSA][2998332304]gipcmodClsaCompleteAccept: failed on clsaauthstart ret clsaretOSD (8), endp 0x8b26a10 [0
00000000000002f] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_)(GIPCID=fa51a6a2-77bcb030-5893))', remot
eAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_)(GIPCID=77bcb030-fa51a6a2-5834))', numPend 5, numReady 0, numDone 0, numDead 0
, numTransfer 0, objFlags 0x16ca, pidPeer 0, flags 0x603710, usrFlags 0x14000 }
2016-05-17 10:27:17.079: [GIPCCLSA][2998332304]gipcmodClsaCompleteAccept: slos op : mkdir
2016-05-17 10:27:17.079: [GIPCCLSA][2998332304]gipcmodClsaCompleteAccept: slos dep : No space left on device (28)
2016-05-17 10:27:17.079: [GIPCCLSA][2998332304]gipcmodClsaCompleteAccept: slos loc : authprep6
2016-05-17 10:27:17.079: [GIPCCLSA][2998332304]gipcmodClsaCompleteAccept: slos info: failed to make dir /u01/app/11.2.0/grid/auth/css/
rac1/A2526853
2016-05-17 10:27:17.079: [GIPCXCPT][2998332304]gipcmodMuxTransferAccept: internal accept request failed endp 0x8b25c30 [000000000000001
b] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_)(GIPCID=ef2b0e79-00000000-5893))', remoteAddr '', numP
end 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x30613, usrFlags 0x10010 }, ret gipcretAuthFail
(22)
2016-05-17 10:27:17.079: [ GIPCMUX][2998332304]gipcmodMuxTransferAccept: EXCEPTION[ ret gipcretAuthFail (22) ] error during accept on
endp 0x8b25c30 [000000000000001b] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rac1_)(GIPCID=ef2b0e79-000000
00-5893
[root@rac1 cssd]#
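For reference, these messages can be followed live from the 11.2 Grid Infrastructure daemon logs under the Grid home. A minimal sketch, assuming the Grid home /u01/app/11.2.0/grid seen in the GPnP lines above and the node name rac1:
GRID_HOME=/u01/app/11.2.0/grid      # Grid home as reported in the GPnP messages above
NODE=rac1                           # local node name
# clusterware alert log plus the CSSD and CRSD daemon logs for this node
tail -f $GRID_HOME/log/$NODE/alert$NODE.log \
        $GRID_HOME/log/$NODE/cssd/ocssd.log \
        $GRID_HOME/log/$NODE/crsd/crsd.log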
Cause : The ocssd log file shows "No space left on device (28)", and when I checked the mount point /u01 I found it was 100% full.
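A quick way to confirm this and spot the biggest consumers (a rough sketch, assuming the same Grid home path as above) is:
df -h /u01                                            # confirm the filesystem is 100% full
du -xsk /u01/app/11.2.0/grid/log/* | sort -n | tail   # largest directories under the Grid home log area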
Resolution : I just released some space on the /u01 mount and rebooted node1; all cluster services came up automatically on the subsequent reboot, as expected.
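After the node came back, the stack status can be double-checked from the Grid home, for example (paths assume the same Grid home as above; the exact output varies by version):
/u01/app/11.2.0/grid/bin/crsctl check crs     # CRS, CSS and EVM daemons should all report online
/u01/app/11.2.0/grid/bin/crsctl stat res -t   # tabular view of all cluster resources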
Another cause and resolution:
====================
Sometimes, after maintenance tasks, when we try to bring up all CRS services we hit a similar kind of issue because stale lock/socket files still exist inside the /tmp/.oracle or /var/tmp/.oracle directory.
[root@rac1 tmp]# pwd
/tmp
We can see the following files inside the /var/tmp/.oracle directory; just log in as the root user, navigate to /tmp/.oracle or /var/tmp/.oracle, delete them all (rm -rf *), and reboot the server.
[root@rac1 tmp]# ls -ltr /var/tmp/.oracle/
total 0
prw-r--r-- 1 root root 0 Mar 13 15:37 npohasd
srwxrwxrwx 1 oragrid oinstall 0 Mar 16 01:28 s#6509.2
srwxrwxrwx 1 oragrid oinstall 0 Mar 16 01:28 s#6509.1
srwxrwxrwx 1 oragrid oinstall 0 May 12 08:33 s#6543.2
srwxrwxrwx 1 oragrid oinstall 0 May 12 08:33 s#6543.1
srwxrwxrwx 1 oragrid oinstall 0 May 12 08:33 s#6571.2
srwxrwxrwx 1 oragrid oinstall 0 May 12 08:33 s#6571.1
srwxrwxrwx 1 oragrid oinstall 0 May 12 08:41 s#6481.2
srwxrwxrwx 1 oragrid oinstall 0 May 12 08:41 s#6481.1
srwxrwxrwx 1 oragrid oinstall 0 May 12 08:48 s#7342.2
srwxrwxrwx 1 oragrid oinstall 0 May 12 08:48 s#7342.1
-rw-r--r-- 1 oragrid oinstall 0 May 12 10:08 ora_gipc_GPNPD_rac1_lock
srwxrwxrwx 1 oragrid oinstall 0 May 12 10:09 s#6486.2
srwxrwxrwx 1 oragrid oinstall 0 May 12 10:09 s#6486.1
srwxrwxrwx 1 oragrid oinstall 0 May 12 10:11 s#7133.2
srwxrwxrwx 1 oragrid oinstall 0 May 12 10:11 s#7133.1
srwxrwxrwx 1 oragrid oinstall 0 May 15 14:29 ora_gipc_GPNPD_rac1
srwxrwxrwx 1 root root 0 May 15 14:30 srac1DBG_CTSSD
srwxrwxrwx 1 oragrid oinstall 0 May 15 14:30 srac1DBG_EVMD
srwxrwxrwx 1 root root 0 May 15 14:30 sprocr_local_conn_0_PROC
srwxrwxrwx 1 oragrid oinstall 0 May 15 14:30 sSYSTEM.evm.acceptor.auth
srwxrwxrwx 1 oragrid oinstall 0 May 15 14:30 sCevm
srwxrwxrwx 1 oragrid oinstall 0 May 15 14:30 sAevm
srwxrwxrwx 1 root root 0 May 15 14:30 sCRSD_IPC_SOCKET_11
srwxrwxrwx 1 root root 0 May 15 14:30 sora_crsqs
srwxrwxrwx 1 root root 0 May 15 14:30 sCRSD_UI_SOCKET
srwxrwxrwx 1 oragrid oinstall 0 May 15 19:07 sLISTENER
srwxrwxrwx 1 oragrid oinstall 0 May 15 19:07 s#10780.2
srwxrwxrwx 1 oragrid oinstall 0 May 15 19:07 s#10780.1
-rw-r--r-- 1 oragrid oinstall 0 May 17 09:52 sOCSSD_LL_rac1__lock
srwxrwxrwx 1 root root 0 May 17 09:57 srac1DBG_CRSD
srwxrwxrwx 1 root root 0 May 17 10:10 srac1DBG_OHASD
srwxrwxrwx 1 root root 0 May 17 10:10 sprocr_local_conn_0_PROL
srwxrwxrwx 1 root root 0 May 17 10:10 sOHASD_UI_SOCKET
srwxrwxrwx 1 root root 0 May 17 10:10 sOHASD_IPC_SOCKET_11
srwxrwxrwx 1 oragrid oinstall 0 May 17 10:10 srac1DBG_MDNSD
srwxrwxrwx 1 oragrid oinstall 0 May 17 10:10 srac1DBG_GIPCD
srwxrwxrwx 1 oragrid oinstall 0 May 17 10:12 srac1DBG_GPNPD
srwxrwx--- 1 oragrid oinstall 0 May 17 10:14 master_diskmon
srwxrwxrwx 1 oragrid oinstall 0 May 17 10:14 srac1DBG_CSSD
srwxrwxrwx 1 oragrid oinstall 0 May 17 10:14 sOCSSD_LL_rac1_
[root@rac1 tmp]# cd /var/tmp/.oracle/
[root@rac1 .oracle]# pwd
/var/tmp/.oracle
[root@rac1 .oracle]# rm -rf *
[root@rac1 .oracle]# ls -ltr
total 0
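In my case the node was rebooted afterwards anyway. If you would rather not remove these files while anything from the stack is still running, a safer sequence is roughly the following (a sketch, assuming the same Grid home path as above):
/u01/app/11.2.0/grid/bin/crsctl stop crs -f   # force-stop the clusterware stack on this node
rm -rf /tmp/.oracle/* /var/tmp/.oracle/*      # remove the stale socket/lock files
/u01/app/11.2.0/grid/bin/crsctl start crs     # restart the stack (or simply reboot the server)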
Note : In my case the underlying mount /u01 hosting the GRID_HOME was exhausted, and releasing the space resolved my issue.