
Tuesday, August 30, 2016

SQL Tuning - 100% CPU Utilization






SQL Using 100% CPU
-----------------------------

After a weekend maintenance activity in production (the database was bounced), the EBS application started behaving abnormally: CPU utilisation on the database server stayed pinned at 100%, and the production system was effectively unusable for end users, who could not submit or process their requests.

I pulled the AWR report for the problematic window and found TX (row lock) contention among the Top 5 wait events, along with cache buffers chains (CBC) latch waits. We cleared the CBC latch contention first, but that did not help.

Investigating further, I found a single SELECT statement (based on a view) sitting at the top of the AWR "SQL ordered by CPU Time", "SQL ordered by Elapsed Time" and "SQL ordered by Executions" sections.
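
Outside the AWR report, a quick way to list the current top CPU consumers from the cursor cache is a query along these lines (a sketch, not the exact query used at the time):

SELECT *
FROM (SELECT sql_id,
             plan_hash_value,
             executions,
             ROUND(cpu_time / 1e6, 1)     AS cpu_seconds,
             ROUND(elapsed_time / 1e6, 1) AS elapsed_seconds
      FROM   v$sql
      ORDER  BY cpu_time DESC)
WHERE ROWNUM <= 10;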


The same SQL normally performed fine, but it had suddenly started hogging the entire database server's CPU. It turned out that the execution plan for the statement had changed after the database was bounced during the maintenance activity.
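
To confirm a plan flip quickly, you can compare the plan currently in the cursor cache with the plans recorded in AWR (a sketch; &sql_id is the statement identified above):

-- Plan currently in the library cache
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'BASIC +NOTE'));

-- All plans captured for the same SQL_ID in AWR
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_AWR('&sql_id'));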



Use the SQL below to find the plan history of a particular SQL ID from AWR.
------------------------------------------------------------------------------------


set pagesize 1000
set linesize 200
column begin_interval_time format a20
column milliseconds_per_execution format 999999990.999
column rows_per_execution format 999999990.9
column buffer_gets_per_execution format 999999990.9
column disk_reads_per_execution format 999999990.9
break on begin_interval_time skip 1

SELECT
  to_char(s.begin_interval_time,'mm/dd hh24:mi')
    AS begin_interval_time,
  ss.plan_hash_value,
  ss.executions_delta,
  CASE
    WHEN ss.executions_delta > 0
    THEN ss.elapsed_time_delta/ss.executions_delta/1000
    ELSE ss.elapsed_time_delta
  END AS milliseconds_per_execution,
  CASE
    WHEN ss.executions_delta > 0
    THEN ss.rows_processed_delta/ss.executions_delta
    ELSE ss.rows_processed_delta
  END AS rows_per_execution,
  CASE
    WHEN ss.executions_delta > 0
    THEN ss.buffer_gets_delta/ss.executions_delta
    ELSE ss.buffer_gets_delta
  END AS buffer_gets_per_execution,
  CASE
    WHEN ss.executions_delta > 0
    THEN ss.disk_reads_delta/ss.executions_delta
    ELSE ss.disk_reads_delta
  END AS disk_reads_per_execution
FROM wrh$_sqlstat ss
INNER JOIN wrm$_snapshot s ON s.snap_id = ss.snap_id
WHERE ss.sql_id = '&sql_id'
AND ss.buffer_gets_delta > 0
ORDER BY s.snap_id, ss.plan_hash_value;
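
Note: the query prompts for &sql_id. ELAPSED_TIME_DELTA is stored in microseconds, hence the division by 1000 to report milliseconds per execution. The same data is also exposed through the DBA_HIST_SQLSTAT and DBA_HIST_SNAPSHOT views if you prefer not to query the WRH$/WRM$ base tables directly.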





Below is the OLD plan (plan_hash_value 3139549555), which had been performing fine.
------------------------------------------------

BEGIN_INTERVAL_TIME  PLAN_HASH_VALUE EXECUTIONS_DELTA MILLISECONDS_PER_EXECUTION ROWS_PER_EXECUTION BUFFER_GETS_PER_EXECUTION DISK_READS_PER_EXECUTION
-------------------- --------------- ---------------- -------------------------- ------------------ ------------------------- ------------------------
08/14 05:15               3139549555                6                    758.744              345.0                   47354.0                      0.0
08/14 05:30               3139549555                6                    762.433              345.0                   47350.0                      0.0
08/14 05:45               3139549555               10                    763.121              345.0                   47350.0                      0.0
08/14 09:00               3139549555                4                   3081.986              348.0                  103319.3                   2141.3
08/14 10:30               3139549555                2                   1821.184              370.0                  126928.5                      2.5
08/14 11:15               3139549555                2                   1926.320              372.0                  135198.0                      0.0




Below is the BAD plan (plan_hash_value 4140799271) that the optimiser chose after the maintenance activity. Note the dramatic increase in time per execution.
----------------------------------------------------------------------------------------

BEGIN_INTERVAL_TIME  PLAN_HASH_VALUE EXECUTIONS_DELTA MILLISECONDS_PER_EXECUTION ROWS_PER_EXECUTION BUFFER_GETS_PER_EXECUTION DISK_READS_PER_EXECUTION
-------------------- --------------- ---------------- -------------------------- ------------------ ------------------------- ------------------------
08/15 05:15               4140799271                1                 215288.221                1.0                13049295.0                   8088.0
08/15 21:00               4140799271                1                 292584.136                0.0                16131485.0                      8.0
08/15 21:15               4140799271                0              883156121.000                0.0                48242380.0                      0.0
08/15 21:30               4140799271                0              892885844.000                0.0                48965395.0                      0.0
08/15 21:45               4140799271                0              887075745.000                0.0                48552866.0                      0.0
08/15 22:01               4140799271                0              830391270.000                1.0                45468887.0                      0.0
08/15 22:15               4140799271                0              890962782.000                0.0                49497369.0                      0.0
08/15 22:30               4140799271                0              890992388.000                0.0                48430711.0                      0.0





Investigating further, I found that the base tables of the underlying view had STALE statistics, because of which the optimizer could not choose a better execution plan for the statement.
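
To see which base tables of the view carry stale statistics, a query like the one below can be used (the &table_owner substitution is a placeholder; STALE_STATS relies on table monitoring, which is enabled by default from 10g onwards):

SELECT owner, table_name, last_analyzed, stale_stats
FROM   dba_tab_statistics
WHERE  owner = UPPER('&table_owner')
AND    stale_stats = 'YES'
ORDER  BY owner, table_name;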

We gathered statistics on those underlying tables, and immediately afterwards the optimizer picked a better execution plan; CPU utilization on the database server dropped back to normal.
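
For reference, a minimal DBMS_STATS call of the kind we ran is shown below; the owner and table names are placeholders for the actual base tables of the view:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'APPS',            -- placeholder schema owner
    tabname          => 'BASE_TABLE_1',    -- placeholder base table of the view
    cascade          => TRUE,              -- gather index statistics as well
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    degree           => 4);
END;
/

As the next snapshots show, the statement switched to a new plan (plan_hash_value 576393603) and settled back at roughly 1.5 seconds per execution.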


BEGIN_INTERVAL_TIME  PLAN_HASH_VALUE EXECUTIONS_DELTA MILLISECONDS_PER_EXECUTION ROWS_PER_EXECUTION BUFFER_GETS_PER_EXECUTION DISK_READS_PER_EXECUTION
-------------------- --------------- ---------------- -------------------------- ------------------ ------------------------- ------------------------
08/17 10:31                576393603                8                   1476.348              366.0                   82579.3                      0.0
08/17 10:45                576393603               16                   1530.348              366.0                   86192.0                      0.0




Sunday, August 14, 2016

CRS-4046: Invalid Oracle Clusterware configuration



Received the error below while running the root.sh script for Oracle 11g R2 (11.2.0.4) Grid Infrastructure on node one of a two-node RAC configuration. It was caused by leftover processes and configuration from a previous cluster install.


[root@RAC1 Clusterware]# /u01/app/11.2.0/grid/root.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The file "dbhome" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "oraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:
The file "coraenv" already exists in /usr/local/bin.  Overwrite it? (y/n)
[n]:

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.

Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
CRS-4046: Invalid Oracle Clusterware configuration.
CRS-4000: Command Create failed, or completed with errors.
Failure initializing entries in /etc/oracle/scls_scr/rac1
/u01/app/11.2.0/grid/perl/bin/perl -I/u01/app/11.2.0/grid/perl/lib -I/u01/app/11.2.0/grid/crs/install /u01/app/11.2.0/grid/crs/install/rootcrs.pl execution failed




The steps illustrated below resolved the issue.


Clean up the leftover cluster configuration and processes by executing the command below (as root, from the <GRID_HOME>/crs/install directory) on the node where the root.sh script failed.


[root@RAC1 install]# ./rootcrs.pl -deconfig -force -verbose
Using configuration parameter file: ./crsconfig_params
****Unable to retrieve Oracle Clusterware home.
Start Oracle Clusterware stack and try again.
CRS-4046: Invalid Oracle Clusterware configuration.
CRS-4000: Command Stop failed, or completed with errors.
################################################################
# You must kill processes or reboot the system to properly #
# cleanup the processes started by Oracle clusterware          #
################################################################
Removing Trace File Analyzer
Failure in execution (rc=-1, 256, No such file or directory) for command /etc/init.d/ohasd deinstall
error: package cvuqdisk is not installed
Successfully deconfigured Oracle clusterware stack on this node




=>> Once the Oracle Clusterware de-configuration above completes successfully, reboot the server (to clear the leftover Clusterware processes) and then re-run root.sh.



[root@RAC1 grid]# ./root.sh
Performing root user operation for Oracle 11g

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/app/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
User ignored Prerequisites during installation
Installing Trace File Analyzer
OLR initialization - successful
  root wallet
  root wallet cert
  root cert export
  peer wallet
  profile reader wallet
  pa wallet
  peer wallet keys
  pa wallet keys
  peer cert request
  pa cert request
  peer cert
  pa cert
.
.
.
.
Adding Clusterware entries to upstart
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac1'
CRS-2676: Start of 'ora.mdnsd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac1'
CRS-2676: Start of 'ora.gpnpd' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac1'
CRS-2676: Start of 'ora.gipcd' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssdmonitor' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac1'
CRS-2676: Start of 'ora.diskmon' on 'rac1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac1' succeeded

ASM created and started successfully.

Disk Group VOTE_DATA created successfully.

clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4256: Updating the profile
Successful addition of voting disk ae113dc9ff954f7bbf65bccce936df15.
Successful addition of voting disk 7d464ed047e64fb8bf349015e04dc69f.
Successful addition of voting disk 8724064160f74fe7bf8f85bf258d13e9.
Successfully replaced voting disk group with +VOTE_DATA.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   ae113dc9ff954f7bbf65bccce936df15 (/dev/oracleasm/disks/DISK1) [VOTE_DATA]
 2. ONLINE   7d464ed047e64fb8bf349015e04dc69f (/dev/oracleasm/disks/DISK2) [VOTE_DATA]
 3. ONLINE   8724064160f74fe7bf8f85bf258d13e9 (/dev/oracleasm/disks/DISK3) [VOTE_DATA]
Located 3 voting disk(s).

CRS-2672: Attempting to start 'ora.asm' on 'rac1'
CRS-2676: Start of 'ora.asm' on 'rac1' succeeded
CRS-2672: Attempting to start 'ora.VOTE_DATA.dg' on 'rac1'
CRS-2676: Start of 'ora.VOTE_DATA.dg' on 'rac1' succeeded
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
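
Once root.sh completes, it is worth confirming that the stack is healthy on this node before repeating the steps on node two; the standard checks are (run as root from the Grid home):

[root@RAC1 grid]# /u01/app/11.2.0/grid/bin/crsctl check crs
[root@RAC1 grid]# /u01/app/11.2.0/grid/bin/crsctl stat res -t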



Hope this helps someone....!!