After a weekend maintenance production activity(DB was bounced), EBS application started behaving abnormally, CPU utilisation was continuously being utilised 100%, whole production system was likely to be down for end users as they were not able to submit and process their requests.
I wondered around and get the AWR report for the problematic duration and found that TX – contention lock was there in the TOP 5 wait events and CBC latches as well. We cleared CBC latches but no help with that.
Investigated further and found a single SQL SELECT Statement (based on a view) was consuming TOP CPU order by CPU elapsed time and ordered by executions.
Same SQL was working fine usually but suddenly it started hogging whole Database server CPU. I found that Execution plan for that SQL statement was changed since DB server was bounced after maintenance activity.
Please use below SQL to find the SQL Plan History of a particular SQL ID.
------------------------------------------------------------------------------------
set pagesize 1000
set linesize 200
column begin_interval_time format a20
column milliseconds_per_execution format 999999990.999
column rows_per_execution format 999999990.9
column buffer_gets_per_execution format 999999990.9
column disk_reads_per_execution format 999999990.9
break on begin_interval_time skip 1
SELECT
to_char(s.begin_interval_time,'mm/dd hh24:mi')
AS begin_interval_time,
ss.plan_hash_value,
ss.executions_delta,
CASE
WHEN ss.executions_delta > 0
THEN ss.elapsed_time_delta/ss.executions_delta/1000
ELSE ss.elapsed_time_delta
END AS milliseconds_per_execution,
CASE
WHEN ss.executions_delta > 0
THEN ss.rows_processed_delta/ss.executions_delta
ELSE ss.rows_processed_delta
END AS rows_per_execution,
CASE
WHEN ss.executions_delta > 0
THEN ss.buffer_gets_delta/ss.executions_delta
ELSE ss.buffer_gets_delta
END AS buffer_gets_per_execution,
CASE
WHEN ss.executions_delta > 0
THEN ss.disk_reads_delta/ss.executions_delta
ELSE ss.disk_reads_delta
END AS disk_reads_per_execution
FROM wrh$_sqlstat ss
INNER JOIN wrm$_snapshot s ON s.snap_id = ss.snap_id
WHERE ss.sql_id = '&sql_id'
AND ss.buffer_gets_delta > 0
ORDER BY s.snap_id, ss.plan_hash_value;
Please use below SQL to find the SQL Plan History of a particular SQL ID.
------------------------------------------------------------------------------------
set pagesize 1000
set linesize 200
column begin_interval_time format a20
column milliseconds_per_execution format 999999990.999
column rows_per_execution format 999999990.9
column buffer_gets_per_execution format 999999990.9
column disk_reads_per_execution format 999999990.9
break on begin_interval_time skip 1
SELECT
to_char(s.begin_interval_time,'mm/dd hh24:mi')
AS begin_interval_time,
ss.plan_hash_value,
ss.executions_delta,
CASE
WHEN ss.executions_delta > 0
THEN ss.elapsed_time_delta/ss.executions_delta/1000
ELSE ss.elapsed_time_delta
END AS milliseconds_per_execution,
CASE
WHEN ss.executions_delta > 0
THEN ss.rows_processed_delta/ss.executions_delta
ELSE ss.rows_processed_delta
END AS rows_per_execution,
CASE
WHEN ss.executions_delta > 0
THEN ss.buffer_gets_delta/ss.executions_delta
ELSE ss.buffer_gets_delta
END AS buffer_gets_per_execution,
CASE
WHEN ss.executions_delta > 0
THEN ss.disk_reads_delta/ss.executions_delta
ELSE ss.disk_reads_delta
END AS disk_reads_per_execution
FROM wrh$_sqlstat ss
INNER JOIN wrm$_snapshot s ON s.snap_id = ss.snap_id
WHERE ss.sql_id = '&sql_id'
AND ss.buffer_gets_delta > 0
ORDER BY s.snap_id, ss.plan_hash_value;
BEGIN_INTERVAL_TIME PLAN_HASH_VALUE EXECUTIONS_DELTA MILLISECONDS_PER_EXECUTION ROWS_PER_EXECUTION BUFFER_GETS_PER_EXECUTION DISK_READS_PER_EXECUTION
-------------------- --------------- ---------------- -------------------------- ------------------ ------------------------- ------------------------
08/14 05:15 3139549555 6 758.744 345.0 47354.0 0.0
08/14 05:30 3139549555 6 762.433 345.0 47350.0 0.0
08/14 05:45 3139549555 10 763.121 345.0 47350.0 0.0
08/14 09:00 3139549555 4 3081.986 348.0 103319.3 2141.3
08/14 10:30 3139549555 2 1821.184 370.0 126928.5 2.5
08/14 11:15 3139549555 2 1926.320 372.0 135198.0 0.0
Below is the BAD SQL Plan which optimiser chooses after maintenance activity. You can see dramatic increase in time per execution for the SQL Statement.
----------------------------------------------------------------------------------------
BEGIN_INTERVAL_TIME PLAN_HASH_VALUE EXECUTIONS_DELTA MILLISECONDS_PER_EXECUTION ROWS_PER_EXECUTION BUFFER_GETS_PER_EXECUTION DISK_READS_PER_EXECUTION
-------------------- --------------- ---------------- -------------------------- ------------------ ------------------------- ------------------------
08/15 05:15 4140799271 1 215288.221 1.0 13049295.0 8088.0
08/15 21:00 4140799271 1 292584.136 0.0 16131485.0 8.0
08/15 21:15 4140799271 0 883156121.000 0.0 48242380.0 0.0
08/15 21:30 4140799271 0 892885844.000 0.0 48965395.0 0.0
08/15 21:45 4140799271 0 887075745.000 0.0 48552866.0 0.0
08/15 22:01 4140799271 0 830391270.000 1.0 45468887.0 0.0
08/15 22:15 4140799271 0 890962782.000 0.0 49497369.0 0.0
08/15 22:30 4140799271 0 890992388.000 0.0 48430711.0 0.0
We gathered table statistics for those underline tables and immediately after that Oracle Optimizer chosen better execution plan and whole CPU utilization on database sever dragged down to normal.
BEGIN_INTERVAL_TIME PLAN_HASH_VALUE EXECUTIONS_DELTA MILLISECONDS_PER_EXECUTION ROWS_PER_EXECUTION BUFFER_GETS_PER_EXECUTION DISK_READS_PER_EXECUTION
-------------------- --------------- ---------------- -------------------------- ------------------ ------------------------- ------------------------
08/17 10:31 576393603 8 1476.348 366.0 82579.3 0.0
08/17 10:45 576393603 16 1530.348 366.0 86192.0 0.0