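Resolving CPU exhaustion caused by heavy latch free waits on a production database
Around noon, the CPU usage of one of our production databases stayed persistently high.
We used the following SQL statement to look at what the active sessions were waiting on and contending for at that moment: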
select s.SID,
       s.SERIAL#,
       'kill -9 ' || p.SPID,   -- OS command to kill the server process if needed
       s.MACHINE,
       s.OSUSER,
       s.PROGRAM,
       s.USERNAME,
       s.last_call_et,         -- seconds since the session became active
       a.SQL_ID,
       s.LOGON_TIME,
       a.SQL_TEXT,
       a.SQL_FULLTEXT,
       w.EVENT,                -- what the session is currently waiting on
       a.DISK_READS,
       a.BUFFER_GETS
  from v$process p, v$session s, v$sqlarea a, v$session_wait w
 where p.ADDR = s.PADDR
   and s.SQL_ID = a.sql_id
   and s.sid = w.SID
   and s.STATUS = 'ACTIVE'
 order by s.last_call_et desc;
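The EVENT column shows that latch contention is the cause.
The following SQL shows which latch is involved:
select * from v$session_wait where event = 'latch free';
For the latch free event, P2 is the latch number (P1 is the latch address and P3 the number of tries). Looking P2 up in the v$latchname view gives the name of the latch: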
1:45:55 PM SQL> select * from v$latchname where latch#=164;

    LATCH# NAME                       HASH
---------- -------------------- ----------
       164 simulator hash latch 2233208730
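The two lookups above can be combined so that the latch name appears next to each waiting session. This is a minimal sketch that assumes only the standard v$session_wait and v$latchname columns; the column aliases are illustrative.

```sql
-- Resolve the latch name for every session currently waiting on 'latch free'.
-- For this event: P1 = latch address, P2 = latch number, P3 = tries.
SELECT w.sid,
       w.p1raw AS latch_addr,
       w.p2    AS latch_no,
       n.name  AS latch_name,
       w.p3    AS tries
  FROM v$session_wait w
  JOIN v$latchname n
    ON n.latch# = w.p2
 WHERE w.event = 'latch free'
 ORDER BY n.name, w.sid;
```

If most rows show the same latch_name, that latch is the one to investigate.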
Next, check the cumulative latch statistics (v$latch counts since instance startup):
2:11:59 PM SQL> select name,gets,misses,sleeps from v$latch where sleeps >0 order by sleeps desc;

NAME                                   GETS     MISSES     SLEEPS
-------------------------------- ---------- ---------- ----------
simulator hash latch             4827860212  135426899   10890947
cache buffers chains             1619822817 2850976006    4747728
gc element                       4660052091   25748270     175073
resmgr:schema config               91872524     153968      95708
ges resource hash list            174151449    1070556      55459
Real-time plan statistics latch    40953155     651496      44527
call allocation                     3301878     265908      43501
row cache objects                 336300485    4970324      19366
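v$latch aggregates all child latches under one name. To see whether the sleeps are concentrated on a few children, which would suggest a small set of hot buffers, v$latch_children can be queried in the same way. This is a sketch only; the limit of 10 rows is an arbitrary choice.

```sql
-- Top child latches of 'simulator hash latch' by sleeps.
SELECT *
  FROM (SELECT addr,
               child#,
               gets,
               misses,
               sleeps
          FROM v$latch_children
         WHERE name = 'simulator hash latch'
         ORDER BY sleeps DESC)
 WHERE ROWNUM <= 10;
```

The same query with name = 'cache buffers chains' shows the hottest cache buffers chains children.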
The simulator hash latch clearly accounts for the largest share of sleeps.
Eygle has an article on his website that discusses these simulator latches:
http://www.eygle.com/archives/2011/11/simulator_lru_latch.html
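"Simulator" means simulation. When Oracle processes data blocks in memory, it also records related information, such as the DBA (data block address), in a pre-allocated buffer. After a block has been aged out, if a later read requests data that is still present in the simulator memory, Oracle concludes that keeping the block cached would have been worthwhile. By monitoring and simulating these operations and weighting the results, Oracle can produce memory sizing advice.
The simulation itself is also protected by latches, including simulator lru latch and simulator hash latch.
For the buffer cache, if contention on these latches is severe, consider turning off db_cache_advice to remove the performance impact of this internal bookkeeping.
The following is a related bug, in which enabling DB_CACHE_ADVICE caused severe simulator lru latch contention: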
Bug 5918642 Heavy latch contention with DB_CACHE_ADVICE on
This note gives a brief overview of bug 5918642.
The content was last updated on: 01-APR-2008
Affects:
  Product (Component): Oracle Server (Rdbms)
  Range of versions believed to be affected: Versions < 11.2
  Versions confirmed as being affected:
  Platforms affected: Generic (all / most platforms affected)
Fixed:
This issue is fixed in
Symptoms:
Related To:
- Performance Monitoring
- DB_CACHE_ADVICE
Description
High simulator lru latch contention can occur when db_cache_advice is set to ON if there is a large buffer cache.
Workaround: Set db_cache_advice to OFF
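If the buffer cache advisory is not needed, the workaround can be applied online. The sketch below assumes the instance uses an spfile and that the session has the ALTER SYSTEM privilege. SCOPE = BOTH is my choice here, not something the bug note specifies.

```sql
-- Current setting: OFF, READY or ON.
SELECT name, value
  FROM v$parameter
 WHERE name = 'db_cache_advice';

-- Turn the buffer cache advisory off, now and across restarts.
ALTER SYSTEM SET db_cache_advice = OFF SCOPE = BOTH;

-- Re-run a little later and compare: sleeps should stop climbing quickly.
SELECT name, gets, misses, sleeps
  FROM v$latch
 WHERE name LIKE 'simulator%';
```

With the advisory off, v$db_cache_advice no longer provides sizing estimates.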
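Of course, this treats the symptom rather than the cause. The latch contention is only what shows on the surface; the root cause is a problematic SQL statement. The second latch in the list above, cache buffers chains, points the same way.
When a data block is read into the SGA, its buffer header is placed on a linked list (hash chain) attached to a hash bucket. This memory structure is protected by a set of cache buffers chains child latches (also known as hash latches or CBC latches). Any select, update, insert or delete against a block in the buffer cache must first get the corresponding cache buffers chains child latch, to guarantee exclusive access to the chain. If contention occurs along the way, the session waits on the latch: cache buffers chains event.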
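Causes:
1. Inefficient SQL statements (mainly seen as excessive logical reads). In some environments the application opens many concurrent sessions that all run the same inefficient SQL against the same data set. Statements with high BUFFER_GETS (logical reads) per execution are the main culprit. Conversely, fewer logical reads mean fewer latch gets, which reduces latch contention and improves performance. Look for statements in v$sql with a large BUFFER_GETS/EXECUTIONS ratio.
2. Hot blocks. A hot block arises when many sessions repeatedly access one or more blocks protected by the same cache buffers chains child latch. The wait event appears when those sessions contend for that child latch. Even a well-tuned SQL statement can produce hot blocks if many sessions run it at the same time and it touches only a small, specific set of blocks.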
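The first cause can be checked directly in v$sql. A minimal sketch: the limit of 20 rows and the 80-character text cut-off are arbitrary choices.

```sql
-- Statements with the most logical reads per execution.
SELECT *
  FROM (SELECT sql_id,
               executions,
               buffer_gets,
               ROUND(buffer_gets / executions) AS gets_per_exec,
               SUBSTR(sql_text, 1, 80)         AS sql_text
          FROM v$sql
         WHERE executions > 0          -- avoid division by zero
         ORDER BY buffer_gets / executions DESC)
 WHERE ROWNUM <= 20;
```

A statement that ranks high here and also appears in many active sessions at once is a likely source of cache buffers chains contention.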
SELECT P935.SEQUENCEID, null FA_SEQUENCEID, P935.ORDERID, P935.ORGORDERID,
       P935.PRODUCTNAME, P935.PRODUCTNUM, P935.ORDERTIME, P935.LASTUPDATETIME,
       P935.ORDERSTATUS, P935.MEMO, 935 orderCode,
       P935.PAYERACCTCODE, P935.PAYERACCTTYPE,
       P935.PAYEEACCTCODE PLATACCTCODE, P935.PAYEEACCTTYPE PLATACCTTYPE,
       P936.PAYEEACCTCODE, P936.PAYEEACCTTYPE,
       EXT935.PAYER_DISPLAYNAME, EXT935.PAYER_NAME, EXT935.PAYER_IDC, EXT935.PAYER_MEMBERTYPE,
       EXT936.PAYER_DISPLAYNAME PLAT_DISPLAYNAME, EXT936.SUBMITNAME PLAT_NAME,
       EXT936.PAYER_IDC PLAT_IDC, EXT936.PAYER_MEMBERTYPE PLAT_MEMBERTYPE,
       EXT936.PAYEE_DISPLAYNAME, EXT936.PAYEE_NAME, EXT936.PAYEE_IDC, EXT936.PAYEE_MEMBERTYPE,
       P935.PAYEEDISPLAYNAME WEBSITENAME,
       CASE WHEN (SELECT count(*)
                    FROM PAYMENTORDER P936
                   WHERE P936.Ordercode = 936
                     and P936.Orderstatus = 0
                     -- highlighted in red in the original post
                     AND P936.Relatedsequenceid = P935.SEQUENCEID) > 0
            THEN 0 ELSE 1 END AS SHARINGRESULT,
       CASE D935.Dealcode WHEN 210 then 14 else D935.DEALTYPE end PAYMETHOD,
       D935.DEALAMOUNT, G935.EXT1, G935.Ext2, G935.PAYERCONTACTTYPE, G935.PAYERCONTACT,
       NVL(D935.PAYEEFEE, 0) PAYEEFEE, NVL(D935.PAYERFEE, 0) PAYERFEE,
       nvl(MS936.PAYEEFEE, 0) PLATFORMFEE, P935.VERSION
  FROM PAYMENTORDER P935, PAYMENTORDER P936, DEAL D935, GATEWAYORDER G935,
       MSGATEWAYSHARINGORDER MS936, PAYMENTORDEREXT EXT935, PAYMENTORDEREXT EXT936
 WHERE P936.ORDERCODE = 936
   AND P935.ORDERCODE = 935
   AND P936.RELATEDSEQUENCEID = to_char(P935.SEQUENCEID)
   AND P935.SEQUENCEID = G935.SEQUENCEID(+)
   AND P935.SEQUENCEID = D935.ORDERSEQID(+)
   AND P935.SEQUENCEID = EXT935.ORDERSEQID(+)
   AND P936.SEQUENCEID = EXT936.ORDERSEQID(+)
   AND P936.SEQUENCEID = MS936.SEQUENCEID(+)
   AND MS936.SHARINGTYPE = 1
   AND P935.SEQUENCEID = :1
UNION
SELECT P938.SEQUENCEID, P935.SEQUENCEID FA_SEQUENCEID, P938.ORDERID, P938.ORGORDERID,
       P935.PRODUCTNAME, P935.PRODUCTNUM, P938.ORDERTIME, P938.LASTUPDATETIME,
       P938.ORDERSTATUS, P938.MEMO, 938 orderCode,
       P938.PAYERACCTCODE, P938.PAYERACCTTYPE,
       P938.PAYEEACCTCODE PLATACCTCODE, P938.PAYEEACCTTYPE PLATACCTTYPE,
       P938.PAYEEACCTCODE, P938.PAYEEACCTTYPE,
       EXT938.PAYER_DISPLAYNAME, EXT938.PAYER_NAME, EXT938.PAYER_IDC, EXT938.PAYER_MEMBERTYPE,
       EXT938.PAYEE_DISPLAYNAME PLAT_DISPLAYNAME, EXT938.SUBMITNAME PLAT_NAME,
       EXT938.PAYEE_IDC PLAT_IDC, EXT938.PAYEE_MEMBERTYPE PLAT_MEMBERTYPE,
       EXT938.PAYEE_DISPLAYNAME, EXT938.PAYEE_NAME, EXT938.PAYEE_IDC, EXT938.PAYEE_MEMBERTYPE,
       P935.PAYEEDISPLAYNAME WEBSITENAME,
       null SHARINGRESULT,
       D938.DEALTYPE PAYMETHOD,
       D938.DEALAMOUNT, G935.EXT1, G935.Ext2, G935.PAYERCONTACTTYPE, G935.PAYERCONTACT,
       NVL(D938.PAYEEFEE, 0) PAYEEFEE, NVL(D938.PAYERFEE, 0) PAYERFEE,
       0 PLATFORMFEE, P935.VERSION
  FROM PAYMENTORDER P935, PAYMENTORDER P938, DEAL D938, GATEWAYORDER G935,
       PAYMENTORDEREXT EXT938
 WHERE P935.ORDERCODE = 935
   AND P938.ORDERCODE = 938
   AND P938.RELATEDSEQUENCEID = to_char(P935.SEQUENCEID)
   AND P935.SEQUENCEID = G935.SEQUENCEID(+)
   AND P938.SEQUENCEID = D938.ORDERSEQID(+)
   AND P938.SEQUENCEID = EXT938.ORDERSEQID(+)
   AND P935.SEQUENCEID = :2
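Looking at the SQL above, the problem is the predicate P936.Relatedsequenceid = P935.SEQUENCEID inside the SHARINGRESULT subquery (highlighted in red in the original post). The left side of the equals sign is a VARCHAR2 column and the right side is a NUMBER, so Oracle performs an implicit data type conversion. When a character value is compared with a number, Oracle converts the character side, so the predicate is effectively TO_NUMBER(P936.Relatedsequenceid) = P935.SEQUENCEID. That prevents a regular index on RELATEDSEQUENCEID from being used and has a severe performance impact.
We contacted the development team, they modified the SQL statement, and the problem was resolved.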
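The post does not show the modified statement. One plausible fix, sketched below, converts the NUMBER side explicitly so that the VARCHAR2 column is compared as-is, which is what the main join of the same query already does with to_char. The query is cut down to the affected subquery for illustration and is not the developers' actual change.

```sql
-- Reduced form of the first UNION branch, keeping only the affected subquery.
-- TO_CHAR on the NUMBER side leaves the VARCHAR2 column RELATEDSEQUENCEID
-- untouched, so no implicit TO_NUMBER is applied to it.
SELECT P935.SEQUENCEID,
       CASE WHEN (SELECT COUNT(*)
                    FROM PAYMENTORDER P936
                   WHERE P936.ORDERCODE = 936
                     AND P936.ORDERSTATUS = 0
                     AND P936.RELATEDSEQUENCEID = TO_CHAR(P935.SEQUENCEID)) > 0
            THEN 0 ELSE 1 END AS SHARINGRESULT
  FROM PAYMENTORDER P935
 WHERE P935.ORDERCODE = 935
   AND P935.SEQUENCEID = :1;
```

Whether this helps depends on an index existing on RELATEDSEQUENCEID. The execution plan's predicate section should no longer show TO_NUMBER applied to that column.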