在数据库层面分析系统性能(原创)
系统级别信息
v$sysstat
按照OracleDocument中的描述,v$sysstat存储自数据库实例运行那刻起就开始累计全实例(instance-wide)的资源使用情况。
该视图存储下列的统计信息:
1>.事件发生次数的统计(如:user commits)
2>.数据产生,存取或者操作的total列(如:redo size)
3>.如果TIMED_STATISTICS值为true,则统计花费在执行操作上的总时间(如:CPU used by this session)
v$sysstat视图常用列介绍:
STATISTIC#: 标识
NAME: 统计项名称
VALUE: 资源使用量
该视图还有一列class-统计类别但极少会被使用,各类信息如下:
1 代表事例活动
2 代表Redo buffer活动
4 代表锁
8 代表数据缓冲活动
16 代表OS活动
32 代表并行活动
64 代表表访问
128 代表调试信息
注意:Statistic#的值在不同版本中各不相同,使用时要用Name做为查询条件而不要以statistic#的值做为条件。
V$SYSSTAT中包含多个统计项,这部分介绍了一些关键的v$sysstat统计项,在调优方面相当有用。下列按字母先后排序:
数据库使用状态的一些关键指标:
CPU used by this session:所有session的cpu占用量,不包括后台进程。这项统计的单位是百分之x秒.完全调用一次不超过10ms
db block changes:那部分造成SGA中数据块变化的insert,update或delete操作数 这项统计可以大概看出整体数据库状态。在各项事务级别,这项统计指出脏缓存比率。
execute count:执行的sql语句数量(包括递归sql)
logons current:当前连接到实例的Sessions。如果当前有两个快照则取平均值。
logons cumulative:自实例启动后的总登陆次数。
parse count (hard):在shared pool中解析调用的未命中次数。当sql语句执行并且该语句不在shared pool或虽然在shared pool但因为两者存在部分差异而不能被使用时产生硬解析。如果一条sql语句原文与当前存在的相同,但查询表不同则认为它们是两条不同语句,则硬解析即会发生。硬解析会带来cpu和资源使用的高昂开销,因为它需要oracle在shared pool中重新分配内存,然后再确定执行计划,最终语句才会被执行。
parse count (total):解析调用总数,包括软解析和硬解析。当session执行了一条sql语句,该语句已经存在于shared pool并且可以被使用则产生软解析。当语句被使用(即共享) 所有数据相关的现有sql语句(如最优化的执行计划)必须同样适用于当前的声明。这两项统计可被用于计算软解析命中率。
parse time cpu:总cpu解析时间(单位:10ms)。包括硬解析和软解析。
parse time elapsed:完成解析调用的总时间花费。
physical reads:OS blocks read数。包括插入到SGA缓存区的物理读以及PGA中的直读这项统计并非i/o请求数。
physical writes:从SGA缓存区被DBWR写到磁盘的数据块以及PGA进程直写的数据块数量。
redo log space requests:在redo logs中服务进程的等待空间,表示需要更长时间的log switch。
redo size:redo发生的总次数(以及因此写入log buffer),以byte为单位。这项统计显示出update活跃性。
session logical reads:逻辑读请求数。
sorts (memory) and sorts (disk):sorts(memory)是适于在SORT_AREA_SIZE(因此不需要在磁盘进行排序)的排序操作的数量。sorts(disk)则是由于排序所需空间太大,SORT_AREA_SIZE不能满足而不得不在磁盘进行排序操作的数量。这两项统计通常用于计算in-memory sort ratio。
sorts (rows): 列排序总数。这项统计可被'sorts (total)'统计项除尽以确定每次排序的列。该项可指出数据卷和应用特征。
table fetch by rowid:使用ROWID返回的总列数(由于索引访问或sql语句中使用了'where rowid=&rowid'而产生)
table scans (rows gotten):全表扫描中读取的总列数
table scans (blocks gotten):全表扫描中读取的总块数,不包括那些split的列。
user commits + user rollbacks:系统事务起用次数。当需要计算其它统计中每项事务比率时该项可以被做为除数。例如,计算事务中逻辑读,可以使用下列公式:session logical reads / (user commits + user rollbacks)。
注意:物理I/O,oracle报告物理读也许并未导致实际物理磁盘I/O操作。这完全有可能因为多数操作系统都有缓存文件,可能是那些块在被读取。块也可能存于磁盘或控制级缓存以再次避免实际I/O。Oracle报告有物理读也许仅仅表示被请求的块并不在缓存中。
v$sysstat中提供的系统信息是实例自启动以来的所有信息,并且没有想关的时间记录,在需要解决性能问题时,笔者一般将其中的信息作为baseline,如下列查询
SQL>select s.name,s.value/((sysdate-i.startup_time)*24*60*60) from v$instance i,v$sysstat s
where s.name='physical writes'
NAME S.VALUE/((SYSDATE-I.STARTUP_TIME)*24*60*60)
------------------------------ -------------------------------------------
physical writes .014322113
通过下列视图查询出所需信息,以判断性能瓶颈。
V$SYSMETRIC –记录了上15s和上1min钟的性能指标记录
SQL>Select VALUE , METRIC_UNIT,INTSIZE_CSEC
from v$sysmetric
where metric_name='Physical Reads Per Sec';
VALUE METRIC_UNIT INTSIZE_CSEC
---------- ----------------- ------------
654.6736 Reads Per Second 5959
134.9835 Reads Per Second 1515
V$SYSMETRIC_SUMMARY – 对上一小时的性能指标记录进行了统计,如 avg, max, min etc
SQL>select MAXVAL,MINVAL,AVERAGE,STANDARD_DEVIATION
from V$SYSMETRIC_SUMMARY
where metric_name='Physical Reads Per Sec';
MAXVAL MINVAL AVERAGE STANDARD_DEVIATION
---------- ---------- ---------- ------------------
3.71784232 0 .076930034 .478529283
V$SYSMETRIC_HISTORY – 记录了上一小时里每1分钟的指标信息,上3分钟里每15s的指标信息
Select metric_name,VALUE , METRIC_UNIT,INTSIZE_CSEC
from v$sysmetric_history
where metric_name='Physical Reads Per Sec'
and begin_time < sysdate - 5/(24*60)
and begin_time > sysdate - 15/(24*60);
DBA_HIST_SYSMETRIC_SUMMARY – hour summaries for last week.
系统级别的等待事件视图
V$SYSTEM_EVENT – 显示了系统的当前等待项,v$system_event则提供了自实例启动后各个等待事件的概括。常用于获取系统等待信息的历史影象。常用列如下
EVENT:等待事件名称
TOTAL_WAITS:此项事件总等待次数
TIME_WAITED:此项事件的总等待时间(单位:百分之一秒),由于该字段反应的是实例自启动以来等待事件的累加值,并不能真正反应性能瓶颈。
AVERAGE_WAIT:此项事件的平均等待用时(单位:百分之一秒)(time_waited/total_waits)
TOTAL_TIMEOUTS:此项事件的总等待超时次数
V$EVENTMETRIC – 上一分钟内等待事件的详细信息
DBA_HIST_SYSTEM_EVENT – 以一个snapshot为单位,包含了v$system_event的汇总信息
系统级别的等待事件类别视图
V$SYSTEM_WAIT_CLASS – cumulative since start up
V$WAITCLASSMETRIC – last 60 seconds deltas
V$WAITCLASSMETRIC_HISTORY – 60 seconds deltas for last hour
Remember DBA_HIST_WAITCLASSMETRIC_HISTORY is used for alerts and or baselines not everyday stats.
我们一般可以在出现IO延迟时使用上述视图。
The data in V$ACTIVE_SESSION_HISTORY (ASH) is probably better for seeing load because load is made up of both wait time and cpu time. Cpu information is not in the event/waitclass views but is in ASH along with the waits.
The second part, the latencies, specifically I/O latencies, are only available in the wait event and waitclass views (and the filestat views on a per file basis)
查看系统的平均延迟情况
SQL> select
n.wait_class,10*m.time_waited/nullif(m.wait_count,0) avgms
from v$waitclassmetric m,
v$system_wait_class n
where m.wait_class_id=n.wait_class_id order by avgms desc;
WAIT_CLASS AVGMS
-------------------- ----------
Other
Application
Configuration
Commit
User I/O
Network
Idle 3547.55358
Concurrency 22.827
System I/O .364
Another issue is that the documentation for 11gR2 says that the TIME_WAITED is microseconds but in my tests it’s actually centisecs
desc V$SYSTEM_WAIT_CLASS
Name Type
----------------------------------------- ----------------------------
WAIT_CLASS_ID NUMBER
WAIT_CLASS# NUMBER
WAIT_CLASS VARCHAR2(64)
TOTAL_WAITS NUMBER
TIME_WAITED NUMBER - centi-seconds
系统等待事件类别描述如下
Administrative
Waits resulting from DBA commands that cause users to wait (for example, an index
rebuild)
Application
Waits resulting from user application code (for example, lock waits caused by row
level locking or explicit lock commands)
Cluster
Waits related to Real Application Cluster resources (for example, global cache
resources such as 'gc cr block busy'
Commit
This wait class only comprises one wait event - wait for redo log write confirmation
after a commit (that is, 'log file sync')
Concurrency
Waits for internal database resources (for example, latches)
Configuration
Waits caused by inadequate configuration of database or instance resources (for
example, undersized log file sizes, shared pool size)
Idle
Waits that signify the session is inactive, waiting for work (for example, 'SQL*Net
message from client')
Network
Waits related to network messaging (for example, 'SQL*Net more data to dblink')
Other
Waits which should not typically occur on a system (for example, 'wait for EMON to
spawn')
Scheduler
Resource Manager related waits (for example, 'resmgr: become active')
System I/O
Waits for background process IO (for example, DBWR wait for 'db file parallel write')
User I/O
Waits for user IO (for example 'db file sequential read')
确认IO延迟情况
For specific I/O latencies there are two choices – v$eventmetric and v$system_event. With v$system_event it requires running multiple queries and taking the deltas but the deltas are are already calculated in v$eventmetric. Here is an example of getting I/O latencies from v$eventmetric
过去一分钟的延迟情况
col name for a25
select m.intsize_csec,
n.name ,
round(m.time_waited,3) time_waited,
m.wait_count,
round(10*m.time_waited/nullif(m.wait_count,0),3) avgms
from v$eventmetric m,
v$event_name n
where m.event_id=n.event_id
and n.name in (
'db file sequential read',
'db file scattered read',
'direct path read',
'direct path read temp',
'direct path write',
'direct path write temp',
'log file sync',
'log file parallel write'
);
INTSIZE_CSEC NAME TIME_WAITED WAIT_COUNT AVGMS
------------ ------------------------- ----------- ---------- ----------
6017 log file parallel write 2.538 4 6.345
6017 log file sync 2.329 1 23.287
6017 db file sequential read 0 0
6017 db file scattered read 0 0
6017 direct path read 0 0
6017 direct path read temp 0 0
6017 direct path write 0 0
6017 direct path write temp 0 0
以小时为单位查看延迟情况
select
e.event_name,btime,
round((time_ms_end-time_ms_beg)/nullif(count_end-count_beg,0),3) avg_ms
from (
select
to_char(s.BEGIN_INTERVAL_TIME,'DD-MON-YY HH24:MI') btime,
total_waits count_end,
time_waited_micro/1000 time_ms_end,
Lag (e.time_waited_micro/1000)
OVER( PARTITION BY e.event_name ORDER BY s.snap_id) time_ms_beg,
Lag (e.total_waits)
OVER( PARTITION BY e.event_name ORDER BY s.snap_id) count_beg
from
DBA_HIST_SYSTEM_EVENT e,
DBA_HIST_SNAPSHOT s
where
s.snap_id=e.snap_id
and e.event_name like '%&1%'
order by begin_interval_time
)
order by btime;
BTIME AVG_MS
--------------- ----------
20-JUL-11 06:00 5.854
20-JUL-11 07:00 4.116
20-JUL-11 08:00 21.158
20-JUL-11 09:02 5.591
20-JUL-11 10:00 4.116
20-JUL-11 11:00 6.248
20-JUL-11 12:00 23.634
20-JUL-11 13:00 22.529
20-JUL-11 14:00 21.62
20-JUL-11 15:00 18.038
20-JUL-11 16:00 23.127
but the big drawback with wait event and/or wait class views is that they lack information on CPU load. CPU load can be found in the system statistics but it’s just easier to do it all in one query using v$active_session_history. Here is a query using ASH to calculate AAS load on the database over the last 60 seconds:
查看过去一分钟系统的平均等待时间,包括CPU等待
select
round(count(*)/secs.var,3) AVGS,
decode(session_state,'ON CPU','CPU',wait_class) wait_class
from v$active_session_history ash,
(select 60 var from dual) secs
where
SAMPLE_TIME > sysdate - (secs.var/(24*60*60)) and
SESSION_TYPE = 'FOREGROUND'
group by decode(session_state,'ON CPU','CPU',wait_class) , secs.var
/
AVGS WAIT_CLASS
---------- ---------------
.016 Concurrency
.001 Network
0 Other
.083 Configuration
.001 Administrative
.034 CPU
0 System I/O
.001 Commit
.054 Application
0 User I/O
参考至:http://dboptimizer.com/2011/07/07/mining-awr-statistics-metrics-verses-statistics/
http://dboptimizer.com/2011/07/20/wait-event-and-wait-class-metrics-vs-vsystem_event/
http://junsansi.itpub.net/post/29894/291051
http://www.laoxiong.net/dbms_stats_and_analyze_and_global_statistics.html
http://blog.163.com/wghbeyond@126/blog/static/3516618120106190204350/
本文原创,转载请注明出处、作者
如有错误,欢迎指正
邮箱:[email protected]
已有 0 人发表留言,猛击->> 这里<<-参与讨论
ITeye推荐