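Learning perf: the performance analysis tool bundled with Linux
I am currently doing performance analysis work. I had barely touched perf before, so I read a few articles, organized the material, and recorded it here as a series of questions.
This is mainly for my own reference.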
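What is perf?
perf is a Linux performance tuning tool for profiling software. It has shipped with the kernel since version 2.6.31, so on that kernel and later ones it is very easy to install.
It can handle almost every kind of performance-related event.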
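What is a performance event?
A hardware or software event that occurs in the processor or the operating system and may affect program performance.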
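What does performance analysis focus on?
Algorithm optimization (space and time complexity) and code optimization (increasing execution speed, reducing memory usage).
Evaluating how a program uses hardware resources, for example the number of accesses and misses at each cache level, pipeline stall cycles, and front-side bus accesses.
Evaluating how a program uses operating system resources: the number of system calls, context switches, and task migrations.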
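How does it work?
For hardware events, perf relies on the PMU (performance monitoring unit), a component of the CPU whose counters (PMCs, performance monitoring counters) detect whether a given performance event occurs under specified conditions and how many times it occurs.
For software events, the counters are built into the kernel and spread across its functional modules, where they count performance events related to the operating system.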
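How do I sample with high precision?
To sample with higher precision, append the suffix ":p" or ":pp" to the event name when specifying the performance event. The precision levels are:
  0: no precision guarantee
  1: the skid between the sampled instruction and the instruction that triggered the event is constant (:p)
  2: the skid is kept at 0 as far as possible (:pp)
  3: the skid must be 0 (:ppp)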
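To make the suffix rule concrete, here is a minimal Python sketch that turns a precision level into an event string suitable for -e. The helper name with_precision is my own invention and not part of perf, and whether a given level is actually honored depends on the hardware.

```python
# Hypothetical helper (not part of perf): append the precision suffix
# described above to an event name.
def with_precision(event: str, level: int) -> str:
    """Return the event name with ':p', ':pp' or ':ppp' appended.

    Level 0 means "no precision guarantee" and leaves the name unchanged.
    """
    if level not in range(4):
        raise ValueError("precision level must be 0, 1, 2 or 3")
    if level == 0:
        return event
    return f"{event}:{'p' * level}"


print(with_precision("cycles", 2))  # cycles:pp
```

The resulting string would be passed as the value of -e, for example in a perf record or perf top invocation.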
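What are the commonly used commands?
1. perf list: lists every event that can trigger a perf sampling point (the performance events supported by the current hardware environment).
Events fall into three broad categories: hardware (generated by the hardware, including the hardware cache events), software (generated by kernel software), and tracepoint (triggered by static tracepoints in the kernel).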
List of pre-defined events (to be used in -e):
  cpu-cycles OR cycles                               [Hardware event]  (processor cycle event)
  stalled-cycles-frontend OR idle-cycles-frontend    [Hardware event]
  stalled-cycles-backend OR idle-cycles-backend      [Hardware event]
  instructions                                       [Hardware event]
  cache-references                                   [Hardware event]
  cache-misses                                       [Hardware event]
  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  cpu-clock                                          [Software event]
  task-clock                                         [Software event]
  page-faults OR faults                              [Software event]
  minor-faults                                       [Software event]
  major-faults                                       [Software event]
  context-switches OR cs                             [Software event]
  cpu-migrations OR migrations                       [Software event]
  alignment-faults                                   [Software event]
  emulation-faults                                   [Software event]
  L1-dcache-loads                                    [Hardware cache event]
  L1-dcache-load-misses                              [Hardware cache event]
  L1-dcache-stores                                   [Hardware cache event]
  L1-dcache-store-misses                             [Hardware cache event]
  L1-dcache-prefetches                               [Hardware cache event]
  L1-dcache-prefetch-misses                          [Hardware cache event]
  L1-icache-loads                                    [Hardware cache event]
  L1-icache-load-misses                              [Hardware cache event]
  L1-icache-prefetches                               [Hardware cache event]
  L1-icache-prefetch-misses                          [Hardware cache event]
  LLC-loads                                          [Hardware cache event]
  LLC-load-misses                                    [Hardware cache event]
  LLC-stores                                         [Hardware cache event]
  LLC-store-misses                                   [Hardware cache event]
  LLC-prefetches                                     [Hardware cache event]
  LLC-prefetch-misses                                [Hardware cache event]
  dTLB-loads                                         [Hardware cache event]
  dTLB-load-misses                                   [Hardware cache event]
  dTLB-stores                                        [Hardware cache event]
  dTLB-store-misses                                  [Hardware cache event]
  dTLB-prefetches                                    [Hardware cache event]
  dTLB-prefetch-misses                               [Hardware cache event]
  iTLB-loads                                         [Hardware cache event]
  iTLB-load-misses                                   [Hardware cache event]
  branch-loads                                       [Hardware cache event]
  branch-load-misses                                 [Hardware cache event]
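Since every line of the listing ends with its category in square brackets, the output is easy to group by category. The following is a rough Python sketch; the function name group_events and the shortened SAMPLE text are my own, and it assumes the line layout shown in the listing above, which may differ in other perf versions.

```python
import re
from collections import defaultdict

# Matches lines such as "  cpu-cycles OR cycles      [Hardware event]".
LINE_RE = re.compile(r"^\s*(?P<names>.+?)\s+\[(?P<kind>[^\]]+)\]\s*$")


def group_events(listing: str) -> dict:
    """Group event names from `perf list` style text by their bracketed category."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # header line or blank line
        # "a OR b" lists aliases of one event; keep the first name only.
        first_name = match.group("names").split(" OR ")[0].strip()
        groups[match.group("kind")].append(first_name)
    return dict(groups)


# A shortened excerpt of the listing above, used only for illustration.
SAMPLE = """\
List of pre-defined events (to be used in -e):
  cpu-cycles OR cycles                               [Hardware event]
  cache-misses                                       [Hardware event]
  context-switches OR cs                             [Software event]
  L1-dcache-load-misses                              [Hardware cache event]
"""

for kind, names in group_events(SAMPLE).items():
    print(kind, names)
```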
2. perf stat: analyzes the overall performance of a program
It profiles the application using 10 typical events.
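- task-clock: the time the target task actually spent on the processor, in milliseconds; we call this the task execution time. The figure after it is the task's CPU utilization (the ratio of execution time to elapsed time). Elapsed time is the total time from when the task is submitted to when it finishes (printed at the end of the stat output).
- context-switches: context switches. The first figure is the number of switches; the figure after it is the average number per second (M means 10^6).
- cpu-migrations: processor migrations. To keep the load balanced across processors, Linux moves a task from one processor to another under certain conditions; each such move counts as one processor migration.
- page-faults: page faults. The Linux memory management subsystem uses paging. A page fault is triggered when the page an application requests has not been created yet, is not in memory, or is in memory but has no mapping between its physical and virtual addresses yet.
- cycles: the number of processor cycles consumed by the task.
- instructions: the number of processor instructions executed by the task. The figure after it is the IPC (instructions per cycle), an important indicator of how well the processor and the application perform together (many instructions need several cycles to complete). A higher IPC is better; it means the program makes good use of the processor's capabilities.
- branches: the number of branch instructions encountered during execution.
- branch-misses: the number of mispredicted branch instructions.
- cache-misses: the number of cache misses.
- cache-references: the number of cache accesses (hits and misses together).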
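Commonly used options:
  -e: specify the performance event(s)
  -p: specify the PID of the process to analyze
  -t: specify the TID of the thread to analyze
  -r N: repeat the analysis N times
  -d: detailed analysis that collects more performance events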
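As a small illustration of how these options combine, here is a Python sketch that assembles a perf stat argument list. The function perf_stat_cmd and its parameter names are hypothetical; only the flags themselves come from the list above. The sketch builds the command without running it.

```python
from typing import List, Optional, Sequence


def perf_stat_cmd(events: Sequence[str] = (),
                  pid: Optional[int] = None,
                  tid: Optional[int] = None,
                  repeat: Optional[int] = None,
                  detailed: bool = False) -> List[str]:
    """Build (but do not run) a `perf stat` command line from the options above."""
    cmd = ["perf", "stat"]
    if events:
        cmd += ["-e", ",".join(events)]  # several events are comma-separated
    if pid is not None:
        cmd += ["-p", str(pid)]
    if tid is not None:
        cmd += ["-t", str(tid)]
    if repeat is not None:
        cmd += ["-r", str(repeat)]
    if detailed:
        cmd.append("-d")
    return cmd


# Attach to one process and count two events.
print(" ".join(perf_stat_cmd(events=["cycles", "instructions"], pid=21787)))
```

The returned list could be handed to subprocess.run on a machine where perf is installed and the caller has permission to attach to the process.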
The result of one analysis run looks like this:
 Performance counter stats for process id '21787':
      42677.253367 task-clock                #    0.142 CPUs utilized
           587,906 context-switches          #    0.014 M/sec
            29,209 CPU-migrations            #    0.001 M/sec
               117 page-faults               #    0.000 M/sec
    82,341,400,508 cycles                    #    1.929 GHz                      [83.48%]
    61,262,984,952 stalled-cycles-frontend   #   74.40% frontend cycles idle     [83.28%]
    43,113,701,768 stalled-cycles-backend    #   52.36% backend  cycles idle     [66.72%]
    44,023,301,495 instructions              #    0.53  insns per cycle
                                             #    1.39  stalled cycles per insn  [83.50%]
     8,137,448,528 branches                  #  190.674 M/sec                    [83.22%]
       430,957,756 branch-misses             #    5.30% of all branches          [83.34%]
     300.393753095 seconds time elapsed
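The figures after the "#" are derived from the raw counters, and it helps to recompute them once by hand. The Python sketch below does this with the numbers from the output above. The function name derived_metrics is my own, and dividing the frontend stall cycles by the instruction count is an assumption that happens to reproduce the 1.39 shown here; I have not checked how perf itself computes that line.

```python
def derived_metrics(task_clock_ms: float, elapsed_s: float, cycles: int,
                    instructions: int, branches: int, branch_misses: int,
                    stalled_frontend: int) -> dict:
    """Recompute the ratios perf stat prints after the '#' sign."""
    task_clock_s = task_clock_ms / 1000.0
    return {
        # execution time divided by elapsed (wall-clock) time
        "cpus_utilized": task_clock_s / elapsed_s,
        # cycles per second of execution time, in GHz
        "ghz": cycles / task_clock_s / 1e9,
        # IPC: instructions per cycle
        "ipc": instructions / cycles,
        "branch_miss_pct": 100.0 * branch_misses / branches,
        # assumption: frontend stalls per instruction
        "stalled_per_insn": stalled_frontend / instructions,
    }


# Raw counters copied from the sample output above.
m = derived_metrics(
    task_clock_ms=42677.253367,
    elapsed_s=300.393753095,
    cycles=82_341_400_508,
    instructions=44_023_301_495,
    branches=8_137_448_528,
    branch_misses=430_957_756,
    stalled_frontend=61_262_984_952,
)
print(f"{m['cpus_utilized']:.3f} CPUs utilized")        # 0.142
print(f"{m['ghz']:.3f} GHz")                            # 1.929
print(f"{m['ipc']:.2f} insns per cycle")                # 0.53
print(f"{m['branch_miss_pct']:.2f}% of all branches")   # 5.30
print(f"{m['stalled_per_insn']:.2f} stalled cycles per insn")  # 1.39
```

An IPC of 0.53 together with roughly 74% of cycles idle in the frontend suggests that this process spends most of its cycles waiting rather than retiring instructions.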
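3. perf top: displays performance statistics for the system or a process in real time
By default it profiles the whole system using the "cycles" event (CPU cycle count).
Commonly used options:
  -p: specify the PID of the process
  -t: specify the TID of the thread
  -a: analyze the performance of the whole system (default)
  -d: screen refresh interval, 2 seconds by default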
In the output, the percentage is the share of performance events attributed to that symbol within the whole monitored domain; it is usually called the symbol's hotness.
  samples  pcnt  function                                                                                DSO
  _______ _____  ______________________________________________________________________________________ _________
    61.00 19.4%  native_write_msr_safe                                                                   [kernel]
    18.00  5.7%  JVM_InternString                                                                        libjvm.so
    17.00  5.4%  find_busiest_group                                                                      [kernel]
    17.00  5.4%  _spin_lock                                                                              [kernel]
    12.00  3.8%  dev_hard_start_xmit                                                                     [kernel]
    11.00  3.5%  tg_load_down                                                                            [kernel]
     9.00  2.9%  futex_wake                                                                              [kernel]
     8.00  2.5%  do_futex                                                                                [kernel]
     7.00  2.2%  load_balance_fair                                                                       [kernel]
     7.00  2.2%  weighted_cpuload                                                                        [kernel]
     7.00  2.2%  update_cfs_shares                                                                       [kernel]
     7.00  2.2%  JVM_LatestUserDefinedLoader                                                             libjvm.so
     6.00  1.9%  update_cfs_load                                                                         [kernel]
     5.00  1.6%  _ZN16SystemDictionary30resolve_instance_class_or_nullE12symbolHandle6HandleS1_P6Thread  libjvm.so
     5.00  1.6%  br_sysfs_delbr                                                                          [bridge]
     5.00  1.6%  futex_wait                                                                              [kernel]
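The pcnt column is simply a symbol's sample count divided by the total number of samples in the monitored domain. Here is a minimal Python sketch of that calculation. The total of 314 samples is not printed anywhere in the output; it is my own estimate, inferred from 61 samples corresponding to 19.4%, and the function name hotness is made up for illustration.

```python
def hotness(samples_by_symbol: dict, total_samples: int) -> dict:
    """Return each symbol's share of all samples, as a percentage with one decimal."""
    return {
        symbol: round(100.0 * count / total_samples, 1)
        for symbol, count in samples_by_symbol.items()
    }


# The first three rows of the table above.
top_rows = {
    "native_write_msr_safe": 61,
    "JVM_InternString": 18,
    "find_busiest_group": 17,
}

# Assumption: about 314 samples in total (61 / 0.194), including symbols not shown.
for symbol, pct in hotness(top_rows, total_samples=314).items():
    print(f"{pct:5.1f}%  {symbol}")
```

With that assumed total, the computed shares come out at 19.4%, 5.7% and 5.4%, matching the pcnt column.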
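4. perf record / perf report: records the performance events of the system or a process over a period of time
perf record writes its data file to the current directory by default: perf.data
perf report reads the generated perf.data file; the -i option specifies its path.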
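A typical workflow is to record for a fixed duration and then open the report. The Python sketch below only assembles the two command lines. The function names are hypothetical, the PID is borrowed from the perf stat example, and using "-- sleep N" to bound the recording time and "-o" to name the output file are common perf record usage that this article does not cover itself.

```python
from typing import List


def perf_record_cmd(pid: int, seconds: int, output: str = "perf.data") -> List[str]:
    """Build a `perf record` command that samples one process for a fixed time.

    perf record keeps sampling until the command after `--` exits,
    so `sleep N` serves as a timer.
    """
    return ["perf", "record", "-o", output, "-p", str(pid),
            "--", "sleep", str(seconds)]


def perf_report_cmd(input_path: str = "perf.data") -> List[str]:
    """Build the matching `perf report` command; -i selects the data file."""
    return ["perf", "report", "-i", input_path]


record = perf_record_cmd(pid=21787, seconds=10, output="/tmp/app.perf.data")
report = perf_report_cmd("/tmp/app.perf.data")
print(" ".join(record))
print(" ".join(report))
# To actually run them (requires perf and sufficient privileges):
#   import subprocess
#   subprocess.run(record, check=True)
#   subprocess.run(report, check=True)
```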
Getting to know perf is where performance analysis begins.