Likwid: An Indispensable Toolbox for High-Performance Server Development

Tags: Linux, tool introduction, Likwid, msr, topology | Published: 2013-01-16 16:30 | Author: Yu Feng
Source: http://blog.yufeng.info

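Original article. If you repost it, please credit: reposted from 非业余研究

Permalink: Likwid: An Indispensable Toolbox for High-Performance Server Development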

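When building a high-performance server, knowing how to write high-performance code is one thing; whether the system you end up with actually performs well is quite another.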

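We usually need to know the system's CPU topology, its memory usage, the values of the various CPU performance counters, and how the CPU caches are being used (hit rates and so on). Only by combining this information can we accurately pinpoint the weaknesses in our programs and find better optimization targets. This information is usually scattered all over the system, and it is hard for an ordinary developer to pull it together into a coherent picture.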

Well, the Germans, famous for their meticulousness, have come to the rescue once again: allow me to introduce Likwid.

Likwid

The Likwid project is hosted here. According to the description on its home page:

Likwid stands for Like I knew what I am doing. This project contributes easy to use command line tools for Linux to support programmers in developing high performance multi threaded programs.

It contains the following tools:

likwid-topology: Show the thread and cache topology
likwid-perfctr: Measure hardware performance counters on Intel and AMD processors
likwid-features: Show and Toggle hardware prefetch control bits on Intel Core 2 processors
likwid-pin: Pin your threaded application without touching your code (supports pthreads, Intel OpenMP and gcc OpenMP)
likwid-bench: Benchmarking framework allowing rapid prototyping of threaded assembly kernels
likwid-mpirun: Script enabling simple and flexible pinning of MPI and MPI/threaded hybrid applications
likwid-perfscope: Frontend for likwid-perfctr timeline mode. Allows live plotting of performance metrics.
likwid-powermeter: Tool for accessing RAPL counters and query Turbo mode steps on Intel processor.
likwid-memsweeper: Tool to cleanup ccNUMA memory domains.

Likwid stands out because:

No kernel patching, any vanilla linux 2.6 or newer kernel works
Transparent, always clear which events are chosen, event tags have the same naming as in documentation
Lightweight, LIKWID tries to add no overhead and keeps out of your way.
Easy to use, simple to build, no need to touch your code, configurable from outside. Clear CLI interface.
Multiplatform, likwid supports Intel and AMD processors
Up to date, likwid tries to fully support new processors as soon as possible
Extensible, you can add functionality by means of simple text files

Its documentation is also very well done; the usage guide is here.

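I won't go on about the details of how to use it, since the documentation covers everything. Here I'll just show off what it can do: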

[likwid-3.0]$ sudo ./likwid-topology
-------------------------------------------------------------
CPU type:       Intel Core Westmere processor 
*************************************************************
Hardware Thread Topology
*************************************************************
Sockets:        2 
Cores per socket:       4 
Threads per core:       2 
-------------------------------------------------------------
HWThread        Thread          Core            Socket
0               0               0               1
1               0               1               1
2               0               9               1
3               0               10              1
4               0               0               0
5               0               1               0
6               0               9               0
7               0               10              0
8               1               0               1
9               1               1               1
10              1               9               1
11              1               10              1
12              1               0               0
13              1               1               0
14              1               9               0
15              1               10              0
-------------------------------------------------------------
Socket 0: ( 4 12 5 13 6 14 7 15 )
Socket 1: ( 0 8 1 9 2 10 3 11 )
-------------------------------------------------------------

*************************************************************
Cache Topology
*************************************************************
Level:  1
Size:   32 kB
Cache groups:   ( 4 12 ) ( 5 13 ) ( 6 14 ) ( 7 15 ) ( 0 8 ) ( 1 9 ) ( 2 10 ) ( 3 11 )
-------------------------------------------------------------
Level:  2
Size:   256 kB
Cache groups:   ( 4 12 ) ( 5 13 ) ( 6 14 ) ( 7 15 ) ( 0 8 ) ( 1 9 ) ( 2 10 ) ( 3 11 )
-------------------------------------------------------------
Level:  3
Size:   12 MB
Cache groups:   ( 4 12 5 13 6 14 7 15 ) ( 0 8 1 9 2 10 3 11 )
-------------------------------------------------------------

*************************************************************
NUMA Topology
*************************************************************
NUMA domains: 2 
-------------------------------------------------------------
Domain 0:
Processors:  4 5 6 7 12 13 14 15
Relative distance to nodes:  10 20
Memory: 16222.4 MB free of total 24567.1 MB
-------------------------------------------------------------
Domain 1:
Processors:  0 1 2 3 8 9 10 11
Relative distance to nodes:  20 10
Memory: 5424.19 MB free of total 24576 MB
-------------------------------------------------------------
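
The HWThread table above is easy to post-process. As a minimal sketch (not part of Likwid itself), the Python below groups hardware threads by physical core and picks one hardware thread per core on a given socket, which is a common starting point when choosing a pin list. The rows are copied from the output above; the function names `smt_siblings` and `one_thread_per_core` are hypothetical helpers made up for this illustration.

```python
from collections import defaultdict

# Rows copied from the HWThread table above.
# Columns: HWThread, Thread, Core, Socket.
HWTHREAD_ROWS = """
0 0 0 1
1 0 1 1
2 0 9 1
3 0 10 1
4 0 0 0
5 0 1 0
6 0 9 0
7 0 10 0
8 1 0 1
9 1 1 1
10 1 9 1
11 1 10 1
12 1 0 0
13 1 1 0
14 1 9 0
15 1 10 0
"""


def smt_siblings(table_text):
    """Map (socket, core) to the list of hardware thread IDs sharing that core."""
    groups = defaultdict(list)
    for line in table_text.strip().splitlines():
        hwthread, _thread, core, socket = (int(field) for field in line.split())
        groups[(socket, core)].append(hwthread)
    return dict(groups)


def one_thread_per_core(groups, socket):
    """Return the lowest-numbered hardware thread of every core on one socket."""
    return sorted(min(ids) for (s, _core), ids in groups.items() if s == socket)


groups = smt_siblings(HWTHREAD_ROWS)
print("SMT siblings on socket 0, core 0:", groups[(0, 0)])
print("One thread per core, socket 0:", one_thread_per_core(groups, 0))
print("One thread per core, socket 1:", one_thread_per_core(groups, 1))
```

The sibling pairs produced this way line up with the level 1 and level 2 cache groups printed by likwid-topology, and hardware threads 0-3 (the ones measured in the likwid-perfctr run below) turn out to be one thread per physical core on socket 1.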



$ sudo ./likwid-perfctr  -C 0-3 -g MEM sleep 10
-------------------------------------------------------------
-------------------------------------------------------------
CPU type:       Intel Core Westmere processor 
CPU clock:      2.13 GHz 
Measuring group MEM
-------------------------------------------------------------
sleep 10
Status: 0x400000000 
Status: 0x0 
Status: 0x0 
Status: 0x0 
+--------------------------------+-------------+-------------+-------------+-------------+
|             Event              |   core 0    |   core 1    |   core 2    |   core 3    |
+--------------------------------+-------------+-------------+-------------+-------------+
|       INSTR_RETIRED_ANY        | 1.15794e+08 | 3.30559e+08 | 9.21383e+07 | 6.13907e+07 |
|     CPU_CLK_UNHALTED_CORE      | 2.16557e+08 | 5.36794e+08 | 1.60588e+08 | 1.07672e+08 |
|      CPU_CLK_UNHALTED_REF      | 2.1624e+08  | 5.15724e+08 | 1.55415e+08 | 1.0452e+08  |
|    UNC_QMC_NORMAL_READS_ANY    | 1.42469e+07 |      0      |      0      |      0      |
|    UNC_QMC_WRITES_FULL_ANY     | 3.3378e+06  |      0      |      0      |      0      |
| UNC_QHL_REQUESTS_REMOTE_READS  | 5.95875e+06 |      0      |      0      |      0      |
|  UNC_QHL_REQUESTS_LOCAL_READS  | 9.16778e+06 |      0      |      0      |      0      |
| UNC_QHL_REQUESTS_REMOTE_WRITES |   163766    |      0      |      0      |      0      |
+--------------------------------+-------------+-------------+-------------+-------------+
+-------------------------------------+-------------+-------------+-------------+-------------+
|                Event                |     Sum     |     Max     |     Min     |     Avg     |
+-------------------------------------+-------------+-------------+-------------+-------------+
|       INSTR_RETIRED_ANY STAT        | 5.99881e+08 | 3.30559e+08 | 6.13907e+07 | 1.4997e+08  |
|     CPU_CLK_UNHALTED_CORE STAT      | 1.02161e+09 | 5.36794e+08 | 1.07672e+08 | 2.55403e+08 |
|      CPU_CLK_UNHALTED_REF STAT      | 9.91899e+08 | 5.15724e+08 | 1.0452e+08  | 2.47975e+08 |
|    UNC_QMC_NORMAL_READS_ANY STAT    | 1.42469e+07 | 1.42469e+07 |      0      | 3.56173e+06 |
|    UNC_QMC_WRITES_FULL_ANY STAT     | 3.3378e+06  | 3.3378e+06  |      0      |   834449    |
| UNC_QHL_REQUESTS_REMOTE_READS STAT  | 5.95875e+06 | 5.95875e+06 |      0      | 1.48969e+06 |
|  UNC_QHL_REQUESTS_LOCAL_READS STAT  | 9.16778e+06 | 9.16778e+06 |      0      | 2.29194e+06 |
| UNC_QHL_REQUESTS_REMOTE_WRITES STAT |   163766    |   163766    |      0      |   40941.5   |
+-------------------------------------+-------------+-------------+-------------+-------------+
+-----------------------------+----------+----------+-----------+-----------+
|           Metric            |  core 0  |  core 1  |  core 2   |  core 3   |
+-----------------------------+----------+----------+-----------+-----------+
|     Runtime (RDTSC) [s]     | 10.0024  | 10.0024  |  10.0024  |  10.0024  |
|    Runtime unhalted [s]     | 0.101511 | 0.251623 | 0.0752758 | 0.0504714 |
|         Clock [MHz]         | 2136.45  | 2220.49  |  2204.33  |  2197.66  |
|             CPI             |  1.8702  |  1.6239  |  1.7429   |  1.75388  |
| Memory bandwidth [MBytes/s] | 112.515  |    0     |     0     |     0     |
| Memory data volume [GBytes] | 1.12542  |    0     |     0     |     0     |
|  Remote Read BW [MBytes/s]  | 38.1267  |    0     |     0     |     0     |
| Remote Write BW [MBytes/s]  | 1.04785  |    0     |     0     |     0     |
|    Remote BW [MBytes/s]     | 39.1746  |    0     |     0     |     0     |
+-----------------------------+----------+----------+-----------+-----------+
+----------------------------------+----------+----------+-----------+----------+
|              Metric              |   Sum    |   Max    |    Min    |   Avg    |
+----------------------------------+----------+----------+-----------+----------+
|     Runtime (RDTSC) [s] STAT     | 40.0097  | 10.0024  |  10.0024  | 10.0024  |
|    Runtime unhalted [s] STAT     | 0.478882 | 0.251623 | 0.0504714 | 0.11972  |
|         Clock [MHz] STAT         | 8758.93  | 2220.49  |  2136.45  | 2189.73  |
|             CPI STAT             | 1.70302  |  1.8702  |  1.6239   | 0.425755 |
| Memory bandwidth [MBytes/s] STAT | 112.515  | 112.515  |     0     | 28.1287  |
| Memory data volume [GBytes] STAT | 1.12542  | 1.12542  |     0     | 0.281355 |
|  Remote Read BW [MBytes/s] STAT  | 38.1267  | 38.1267  |     0     | 9.53168  |
| Remote Write BW [MBytes/s] STAT  | 1.04785  | 1.04785  |     0     | 0.261962 |
|    Remote BW [MBytes/s] STAT     | 39.1746  | 39.1746  |     0     | 9.79365  |
+----------------------------------+----------+----------+-----------+----------+
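
The derived metrics in the lower tables can be recomputed from the raw event counts, which is a handy sanity check when reading such a report. The Python below is a sketch under stated assumptions, not Likwid's own code: the formulas (CPI as core cycles per retired instruction, bandwidth as transferred cache lines times 64 bytes divided by runtime) are inferred from the fact that they reproduce the numbers above, and the 64-byte cache line size is an assumption for this machine. All variable and function names are made up for the illustration.

```python
CACHE_LINE_BYTES = 64   # assumption: each uncore event counts one 64-byte line
RUNTIME_S = 10.0024     # "Runtime (RDTSC) [s]" from the metric table

# Per-core raw counts copied from the event table (core 0 .. core 3).
INSTR_RETIRED_ANY = [1.15794e+08, 3.30559e+08, 9.21383e+07, 6.13907e+07]
CPU_CLK_UNHALTED_CORE = [2.16557e+08, 5.36794e+08, 1.60588e+08, 1.07672e+08]

# Uncore counts, reported on core 0 only in the run above.
QMC_READS = 1.42469e+07
QMC_WRITES = 3.3378e+06
QHL_REMOTE_READS = 5.95875e+06
QHL_REMOTE_WRITES = 163766


def cpi(cycles, instructions):
    """Cycles per instruction."""
    return cycles / instructions


def bandwidth_mbytes_per_s(cache_lines, seconds):
    """Bandwidth in MBytes/s, assuming one cache line per counted event."""
    return cache_lines * CACHE_LINE_BYTES / 1e6 / seconds


per_core_cpi = [cpi(c, i) for c, i in zip(CPU_CLK_UNHALTED_CORE, INSTR_RETIRED_ANY)]
aggregate_cpi = sum(CPU_CLK_UNHALTED_CORE) / sum(INSTR_RETIRED_ANY)

memory_bw = bandwidth_mbytes_per_s(QMC_READS + QMC_WRITES, RUNTIME_S)
memory_volume_gbytes = (QMC_READS + QMC_WRITES) * CACHE_LINE_BYTES / 1e9
remote_read_bw = bandwidth_mbytes_per_s(QHL_REMOTE_READS, RUNTIME_S)
remote_write_bw = bandwidth_mbytes_per_s(QHL_REMOTE_WRITES, RUNTIME_S)

print("CPI per core:", [round(x, 4) for x in per_core_cpi])
print("Aggregate CPI:", round(aggregate_cpi, 4))
print("Memory bandwidth [MBytes/s]:", round(memory_bw, 2))
print("Memory data volume [GBytes]:", round(memory_volume_gbytes, 4))
print("Remote read/write BW [MBytes/s]:", round(remote_read_bw, 2), round(remote_write_bw, 3))
```

The recomputed values agree with the tables to about four significant digits. One detail worth reading with care: the Sum cell of the CPI STAT row (1.70302) matches total cycles divided by total instructions rather than the sum of the four per-core CPIs, and the Avg cell (0.425755) is that figure divided by four.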

All of this information is right at your fingertips.

Have fun!
