[Original] Linux HugePages Configuration and Its Relationship to Oracle Performance

Tags: | Published: 2013-01-24 03:06 | Author: tianlesoftware
Source: http://blog.csdn.net/tianlesoftware


 

1. About HugePages

 

1.1 Introduction to HugePages

HugePages is a feature integrated into the Linux kernel with release 2.6. It provides an alternative to the 4K page size (16K for IA64) by offering bigger pages.

 

Several technical terms come up when discussing HugePages:

(1)  Page Table: A page table is the data structure of a virtual memory system in an operating system that stores the mapping between virtual addresses and physical addresses. This means that on a virtual memory system, memory is accessed by first consulting a page table and then, following the mapping found there, implicitly accessing the actual memory location.

(2)  TLB: A Translation Lookaside Buffer (TLB) is a buffer (or cache) in a CPU that contains parts of the page table. It is a fixed-size buffer used to speed up virtual address translation.

(3)  hugetlb: This is an entry in the TLB that points to a HugePage (a large/big page larger than the regular 4K and predefined in size). HugePages are implemented via hugetlb entries, i.e. we can say that a HugePage is handled by a "hugetlb page entry". The "hugetlb" term is also (and mostly) used synonymously with a HugePage (see Note 261889.1). In this document the term "HugePage" is used, but keep in mind that "hugetlb" mostly refers to the same concept.

(4)  hugetlbfs: This is a new in-memory filesystem, like tmpfs, introduced with the 2.6 kernel. Pages allocated on a hugetlbfs type filesystem are allocated in HugePages.

 

1.2 Common Misconceptions

WRONG: HugePages is a method to be able to use large SGA on 32-bit VLM systems

RIGHT: HugePages is a method to have larger pages, which is useful when working with very large memory. It is useful in both 32-bit and 64-bit configurations

WRONG: HugePages cannot be used without USE_INDIRECT_DATA_BUFFERS

RIGHT: HugePages can be used without indirect buffers. 64-bit systems do not need indirect buffers to have a large buffer cache for the RDBMS instance, and HugePages can be used there too.

WRONG: hugetlbfs means hugetlb

RIGHT: hugetlbfs is a filesystem type **BUT** hugetlb is the mechanism employed behind the scenes, and hugetlb can be employed WITHOUT hugetlbfs

WRONG: hugetlbfs means hugepages

RIGHT: hugetlbfs is a filesystem type **BUT** HugePages is the mechanism employed behind the scenes (synonymously with hugetlb), and HugePages can be employed WITHOUT hugetlbfs.

 

 

1.3 Regular Pages and HugePages

When a single process works with a piece of memory, the pages that the process uses are referenced in a local page table for that specific process. The entries in this table in turn contain references to the system-wide page table, which holds the references to the actual physical memory addresses. So, in theory, a user-mode process (e.g. an Oracle process) follows its local page table to reach the system page table, and from there references the actual physical memory. It is also possible (and very common for Oracle RDBMS because of SGA use) for two different O/S processes to point to the same entry in the system-wide page table.


 

 

 

When HugePages are in play, the usual page tables are still employed. The basic difference is that the entries in both the process page table and the system page table carry attributes about huge pages, so any page in a page table can be either a huge page or a regular page. The diagram in the original note illustrated 4096K hugepages, but it would look the same for any huge page size.

 


1.4 Some HugePages Facts/Features

(1)  HugePages can be allocated on-the-fly, but they must be reserved during system startup. Otherwise the allocation might fail because most of the memory is already paged in 4K pages.

(2)  HugePage sizes vary from 2MB to 256MB depending on kernel version and HW architecture (see the related section below).

(3)  HugePages are not subject to reservation / release after system startup unless the system administrator intervenes, basically by changing the hugepages configuration (i.e. the number of pages available or the pool size).

 

1.5 Advantages of HugePages Over Normal Sharing Or AMM

(1) Not swappable

HugePages are not swappable. Therefore there is no page-in/page-out mechanism overhead. HugePages are universally regarded as pinned.

(2) Relief of TLB pressure

1) HugePages use fewer pages to cover the physical address space, so the amount of "bookkeeping" (mapping from virtual to physical addresses) decreases, which requires fewer entries in the TLB.

2) TLB entries cover a larger part of the address space when HugePages are used, so there are fewer TLB misses before all or most of the SGA is mapped in the TLB.

3) Fewer TLB entries for the SGA also means more entries left for other parts of the address space.

(3) Decreased page table overhead

Each page table entry can be as large as 64 bytes, so handling 50GB of RAM takes a page table of approximately 800MB. In practice that will not fit in the 880MB lowmem (in 2.4 kernels; the page table is not necessarily in lowmem in 2.6 kernels) considering the other uses of lowmem. When 95% of memory is accessed via 256MB hugepages, this can work with a page table of approximately 40MB in total.

 

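Dave's notes:

With regular 4K pages, the number of entries needed is: (50*1024*1024/4)

Each entry is 64 bytes, so the total size is: (50*1024*1024/4) * 64/1024/1024 = 800M

Note that this is the page table of a single process. With 10 processes, the page tables alone take 800*10, about 8G of memory, out of only 50G in total. This is why HugePages matter so much on systems with a lot of memory.

HugePage sizes range from 2MB to 256MB. Taking 2MB:

(50*1024/2)*64/1024/1024 = 1.6M

Even with 10 processes that is only 16M.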
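
The arithmetic in the notes above can be reproduced with a small shell sketch. The function name `pagetable_kb` is made up for this illustration, and the 64-byte entry size is the upper bound quoted in the text, not a measured value.

```shell
#!/bin/bash
# Page table size in KB for one process mapping a given amount of memory.
#   $1 = mapped memory in GB
#   $2 = page size in KB
#   $3 = bytes per page table entry (64 is the upper bound used in the text)
pagetable_kb() {
    local mem_gb=$1 page_kb=$2 entry_bytes=$3
    local entries=$(( mem_gb * 1024 * 1024 / page_kb ))
    echo $(( entries * entry_bytes / 1024 ))
}

echo "4 KB pages: $(pagetable_kb 50 4 64) KB"      # 819200 KB = 800 MB
echo "2 MB pages: $(pagetable_kb 50 2048 64) KB"   # 1600 KB, about 1.6 MB
```

Multiplying either result by the number of processes that map the memory gives the per-system figures mentioned above (about 8G versus 16M for 10 processes).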

 

(4) Eliminated page table lookup overhead

Since the pages are not subject to replacement, page table lookups are not required.

(5) Faster overall memory performance

On virtual memory systems each memory operation is actually two abstract memory operations. Since there are fewer pages to work on, the possible bottleneck on page table access is avoided.

 

1.6 HugePage Size

The size of a single HugePage depends on the platform:

(1)  Kernel version/Linux distribution

(2)  HW platform

The actual HugePage size can be checked with:

        $ grep Hugepagesize /proc/meminfo
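
For scripting, the same lookup can be wrapped in a function. This is a minimal sketch: `hugepage_size_mb` is a hypothetical helper name, and the optional file argument exists only so the function can be tried against a saved copy of /proc/meminfo.

```shell
# Print the huge page size in MB, reading a meminfo-style file.
#   $1 = file to read (defaults to /proc/meminfo)
hugepage_size_mb() {
    local meminfo=${1:-/proc/meminfo}
    # The Hugepagesize line reports the size in kB, e.g. "Hugepagesize: 2048 kB".
    awk '/^Hugepagesize:/ { print $2 / 1024 }' "$meminfo"
}
```

Calling `hugepage_size_mb` with no argument reads the live system. It prints nothing if the kernel does not report a Hugepagesize line.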

 

The table below shows the sizes of HugePages on different configurations. Note that these are general numbers taken from the most recent versions of the kernels. For a specific kernel source package, you can check the HPAGE_SIZE macro value (based on HPAGE_SHIFT) in that (possibly more recent) kernel source tree.

HW Platform                     Source Code Tree   Kernel 2.4   Kernel 2.6
------------------------------  -----------------  -----------  -----------
Linux x86 (IA32)                i386               4 MB         4 MB *
Linux x86-64 (AMD64, EM64T)     x86_64             2 MB         2 MB
Linux Itanium (IA64)            ia64               256 MB       256 MB
IBM Power Based Linux (PPC64)   ppc64/powerpc      N/A **       16 MB
IBM zSeries Based Linux         s390               N/A          N/A
IBM S/390 Based Linux           s390               N/A          N/A

* Some older packaging for the 2.6.5 kernel on SLES8 (like 2.6.5-7.97) can have a 2 MB Hugepagesize.
** Oracle RDBMS is also not certified in this configuration. See Document 341507.1

 

1.7 HugePages and Oracle 11g Automatic Memory Management (AMM)

AMM and HugePages are not compatible. AMM has to be disabled on 11g to be able to use HugePages. See Document 749851.1 for further information.

 

1.8 Risks of Not Configuring HugePages Properly

Configuring HugePages on Linux is a delicate process. If it is not done properly, the system may run into the following problems:

(1)  HugePages not used at all (HugePages_Total = HugePages_Free), wasting the amount of memory configured for them

(2)  Poor database performance

(3)  System running out of memory or swapping excessively

(4)  Some or all database instances cannot be started

(5)  Crucial system services failing (e.g. CRS)

To avoid / help with such situations, Bug 10153816 was filed to introduce a database initialization parameter in 11.2.0.2 (use_large_pages) to help manage which SGAs will use huge pages and potentially give warnings or not start up at all if they cannot get those pages.

 

 

 

1.9 Why Configure HugePages

HugePages is crucial for faster Oracle database performance on Linux if you have large RAM and a large SGA. If your combined database SGAs are large (say more than 8GB, though it can matter even for smaller sizes), you will need HugePages configured. Note that it is the size of the SGA that matters. The advantages of HugePages are:

(1)  Larger Page Size and Fewer Pages: The default page size is 4K whereas the HugeTLB size is 2048K. That means the system has to handle 512 times fewer pages.

(2)  No Page Table Lookups: Since HugePages are not subject to replacement (unlike regular pages), page table lookups are not required.

(3)  Better Overall Memory Performance: On virtual memory systems (any modern OS) each memory operation is actually two abstract memory operations. With HugePages, since there are fewer pages to work on, the possible bottleneck on page table access is avoided.

(4)  No Swapping: Swapping must be avoided on Linux altogether (Document 1295478.1). HugePages are not swappable (whereas regular pages are). Therefore there is no page replacement mechanism overhead. HugePages are universally regarded as pinned.

(5)  No 'kswapd' Operations: kswapd gets very busy if there is a very large area to be paged (i.e. 13 million page table entries for 50GB of memory) and uses an incredible amount of CPU resource. When HugePages are used, kswapd is not involved in managing them. See also Document 361670.1

 

2. Configuring HugePages

 

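2.1 Step 1: Set memlock

Add memlock limits to /etc/security/limits.conf. The value is given in KB and should be slightly smaller than the installed physical memory. For example, with 64GB of RAM it can be set as follows: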

 

*   soft  memlock    60397977
*   hard   memlock    60397977

 

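Setting this value higher than what the SGA needs has no adverse effect.

If you use the oracle-validated package on Oracle Linux, or an Exadata DB compute node, this parameter is configured automatically.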
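
The example value above, 60397977, happens to be 90% of 64GB expressed in KB. The sketch below derives a value the same way. The 90% default is an assumption read off that example, not an official rule, and `suggest_memlock_kb` is a hypothetical helper name.

```shell
# Suggest a memlock limit (in KB) as a percentage of physical RAM.
#   $1 = percentage (assumed default: 90, inferred from the 64GB example)
#   $2 = meminfo-style file to read (defaults to /proc/meminfo)
suggest_memlock_kb() {
    local pct=${1:-90} meminfo=${2:-/proc/meminfo}
    # MemTotal is reported in kB, the same unit limits.conf uses for memlock.
    awk -v pct="$pct" '/^MemTotal:/ { printf "%d\n", $2 * pct / 100 }' "$meminfo"
}

# Example use, printing lines in the limits.conf format shown above:
#   kb=$(suggest_memlock_kb)
#   printf '*   soft   memlock   %s\n*   hard   memlock   %s\n' "$kb" "$kb"
```

Keep in mind that MemTotal in /proc/meminfo is usually a little lower than the installed RAM, so the result will be slightly below a figure computed from the nominal size.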

 

2.2 Step 2: Verify memlock

Check the value with the following command:

$ ulimit -l
60397977

 

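2.3 Step 3: Disable AMM on 11g

From Oracle 11g onward, instances created with the defaults use the Automatic Memory Management (AMM) feature, which is incompatible with HugePages.

AMM must be disabled before HugePages are set up. To do so, set the initialization parameters MEMORY_TARGET and MEMORY_MAX_TARGET to 0.

With AMM, all SGA memory is allocated under /dev/shm, so HugePages are not used when the SGA is allocated. This is why AMM and HugePages are incompatible.

Also note that the ASM instance uses AMM by default. Since an ASM instance does not need a large SGA, using HugePages for it brings little benefit.

To use HugePages, first make sure that the MEMORY_TARGET / MEMORY_MAX_TARGET parameters are not set.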

 

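2.4 Step 4: Compute the recommended value for vm.nr_hugepages

Make sure all database instances are up, including the ASM instance. Then use the hugepages_settings.sh script to get the recommended value for the vm.nr_hugepages kernel parameter.

$ ./hugepages_settings.sh
...
Recommended setting: vm.nr_hugepages = 1496
$

You can also work the value out yourself from experience.

The script is as follows: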

#!/bin/bash
#
# hugepages_settings.sh
#
# Linux bash script to compute values for the
# recommended HugePages/HugeTLB configuration
#
# Note: This script does calculation for all shared memory
# segments available when the script is run, no matter it
# is an Oracle RDBMS shared memory segment or not.
#
# This script is provided by Doc ID 401749.1 from My Oracle Support
# http://support.oracle.com

# Welcome text
echo "
This script is provided by Doc ID 401749.1 from My Oracle Support
(http://support.oracle.com) where it is intended to compute values for
the recommended HugePages/HugeTLB configuration for the current shared
memory segments. Before proceeding with the execution please note following:
 * For ASM instance, it needs to configure ASMM instead of AMM.
 * The 'pga_aggregate_target' is outside the SGA and
   you should accommodate this while calculating SGA size.
 * In case you changes the DB SGA size,
   as the new SGA will not fit in the previous HugePages configuration,
   it had better disable the whole HugePages,
   start the DB with new SGA size and run the script again.
And make sure that:
 * Oracle Database instance(s) are up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not setup
   (See Doc ID 749851.1)
 * The shared memory segments can be listed by command:
    # ipcs -m


Press Enter to proceed..."

read

# Check for the kernel version
KERN=`uname -r | awk -F. '{ printf("%d.%d\n",$1,$2); }'`

# Find out the HugePage size
HPG_SZ=`grep Hugepagesize /proc/meminfo | awk '{print $2}'`
if [ -z "$HPG_SZ" ]; then
    echo "The hugepages may not be supported in the system where the script is being executed."
    exit 1
fi

# Initialize the counter
NUM_PG=0

# Cumulative number of pages required to handle the running shared memory segments
for SEG_BYTES in `ipcs -m | cut -c44-300 | awk '{print $1}' | grep "[0-9][0-9]*"`
do
    MIN_PG=`echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q`
    if [ $MIN_PG -gt 0 ]; then
        NUM_PG=`echo "$NUM_PG+$MIN_PG+1" | bc -q`
    fi
done

RES_BYTES=`echo "$NUM_PG * $HPG_SZ * 1024" | bc -q`

# An SGA less than 100MB does not make sense
# Bail out if that is the case
if [ $RES_BYTES -lt 100000000 ]; then
    echo "***********"
    echo "** ERROR **"
    echo "***********"
    echo "Sorry! There are not enough total of shared memory segments allocated for
HugePages configuration. HugePages can only be used for shared memory segments
that you can list by command:

    # ipcs -m

of a size that can match an Oracle Database SGA. Please make sure that:
 * Oracle Database instance is up and running
 * Oracle Database 11g Automatic Memory Management (AMM) is not configured"
    exit 1
fi

# Finish with results
case $KERN in
    '2.4') HUGETLB_POOL=`echo "$NUM_PG*$HPG_SZ/1024" | bc -q`;
           echo "Recommended setting: vm.hugetlb_pool = $HUGETLB_POOL" ;;
    '2.6') echo "Recommended setting: vm.nr_hugepages = $NUM_PG" ;;
    *) echo "Unrecognized kernel version $KERN. Exiting." ;;
esac

# End
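
The page-counting loop at the heart of the script can be tried in isolation. The sketch below is a simplified illustration using bash arithmetic instead of bc. `count_hugepages` is a made-up name, and the segment sizes in the example call are hypothetical rather than taken from a real `ipcs -m` listing.

```shell
# Sum the huge pages needed for a list of shared memory segment sizes,
# mirroring the loop in hugepages_settings.sh.
#   $1 = huge page size in KB; remaining arguments = segment sizes in bytes
count_hugepages() {
    local hpg_kb=$1 num_pg=0 seg min_pg
    shift
    for seg in "$@"; do
        min_pg=$(( seg / (hpg_kb * 1024) ))
        # Segments smaller than one huge page are skipped, as in the script;
        # larger ones get one extra page to cover the remainder.
        if [ "$min_pg" -gt 0 ]; then
            num_pg=$(( num_pg + min_pg + 1 ))
        fi
    done
    echo "$num_pg"
}

# Hypothetical segments of 4 MB and 1 MB with 2 MB huge pages: prints 3
count_hugepages 2048 4194304 1048576
```

Note that a segment whose size is an exact multiple of the huge page size still gets the extra page, so the recommendation errs slightly on the high side.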

 

2.5 Step 5: Set the vm.nr_hugepages parameter in /etc/sysctl.conf

...
vm.nr_hugepages = 1496
...
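
Editing the file by hand is fine. For repeatable setups, something like the following sketch could update the entry instead. It assumes GNU sed (for `-i`), and `set_nr_hugepages` is a hypothetical helper, not part of any Oracle tooling.

```shell
# Set vm.nr_hugepages in a sysctl-style config file, replacing an existing
# entry or appending a new one.
#   $1 = number of pages
#   $2 = config file (assumed default: /etc/sysctl.conf)
set_nr_hugepages() {
    local pages=$1 conf=${2:-/etc/sysctl.conf}
    if grep -q '^[[:space:]]*vm\.nr_hugepages[[:space:]]*=' "$conf"; then
        # Replace the existing entry in place (GNU sed).
        sed -i "s/^[[:space:]]*vm\.nr_hugepages[[:space:]]*=.*/vm.nr_hugepages = $pages/" "$conf"
    else
        echo "vm.nr_hugepages = $pages" >> "$conf"
    fi
}
```

Run as root, for example `set_nr_hugepages 1496`. The setting is read from the file at boot, which is when the pages can be reserved most reliably.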

 

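2.6 Step 6: Stop all instances and reboot the server

2.7 Verify the configuration

After the reboot, make sure all database instances are up, then check the HugePages state with the following command: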

 

# grep HugePages /proc/meminfo
HugePages_Total:    1496
HugePages_Free:      485
HugePages_Rsvd:      446
HugePages_Surp:        0

 

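For the configuration to be valid, HugePages_Free should be smaller than HugePages_Total, and HugePages_Rsvd should show some pages (pages reserved for an SGA but not yet touched). The two values do not have to be equal: in the sample output above, HugePages_Free is 485 and HugePages_Rsvd is 446.

The HugePages_Free and HugePages_Rsvd values should also be smaller than the number of pages allocated for the SGA.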
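
The basic check can be automated with a short sketch like the one below. `hugepages_status` and its three result strings are made up for this illustration. It only covers the Total/Free comparison and reports the Rsvd figure for reference.

```shell
# Summarize HugePages usage from a meminfo-style file.
#   $1 = file to read (defaults to /proc/meminfo)
# Prints "not configured" when HugePages_Total is 0 or missing,
# "unused" when HugePages_Total equals HugePages_Free,
# and an "in use (...)" summary otherwise.
hugepages_status() {
    local meminfo=${1:-/proc/meminfo}
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free  = $2 }
        /^HugePages_Rsvd:/  { rsvd  = $2 }
        END {
            if (total + 0 == 0) {
                print "not configured"
            } else if (free + 0 < total + 0) {
                print "in use (total=" total ", free=" free ", rsvd=" rsvd ")"
            } else {
                print "unused"
            }
        }' "$meminfo"
}
```

An "unused" result corresponds to the HugePages_Total = HugePages_Free symptom, which typically means no instance is up or the instances are still using AMM.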

 

2.8 Troubleshooting

Some common problems are listed below:

Symptom: System is running out of memory or swapping
Possible Cause: Not enough HugePages to cover the SGA(s), so the area reserved for HugePages is wasted while the SGAs are allocated through regular pages.
Troubleshooting Action: Review your HugePages configuration to make sure that all SGA(s) are covered.

Symptom: Databases fail to start
Possible Cause: memlock limits are not set properly
Troubleshooting Action: Make sure the settings in limits.conf apply to the database owner account.

Symptom: One of the databases fails to start while another is up
Possible Cause: The SGA of the specific database could not find available HugePages and the remaining RAM is not enough.
Troubleshooting Action: Make sure that the RAM and HugePages are enough to cover all your database SGAs.

Symptom: Cluster Ready Services (CRS) fail to start
Possible Cause: HugePages configured too large (maybe larger than installed RAM)
Troubleshooting Action: Make sure the total SGA is less than the installed RAM and re-calculate HugePages.

Symptom: HugePages_Total = HugePages_Free
Possible Cause: HugePages are not used at all. No database instances are up, or they are using AMM.
Troubleshooting Action: Disable AMM and make sure that the database instances are up.

Symptom: Database started successfully but performance is slow
Possible Cause: The SGA of the specific database could not find available HugePages and is therefore handled by regular pages, which leads to slow performance.
Troubleshooting Action: Make sure that there are enough HugePages to cover all your database SGAs.

 

 

 

2.9 Related MOS Documents

Document 749851.1  HugePages and Oracle Database 11g Automatic Memory Management (AMM) on Linux

Document 829850.1  Hugepages Are Not Used by Database Buffer Cache

Document 803238.1  Oracle Not Utilizing Hugepages

Document 860350.1  /proc/meminfo Does Not Provide HugePages Information on Oracle Enterprise Linux (OEL5)

Document 550443.1  HugePages Not Released On Oracle RDBMS Instance Shutdown with RHEL / EL 5 Update 1 (5.1)

Document 401749.1  Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration

Document 361468.1  HugePages on Oracle Linux 64-bit

Document 361323.1  HugePages on Linux: What It Is... and What It Is Not...

Document 728063.1  Setup HugePages in an Guest Does Not Work with Oracle VM 2.1 or 2.1.1

 

 

 

 

 

