Investigating a redis process OOM-killed by the Linux kernel - 简书
【Discovering the Problem】
The ops team received a Zabbix alert: the machine hosting codis cluster node usa-9 had only 80 KB left of its original 4 GB of swap. The on-call engineer immediately logged in and added roughly 6 GB of swap.
Lack of free swap space on USARN-H-Host-Linux-172.24.19.59: PROBLEM (Value: 80 KB) 2019.11.13 14:47:34
Shortly after, a 500-error alert arrived from an application; its stack trace pointed at the same codis node usa-9 with "JedisConnectionException: Unexpected end of stream". Logging in to usa-9 again turned up the following in the Linux system log:
Nov 13 14:56:19 vm-centos6 kernel: codis-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Nov 13 14:56:19 vm-centos6 kernel: codis-server cpuset=/ mems_allowed=0
Nov 13 14:56:19 vm-centos6 kernel: Pid: 4492, comm: codis-server Not tainted 2.6.32-504.el6.x86_64 #1
Nov 13 14:56:19 vm-centos6 kernel: Call Trace:
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff810d40c1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff81127300>] ? dump_header+0x90/0x1b0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8122ea2c>] ? security_real_capable_noaudit+0x3c/0x70
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff81127782>] ? oom_kill_process+0x82/0x2a0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff811276c1>] ? select_bad_process+0xe1/0x120
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff81127bc0>] ? out_of_memory+0x220/0x3c0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff811344df>] ? __alloc_pages_nodemask+0x89f/0x8d0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8116c69a>] ? alloc_pages_current+0xaa/0x110
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff811246f7>] ? __page_cache_alloc+0x87/0x90
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff811240de>] ? find_get_page+0x1e/0xa0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff81125697>] ? filemap_fault+0x1a7/0x500
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8114eae4>] ? __do_fault+0x54/0x530
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8114f0b7>] ? handle_pte_fault+0xf7/0xb00
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff814470e1>] ? sock_aio_read+0x1a1/0x1b0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff810a2bbb>] ? __remove_hrtimer+0x3b/0xb0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8114fcea>] ? handle_mm_fault+0x22a/0x300
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff811d68e0>] ? ep_send_events_proc+0x0/0x110
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8104d0d8>] ? __do_page_fault+0x138/0x480
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8152ffbe>] ? do_page_fault+0x3e/0xa0
Nov 13 14:56:19 vm-centos6 kernel: [<ffffffff8152d375>] ? page_fault+0x25/0x30
Nov 13 14:56:19 vm-centos6 kernel: Mem-Info:
Nov 13 14:56:19 vm-centos6 kernel: Node 0 DMA per-cpu:
Nov 13 14:56:19 vm-centos6 kernel: CPU 0: hi: 0, btch: 1 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: CPU 1: hi: 0, btch: 1 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: CPU 2: hi: 0, btch: 1 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: CPU 3: hi: 0, btch: 1 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: Node 0 DMA32 per-cpu:
Nov 13 14:56:19 vm-centos6 kernel: CPU 0: hi: 186, btch: 31 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: CPU 1: hi: 186, btch: 31 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: CPU 2: hi: 186, btch: 31 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: CPU 3: hi: 186, btch: 31 usd: 0
Nov 13 14:56:19 vm-centos6 kernel: Node 0 Normal per-cpu:
Nov 13 14:56:19 vm-centos6 kernel: CPU 0: hi: 186, btch: 31 usd: 35
Nov 13 14:56:19 vm-centos6 kernel: CPU 1: hi: 186, btch: 31 usd: 3
Nov 13 14:56:19 vm-centos6 kernel: CPU 2: hi: 186, btch: 31 usd: 59
Nov 13 14:56:19 vm-centos6 kernel: CPU 3: hi: 186, btch: 31 usd: 184
Nov 13 14:56:19 vm-centos6 kernel: active_anon:4040530 inactive_anon:451920 isolated_anon:0
Nov 13 14:56:19 vm-centos6 kernel: active_file:3492 inactive_file:4985 isolated_file:0
Nov 13 14:56:19 vm-centos6 kernel: unevictable:0 dirty:2037 writeback:1387 unstable:0
Nov 13 14:56:19 vm-centos6 kernel: free:35841 slab_reclaimable:2943 slab_unreclaimable:7727
Nov 13 14:56:19 vm-centos6 kernel: mapped:296 shmem:73 pagetables:13459 bounce:0
Nov 13 14:56:19 vm-centos6 kernel: Node 0 DMA free:15668kB min:52kB low:64kB high:76kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15276kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov 13 14:56:19 vm-centos6 kernel: lowmem_reserve[]: 0 3000 18150 18150
Nov 13 14:56:19 vm-centos6 kernel: Node 0 DMA32 free:71556kB min:11160kB low:13948kB high:16740kB active_anon:2063844kB inactive_anon:519380kB active_file:656kB inactive_file:1132kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:660kB writeback:0kB mapped:120kB shmem:0kB slab_reclaimable:628kB slab_unreclaimable:68kB kernel_stack:0kB pagetables:204kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:2688 all_unreclaimable? yes
Nov 13 14:56:19 vm-centos6 kernel: lowmem_reserve[]: 0 0 15150 15150
Nov 13 14:56:19 vm-centos6 kernel: Node 0 Normal free:56140kB min:56364kB low:70452kB high:84544kB active_anon:14098276kB inactive_anon:1288300kB active_file:13312kB inactive_file:18808kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15513600kB mlocked:0kB dirty:7488kB writeback:5548kB mapped:1064kB shmem:292kB slab_reclaimable:11144kB slab_unreclaimable:30840kB kernel_stack:2184kB pagetables:53632kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:52256 all_unreclaimable? yes
Nov 13 14:56:19 vm-centos6 kernel: lowmem_reserve[]: 0 0 0 0
Nov 13 14:56:19 vm-centos6 kernel: Node 0 DMA: 1*4kB 2*8kB 2*16kB 2*32kB 1*64kB 1*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15668kB
Nov 13 14:56:19 vm-centos6 kernel: Node 0 DMA32: 2308*4kB 391*8kB 210*16kB 146*32kB 62*64kB 37*128kB 26*256kB 22*512kB 18*1024kB 3*2048kB 0*4096kB = 71592kB
Nov 13 14:56:19 vm-centos6 kernel: Node 0 Normal: 756*4kB 706*8kB 494*16kB 330*32kB 170*64kB 89*128kB 21*256kB 3*512kB 0*1024kB 0*2048kB 0*4096kB = 56320kB
Nov 13 14:56:19 vm-centos6 kernel: 65997 total pagecache pages
Nov 13 14:56:19 vm-centos6 kernel: 57354 pages in swap cache
Nov 13 14:56:19 vm-centos6 kernel: Swap cache stats: add 46466585, delete 46409231, find 15690882/21869217
Nov 13 14:56:19 vm-centos6 kernel: Free swap = 0kB
Nov 13 14:56:19 vm-centos6 kernel: Total swap = 4063228kB
Nov 13 14:56:19 vm-centos6 kernel: 4718576 pages RAM
Nov 13 14:56:19 vm-centos6 kernel: 117970 pages reserved
Nov 13 14:56:19 vm-centos6 kernel: 9305 pages shared
Nov 13 14:56:19 vm-centos6 kernel: 4551285 pages non-shared
Nov 13 14:56:19 vm-centos6 kernel: [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
Nov 13 14:56:19 vm-centos6 kernel: [ 514] 0 514 2729 1 1 -17 -1000 udevd
Nov 13 14:56:19 vm-centos6 kernel: [ 837] 0 837 2729 1 1 -17 -1000 udevd
Nov 13 14:56:19 vm-centos6 kernel: [ 1272] 0 1272 62838 313 3 0 0 vmtoolsd
Nov 13 14:56:19 vm-centos6 kernel: [ 1310] 0 1310 15023 6 2 0 0 VGAuthService
Nov 13 14:56:19 vm-centos6 kernel: [ 1386] 0 1386 23283 40 0 -17 -1000 auditd
Nov 13 14:56:19 vm-centos6 kernel: [ 1406] 0 1406 62464 692 2 0 0 rsyslogd
Nov 13 14:56:19 vm-centos6 kernel: [ 1436] 0 1436 4589 36 0 0 0 irqbalance
Nov 13 14:56:19 vm-centos6 kernel: [ 1452] 32 1452 4744 18 2 0 0 rpcbind
Nov 13 14:56:19 vm-centos6 kernel: [ 1472] 29 1472 5837 2 0 0 0 rpc.statd
Nov 13 14:56:19 vm-centos6 kernel: [ 1589] 81 1589 5394 47 2 0 0 dbus-daemon
Nov 13 14:56:19 vm-centos6 kernel: [ 1621] 0 1621 1020 1 0 0 0 acpid
Nov 13 14:56:19 vm-centos6 kernel: [ 1631] 68 1631 9521 162 2 0 0 hald
Nov 13 14:56:19 vm-centos6 kernel: [ 1632] 0 1632 5099 2 1 0 0 hald-runner
Nov 13 14:56:19 vm-centos6 kernel: [ 1664] 0 1664 5629 2 3 0 0 hald-addon-inpu
Nov 13 14:56:19 vm-centos6 kernel: [ 1674] 68 1674 4501 2 0 0 0 hald-addon-acpi
Nov 13 14:56:19 vm-centos6 kernel: [ 1689] 0 1689 2728 1 3 -17 -1000 udevd
Nov 13 14:56:19 vm-centos6 kernel: [ 1695] 0 1695 96534 43 1 0 0 automount
Nov 13 14:56:19 vm-centos6 kernel: [ 1823] 0 1823 20332 28 0 0 0 master
Nov 13 14:56:19 vm-centos6 kernel: [ 1846] 89 1846 20398 24 2 0 0 qmgr
Nov 13 14:56:19 vm-centos6 kernel: [ 1849] 0 1849 28661 2 3 0 0 abrtd
Nov 13 14:56:19 vm-centos6 kernel: [ 1862] 0 1862 29342 24 2 0 0 crond
Nov 13 14:56:19 vm-centos6 kernel: [ 1876] 0 1876 5394 7 0 0 0 atd
Nov 13 14:56:19 vm-centos6 kernel: [ 1889] 0 1889 19879 2 0 0 0 login
Nov 13 14:56:19 vm-centos6 kernel: [ 1891] 0 1891 1016 2 3 0 0 mingetty
Nov 13 14:56:19 vm-centos6 kernel: [ 1893] 0 1893 1016 2 0 0 0 mingetty
Nov 13 14:56:19 vm-centos6 kernel: [ 1895] 0 1895 1016 2 2 0 0 mingetty
Nov 13 14:56:19 vm-centos6 kernel: [ 1897] 0 1897 1016 2 0 0 0 mingetty
Nov 13 14:56:19 vm-centos6 kernel: [ 1899] 0 1899 1016 2 1 0 0 mingetty
Nov 13 14:56:19 vm-centos6 kernel: [ 1996] 0 1996 521256 57 0 0 0 console-kit-dae
Nov 13 14:56:19 vm-centos6 kernel: [ 2063] 0 2063 27076 2 1 0 0 bash
Nov 13 14:56:19 vm-centos6 kernel: [29526] 0 29526 25812 47 1 0 0 ping
Nov 13 14:56:19 vm-centos6 kernel: [ 4492] 0 4492 6354569 4432393 1 0 0 codis-server
Nov 13 14:56:19 vm-centos6 kernel: [25500] 0 25500 133214 139 0 0 0 SFTMonitor
Nov 13 14:56:19 vm-centos6 kernel: [25501] 0 25501 222155 168 1 0 0 SFTServer
Nov 13 14:56:19 vm-centos6 kernel: [19596] 0 19596 16672 22 2 -17 -1000 sshd
Nov 13 14:56:19 vm-centos6 kernel: [26159] 500 26159 4441 10 3 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26161] 500 26161 4441 132 0 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26162] 500 26162 4441 49 0 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26163] 500 26163 4441 49 2 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26164] 500 26164 4441 49 2 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26165] 500 26165 4441 49 0 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26166] 500 26166 4441 49 0 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26167] 500 26167 4441 49 2 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26168] 500 26168 4441 49 3 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26169] 500 26169 4441 49 1 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26170] 500 26170 4441 49 2 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26171] 500 26171 4441 49 0 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26172] 500 26172 4441 49 2 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26174] 500 26174 4441 49 2 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [26175] 500 26175 4441 49 2 0 0 zabbix_agentd
Nov 13 14:56:19 vm-centos6 kernel: [23868] 38 23868 7683 44 2 0 0 ntpd
Nov 13 14:56:19 vm-centos6 kernel: [ 3221] 89 3221 20352 231 2 0 0 pickup
Nov 13 14:56:19 vm-centos6 kernel: [ 3463] 0 3463 24592 291 2 0 0 sshd
Nov 13 14:56:19 vm-centos6 kernel: [ 3466] 0 3466 27087 145 0 0 0 bash
Nov 13 14:56:19 vm-centos6 kernel: [ 3490] 0 3490 26297 51 0 0 0 dd
Nov 13 14:56:19 vm-centos6 kernel: Out of memory: Kill process 4492 (codis-server) score 941 or sacrifice child
Nov 13 14:56:19 vm-centos6 kernel: Killed process 4492, UID 0, (codis-server) total-vm:25418276kB, anon-rss:17729176kB, file-rss:396kB
// This next line is the ops engineer adding ~6 GB of swap right after the "4 GB swap down to 80 KB" alert — but the redis process had already been killed 20 seconds earlier.
Nov 13 14:56:39 vm-centos6 kernel: Adding 5999996k swap on /home/swap/swapfile. Priority:-2 extents:8 across:6499708k
【Analyzing the Problem】
The redis instance was killed by the kernel. The two most important lines of the system log are the first and the last:
codis-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
Killed process 4492, UID 0, (codis-server) total-vm:25418276kB, anon-rss:17729176kB, file-rss:396kB
The redis process was asking for 4 KB of memory (order=0 means 2^0 = 1 page, i.e. 4 KB) when the system ran out of memory and triggered the oom-killer — and the process selected for killing was redis itself.
Referring to the memory-zone layout described at https://www.jianshu.com/p/c2e7d36829af, the lowest bits of the mask (0x201da) contain "10" = 2, nominally "allocate from ZONE_HIGHMEM"; but 64-bit systems have no highmem zone, so the request is actually served from the Normal zone. The log shows "Node 0 Normal free:56140kB min:56364kB": the Normal zone's free memory (56140 kB) had fallen below the min watermark (56364 kB), which is what triggered the oom-killer.
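The two numbers in that first oom-killer line can be decoded with a few lines of Python — a sketch only; the zone-bit meanings follow the 2.6.32-era include/linux/gfp.h layout and differ across kernel versions:

```python
# Decode the "order" and low gfp_mask bits from the oom-killer log line.
# Zone-bit interpretation assumes the 2.6.32 kernel this host runs.

PAGE_SIZE = 4096

def alloc_size(order: int) -> int:
    """order=N means 2**N contiguous pages were requested."""
    return PAGE_SIZE * (2 ** order)

def zone_bits(gfp_mask: int) -> int:
    """Lowest bits of gfp_mask select the preferred zone (2.6.32 layout)."""
    return gfp_mask & 0b11

print(alloc_size(0))       # 4096 -> the failing request was a single 4 KB page
print(zone_bits(0x201da))  # 2    -> the ZONE_HIGHMEM bit, ignored on x86_64
```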
codis-monitor's memory-usage alert threshold for this node is 65%, with maxmemory=12G, so an alert should fire once the K-V data reaches 12G * 65% = 7.8G. But no alert had fired when the node was killed, which means the K-V data was still under 7.8G. The machine has 18G of RAM and (at the time) 4G of swap, and no other process consumes any significant memory.
Yet the log says "anon-rss:17729176kB": the redis node occupied about 16.9G when it was killed. So one observation says redis used 16.9G and exhausted memory, while the other says its K-V data never exceeded 7.8G.
That prompted a look at what anon-rss means: RSS is the memory assigned to the process as seen from the operating system's side. Checking how codis-monitor computes its 65% showed it is based on the used_memory field printed by the INFO command — the memory actually used by K-V data. INFO also reports used_memory_rss, the memory the operating system has allocated to redis; used_memory_rss can exceed used_memory, and their ratio is exposed as yet another field, mem_fragmentation_ratio.
With the distinction between the alerting field used_memory and the OS-level used_memory_rss clear, the two observations are simply measuring different things. Initial hypothesis: severe memory fragmentation pushed redis's total footprint past the machine's physical memory before the K-V data ever reached the alert threshold.
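The relationship between the three INFO fields is worth checking with this node's approximate numbers (the 1.5 cutoff is the commonly cited rule of thumb, not something redis enforces):

```python
# Sanity-check mem_fragmentation_ratio = used_memory_rss / used_memory
# using the approximate figures from node usa-2 below.

GiB = 1024 ** 3

used_memory = 6.63 * GiB        # live K-V data; this drives the 65% alert
used_memory_rss = 17.00 * GiB   # what the OS actually allocated to redis

ratio = used_memory_rss / used_memory
print(round(ratio, 2))          # ~2.56, close to the 2.57 INFO itself reports
                                # (INFO computes it from exact byte counts)
```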
【Verifying the Problem】
Since redis on usa-9 had already been restarted, the original scene could no longer be examined, so I went through the other redis nodes in the usa cluster to test the hypothesis.
1) Node usa-2
[root@usa-idc-micen-codis-app2 ~]# top -c
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4498 root 20 0 26.9g 17g 884 S 4.3 96.9 23255:49 /opt/xyz/codis202/bin/codis-server *:8998
[root@usa-idc-micen-codis-app2 ~]# free -m
total used free shared buffers cached
Mem: 17971 17784 187 0 8 19
-/+ buffers/cache: 17755 215
Swap: 3967 2553 1414
xxx.xxx.xxx.xxx:8998> info
# Memory
used_memory_human:6.63G
used_memory_rss_human:17.00G
mem_fragmentation_ratio:2.57
2) Node usa-4
[root@usa-idc-micen-codis-app4 log]# top -c
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
9297 root 20 0 11.2g 10g 1076 S 3.0 59.5 1780:23 /opt/xyz/codis202/bin/codis-server *:8998
[root@usa-idc-micen-codis-app4 log]# free -m
total used free shared buffers cached
Mem: 17971 17751 219 0 138 5184
-/+ buffers/cache: 12429 5542
Swap: 3967 631 3336
xxx.xxx.xxx.xxx:8998> info
# Memory
used_memory_human:7.63G
used_memory_rss_human:10.44G
mem_fragmentation_ratio:1.37
3) Node usa-1
[root@usa-idc-micen-codis-app1 ~]# top -c
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4617 root 20 0 29.8g 15g 876 S 3.0 89.8 22864:35 /opt/xyz/codis202/bin/codis-server *:8998
[root@usa-idc-micen-codis-app1 ~]# free -m
total used free shared buffers cached
Mem: 17971 17813 158 0 30 97
-/+ buffers/cache: 17685 285
Swap: 11780 4239 7541
xxx.xxx.xxx.xxx:8998> info
# Memory
used_memory_human:6.61G
used_memory_rss_human:15.74G
mem_fragmentation_ratio:2.38
Summary:
1) Node usa-2 is in the state closest to the OOM-killed usa-9: 6.63G of K-V data, but 17G of total memory including fragmentation. The fragmentation is a staggering 10G+ — storing 6G of data while wasting over 10G of unusable memory — and the fragmentation ratio of 2.57 is far beyond the commonly recommended ceiling of 1.5. This indirectly confirms that usa-9 OOMed because excessive fragmentation pushed its total footprint up to the physical-memory limit, at which point a new page allocation failed.
2) The resident memory RES reported by top should correspond to used_memory_rss from INFO (redis's footprint including fragmentation), which in turn is the anon-rss the kernel reports freeing when it kills the process.
3) Apart from usa-4, which looks healthy (fragmentation ratio 1.37, below 1.5; 5G of memory free; swap barely touched), the other nodes show excessive fragmentation, almost no free memory, and heavy swap usage. usa-2 in particular is not far from OOM; it has managed to walk the cliff edge without falling only because the 65% K-V threshold protection mentioned above switched redis to read-only, keeping the fragmented total under physical memory — usa-9 was not so lucky.
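The node-by-node survey above was done by hand; a small sketch of how it could be scripted — parse the "# Memory" section of a raw INFO response and flag unhealthy nodes (the 1.5 cutoff is the rule of thumb used in this article; the sample bytes approximate usa-2):

```python
# Parse redis INFO output (key:value lines) and flag fragmented nodes.

def parse_info_memory(info_text: str) -> dict:
    """Extract numeric fields from a raw INFO response."""
    fields = {}
    for line in info_text.splitlines():
        if ":" in line and not line.startswith("#"):
            key, _, value = line.partition(":")
            try:
                fields[key.strip()] = float(value)
            except ValueError:
                pass  # skip non-numeric fields like used_memory_human
    return fields

def is_fragmented(fields: dict, threshold: float = 1.5) -> bool:
    return fields.get("mem_fragmentation_ratio", 0.0) > threshold

sample = """# Memory
used_memory:7118926192
used_memory_rss:18253611008
mem_fragmentation_ratio:2.57
"""
print(is_fragmented(parse_info_memory(sample)))  # True -> needs attention
```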
【Fixing the Problem】
1) Conservative treatment: make the redis node harder to OOM. First, add swap to raise the tolerance before physical memory is exhausted, lowering the chance of triggering the oom-killer. Second, drop the K-V alert threshold from 65% to 60% so the protection kicks in earlier, reducing the risk of the fragmented total exceeding physical memory.
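The swap expansion seen in the kernel log follows the standard swapfile recipe — a sketch, run as root; the path matches the log line above, and the size approximates what ops added:

```shell
# Carve out a ~6 GB swap file and enable it.
dd if=/dev/zero of=/home/swap/swapfile bs=1M count=6000
chmod 600 /home/swap/swapfile
mkswap /home/swap/swapfile
swapon /home/swap/swapfile
# persist across reboots
echo '/home/swap/swapfile swap swap defaults 0 0' >> /etc/fstab
```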
2) Effective treatment: clean up the fragmentation itself. redis 4.0 and later can defragment online, but the redis 3.2 we run can only be cured by a restart: bring in new machine nodes, migrate slots over gradually, and shut down and restart the old nodes once migration completes. There are three difficulties. First, we lack automated ops tooling, and migrating slots one by one by hand is slow. Second, project teams were never given usage constraints, so big keys exist; the pauses caused by migrating those slots would be unacceptable to the teams. Third, for the same lack of constraints, teams very likely use redis as a database; the heavily used slots sit on nodes with master-slave pairs for high availability (persistence is almost never enabled), and if redis crashed during slot migration without a fresh slave copy, the resulting data loss would be completely unacceptable.
3) Long-term treatment: reduce fragmentation at the source. Ask project teams to add a TTL to every key they use — an hour or a week, whichever fits — so expired keys can be reclaimed, lowering both memory usage and fragmentation. Keys that no team claims will have to stay in redis for now; later a script can sweep keys without a TTL and apply a default one.
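The backfill script mentioned above might look like this — a sketch written against a redis-py-style client (an assumption; SCAN keeps the sweep non-blocking, unlike KEYS):

```python
# Sweep keys that have no TTL and give them a default expiry.

def backfill_ttl(client, default_ttl_seconds=7 * 24 * 3600, batch=500):
    """Set default_ttl_seconds on every key that currently has no TTL.

    client must provide scan_iter(count=...), ttl(key), and expire(key, secs),
    matching redis-py's StrictRedis. Returns the number of keys updated.
    """
    updated = 0
    for key in client.scan_iter(count=batch):
        # ttl == -1 means the key exists but has no expiry set
        if client.ttl(key) == -1:
            client.expire(key, default_ttl_seconds)
            updated += 1
    return updated

# usage (hypothetical connection details):
#   import redis
#   r = redis.StrictRedis(host="usa-9", port=8998)
#   print(backfill_ttl(r, default_ttl_seconds=3600))
```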
【Thinking About the Problem】
1) How does memory fragmentation arise?
One confirmed source is frequently SETting new values on existing keys. Take the intset data structure: suppose many small integers are stored contiguously in int16 slots; the moment one integer wider than 2 bytes is added, every small integer is upgraded to int32 or int64 storage. If that large integer is later deleted, the small integers never downgrade back to int16, leaving more than half of the allocated space wasted. Similarly, after `set keyA 1m_str` followed by `set keyA int_val`, whether the freed space can actually be released remains to be verified.
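A back-of-envelope illustration of the intset upgrade described above, using the element widths of redis's intset encodings (2, 4, or 8 bytes per element):

```python
# Space wasted when an intset is upgraded from int16 to int64
# and the triggering value is then deleted.

n_small = 10_000          # small integers that fit in int16
before = n_small * 2      # bytes while encoded as int16
after = n_small * 8       # bytes after one value forces an int64 upgrade

print(before, after)      # 20000 80000
# Even after the big value is deleted, the set stays int64-encoded,
# so 6 of every 8 bytes per element (75% of the space) is now waste.
print(1 - before / after) # 0.75
```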
【Open Questions】
1) When redis persistence is enabled, the forked child nominally needs as much memory as the redis process itself (copy-on-write means it will not actually use the full amount). Allocating only ~45% of physical memory to redis just to leave headroom for the persistence child would be wasteful, so the usual recommendation is to set the kernel parameter vm.overcommit_memory = 1, which makes the kernel grant the fork's large virtual allocation outright instead of rejecting it. The redis startup log prints exactly this warning:
# WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
vm.overcommit_memory defaults to 0, heuristic overcommit: the kernel estimates whether a request can plausibly be backed by available memory plus swap and rejects it otherwise, which is why a background-save fork can fail under memory pressure. One remaining puzzle: how is the VIRT=26.9g that top shows on node usa-2 calculated?
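Applying the setting that the warning recommends is a two-line config change (run as root):

```shell
# take effect immediately
sysctl vm.overcommit_memory=1
# persist across reboots
echo 'vm.overcommit_memory = 1' >> /etc/sysctl.conf
```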