Some Facts About SoftReference - in355hz - ITeye技术网站

Java's SoftReference has gone largely unnoticed for many years. Its Javadoc describes it like this:

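"All soft references to softly-reachable objects are guaranteed to have been cleared before the virtual machine throws an OutOfMemoryError. Otherwise no constraints are placed upon the time at which a soft reference will be cleared or the order in which a set of such references to different objects will be cleared. Virtual machine implementations are, however, encouraged to bias against clearing recently-created or recently-used soft references.

Direct instances of this class may be used to implement simple caches; this class or derived subclasses may also be used in larger data structures to implement more sophisticated caches. As long as the referent of a soft reference is strongly reachable, that is, is actually in use, the soft reference will not be cleared. Thus a sophisticated cache can, for example, prevent its most recently used entries from being discarded by keeping strong referents to those entries, leaving the remaining entries to be discarded at the discretion of the garbage collector."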

What does this tell us?

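1. Before you ever see an OutOfMemoryError, the JVM is guaranteed to have cleared all soft references to softly reachable objects;

2. Java makes no guarantee about when a SoftReference is cleared; the mechanism is specific to each JVM implementation;

3. Java provides SoftReference in the hope that it makes caches easier to implement.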
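To make the "simple cache" use case concrete, here is a minimal sketch of a map whose values are held through SoftReference. The class name SoftCache and its methods are hypothetical names chosen for illustration, not a JDK API, and the sketch deliberately omits a ReferenceQueue for eager cleanup.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration class; not part of the JDK.
final class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    // Returns the cached value, or null if it was never cached or the GC has cleared it.
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key, ref); // drop the stale entry left behind by a cleared reference
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCache<String, byte[]> cache = new SoftCache<>();
        byte[] payload = new byte[1024];
        cache.put("a", payload);
        // payload is still strongly reachable here, so the reference cannot have been cleared.
        System.out.println(cache.get("a") == payload);
        System.out.println(cache.get("missing"));
    }
}
```

Callers have to treat a null result as a cache miss and reload the value, since the collector may clear any entry that is only softly reachable.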

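Well, that looks great and powerful. The JVM takes care of retaining the most recently used soft references; simply perfect. But hold on: has anyone actually used SoftReference in a real project and watched closely how it gets cleared?

What I observed is that reality is much harsher: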

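1. Unless your process's memory is so full that it is about to throw OutOfMemoryError, the JVM does not reclaim the memory held by SoftReference at all.

2. Softly referenced objects take up a large amount of memory and, worse, they all end up in Old-Gen. Your process then triggers Full GC frequently, yet even then the JVM does not necessarily reclaim the memory held by SoftReference.

3. Because Old-Gen is now running at full load, a single Full GC takes abnormally long.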

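What a trap. So when does the JVM actually clear SoftReference?

The correct answer is "that is up to the JVM; mere mortals cannot interfere." Still, although mortals cannot control the JVM, they can play a small trick on it: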

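Java code:
try {
    // Clamp the length: a plain (int) cast of maxMemory() overflows once the heap reaches 2 GB.
    long max = Runtime.getRuntime().maxMemory();
    Object[] ignored = new Object[(int) Math.min(max, Integer.MAX_VALUE - 8)];
} catch (OutOfMemoryError e) {
    // Ignore the OOM; soft references are cleared before it is thrown
}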

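(Source: http://stackoverflow.com/questions/3785713/how-to-make-the-java-system-release-soft-references)

The code above makes the JVM clear SoftReference right away, because all soft references must be cleared before an OutOfMemoryError is thrown. Crude, but effective.

So how does a common JVM such as HotSpot decide when to clear SoftReference? Thankfully, someone has already published the findings: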

http://jeremymanson.blogspot.com/2009/07/how-hotspot-decides-to-clear_07.html

Translated directly, the conclusion is:

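"When a GC occurs, whether a SoftReference is cleared depends on two factors:

1. the timestamp of the reference;

2. how much memory is available.

The calculation is very simple. First define:

free_heap - the amount of free space in the heap, in MB

interval - the time between the last GC's clock and the timestamp recorded in the reference

ms_per_mb - a constant number of milliseconds for which a SoftReference is kept for each MB of free heap space.

The reference is retained if the following holds, and cleared otherwise:

interval <= free_heap * ms_per_mb"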

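Here ms_per_mb is a tunable JVM option, -XX:SoftRefLRUPolicyMSPerMB (HotSpot's default is 1000). From the formula it is easy to see that this option governs how many SoftReference objects survive a GC: the larger the value, the more softly referenced objects are retained after GC.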
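The retention rule quoted above can be written down as a one-line predicate. The sketch below only mirrors the formula as described in this post; the class and method names are hypothetical, and HotSpot's real implementation is C++ and differs in detail.

```java
// Hypothetical helper mirroring the quoted rule; not HotSpot's actual code.
final class SoftRefPolicySketch {
    // intervalMs: time between the last GC's clock and the reference's timestamp
    // freeHeapMb: free heap in MB
    // msPerMb:    value of -XX:SoftRefLRUPolicyMSPerMB
    static boolean shouldRetain(long intervalMs, long freeHeapMb, long msPerMb) {
        return intervalMs <= freeHeapMb * msPerMb;
    }

    public static void main(String[] args) {
        // 100 MB free at 1000 ms/MB: references idle for up to 100 s are retained.
        System.out.println(shouldRetain(60_000, 100, 1000));  // true
        System.out.println(shouldRetain(120_000, 100, 1000)); // false
        // With ms_per_mb = 0, only an interval of exactly 0 passes the check.
        System.out.println(shouldRetain(1, 100, 0));          // false
    }
}
```

Under this rule, shrinking free heap lowers the threshold for every reference at once, which is why soft references tend to be cleared in bulk when memory gets tight.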

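Some blogs that recommend JVM options suggest setting -XX:SoftRefLRUPolicyMSPerMB to 0, so that no SoftReference is retained after a GC. Does that completely avoid the problems with soft-reference clearing? I suppose only the JVM knows.

This also reveals the algorithm the JVM uses to clear SoftReference. Note that it does not actually evict the least recently used objects first; instead it evicts every object that has not been accessed recently, with "recently" defined by the amount of free memory. Compared with a true LRU eviction algorithm, this is rather coarse.

Given these facts, my conclusion is that you should think carefully before using SoftReference: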

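1. Does your application really need to keep these objects in the JVM, never clearing them as long as memory suffices?

2. Will these softly referenced objects take up too much memory, increasing memory pressure on your application and causing frequent Full GC?

3. Besides SoftReference, do you have a better mechanism for managing these objects?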
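As one possible answer to the third question, a size-bounded cache with true LRU eviction can be built on the JDK's LinkedHashMap. This is only a sketch of one alternative, not something the original post prescribes; the class name BoundedLruCache is hypothetical, and the class is not thread-safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class: holds at most maxEntries strong references.
final class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: iteration runs from least to most recently used
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedLruCache<String, Integer> cache = new BoundedLruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touch "a", so "b" becomes the least recently used entry
        cache.put("c", 3); // exceeds the bound and evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```

The memory footprint here is bounded by entry count rather than by GC policy, so it stays predictable regardless of how much heap is free.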


Tags: java, jvm


Posted by IT瘾 on September 10, 2015 at 06:41:00 PM

Java™ HotSpot Virtual Machine Performance Enhancements

NUMA Collector Enhancements

The Parallel Scavenger garbage collector has been extended to take advantage of machines with NUMA (Non Uniform Memory Access) architecture. Most modern computers are based on NUMA architecture, in which it takes a different amount of time to access different parts of memory. Typically, every processor in the system has a local memory that provides low access latency and high bandwidth, and remote memory that is considerably slower to access.

In the Java HotSpot Virtual Machine, the NUMA-aware allocator has been implemented to take advantage of such systems and provide automatic memory placement optimizations for Java applications. The allocator controls the eden space of the young generation of the heap, where most of the new objects are created. The allocator divides the space into regions each of which is placed in the memory of a specific node. The allocator relies on a hypothesis that a thread that allocates the object will be the most likely to use the object. To ensure the fastest access to the new object, the allocator places it in the region local to the allocating thread. The regions can be dynamically resized to reflect the allocation rate of the application threads running on different nodes. That makes it possible to increase performance even of single-threaded applications. In addition, "from" and "to" survivor spaces of the young generation, the old generation, and the permanent generation have page interleaving turned on for them. This ensures that all threads have equal access latencies to these spaces on average.

The NUMA-aware allocator is available on the Solaris™ operating system starting in Solaris 9 12/02 and on the Linux operating system starting in Linux kernel 2.6.19 and glibc 2.6.1.

The NUMA-aware allocator can be turned on with the -XX:+UseNUMA flag in conjunction with the selection of the Parallel Scavenger garbage collector. The Parallel Scavenger garbage collector is the default for a server-class machine. The Parallel Scavenger garbage collector can also be turned on explicitly by specifying the -XX:+UseParallelGC option.

The -XX:+UseNUMA flag was added in Java SE 6u2.
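To check whether the flag actually took effect in a running JVM, the values of HotSpot options can be read through the HotSpotDiagnosticMXBean. The sketch below assumes a HotSpot-based JVM where that bean is available; the class name NumaFlagProbe is a hypothetical name used for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class for illustration; assumes a HotSpot-based JVM.
final class NumaFlagProbe {
    // Returns the current value of a HotSpot VM option as a string, e.g. "true" or "false".
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        // Run with: java -XX:+UseParallelGC -XX:+UseNUMA NumaFlagProbe
        System.out.println("UseNUMA       = " + vmOption("UseNUMA"));
        System.out.println("UseParallelGC = " + vmOption("UseParallelGC"));
    }
}
```

Printing both options side by side is a quick way to confirm that the NUMA flag is combined with the parallel collector, as the text above requires.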

Note: There was a known bug in the Linux kernel that may cause the JVM to crash when run with -XX:+UseNUMA. The bug was fixed in 2012, so this should not affect the latest versions of the Linux kernel. To see if your kernel has this bug, you can run the native reproducer.

NUMA Performance Metrics

When evaluated against the SPEC JBB 2005 benchmark on an 8-chip Opteron machine, NUMA-aware systems showed the following performance increases:

32 bit – About 30 percent increase in performance with NUMA-aware allocator
64 bit – About 40 percent increase in performance with NUMA-aware allocator

-XX:+UseNUMA

Enables a JVM heap allocation policy for NUMA systems that reduces the time it takes to fetch data from memory by allocating objects in the memory node local to the processor that uses them.

Introduced in Java 6 Update 2. As of this writing, it is available with the throughput collector only, -XX:+UseParallelOldGC and -XX:+UseParallelGC. On Oracle Solaris, multiple-JVM deployments that span more than one processor/memory node should also set lgrp_mem_pset_aware=1 in /etc/system.

Linux additionally requires use of the numactl command. Use numactl --interleave=all for single JVM deployments. For multiple JVM deployments where JVMs span more than one processor/memory node, use numactl --cpunodebind=<node number> --membind=<node number>.

Windows under AMD additionally requires enabling node-interleaving in the BIOS for single JVM deployments. All Windows multiple-JVM deployments in which JVMs span more than one processor/memory node should use processor affinity via the SET AFFINITY [mask] command. -XX:+UseNUMA is useful in JVM deployments that span processor/memory nodes on a NUMA system.

-XX:+UseNUMA should not be used in JVM deployments where the JVM does not span processor/memory nodes.

http://www.techpaste.com/2012/02/java-command-line-options-jvm-performance-improvement/


Tags: java, jvm


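Posted by IT瘾 on September 6, 2015 at 09:45:00 PM

Diagnosing Network Problems with MTR | 每日一贴

Verifying Packet Loss

When analyzing WinMTR/MTR output, you need to look at two things: packet loss and latency. Let's discuss packet loss first. If you see loss at any hop, it may indicate a problem with that particular router. However, some service providers rate-limit the ICMP traffic that WinMTR/MTR sends. This creates the illusion of packet loss when no packets are actually being lost. To determine whether the loss you see is caused by provider rate limiting, look at the hop that follows. If that hop shows 0% loss, you can be sure you are seeing ICMP rate limiting and not real loss. See the example below: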

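In this case, the loss reported between the first and second hops is probably due to ICMP rate limiting on the second-hop router, because none of the remaining 8 hops show any loss. In such cases, take the lowest loss figure among the hops as the actual loss rate.

Consider another example:

In this case, you see 60% loss between the third and fourth hops. You could assume this loss is caused by rate limiting on the router. However, the final hop still shows 40% loss. When hops report different loss figures, always use the loss rate of the final hop.

Some loss may occur on the return route. Packets can reach the destination without error but fail to make it back. This also counts toward the loss rate, but it is hard to distinguish in a WinMTR/MTR report. For that reason, you should always collect WinMTR/MTR reports in both directions.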
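The rule of thumb from this section (trust the final hop, and treat higher loss at earlier hops as ICMP limiting) can be sketched in a few lines. The class name, method names, and the sample loss figures below are hypothetical; the figures are only loosely modeled on the second case described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; input is the Loss% column, one value per hop.
final class MtrLossSketch {
    // The loss rate to trust for the whole path is the one reported at the final hop.
    static double effectiveLoss(double[] lossPercentPerHop) {
        return lossPercentPerHop[lossPercentPerHop.length - 1];
    }

    // 1-based hop numbers whose reported loss exceeds the final hop's loss;
    // by the rule above, the excess is attributed to ICMP limiting at that hop.
    static List<Integer> likelyRateLimitedHops(double[] lossPercentPerHop) {
        double last = effectiveLoss(lossPercentPerHop);
        List<Integer> hops = new ArrayList<>();
        for (int i = 0; i < lossPercentPerHop.length - 1; i++) {
            if (lossPercentPerHop[i] > last) {
                hops.add(i + 1);
            }
        }
        return hops;
    }

    public static void main(String[] args) {
        // Made-up sample: 60% reported at hop 4, 40% at every later hop.
        double[] loss = {0.0, 0.0, 0.0, 60.0, 40.0, 40.0, 40.0};
        System.out.println(effectiveLoss(loss));         // 40.0
        System.out.println(likelyRateLimitedHops(loss)); // [4]
    }
}
```

This heuristic cannot detect loss on the return path, which is why the advice to collect reports in both directions still applies.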

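Understanding Network Latency

Besides packet loss, an MTR report also shows the latency between the local host and the destination. Because the hops are in different physical locations, latency usually increases with the number of hops. Latency therefore generally depends on the physical distance between nodes and on the quality of the links.

For example, over the same distance, a dial-up connection has higher latency than a cable modem connection. The following example shows an MTR report: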

# mtr --report www.google.com
HOST: localhost                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 63.247.74.43                  0.0%    10    0.3   0.6   0.3   1.2   0.3
  2. 63.247.64.157                 0.0%    10    0.4   1.0   0.4   6.1   1.8
  3. 209.51.130.213                0.0%    10    0.8   2.7   0.8  19.0   5.7
  4. aix.pr1.atl.google.com        0.0%    10  388.0 360.4 342.1 396.7   0.2
  5. 72.14.233.56                  0.0%    10  390.6 360.4 342.1 396.7   0.2
  6. 209.85.254.247                0.0%    10  391.6 360.4 342.1 396.7   0.4
  7. 64.233.174.46                 0.0%    10  391.8 360.4 342.1 396.7   2.1
  8. gw-in-f147.1e100.net          0.0%    10  392.0 360.4 342.1 396.7   1.2

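In this report, latency jumps sharply between the third and fourth hops and stays high at every hop after that. This may be because the fourth-hop router is misconfigured or because the link is congested.

However, high latency does not necessarily mean the current router is at fault. Although this report points to a problem at the fourth hop, data still reaches the destination host and returns. The high latency may also be introduced on the return path. The return path is not visible in this report and may follow a completely different route, so we generally need to test in both directions.

ICMP rate limiting can also make latency appear higher, as shown below: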

# mtr --report www.google.com
HOST: localhost                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 63.247.74.43                  0.0%    10    0.3   0.6   0.3   1.2   0.3
  2. 63.247.64.157                 0.0%    10    0.4   1.0   0.4   6.1   1.8
  3. 209.51.130.213                0.0%    10    0.8   2.7   0.8  19.0   5.7
  4. aix.pr1.atl.google.com        0.0%    10    6.7   6.8   6.7   6.9   0.1
  5. 72.14.233.56                  0.0%    10  254.2 250.3 230.1 263.4   2.9
  6. 209.85.254.247                0.0%    10   39.1  39.4  39.1  39.7   0.2
  7. 64.233.174.46                 0.0%    10   39.6  40.4  39.4  46.9   2.3
  8. gw-in-f147.1e100.net          0.0%    10   39.6  40.5  39.5  46.7   2.2

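At first glance, latency between hops 4 and 5 is very high. After hop 5, however, latency returns to normal, and the final latency is about 40 ms. A case like this does not affect real traffic; it is probably just the hop-5 device rate-limiting ICMP. So we should generally go by the actual latency at the final hop.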
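The difference between the two latency reports above can be expressed as a simple check: a hop whose average latency is well above the final hop's average is an isolated spike, while a jump that persists to the final hop is not flagged. The class name, method name, and the 10 ms tolerance below are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; input is the Avg column in ms, one value per hop.
final class MtrLatencySketch {
    // 1-based hop numbers whose average latency exceeds the final hop's average
    // by more than toleranceMs. Such spikes do not persist to the destination.
    static List<Integer> isolatedSpikes(double[] avgMsPerHop, double toleranceMs) {
        double last = avgMsPerHop[avgMsPerHop.length - 1];
        List<Integer> hops = new ArrayList<>();
        for (int i = 0; i < avgMsPerHop.length - 1; i++) {
            if (avgMsPerHop[i] > last + toleranceMs) {
                hops.add(i + 1);
            }
        }
        return hops;
    }

    public static void main(String[] args) {
        // Avg column of the first report: the jump at hop 4 persists to the end.
        double[] persistent = {0.6, 1.0, 2.7, 360.4, 360.4, 360.4, 360.4, 360.4};
        // Avg column of the second report: only hop 5 stands out.
        double[] spike = {0.6, 1.0, 2.7, 6.8, 250.3, 39.4, 40.4, 40.5};
        System.out.println(isolatedSpikes(persistent, 10.0)); // []
        System.out.println(isolatedSpikes(spike, 10.0));      // [5]
    }
}
```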

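Common MTR Report Types

Many network problems are troublesome and need help from an upstream network provider. However, a number of common network problems show up in recognizable ways in MTR reports. If you are experiencing network problems and want to diagnose the cause, refer to the following examples:

Destination Host Network Misconfiguration

In the following example, packets show 100% loss at the destination. At first glance it looks as if the packets never arrived, but that is not necessarily so; it is quite likely that a router or the host is misconfigured.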

# mtr --report www.google.com
HOST: localhost                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 63.247.74.43                  0.0%    10    0.3   0.6   0.3   1.2   0.3
  2. 63.247.64.157                 0.0%    10    0.4   1.0   0.4   6.1   1.8
  3. 209.51.130.213                0.0%    10    0.8   2.7   0.8  19.0   5.7
  4. aix.pr1.atl.google.com        0.0%    10    6.7   6.8   6.7   6.9   0.1
  5. 72.14.233.56                  0.0%    10    7.2   8.3   7.1  16.4   2.9
  6. 209.85.254.247                0.0%    10   39.1  39.4  39.1  39.7   0.2
  7. 64.233.174.46                 0.0%    10   39.6  40.4  39.4  46.9   2.3
  8. gw-in-f147.1e100.net         100.0    10    0.0   0.0   0.0   0.0   0.0

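MTR reports that packets did not reach the destination host because the destination host sent no reply. This may be caused by a firewall on the destination host, for example iptables configured to drop ICMP packets.

Home or Office Router Issues

Sometimes a home router makes the MTR report look misleading.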

% mtr --no-dns --report google.com
HOST: deleuze                     Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 192.168.1.1                   0.0%    10    2.2   2.2   2.0   2.7   0.2
  2. ???                          100.0    10    8.6  11.0   8.4  17.8   3.0
  3. 68.86.210.126                 0.0%    10    9.1  12.1   8.5  24.3   5.2
  4. 68.86.208.22                  0.0%    10   12.2  15.1  11.7  23.4   4.4
  5. 68.85.192.86                  0.0%    10   17.2  14.8  13.2  17.2   1.3
  6. 68.86.90.25                   0.0%    10   14.2  16.4  14.2  20.3   1.9
  7. 68.86.86.194                  0.0%    10   17.6  16.8  15.5  18.1   0.9
  8. 75.149.230.194                0.0%    10   15.0  20.1  15.0  33.8   5.6
  9. 72.14.238.232                 0.0%    10   15.6  18.7  14.1  32.8   5.9
 10. 209.85.241.148                0.0%    10   16.3  16.9  14.7  21.2   2.2
 11. 66.249.91.104                 0.0%    10   22.2  18.6  14.2  36.0   6.5

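Do not be alarmed by the 100% loss; it does not indicate a problem here. You can see that the hops that follow show no loss at all.

Misconfigured ISP Router

Sometimes a router at your ISP is configured in such a way that ICMP packets never reach the destination, for example: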

# mtr --report www.google.com
HOST: localhost                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 63.247.74.43                  0.0%    10    0.3   0.6   0.3   1.2   0.3
  2. 63.247.64.157                 0.0%    10    0.4   1.0   0.4   6.1   1.8
  3. 209.51.130.213                0.0%    10    0.8   2.7   0.8  19.0   5.7
  4. aix.pr1.atl.google.com        0.0%    10    6.7   6.8   6.7   6.9   0.1
  5. ???                           0.0%    10    0.0   0.0   0.0   0.0   0.0
  6. ???                           0.0%    10    0.0   0.0   0.0   0.0   0.0
  7. ???                           0.0%    10    0.0   0.0   0.0   0.0   0.0
  8. ???                           0.0%    10    0.0   0.0   0.0   0.0   0.0
  9. ???                           0.0%    10    0.0   0.0   0.0   0.0   0.0
 10. ???                           0.0%    10    0.0   0.0   0.0   0.0   0.0

When no additional routing information is available, question marks (???) are displayed. The following example is similar:

# mtr --report www.google.com
HOST: localhost                   Loss%   Snt   Last   Avg  Best  Wrst StDev
   1. 63.247.74.43                 0.0%    10    0.3   0.6   0.3   1.2   0.3
   2. 63.247.64.157                0.0%    10    0.4   1.0   0.4   6.1   1.8
   3. 209.51.130.213               0.0%    10    0.8   2.7   0.8  19.0   5.7
   4. aix.pr1.atl.google.com       0.0%    10    6.7   6.8   6.7   6.9   0.1
   5. 172.16.29.45                 0.0%    10    0.0   0.0   0.0   0.0   0.0
   6. ???                          0.0%    10    0.0   0.0   0.0   0.0   0.0 
   7. ???                          0.0%    10    0.0   0.0   0.0   0.0   0.0
   8. ???                          0.0%    10    0.0   0.0   0.0   0.0   0.0
   9. ???                          0.0%    10    0.0   0.0   0.0   0.0   0.0
  10. ???                          0.0%    10    0.0   0.0   0.0   0.0   0.0

Sometimes a misconfigured router sends packets around a loop endlessly, as shown below:

# mtr --report www.google.com
HOST: localhost                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 63.247.74.43                  0.0%    10    0.3   0.6   0.3   1.2   0.3
  2. 63.247.64.157                 0.0%    10    0.4   1.0   0.4   6.1   1.8
  3. 209.51.130.213                0.0%    10    0.8   2.7   0.8  19.0   5.7
  4. aix.pr1.atl.google.com        0.0%    10    6.7   6.8   6.7   6.9   0.1
  5. 12.34.56.79                   0.0%    10    0.0   0.0   0.0   0.0   0.0
  6. 12.34.56.78                   0.0%    10    0.0   0.0   0.0   0.0   0.0
  7. 12.34.56.79                   0.0%    10    0.0   0.0   0.0   0.0   0.0
  8. 12.34.56.78                   0.0%    10    0.0   0.0   0.0   0.0   0.0
  9. 12.34.56.79                   0.0%    10    0.0   0.0   0.0   0.0   0.0
 10. 12.34.56.78                   0.0%    10    0.0   0.0   0.0   0.0   0.0
 11. 12.34.56.79                   0.0%    10    0.0   0.0   0.0   0.0   0.0
 12. ???                           0.0%    10    0.0   0.0   0.0   0.0   0.0
 13. ???                           0.0%    10    0.0   0.0   0.0   0.0   0.0
 14. ???                           0.0%    10    0.0   0.0   0.0   0.0   0.0

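The report shows that the router at hop 4 is not correctly configured. If this happens, contact your local network administrator or ISP to resolve the problem.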
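A routing loop like the one in the last report can be spotted mechanically: the same resolved host shows up at more than one hop. Below is a minimal sketch of that check, with hypothetical class and method names; it ignores "???" entries, since those carry no address to compare.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration class; input is the host column, one entry per hop.
final class MtrLoopSketch {
    // Returns true if any resolved host appears at more than one hop.
    static boolean hasRoutingLoop(List<String> hosts) {
        Set<String> seen = new HashSet<>();
        for (String host : hosts) {
            if (host.equals("???")) {
                continue; // unknown hop, nothing to compare
            }
            if (!seen.add(host)) {
                return true; // add() returns false when the host was already seen
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> looping = Arrays.asList(
                "63.247.74.43", "63.247.64.157", "209.51.130.213", "aix.pr1.atl.google.com",
                "12.34.56.79", "12.34.56.78", "12.34.56.79", "12.34.56.78", "???");
        List<String> normal = Arrays.asList(
                "63.247.74.43", "63.247.64.157", "209.51.130.213", "???", "???");
        System.out.println(hasRoutingLoop(looping)); // true
        System.out.println(hasRoutingLoop(normal));  // false
    }
}
```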

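ICMP Rate Limiting

ICMP rate limiting can cause apparent packet loss. If packets are lost at one hop but the hops that follow are all normal, we can conclude that ICMP rate limiting is the cause, as shown below:

# mtr --report www.google.com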
 HOST: localhost                   Loss%   Snt   Last   Avg  Best  Wrst StDev
   1. 63.247.74.43                  0.0%    10    0.3   0.6   0.3   1.2   0.3
   2. 63.247.64.157                 0.0%    10    0.4   1.0   0.4   6.1   1.8
   3. 209.51.130.213                0.0%    10    0.8   2.7   0.8  19.0   5.7
   4. aix.pr1.atl.google.com        0.0%    10    6.7   6.8   6.7   6.9   0.1
   5. 72.14.233.56                 60.0%    10   27.2  25.3  23.1  26.4   2.9
   6. 209.85.254.247                0.0%    10   39.1  39.4  39.1  39.7   0.2
   7. 64.233.174.46                 0.0%    10   39.6  40.4  39.4  46.9   2.3
   8. gw-in-f147.1e100.net          0.0%    10   39.6  40.5  39.5  46.7   2.2

This situation is harmless. ICMP rate limiting is a common practice that reduces network load and lets more important traffic through first.
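The same judgment can be applied directly to the text of a report. The sketch below assumes the column layout shown in the reports on this page (hop number, host, Loss%, then the timing columns) and uses hypothetical class and method names; it flags hops that report loss even though the final hop reports none.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; parses hop lines of `mtr --report` output.
final class MtrReportSketch {
    // Splits a hop line into columns: [hop number, host, Loss%, Snt, Last, Avg, Best, Wrst, StDev].
    static String[] columns(String hopLine) {
        return hopLine.trim().split("\\s+");
    }

    static int hopNumber(String hopLine) {
        return Integer.parseInt(columns(hopLine)[0].replace(".", ""));
    }

    // Accepts both "60.0%" and the bare "100.0" that mtr prints for total loss.
    static double lossPercent(String hopLine) {
        return Double.parseDouble(columns(hopLine)[2].replace("%", ""));
    }

    // Hops that report loss although the final hop reports none.
    static List<Integer> rateLimitedHops(List<String> hopLines) {
        List<Integer> hops = new ArrayList<>();
        if (lossPercent(hopLines.get(hopLines.size() - 1)) > 0.0) {
            return hops; // loss persists to the destination, so this rule does not apply
        }
        for (String line : hopLines.subList(0, hopLines.size() - 1)) {
            if (lossPercent(line) > 0.0) {
                hops.add(hopNumber(line));
            }
        }
        return hops;
    }

    public static void main(String[] args) {
        List<String> report = Arrays.asList(
                "  4. aix.pr1.atl.google.com        0.0%    10    6.7   6.8   6.7   6.9   0.1",
                "  5. 72.14.233.56                 60.0%    10   27.2  25.3  23.1  26.4   2.9",
                "  6. 209.85.254.247                0.0%    10   39.1  39.4  39.1  39.7   0.2",
                "  8. gw-in-f147.1e100.net          0.0%    10   39.6  40.5  39.5  46.7   2.2");
        System.out.println(rateLimitedHops(report)); // [5]
    }
}
```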

Timeouts

Timeouts can occur in many situations. For example, many routers simply drop ICMP packets, which shows up as timeouts (???).
Alternatively, there may be a problem on the return path.

# mtr --report www.google.com
HOST: localhost                   Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 63.247.74.43                  0.0%    10    0.3   0.6   0.3   1.2   0.3
  2. 63.247.64.157                 0.0%    10    0.4   1.0   0.4   6.1   1.8
  3. 209.51.130.213                0.0%    10    0.8   2.7   0.8  19.0   5.7
  4. aix.pr1.atl.google.com        0.0%    10    6.7   6.8   6.7   6.9   0.1
  5. ???                           0.0%    10    7.2   8.3   7.1  16.4   2.9
  6. ???                           0.0%    10   39.1  39.4  39.1  39.7   0.2
  7. 64.233.174.46                 0.0%    10   39.6  40.4  39.4  46.9   2.3
  8. gw-in-f147.1e100.net          0.0%    10   39.6  40.5  39.5  46.7   2.2

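A timeout does not necessarily mean that packets were lost. In the example above, packets still reach the destination safely and return. Timeouts at intermediate hops may be caused by routers configured to drop ICMP packets or by QoS settings; this is harmless.

References: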

http://kb.51hosting.com/analyzing-mtr-report


Tags: monitor


Posted by IT瘾 on September 1, 2015 at 06:26:00 PM