Reducing GC Overhead: Using Netty 4 at Twitter

Tags: gc netty4 twitter | Published: 2013-12-04 22:27 | Author: 城的灯
Source: http://www.iteye.com

A post published on the Twitter blog by Trustin Lee, the founder of Netty. It is excellent, so I am reposting it as-is.

The following text is from Twitter:

At Twitter, Netty (@netty_project) is used in core places requiring networking functionality.

For example:

  • Finagle is our protocol-agnostic RPC system whose transport layer is built on top of Netty; it is used to implement most internal services, such as Search
  • TFE (Twitter Front End) is our proprietary spoon-feeding reverse proxy, which uses Netty to serve most of our public-facing HTTP and SPDY traffic
  • Cloudhopper sends billions of SMS messages every month to hundreds of mobile carriers all around the world using Netty

For those who aren’t aware, Netty is an open source Java NIO framework that makes it easier to create high-performing protocol servers. The older Netty 3 used Java objects to represent I/O events. This was simple, but could generate a lot of garbage, especially at our scale. In the new Netty 4 release, changes were made so that instead of short-lived event objects, methods on long-lived channel objects are used to handle I/O events. There is also a specialized buffer allocator that uses pools.

We take the performance, usability, and sustainability of the Netty project seriously, and we have been working closely with the Netty community to improve it in all aspects. In particular, we will discuss our usage of Netty 3 and will aim to show why migrating to Netty 4 has made us more efficient.

Reducing GC pressure and memory bandwidth consumption

One problem was Netty 3’s reliance on the JVM’s memory management for buffer allocations. Netty 3 creates a new heap buffer whenever a new message is received or a user sends a message to a remote peer, which means a ‘new byte[capacity]’ for each new buffer. These buffers caused GC pressure and consumed memory bandwidth. Allocating a new byte array consumes memory bandwidth to fill the array with zeros for safety, and the zero-filled array is then very likely to be overwritten with the actual data, consuming the same amount of memory bandwidth again. We could have halved this memory bandwidth consumption if the Java Virtual Machine (JVM) provided a way to create a new byte array that is not necessarily filled with zeros, but there is currently no such way.
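The bandwidth argument above can be made concrete with a toy cost model in Java. The model assumes that zero-filling and copying each write every byte of the buffer once, which is the reasoning in the paragraph above. An optimizing JIT may remove some of this work in specific cases, so treat the numbers as a model rather than a measurement. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical toy cost model of per-message allocation (not a measurement).
public class ZeroFillCost {

    // Netty 3 style: a fresh heap array per message. Java semantics require
    // `new byte[n]` to start out all zeros, so conceptually memory is written
    // once for the zeros and again when the payload is copied in.
    static byte[] receiveIntoFreshArray(byte[] payload) {
        byte[] buf = new byte[payload.length];                // pass 1: zeros
        System.arraycopy(payload, 0, buf, 0, payload.length); // pass 2: data
        return buf;
    }

    // Bytes written per message under this model, with and without zeroing.
    static long bytesWrittenPerMessage(int size, boolean zeroFilled) {
        return zeroFilled ? 2L * size : size;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[256];
        Arrays.fill(payload, (byte) 7);
        byte[] received = receiveIntoFreshArray(payload);
        System.out.println(Arrays.equals(payload, received));   // true
        System.out.println(bytesWrittenPerMessage(256, true));  // 512
        System.out.println(bytesWrittenPerMessage(256, false)); // 256
    }
}
```

Under this model, an allocator that skips zeroing writes 256 instead of 512 bytes per 256-byte message. That is the "half the bandwidth" the paragraph refers to.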

To address this issue, we made the following changes for Netty 4.

Removal of event objects

Instead of creating event objects, Netty 4 defines different methods for different event types. In Netty 3, the ChannelHandler has a single method that handles all event objects:

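import org.jboss.netty.channel.*;

// Netty 3: a single method receives every event as an object.
class Before implements ChannelUpstreamHandler {
  public void handleUpstream(ChannelHandlerContext ctx, ChannelEvent e) throws Exception {
    if (e instanceof MessageEvent) {
      // handle ((MessageEvent) e).getMessage()
    } else if (e instanceof ChannelStateEvent) {
      // handle a state change such as connected or disconnected
    } // ... other event types, e.g. ExceptionEvent
    ctx.sendUpstream(e); // pass the event on to the next handler
  }
}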

Netty 4 has as many handler methods as the number of event types:

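import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.ReferenceCountUtil;

// Netty 4: one method per event type. The adapter supplies pass-through
// defaults for the ChannelInboundHandler methods not overridden here.
class After extends ChannelInboundHandlerAdapter {
  @Override public void channelActive(ChannelHandlerContext ctx) { /* connection opened */ }
  @Override public void channelInactive(ChannelHandlerContext ctx) { /* connection closed */ }
  @Override public void channelRead(ChannelHandlerContext ctx, Object msg) {
    ReferenceCountUtil.release(msg); // this handler consumes msg, so it must release it
  }
  @Override public void userEventTriggered(ChannelHandlerContext ctx, Object evt) { /* custom event */ }
}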

Note that a handler now has a method called ‘userEventTriggered’, so it does not lose the ability to define custom event objects.

Buffer pooling

Netty 4 also introduced a new interface, ‘ByteBufAllocator’. Through that interface, Netty provides a pooled buffer implementation: a pure Java variant of jemalloc, which implements buddy memory allocation and slab allocation.
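The buddy half of that design can be illustrated with a small, self-contained Java sketch. This is a hypothetical toy, not Netty's or jemalloc's code.

- It manages offsets within an imaginary arena of `1 << maxOrder` bytes.
- It rounds each request up to a power of two.
- It splits larger free blocks in half until one fits.
- When a block is freed, it merges the block with its "buddy" (the neighbor whose offset differs only in the block-size bit) whenever both are free.

Slab allocation, which the article also mentions, is not shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy buddy allocator. It hands out offsets into an imaginary
// arena of (1 << maxOrder) bytes; it does not manage real memory.
public class BuddyAllocator {
    private final int maxOrder;
    private final TreeSet<Integer>[] freeLists;                      // freeLists[k]: free blocks of 2^k bytes
    private final Map<Integer, Integer> allocated = new HashMap<>(); // offset -> order

    @SuppressWarnings("unchecked")
    public BuddyAllocator(int maxOrder) {
        if (maxOrder < 0 || maxOrder > 30) throw new IllegalArgumentException("maxOrder must be 0..30");
        this.maxOrder = maxOrder;
        this.freeLists = new TreeSet[maxOrder + 1];
        for (int k = 0; k <= maxOrder; k++) freeLists[k] = new TreeSet<>();
        freeLists[maxOrder].add(0); // the whole arena starts as one free block
    }

    // Smallest order k such that 2^k >= size (capped at 31 to avoid overflow).
    static int orderFor(int size) {
        int k = 0;
        while (k < 31 && (1 << k) < size) k++;
        return k;
    }

    // Returns the offset of a block of at least `size` bytes, or -1 if none is free.
    public int allocate(int size) {
        int want = orderFor(size);
        int k = want;
        while (k <= maxOrder && freeLists[k].isEmpty()) k++;
        if (k > maxOrder) return -1;
        int offset = freeLists[k].pollFirst();
        // Split until the block has the requested order; each upper half stays free.
        while (k > want) {
            k--;
            freeLists[k].add(offset + (1 << k));
        }
        allocated.put(offset, want);
        return offset;
    }

    // Frees a block and merges it with its buddy for as long as the buddy is free.
    public void free(int offset) {
        Integer order = allocated.remove(offset);
        if (order == null) throw new IllegalArgumentException("not allocated: " + offset);
        int k = order;
        while (k < maxOrder) {
            int buddy = offset ^ (1 << k); // blocks of order k are aligned to 2^k
            if (!freeLists[k].remove(buddy)) break;
            offset = Math.min(offset, buddy);
            k++;
        }
        freeLists[k].add(offset);
    }

    public int freeBlocks(int order) { return freeLists[order].size(); }

    public static void main(String[] args) {
        BuddyAllocator a = new BuddyAllocator(4); // 16-byte arena
        System.out.println(a.allocate(3) + " " + a.allocate(4) + " " + a.allocate(8)); // 0 4 8
        a.free(0);
        a.free(4);
        a.free(8);
        System.out.println(a.freeBlocks(4)); // 1
    }
}
```

In a 16-byte arena, requests of 3, 4, and 8 bytes receive offsets 0, 4, and 8. Once all three are freed, the buddies merge back into a single 16-byte block.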

Now that Netty has its own memory allocator for buffers, it doesn’t waste memory bandwidth by filling buffers with zeros. However, this approach opens another can of worms: reference counting. Because we cannot rely on GC to return unused buffers to the pool, we have to be very careful about leaks. Even a single handler that forgets to release a buffer can make our server’s memory usage grow without bound.
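The following hypothetical Java sketch shows why pooling and reference counting go together, and how a single forgotten release defeats the pool. The class and method names are invented for illustration. Netty's own buffers expose similar `retain()` and `release()` methods through its `ReferenceCounted` interface, but the code below is not Netty's implementation.

```java
import java.util.ArrayDeque;

// Hypothetical toy pool of fixed-size buffers with manual reference counting.
public class RefCountedPool {
    private final ArrayDeque<PooledBuf> free = new ArrayDeque<>();
    private final int bufSize;
    private int created;

    public RefCountedPool(int bufSize) { this.bufSize = bufSize; }

    public final class PooledBuf {
        final byte[] data;
        private int refCnt;

        private PooledBuf(int size) { data = new byte[size]; }

        public PooledBuf retain() {
            if (refCnt <= 0) throw new IllegalStateException("already released");
            refCnt++;
            return this;
        }

        // Returns true if this call dropped the count to zero, in which case
        // the buffer goes back to the pool (reused later without zero-filling).
        public boolean release() {
            if (refCnt <= 0) throw new IllegalStateException("already released");
            if (--refCnt > 0) return false;
            free.push(this);
            return true;
        }
    }

    // Hands out a buffer with a reference count of 1, reusing a pooled one if possible.
    public PooledBuf acquire() {
        PooledBuf b;
        if (free.isEmpty()) {
            b = new PooledBuf(bufSize);
            created++;
        } else {
            b = free.pop();
        }
        b.refCnt = 1;
        return b;
    }

    public int created() { return created; }

    public static void main(String[] args) {
        RefCountedPool pool = new RefCountedPool(256);
        PooledBuf a = pool.acquire();
        a.release();                        // count 1 -> 0: back to the pool
        PooledBuf b = pool.acquire();       // reuses the same buffer
        System.out.println(a == b);         // true
        b.release();
        PooledBuf leaked = pool.acquire();  // a handler "forgets" to release this one
        PooledBuf next = pool.acquire();    // so the pool must allocate a new buffer
        System.out.println(pool.created()); // 2
    }
}
```

The pool only learns that a buffer is reusable through `release()`. A buffer that is never released is never reused, so every leak turns into fresh allocations.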

Was it worthwhile to make such big changes?

Because of the changes mentioned above, Netty 4 is not backward compatible with Netty 3. This means that our projects built on top of Netty 3, as well as other community projects, have to spend a non-trivial amount of time on migration. Is it worth doing that?

We compared two echo protocol servers built on top of Netty 3 and Netty 4 respectively. (Echo is simple enough that any garbage created is Netty’s fault, not the protocol’s.) I had them serve the same distributed echo protocol clients with 16,384 concurrent connections repeatedly sending 256-byte random payloads, nearly saturating gigabit Ethernet.

According to our test result, Netty 4 had:

  • About 5 times fewer GC pauses: 45.5 vs. 9.2 times/min (Netty 3 vs. Netty 4)
  • About 5 times less garbage production: 207.11 vs. 41.81 MiB/s (Netty 3 vs. Netty 4)
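The "5 times" figures can be checked directly from the reported numbers. The sketch takes the first value in each pair as Netty 3 and the second as Netty 4, as the comparison implies. The class name is hypothetical.

```java
// Ratios of the reported measurements; first value Netty 3, second Netty 4.
public class GcRatios {
    static double ratio(double netty3, double netty4) {
        return netty3 / netty4;
    }

    public static void main(String[] args) {
        System.out.printf("GC pauses: %.2fx fewer with Netty 4%n", ratio(45.5, 9.2));     // 4.95x
        System.out.printf("Garbage:   %.2fx less with Netty 4%n", ratio(207.11, 41.81)); // 4.95x
    }
}
```

Both ratios come out just under 5 (about 4.95), which the list rounds to "5 times".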

I also wanted to make sure our buffer pool is fast enough. Here’s a graph where the X axis denotes the size of each allocation and the Y axis denotes the time taken to allocate a single buffer:

As you can see, the buffer pool becomes much faster than the JVM as the buffer size increases, and the difference is even more noticeable for direct buffers. However, it could not beat the JVM for small heap buffers, so we have something to work on here.

Moving forward

Although some parts of our services have already migrated from Netty 3 to 4 successfully, we are performing the migration gradually. We have found some barriers slowing our adoption, which we hope to address in the near future:

  • Buffer leaks: Netty has a simple leak reporting facility, but it does not provide enough detail to fix a leak easily.
  • Simpler core: Netty is a community-driven project with many stakeholders who could benefit from a simpler core set of code. Keeping non-core features in the core increases its instability, because those features tend to cause collateral changes in the core. We want to make sure only the real core features remain in the core and other features stay out of it.
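The first bullet is about missing context. A leak report is far more useful if it says where the leaked buffer was allocated. The hypothetical Java sketch below records an allocation stack trace for each tracked object and lists whatever is still outstanding. As far as we know, Netty's own leak detector works differently: it reports tracked objects that become unreachable without having been released. This sketch only illustrates the kind of information that helps.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical toy leak tracker: remembers where each resource was allocated
// so that a report can name the responsible call site.
public class LeakTracker {
    private final Map<Object, Throwable> live = new IdentityHashMap<>();

    // Call right after allocating; the Throwable captures the current stack.
    public synchronized void track(Object resource) {
        live.put(resource, new Throwable("allocated here"));
    }

    // Call when the resource is released.
    public synchronized void untrack(Object resource) {
        if (live.remove(resource) == null) {
            throw new IllegalStateException("released twice or never tracked");
        }
    }

    public synchronized int outstanding() {
        return live.size();
    }

    // One line per unreleased resource; frame [0] is track() itself,
    // so frame [1] is the code that allocated the resource.
    public synchronized String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Object, Throwable> e : live.entrySet()) {
            StackTraceElement[] frames = e.getValue().getStackTrace();
            sb.append("LEAK: ")
              .append(e.getKey().getClass().getSimpleName())
              .append(" allocated at ")
              .append(frames.length > 1 ? frames[1].toString() : "<unknown>")
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LeakTracker tracker = new LeakTracker();
        byte[] released = new byte[64];
        byte[] forgotten = new byte[64];
        tracker.track(released);
        tracker.track(forgotten);
        tracker.untrack(released);
        System.out.print(tracker.report()); // one LEAK line pointing at main()
    }
}
```

Capturing a stack trace on every allocation is expensive, which is one reason a production detector would likely track only some allocations or enable this level of detail on demand.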

We are also thinking of adding more cool features, such as:

  • HTTP/2 implementation
  • HTTP and SOCKS proxy support on the client side
  • Asynchronous DNS resolution (see pull request)
  • Native extensions for Linux that work directly with epoll via JNI
  • Prioritization of connections with strict response time constraints

Getting Involved

What’s interesting about Netty is that it is used by many different people and companies worldwide, mostly not from Twitter. It is an independent and very healthy open source project with many contributors. If you are interested in building ‘the future of network programming’, why don’t you visit the project web site, follow @netty_project, jump right into the source code at GitHub, or even consider joining the flock to help us improve Netty?


Acknowledgements

The Netty project was founded by Trustin Lee (@trustin), who joined the flock in 2011 to help build Netty 4. We would also like to thank Jeff Pinner (@jpinner) from the TFE team, who contributed many of the great ideas mentioned in this article and became a guinea pig for Netty 4 without hesitation. Furthermore, Norman Maurer (@normanmaurer), one of the core Netty committers, made an enormous effort to help us turn these ideas into shippable code as part of the Netty project. Countless other individuals gladly tried many unstable releases and kept up with all the breaking changes we had to make; in particular, we would like to thank Berk Demir (@bd), Charles Yang (@cmyang), Evan Meagher (@evanm), Larry Hosken (@lahosken), Sonja Keserovic (@thesonjake), and Stu Hood (@stuhood).


