Analyzing Thread Dumps in Middleware - Part 1
This section deals with the basics of thread states and locking in general. Subsequent sections will deal with capturing thread dumps and analyzing them, with particular emphasis on WebLogic Application Server thread dumps.
Thread Dumps
A thread dump is a brief textual snapshot of the threads within a Java Virtual Machine (JVM). It is the equivalent of a process dump in the native world. The dump includes data about each thread: its name, priority, thread group, state (running/blocked/waiting/parking), and its execution stack in the form of a thread stack trace. All threads are included: the JVM's own threads (GC threads, scavengers, monitors and others) as well as application and server threads. Newer JVM versions also report blocked thread chains (for example, ThreadA is blocked on a resource held by ThreadB) as well as deadlocks (circular lock dependencies among threads).
Different JVM vendors display the data in different formats (markers for the start and end of thread dumps, reporting of locks and thread states, method signatures), but the underlying data exposed by thread dumps remains the same across vendors.
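To make the contents of a dump concrete, the following minimal sketch builds a similar snapshot from inside a running JVM using the standard java.lang.management API. The class name ThreadDumpPrinter and the output layout are illustrative assumptions, not any vendor's dump format, and this API typically lists Java-level threads only, so VM-internal threads such as GC workers may be missing.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; the layout is not a vendor dump format.
class ThreadDumpPrinter {

    /** Builds a textual snapshot of the live Java threads in this JVM. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request lock details when this JVM supports reporting them.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            out.append('"').append(info.getThreadName()).append('"')
               .append(" id=").append(info.getThreadId())
               .append(" state=").append(info.getThreadState())
               .append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Each entry carries the same core data as the vendor samples: thread name, state and stack trace.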
Sample of a JRockit Thread Dump:
===== FULL THREAD DUMP ===============
Mon Feb 06 11:38:58 2012
Oracle JRockit(R) R28.0.0-679-130297-1.6.0_17-20100312-2123-windows-ia32

"Main Thread" id=1 idx=0x4 tid=4184 prio=5 alive, in native
    at java/net/PlainSocketImpl.socketConnect(Ljava/net/InetAddress;II)V(Native Method)
    at java/net/PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    ^-- Holding lock: java/net/SocksSocketImpl@0x10204E50[biased lock]
    at java/net/PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java/net/PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java/net/SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java/net/Socket.connect(Socket.java:525)
    at java/net/Socket.connect(Socket.java:475)
    at sun/net/NetworkClient.doConnect(NetworkClient.java:163)
    at sun/net/www/http/HttpClient.openServer(HttpClient.java:394)
    at sun/net/www/http/HttpClient.openServer(HttpClient.java:529)
    ^-- Holding lock: sun/net/www/http/HttpClient@0x10203FB8[biased lock]
    at sun/net/www/http/HttpClient.<init>(HttpClient.java:233)
    at sun/net/www/http/HttpClient.New(HttpClient.java:306)
    at sun/net/www/http/HttpClient.New(HttpClient.java:323)
    at sun/net/www/protocol/http/HttpURLConnection.getNewHttpClient(HttpURLConnection.java:860)
    at sun/net/www/protocol/http/HttpURLConnection.plainConnect(HttpURLConnection.java:801)
    at sun/net/www/protocol/http/HttpURLConnection.connect(HttpURLConnection.java:726)
    at sun/net/www/protocol/http/HttpURLConnection.getOutputStream(HttpURLConnection.java:904)
    ^-- Holding lock: sun/net/www/protocol/http/HttpURLConnection@0x101FAD88[biased lock]
    at post.main(post.java:29)
    at jrockit/vm/RNI.c2java(IIIII)V(Native Method)
    -- end of trace

"(Signal Handler)" id=2 idx=0x8 tid=4668 prio=5 alive, daemon

"(OC Main Thread)" id=3 idx=0xc tid=6332 prio=5 alive, native_waiting, daemon

"(GC Worker Thread 1)" id=? idx=0x10 tid=1484 prio=5 alive, daemon

"(GC Worker Thread 2)" id=? idx=0x14 tid=5548 prio=5 alive, daemon

"(Code Generation Thread 1)" id=4 idx=0x30 tid=8016 prio=5 alive, native_waiting, daemon

"(Code Optimization Thread 1)" id=5 idx=0x34 tid=3596 prio=5 alive, native_waiting, daemon

"(VM Periodic Task)" id=6 idx=0x38 tid=1352 prio=10 alive, native_blocked, daemon

"(Attach Listener)" id=7 idx=0x3c tid=6592 prio=5 alive, native_blocked, daemon

"Finalizer" id=8 idx=0x40 tid=1576 prio=8 alive, native_waiting, daemon
    at jrockit/memory/Finalizer.waitForFinalizees(J[Ljava/lang/Object;)I(Native Method)
    at jrockit/memory/Finalizer.access$700(Finalizer.java:12)
    at jrockit/memory/Finalizer$4.run(Finalizer.java:183)
    at java/lang/Thread.run(Thread.java:619)
    at jrockit/vm/RNI.c2java(IIIII)V(Native Method)
    -- end of trace

"Reference Handler" id=9 idx=0x44 tid=3012 prio=10 alive, native_waiting, daemon
    at java/lang/ref/Reference.waitForActivatedQueue(J)Ljava/lang/ref/Reference;(Native Method)
    at java/lang/ref/Reference.access$100(Reference.java:11)
    at java/lang/ref/Reference$ReferenceHandler.run(Reference.java:82)
    at jrockit/vm/RNI.c2java(IIIII)V(Native Method)
    -- end of trace

"(Sensor Event Thread)" id=10 idx=0x48 tid=980 prio=5 alive, native_blocked, daemon

"VM JFR Buffer Thread" id=11 idx=0x4c tid=6072 prio=5 alive, in native, daemon

===== END OF THREAD DUMP ===============
Sample of a Sun Hotspot Thread Dump (executing same code as above)
2012-02-06 11:37:30
Full thread dump Java HotSpot(TM) Client VM (16.0-b13 mixed mode):

"Low Memory Detector" daemon prio=6 tid=0x0264bc00 nid=0x520 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x02647400 nid=0x1ae8 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Attach Listener" daemon prio=10 tid=0x02645800 nid=0x1480 runnable [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x02642800 nid=0x644 waiting on condition [0x00000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=8 tid=0x02614800 nid=0x1e70 in Object.wait() [0x1882f000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x04660b18> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)
    - locked <0x04660b18> (a java.lang.ref.ReferenceQueue$Lock)
    at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)
    at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x02610000 nid=0x1b84 in Object.wait() [0x1879f000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x04660a20> (a java.lang.ref.Reference$Lock)
    at java.lang.Object.wait(Object.java:485)
    at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)
    - locked <0x04660a20> (a java.lang.ref.Reference$Lock)

"main" prio=6 tid=0x00ec9400 nid=0x19e4 runnable [0x0024f000]
   java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
    - locked <0x04642958> (a java.net.SocksSocketImpl)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
    at java.net.Socket.connect(Socket.java:525)
    at java.net.Socket.connect(Socket.java:475)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
    - locked <0x04642058> (a sun.net.www.http.HttpClient)
    at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
    at sun.net.www.http.HttpClient.New(HttpClient.java:306)
    at sun.net.www.http.HttpClient.New(HttpClient.java:323)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:860)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:801)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
    at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:904)
    - locked <0x04639dd0> (a sun.net.www.protocol.http.HttpURLConnection)

"VM Thread" prio=10 tid=0x0260d000 nid=0x4dc runnable

"VM Periodic Task Thread" prio=10 tid=0x02656000 nid=0x16b8 waiting on condition

JNI global references: 667

Heap
 def new generation total 4928K, used 281K [0x04660000, 0x04bb0000, 0x09bb0000)
  eden space 4416K, 6% used [0x04660000, 0x046a6460, 0x04ab0000)
  from space 512K, 0% used [0x04ab0000, 0x04ab0000, 0x04b30000)
  to space 512K, 0% used [0x04b30000, 0x04b30000, 0x04bb0000)
 tenured generation total 10944K, used 0K [0x09bb0000, 0x0a660000, 0x14660000)
  the space 10944K, 0% used [0x09bb0000, 0x09bb0000, 0x09bb0200, 0x0a660000)
 compacting perm gen total 12288K, used 1704K [0x14660000, 0x15260000, 0x18660000)
  the space 12288K, 13% used [0x14660000, 0x1480a290, 0x1480a400, 0x15260000)
No shared spaces configured.
Sample of an IBM Thread dump
NULL ------------------------------------------------------------------------
0SECTION THREADS subcomponent dump routine
NULL =================================
NULL
1XMCURTHDINFO Current Thread Details
NULL ----------------------
NULL
1XMTHDINFO All Thread Details
NULL ------------------
NULL
2XMFULLTHDDUMP Full thread dump J9 VM (J2RE 6.0 IBM J9 2.4 Windows Vista x86-32 build jvmwi3260sr4ifx-20090506_3499120090506_034991_lHdSMr, native threads):
3XMTHREADINFO "main" TID:0x00554B00, j9thread_t:0x00783AE4, state:CW, prio=5
3XMTHREADINFO1 (native thread ID:0x1E48, native priority:0x5, native policy:UNKNOWN)
4XESTACKTRACE at com/ibm/oti/vm/BootstrapClassLoader.loadClass(BootstrapClassLoader.java:65)
4XESTACKTRACE at sun/net/NetworkClient.isASCIISuperset(NetworkClient.java:122)
4XESTACKTRACE at sun/net/NetworkClient.<clinit>(NetworkClient.java:83)
4XESTACKTRACE at java/lang/J9VMInternals.initializeImpl(Native Method)
4XESTACKTRACE at java/lang/J9VMInternals.initialize(J9VMInternals.java:200(Compiled Code))
4XESTACKTRACE at java/lang/J9VMInternals.initialize(J9VMInternals.java:167(Compiled Code))
4XESTACKTRACE at sun/net/www/protocol/http/HttpURLConnection.getNewHttpClient(HttpURLConnection.java:783)
4XESTACKTRACE at sun/net/www/protocol/http/HttpURLConnection.plainConnect(HttpURLConnection.java:724)
4XESTACKTRACE at sun/net/www/protocol/http/HttpURLConnection.connect(HttpURLConnection.java:649)
4XESTACKTRACE at sun/net/www/protocol/http/HttpURLConnection.getOutputStream(HttpURLConnection.java:827)
4XESTACKTRACE at post.main(post.java:29)
3XMTHREADINFO "JIT Compilation Thread" TID:0x00555000, j9thread_t:0x00783D48, state:CW, prio=10
3XMTHREADINFO1 (native thread ID:0x111C, native priority:0xB, native policy:UNKNOWN)
3XMTHREADINFO "Signal Dispatcher" TID:0x6B693300, j9thread_t:0x00784210, state:R, prio=5
3XMTHREADINFO1 (native thread ID:0x1E34, native priority:0x5, native policy:UNKNOWN)
4XESTACKTRACE at com/ibm/misc/SignalDispatcher.waitForSignal(Native Method)
4XESTACKTRACE at com/ibm/misc/SignalDispatcher.run(SignalDispatcher.java:54)
3XMTHREADINFO "Gc Slave Thread" TID:0x6B693800, j9thread_t:0x0078EABC, state:CW, prio=5
3XMTHREADINFO1 (native thread ID:0x1AA4, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO "Gc Slave Thread" TID:0x6B695500, j9thread_t:0x0078ED20, state:CW, prio=5
3XMTHREADINFO1 (native thread ID:0x14F8, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO "Gc Slave Thread" TID:0x6B695A00, j9thread_t:0x0078EF84, state:CW, prio=5
3XMTHREADINFO1 (native thread ID:0x9E0, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO "Gc Slave Thread" TID:0x6B698800, j9thread_t:0x0078F1E8, state:CW, prio=5
3XMTHREADINFO1 (native thread ID:0x1FB8, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO "Gc Slave Thread" TID:0x6B698D00, j9thread_t:0x0078F44C, state:CW, prio=5
3XMTHREADINFO1 (native thread ID:0x1A58, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO "Gc Slave Thread" TID:0x6B69BB00, j9thread_t:0x0078F6B0, state:CW, prio=5
3XMTHREADINFO1 (native thread ID:0x1430, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO "Gc Slave Thread" TID:0x6B69C000, j9thread_t:0x029D8FE4, state:CW, prio=5
3XMTHREADINFO1 (native thread ID:0xBC4, native priority:0x5, native policy:UNKNOWN)
NULL ------------------------------------------------------------------------
A thread can be in one of the following states:
- New
- Runnable
- Non-Runnable
  - Sleep for a time duration
  - Wait for a condition/event
  - Blocked for a lock
- Dead
In a thread dump, we are looking at threads that have already been created and are in either a running or a non-running state. So the new state (unless a thread happened to be created at the exact moment the dump was generated) and the dead state are of little value in a thread dump and are rarely present in one.
The running state implies the thread is actively working on something. As for the non-runnable states, a thread may have nothing to do, so it sleeps for some duration and periodically checks whether a condition for starting work has been met. Waiting for a condition means the thread waits for some form of notification or event and starts work once it gets the green light. Waiting for a condition uses resources far more efficiently than a regular sleep/wake-up pattern. If multiple threads regularly sleep and wake up to check for an event, they can instead be changed to wait for a notification; only one thread then receives each notify call, rather than all of them repeatedly polling for the event.
Blocked implies the thread cannot proceed with its work until it obtains a lock that is currently held by another thread. This is similar to obtaining a lock on a critical region or a semaphore (in OS terms) before proceeding with the work.
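The states above can be observed directly through java.lang.Thread.State. The sketch below is a minimal illustration, with made-up thread names and labels, that drives helper threads into the sleeping, waiting and blocked states and records what Thread.getState() reports for each. It maps the article's terms onto the JDK's names: sleep shows up as TIMED_WAITING, wait as WAITING, blocked as BLOCKED and dead as TERMINATED.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative demo; thread names and map labels are assumptions made for this sketch.
class ThreadStatesDemo {

    /** Spins until the thread reaches the wanted state; every demo thread gets there. */
    private static void awaitState(Thread thread, Thread.State wanted) {
        while (thread.getState() != wanted) {
            Thread.yield();
        }
    }

    /** Drives helper threads into the non-runnable states and records what is observed. */
    static Map<String, Thread.State> observe() {
        final Object lock = new Object();
        final Object signal = new Object();
        Map<String, Thread.State> seen = new LinkedHashMap<>();

        Thread sleeper = new Thread(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException e) { /* woken up: exit */ }
        }, "sleeper");
        Thread waiter = new Thread(() -> {
            synchronized (signal) {
                try { signal.wait(); } catch (InterruptedException e) { /* woken up: exit */ }
            }
        }, "waiter");
        Thread contender = new Thread(() -> {
            synchronized (lock) { /* only needs to acquire the lock */ }
        }, "contender");

        seen.put("new", sleeper.getState());                    // created but not started
        seen.put("running", Thread.currentThread().getState()); // the caller itself
        synchronized (lock) {                                   // hold the lock the contender wants
            sleeper.start();
            waiter.start();
            contender.start();
            awaitState(sleeper, Thread.State.TIMED_WAITING);
            awaitState(waiter, Thread.State.WAITING);
            awaitState(contender, Thread.State.BLOCKED);
            seen.put("sleeping", sleeper.getState());
            seen.put("waiting", waiter.getState());
            seen.put("blocked", contender.getState());
        }
        sleeper.interrupt();                                    // let the helpers finish
        waiter.interrupt();
        awaitState(sleeper, Thread.State.TERMINATED);
        seen.put("dead", sleeper.getState());
        return seen;
    }

    public static void main(String[] args) {
        observe().forEach((label, state) -> System.out.println(label + " -> " + state));
    }
}
```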
States
Each thread entry in a thread dump specifies the thread's state along with its name and priority.
In Sun HotSpot, the state is also part of the individual thread entry. The main thread appears as RUNNABLE, while the Finalizer GC thread appears in a WAITING state.

"main" prio=6 tid=0x00ec9400 nid=0x19e4 runnable [0x0024f000]
   java.lang.Thread.State: RUNNABLE
    at java.net.PlainSocketImpl.socketConnect(Native Method)

"Finalizer" daemon prio=8 tid=0x02614800 nid=0x1e70 in Object.wait() [0x1882f000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x04660b18> (a java.lang.ref.ReferenceQueue$Lock)
In a JRockit thread dump, no state is shown for a running thread. The Finalizer appears in the native_waiting state.

"Main Thread" id=1 idx=0x4 tid=4184 prio=5 alive, in native
    at java/net/PlainSocketImpl.socketConnect(Ljava/net/InetAddress;II)V(Native Method)
    at java/net/PlainSocketImpl.doConnect(PlainSocketImpl.java:333)

"Finalizer" id=8 idx=0x40 tid=1576 prio=8 alive, native_waiting, daemon
    at jrockit/memory/Finalizer.waitForFinalizees(J[Ljava/lang/Object;)I(Native Method)
    at jrockit/memory/Finalizer.access$700(Finalizer.java:12)

In an IBM thread dump, the state is given by the state field. CW stands for Condition Wait.

"main" TID:0x00554B00, j9thread_t:0x00783AE4, state:CW, prio=5
3XMTHREADINFO1 (native thread ID:0x1E48, native priority:0x5, native policy:UNKNOWN)
4XESTACKTRACE at com/ibm/oti/vm/BootstrapClassLoader.loadClass(BootstrapClassLoader.java:65)
4XESTACKTRACE at sun/net/NetworkClient.isASCIISuperset(NetworkClient.java:122)
4XESTACKTRACE at sun/net/NetworkClient.<clinit>(NetworkClient.java:83)
....................
4XESTACKTRACE at sun/net/www/protocol/http/HttpURLConnection.getOutputStream(HttpURLConnection.java:827)
4XESTACKTRACE at post.main(post.java:29)
3XMTHREADINFO "JIT Compilation Thread" TID:0x00555000, j9thread_t:0x00783D48, state:CW, prio=10
3XMTHREADINFO1 (native thread ID:0x111C, native priority:0xB, native policy:UNKNOWN)
3XMTHREADINFO "Gc Slave Thread" TID:0x6B693800, j9thread_t:0x0078EABC, state:CW, prio=5
3XMTHREADINFO1 (native thread ID:0x1AA4, native priority:0x5, native policy:UNKNOWN)
Locks
What are locks? Locks act as speed breakers or gatekeepers that ensure only one thread at a time can obtain temporary ownership of a resource and work on it. Their main purpose is to keep multiple threads from working in the same region and corrupting the final outcome, or to enforce an order of execution. Think of an ATM: only one person can operate it at a time, because you don't want several users withdrawing or depositing on the same machine simultaneously without the ATM being able to confirm each operation (it might credit a single deposit multiple times, or credit the wrong account). As in the readers/writers problem in operating systems, we don't want multiple writers to interleave their writes to the same page, or readers to read incomplete content. The JVM provides implicit locks whenever code marks a method, or a region of code within a method, as synchronized. The lock can be at the instance level, the class level or the method level. JDK 1.5 and later also provide higher-level abstractions in the java.util.concurrent.locks package (for example, ReentrantLock) that work similarly to the JVM's built-in locks.
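As a minimal sketch of the lock forms just described, the class below guards four counters with a class-level synchronized method, an instance-level synchronized method, a synchronized block on a dedicated object, and a ReentrantLock. The class and method names are hypothetical and exist only for this illustration.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class; names are chosen for illustration only.
class LockedCounters {
    private static int classTotal;      // guarded by the LockedCounters.class monitor
    private int methodTotal;            // guarded by this instance's monitor
    private int blockTotal;             // guarded by the blockGuard monitor
    private int explicitTotal;          // guarded by explicitLock
    private final Object blockGuard = new Object();
    private final ReentrantLock explicitLock = new ReentrantLock();

    static synchronized void addToClass() { classTotal++; }   // class-level lock

    synchronized void addViaMethod() { methodTotal++; }       // instance-level lock

    void addViaBlock() {
        synchronized (blockGuard) {                           // lock on a specific object
            blockTotal++;
        }
    }

    void addViaExplicitLock() {
        explicitLock.lock();
        try {
            explicitTotal++;
        } finally {
            explicitLock.unlock();                            // always release in finally
        }
    }

    /** Runs the given number of threads, each adding perThread times to every counter. */
    static int[] run(int threads, int perThread) {
        LockedCounters counters = new LockedCounters();
        synchronized (LockedCounters.class) { classTotal = 0; }
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    addToClass();
                    counters.addViaMethod();
                    counters.addViaBlock();
                    counters.addViaExplicitLock();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();                                // join makes the workers' writes visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { classTotal, counters.methodTotal, counters.blockTotal, counters.explicitTotal };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run(4, 10_000)));
    }
}
```

Without the locks, concurrent increments could be lost; with them, every counter ends at threads * perThread.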
What happens when a thread requests a lock? If the lock is not owned by anyone, the thread becomes the new owner until it relinquishes the lock. What if another thread attempts to obtain the same lock while it is already owned by a different thread? The new bidder is added to a blocked list of contenders for the lock. As more threads join the list of waiting contenders, each one's chance of getting ownership decreases. Normally the owner finishes its job quickly and relinquishes the lock after a short while, and one of the threads from the blocked list is chosen to become the new owner. But if the owner has heavy lifting to do and holds the lock for much longer, and the lock is required by multiple threads, this can create a bottleneck in application or server execution, because the other threads cannot proceed without the lock held by the long-running or slow owner. This can lead to blocked thread chains.
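The owner/contender relationship described above can also be queried at runtime. The following sketch, which uses the standard ThreadMXBean API but hypothetical class and thread names, lists each BLOCKED thread together with the lock it wants and the thread that currently owns that lock; following the owner names from entry to entry traces a blocked thread chain.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; the report format is made up for this sketch.
class BlockedThreadReport {

    /** Returns one line per BLOCKED thread naming the wanted lock and its current owner. */
    static String report() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            // Entries are null for threads that ended while the snapshot was taken.
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            out.append(info.getThreadName())
               .append(" blocked on ").append(info.getLockName())
               .append(" held by ").append(info.getLockOwnerName())
               .append('\n');
        }
        return out.toString();
    }

    /** Holds a lock, lets a second thread contend for it, and reports the result. */
    static String demo() {
        final Object lock = new Object();
        Thread contender = new Thread(() -> {
            synchronized (lock) { /* only needs to acquire the lock */ }
        }, "contender");
        synchronized (lock) {
            contender.start();
            while (contender.getState() != Thread.State.BLOCKED) {
                Thread.yield();   // wait until the contender is parked on the lock
            }
            return report();
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```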
JRockit Thread blocking:

"ExecuteThread: '13' for queue: 'weblogic.kernel.Default (self-tuning)'" id=131 idx=0x248 tid=8047 prio=5 alive, blocked, native_blocked, daemon
    -- Blocked trying to get lock: weblogic/utils/classloaders/GenericClassLoader@0xd1bacb10[fat lock]
    at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method)
    at jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1411)
    at jrockit/vm/Locks.lockFat(Locks.java:1512)
    at jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1054)[optimized]
    at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1005)[optimized]
    at jrockit/vm/Locks.monitorEnter(Locks.java:2179)[optimized]
    at java/lang/ClassLoader.loadClass(ClassLoader.java:292)
    at java/lang/ClassLoader.loadClass(ClassLoader.java:248)
    at weblogic/utils/classloaders/GenericClassLoader.loadClass(GenericClassLoader.java:179)

Sun Hotspot Thread blocking:

"ExecuteThread: '33' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock java.util.Collections$SynchronizedSet@1dc8b68c BLOCKED
    weblogic.management.provider.internal.RegistrationManagerImpl.invokeRegistrationHandlers(RegistrationManagerImpl.java:211)
    weblogic.management.provider.internal.RegistrationManagerImpl.unregister(RegistrationManagerImpl.java:105)
    weblogic.management.runtime.RuntimeMBeanDelegate.unregister(RuntimeMBeanDelegate.java:289)
    weblogic.messaging.common.PrivilegedActionUtilities$2.run(PrivilegedActionUtilities.java:56)
    weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:363)
    weblogic.security.service.SecurityManager.runAs(SecurityManager.java:147)
What if ThreadA, owning LockA, now needs LockB, which is held by ThreadB? ThreadA will block until it gets ownership of LockB. Everyone else waiting for LockA will continue to block until ThreadA releases it. A deadlock occurs if ThreadA is blocked on LockB while holding LockA, and ThreadB is blocked on LockA while holding LockB. It's a mutual deadlock in which neither thread can proceed, because each is blocked waiting for the other to release a resource it needs to complete. The circular dependency can involve just two threads or more. For threads deadlocked on synchronized locks, restarting the JVM is the only way to clear the deadlock.
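The ThreadA/ThreadB scenario can be reproduced and detected in a few lines. This is a sketch under stated assumptions: the class name DeadlockDetector and its report format are invented for illustration, and the two threads are daemon threads so that the deliberately created deadlock does not keep the JVM alive. Detection uses ThreadMXBean.findMonitorDeadlockedThreads(), which looks for cycles on object monitors (synchronized locks).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; it deliberately creates a deadlock that is never resolved.
class DeadlockDetector {

    /** Starts two daemon threads that take lockA and lockB in opposite order. */
    static void createDeadlock() {
        final Object lockA = new Object();
        final Object lockB = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startDaemon("ThreadA", lockA, lockB, bothHoldFirstLock);
        startDaemon("ThreadB", lockB, lockA, bothHoldFirstLock);
    }

    private static void startDaemon(String name, Object first, Object second, CountDownLatch latch) {
        Thread thread = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await();                   // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) { /* never reached: the other thread owns this lock */ }
            }
        }, name);
        thread.setDaemon(true);
        thread.start();
    }

    /** Polls until the JVM reports a monitor deadlock; returns "" if none appears in time. */
    static String findDeadlock(long timeoutMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads();   // null when there is no deadlock
            if (ids != null) {
                StringBuilder out = new StringBuilder();
                for (ThreadInfo info : mx.getThreadInfo(ids)) {
                    out.append(info.getThreadName())
                       .append(" waits for ").append(info.getLockName())
                       .append(" held by ").append(info.getLockOwnerName())
                       .append('\n');
                }
                return out.toString();
            }
            Thread.yield();
        }
        return "";
    }

    public static void main(String[] args) {
        createDeadlock();
        System.out.print(findDeadlock(5_000));
    }
}
```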
Synchronization and Wait
A thread might acquire a lock but find itself unable to proceed because it has to wait for some condition (like a plane that has boarded and is ready to fly but must wait for the take-off signal from the control tower). Meanwhile, other threads may be waiting to obtain the lock it owns. The owner thread can call wait on the lock object, which makes it relinquish the lock voluntarily and place itself on the waiting list. Once notified, it re-acquires the lock and rechecks the condition before proceeding.
The usage pattern is for the thread that changes the condition to obtain the lock and call notify on it before releasing the lock, so that the waiters can obtain the lock and check the condition.
The following examples show three threads working on the same lock object but doing different activities. Each example gives the thread's role, its code, and how the thread appears in a thread dump.

Thread1 acquires the lock on a synchronized code segment and then waits for an event notification. It will show up as having locked, released and now waiting for notification on the same lock object (LockObj@0x849bb).

    void waitForRequest() {
        // Acquire lock on lockObj
        synchronized(lockObj) {
            doSomething1();
            while (!condition) {
                // relinquish lock voluntarily
                lockObj.wait();
            }
            doRestOfStuff();
        } // lock released
    }

    "ExecuteThread: '1'"
        -- Waiting for notification on: LockObj@0x849bb
        at Threads.waitForNotifySignal
        at java/lang/Object.wait()
        at ExThread.waitForRequest()
        ^-- Lock released while waiting: LockObj@0x849bb

Thread2 acquires the lock on a synchronized code segment and then sends the notification. It will show up as holding (or having locked) the lock object (LockObj@0x849bb).

    void fillRequest() {
        // Acquire lock on lockObj
        synchronized(lockObj) {
            doSomething2();
            condition = true;
            // notify on lock waiters
            lockObj.notify();
        } // lock released
    }

    "ExecuteThread: '2'"
        at doSomething2()
        -- Holding lock on: LockObj@0x849bb
        at ExThread.fillRequest()

Thread3 waits to acquire the lock on a synchronized code segment. It will show up as blocked or waiting for the lock (on LockObj@0x849bb).

    void waitForLock() {
        // wait to acquire lock on lockObj
        synchronized(lockObj) {
            doSomething3();
        } // lock released
    }

    "ExecuteThread: '3'"
        -- Blocked trying to get lock: LockObj@0x849bb
        at ExThread.waitForLock()
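For readers who want to run the wait/notify pattern, here is a self-contained sketch modeled on waitForRequest() and fillRequest(). The RequestSlot class, the request field and the demo() helper are assumptions added for this illustration; only lockObj, condition and the two method names come from the examples above.

```java
// Hypothetical runnable version of the wait/notify pattern shown above.
class RequestSlot {
    private final Object lockObj = new Object();
    private boolean condition;    // guarded by lockObj
    private String request;       // guarded by lockObj

    /** Blocks until fillRequest() has supplied a request, then returns it. */
    String waitForRequest() throws InterruptedException {
        synchronized (lockObj) {
            while (!condition) {  // re-check the condition after every wake-up
                lockObj.wait();   // releases lockObj while waiting
            }
            condition = false;
            return request;
        }                         // lock released
    }

    /** Stores a request and notifies one waiter while holding the lock. */
    void fillRequest(String value) {
        synchronized (lockObj) {
            request = value;
            condition = true;
            lockObj.notify();
        }                         // lock released; the waiter can now re-acquire it
    }

    /** Runs one waiting thread and one filling thread and returns what the waiter received. */
    static String demo() {
        RequestSlot slot = new RequestSlot();
        String[] received = new String[1];
        Thread waiter = new Thread(() -> {
            try {
                received[0] = slot.waitForRequest();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "Thread1");
        waiter.start();
        slot.fillRequest("request-1");
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the waiter checks the condition inside a loop, the result is the same whether fillRequest() runs before or after the waiter reaches wait().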
Reducing Locks
Now that we know how locks can lead to blocked threads and hurt application performance, how can we reduce locking in an application?
There are multiple options:
- Try the java.util.concurrent.locks package, which provides capabilities beyond synchronized regions or method calls, such as timed lock acquisition and fairness among threads. Use it with care, though: not following the guidelines can leave locks unreleased after they are acquired and cause a lot more suffering.
- Avoid synchronized methods. Go with smaller synchronized regions whenever possible. Synchronize on the specific object that needs protecting (locking a room or a cabinet) rather than on a bigger parent object (locking the entire building).
- Increase the number of resources when access to them leads to locking. With a finite set of resources, each user has to lock a resource and release it once done. If there are too few resources (such as JDBC connections or a pool of user objects) relative to the user threads requesting them, contention will be higher. Try increasing the number of resources.
- Try to cache resources if each call to create the resource requires synchronized calls.
- Try to avoid the synchronized call entirely, if possible, by changing the logic of execution.
- Control the order of locking to avoid deadlocks. For example, every thread must obtain LockA before obtaining LockB and never mix up the order in which locks are obtained.
- If the owner of the lock has to wait for an event, do a wait on the lock object, which releases the lock and automatically puts the owner on the lock's waiting list so other threads can obtain the lock and proceed.
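Two of the options above, timed lock acquisition and a fixed locking order, can be combined as in the sketch below. The Account class, its id-based ordering rule and the timeout values are hypothetical choices made for this example; ReentrantLock.tryLock(long, TimeUnit) is the standard JDK call.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example; the id field doubles as the global lock order.
class Account {
    private final long id;
    private long balance;                                  // guarded by lock
    private final ReentrantLock lock = new ReentrantLock();

    Account(long id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    /** Moves amount between accounts; returns false on timeout, interrupt or insufficient funds. */
    static boolean transfer(Account from, Account to, long amount, long timeoutMillis) {
        // Always lock the lower id first so two opposite transfers cannot deadlock.
        Account first = from.id <= to.id ? from : to;
        Account second = (first == from) ? to : from;
        try {
            if (!first.lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;                              // give up instead of blocking forever
            }
            try {
                if (!second.lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    return false;
                }
                try {
                    if (from.balance < amount) {
                        return false;
                    }
                    from.balance -= amount;
                    to.balance += amount;
                    return true;
                } finally {
                    second.lock.unlock();                  // release in finally, in reverse order
                }
            } finally {
                first.lock.unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller might use it as Account.transfer(a, b, 30, 100) and retry or report an error when it returns false.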
Summary
In this section, we went over the basics of thread states and thread locking. In the next section, we will drill deeper into capturing and analyzing thread dumps, with a special look at WebLogic Application Server thread dumps.
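Dubbo Architecture Design Explained | 简单之美
Dubbo is a distributed service framework open-sourced by Alibaba. Its most distinctive feature is its layered architecture, which decouples the layers from one another (or at least couples them as loosely as possible). From the perspective of the service model, Dubbo uses a very simple model: either a provider offers a service, or a consumer consumes one. On this basis, two roles can be abstracted: the service provider (Provider) and the service consumer (Consumer). The registry, protocol support, service monitoring and related topics are described later.
Overall Architecture
Dubbo's overall architecture is shown in the figure:
The Dubbo framework design is divided into 10 layers. The topmost Service layer is the interface layer left for developers who actually use Dubbo to build distributed services, where they implement their business logic. In the figure, the interfaces on the light-blue background on the left are used by service consumers, those on the light-green background on the right are used by service providers, and those on the central axis are used by both.
Below, drawing on the official Dubbo documentation, we go through the key design points of each layer in the layered architecture: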
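- Service interface layer (Service): this layer relates to the actual business logic; the interfaces and implementations are designed according to the business of the service provider and the service consumer.
- Configuration layer (Config): the external configuration interface, centered on ServiceConfig and ReferenceConfig. Configuration classes can be instantiated directly with new, or generated by Spring parsing the configuration.
- Service proxy layer (Proxy): transparent proxying of service interfaces; it generates the client-side Stub and the server-side Skeleton of a service. It is centered on ServiceProxy, and its extension interface is ProxyFactory.
- Service registration layer (Registry): encapsulates registration and discovery of service addresses. It is centered on the service URL, and its extension interfaces are RegistryFactory, Registry and RegistryService. There may be no service registry at all, in which case the service provider exposes the service directly.
- Cluster layer (Cluster): encapsulates routing and load balancing across multiple providers and bridges to the registry. It is centered on Invoker, and its extension interfaces are Cluster, Directory, Router and LoadBalance. It combines multiple service providers into a single one, transparently to the service consumer, which only needs to interact with one provider.
- Monitoring layer (Monitor): monitors the number and duration of RPC calls. It is centered on Statistics, and its extension interfaces are MonitorFactory, Monitor and MonitorService.
- Remote call layer (Protocol): encapsulates RPC calls. It is centered on Invocation and Result, and its extension interfaces are Protocol, Invoker and Exporter. Protocol is the service domain: it is the main entry point for exposing and referring Invokers and is responsible for managing the Invoker life cycle. Invoker is the entity domain and Dubbo's core model: all other models move toward it or are converted into it. It represents an executable on which invoke calls can be made; it may be a local implementation, a remote implementation, or a cluster implementation.
- Information exchange layer (Exchange): encapsulates the request-response pattern and converts synchronous calls to asynchronous ones. It is centered on Request and Response, and its extension interfaces are Exchanger, ExchangeChannel, ExchangeClient and ExchangeServer.
- Network transport layer (Transport): abstracts mina and netty behind a unified interface. It is centered on Message, and its extension interfaces are Channel, Transporter, Client, Server and Codec.
- Data serialization layer (Serialize): a set of reusable tools; its extension interfaces are Serialization, ObjectInput, ObjectOutput and ThreadPool.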
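To illustrate the idea behind the Exchange layer, request-response semantics built on a one-way transport, here is a minimal sketch. It is not Dubbo's implementation or API: the ExchangeSketch class, its transport callback and the echo behavior are all assumptions made for this example. Each request gets an id, a pending future is stored under that id, and the matching response completes the future.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

// Hypothetical sketch of request-response correlation; not Dubbo's actual classes.
class ExchangeSketch {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final BiConsumer<Long, String> transport;   // one-way send: (requestId, payload)

    ExchangeSketch(BiConsumer<Long, String> transport) {
        this.transport = transport;
    }

    /** Sends a request and returns a future that completes when the matching response arrives. */
    CompletableFuture<String> request(String payload) {
        long id = nextId.incrementAndGet();
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(id, future);        // register before sending so a fast reply is not lost
        transport.accept(id, payload);
        return future;
    }

    /** Called when a one-way response message arrives from the transport. */
    void onResponse(long id, String result) {
        CompletableFuture<String> future = pending.remove(id);
        if (future != null) {
            future.complete(result);
        }
    }

    /** Wires the exchange to a fake transport that echoes each request from another thread. */
    static String demo() {
        ExchangeSketch[] holder = new ExchangeSketch[1];
        holder[0] = new ExchangeSketch((id, payload) ->
                new Thread(() -> holder[0].onResponse(id, "echo:" + payload)).start());
        return holder[0].request("hello").join();   // a synchronous caller waits on the future
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```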
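As the figure above shows, for both service providers and service consumers, Dubbo offers, from across the framework's 10 layers, the interfaces that each side needs to care about and extend, building up the whole service ecosystem (service providers and consumers are themselves service-centered).
The official description of the relationships among the layers above is as follows:
- In RPC, Protocol is the core layer: Protocol + Invoker + Exporter are enough to complete a non-transparent RPC call, with Filter interception points placed on the Invoker's main flow.
- Consumer and Provider in the figure are abstract concepts, intended only to give the reader a more intuitive sense of which classes belong to the client side and which to the server side. Client and Server are not used because in many scenarios Dubbo uses Provider, Consumer, Registry and Monitor to divide its logical topology nodes, keeping the concepts uniform.
- Cluster is a peripheral concept. Its purpose is to disguise multiple Invokers as a single Invoker so that others only need to care about the Protocol layer's Invoker. Adding or removing Cluster does not affect the other layers, because Cluster is not needed when there is only one provider.
- The Proxy layer encapsulates transparent proxying for all interfaces, while the other layers are all centered on Invoker. Only when exposing functionality to the user is Proxy used to convert an Invoker into an interface, or an interface implementation into an Invoker. In other words, RPC still runs without the Proxy layer; it is just less transparent and looks less like calling a local service when calling a remote one.
- The Remoting implementation is the implementation of the Dubbo protocol. If you choose the RMI protocol, Remoting is not used at all. Internally, Remoting is divided into the Transport layer and the Exchange layer. The Transport layer is only responsible for one-way message transport; it abstracts Mina, Netty and Grizzly and can also be extended to UDP transport. The Exchange layer encapsulates Request-Response semantics on top of the transport layer.
- Registry and Monitor are not really layers but independent nodes; they are drawn as layers only to give a global overview.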
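The relationship between Invoker, Cluster and Proxy described above can be sketched in plain Java. Everything here is a simplified stand-in: the Invoker interface below has a made-up signature and is not Dubbo's actual Invoker, and the round-robin-with-failover strategy is just one assumed policy. The sketch shows a cluster disguising several invokers as one, and a dynamic proxy turning that single invoker into a typed interface.

```java
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-ins for illustration; these are not Dubbo's real interfaces.
class InvokerSketch {

    /** Something executable by method name and arguments (hypothetical signature). */
    interface Invoker {
        Object invoke(String method, Object[] args) throws Exception;
    }

    /** Example business interface used only by this sketch. */
    interface GreetingService {
        String greet(String name);
    }

    /** Cluster idea: present many invokers as one, with round-robin selection and failover. */
    static Invoker cluster(List<Invoker> invokers) {
        AtomicInteger next = new AtomicInteger();
        return (method, args) -> {
            Exception last = null;
            for (int attempt = 0; attempt < invokers.size(); attempt++) {
                int index = Math.floorMod(next.getAndIncrement(), invokers.size());
                try {
                    return invokers.get(index).invoke(method, args);
                } catch (Exception e) {
                    last = e;               // try the next provider
                }
            }
            throw last != null ? last : new IllegalStateException("no invokers available");
        };
    }

    /** Proxy idea: convert an invoker into a typed interface for the caller. */
    static <T> T proxy(Class<T> type, Invoker invoker) {
        Object instance = Proxy.newProxyInstance(
                type.getClassLoader(),
                new Class<?>[] { type },
                (self, method, args) -> invoker.invoke(method.getName(), args));
        return type.cast(instance);
    }

    static String demo() {
        Invoker broken = (method, args) -> { throw new IllegalStateException("provider down"); };
        Invoker healthy = (method, args) -> "Hello, " + args[0];
        GreetingService service = proxy(GreetingService.class, cluster(Arrays.asList(broken, healthy)));
        return service.greet("Dubbo");      // looks like a local call to the consumer
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Removing the proxy() step still leaves a working call path through Invoker.invoke, which mirrors the point that RPC runs without the Proxy layer but is less transparent.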
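From the architecture diagram above, we can see that Dubbo, as a distributed service framework, has the following core aspects:
Service Definition
Services revolve around service providers and service consumers: the provider implements a service, and the consumer calls it.
Service Registration
A service provider needs to publish its services, and as application systems grow more complex, the number and variety of services keep expanding. A service consumer cares most about how to obtain the services it needs, and in a complex application system it has to manage a large number of service calls. Moreover, a given party may play both roles, needing both to provide services and to consume them.
Managing services centrally makes it possible to streamline how internal applications publish and use services and how those services are managed. A service registry can use a specific protocol to present services to the outside in a uniform way. Dubbo offers the following types of registry to choose from:
- Multicast registry
- Zookeeper registry
- Redis registry
- Simple registry
Service Monitoring
Both service providers and service consumers need to monitor the actual state of service calls effectively in order to improve service quality.
Remote Communication and Information Exchange
Remote communication requires a protocol agreed on by both parties. Beyond ensuring that both parties understand the protocol's semantics, message transport must also be efficient and stable. Dubbo integrates the current mainstream network communication frameworks, mainly the following:
- Mina
- Netty
- Grizzly
Service Invocation
The following, taken directly from the Dubbo website, shows the calling relationship between service provider and service consumer at the RPC layer, as shown in the figure:
In the figure above, blue indicates interaction with the business, and green indicates interaction internal to Dubbo only. The call flow depicted in the figure is as follows:
- The service provider publishes its service to the service registry;
- The service consumer subscribes to the service from the service registry;
- The service consumer calls the registered, available service.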
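The three steps above can be sketched with a tiny in-memory registry. This is an illustration only: RegistrySketch, its method names and the sample service name and address are assumptions, and a real registry such as Zookeeper works over the network rather than in-process.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical in-memory registry; not Dubbo's Registry or RegistryService API.
class RegistrySketch {
    private final Map<String, List<String>> providers = new ConcurrentHashMap<>();
    private final Map<String, List<Consumer<List<String>>>> listeners = new ConcurrentHashMap<>();

    /** Step 1: a provider publishes the address of a service. */
    void register(String service, String address) {
        providers.computeIfAbsent(service, key -> new CopyOnWriteArrayList<>()).add(address);
        publish(service);
    }

    /** A provider withdraws its address; subscribers are told about the change. */
    void unregister(String service, String address) {
        List<String> addresses = providers.get(service);
        if (addresses != null && addresses.remove(address)) {
            publish(service);
        }
    }

    /** Step 2: a consumer subscribes and immediately receives the current address list. */
    void subscribe(String service, Consumer<List<String>> listener) {
        listeners.computeIfAbsent(service, key -> new CopyOnWriteArrayList<>()).add(listener);
        listener.accept(lookup(service));
    }

    /** Returns a snapshot of the addresses currently registered for the service. */
    List<String> lookup(String service) {
        return new ArrayList<>(providers.getOrDefault(service, Collections.emptyList()));
    }

    private void publish(String service) {
        for (Consumer<List<String>> listener : listeners.getOrDefault(service, Collections.emptyList())) {
            listener.accept(lookup(service));
        }
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        // Illustrative service name and address, chosen for this example.
        registry.subscribe("com.example.DemoService",
                addresses -> System.out.println("consumer sees providers: " + addresses));
        registry.register("com.example.DemoService", "dubbo://127.0.0.1:20880");
        // Step 3: with an address in hand, the consumer would now call the provider directly.
    }
}
```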
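Next, the abstract call flow above is expanded in detail, as shown in the figure:
Registering/Unregistering Services
Registering and unregistering services concerns the service provider role. The sequence diagram for registering and unregistering a service is shown in the figure:
Subscribing to/Unsubscribing from Services
To meet the needs of the application system, a service consumer may need to subscribe, from the service registry, to specific services published by service providers; once notified that a service is available, it can call the service directly. Conversely, if a service is no longer needed, the consumer can unsubscribe from it. The corresponding sequence diagram is shown in the figure:
Protocol Support
Dubbo supports multiple protocols, as follows:
- Dubbo protocol
- Hessian protocol
- HTTP protocol
- RMI protocol
- WebService protocol
- Thrift protocol
- Memcached protocol
- Redis protocol
In communication, different service levels generally correspond to different qualities of service, so choosing a suitable protocol is very important. You can choose according to how your application is built. For example, the RMI protocol is generally restricted by firewalls, so for communication between external and internal systems, do not use RMI; use the HTTP protocol or the Hessian protocol instead.
Supplementary Reference
Dubbo organizes its modules by package structure. The modules and their relationships are shown in the figure:
You can compare the organization of Dubbo's code (managed with Maven) against the modules above. A brief description of each package:
- dubbo-common: common logic module, including utility classes and common models.
- dubbo-remoting: remote communication module, equivalent to the implementation of the Dubbo protocol; this package is not needed if RPC uses the RMI protocol.
- dubbo-rpc: remote call module; abstracts the various protocols as well as dynamic proxying. It covers only one-to-one calls and is not concerned with cluster management.
- dubbo-cluster: cluster module; disguises multiple service providers as a single provider, including load balancing, fault tolerance and routing. The cluster's address list can be statically configured or pushed down by the registry.
- dubbo-registry: registry module; the clustering approach based on addresses pushed by the registry, plus abstractions over the various registries.
- dubbo-monitor: monitoring module; services for counting service calls, recording call times, and tracing call chains.
- dubbo-config: configuration module; Dubbo's public API. Users work with Dubbo through Config, which hides all of Dubbo's internal details.
- dubbo-container: container module; a standalone container that loads Spring from a simple Main to start up. Since services usually don't need the features of a web container such as Tomcat or JBoss, there is no need to load services with a web container.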
Installing the Dubbo Distributed Service Framework
For a Windows installation, substitute the equivalent commands yourself. The following steps were verified on Windows 7.

0. Install the git and maven command line:

    yum install git
    or: apt-get install git
    cd ~
    wget http://www.apache.org/dist//maven/binaries/apache-maven-2.2.1-bin.tar.gz
    tar zxvf apache-maven-2.2.1-bin.tar.gz
    vi .bash_profile
        - edit: export PATH=$PATH:~/apache-maven-2.2.1/bin
    source .bash_profile

1. Checkout the dubbo source code:

    cd ~
    git clone https://github.com/alibaba/dubbo.git dubbo
    git checkout -b dubbo-2.4.0
    git checkout master

2. Import the dubbo source code to eclipse project:

    cd ~/dubbo
    mvn eclipse:eclipse

    Eclipse -> Menu -> File -> Import -> Existing Projects to Workspace -> Browse -> Finish

    Context Menu -> Run As -> Java Application:
        dubbo-demo-provider/src/test/java/com.alibaba.dubbo.demo.provider.DemoProvider
        dubbo-demo-consumer/src/test/java/com.alibaba.dubbo.demo.consumer.DemoConsumer
        dubbo-monitor-simple/src/test/java/com.alibaba.dubbo.monitor.simple.SimpleMonitor
        dubbo-registry-simple/src/test/java/com.alibaba.dubbo.registry.simple.SimpleRegistry

    Edit Config:
        dubbo-demo-provider/src/test/resources/dubbo.properties
        dubbo-demo-consumer/src/test/resources/dubbo.properties
        dubbo-monitor-simple/src/test/resources/dubbo.properties
        dubbo-registry-simple/src/test/resources/dubbo.properties

3. Build the dubbo binary package:

    cd ~/dubbo
    mvn clean install -Dmaven.test.skip
    cd dubbo/target
    ls

4. Install the demo provider:

    cd ~/dubbo/dubbo-demo-provider/target
    tar zxvf dubbo-demo-provider-2.4.0-assembly.tar.gz
    cd dubbo-demo-provider-2.4.0/bin
    ./start.sh

5. Install the demo consumer:

    cd ~/dubbo/dubbo-demo-consumer/target
    tar zxvf dubbo-demo-consumer-2.4.0-assembly.tar.gz
    cd dubbo-demo-consumer-2.4.0/bin
    ./start.sh
    cd ../logs
    tail -f stdout.log

6. Install the simple monitor:

    cd ~/dubbo/dubbo-simple-monitor/target
    tar zxvf dubbo-simple-monitor-2.4.0-assembly.tar.gz
    cd dubbo-simple-monitor-2.4.0/bin
    ./start.sh
    http://127.0.0.1:8080

7. Install the simple registry:

    cd ~/dubbo/dubbo-simple-registry/target
    tar zxvf dubbo-simple-registry-2.4.0-assembly.tar.gz
    cd dubbo-simple-registry-2.4.0/bin
    ./start.sh

    cd ~/dubbo/dubbo-demo-provider/conf
    vi dubbo.properties
        - edit: dubbo.registry.address=dubbo://127.0.0.1:9090
    cd ../bin
    ./restart.sh

    cd ~/dubbo/dubbo-demo-consumer/conf
    vi dubbo.properties
        - edit: dubbo.registry.address=dubbo://127.0.0.1:9090
    cd ../bin
    ./restart.sh

    cd ~/dubbo/dubbo-simple-monitor/conf
    vi dubbo.properties
        - edit: dubbo.registry.address=dubbo://127.0.0.1:9090
    cd ../bin
    ./restart.sh

8. Install the zookeeper registry:

    cd ~
    wget http://www.apache.org/dist//zookeeper/zookeeper-3.3.3/zookeeper-3.3.3.tar.gz
    tar zxvf zookeeper-3.3.3.tar.gz
    cd zookeeper-3.3.3/conf
    cp zoo_sample.cfg zoo.cfg
    vi zoo.cfg
        - edit: dataDir=/home/xxx/data
    cd ../bin
    ./zkServer.sh start

    cd ~/dubbo/dubbo-demo-provider/conf
    vi dubbo.properties
        - edit: dubbo.registry.address=zookeeper://127.0.0.1:2181
    cd ../bin
    ./restart.sh

    cd ~/dubbo/dubbo-demo-consumer/conf
    vi dubbo.properties
        - edit: dubbo.registry.address=zookeeper://127.0.0.1:2181
    cd ../bin
    ./restart.sh

    cd ~/dubbo/dubbo-simple-monitor/conf
    vi dubbo.properties
        - edit: dubbo.registry.address=zookeeper://127.0.0.1:2181
    cd ../bin
    ./restart.sh

9. Install the redis registry:

    cd ~
    wget http://redis.googlecode.com/files/redis-2.4.8.tar.gz
    tar xzf redis-2.4.8.tar.gz
    cd redis-2.4.8
    make
    nohup ./src/redis-server redis.conf &

    cd ~/dubbo/dubbo-demo-provider/conf
    vi dubbo.properties
        - edit: dubbo.registry.address=redis://127.0.0.1:6379
    cd ../bin
    ./restart.sh

    cd ~/dubbo/dubbo-demo-consumer/conf
    vi dubbo.properties
        - edit: dubbo.registry.address=redis://127.0.0.1:6379
    cd ../bin
    ./restart.sh

    cd ~/dubbo/dubbo-simple-monitor/conf
    vi dubbo.properties
        - edit: dubbo.registry.address=redis://127.0.0.1:6379
    cd ../bin
    ./restart.sh

10. Install the admin console:

    cd ~/dubbo/dubbo-admin
    mvn jetty:run -Ddubbo.registry.address=zookeeper://127.0.0.1:2181
    http://root:[email protected]:8080