`
淘气天空lc
  • 浏览: 46588 次
  • 性别: Icon_minigender_1
  • 来自: 北京
社区版块
存档分类
最新评论

Java Thread dump分析

阅读更多

 http://90haofang-163-com.iteye.com/blog/2096073    

    前段时间线上出了一次内存溢出的问题,对heap dump做了一些分析,同时也对thread dump做了了解,查了些资料,这篇文章是总结如何对thread dump进行分析,从而定位问题并解决。

java 和 线程相关概念
线程竞争:多个线程等待一个被另一个线程持有的锁(共享资源),在web应用在线程频繁访问共享资源,例如打印log,线程必须先持有共享资源的lock.
死锁:一种比较特殊的线程竞争,两个或两个以上的线程等待其他线程执行完成并释放共享资源,而形成的一个资源请求的环路。每个线程都处于等待其他线程的完成的状态中。
很多问题由于线程资源竞争引起,当出现问题时,可以通过分析Thread dump 内容
关于java 线程和线程同步
一个线程可以其他线程同时执行,为了确保多个线程安全的使用共享资源,所以同一个时间段只能有一个线程使用共享资源,其他线程必须处于等待状态,这个通过线程同步来实现。
java中的线程同步通过使用一个叫做monitor(自已看做一个资源访问锁)。每个java对象拥有唯一的monitor,这个monitor同时只能被一个线程持有,其他线程想持有的话,必须在等待队列里面,

直到其他线程释放了monitor.
java线程状态图


 
从图可以看到java线程成拥有以上五个状态
线程状态
NEW:线程刚刚创建,还没有被cpu调度执行过。
RUNABLE:当前线程占用着cpu并且在执行任务(由于操作系统资源调度的,也可能处于WAING状态),
BLOCKED:当前线程等待其他线程释放锁,从而获取monitor lock。
WAITING:线程被现实调用wait,或者join 或 park方法。
TIMED_WAITING:通过sleep,wait,join,park等方法(与WAITING不同的,他需要指定最大等待时间)等待时间由方法参数传入进去。
java线程类型分为守护线程和非守护线程。
当没有其他非守护线程运行时,它才会停止工作,即使你自己不创建任何线程,java应用也会创建一些线程,大部分是守护线程,例如GC的守护线程或者是JMX
如果一个线程执行 ‘static void main(String[] args)’ 方法会创建一个非守护线程(通过VM thread),如果这个主线程停止了,那个其他线程也会停止。
如何获得Thread dump?
thread dump只是当前时间点的线程状态统计,所有建议多dump几次线程栈,然后才能能观察线程的变化。
介绍三种方式
首先使用jps -v 查看当前运行的java进程


 

1:jdk6以上,使用jstack


 
也可以使用JVisualVM jdk自带的

"pool-3-thread-2-EventThread" daemon prio=6 tid=0x0000000008840000 nid=0x1174 waiting on condition [0x000000001386f000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000c51ea098> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491)
 
   Locked ownable synchronizers:
- None 

Thread Name:当使用 Java.lang.Thread 类创建的线程,线程的名字就是Thread-(number),使用java.uil.concurrent.ThreadFactory创建的,名字是pool-(number)-thread-(name)
Priority:线程的优先级。
Thread ID :线程的唯一ID(cpu 和内存使用率,可以通过thrad id获取)
Thread Status:线程状态
Thread stack 线程调用栈信息
(BLOCKED) 不能获得lock的状态
当lock被一个线程占有的时候,那么其他线程将不能持有锁,这个时候这些线程处于BLOCKED状态。
下面dump可以看出 pool-1-thread-2 和pool-1-thread-3阻塞lock(0x0000000780a000b0),这个被pool-1-thread-1持有。


 

"BLOCKED_TEST pool-1-thread-1" prio=6 tid=0x0000000006904800 nid=0x28f4 runnable [0x000000000785f000]
   java.lang.Thread.State: RUNNABLE
                at java.io.FileOutputStream.writeBytes(Native Method)
                at java.io.FileOutputStream.write(FileOutputStream.java:282)
                at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
                at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
                - locked <0x0000000780a31778> (a java.io.BufferedOutputStream)
                at java.io.PrintStream.write(PrintStream.java:432)
                - locked <0x0000000780a04118> (a java.io.PrintStream)
                at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
                at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272)
                at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:85)
                - locked <0x0000000780a040c0> (a java.io.OutputStreamWriter)
                at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:168)
                at java.io.PrintStream.newLine(PrintStream.java:496)
                - locked <0x0000000780a04118> (a java.io.PrintStream)
                at java.io.PrintStream.println(PrintStream.java:687)
                - locked <0x0000000780a04118> (a java.io.PrintStream)
                at com.nbp.theplatform.threaddump.ThreadBlockedState.monitorLock(ThreadBlockedState.java:44)
                - locked <0x0000000780a000b0> (a com.nbp.theplatform.threaddump.ThreadBlockedState)
                at com.nbp.theplatform.threaddump.ThreadBlockedState$1.run(ThreadBlockedState.java:7)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers: 对象本身持有的锁
                - <0x0000000780a31758> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"BLOCKED_TEST pool-1-thread-2" prio=6 tid=0x0000000007673800 nid=0x260c waiting for monitor entry [0x0000000008abf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at com.nbp.theplatform.threaddump.ThreadBlockedState.monitorLock(ThreadBlockedState.java:43)
                - waiting to lock <0x0000000780a000b0> (a com.nbp.theplatform.threaddump.ThreadBlockedState)
                at com.nbp.theplatform.threaddump.ThreadBlockedState$2.run(ThreadBlockedState.java:26)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
                - <0x0000000780b0c6a0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

"BLOCKED_TEST pool-1-thread-3" prio=6 tid=0x00000000074f5800 nid=0x1994 waiting for monitor entry [0x0000000008bbf000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at com.nbp.theplatform.threaddump.ThreadBlockedState.monitorLock(ThreadBlockedState.java:42)
                - waiting to lock <0x0000000780a000b0> (a com.nbp.theplatform.threaddump.ThreadBlockedState)
                at com.nbp.theplatform.threaddump.ThreadBlockedState$3.run(ThreadBlockedState.java:34)
                at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
                at java.lang.Thread.run(Thread.java:662)

   Locked ownable synchronizers:
                - <0x0000000780b0e1b8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

下面看下死锁的状态
线程DEADLOC_TEST_1 需要获得线程DEAD_TEST_2 持有的锁(0x00000007d58f5e60)才能能继续运行,目前处于BLOCKED,而线程DEADLOCK_TEST-2需要获得DEADLOCK_TEST_3持有的锁(0x00000007d58f5e78)才能继续运行,而DEADLOCK_TEST_3需要获得DEADLOCK_TEST_1持有的锁(0x00000007d58f5e48)才能继续运行。可以看出这个锁请求形成了一个闭环。每个线程都处于等待状态。



上图 两个线程之间的资源请求环路,导致死锁



 

"DEADLOCK_TEST-1" daemon prio=6 tid=0x000000000690f800 nid=0x1820 waiting for monitor entry [0x000000000805f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.goMonitorDeadlock(ThreadDeadLockState.java:197)
                - waiting to lock <0x00000007d58f5e60> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.monitorOurLock(ThreadDeadLockState.java:182)
                - locked <0x00000007d58f5e48> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.run(ThreadDeadLockState.java:135)

   Locked ownable synchronizers:
                - None

"DEADLOCK_TEST-2" daemon prio=6 tid=0x0000000006858800 nid=0x17b8 waiting for monitor entry [0x000000000815f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.goMonitorDeadlock(ThreadDeadLockState.java:197)
                - waiting to lock <0x00000007d58f5e78> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.monitorOurLock(ThreadDeadLockState.java:182)
                - locked <0x00000007d58f5e60> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.run(ThreadDeadLockState.java:135)

   Locked ownable synchronizers:
                - None

"DEADLOCK_TEST-3" daemon prio=6 tid=0x0000000006859000 nid=0x25dc waiting for monitor entry [0x000000000825f000]
   java.lang.Thread.State: BLOCKED (on object monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.goMonitorDeadlock(ThreadDeadLockState.java:197)
                - waiting to lock <0x00000007d58f5e48> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.monitorOurLock(ThreadDeadLockState.java:182)
                - locked <0x00000007d58f5e78> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor)
                at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.run(ThreadDeadLockState.java:135)

   Locked ownable synchronizers:
                - None

   接下来我们可以通过线程DUMP分析一些问题
1.CPU使用率异常高
linux:执行 top 找出最耗CPU的进程



 
找出进程1218使用cpu最多的线程
top -Hp 1218 -d 1 -n 1
例如是:16682  把它转出16进制:使用python
 

 

2014-07-24 16:22:25 Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.65-b04-462 mixed mode): "Attach Listener" daemon prio=9 tid=7fafd6800800 nid=0x110afa000 waiting on condition [00000000]   java.lang.Thread.State: RUNNABLE "DestroyJavaVM" prio=5 tid=7fafd5149000 nid=0x1090ff000 waiting on condition [00000000]   java.lang.Thread.State: RUNNABLE "CPU 100%" prio=5 tid=7fafd4836800 nid=0x111a84000 runnable [111a83000]   java.lang.Thread.State: RUNNABLE        at form.FormTest$1.run(FormTest.java:37)        at java.lang.Thread.run(Thread.java:695) "Monitor Ctrl-Break" daemon prio=5 tid=7fafd4879800 nid=0x11196e000 runnable [11196d000]   java.lang.Thread.State: RUNNABLE        at java.net.PlainSocketImpl.socketAccept(Native Method)        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:439)        - locked <7f3596450> (a java.net.SocksSocketImpl)        at java.net.ServerSocket.implAccept(ServerSocket.java:468)

测试代码:
从上面可以看到导致CPU100%的代码位于FromTest的37行,这样就很快定位问题,下面是就是这个很简单的死循环导致CPU100%

 
2.程序处理性能急剧下降
首先获取几份Thread dump ,找到处于BLOCKED状态的线程。

" DB-Processor-13" daemon prio=5 tid=0x003edf98 nid=0xca waiting for monitor entry [0x000000000825f000]
java.lang.Thread.State: BLOCKED (on object monitor)
                at beans.ConnectionPool.getConnection(ConnectionPool.java:102)
                - waiting to lock <0xe0375410> (a beans.ConnectionPool)
                at beans.cus.ServiceCnt.getTodayCount(ServiceCnt.java:111)
                at beans.cus.ServiceCnt.insertCount(ServiceCnt.java:43)

"DB-Processor-14" daemon prio=5 tid=0x003edf98 nid=0xca waiting for monitor entry [0x000000000825f020]
java.lang.Thread.State: BLOCKED (on object monitor)
                at beans.ConnectionPool.getConnection(ConnectionPool.java:102)
                - waiting to lock <0xe0375410> (a beans.ConnectionPool)
                at beans.cus.ServiceCnt.getTodayCount(ServiceCnt.java:111)
                at beans.cus.ServiceCnt.insertCount(ServiceCnt.java:43)

" DB-Processor-3" daemon prio=5 tid=0x00928248 nid=0x8b waiting for monitor entry [0x000000000825d080]
java.lang.Thread.State: RUNNABLE
                at oracle.jdbc.driver.OracleConnection.isClosed(OracleConnection.java:570)
                - waiting to lock <0xe03ba2e0> (a oracle.jdbc.driver.OracleConnection)
                at beans.ConnectionPool.getConnection(ConnectionPool.java:112)
                - locked <0xe0386580> (a java.util.Vector)
                - locked <0xe0375410> (a beans.ConnectionPool)
                at beans.cus.Cue_1700c.GetNationList(Cue_1700c.java:66)
                at org.apache.jsp.cue_1700c_jsp._jspService(cue_1700c_jsp.java:120)

多获取几份处于BLOCKED状态的线程信息。对比看出是否线程一致处于BLOCKED状态,从而定位问题。
一个字一个字打起来真费劲额( ⊙ o ⊙ )!














  • 大小: 26.7 KB
  • 大小: 35.7 KB
  • 大小: 14.1 KB
  • 大小: 14.1 KB
  • 大小: 26.2 KB
  • 大小: 29.5 KB
  • 大小: 157.5 KB
  • 大小: 68.1 KB
  • 大小: 83.7 KB
  • 大小: 16.5 KB
2
1
分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics