http://90haofang-163-com.iteye.com/blog/2096073
前段时间线上出了一次内存溢出的问题,对heap dump做了一些分析,同时也对thread dump做了了解,查了些资料,这篇文章是总结如何对thread dump进行分析,从而定位问题并解决。
java 和 线程相关概念
线程竞争:多个线程等待一个被另一个线程持有的锁(共享资源),在web应用在线程频繁访问共享资源,例如打印log,线程必须先持有共享资源的lock.
死锁:一种比较特殊的线程竞争,两个或两个以上的线程等待其他线程执行完成并释放共享资源,而形成的一个资源请求的环路。每个线程都处于等待其他线程的完成的状态中。
很多问题由于线程资源竞争引起,当出现问题时,可以通过分析Thread dump 内容
关于java 线程和线程同步
一个线程可以其他线程同时执行,为了确保多个线程安全的使用共享资源,所以同一个时间段只能有一个线程使用共享资源,其他线程必须处于等待状态,这个通过线程同步来实现。
java中的线程同步通过使用一个叫做monitor(自已看做一个资源访问锁)。每个java对象拥有唯一的monitor,这个monitor同时只能被一个线程持有,其他线程想持有的话,必须在等待队列里面,
直到其他线程释放了monitor.
java线程状态图
从图可以看到java线程成拥有以上五个状态
线程状态
NEW:线程刚刚创建,还没有被cpu调度执行过。
RUNABLE:当前线程占用着cpu并且在执行任务(由于操作系统资源调度的,也可能处于WAING状态),
BLOCKED:当前线程等待其他线程释放锁,从而获取monitor lock。
WAITING:线程被现实调用wait,或者join 或 park方法。
TIMED_WAITING:通过sleep,wait,join,park等方法(与WAITING不同的,他需要指定最大等待时间)等待时间由方法参数传入进去。
java线程类型分为守护线程和非守护线程。
当没有其他非守护线程运行时,它才会停止工作,即使你自己不创建任何线程,java应用也会创建一些线程,大部分是守护线程,例如GC的守护线程或者是JMX
如果一个线程执行 ‘static void main(String[] args)’ 方法会创建一个非守护线程(通过VM thread),如果这个主线程停止了,那个其他线程也会停止。
如何获得Thread dump?
thread dump只是当前时间点的线程状态统计,所有建议多dump几次线程栈,然后才能能观察线程的变化。
介绍三种方式
首先使用jps -v 查看当前运行的java进程
1:jdk6以上,使用jstack
也可以使用JVisualVM jdk自带的
"pool-3-thread-2-EventThread" daemon prio=6 tid=0x0000000008840000 nid=0x1174 waiting on condition [0x000000001386f000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x00000000c51ea098> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:491) Locked ownable synchronizers: - None |
Thread Name:当使用 Java.lang.Thread 类创建的线程,线程的名字就是Thread-(number),使用java.uil.concurrent.ThreadFactory创建的,名字是pool-(number)-thread-(name)
Priority:线程的优先级。
Thread ID :线程的唯一ID(cpu 和内存使用率,可以通过thrad id获取)
Thread Status:线程状态
Thread stack 线程调用栈信息
(BLOCKED) 不能获得lock的状态
当lock被一个线程占有的时候,那么其他线程将不能持有锁,这个时候这些线程处于BLOCKED状态。
下面dump可以看出 pool-1-thread-2 和pool-1-thread-3阻塞lock(0x0000000780a000b0),这个被pool-1-thread-1持有。
"BLOCKED_TEST pool-1-thread-1" prio=6 tid=0x0000000006904800 nid=0x28f4 runnable [0x000000000785f000] java.lang.Thread.State: RUNNABLE at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:282) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) - locked <0x0000000780a31778> (a java.io.BufferedOutputStream) at java.io.PrintStream.write(PrintStream.java:432) - locked <0x0000000780a04118> (a java.io.PrintStream) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:272) at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:85) - locked <0x0000000780a040c0> (a java.io.OutputStreamWriter) at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:168) at java.io.PrintStream.newLine(PrintStream.java:496) - locked <0x0000000780a04118> (a java.io.PrintStream) at java.io.PrintStream.println(PrintStream.java:687) - locked <0x0000000780a04118> (a java.io.PrintStream) at com.nbp.theplatform.threaddump.ThreadBlockedState.monitorLock(ThreadBlockedState.java:44) - locked <0x0000000780a000b0> (a com.nbp.theplatform.threaddump.ThreadBlockedState) at com.nbp.theplatform.threaddump.ThreadBlockedState$1.run(ThreadBlockedState.java:7) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: 对象本身持有的锁 - <0x0000000780a31758> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) "BLOCKED_TEST pool-1-thread-2" prio=6 tid=0x0000000007673800 nid=0x260c waiting for monitor entry [0x0000000008abf000] java.lang.Thread.State: BLOCKED (on object monitor) at com.nbp.theplatform.threaddump.ThreadBlockedState.monitorLock(ThreadBlockedState.java:43) - waiting to lock <0x0000000780a000b0> (a com.nbp.theplatform.threaddump.ThreadBlockedState) at com.nbp.theplatform.threaddump.ThreadBlockedState$2.run(ThreadBlockedState.java:26) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - <0x0000000780b0c6a0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) "BLOCKED_TEST pool-1-thread-3" prio=6 tid=0x00000000074f5800 nid=0x1994 waiting for monitor entry [0x0000000008bbf000] java.lang.Thread.State: BLOCKED (on object monitor) at com.nbp.theplatform.threaddump.ThreadBlockedState.monitorLock(ThreadBlockedState.java:42) - waiting to lock <0x0000000780a000b0> (a com.nbp.theplatform.threaddump.ThreadBlockedState) at com.nbp.theplatform.threaddump.ThreadBlockedState$3.run(ThreadBlockedState.java:34) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - <0x0000000780b0e1b8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync) |
下面看下死锁的状态
线程DEADLOC_TEST_1 需要获得线程DEAD_TEST_2 持有的锁(0x00000007d58f5e60)才能能继续运行,目前处于BLOCKED,而线程DEADLOCK_TEST-2需要获得DEADLOCK_TEST_3持有的锁(0x00000007d58f5e78)才能继续运行,而DEADLOCK_TEST_3需要获得DEADLOCK_TEST_1持有的锁(0x00000007d58f5e48)才能继续运行。可以看出这个锁请求形成了一个闭环。每个线程都处于等待状态。
上图 两个线程之间的资源请求环路,导致死锁
"DEADLOCK_TEST-1" daemon prio=6 tid=0x000000000690f800 nid=0x1820 waiting for monitor entry [0x000000000805f000] java.lang.Thread.State: BLOCKED (on object monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.goMonitorDeadlock(ThreadDeadLockState.java:197) - waiting to lock <0x00000007d58f5e60> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.monitorOurLock(ThreadDeadLockState.java:182) - locked <0x00000007d58f5e48> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.run(ThreadDeadLockState.java:135) Locked ownable synchronizers: - None "DEADLOCK_TEST-2" daemon prio=6 tid=0x0000000006858800 nid=0x17b8 waiting for monitor entry [0x000000000815f000] java.lang.Thread.State: BLOCKED (on object monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.goMonitorDeadlock(ThreadDeadLockState.java:197) - waiting to lock <0x00000007d58f5e78> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.monitorOurLock(ThreadDeadLockState.java:182) - locked <0x00000007d58f5e60> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.run(ThreadDeadLockState.java:135) Locked ownable synchronizers: - None "DEADLOCK_TEST-3" daemon prio=6 tid=0x0000000006859000 nid=0x25dc waiting for monitor entry [0x000000000825f000] java.lang.Thread.State: BLOCKED (on object monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.goMonitorDeadlock(ThreadDeadLockState.java:197) - waiting to lock <0x00000007d58f5e48> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.monitorOurLock(ThreadDeadLockState.java:182) - locked <0x00000007d58f5e78> (a com.nbp.theplatform.threaddump.ThreadDeadLockState$Monitor) at com.nbp.theplatform.threaddump.ThreadDeadLockState$DeadlockThread.run(ThreadDeadLockState.java:135) Locked ownable synchronizers: - None |
接下来我们可以通过线程DUMP分析一些问题
1.CPU使用率异常高
linux:执行 top 找出最耗CPU的进程
找出进程1218使用cpu最多的线程
top -Hp 1218 -d 1 -n 1
例如是:16682 把它转出16进制:使用python
2014-07-24 16:22:25 Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.65-b04-462 mixed mode): "Attach Listener" daemon prio=9 tid=7fafd6800800 nid=0x110afa000 waiting on condition [00000000] java.lang.Thread.State: RUNNABLE "DestroyJavaVM" prio=5 tid=7fafd5149000 nid=0x1090ff000 waiting on condition [00000000] java.lang.Thread.State: RUNNABLE "CPU 100%" prio=5 tid=7fafd4836800 nid=0x111a84000 runnable [111a83000] java.lang.Thread.State: RUNNABLE at form.FormTest$1.run(FormTest.java:37) at java.lang.Thread.run(Thread.java:695) "Monitor Ctrl-Break" daemon prio=5 tid=7fafd4879800 nid=0x11196e000 runnable [11196d000] java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:439) - locked <7f3596450> (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:468) |
测试代码:
从上面可以看到导致CPU100%的代码位于FromTest的37行,这样就很快定位问题,下面是就是这个很简单的死循环导致CPU100%
2.程序处理性能急剧下降
首先获取几份Thread dump ,找到处于BLOCKED状态的线程。
" DB-Processor-13" daemon prio=5 tid=0x003edf98 nid=0xca waiting for monitor entry [0x000000000825f000] java.lang.Thread.State: BLOCKED (on object monitor) at beans.ConnectionPool.getConnection(ConnectionPool.java:102) - waiting to lock <0xe0375410> (a beans.ConnectionPool) at beans.cus.ServiceCnt.getTodayCount(ServiceCnt.java:111) at beans.cus.ServiceCnt.insertCount(ServiceCnt.java:43) "DB-Processor-14" daemon prio=5 tid=0x003edf98 nid=0xca waiting for monitor entry [0x000000000825f020] java.lang.Thread.State: BLOCKED (on object monitor) at beans.ConnectionPool.getConnection(ConnectionPool.java:102) - waiting to lock <0xe0375410> (a beans.ConnectionPool) at beans.cus.ServiceCnt.getTodayCount(ServiceCnt.java:111) at beans.cus.ServiceCnt.insertCount(ServiceCnt.java:43) " DB-Processor-3" daemon prio=5 tid=0x00928248 nid=0x8b waiting for monitor entry [0x000000000825d080] java.lang.Thread.State: RUNNABLE at oracle.jdbc.driver.OracleConnection.isClosed(OracleConnection.java:570) - waiting to lock <0xe03ba2e0> (a oracle.jdbc.driver.OracleConnection) at beans.ConnectionPool.getConnection(ConnectionPool.java:112) - locked <0xe0386580> (a java.util.Vector) - locked <0xe0375410> (a beans.ConnectionPool) at beans.cus.Cue_1700c.GetNationList(Cue_1700c.java:66) at org.apache.jsp.cue_1700c_jsp._jspService(cue_1700c_jsp.java:120) |
多获取几份处于BLOCKED状态的线程信息。对比看出是否线程一致处于BLOCKED状态,从而定位问题。
一个字一个字打起来真费劲额( ⊙ o ⊙ )!
相关推荐
java thread dump 分析
java线程分析工具
java 线程Dump 分析工具: Java的TDA线程转储分析器是一个用于分析Sun Java VM生成的线程转储和堆信息的小型Swing GUI(目前用1.4测试)。它从提供的日志文件中解析线程转储和类直方图。它提供关于发现的线程转储的...
好用的线程dump分析工具
Java Thread Dump Analyzing
dumpAnalyzer图形化进行内存分析,可以很方便的知道系统里面什么地方存在性能瓶颈
thread and mointor dump analyzer,java线程文件分析工具
IBM Thread and Monitor Dump Analyzer for Java(简称 jca)。它可以识别Java线程中的挂起,死锁,资源竞争,和瓶颈。 使用方法: java -Xmx1000m -jar jca456.jar
Java线程转储分析: 《Analyzing Java Thread Dumps.pdf》 《Java Thread Dumps 分析.pdf》
包括 堆内存dump分析工具和thread dump(java core)的分析工具 还包括两篇关于dump分析的文档,分析java系统内存泄露死循环等非常需要啊有木有