Performance DOJO Part II

by

Ljubisa Punosevac

on 21 January 2015


Transcript of Performance DOJO Part II

Performance
tuning
analysis and
DOJO Part II
Agenda
JVM architecture

JVM performance basics

Parallel Garbage Collector

Thread dumps

Examples
HotSpot JVM architecture
Key components
Garbage collector basics
The Generational Garbage Collection Process
Additional aging
Object allocation
Minor garbage collection
Object aging
General sizing rules
Examples
Sizing
Basic OS concepts/terms
Context switches
Virtual memory
Schedulers
Paging

Swapping
TLB - Translation lookaside buffer
MMU - memory management unit
Hard to calculate the exact cost of one context switch
Use 80,000 CPU cycles as an estimate
Example:
CPU core has a 3 GHz clock
Measurement gives 7,000 context switches per second
3,000,000,000 CPU cycles per second
7,000 x 80,000 = 560,000,000 cycles are spent on context switching
560,000,000 / 3,000,000,000 ≈ 18.7% of CPU time is spent just on context switching
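The estimate above can be reproduced with a few lines of Java. This is a minimal sketch; the class and method names are made up for illustration, and the 80,000 cycles per switch is the slide's rule-of-thumb figure, not a measured value.

```java
// Rough estimate of the CPU time one core loses to context switching.
final class ContextSwitchCost {

    // Returns the fraction (0.0 to 1.0) of one core's cycles spent on context switches.
    static double fractionOfCpu(long switchesPerSecond, long cyclesPerSwitch, long clockHz) {
        return (double) (switchesPerSecond * cyclesPerSwitch) / clockHz;
    }

    public static void main(String[] args) {
        // Numbers from the example: 7,000 switches/s, 80,000 cycles each, 3 GHz core.
        double fraction = fractionOfCpu(7_000L, 80_000L, 3_000_000_000L);
        System.out.printf("%.1f%% of CPU time spent on context switching%n", fraction * 100);
    }
}
```

With the numbers from the example the result is roughly 18.7%.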
Reasons:
Lock contention


Parallel Garbage Collector
Minor Garbage Collection
Literature and resources for further learning
Java Performance
- Charlie Hunt - http://www.amazon.de/Java-Performance-Addison-Wesley-Charlie-Hunt/dp/0137142528/ref=sr_1_2?ie=UTF8&qid=1417010463&sr=8-2&keywords=java+performance
The Garbage Collection Handbook: The Art of Automatic Memory Management
- more authors - http://www.amazon.de/Garbage-Collection-Handbook-Automatic-Management/dp/1420082795/ref=la_B000AQTHV2_1_1?s=books&ie=UTF8&qid=1417011951&sr=1-1
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
http://www.oracle.com/technetwork/java/gc-tuning-5-138395.html
http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html
http://docs.oracle.com/javase/1.5.0/docs/guide/vm/gc-ergonomics.html
http://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/index.html
http://www.oracle.com/technetwork/java/javase/tech/memorymanagement-whitepaper-1-150020.pdf
http://www.oracle.com/technetwork/java/javase/tooldescr-136044.html#gbmpn
http://java-is-the-new-c.blogspot.de/2013/07/tuning-and-benchmarking-java-7s-garbage.html
http://docs.oracle.com/javase/7/docs/api/java/lang/Thread.State.html
http://blog.ragozin.info/2011/07/hotspot-jvm-garbage-collection-options.html

Thank you for attending!
Tasks

Time sharing

Preemption

Load balancing
Dominant consumer
System or User?
User: JVM or Application?
None: What now?
Thread dumps
vmstat example
vmstat example 2
Thread states
http://www.oracle.com/technetwork/java/biasedlocking-oopsla2006-wp-149958.pdf
http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.2.1
http://www.captaindebug.com/2012/10/investigating-deadlocks-part-1.html
http://docs.oracle.com/javase/6/docs/technotes/tools/share/jstat.html
http://docs.oracle.com/javase/7/docs/technotes/guides/vm/gc-ergonomics.html
http://docs.oracle.com/javase/1.5.0/docs/guide/vm/server-class.html
http://www.oracle.com/technetwork/java/tuning-139912.html






vmstat example 3
vmstat example 5
Thread states WAITING and TIMED_WAITING: what do they mean?
"derby.rawStoreDaemon" daemon prio=10 tid=0x0000000001520800 nid=0x7f04 in Object.wait() [0x00007fbaa58b4000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000073272ee70> (a org.apache.derby.impl.services.daemon.BasicDaemon)
at org.apache.derby.impl.services.daemon.BasicDaemon.rest(Unknown Source)
- locked <0x000000073272ee70> (a org.apache.derby.impl.services.daemon.BasicDaemon)
at org.apache.derby.impl.services.daemon.BasicDaemon.run(Unknown Source)
at java.lang.Thread.run(Thread.java:745)


"Timer-0" daemon prio=10 tid=0x00007fbaa02e1800 nid=0x7ed2 in Object.wait() [0x00007fbaa5db9000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x0000000726765c98> (a java.util.TaskQueue)
at java.lang.Object.wait(Object.java:503)
at java.util.TimerThread.mainLoop(Timer.java:526)
- locked <0x0000000726765c98> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)

WAITING and TIMED_WAITING thread statuses
CPU monitoring - scheduler run queue
vmstat
r and b columns

r - number of threads in the run queue plus the number of threads currently executing. Threads in the run queue are runnable, but no CPU is available to execute them.

b - number of processes blocked and waiting on IO requests to finish.
vmstat example 2 - observations
vmstat example 3 - observations
vmstat example 5 - observations
No more vmstat!!!
WAITING:

Object.wait with no timeout

Thread.join with no timeout

LockSupport.park



TIMED_WAITING:

Thread.sleep

Object.wait with timeout

Thread.join with timeout

LockSupport.parkNanos

LockSupport.parkUntil
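A minimal sketch that puts one thread into each state, using Object.wait with no timeout and Thread.sleep from the lists above. The class, method and thread names are invented for illustration; the polling helper is only there because a thread needs a moment to reach its blocking call.

```java
// Demonstrates WAITING (Object.wait without timeout) and TIMED_WAITING (Thread.sleep).
final class WaitingStatesDemo {

    // Starts a daemon thread that blocks in lock.wait() with no timeout.
    static Thread startWaiter(Object lock) {
        Thread t = new Thread(() -> {
            synchronized (lock) {
                try {
                    lock.wait();              // no timeout: state becomes WAITING
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "waiter");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Starts a daemon thread that sleeps for a minute.
    static Thread startSleeper() {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(60_000);         // bounded wait: state becomes TIMED_WAITING
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "sleeper");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Polls until the thread reports the expected state or the timeout elapses,
    // then returns whatever state the thread is in.
    static Thread.State awaitState(Thread t, Thread.State expected, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (t.getState() != expected && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return t.getState();
    }

    public static void main(String[] args) {
        Thread waiter = startWaiter(new Object());
        Thread sleeper = startSleeper();
        System.out.println("waiter:  " + awaitState(waiter, Thread.State.WAITING, 2_000));
        System.out.println("sleeper: " + awaitState(sleeper, Thread.State.TIMED_WAITING, 2_000));
        waiter.interrupt();                   // ends the indefinite wait
        sleeper.interrupt();                  // ends the sleep early
    }
}
```

The waiter leaves WAITING only when another thread notifies or interrupts it; the sleeper also leaves TIMED_WAITING on its own once the timeout expires.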
JVM tuning, step by step
Determine Memory Footprint - before we start
Parallel Garbage Collector Ergonomics
Deadlocks
Guidelines for calculating Heap Size
Automatic garbage collection is the process of looking at heap memory, identifying which objects are in use and which are not, and deleting the unused objects. An in use object, or a referenced object, means that some part of your program still maintains a pointer to that object. An unused object, or unreferenced object, is no longer referenced by any part of your program. So the memory used by an unreferenced object can be reclaimed.
Marking phase
Process where the garbage collector identifies which pieces of memory are in use and which are not.
Normal deletion phase*
Process where garbage collector removes unreferenced objects leaving referenced objects and pointers to free space.
Deletion with compacting phase**
To further improve performance, in addition to deleting unreferenced objects, you can also compact the remaining referenced objects. Moving referenced objects together makes new memory allocation much easier and faster.
*, ** Only one of the two deletion phases is performed, depending on the garbage collector type
Hotspot Heap Structure
Young generation

Old (Tenured) generation

Permanent generation
New objects are allocated to the eden space.

Both survivor spaces start out empty.
Eden space fills up, a minor garbage collection is triggered.
Referenced objects are moved to the first survivor space.

Unreferenced objects are deleted when the eden space is cleared.
At the next minor GC, the same thing happens for the eden space. Unreferenced objects are deleted and referenced objects are moved to a survivor space. However, in this case, they are moved to the second survivor space (S1). In addition, objects from the last minor GC on the first survivor space (S0) have their age incremented and get moved to S1. Once all surviving objects have been moved to S1, both S0 and eden are cleared. Notice we now have differently aged objects in the survivor space.
At the next minor GC, the same process repeats. However this time the survivor spaces switch. Referenced objects are moved to S0. Surviving objects are aged. Eden and S1 are cleared.
Promotion
After a minor GC, when aged objects reach a certain age threshold (8 in this example) they are promoted from young generation to old generation.
As minor GCs continue to occur objects will continue to be promoted to the old generation space.
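The aging and promotion steps can be modelled with a toy simulation. This is a simplified sketch, not how HotSpot is implemented: it assumes every object stays referenced, ignores survivor space capacity, and tracks only object ages. The class and method names are invented; the threshold of 8 is the one used in the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of object aging in the young generation.
final class AgingSketch {

    static final int TENURING_THRESHOLD = 8;   // age threshold from the example

    // Simulates one minor GC over the ages of live young objects
    // (age 0 = freshly allocated in eden). Survivors get their age incremented;
    // objects reaching the threshold are promoted and removed from the list.
    // Returns the number of promoted objects.
    static int minorGc(List<Integer> youngAges) {
        List<Integer> stillYoung = new ArrayList<>();
        int promoted = 0;
        for (int age : youngAges) {
            int newAge = age + 1;
            if (newAge >= TENURING_THRESHOLD) {
                promoted++;                    // moves to the old generation
            } else {
                stillYoung.add(newAge);        // copied to the other survivor space
            }
        }
        youngAges.clear();
        youngAges.addAll(stillYoung);
        return promoted;
    }

    public static void main(String[] args) {
        List<Integer> young = new ArrayList<>();
        young.add(0);                          // one new object in eden
        int collections = 0;
        int promoted = 0;
        while (promoted == 0) {
            promoted = minorGc(young);
            collections++;
        }
        System.out.println("Promoted after " + collections + " minor GCs"); // prints 8
    }
}
```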
Garbage collection process summary
Garbage collector role
Explicit vs. automatic memory management


Responsibilities:
allocating memory
ensuring that any referenced objects remain in memory, and
recovering memory used by objects that are no longer reachable from references in executing code.


Desirable characteristics:
efficient, without long pauses
limit fragmentation
scalable


Design choices:
Serial versus Parallel
Concurrent versus Stop-the-world
Compacting versus Non-compacting versus Copying
JVM Performance metrics - what to monitor
-XX:+UseParallelGC, default in Java 6 - Java 7u3
-XX:+UseParallelOldGC, default in Java 7u4 onwards
Timestamp
Type of GC
Type of Collector for minor GC
Occupancy of the young generation space prior to the garbage collection
Occupancy of the young generation space after garbage collection, occupancy of the to-survivor space*
Size of the young generation
Occupancy of the young generation and old generation space before the garbage collection occurred
Occupancy of the young generation and old generation space after the garbage collection occurred
Sum young and old generation space
Duration of the GC
User, system and real time**
* Why is that the size of the to-survivor space?
** level of parallelism
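One common way to read the level of parallelism from the user, system and real times is the ratio (user + sys) / real, which approximates how many CPUs were busy during the pause. A small sketch; the class name and the sample timings in main are hypothetical and not taken from a real log.

```java
// Estimates GC parallelism from the "Times: user=... sys=... real=..." values.
final class GcParallelism {

    // (user + sys) / real: average number of CPUs kept busy during the pause.
    static double parallelism(double userSecs, double sysSecs, double realSecs) {
        return (userSecs + sysSecs) / realSecs;
    }

    public static void main(String[] args) {
        // Hypothetical timings: 0.40 s of CPU time consumed within a 0.05 s pause.
        System.out.printf("about %.1f CPUs busy%n", parallelism(0.40, 0.00, 0.05));
    }
}
```

A ratio close to 1 on a multi-core machine suggests the collection ran effectively single-threaded.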
Full Garbage Collection
Timestamp
Type of GC event
Type of Collector for young generation
Occupancy of the young generation space prior to the garbage collection
Occupancy of the young generation space after garbage collection
Size of the young generation
Occupancy of the old generation space before the garbage collection occurred
Type of Collector for old generation
Occupancy of the old generation space after the garbage collection occurred
Size of the old generation space
Size of the heap before the garbage collection
Size of the heap after the garbage collection*
Size of the heap
Type of Collector for permanent generation
Size of the permanent generation before the garbage collection
Size of the permanent generation after the garbage collection
Size of the permanent generation
Duration of the GC
* Size of which part of the heap is this?
"Finalizer" daemon prio=10 tid=0x000b2800 nid=0x5 in Object.wait() [0xf3f7f000..0xf3f7f9c0]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0xf4000b40> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:116)
- locked <0xf4000b40> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:132)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
Thread name
Indication if the thread is a daemon thread
Thread priority (prio)
Thread ID (tid), which is the address of a thread structure in memory
ID of the native thread (nid)
Thread state, which indicates what the thread was doing at the time of the thread dump
Address range, which gives an estimate of the valid stack region for the thread
NEW The thread has not yet started.

RUNNABLE The thread is executing in the Java virtual machine.

BLOCKED The thread is blocked waiting for a monitor lock.

WAITING The thread is waiting indefinitely for another thread to perform a particular action.

TIMED_WAITING The thread is waiting for another thread to perform an action for up to a specified waiting time.

TERMINATED The thread has exited.
1. What does it mean?
2. When this happens, how and when does the waiting end?
Garbage collector:
-XX:+PrintGCDetails (spaces, their occupancies and sizes, amount of pauses of GC events)
-XX:+PrintGCDateStamps or -XX:+PrintGCTimeStamps
-Xloggc:<file> (log file for GC)

Application execution and stop time
Stop time (because of GC or "safepoint" operation)
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime

JVM heap space sizes tuning
-XX:+PrintTenuringDistribution
-XX:+PrintAdaptiveSizePolicy (G1 and Parallel GC)

Examples will come after GC explanation...

Safepoint operation examples:
garbage collection support
thread stack dumps
thread suspension/stopping
biased locking revocation
Multithreaded programming - theory
Multithreaded programming - practice
JVM deployment model

Single JVM deployment

Multiple JVM deployment

JVM runtime

client or server

32-bit or 64-bit JVM

GC Tuning Fundamentals

Throughput

Latency

Footprint


1. Set proper logging options:
















2. Get some GC logs collected


2.1 Check/calculate amount of objects being promoted from young to old generation



In this example: (3143737K - 370116K) - (4400366K - 1636259K) = 9514K
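The calculation is the space freed in the young generation minus the space freed in the whole heap; the difference did not leave the heap, so it was promoted. A minimal sketch with the numbers from this example (class and method names are invented for illustration):

```java
// Estimates how much data a minor GC promoted from the young to the old generation.
final class PromotionEstimate {

    // promoted = (young before - young after) - (heap before - heap after), all in KB.
    static long promotedK(long youngBeforeK, long youngAfterK, long heapBeforeK, long heapAfterK) {
        long freedInYoung = youngBeforeK - youngAfterK;
        long freedInHeap = heapBeforeK - heapAfterK;
        return freedInYoung - freedInHeap;
    }

    public static void main(String[] args) {
        // Values from the example above.
        long promoted = promotedK(3_143_737L, 370_116L, 4_400_366L, 1_636_259L);
        System.out.println(promoted + "K promoted"); // prints 9514K promoted
    }
}
```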
Garbage collector:
-XX:+PrintGCDetails (spaces, their occupancies and sizes, amount of pauses of GC events)
-XX:+PrintGCDateStamps or -XX:+PrintGCTimeStamps
-Xloggc:<file> (log file for GC)

Application execution and stop time
Stop time (because of GC or "safepoint" operation)
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintSafepointStatistics

JVM heap space sizes tuning
-XX:+PrintTenuringDistribution
-XX:+PrintAdaptiveSizePolicy (G1 and Parallel GC)
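Besides the log flags, cumulative GC counts and times can also be read from inside the JVM through the standard java.lang.management API. A small sketch; the class name is invented, and the collector names printed depend on the JVM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints cumulative collection counts and times for each collector in this JVM.
final class GcCounters {

    // One line per collector: name, number of collections, accumulated time in ms.
    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": count=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summary());
    }
}
```

These counters only give totals; the per-event details listed above still come from the GC log.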





Found one Java-level deadlock:
=============================
"pool-1-thread-5":
waiting to lock monitor 0x00007fd08a0eac08 (object 0x00000007aab38dc0, a eu.javaspecialists.deadlock.lab1.Krasi),
which is held by "pool-1-thread-1"
"pool-1-thread-1":
waiting to lock monitor 0x00007fd08c805758 (object 0x00000007aab38dd0, a eu.javaspecialists.deadlock.lab1.Krasi),
which is held by "pool-1-thread-2"
"pool-1-thread-2":
waiting to lock monitor 0x00007fd08c8056a8 (object 0x00000007aab38de0, a eu.javaspecialists.deadlock.lab1.Krasi),
which is held by "pool-1-thread-3"
"pool-1-thread-3":
waiting to lock monitor 0x00007fd08a0e9608 (object 0x00000007aab38df0, a eu.javaspecialists.deadlock.lab1.Krasi),
which is held by "pool-1-thread-4"
"pool-1-thread-4":
waiting to lock monitor 0x00007fd08a0e9558 (object 0x00000007aab38e00, a eu.javaspecialists.deadlock.lab1.Krasi),
which is held by "pool-1-thread-5"
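The same detection that produces the "Found one Java-level deadlock" section is available programmatically through ThreadMXBean. The sketch below provokes a two-thread monitor deadlock on purpose and then asks the JVM to find it. All class, method and thread names are invented for illustration; the threads are daemons so the JVM can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

// Creates a monitor deadlock and detects it with ThreadMXBean.
final class DeadlockSketch {

    // Starts two daemon threads that acquire the same two locks in opposite order.
    static void createDeadlock() {
        Object a = new Object();
        Object b = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startLocker("locker-1", a, b, bothHoldFirstLock);
        startLocker("locker-2", b, a, bothHoldFirstLock);
    }

    private static void startLocker(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await();             // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {        // blocks forever: the other thread owns this monitor
                    System.out.println("never reached");
                }
            }
        }, name);
        t.setDaemon(true);
        t.start();
    }

    // Polls the JVM for monitor deadlocks; returns the number of deadlocked threads,
    // or 0 if none were found before the timeout.
    static int awaitDeadlockedThreadCount(long timeoutMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads();   // null when there is no deadlock
            if (ids != null) {
                for (ThreadInfo info : mx.getThreadInfo(ids)) {
                    if (info != null) {
                        System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                                + " held by " + info.getLockOwnerName());
                    }
                }
                return ids.length;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        createDeadlock();
        System.out.println("deadlocked threads: " + awaitDeadlockedThreadCount(5_000));
    }
}
```

Acquiring locks in one fixed global order in both threads would remove the cycle.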

Tools to get thread dump:

kill -3 PID
jstack PID
JVisualVM
Dynatrace
...
Tools to examine thread dump:

tda
JVisualVM
...
To avoid an OutOfMemoryError (GC overhead limit exceeded) there are three choices:

increase the size of the heap
tune -XX:GCTimeLimit=time-limit parameter, where time limit is the upper limit on the amount of time spent in garbage collection in percent of total time (default is 98).
tune -XX:GCHeapFreeLimit=space-limit, where space limit is the lower limit on the amount of space freed during a garbage collection in percent of the maximum heap (default is 2).

-XX:+UseAdaptiveSizePolicy is used for:
a desired maximum GC pause goal
a desired application throughput goal
minimum footprint

-XX:MaxGCPauseMillis=nnn
A hint to the virtual machine that pause times of nnn milliseconds or less are desired. The VM will adjust the java heap size and other GC-related parameters in an attempt to keep GC-induced pauses shorter than nnn milliseconds

-XX:GCTimeRatio=nnn
A hint to the virtual machine that it's desirable that not more than 1 / (1 + nnn) of the application execution time be spent in the collector. For example -XX:GCTimeRatio=19 sets a goal of 5% of the total time for GC and throughput goal of 95%. Default is 99.
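The relation between -XX:GCTimeRatio and the GC time goal is simple arithmetic; a short sketch (class and method names are made up):

```java
// Converts a -XX:GCTimeRatio value into the targeted share of time spent in GC.
final class GcTimeRatio {

    // Goal: at most 1 / (1 + ratio) of total time in the collector.
    static double gcTimeGoal(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        System.out.printf("GCTimeRatio=19 -> %.0f%% GC time goal%n", gcTimeGoal(19) * 100);
        System.out.printf("GCTimeRatio=99 -> %.0f%% GC time goal%n", gcTimeGoal(99) * 100);
    }
}
```

So the default of 99 corresponds to a goal of 1% GC time and 99% throughput.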
Determine Memory Footprint
Live data size - measured in application steady state

Young generation space sizing parameters:
-XX:NewSize=<n>[g|m|k]
-XX:MaxNewSize=<n>[g|m|k]
-Xmn<n>[g|m|k]
When -Xmn is defined and -Xmx ≠ -Xms, the young generation size stays fixed while the Java heap grows or shrinks, so -Xmn should be used only when -Xmx = -Xms

Old generation space size:
initial old generation space = -Xms minus -XX:NewSize
maximum old generation space size = -Xmx minus -XX:MaxNewSize
when -Xmx = -Xms, then old generation size is -Xmx (or -Xms) minus -Xmn

Permanent generation size:
-XX:PermSize=<n>[g|m|k]
-XX:MaxPermSize=<n>[g|m|k]
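To check which sizes the flags actually produced, the memory pools can be listed at runtime with the standard java.lang.management API. A minimal sketch; the class name is invented, and the pool names depend on the JVM version and the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Lists the JVM memory pools with their committed and maximum sizes.
final class HeapPools {

    static String describePools() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue;                      // pool is no longer valid
            }
            // getMax() returns -1 when no maximum is defined for the pool.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + "K";
            sb.append(pool.getName())
              .append(": committed=").append(usage.getCommitted() / 1024).append("K")
              .append(", max=").append(max)
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describePools());
    }
}
```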
jmap -histo:live PID
prints a histogram of live objects on the heap and causes a full garbage collection
32-bit or 64-bit JVM