Performance DOJO Part I

by Ljubisa Punosevac on 14 January 2015


Performance analysis and tuning DOJO Part I
Agenda
Performance basics

System performance monitoring basics

Examples
Performance basics
Tune important things
Make your decisions based on measurements
Be wise about the tools you use
Performance basics - Measures
Response time
Throughput
Load
Utilization
System subsystems/metrics

CPU
context switch
CPU utilization
Memory
capacity
bandwidth
Disk I/O
read/write count
Network
latency and bandwidth
packet width and utilization
Basic general rules
Automate everything (tests, monitoring, reporting)
Measure everything (system metrics, application logs, garbage collector logs, periodic thread dumps, etc.)
Test on a system identical to production

Questions to ask of the software we are testing:
do response times and throughput conform to specified requirements?
does performance remain stable over time?
does it remain stable and service the load?
does it fail gracefully?
does it recover gracefully once load returns to normal?


Test types:

Load test

Stress test

Spike test

Endurance

Tests
Basic OS concepts/terms
Context switches
Virtual memory
Scheduler
locks, semaphores, monitors, mutexes
Paging

Swapping
TLB - Translation lookaside buffer
MMU - memory management unit
Hard to calculate the exact cost of one context switch
Use 80,000 CPU cycles as an estimate
Example:
CPU core has a 3 GHz clock
Measurement gives 7,000 context switches per second
3,000,000,000 CPU cycles per second
7,000 x 80,000 = 560,000,000 cycles are spent on context switching
560,000,000 / 3,000,000,000 = 18.7% of CPU time is spent just on context switching
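The arithmetic above can be sketched as a small helper. This is only an illustration: the function name is made up here, and the 80,000 cycles per switch is the rule-of-thumb estimate from the slide, not a measured value.

```python
def context_switch_overhead(switches_per_sec: int,
                            cycles_per_switch: int,
                            clock_hz: int) -> float:
    """Return the fraction of one core's cycles spent on context switching."""
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    return switches_per_sec * cycles_per_switch / clock_hz


if __name__ == "__main__":
    # Numbers from the slide: 7,000 switches/s, 80,000 cycles each, 3 GHz core.
    overhead = context_switch_overhead(7_000, 80_000, 3_000_000_000)
    print(f"{overhead:.1%}")  # prints 18.7%
```

The switch rate would typically come from the cs column of vmstat.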
Reasons:
Lock contention


Scheduler overview
Scheduler functions
Literature and resources for further learning
Java Performance
- Charlie Hunt - http://www.amazon.de/Java-Performance-Addison-Wesley-Charlie-Hunt/dp/0137142528/ref=sr_1_2?ie=UTF8&qid=1417010463&sr=8-2&keywords=java+performance
System Performance
- Brendan Gregg - http://www.amazon.de/Systems-Performance-Enterprise-Brendan-Gregg-ebook/dp/B00FLYU9T2/ref=sr_1_1?ie=UTF8&qid=1417010580&sr=8-1&keywords=system+performance
The Garbage Collection Handbook: The Art of Automatic Memory Management
- more authors - http://www.amazon.de/Garbage-Collection-Handbook-Automatic-Management/dp/1420082795/ref=la_B000AQTHV2_1_1?s=books&ie=UTF8&qid=1417011951&sr=1-1
DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD
- more authors - http://www.amazon.de/DTrace-Dynamic-Tracing-Solaris-FreeBSD/dp/0132091518/ref=sr_1_1?s=books-intl-de&ie=UTF8&qid=1417364650&sr=1-1&keywords=dtrace
http://www.kodewerk.com/
http://www.oracle.com/technetwork/java/whitepaper-135217.html
http://www.oracle.com/technetwork/java/javase/tech/memorymanagement-whitepaper-1-150020.pdf
http://technet.microsoft.com/en-us/library/cc938613.aspx
http://en.wikipedia.org/wiki/Producer%E2%80%93consumer_problem
http://www.writeulearn.com/binary-semaphore-mutex-semaphore/
http://www.spec.org/
https://jcp.org/en/jsr/detail?id=133
Thank you for attending!
2. Semaphores - signaling mechanism between processes

3. Monitors - a programming language construct that supports controlled access to shared data
A monitor encapsulates:
shared data structures
procedures that operate on the shared data
synchronization between concurrent threads that invoke those procedures

4. Mutex - MUTual EXclusion - the requirement that no two concurrent processes are in their critical section at the same time; it is a basic requirement in concurrency control, needed to prevent race conditions
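A minimal sketch of the monitor idea, written in Python rather than Java for brevity: one class holds the shared data, the procedures that operate on it, and the synchronization between the threads that call them. The class name BoundedBuffer and the demo function are hypothetical and not part of the presentation. The single lock provides the mutual exclusion, and the two condition variables do the signaling.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Monitor-style buffer: shared data plus synchronized procedures."""

    def __init__(self, capacity: int) -> None:
        self._items = deque()                      # shared data structure
        self._capacity = capacity
        self._lock = threading.Lock()              # mutual exclusion
        self._not_full = threading.Condition(self._lock)
        self._not_empty = threading.Condition(self._lock)

    def put(self, item) -> None:
        with self._not_full:                       # enter critical section
            while len(self._items) >= self._capacity:
                self._not_full.wait()              # releases the lock while waiting
            self._items.append(item)
            self._not_empty.notify()               # wake one waiting consumer

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()                # wake one waiting producer
            return item


def demo() -> list:
    """Move five items from a producer thread to the calling thread."""
    buffer = BoundedBuffer(capacity=2)
    producer = threading.Thread(
        target=lambda: [buffer.put(i) for i in range(5)])
    producer.start()
    received = [buffer.get() for _ in range(5)]
    producer.join()
    return received


if __name__ == "__main__":
    print(demo())
```

Both condition variables share one lock, so at most one thread is ever inside put or get at a time.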
Tasks

Time sharing

Preemption

Load balancing
Dominant consumer
System or User?

User: JVM or Application?

None: What now?
CPU monitoring - vmstat
Output is divided into 6 sections (source: man vmstat):
Procs
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep.
Memory
swpd: the amount of virtual memory used.
free: the amount of idle memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache.
inact: the amount of inactive memory. (-a option)
active: the amount of active memory. (-a option)
Swap
si: Amount of memory swapped in from disk (/s).
so: Amount of memory swapped to disk (/s).
IO
bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).
System
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second.
CPU
These are percentages of total CPU time.
us: Time spent running non-kernel code. (user time, including nice time)
sy: Time spent running kernel code. (system time)
id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.
st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.
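To work with these columns in a script, the text output can be turned into one dictionary per sample. The sketch below assumes the default Linux vmstat layout, where the column header line starts with "r". The sample text is made up for illustration and is not a real measurement.

```python
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 801234  51234 912345    0    0     5    12  150  300 10  3 86  1  0
 5  1      0 799000  51234 912400    0    0     0    48  420 7000 55 20 24  1  0
"""


def parse_vmstat(text: str) -> list:
    """Return one {column: value} dict per data row of vmstat output."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            columns = fields                       # the header naming each column
        elif (columns and len(fields) == len(columns)
              and all(f.isdigit() for f in fields)):
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


if __name__ == "__main__":
    for row in parse_vmstat(SAMPLE_VMSTAT):
        print(f"r={row['r']} cs={row['cs']} us={row['us']}% sy={row['sy']}%")
```

The section banner line ("procs ... cpu") is skipped because it is not numeric.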
vmstat example
vmstat example 2
vmstat example 4
http://en.wikipedia.org/wiki/Lock_(computer_science)
http://www.thomas-krenn.com/en/wiki/Linux_Performance_Measurements_using_vmstat
http://www.oracle.com/technetwork/java/6-performance-137236.html
http://www.ufsdump.org/papers/uuasc-june-2006.pdf
http://www.ibm.com/developerworks/java/library/j-jvmc3/index.html
http://www.oracle.com/technetwork/java/javase/tech/largememory-jsp-137182.html
http://www.techpaste.com/2012/02/19/monitoring-jvm-lock-contention-hot-locks-involuntary-context-switches-thread-migrations-unixwindowslinux/
http://www.csn.ul.ie/~mel/projects/vm/guide/pdf/understand.pdf
https://perf.wiki.kernel.org/index.php/Tutorial
http://www.oracle.com/technetwork/java/biasedlocking-oopsla2006-wp-149958.pdf
http://shallahamer-orapub.blogspot.de/2010/07/os-cpu-run-queue-not-what-it-appears.html
http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages
http://www.linuxjournal.com/article/9001
http://blog.scoutapp.com/articles/2011/02/10/understanding-disk-i-o-when-should-you-be-worried
http://vmtoday.com/2009/12/storage-basics-part-i-intro/
http://www.thegeekstuff.com/2014/11/pidstat-examples/
http://idak.gop.edu.tr/esmeray/UnderStandingKernel.pdf
https://www.cs.rutgers.edu/~pxk/416/notes/07-scheduling.html
http://www.ibm.com/developerworks/linux/library/l-completely-fair-scheduler/
http://www.ibm.com/developerworks/java/library/j-5things8/index.html
http://linux.die.net/man/8/lsof
http://guichaz.free.fr/iotop/
Literature and resources for further learning
1. Locks - synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution
lock overhead - The extra resources for using locks, like the memory space allocated for locks, the CPU time to initialize and destroy locks, and the time for acquiring or releasing locks
lock contention - This occurs whenever one process or thread attempts to acquire a lock held by another process or thread
deadlock
vmstat example 3
vmstat example 5
CPU monitoring - mpstat
number of processors
usr, sys, idle
Process monitoring - pidstat
CPU monitoring - scheduler run queue
vmstat

r and b columns

r - the number of threads in the run queue plus the number of threads currently executing. Threads in the run queue are runnable, but no CPU is available to execute them.

b - number of processes blocked and waiting on IO requests to finish.

number of CPUs/cores

acceptable values
vmstat example 2 - observations
vmstat example 3 - observations
vmstat example 5 - observations
No more vmstat!!!
Gives statistics about process(es) with some of the following:

statistics for a process at a fixed interval (every 5 seconds): pidstat -p PID 5
statistics for ALL running processes: pidstat -p ALL
statistics based on process name: pidstat -C "java"
I/O statistics for specific process(es): pidstat -p PID -d 1
paging activity statistics for specific process(es): pidstat -p PID -r
can show command name and its arguments: pidstat -C java -l
statistics of selected process(es) and their children: pidstat -T CHILD
statistics per thread, in a tree format: pidstat -t -C "mysql"
all statistics (memory, CPU, I/O) on a single line per process: pidstat -rud -h -C "java"
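When the measurements should be automated, the invocations above can be assembled in a script. This is a rough sketch: the helper names are hypothetical, and only the -u, -r, -d and -p options plus the trailing interval and count arguments are used. run_pidstat assumes the sysstat package is installed.

```python
import shutil
import subprocess


def pidstat_command(pid, interval: int, count: int,
                    cpu: bool = True, memory: bool = False,
                    io: bool = False) -> list:
    """Build the argv for: pidstat [-u] [-r] [-d] -p PID interval count."""
    flags = []
    if cpu:
        flags.append("-u")      # CPU utilization
    if memory:
        flags.append("-r")      # page faults and memory use
    if io:
        flags.append("-d")      # disk I/O
    return ["pidstat", *flags, "-p", str(pid), str(interval), str(count)]


def run_pidstat(pid, interval: int = 5, count: int = 3, **report_flags) -> str:
    """Run pidstat and return its text output (requires sysstat)."""
    if shutil.which("pidstat") is None:
        raise RuntimeError("pidstat not found; is the sysstat package installed?")
    command = pidstat_command(pid, interval, count, **report_flags)
    return subprocess.run(command, capture_output=True,
                          text=True, check=True).stdout
```

For example, run_pidstat(1234, interval=1, count=10, io=True) would collect ten one-second samples of CPU and disk activity for the (made-up) process 1234.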
pidstat - showcase
Recipe: How everything works?
Ingredients (minimal requirements):
Inexperienced developer (1pcs)
Nasty Java application with performance problems (1pcs)
Spices: Deadline (1pcs)
One possible resolution scenario (there are many "IF" inside):
1. Check CPU utilization (dominant consumer)
2. Find the process which utilizes the CPU most
3. If the process is the Java application we are monitoring, find out in which area the problem might be (CPU, memory, I/O, network)
4. Check which thread is doing "problematic" work
5. Take thread dump
6. Examine thread dump stacktrace and fix it!
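Steps 4 to 6 usually hinge on matching the busy thread's id from the OS tools (a decimal number) with an entry in the thread dump. The sketch below assumes the dump prints the native thread id as a hexadecimal nid=0x... field, as HotSpot JDK 7/8 dumps do. The sample dump, the thread names and the id 6699 are made up for illustration.

```python
SAMPLE_DUMP = '''\
"http-worker-7" #42 daemon prio=5 os_prio=0 tid=0x00007f3c2c0d1000 nid=0x1a2b runnable [0x00007f3c11ffe000]
   java.lang.Thread.State: RUNNABLE
        at com.example.Report.render(Report.java:118)

"GC task thread#0" os_prio=0 tid=0x00007f3c2c01f800 nid=0x1a01 runnable
'''


def find_thread(dump: str, native_tid: int) -> list:
    """Return the dump lines (header and stack) for the given OS thread id."""
    wanted = f"nid={native_tid:#x}"        # e.g. 6699 -> "nid=0x1a2b"
    block = []
    for line in dump.splitlines():
        if line.startswith('"'):           # a thread header line
            if block:
                break                      # reached the next thread
            if wanted in line.split():
                block.append(line)
        elif block:
            if not line.strip():
                break                      # a blank line ends the stack trace
            block.append(line)
    return block


if __name__ == "__main__":
    # 6699 stands for a thread id reported by pidstat -t or top -H.
    print("\n".join(find_thread(SAMPLE_DUMP, 6699)))
```

The stack trace returned for the hot thread is the place to start looking for the fix.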
Network activity monitoring
I/O activity monitoring
System metrics monitoring - summary
CPU utilization - user CPU, system CPU and idle time

Context switching

CPU scheduling

Virtual memory consumption

Disk I/O

Network I/O
Nethogs

Nicstat
Tools summary
https://wiki.hybris.com/display/INFRA/Tools
Monitor I/O utilization, r/w count

Tools to use:

iostat

iotop
Find open file
Find which files the monitored process has open on disk

lsof command with the -p switch
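On Linux the same information is also exposed through procfs, where /proc/PID/fd holds one symbolic link per open descriptor. A minimal sketch, assuming a Linux system with /proc mounted (the function name is hypothetical):

```python
import os


def open_files(pid="self") -> dict:
    """Map each open file descriptor of a process to the path it points at."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue   # descriptor was closed between listing and reading
    return targets


if __name__ == "__main__":
    for fd, target in sorted(open_files().items()):
        print(fd, target)
```

Reading another user's process this way needs the same privileges that lsof would need.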
Literature and resources for further learning
http://blog.scoutapp.com/articles/2011/02/10/understanding-disk-i-o-when-should-you-be-worried
http://www.cyberciti.biz/faq/howto-linux-get-list-of-open-files/
http://www.cyberciti.biz/tips/linux-procfs-file-descriptors.html
http://en.wikipedia.org/wiki/Load_%28computing%29
http://www.howtogeek.com/194642/understanding-the-load-average-on-linux-and-other-unix-like-systems/
http://www.thegeekstuff.com/linux-101-hacks-ebook/
http://www.thegeekstuff.com/2010/08/tcpdump-command-examples/
http://docs.oracle.com/cd/E23824_01/html/821-1453/ipconfig-142.html
http://www.techrepublic.com/article/configure-it-quick-use-netstat-to-monitor-server-connections-in-linux/
http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_71/rzab6/howdosockets.htm
http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
CPU load
How to see average system load:

uptime

top
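The load averages that uptime and top print can also be read programmatically. Dividing by the number of cores makes the value comparable between machines. A small sketch for Unix-like systems (the helper name is hypothetical):

```python
import os


def load_per_core(load_average: float, cores: int) -> float:
    """Normalize a load average by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


if __name__ == "__main__":
    one, five, fifteen = os.getloadavg()   # same figures uptime shows
    cores = os.cpu_count() or 1
    print(f"1 min load {one:.2f} on {cores} cores "
          f"= {load_per_core(one, cores):.2f} per core")
```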
Network activity monitoring
Netstat
Used for:
monitoring incoming/outgoing connections
interface statistics
routing tables

Basic usage examples:
list network interfaces: netstat -i (or netstat -ie)
listing all the ports of TCP and UDP connections: netstat -a
listing all LISTENING connections: netstat -l
displaying service name with PID: netstat -tp
showing statistics by protocol (TCP in this case): netstat -st
print statistics continuously: netstat -c

Statuses of the connections:
LISTEN - The socket is listening for incoming connections. Those sockets are only displayed if the -a or -l switch is set.
ESTABLISHED - The socket has an established connection.
SYN_SENT - The socket is actively attempting to establish a connection.
SYN_RECV - A connection request has been received from the network.
TIME_WAIT - The socket is waiting after close to handle packets still in the network.
FIN_WAIT1 - The socket is closed, and the connection is shutting down.
FIN_WAIT2 - The connection is closed and the socket is waiting for a shutdown from the remote end.
CLOSE_WAIT - The remote end has shut down, and it is waiting for the socket to close.
CLOSED - The socket is not being used.
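A quick way to spot trouble, such as a pile of TIME_WAIT or CLOSE_WAIT sockets, is to count connections per state. The sketch below parses text in the layout that netstat -tan prints on Linux. The sample lines and addresses are made up for illustration.

```python
from collections import Counter

SAMPLE_NETSTAT = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:8080           10.0.0.9:51514          ESTABLISHED
tcp        0      0 10.0.0.5:8080           10.0.0.9:51520          TIME_WAIT
tcp        0      0 10.0.0.5:8080           10.0.0.7:40112          ESTABLISHED
"""


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state (the sixth column of each tcp line)."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE_NETSTAT).most_common():
        print(f"{state:12} {count}")
```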
How to determine dominant consumer
Processor architecture
Throughput focuses on maximizing the amount of work done by an application in a specific period of time. Examples of how throughput might be measured include:
The number of transactions completed in a given time.
The number of jobs that a batch program can complete in an hour.
The number of database queries that can be completed in an hour.
High pause times are acceptable for applications that focus on throughput. Since high throughput applications focus on benchmarks over longer periods of time, quick response time is not a consideration.
Responsiveness refers to how quickly an application or system responds with a requested piece of data. Examples include:
How quickly a desktop UI responds to an event
How fast a website returns a page
How fast a database query is returned
For applications that focus on responsiveness, large pause times are not acceptable. The focus is on responding in short periods of time.
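The difference between the two goals shows up when both are computed from the same measurements. In this sketch the response times are made-up sample data, and the percentile uses the simple nearest-rank method. One long pause barely changes the throughput figure but dominates the high percentile.

```python
import math


def throughput(completed: int, window_seconds: float) -> float:
    """Completed operations per second over a measurement window."""
    return completed / window_seconds


def percentile(samples, pct: float):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical response times in milliseconds; one request hit a long pause.
    response_times_ms = [12, 15, 11, 14, 250, 13, 12, 16, 14, 13]
    print(throughput(len(response_times_ms), 2.0))   # 5.0 requests per second
    print(percentile(response_times_ms, 50))         # 13
    print(percentile(response_times_ms, 95))         # 250
```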