Java Performance – 3 – A Java Performance Toolbox


This article is part 4 of the series Java Performance, which summarizes the book Java Performance by Scott Oaks.

In the previous chapter we discussed performance testing methods. We covered the differences between microbenchmarks, macrobenchmarks, and mesobenchmarks, and we talked about response time, throughput, and variability.

In this chapter we are going to discuss some interesting measurement tools for the CPU, network, and disk. We will look at the differences between the various Java profilers and talk a bit about JFR (Java Flight Recorder).

Great, let’s start the third chapter…

Chapter Title:

A Java Performance Toolbox

Performance analysis is all about visibility—knowing what is going on inside an application and in the application’s environment. Visibility is all about tools. And so performance tuning is all about tools.

1) Operating System Tools and Analysis

The starting point for program analysis is not Java-specific at all: it is the basic set of monitoring tools that come with the operating system.
We are going to take a quick look at the operating system tools for monitoring the usage of the CPU, the disk, and the network.

A- CPU Usage

CPU usage is typically divided into two categories: user time and system time (Windows refers to this as privileged time).

The goal is to drive CPU utilization as high as possible for as short a time as possible.
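The split between user time and system time can also be observed for a single thread from inside a Java program. The following is a minimal sketch, not a profiling tool: it uses the standard ThreadMXBean from java.lang.management, and it assumes that total thread CPU time minus user time is a fair approximation of system time. The class name CpuTimeSketch and the spin workload are hypothetical, and because user time is only tick-accurate on some platforms, the system figure should be treated as rough.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: split the CPU time used by a piece of work (run on the
// current thread) into user time and approximate system time.
class CpuTimeSketch {

    static long[] measure(Runnable work) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("Thread CPU time not supported on this JVM");
        }
        if (!mx.isThreadCpuTimeEnabled()) {
            mx.setThreadCpuTimeEnabled(true);
        }
        long cpuStart = mx.getCurrentThreadCpuTime();   // user + system, in ns
        long userStart = mx.getCurrentThreadUserTime(); // user only, in ns
        work.run();
        long cpu = mx.getCurrentThreadCpuTime() - cpuStart;
        long user = mx.getCurrentThreadUserTime() - userStart;
        // The two clocks can have different resolutions, so clamp at zero.
        long system = Math.max(0, cpu - user);
        return new long[] { user, system };
    }

    static volatile double sink;

    // CPU-bound workload that should show up almost entirely as user time.
    static void spin() {
        double acc = 0;
        for (int i = 0; i < 50_000_000; i++) {
            acc += Math.sqrt(i);
        }
        sink = acc;
    }

    public static void main(String[] args) {
        long[] t = measure(CpuTimeSketch::spin);
        System.out.printf("user=%d ms, system~%d ms%n", t[0] / 1_000_000, t[1] / 1_000_000);
    }
}
```

A pure computation loop like spin() should report mostly user time; work that makes many system calls (file or socket I/O) would shift more of the total into system time.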

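If you run vmstat 1 on your Linux desktop, you will get a series of lines (one every second) that report process, memory, swap, I/O, system, and CPU statistics.
In the cpu section of the output, the us column shows the percentage of time spent in user time, sy shows system time, and id shows idle time.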

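The CPU can be idle for multiple reasons:
• The application might be blocked on a synchronization primitive and unable to execute until that lock is released.
• The application might be waiting for something, such as a response to come back from a call to the database.
• The application might have nothing to do.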

These first two situations are always indicative of something that can be addressed. If contention on the lock can be reduced or the database can be tuned so that it sends the answer back more quickly, then the program will run faster, and the average CPU use of the application will go up (assuming, of course, that there isn’t another such issue that will continue to block the application).

Java and a single CPU:
If the code is a batch-style application, the CPU should not be idle, because there is always work to do (if one job is blocked on I/O or something else, another job can use the CPU).

Java and multiple CPUs:
The general idea is the same as with a single CPU; however, making sure that individual threads are not blocked will drive CPU usage higher.

CPU Run Queue

You can monitor the number of threads that can be run (that is, threads that are not blocked). These threads are said to be in the CPU run queue. In the vmstat output, the length of the run queue is shown in the first column, r, under procs.
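A rough in-process approximation of the run-queue check is the system load average, which on Linux is the number of runnable (and uninterruptibly sleeping) tasks averaged over the last minute. The sketch below assumes the standard OperatingSystemMXBean; getSystemLoadAverage() returns a negative value on platforms where a load average is unavailable. The class name and the 1.0-per-CPU threshold are illustrative assumptions, not a rule from the book.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical check: compare the system load average (an approximation of the
// run-queue length averaged over the last minute) to the number of CPUs.
class RunQueueSketch {

    // Returns the load average per available CPU, or NaN if the platform
    // does not report a load average.
    static double loadPerCpu() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage();
        if (load < 0) {
            return Double.NaN;
        }
        return load / os.getAvailableProcessors();
    }

    public static void main(String[] args) {
        double perCpu = loadPerCpu();
        if (Double.isNaN(perCpu)) {
            System.out.println("Load average is not available on this platform");
        } else if (perCpu > 1.0) {
            // Assumed rule of thumb: more runnable tasks than CPUs means work is waiting.
            System.out.printf("Busy: %.2f runnable tasks per CPU%n", perCpu);
        } else {
            System.out.printf("OK: %.2f runnable tasks per CPU%n", perCpu);
        }
    }
}
```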


B- Disk Usage

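Monitoring disk usage has two important goals. The first pertains to the application itself: if the application is doing a lot of disk I/O, that I/O can easily become a bottleneck. The second, even if the application is not expected to perform much I/O, is to help monitor whether the system is swapping.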

You can use the iostat command to monitor the disk; for example, iostat -xm 5 prints extended disk statistics, in megabytes, every five seconds.

Applications that write to disk can be bottlenecked either because they are writing data inefficiently (too little throughput) or because they are writing too much data (too much throughput).
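To make the "too little throughput" case concrete, the sketch below writes the same bytes two ways: one byte per write call straight to the file (typically one write system call each), and through a 64 KB BufferedOutputStream that batches them into large writes. The class name, buffer size, and byte counts are arbitrary assumptions; the timings it prints depend entirely on the machine, so no expected numbers are given.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo: the same data written with 1-byte unbuffered writes
// versus through a 64 KB buffer.
class DiskWriteSketch {

    // Writes `count` bytes to `file`, one byte per write() call on the stream;
    // returns the elapsed time in nanoseconds.
    static long write(Path file, int count, boolean buffered) {
        long start = System.nanoTime();
        try (OutputStream out = buffered
                ? new BufferedOutputStream(Files.newOutputStream(file), 64 * 1024)
                : Files.newOutputStream(file)) {
            for (int i = 0; i < count; i++) {
                out.write('x');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("disk-sketch", ".dat");
        int count = 200_000;
        long slow = write(file, count, false);
        long fast = write(file, count, true);
        System.out.printf("unbuffered: %d ms, buffered: %d ms%n",
                slow / 1_000_000, fast / 1_000_000);
        Files.delete(file);
    }
}
```

Both runs produce an identical file; only the number of calls into the operating system differs, which is what iostat would show as many small operations.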

C- Network Usage

If you are running an application that uses the network—for example, a REST server—you must monitor the network traffic as well.
You can use nicstat to monitor the network. It is not installed by default on most systems, but it is open source and offers more features than the standard tools.

Applications that write to the network can be bottlenecked either because they are writing data inefficiently (too little throughput) or because they are writing too much data (too much throughput).

2) Java Monitoring Tools

To gain insight into the JVM itself, Java monitoring tools are required. These tools come with the JDK:
• jcmd
• jconsole
• jmap
• jinfo
• jstack
• jstat
• jvisualvm

A- JVM Commands

If you are using Docker, you can run these tools inside the container with docker exec, except for jconsole and jvisualvm, which are graphical tools.

These tools fit into these broad areas:
• Basic VM information
• Thread information
• Class information
• Live GC analysis
• Heap dump postprocessing
• Profiling a JVM

B- Basic VM Information

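Uptime:
% jcmd process_id VM.uptime

System properties:
% jcmd process_id VM.system_properties
or
% jinfo -sysprops process_id

JVM version:
% jcmd process_id VM.version

JVM tuning flags (add -all to list every flag, not only those set on the command line or by ergonomics):
% jcmd process_id VM.flags [-all]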
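Much of the same basic information is also available from inside the running JVM. This sketch uses the standard java.lang.management.RuntimeMXBean; the class and helper names are hypothetical. Note that getInputArguments() returns only the arguments passed on the command line, not the ergonomic defaults that VM.flags also reports.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.List;

// Hypothetical in-process counterpart of the jcmd queries above.
class VmInfoSketch {
    private static final RuntimeMXBean RT = ManagementFactory.getRuntimeMXBean();

    static long uptimeMillis() {            // roughly VM.uptime
        return RT.getUptime();
    }

    static String property(String key) {    // roughly VM.system_properties
        return RT.getSystemProperties().get(key);
    }

    static String vmVersion() {             // roughly VM.version
        return RT.getVmName() + " " + RT.getVmVersion();
    }

    static List<String> jvmArguments() {    // command-line JVM arguments only
        return RT.getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("uptime (ms): " + uptimeMillis());
        System.out.println("java.version: " + property("java.version"));
        System.out.println("VM: " + vmVersion());
        System.out.println("JVM arguments: " + jvmArguments());
    }
}
```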

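Note that you can change some tuning flags dynamically at runtime with the jinfo command; this works only for flags that are marked as manageable. For example:

% jinfo -flag -PrintGCDetails process_id # turns off PrintGCDetails
% jinfo -flag PrintGCDetails process_id # prints the current value of PrintGCDetails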
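The same kind of change can be made from inside the JVM through com.sun.management.HotSpotDiagnosticMXBean, which is HotSpot-specific rather than part of the Java SE API, and which likewise refuses to change flags that are not manageable. The sketch below uses HeapDumpOnOutOfMemoryError as the example because it is a manageable flag on HotSpot; the FlagSketch name and its helpers are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical in-process counterpart of `jinfo -flag`: read and change a
// manageable HotSpot flag through HotSpotDiagnosticMXBean.
class FlagSketch {
    private static final HotSpotDiagnosticMXBean HOTSPOT =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

    // Like: jinfo -flag Flag pid
    static String get(String flag) {
        return HOTSPOT.getVMOption(flag).getValue();
    }

    // Only writeable (manageable) flags can be changed at runtime.
    static boolean isManageable(String flag) {
        VMOption option = HOTSPOT.getVMOption(flag);
        return option.isWriteable();
    }

    // Like: jinfo -flag +Flag pid / jinfo -flag -Flag pid.
    // Throws IllegalArgumentException if the flag is unknown or not writeable.
    static void set(String flag, String value) {
        HOTSPOT.setVMOption(flag, value);
    }

    public static void main(String[] args) {
        String flag = "HeapDumpOnOutOfMemoryError";
        String original = get(flag);
        System.out.println(flag + " manageable: " + isManageable(flag));
        set(flag, "true");
        System.out.println(flag + " now: " + get(flag));
        set(flag, original); // restore the previous value
    }
}
```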

3) Profiling Tools

Profilers are the most important tool in a performance analyst’s toolbox. Many profilers are available for Java, each with its own advantages and disadvantages.

Many common Java profiling tools are themselves written in Java and work by “attaching” themselves to the application to be profiled. This attachment is via a socket or via a native Java interface called the JVM Tool Interface (JVMTI).
This means you must pay attention to tuning the profiling tool just as you would tune any other Java application. In particular, if the application being profiled is large, it can transfer quite a lot of data to the profiling tool, so the profiling tool must have a sufficiently large heap to handle the data.

Profiling happens in one of two modes: sampling mode or instrumented mode.

A- Sampling Profilers

Pros: Sampling is the basic mode of profiling and carries the least amount of overhead.

Cons: Sampling profilers can be subject to all sorts of errors. The most common one looks like this: suppose a thread alternates between executing methodA and methodB, spending more time in methodA. If the timer fires only when the thread happens to be in methodB, the profile will report that the thread spent all its time executing methodB, even though more time was actually spent in methodA.

Reason: this is due to safepoint bias. The profiler can get the stack trace of a thread only when the thread is at a safepoint, and threads are automatically at a safepoint when they are:
• Blocked on a synchronized lock
• Blocked waiting for I/O
• Blocked waiting for a monitor
• Parked
• Executing Java Native Interface (JNI) code (unless they perform a GC locking function)
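To see where safepoint bias comes from, here is a toy sampling profiler, a hypothetical sketch rather than how real tools are built: every few milliseconds it captures another thread's stack with Thread.getStackTrace() and counts the top frame. On HotSpot, another thread's stack can be captured only when that thread is at a safepoint, so this toy inherits exactly the bias described above. The names ToySampler and busyWork are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sampling profiler (hypothetical): periodically grabs the target thread's
// stack and counts how often each method is the top-most frame.
class ToySampler {

    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples && target.isAlive(); i++) {
            // The stack is captured when the target reaches a safepoint,
            // not at the exact instant this line runs.
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                counts.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return counts;
    }

    static volatile double sink;

    // Hypothetical CPU-bound workload to profile.
    static void busyWork() {
        double acc = 0;
        for (int i = 0; i < 1_000_000; i++) {
            acc += Math.sqrt(i);
        }
        sink = acc;
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                busyWork();
            }
        }, "worker");
        worker.setDaemon(true);
        worker.start();
        Map<String, Integer> profile = sample(worker, 100, 10);
        worker.interrupt();
        profile.forEach((method, hits) -> System.out.println(hits + "\t" + method));
    }
}
```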

B- Instrumented Profilers

Pros: Instrumented profilers can give more useful information about what’s happening inside a program, such as exact invocation counts.

Cons: They are much more intrusive than sampling profilers, and they are much more likely to introduce performance differences into the application.

Instrumented profilers work by altering the bytecode sequence of classes as they are loaded (inserting code to count the invocations, and so on).


Is this a better profile than the sampled version? It depends; there is no way to know in a given situation which is the more accurate profile. The invocation count of an instrumented profile is certainly accurate, and that additional information is often helpful in determining where the code is spending more time and which things are more fruitful to optimize.
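Conceptually, the code an instrumenting profiler inserts looks something like the hand-written probe below: count the call on entry and accumulate the elapsed time on exit. Real profilers inject this at class-load time by rewriting bytecode (for example, from a java.lang.instrument agent); here the probe calls are written by hand, and the names Probe, InstrumentedExample, and methodA are hypothetical. The sketch shows why the invocation counts are exact, and also why every instrumented call pays extra overhead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for the counters an instrumenting profiler injects.
class Probe {
    private static final Map<String, LongAdder> CALLS = new ConcurrentHashMap<>();
    private static final Map<String, LongAdder> NANOS = new ConcurrentHashMap<>();

    static long enter(String method) {
        CALLS.computeIfAbsent(method, k -> new LongAdder()).increment();
        return System.nanoTime();
    }

    static void exit(String method, long startNanos) {
        NANOS.computeIfAbsent(method, k -> new LongAdder()).add(System.nanoTime() - startNanos);
    }

    static long calls(String method) {
        LongAdder n = CALLS.get(method);
        return n == null ? 0 : n.sum();
    }

    static long totalNanos(String method) {
        LongAdder n = NANOS.get(method);
        return n == null ? 0 : n.sum();
    }
}

class InstrumentedExample {
    // What methodA looks like after "instrumentation": the probe calls wrap
    // the original body and run on every single invocation.
    static int methodA(int n) {
        long t = Probe.enter("methodA");
        try {
            return n * 2; // original method body
        } finally {
            Probe.exit("methodA", t);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            methodA(i);
        }
        System.out.println("methodA calls: " + Probe.calls("methodA")); // prints "methodA calls: 1000"
        System.out.println("methodA time (ns): " + Probe.totalNanos("methodA"));
    }
}
```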

C- Native Profilers

Tools like async-profiler and Oracle Developer Studio have the capability to profile native code in addition to Java code. This has two advantages:

4) Java Flight Recorder JFR

Java Flight Recorder (JFR) is a feature of the JVM that performs lightweight performance analysis of applications while they are running. As its name suggests, JFR data is a history of events in the JVM that can be used to diagnose the past performance and operations of the JVM.

The basic operation of JFR is that a set of events is enabled (for example, one event is that a thread is blocked waiting for a lock), and each time a selected event occurs, data about that event is saved (either in memory or to a file).

The more events that are enabled (and the more often they occur), the greater the performance impact of JFR.
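The enable-an-event, record-it, save-it model can be tried from code through the jdk.jfr API, assuming JDK 11 or later. This sketch enables a single event type, jdk.ThreadSleep, with a zero threshold so that every Thread.sleep() call is recorded, and then reads the saved events back with RecordingFile. The class name and the sleep count are hypothetical choices for illustration; enabling only one event type also keeps the overhead low, in line with the point above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

// Hypothetical demo: enable one JFR event type, let it fire, read it back.
class JfrEventSketch {

    static long countSleepEvents(int sleeps) {
        try {
            Path file = Files.createTempFile("jfr-sketch", ".jfr");
            try (Recording recording = new Recording()) {
                // Only this event is enabled; a zero threshold records every sleep.
                recording.enable("jdk.ThreadSleep").withThreshold(Duration.ZERO);
                recording.start();
                for (int i = 0; i < sleeps; i++) {
                    Thread.sleep(20); // each call should produce one jdk.ThreadSleep event
                }
                recording.stop();
                recording.dump(file); // save the recorded events to a file
            }
            List<RecordedEvent> events = RecordingFile.readAllEvents(file);
            Files.deleteIfExists(file);
            return events.stream()
                    .filter(e -> e.getEventType().getName().equals("jdk.ThreadSleep"))
                    .count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("jdk.ThreadSleep events: " + countSleepEvents(5));
    }
}
```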

A- Java Mission Control

The usual tool to examine JFR recordings is Java Mission Control (jmc), though other tools exist, and you can use toolkits to write your own analysis tools.
The Java Mission Control program (jmc) starts a window that displays the JVM processes on the machine and lets you select one or more processes to monitor. Figure 3-9 shows the Java Management Extensions (JMX) console of Java Mission Control monitoring our example REST server.

B- JFR features

The following table compares what other tools can collect with what JFR collects for each event:

| Event | Other tools | JFR |
|---|---|---|
| Classloading | Number of classes loaded and unloaded | Which classloader loaded the class; time required to load an individual class |
| Thread statistics | Number of threads created and destroyed; thread dumps | Which threads are blocked on locks (and the specific lock they are blocked on) |
| Throwables | Throwable classes used by the application | Number of exceptions and errors thrown and the stack trace of their creation |
| TLAB allocation | Number of allocations in the heap and size of thread-local allocation buffers (TLABs) | Specific objects allocated in the heap and the stack trace where they are allocated |
| File and socket I/O | Time spent performing I/O | Time spent per read/write call; the specific file or socket taking a long time to read or write |
| Monitor blocked | Threads waiting for a monitor | Specific threads blocked on specific monitors and the length of time they are blocked |
| Code cache | Size of code cache and how much it contains | Methods removed from the code cache; code cache configuration |
| Code compilation | Which methods are compiled, on-stack replacement (OSR) compilation, and length of time to compile | Nothing specific to JFR, but unifies information from several sources |
| Garbage collection | Times for GC, including individual phases; sizes of generations | Nothing specific to JFR, but unifies the information from several tools |
| Profiling | Instrumenting and sampling profiles | Not as much as you’d get from a true profiler, but the JFR profile provides a good high-order overview |

C- Enabling JFR

JFR is initially disabled. To enable it, add the flag -XX:+FlightRecorder to the command line of the application. This enables JFR as a feature, but no recordings will be made until the recording process itself is enabled. That can occur either through a GUI or via the command line.

In Oracle’s JDK 8, you must also specify this flag (prior to the FlightRecorder flag): -XX:+UnlockCommercialFeatures (default: false).
If you forget to include these flags, remember that you can use jinfo to change their values and enable JFR. If you use jmc to start a recording, it will automatically change these values in the target JVM if necessary.

To start a recording from the command line, use the -XX:StartFlightRecording=string flag.
The string in that parameter is a list of comma-separated name-value pairs taken from these options:

-->name: The name used to identify the recording.
-->defaultrecording: Whether to start the recording initially. The default value is false; for reactive analysis, this should be set to true.
-->settings: Name of the file containing the JFR settings (see the next section).
-->delay: The amount of time (e.g., 30s, 1h) before the recording should start.
-->duration: The amount of time to make the recording.
-->filename: Name of the file to write the recording to.
-->compress: Whether to compress (with gzip) the recording; the default is false.
-->maxage: Maximum time to keep recorded data in the circular buffer.
-->maxsize: Maximum size (e.g., 1024K, 1M) of the recording’s circular buffer.
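Several of these options have counterparts on jdk.jfr.Recording when a recording is started from code instead of from the command line. The sketch below assumes JDK 11 or later; the recording name, the sizes, and the class name are arbitrary, and the comments only indicate which command-line option each call roughly corresponds to.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.text.ParseException;
import java.time.Duration;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

// Hypothetical programmatic analogue of a -XX:StartFlightRecording option string.
class JfrOptionsSketch {

    static void record(Path destination, Runnable workload) {
        try {
            // settings=default : use the JDK's built-in "default" configuration
            Configuration settings = Configuration.getConfiguration("default");
            try (Recording recording = new Recording(settings)) {
                recording.setName("MyRecording");            // name=MyRecording
                recording.setMaxAge(Duration.ofMinutes(10)); // maxage=10m
                recording.setMaxSize(64L * 1024 * 1024);     // maxsize=64M
                recording.setDestination(destination);       // filename=<destination>
                recording.start();
                workload.run();
                recording.stop();                            // data is written to the destination on stop
            }
        } catch (IOException | ParseException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Path out = Path.of("my-recording.jfr");
        record(out, () -> {
            double acc = 0;
            for (int i = 0; i < 10_000_000; i++) {
                acc += Math.sqrt(i);
            }
            System.out.println("workload result: " + acc);
        });
        System.out.println("recording written to " + out.toAbsolutePath());
    }
}
```

Similarly, Recording.scheduleStart(Duration) and Recording.setDuration(Duration) roughly correspond to the delay and duration options. The resulting file can be opened in Java Mission Control.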

🏃 See you in chapter 4 …

🐒 Take a tip

Never trust your code. 👮

Suspect your code

Source: DEV Community

November 5, 2021
