Speeding Up a Java CLI Part One: AppCDS


During my free time, I am developing javaimports, a goimports-like Java command line tool that auto-imports Java classes without relying on an IDE, an LSP, or any kind of cache. I use it daily to write Java, pairing it with google-java-format to automatically format my code and add missing imports on save. Obviously, I need javaimports to run fast, which pushes me to look for ways to make my code, and Java command line applications in general, faster.

My first move was to add multithreading support. But when I ran a poor man’s benchmark using time, some numbers caught my attention.

project   files   dependencies   v1.1 time (single thread)   v1.2 time (multi thread)
A         444     29             2.4s                        1.6s
B         90      9              1.2s                        1.0s
C         4       0              0.6s                        0.6s

Unsurprisingly, multithreading does not help in project C. But 0.6 seconds to parse 4 files and scan 0 dependencies sounds like an awful lot of time to me! A quick analysis found a bottleneck: despite all files being roughly the same size, parsing the first file took close to a hundred times longer than the others (200ms as opposed to 2~3ms).

I first suspected there might be some kind of caching involved, but the library I used did not contain any. What else could explain why the first time is abnormally slow? My suspicions turned to class loading. Running java with -Xlog:class+load did show something interesting around the first call to Parser:

[0.108s][info][class,load] com.nikodoko.javaimports.parser.Parser source: file:/Users/nicolas.couvrat/javaimports-1.2-SNAPSHOT-all-deps.jar
...
~1100 classes!!
...
[0.313s][info][class,load] com.nikodoko.javaimports.parser.Parser$$Lambda$73/0x0000000800c47040 source: com.nikodoko.javaimports.parser.Parser

The large number of classes involved suggested that improving loading time could lead to serious performance gains. I decided to look at what happens when the Java Virtual Machine (JVM) loads a class.

What Happens When the JVM Loads a Class?

As you probably know, Java is a hybrid language, halfway between compiled languages (like C) and interpreted languages (like Python). Java code is first compiled to bytecode, which is the machine language of a virtual machine called the JVM. The JVM then interprets it1, which means that bytecode is loaded at runtime.

Most of the details of class loading are out of the scope of this article, so I will stick to a broad overview. If you wish to learn more, I recommend you check Oracle’s very thorough documentation.

Loading Bytecode And Creating an Internal Representation

When the JVM first encounters a class name, it will try to load it using an available class loader. A class loader can be viewed as a bytecode provider: most of the time, that bytecode will come from a .jar archive stored locally, but it could occasionally be downloaded over the network, generated on the fly, etc.

The JVM will parse that binary data and, provided it represents a valid .class file, derive an internal representation from it. That representation will eventually be stored in the method area of the JVM (a section of the heap shared by all JVM threads), and used every time the corresponding class is initialized or called by the rest of the code.
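As a quick illustration of where bytecode comes from, the sketch below (LoaderDemo is a hypothetical class name, not part of javaimports) asks the JVM which class loader provided a given class:

```java
// Minimal sketch: asking which class loader provided a class's bytecode.
public class LoaderDemo {
    public static void main(String[] args) {
        // Application classes are served by the application class loader,
        // typically from a .jar archive or class directory on the classpath.
        System.out.println(LoaderDemo.class.getClassLoader());

        // Core classes like java.lang.String come from the bootstrap loader,
        // which getClassLoader() reports as null.
        System.out.println(String.class.getClassLoader());
    }
}
```
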

Linking

Before using that internal representation, however, the JVM goes through a number of additional steps, together referred to as “linking”.

  • it verifies that the bytecode satisfies a number of constraints,
  • it rewrites parts of it, optimizing what it can,
  • it resolves it by loading and creating the other classes it references (also checking access permissions),
  • and so on.

Initialization

Finally, static fields and static blocks are initialized: the class is now ready to be used.

As you can see, that’s a lot of work, especially given that it’s done recursively, and for all the core classes too. And because it is only performed when a class is first encountered, most of the price is paid at startup. While this is acceptable for a long-running, server-side application, it is very painful for a short-lived command line tool that has to go through this loading step every single time it runs.
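This first-use behavior is easy to observe with a small sketch (class names are hypothetical): the nested class Heavy is only loaded, linked, and initialized when it is first actively used, which is when its static block runs.

```java
// Minimal sketch: a class is loaded and initialized only on first use.
public class LazyLoadDemo {
    static class Heavy {
        // Runs during initialization, i.e. the first time Heavy is actively used.
        static { System.out.println("Heavy initialized"); }

        static int answer() { return 42; }
    }

    public static void main(String[] args) {
        // At this point Heavy has not been initialized yet.
        System.out.println("main started");

        // First use of Heavy: triggers loading, linking and initialization,
        // so "Heavy initialized" prints before 42 does.
        System.out.println(Heavy.answer());
    }
}
```

Running it with -Xlog:class+load additionally shows when the JVM actually loads each class.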

Reducing Startup Time With AppCDS

Maintainers of the Java language have long been aware of this issue and introduced Class Data Sharing (CDS) as a way to mitigate it. A first version shipped with Java SE 5.0, limited to a small set of core classes. It was improved and extended several times, with AppCDS (which also works for application classes) finally released in OpenJDK 102.

The idea is simple: the internal representation of loaded classes is dumped into an archive that can be directly memory-mapped the next time the application runs, removing the need for expensive parsing and verification. This reduces the memory footprint (part of the class archive can be shared between JVM processes, and fewer allocations are made), but it also obviously helps with speed, which is what I am interested in here.

The original iteration of AppCDS required 3 steps to generate and use the archive file3, but the process was simplified in OpenJDK 13. It is now a matter of running the following 2 commands:

# Run the program once to generate a shared archive file
java -XX:ArchiveClassesAtExit=cds.jsa -jar ...

# Then use it every other time and profit!
java -XX:SharedArchiveFile=cds.jsa -jar ...

How efficient is it at decreasing loading time? Using -Xlog:class+load shows the following:

[0.104s][info][class,load] com.nikodoko.javaimports.parser.Parser source: shared objects file (top)
...
still ~1000 classes, but a lot of them now come from the shared objects file
...
[0.222s][info][class,load] com.nikodoko.javaimports.parser.Parser$$Lambda$74/0x000000080160cc40 source: com.nikodoko.javaimports.parser.Parser

With AppCDS, the first call to Parser takes 100ms less! Overall, running javaimports on the small project C now takes 400ms instead of 600ms, a 33% improvement. Of course, the gain is flat: it only affects class loading, and the number of classes loaded does not change with how long the program runs. In my case, on the bigger project A, the same 200ms difference can be observed, going from 1.6s to 1.4s: less impressive, but still a solid 13% improvement.

So, do you want to use AppCDS in your project? For a short-lived Java application not spending too much time waiting on I/O, the answer is probably yes. It’s hard to predict exactly how much it will benefit you, but hoping for a 30% improvement in startup time for a small command line utility is not unreasonable. As for me, I’m not yet entirely satisfied. It’s not bad, but can’t it go even faster? (Spoiler: the answer is yes, and I will cover it in a future article.)

As always, shoot me a message or tweet @nicol4s_c if you want to chat about any of this, if you spotted any mistakes or typos, or if you’d like me to cover anything else! Have a great day :)


  1. Of course, modern-day JVMs pack plenty of features and optimizations, like just-in-time compilation, but these are mostly irrelevant to code that is executed only a few times, as is the case in command line applications.
  2. See JEP 310 for more details.
  3. See JEP 310, or alternatively this short blog post.