March 2020 saw the release of Java 14, bringing with it the first major piece of work related to the ongoing Project Panama. Project Panama is an initiative to bring Java closer to the hardware by improving Java’s support for accessing native code and memory.
Amongst other features, Java 14 introduces the new Foreign Memory Access (FMA) API. Two of Neueda’s software developers within the capital markets division, Niall Maguire and Ryan Swain, decided to review the new features and share their feedback, paying particular attention to the API’s ability to allocate and work with direct off-heap memory.
You can view all code and tests discussed in this post in our GitHub repo: https://github.com/blu-corner/OffHeapJavaStudy.
The Garbage Collector (GC) is a fantastic piece of technology. By providing automatic memory management, it allows Java developers to write code largely (but not entirely) free from the worries of memory management.
Manual memory management is difficult; worse still, it’s error-prone and can lead to many difficult-to-debug memory leaks (when an application’s memory use continuously grows over time) and segmentation faults (when an application tries to read or write to an illegal memory location). Ask any C/C++ developer, and they’ll tell you about the headaches of manual memory management.
While automatic memory management and the GC free us from these worries, they come at a cost: unpredictable latency spikes (a.k.a. GC jitter). A latency spike is a momentary pause (milliseconds or less) in the running of application code to allow the Garbage Collector to run. During this pause, the application stops processing. For most applications this is not a concern, but for some, especially the time-critical, message-driven applications found in the world of Capital Markets, it can be significant. A latency spike at the wrong moment in a trade-processing system can have a major impact.
As the GC manages only on-heap memory, a developer who allocates and uses direct off-heap memory can greatly reduce the impact GC execution has on application latency.
Having come across this issue many times with our clients, we were keen to investigate the new FMA API, which promised safe off-heap memory access.
What are the options?
Prior to the introduction of the FMA API, two ways of allocating off-heap memory existed. However, both have their drawbacks:
- sun.misc.Unsafe – The Unsafe class provides access to low-level mechanisms, including direct memory. It’s intended for use by the core Java classes themselves. The major drawback is that the class offers no safety: an incorrect memory access can cause a segmentation fault and crash the application.
- java.nio.ByteBuffer – This class allows the developer to work off-heap and with built-in safety that Unsafe lacks. However, it imposes a hard limit of 2GB of off-heap memory due to its internal use of int for indexes.
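To make the contrast concrete, the following minimal sketch (class name and sizes are our own, not from the repo) shows both flavours of ByteBuffer. allocateDirect reserves native off-heap memory, allocate stays on the heap, and both are capped at Integer.MAX_VALUE bytes because all positions and indexes are ints:

```java
import java.nio.ByteBuffer;

public class ByteBufferDemo {
    public static void main(String[] args) {
        ByteBuffer onHeap = ByteBuffer.allocate(1024);        // backed by a byte[] on the heap
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1024); // native memory, outside the heap

        offHeap.putInt(0, 42);                  // absolute put: int index, hence the ~2GB cap
        System.out.println(offHeap.getInt(0));  // 42
        System.out.println(onHeap.isDirect());  // false
        System.out.println(offHeap.isDirect()); // true
    }
}
```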
The new API aims to improve upon these options by offering the ability to work with unlimited memory space and by providing spatial and temporal safety. It also has the stated aim of being “performance-competitive with these existing APIs”.
A Brief Look At the API
The API is composed of three main interfaces:
- MemorySegment – An abstraction that represents a “contiguous memory region with given spatial and temporal bounds”, i.e. a block of memory that can only be accessed while it’s open (temporal bounds) and outside of which the developer cannot reach (spatial bounds).
- MemoryAddress – An address within a MemorySegment. Each memory segment has a base address, i.e. the first address in the segment. All other addresses in the segment are at an offset from the base address.
- MemoryLayout – An abstraction that allows the developer to describe the structure of the memory, i.e. the data types and the order of the data types. This allows the memory to be populated and read in a controlled manner.
Below is a simple use case of the API that makes use of MemorySegment to reserve and populate an off-heap memory area with 25 random integers. Note the use of MemoryAddress to access each integer within the MemorySegment:
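A sketch of that use case, in the style of the JEP 370 examples (the class name and values are our own; the repo’s actual version lives in OffHeapDemo), might look like this. Compiling and running requires JDK 14 with --add-modules jdk.incubator.foreign:

```java
// Requires: JDK 14, compiled and run with --add-modules jdk.incubator.foreign
import jdk.incubator.foreign.MemoryAddress;
import jdk.incubator.foreign.MemoryHandles;
import jdk.incubator.foreign.MemorySegment;

import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.Random;

public class OffHeapIntsDemo {
    public static void main(String[] args) {
        Random random = new Random();
        // A VarHandle that reads/writes a 4-byte int at a given MemoryAddress
        VarHandle intHandle = MemoryHandles.varHandle(int.class, ByteOrder.nativeOrder());

        // Reserve 100 bytes off-heap (25 ints x 4 bytes). try-with-resources enforces
        // the temporal bound: the memory is freed when the segment is closed.
        try (MemorySegment segment = MemorySegment.allocateNative(25 * 4)) {
            MemoryAddress base = segment.baseAddress();
            for (int i = 0; i < 25; i++) {
                // Each int lives at an offset of i * 4 bytes from the base address
                intHandle.set(base.addOffset(i * 4), random.nextInt(100));
            }
            for (int i = 0; i < 25; i++) {
                System.out.print((int) intHandle.get(base.addOffset(i * 4)) + " ");
            }
        }
    }
}
```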
Below is an example which makes use of MemoryLayout to provide a more natural index-based way of accessing the MemorySegment:
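A sketch of this variant (again following the shape of the JEP 370 examples; the class name and values are ours): PathElement.sequenceElement() adds a long index coordinate to the VarHandle, so elements are addressed by index rather than by manually computed byte offsets.

```java
// Requires: JDK 14, compiled and run with --add-modules jdk.incubator.foreign
import jdk.incubator.foreign.MemoryAddress;
import jdk.incubator.foreign.MemoryLayout;
import jdk.incubator.foreign.MemoryLayout.PathElement;
import jdk.incubator.foreign.MemorySegment;
import jdk.incubator.foreign.SequenceLayout;

import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class LayoutDemo {
    public static void main(String[] args) {
        // Describe the memory up front: a sequence of 25 32-bit values
        SequenceLayout intArray = MemoryLayout.ofSequence(25,
                MemoryLayout.ofValueBits(32, ByteOrder.nativeOrder()));

        // The sequence path element gives the handle an extra long index coordinate
        VarHandle intAt = intArray.varHandle(int.class, PathElement.sequenceElement());

        // The layout also sizes the allocation for us
        try (MemorySegment segment = MemorySegment.allocateNative(intArray)) {
            MemoryAddress base = segment.baseAddress();
            for (long i = 0; i < intArray.elementCount().getAsLong(); i++) {
                intAt.set(base, i, (int) (i * i)); // indexed write, no manual offsets
            }
            System.out.println((int) intAt.get(base, 5L)); // prints 25
        }
    }
}
```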
In this third example, we introduce a struct-based MemoryLayout, which allows us to work with more complex object types in memory and again access them in an indexed manner. Here, we create pairs of integers (i.e. co-ordinates) and store them off-heap, so each pair is written and retrieved as a single object rather than as individual integers.
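A sketch of the struct-based approach (class name, field names and values are our own): a named group layout plays the role of a C struct { int x; int y; }, and layout paths combine a sequence index with a field name to select one component of one pair.

```java
// Requires: JDK 14, compiled and run with --add-modules jdk.incubator.foreign
import jdk.incubator.foreign.MemoryAddress;
import jdk.incubator.foreign.MemoryLayout;
import jdk.incubator.foreign.MemoryLayout.PathElement;
import jdk.incubator.foreign.MemorySegment;
import jdk.incubator.foreign.SequenceLayout;

import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

public class PointLayoutDemo {
    public static void main(String[] args) {
        // A sequence of 10 (x, y) int pairs: struct { int x; int y; }[10]
        SequenceLayout points = MemoryLayout.ofSequence(10,
                MemoryLayout.ofStruct(
                        MemoryLayout.ofValueBits(32, ByteOrder.nativeOrder()).withName("x"),
                        MemoryLayout.ofValueBits(32, ByteOrder.nativeOrder()).withName("y")));

        // Layout paths select the named field within the i-th struct
        VarHandle xAt = points.varHandle(int.class,
                PathElement.sequenceElement(), PathElement.groupElement("x"));
        VarHandle yAt = points.varHandle(int.class,
                PathElement.sequenceElement(), PathElement.groupElement("y"));

        try (MemorySegment segment = MemorySegment.allocateNative(points)) {
            MemoryAddress base = segment.baseAddress();
            for (long i = 0; i < 10; i++) {
                xAt.set(base, i, (int) i);  // the pair is laid out contiguously,
                yAt.set(base, i, (int) -i); // so it behaves as one logical record
            }
            System.out.println((int) xAt.get(base, 3L) + "," + (int) yAt.get(base, 3L)); // 3,-3
        }
    }
}
```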
These demos, as well as others, are available through our linked GitHub repo (see class com.neueda.research.jep370.OffHeapDemo).
For our analysis, we looked at two areas: throughput and garbage collection cycles.
To measure throughput, we created tests to read 10 years of four-hour GBPUSD candlestick data and gathered simple statistics such as max and min price, repeating the test for a number of different data structures:
- Array (on-heap)
- List (on-heap)
- ByteBuffer (on-heap)
- ByteBuffer (off-heap)
- MemorySegment (Foreign Memory Access API)
The JMH micro-benchmarking library was used to gather throughput numbers for each test run, measuring the number of operations performed per second.
As seen in Figure 1, the throughput performance of MemorySegment was significantly worse than that of the other data structures. While a substantial gap compared to Array or List may be reasonable, we expected performance much closer to ByteBuffer’s, as this was one of the original success metrics laid out in the plans for the API. On further investigation, we discovered that this is a known issue; Brian Goetz discusses these throughput problems in this Beyond ByteBuffers Devoxx presentation: https://youtu.be/iwSCtxMbBLI?t=2341.
Having established the API’s throughput shortcomings, we investigated its main advantage: the ability to allocate memory off-heap and so avoid GC. To do this, we reused the previous throughput test but this time ran it continuously for a five-minute period, focusing on Array and MemorySegment. Using VisualVM, we observed garbage collection activity while each test ran.
As seen in Figure 2 and Figure 3, the benefits of using an off-heap data structure such as a MemorySegment when compared to an on-heap structure are clear – using MemorySegment resulted in significantly less GC activity.
Note – Java 14 build 14+36-1461 was used for this analysis, running on a Windows 10 PC with 16GB RAM and an Intel(R) Core(TM) i5-7200U CPU.
Eliminating GC spikes can be particularly critical in the world of Capital Markets, where a GC cycle at an inopportune moment can have a direct impact on P&L. By using off-heap memory allocation, developers can decrease (or in some cases, effectively eliminate) the amount of time an application spends in suspension.
While allocating memory off-heap is not entirely new in Java, the new API provides a standardised and supported way of achieving this. It does this while also avoiding the latency spikes associated with Garbage Collection, without the size limitation imposed by ByteBuffer or the potential instability of using the sun.misc.Unsafe class.
Reduced overall throughput may be an acceptable trade-off for consistent latency free of GC-related jitter, particularly if throughput can be recovered by scaling the application. However, the current performance issues may deter adoption of the FMA API until they are addressed.
Of course, it should be borne in mind that the FMA API remains, for now, an incubating API, and it may be a little early to adopt it fully, at least until it is promoted out of incubator status. We hope future releases will improve its performance; in the meantime, we look forward to further refinements of this API and to the release of other Project Panama features.