Sep 10
Java Internals: Unavoidable Object Overhead

A few months back I completed work on an internal library at AMD to optimize memory usage when reading a large number (tens of millions) of unique records into memory from a compressed proprietary data format. The process presented many new and interesting challenges, most related to reducing bloated data representations down to something manageable. I won’t get too far into the details, but it boiled down to several strategies: substituting primitive wrappers (e.g. java.lang.Integer) with the corresponding primitives, real-time bytecode generation, and object interning. Although these strategies offered impressive memory compression (> 90% in some cases!) over earlier versions of the library, one problem remained: the unavoidable overhead of java.lang.Object’s internal header, which is incurred by every Java object. Typically, a Java VM’s internal object model uses a header consisting of at least two fields: a klass_type pointer and a lock_type pointer. A Java array object includes an additional field for its length.
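As a rough sketch of two of those strategies, here is what primitive substitution and interning might look like at load time. The class and field names below are invented for illustration; they are not the actual library, which is internal to AMD.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: store record IDs in a primitive int[] instead of
// an Integer[] (no per-element object header), and intern repeated label
// strings so duplicates share a single instance.
public class CompactRecords {
    private final int[] ids;         // primitives: no Integer header per element
    private final String[] labels;   // interned: duplicates share one object

    public CompactRecords(Integer[] boxedIds, String[] rawLabels) {
        ids = new int[boxedIds.length];
        for (int i = 0; i < boxedIds.length; i++) {
            ids[i] = boxedIds[i];    // unbox once at load time
        }
        // Simple interning pool; java.lang.String.intern() would also work.
        Map<String, String> pool = new HashMap<>();
        labels = new String[rawLabels.length];
        for (int i = 0; i < rawLabels.length; i++) {
            labels[i] = pool.computeIfAbsent(rawLabels[i], s -> s);
        }
    }

    public int id(int i)       { return ids[i]; }
    public String label(int i) { return labels[i]; }
}
```

Note that the interning pool trades a one-time lookup cost at load for a permanent reduction in duplicate instances, which is exactly the trade-off you want when the data is read once and held for a long time.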

class VMObject {
    klass_type* klass;
    lock_type*  lock;
};

Typical Internal VM Object Representation

class VMArray : public VMObject {
    size_t length;
};

Typical Internal VM Array Representation

I won’t go into too much detail about the klass_type pointer since it speaks for itself: its primary purpose is to provide the runtime type and method dispatch information to the VM. The lock_type pointer is also straightforward, providing the internal mechanism behind Java’s synchronized keyword. However, it also doubles every object’s header overhead whether or not synchronization is ever used. A good resource touching on the subject is available on the c2.com wiki, which outlines the pitfalls of the Java object model and parallels my own independent findings. Unfortunately, Java was never designed for data-only types (i.e. structs in C/C++); as a result, every Java object carries both fields regardless of whether it uses synchronization or virtual dispatch. I came to the same conclusion as the authors on the c2.com wiki: the memory overhead can only be reduced in two ways: (1) provide a method to mark objects as unsynchronizable, and (2) introduce a data-only type. Neither change is impossible; in fact, both could be retrofitted into future Java releases with a bit of effort. Adding a java.util.concurrent.Unsynchronizable marker interface would provide a backward-compatible way for the Java compiler and the VM’s object model to support a lockless subtype. Supporting a data-only type is trickier, but could be achieved with an updated object model plus some compile-time and run-time optimizations.
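A minimal sketch of what the proposed marker interface might look like follows. To be clear, Unsynchronizable is hypothetical and exists in no current JDK; today this code compiles but has no effect on object layout.

```java
// Hypothetical marker interface, as proposed above; no JDK defines it.
// Under the proposal, the VM could lay out implementing instances
// without a lock word, and javac could reject synchronized(...) on them.
interface Unsynchronizable {
}

// A data-only point type that opts out of locking.
final class Point implements Unsynchronizable {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}
```

On a 64-bit VM with an 8-byte header and two 4-byte ints, dropping a 4-byte lock word would shrink each Point noticeably, which is why the savings compound so dramatically across tens of millions of records.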


  1. I don’t think the interface is a proper solution. The reason is that you can pass a lockless instance of a class Foo which implements this interface anywhere an Object is allowed. This means you must check at runtime whether the instance is lockable or not, which might introduce significant overhead for applications that synchronize frequently. A better way, off the top of my head, would be a superclass of Object, but that would break in various other ways (e.g. because suddenly Object.class.getSuperclass() would return a non-null value, which breaks the existing contract). That leaves us with the data-only type. But this would dramatically change the type system: you would need two different kinds of references. That might not be a big deal for the compiler, but it could make GC much more complex, because there would no longer be a uniform object type on the heap. GC is complex enough in modern JVMs, so this could be a significant burden.
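The existing contract the comment refers to can be checked directly; this is standard JDK behavior, not an assumption:

```java
// java.lang.Object is the root of the class hierarchy, so its
// getSuperclass() returns null by specification. A new lockless
// superclass above Object would change this for existing code.
public class SuperclassContract {
    public static Class<?> rootSuperclass() {
        return Object.class.getSuperclass(); // null by specification
    }

    public static Class<?> stringSuperclass() {
        return String.class.getSuperclass(); // Object.class
    }
}
```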

  2. @Robert Klemme: A superclass of java.lang.Object providing a lockless marker is also possible and would be nearly identical to the type systems offered by other runtimes (e.g. C#). I haven’t delved into the intricacies of the JVM’s GC implementations, but it should be possible; however, such a change would almost certainly require a major version revision and an updated object model. An alternative approach could be to extend the thin locks already used by most JVMs. The protocol could embed markers within the class pointer to indicate what lock state the object is in. When a lock is present, the class pointer could point to a trampoline containing the original class pointer and the thin lock implementation. This incurs the additional overhead of a masking operation (a pointer fixup for accessing the class data) and an extra dereference when the trampoline is present. Unfortunately, there does not seem to be an easy way to retrofit a lockless object into current JVMs without some overhead. Perhaps a flag could be introduced, similar to -XX:+UseCompressedOops, that provides a different runtime object model to save memory at the expense of speed.
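The tagging-and-masking scheme described above can be sketched with plain long arithmetic. The 2-bit tag layout below is invented purely for illustration; it is not how any production JVM encodes its headers (HotSpot, for instance, uses its own mark-word encoding).

```java
// Illustrative sketch of stashing lock state in the low bits of an
// aligned class pointer, as the comment describes. Because object
// pointers are at least 4-byte aligned, the low 2 bits are always free.
public class TaggedPointer {
    static final long TAG_MASK  = 0x3L;  // low 2 bits hold the lock state
    static final long UNLOCKED  = 0x0L;
    static final long THIN_LOCK = 0x1L;  // pointer now targets a trampoline

    static long tag(long alignedPtr, long state) {
        return alignedPtr | state;       // embed the lock-state marker
    }

    static long untag(long tagged) {
        return tagged & ~TAG_MASK;       // the "masking fixup" for class access
    }

    static long state(long tagged) {
        return tagged & TAG_MASK;
    }
}
```

The mask on every class access is exactly the extra cost the comment warns about: cheap, but paid on the hot path of virtual dispatch.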
