
Tuesday, July 13, 2010

All About The Java Integer Cache


Java Integer Cache - Introduction

I recently came across the following system property in Java: java.lang.Integer.IntegerCache.high. Looking into it further brought me to some areas of Java I had not yet visited. I went in like an intrepid traveller, explored these (for me anyway) unknown areas and mapped them for myself. I figured the only thing to do now was to share that knowledge so that future explorers can find their way around.


So what is this property about?


Java Integer Cache - Description

The Java Integer Cache is something I only discovered recently while at work, even though it has been in the language since Java 1.5. So what is it? Well, as the name suggests, it's a cache of java.lang.Integer objects. But why? For starters, the JLS (Java Language Specification) requires it, which is a pretty compelling reason :-), but it's also a performance improvement. The JLS section I'm referring to is 5.1.7 Boxing Conversion, where in particular it says:


If the value p being boxed is true, false, a byte, a char in the range \u0000 to \u007f, or an int or short number between -128 and 127, then let r1 and r2 be the results of any two boxing conversions of p. It is always the case that r1 == r2.

So the spec covers several value ranges. For ints and shorts it is -128 to 127, for bytes it is every value, and for chars it is \u0000 to \u007f, plus the two booleans. For these, boxing the same value twice must give back the same wrapper object (Integer, Short, Byte, Character, Boolean), and in practice that means these wrapper objects are cached. Interestingly, longs are not mentioned in that part of the spec. They come up a little later in a discussion section, but not in section 5.1.7 itself. Despite this, when I looked at the source, the same kind of caching code used for int, short, char and byte is also used for long. So Long values from -128 to 127 are cached in the same way as the others (and rightly so, I think).
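To see the guarantee quoted above in action, here is a small sketch. The class and method names are my own, purely for illustration. It boxes each kind of primitive twice and compares the two references with ==. The Long case is there only because the JDK happens to cache it too. The JLS quote above does not require it.

```java
class BoxingIdentity {
    // Each helper boxes the same primitive twice and compares the two references.
    // javac compiles each boxing conversion to the wrapper's valueOf(...) method.

    static boolean sameBoolean(boolean p) {
        Boolean r1 = p, r2 = p;
        return r1 == r2; // reference comparison, not value comparison
    }

    static boolean sameByte(byte p) {
        Byte r1 = p, r2 = p;
        return r1 == r2;
    }

    static boolean sameChar(char p) {
        Character r1 = p, r2 = p;
        return r1 == r2;
    }

    static boolean sameShort(short p) {
        Short r1 = p, r2 = p;
        return r1 == r2;
    }

    static boolean sameInt(int p) {
        Integer r1 = p, r2 = p;
        return r1 == r2;
    }

    // Not required by JLS 5.1.7, but the JDK caches Long values in -128..127 as well.
    static boolean sameLong(long p) {
        Long r1 = p, r2 = p;
        return r1 == r2;
    }

    public static void main(String[] args) {
        System.out.println(sameBoolean(true));       // true
        System.out.println(sameByte((byte) -100));   // true
        System.out.println(sameChar((char) 0x7f));   // true: top of \u0000..\u007f
        System.out.println(sameShort((short) -128)); // true
        System.out.println(sameInt(127));            // true
        System.out.println(sameLong(42L));           // true
    }
}
```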


As I mentioned, this Integer caching was brought in as part of Java 1.5 as a performance improvement. Java 1.5 also introduced autoboxing of primitives to their object wrapper classes. I think most Java developers know what autoboxing entails. If you don't, I'll wait here while you go and read up on it and then come back... ok...

...good... so you're back. Alright, now that you know all about autoboxing, let's continue. Autoboxing is where the cache pays off. javac turns each boxing conversion into a call to Integer.valueOf(int), and that method returns the cached object for any value within the cache range. Autoboxing requires object wrapper classes even when it is just an intermediate step. Take the example of adding an int to a java.util.List, then taking it back out and using it. The int is autoboxed when it is added to the list and unboxed when it is read back out. From the client's point of view only an int primitive is used. Under the covers, though, the VM works with an Integer wrapper object, and like any other object it takes up memory and has to be garbage-collected. Handing out a shared, cached object for common values avoids that allocation, so caching here is definitely a performance improvement.
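Here is a minimal sketch of the List example. The class and method names are hypothetical. It shows that, for a small value, the object stored in the list is the very same cached instance that Integer.valueOf() hands out. Reading the value back is just an unboxing call on that one object.

```java
import java.util.ArrayList;
import java.util.List;

class AutoboxingListDemo {
    // Adds an int to a List<Integer> and reads it back as an int.
    static int roundTrip(int value) {
        List<Integer> list = new ArrayList<>();
        list.add(value);        // autoboxing: javac emits Integer.valueOf(value)
        int back = list.get(0); // unboxing: javac emits list.get(0).intValue()
        return back;
    }

    // True if the object stored in the list is the cached instance valueOf returns.
    static boolean storedObjectIsCached(int value) {
        List<Integer> list = new ArrayList<>();
        list.add(value);
        return list.get(0) == Integer.valueOf(value); // reference comparison
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));            // 42
        System.out.println(storedObjectIsCached(42)); // true: 42 is inside -128..127
    }
}
```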


Java Integer Cache - Having a look in the java.lang.Integer java class

Well, next up let's have a look inside the source code. This is one of the things I love about Java and its JDK: there are no mysterious black-box APIs. If you want to, you can use your favourite IDE's built-in shortcut (e.g. Ctrl+click) to jump to the source code and see what it does internally. Note: if it happens to be native code, that's another matter, and you can't simply jump to the C code. I'd love a NetBeans plugin that allowed that too. I'm sure it's achievable... oooh oooh, I smell an open source project brewing. Ha ha ha. Anyway, looking into the source code, what do you see? Well, the answer is that it depends. As I mentioned, this Integer caching was brought in as part of Java 1.5, and the first listing below is the initial code. Since then it has been upgraded to accept a property that lets you choose the size of the cache yourself, rather than being limited to the JDK's default of -128 to 127. So, for interest's sake, I'll present both versions. The first was around from Java 1.5 until Java 1.6.0_14 (i.e. the 14th update of Java 1.6), which brought in the second version and with it a way to set the cache size. Ok, so the first version, Java 1.5.0 --> Java 1.6.0_13:


Java 1.5.0_20 - IntegerCache implementation

private static class IntegerCache {
    private IntegerCache(){}

    static final Integer cache[] = new Integer[-(-128) + 127 + 1];

    static {
        for(int i = 0; i < cache.length; i++)
            cache[i] = new Integer(i - 128);
    }
}

Java 1.6.0_14+ - IntegerCache implementation

The new, refactored implementation is below, and there are a few things to note. For starters, the new implementation has some actual comments, unlike the initial version. Secondly, I've also included the static method static void getAndRemoveCacheProperties(), which reads the system property used to set the cache size. I figured it would be interesting to show here. Also, IntegerCache uses the field this method populates, so it's good to see where that value comes from. The only liberty I have taken is the order: I put the IntegerCache class first, followed by the other method and field, which is the reverse of the actual code. That makes it easy to compare the two IntegerCache versions above and below, so keep this in mind:

private static class IntegerCache {
    static final int high;
    static final Integer cache[];

    static {
        final int low = -128;

        // high value may be configured by property
        int h = 127;
        if (integerCacheHighPropValue != null) {
            //Use Long.decode here to avoid invoking methods that require Integer's autoboxing cache to be initialized
            int i = Long.decode(integerCacheHighPropValue).intValue();
            i = Math.max(i, 127);
            // Maximum array size is Integer.MAX_VALUE
            h = Math.min(i, Integer.MAX_VALUE - -low);
        }
        high = h;

        cache = new Integer[(high - low) + 1];
        int j = low;
        for(int k = 0; k < cache.length; k++)
            cache[k] = new Integer(j++);
    }
    private IntegerCache() {}
}

/**
 * Cache to support the object identity semantics of autoboxing for values between -128 and 127 (inclusive) as required by JLS. 
 * The cache is initialized on first usage. During VM initialization the getAndRemoveCacheProperties method may be used to get and remove any 
 * system properites that configure cache size. At this time, size of cache may be controlled by the vm option -XX:AutoBoxCacheMax=<size>.
 */
// value of java.lang.Integer.IntegerCache.high property (obtained during VM init)
private static String integerCacheHighPropValue;

static void getAndRemoveCacheProperties() {
    if (!sun.misc.VM.isBooted()) {
        Properties props = System.getProperties();
        integerCacheHighPropValue = (String)props.remove("java.lang.Integer.IntegerCache.high");
        if (integerCacheHighPropValue != null)
            System.setProperties(props);  // remove from system props
    }
}
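The get-and-remove trick in getAndRemoveCacheProperties() relies on Properties.remove() returning the value it removed. Below is a minimal sketch of that pattern using a hypothetical helper. It works on a private Properties object rather than System.getProperties(), so it doesn't change the real JVM's properties.

```java
import java.util.Properties;

class PropertyRemovalSketch {
    // Hypothetical helper modelled on getAndRemoveCacheProperties(): reads a property
    // and removes it in one step, because Properties.remove() returns the removed value
    // (or null if the key was absent).
    static String getAndRemove(Properties props, String key) {
        return (String) props.remove(key);
    }

    public static void main(String[] args) {
        // A private Properties object stands in for System.getProperties().
        Properties props = new Properties();
        props.setProperty("java.lang.Integer.IntegerCache.high", "1000");

        String high = getAndRemove(props, "java.lang.Integer.IntegerCache.high");
        System.out.println(high); // 1000
        System.out.println(props.getProperty("java.lang.Integer.IntegerCache.high")); // null
    }
}
```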
 

Java 1.5.0_20 - Java 1.6.0_14+ - IntegerCache implementation comparisons/diffs.

Ok, so let's look at the diffs. For starters, let's look at the IntegerCache class itself. In both cases it's a private static nested class, and the general layout is similar: an Integer[] array holds the cached Integer objects. However, the introduction of the property for setting the cache size changes the way this cache array is defined. Let's take a look at the two definitions, Java 1.5 first, followed by the refactored version:

    static final Integer cache[] = new Integer[-(-128) + 127 + 1];

    static final Integer cache[];

So as you can see, the initial Java 1.5 version was simpler: it defined the array size in the declaration because it knew the size straight up. The new version doesn't know the size until it has checked the appropriate system property, so it can't define the array size until later. It's also interesting that the array size in the first version is written as -(-128) + 127 + 1 rather than 256. I can see the reason: the expression refers directly to the cache range (-128 to 127), which makes it clearer why that size was chosen. And it costs nothing at runtime. An expression made up only of literals is a compile-time constant expression, so javac folds it into the single value 256.
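You can convince yourself that the size expression is folded at compile time by using it in a switch. A case label must be a constant expression, so the sketch below would not even compile if -(-128) + 127 + 1 were only evaluated at runtime. The names are hypothetical.

```java
class CacheSizeConstant {
    // Same expression as the Java 1.5 array size. Every operand is a literal, so this
    // is a compile-time constant expression and javac folds it to 256.
    static final int CACHE_SIZE = -(-128) + 127 + 1;

    // A case label must be a constant expression, so this only compiles because
    // CACHE_SIZE is a constant variable.
    static String describe(int n) {
        switch (n) {
            case CACHE_SIZE:
                return "cache size";
            default:
                return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(CACHE_SIZE);    // 256
        System.out.println(describe(256)); // cache size
    }
}
```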

Ok, so what else? Well, the Java 1.5 version fills the cache like so:

    for(int i = 0; i < cache.length; i++)
        cache[i] = new Integer(i - 128);

This is a simple for loop over the indices 0 to cache.length - 1. The (i - 128) shifts each index into the value range, so index 0 holds -128 and index 255 holds 127. The new version first has to decode the system property. The property, java.lang.Integer.IntegerCache.high, is read by static void getAndRemoveCacheProperties() and stored in integerCacheHighPropValue. Then, before setting up the cache, the static initialiser works out the high end of the range from that value (the low end stays fixed at -128). The next few lines do the following:

  1. int i = Long.decode(integerCacheHighPropValue).intValue(); - Decode the property String into an int primitive and assign it to a temporary variable 'i'. As the comment above the line says, this uses Long.decode() and then converts to an int, rather than the simpler Integer.decode(). The reason is that we are in the middle of initialising the cache for Integer. That means we have to stay away from Integer methods that need that cache, and Integer.decode() is one of them, since it returns its result via Integer.valueOf().
  2. i = Math.max(i, 127); - Take the larger of the property value and 127. This makes sure the upper bound is AT LEAST the default of 127 that the JLS requires. If you pass in a value smaller than 127, this line ensures that 127 is used as the upper bound of the cache.
  3. h = Math.min(i, Integer.MAX_VALUE - -low); - Cap the upper bound. It seems counter-intuitive right after the Math.max() call above, but this call is a safety measure. The array length is computed as (high - low) + 1, and a large enough property value would make that calculation overflow the int range. Taking the min() against Integer.MAX_VALUE - -low is meant to prevent that overflow.
  4. Finally, now that the low and high values are known, the Integer[] cache is set up in much the same way as in the first version, with a loop creating new Integer() instances over the low to high range.
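The four steps can be sketched outside the JDK as plain static methods. The names computeHigh and cacheLength are hypothetical. One detail also differs from the listing: the clamp subtracts one more than the 1.6 code does. With the 1.6 expression, Integer.MAX_VALUE - -low, the length (high - low) + 1 comes out at Integer.MAX_VALUE + 1 at the very extreme. Later JDKs (e.g. Java 8) clamp with Integer.MAX_VALUE - (-low) - 1 instead, and the sketch follows that.

```java
class CacheHighSketch {
    static final int LOW = -128; // the low end is fixed, as in the JDK code

    // Hypothetical helper mirroring steps 1-3: works out the cache's upper bound
    // from the property string (null means the property was not set).
    static int computeHigh(String propValue) {
        int h = 127; // default upper bound
        if (propValue != null) {
            // Step 1: Long.decode accepts decimal, hex ("0x", "#") and octal ("0") forms
            int i = Long.decode(propValue).intValue();
            // Step 2: never go below the 127 the JLS requires
            i = Math.max(i, 127);
            // Step 3: keep (h - LOW) + 1 within int range (the extra -1 follows later JDKs)
            h = Math.min(i, Integer.MAX_VALUE - (-LOW) - 1);
        }
        return h;
    }

    // Step 4: how many Integer objects the cache array would hold
    static int cacheLength(int high) {
        return (high - LOW) + 1;
    }

    public static void main(String[] args) {
        System.out.println(computeHigh(null));                // 127
        System.out.println(computeHigh("1000"));              // 1000
        System.out.println(computeHigh("0x3E8"));             // 1000
        System.out.println(computeHigh("50"));                // 127
        System.out.println(cacheLength(computeHigh("1000"))); // 1129
    }
}
```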

Now, one thing I found out while examining this code was unexpected. Suppose I pass in the VM option -XX:AutoBoxCacheMax=<size> with size=1000. Before looking at this, I would have thought the cache would then cover -1000 to 999, or perhaps -1001 to 1000. In actual fact, the range of the cache in this case is -128 to 1000. It's an interesting side effect of this code that you cannot add more negative values to the cache beyond the default -128 to -1, even though you can add higher positive values. I think that in the future a java.lang.Integer.IntegerCache.low property should be added, with a VM option such as -XX:AutoBoxCacheLow=<size>. In fact, I intend to suggest this to the JDK7 project as a minor API/VM change (along with a patch for the code). I can see no reason why a similar property for the low end of the cache wouldn't be acceptable, given that the high end is already configurable. Watch this space for further info on how I go :-) .
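Here is a small check of the low end. It assumes a HotSpot-style JVM where only the upper bound is configurable, and the class name is hypothetical. isCached() compares two valueOf() results by reference, so it is true only when both calls return the same cached object.

```java
class CacheLowerBoundDemo {
    // True if two valueOf calls for v return the very same Integer object.
    static boolean isCached(int v) {
        return Integer.valueOf(v) == Integer.valueOf(v);
    }

    public static void main(String[] args) {
        // The low end stays at -128 whatever -XX:AutoBoxCacheMax is set to.
        System.out.println(isCached(-128)); // true
        System.out.println(isCached(-129)); // false
        // The high end defaults to 127 but moves with -XX:AutoBoxCacheMax, so this
        // prints false by default and true when run with -XX:AutoBoxCacheMax=1000.
        System.out.println(isCached(1000));
    }
}
```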

Integer.valueOf(int i)

public static Integer valueOf(int i) Description and Purpose

The provision of the IntegerCache has led to a new rule/pattern for creating Integer instances. The public static Integer valueOf(int i) method was added to the Integer class API in Java 1.5, and it returns an Integer instance representing the specified int value. Often a new Integer instance is not specifically required. In most cases there is no reason to need a second Integer object for the same int value, since it would only differ in object identity; its hashCode would be the same anyway. In those cases valueOf() should generally be used in preference to the constructor new Integer(int). It is likely to yield significantly better space and time performance by caching frequently requested values.
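Here is a short sketch (hypothetical names) contrasting the constructor with valueOf(). The two approaches differ only in object identity. equals() and hashCode() agree either way, since an Integer's hash code is its int value. The constructor is deprecated in current JDKs, hence the @SuppressWarnings.

```java
class ValueOfVsConstructor {
    @SuppressWarnings({"deprecation", "removal"}) // new Integer(int) is deprecated since Java 9
    static Integer viaConstructor(int v) {
        return new Integer(v); // always allocates a fresh object
    }

    static Integer viaValueOf(int v) {
        return Integer.valueOf(v); // returns the shared cached object for small values
    }

    public static void main(String[] args) {
        Integer a = viaConstructor(42), b = viaConstructor(42);
        Integer c = viaValueOf(42), d = viaValueOf(42);

        System.out.println(a == b);                       // false: two distinct objects
        System.out.println(c == d);                       // true: same cached object
        System.out.println(a.equals(c));                  // true: equal values
        System.out.println(a.hashCode() == c.hashCode()); // true: hashCode is the int value
    }
}
```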

Java 1.5.0_20 - Java 1.6.0_14+ - Integer.valueOf() implementation comparisons/diffs.

As a final side note for this blog entry, I want to take a quick look at the valueOf() method and specifically at the differences between its implementation in the two versions. They are shown below:

This is Java1.5 Integer.valueOf(int i):

    public static Integer valueOf(int i) {
        final int offset = 128;
        if (i >= -128 && i <= 127) { // must cache 
            return IntegerCache.cache[i + offset];
        }
        return new Integer(i);
    }

This is Java1.6 Integer.valueOf(int i):

    public static Integer valueOf(int i) {
        if(i >= -128 && i <= IntegerCache.high)
            return IntegerCache.cache[i + 128];
        else
            return new Integer(i);
    }

While the two implementations are almost identical, it's interesting to see the minor differences between them. The two are the same in intent, so it's curious that the developer decided to make these changes when they make no functional difference. In particular, the older version defines a final int offset = 128, whereas the newer version removes this local variable and just uses the number (a magic number) in the code. Why would they do this? It's better coding practice not to use magic numbers. The newer version also uses an 'else' clause to return new Integer(i) when not returning from the cache, whereas the older version had no else. Like I said, in terms of intent there is no difference, so why these two changes? One change that does have a purpose is the use of IntegerCache.high rather than 127 in the if statement comparison. This is required because the cache's upper bound is now configurable, so the code can no longer hard-code 127.
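To make the lookup logic concrete without touching JDK internals, here is a hypothetical BoxedInt class that caches its own instances the same way. It keeps the offset in a named constant, the style the Java 1.5 version used. Its upper bound is hard-coded to 1000 as a stand-in for the property.

```java
// Hypothetical value class; only the caching logic mirrors Integer.valueOf.
final class BoxedInt {
    private static final int LOW = -128;
    private static final int HIGH = 1000; // stand-in for a configured upper bound
    private static final BoxedInt[] CACHE = new BoxedInt[(HIGH - LOW) + 1];

    static {
        // CACHE[k] holds the value k + LOW, so index 0 is -128
        for (int k = 0; k < CACHE.length; k++) {
            CACHE[k] = new BoxedInt(k + LOW);
        }
    }

    private final int value;

    private BoxedInt(int value) {
        this.value = value;
    }

    // Same shape as the Java 1.6 valueOf: range check, then offset lookup.
    static BoxedInt valueOf(int i) {
        if (i >= LOW && i <= HIGH) {
            return CACHE[i - LOW]; // i - LOW is i + 128
        }
        return new BoxedInt(i); // outside the range: a fresh object every time
    }

    int intValue() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(valueOf(1000) == valueOf(1000)); // true: inside -128..1000
        System.out.println(valueOf(1001) == valueOf(1001)); // false: outside the range
        System.out.println(valueOf(-129).intValue());       // -129
    }
}
```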

