Saturday, 7 July 2012

Item 9: Always override hashCode when you override equals


A common source of bugs is the failure to override the hashCode method. You must override hashCode in every class that overrides equals.

Here is the contract, copied from the Object specification [JavaSE6]:

• Whenever it is invoked on the same object more than once during an execution of an application, the hashCode method must consistently return the same integer.

• If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

• It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results.

The key provision that is violated when you fail to override hashCode is the second one: equal objects must have equal hash codes.

For example, consider the following simplistic PhoneNumber class, whose equals method is constructed according to the recipe in Item 8, but doesn’t override hashCode method. Suppose you attempt to use this class with a HashMap:

Map<PhoneNumber, String> m
= new HashMap<PhoneNumber, String>();
m.put(new PhoneNumber(707, 867, 5309), "Jenny");

At this point, you might expect m.get(new PhoneNumber(707, 867, 5309)) to return "Jenny", but it returns null. Notice that two PhoneNumber instances are involved: one is used for insertion into the HashMap, and a second, equal, instance is used for (attempted) retrieval. The PhoneNumber class’s failure to override hashCode causes the two equal instances to have unequal hash codes, in violation of the hashCode contract.

Fixing this problem is as simple as providing a proper hashCode method for the PhoneNumber class. So what should a hashCode method look like? It’s trivial to write one that is legal but not good. This one, for example, is always legal but should never be used:

// The worst possible legal hash function - never use!
@Override public int hashCode() { return 42; }

It ensures that every object has the same hash code. Therefore, every object hashes to the same bucket.

Luckily it’s not too difficult to achieve a fair approximation. Here is a simple recipe:

1. Store some constant nonzero value, say, 17, in an int variable called result.

2. For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
a. Compute an int hash code c for the field:
i. If the field is a boolean, compute (f ? 1 : 0).
ii. If the field is a byte, char, short, or int, compute (int) f.
iii. If the field is a long, compute (int) (f ^ (f >>> 32)).
iv. If the field is a float, compute Float.floatToIntBits(f).
v. If the field is a double, compute Double.doubleToLongBits(f), and then hash the resulting long as in step 2.a.iii.
vi. If the field is an object reference and this class’s equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If a more complex comparison is required, compute a “canonical representation” for this field and
invoke hashCode on the canonical representation. If the value of the field is null, return 0 (or some other constant, but 0 is traditional).
vii. If the field is an array, treat it as if each element were a separate field. That is, compute a hash code for each significant element by applying these rules recursively, and combine these values per step 2.b. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
b. Combine the hash code c computed in step 2.a into result as follows:
result = 31 * result + c;

3. Return result.

4. When you are finished writing the hashCode method, ask yourself whether equal instances have equal hash codes. Write unit tests to verify your intuition! If equal instances have unequal hash codes, figure out why and fix the problem.

The value 17 is arbitrary. The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance:
31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically. Let’s apply the above recipe to the PhoneNumber class. There are three significant fields, all of type short:

@Override public int hashCode() {
int result = 17;
result = 31 * result + areaCode;
result = 31 * result + prefix;
result = 31 * result + lineNumber;
return result;
}

If a class is immutable and the cost of computing the hash code is significant, you might consider caching the hash code in the object rather than recalculating it each time it is requested. You might choose to lazily initialize it the first time hashCode is invoked (Item 71).

// Lazily initialized, cached hashCode
private volatile int hashCode; // (See Item 71)
@Override public int hashCode() {
int result = hashCode;
if (result == 0) {
result = 17;
result = 31 * result + areaCode;
result = 31 * result + prefix;
result = 31 * result + lineNumber;
hashCode = result;
}
return result;
}

Do not be tempted to exclude significant parts of an object from the hash code computation to improve performance.

Reference: Effective Java 2nd Edition by Joshua Bloch