Saturday 7 July 2012

Item 77: For instance control, prefer enum types to readResolve


Item 3 describes the Singleton pattern and gives the following example of a singleton class. This class restricts access to its constructor to ensure that only a single instance is ever created:

public class Elvis {
public static final Elvis INSTANCE = new Elvis();
private Elvis() { ... }
public void leaveTheBuilding() { ... }
}

As noted in Item 3, this class would no longer be a singleton if the words “implements Serializable” were added to its declaration. It doesn’t matter whether the class uses the default serialized form or a custom serialized form (Item 75), nor does it matter whether the class provides an explicit readObject method (Item 76). Any readObject method, whether explicit or default, returns a newly created instance, which will not be the same instance that was created at class initialization time.

The readResolve feature allows you to substitute another instance for the one created by readObject [Serialization, 3.7]. If the class of an object being deserialized defines a readResolve method with the proper declaration, this method is invoked on the newly created object after it is deserialized. The object reference returned by this method is then returned in place of the newly created object. In most uses of this feature, no reference to the newly created object is retained, so it immediately becomes eligible for garbage collection.

If the Elvis class is made to implement Serializable, the following readResolve method suffices to guarantee the singleton property:

// readResolve for instance control - you can do better!
private Object readResolve() {
// Return the one true Elvis and let the garbage collector
// take care of the Elvis impersonator.
return INSTANCE;
}

This method ignores the deserialized object, returning the distinguished Elvis instance that was created when the class was initialized. Therefore, the serialized form of an Elvis instance need not contain any real data; all instance fields should be declared transient. In fact, if you depend on readResolve for instance control, all instance fields with object reference types must be declared transient. Otherwise, it is possible for a determined attacker to secure a reference to the deserialized object before its readResolve method is run, using a technique that is vaguely similar to the MutablePeriod attack in Item 76.

The attack is a bit complicated, but the underlying idea is simple. If a singleton contains a nontransient object reference field, the contents of this field will be deserialized before the singleton’s readResolve method is run. This allows a carefully crafted stream to “steal” a reference to the originally deserialized singleton at the time the contents of the object reference field are deserialized.

To make this concrete, consider the following broken singleton:

// Broken singleton - has nontransient object reference field!
public class Elvis implements Serializable {
public static final Elvis INSTANCE = new Elvis();
private Elvis() { }
private String[] favoriteSongs =
{ "Hound Dog", "Heartbreak Hotel" };
public void printFavorites() {
System.out.println(Arrays.toString(favoriteSongs));
}
private Object readResolve() throws ObjectStreamException {
return INSTANCE;
}
}

You could fix the problem by declaring the favorites field transient, but you’re better off fixing it by making Elvis a single-element enum type (Item 3). Historically, the readResolve method was used for all serializable instance-controlled classes. As of release 1.5, this is no longer the best way to maintain instance control in a serializable class. As demonstrated by the ElvisStealer attack, this technique is fragile and demands great care.

If instead you write your serializable instance-controlled class as an enum, you get an ironclad guarantee that there can be no instances besides the declared constants. The JVM makes this guarantee, and you can depend on it. It requires no special care on your part. Here’s how our Elvis example looks as an enum:

// Enum singleton - the preferred approach
public enum Elvis {
INSTANCE;
private String[] favoriteSongs =
{ "Hound Dog", "Heartbreak Hotel" };
public void printFavorites() {
System.out.println(Arrays.toString(favoriteSongs));
}
}

The use of readResolve for instance control is not obsolete. If you have to write a serializable instance-controlled class whose instances are not known at compile time, you will not be able to represent the class as an enum type. The accessibility of readResolve is significant. If you place a readResolve method on a final class, it should be private. If you place a readResolve method on a nonfinal class, you must carefully consider its accessibility. If it is private, it will not apply to any subclasses. If it is package-private, it will apply only to subclasses in the same package. If it is protected or public, it will apply to all subclasses that do not override it. If a readResolve method is protected or public and a subclass does not override it, deserializing a serialized subclass instance will produce a superclass instance, which is likely to cause a ClassCastException.

To summarize, you should use enum types to enforce instance control invariants wherever possible. If this is not possible and you need a class to be both serializable and instance-controlled, you must provide a readResolve method and ensure that all of the class’s instance fields are either primitive or transient.


Reference: Effective Java 2nd Edition by Joshua Bloch