Saturday, 7 July 2012

Item 75: Consider using a custom serialized form


Do not accept the default serialized form without first considering whether it is appropriate. Accepting the default serialized form should be a conscious decision that this encoding is reasonable from the standpoint of flexibility, performance, and correctness. Generally speaking, you should accept the default serialized form only if it is largely identical to the encoding that you would choose if you were designing a custom serialized form.

The default serialized form is likely to be appropriate if an object’s physical representation is identical to its logical content. For example, the default serialized form would be reasonable for the following class, which simplistically represents a person’s name:

// Good candidate for default serialized form
public class Name implements Serializable {
/**
* Last name. Must be non-null.
* @serial
*/
private final String lastName;
/**
* First name. Must be non-null.
* @serial
*/
private final String firstName;
/**
* Middle name, or null if there is none.
* @serial
*/
private final String middleName;
... // Remainder omitted
}

Logically speaking, a name consists of three strings that represent a last name, a first name, and a middle name. The instance fields in Name precisely mirror this logical content.

Even if you decide that the default serialized form is appropriate, you often must provide a readObject method to ensure invariants and security. In the case of Name, the readObject method must ensure that lastName and first- Name are non-null. This issue is discussed at length in Items 76 and 78.

Note that there are documentation comments on the lastName, firstName, and middleName fields, even though they are private. That is because these private fields define a public API, which is the serialized form of the class, and this public API must be documented. The presence of the @serial tag tells the Javadoc utility to place this documentation on a special page that documents serialized forms.

Near the opposite end of the spectrum from Name, consider the following class, which represents a list of strings (ignoring for the moment that you’d be better off using one of the standard List implementations):

// Awful candidate for default serialized form
public final class StringList implements Serializable {
private int size = 0;
private Entry head = null;
private static class Entry implements Serializable {
String data;
Entry next;
Entry previous;
}
... // Remainder omitted
}

Logically speaking, this class represents a sequence of strings. Physically, it represents the sequence as a doubly linked list. If you accept the default serialized form, the serialized form will painstakingly mirror every entry in the linked list and all the links between the entries, in both directions.

Using the default serialized form when an object’s physical representation differs substantially from its logical data content has four disadvantages:

It permanently ties the exported API to the current internal representation. In the above example, the private StringList.Entry class becomes part of the public API. If the representation is changed in a future release, the StringList class will still need to accept the linked list representation on input and generate it on output. The class will never be rid of all the code dealing with linked list entries, even if it doesn’t use them anymore.

It can consume excessive space. In the above example, the serialized form unnecessarily represents each entry in the linked list and all the links. These entries and links are mere implementation details, not worthy of inclusion in the serialized form. Because the serialized form is excessively large, writing it to disk or sending it across the network will be excessively slow.

It can consume excessive time. The serialization logic has no knowledge of the topology of the object graph, so it must go through an expensive graph traversal. In the example above, it would be sufficient simply to follow the next references.

It can cause stack overflows. The default serialization procedure performs a recursive traversal of the object graph, which can cause stack overflows even for moderately sized object graphs. Serializing a StringList instance with 1,258 elements causes the stack to overflow on my machine. The number of elements required to cause this problem may vary depending on the JVM implementation and command line flags; some implementations may not have this problem at all.

A reasonable serialized form for StringList is simply the number of strings in the list, followed by the strings themselves. This constitutes the logical data represented by a StringList, stripped of the details of its physical representation. Here is a revised version of StringList containing writeObject and readObject methods implementing this serialized form. As a reminder, the transient modifier indicates that an instance field is to be omitted from a class’s default serialized form:

// StringList with a reasonable custom serialized form
public final class StringList implements Serializable {
private transient int size = 0;
private transient Entry head = null;
// No longer Serializable!
private static class Entry {
String data;
Entry next;
Entry previous;
}
// Appends the specified string to the list
public final void add(String s) { ... }
/**
* Serialize this {@code StringList} instance.
*
* @serialData The size of the list (the number of strings
* it contains) is emitted ({@code int}), followed by all of
* its elements (each a {@code String}), in the proper
* sequence.
*/
private void writeObject(ObjectOutputStream s)
throws IOException {
s.defaultWriteObject();
s.writeInt(size);
// Write out all elements in the proper order.
for (Entry e = head; e != null; e = e.next)
s.writeObject(e.data);
}
private void readObject(ObjectInputStream s)
throws IOException, ClassNotFoundException {
s.defaultReadObject();
int numElements = s.readInt();
// Read in all elements and insert them in list
for (int i = 0; i < numElements; i++)
add((String) s.readObject());
}
... // Remainder omitted
}

Note that the first thing writeObject does is to invoke defaultWriteObject, and the first thing readObject does is to invoke defaultReadObject, even though all of StringList’s fields are transient. If all instance fields are transient, it is technically permissible to dispense with invoking defaultWriteObject and defaultReadObject, but it is not recommended.

Before deciding to make a field nontransient, convince yourself that its value is part of the logical state of the object. If you use a custom serialized form, most or all of the instance fields should be labeled transient, as in the StringList example shown above.

If you are using the default serialized form and you have labeled one or more fields transient, remember that these fields will be initialized to their default values when an instance is deserialized: null for object reference fields, zero for umeric primitive fields, and false for boolean fields [JLS, 4.12.5]. If these values are unacceptable for any transient fields, you must provide a readObject method that invokes the defaultReadObject method and then restores transient fields to acceptable values (Item 76). Alternatively, these fields can be lazily initialized the first time they are used (Item 71).

Whether or not you use the default serialized form, you must impose any synchronization on object serialization that you would impose on any other method that reads the entire state of the object. So, for example, if you have a thread-safe object (Item 70) that achieves its thread safety by synchronizing every method, and you elect to use the default serialized form, use the following writeObject method:

// writeObject for synchronized class with default serialized form
private synchronized void writeObject(ObjectOutputStream s)
throws IOException {
s.defaultWriteObject();
}

If you put synchronization in the writeObject method, you must ensure that it adheres to the same lock-ordering constraints as other activity, or you risk a resource-ordering deadlock [Goetz06, 10.1.5].

Regardless of what serialized form you choose, declare an explicit serial version UID in every serializable class you write. This eliminates the serial version UID as a potential source of incompatibility (Item 74). There is also a small performance benefit. If no serial version UID is provided, an expensive computation is required to generate one at runtime.

Declaring a serial version UID is simple. Just add this line to your class:

private static final long serialVersionUID = randomLongValue ;

To summarize, when you have decided that a class should be serializable (Item 74), think hard about what the serialized form should be. Use the default serialized form only if it is a reasonable description of the logical state of the object; otherwise design a custom serialized form that aptly describes the object. You should allocate as much time to designing the serialized form of a class as you allocate to designing its exported methods (Item 40). Just as you cannot eliminate exported methods from future versions, you cannot eliminate fields from the serialized form; they must be preserved forever to ensure serialization compatibility. Choosing the wrong serialized form can have a permanent, negative impact on the complexity and performance of a class.


Reference: Effective Java 2nd Edition by Joshua Bloch