The Java serialization algorithm revealed

2009-07-28 11:35

Submitted by javatips on Thu, 05/07/2009 - 15:28.

From http://www.javaworld.com/community/node/2915


Serialization is the process of saving an object's state to a sequence of
bytes; deserialization
is the process of rebuilding those bytes into a live object. The Java
Serialization API provides a standard mechanism for developers to
handle object serialization. In this tip, you will see how to serialize
an object, and why serialization is sometimes necessary. You'll learn
about the serialization algorithm used in Java, and see an example that
illustrates the serialized format of an object. By the time you're
done, you should have a solid knowledge of how the serialization
algorithm works and what entities are serialized as part of the object
at a low level.

Why is serialization required?

A typical enterprise application has multiple components and is
distributed across various systems and networks. In Java, almost all
application state is represented as objects; if two Java components want
to communicate with each other, there needs to be a mechanism to exchange
that data. One way to achieve this is to define your own protocol and
transfer an object. This means that the receiving end must know the
protocol used by the sender to re-create the object, which would make it
very difficult to talk to third-party components. Hence, there needs to
be a generic and efficient protocol to transfer the object between
components. Serialization is defined for this purpose, and Java
components use this protocol to transfer objects.

Figure 1 shows a high-level view of client/server communication,
where an object is transferred from the client to the server through
serialization.



Figure 1. A high-level view of serialization in action

How to serialize an object

In order to serialize an object, you need to ensure that the class of the
object implements the java.io.Serializable interface, as shown in
Listing 1.

Listing 1. Implementing Serializable

import java.io.Serializable;

class TestSerial implements Serializable {
    public byte version = 100;
    public byte count = 0;
}

In Listing 1, the only thing you had to do differently from creating a
normal class is implement the java.io.Serializable interface. The
Serializable interface is a marker interface; it declares no methods at
all. It tells the serialization mechanism that the class can be
serialized.
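
To make the role of the marker interface concrete, here is a minimal sketch
of what happens when it is missing. The classes PlainPoint, MarkedPoint, and
MarkerDemo are hypothetical names invented for this illustration; they are
not part of the original tip.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class that does NOT implement the marker interface.
class PlainPoint {
    int x = 1;
}

// Hypothetical class that opts in to serialization.
class MarkedPoint implements Serializable {
    int x = 1;
}

class MarkerDemo {
    // Returns true if writeObject() accepts the object, false if the
    // serialization mechanism rejects it with NotSerializableException.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new MarkedPoint())); // true
        System.out.println(canSerialize(new PlainPoint()));  // false
    }
}
```

The two classes have identical fields; only the implements clause decides
whether writeObject() accepts them.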

Now that you have made the class eligible for serialization, the next
step is to actually serialize the object. That is done by calling the
writeObject() method of the java.io.ObjectOutputStream class, as shown in
Listing 2.

Listing 2. Calling writeObject()

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public static void main(String[] args) throws IOException {
    // try-with-resources flushes and closes the stream, even on failure
    try (ObjectOutputStream oos =
             new ObjectOutputStream(new FileOutputStream("temp.out"))) {
        TestSerial ts = new TestSerial();
        oos.writeObject(ts);
    }
}

Listing 2 stores the state of the TestSerial object in a file called
temp.out. The call oos.writeObject(ts); actually kicks off the
serialization algorithm, which in turn writes the object to temp.out.

To re-create the object from the persistent file, you would employ the code in Listing 3.

Listing 3. Recreating a serialized object

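import java.io.FileInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;

// readObject() can throw ClassNotFoundException, so it must be declared
public static void main(String[] args)
        throws IOException, ClassNotFoundException {
    try (ObjectInputStream oin =
             new ObjectInputStream(new FileInputStream("temp.out"))) {
        TestSerial ts = (TestSerial) oin.readObject();
        System.out.println("version=" + ts.version);
    }
}
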
In Listing 3, the object's restoration occurs with the oin.readObject()
method call. This method call reads in the raw bytes that we previously
persisted and creates a live object that is an exact replica of the
original object graph. Because readObject() is declared to return Object,
a cast to the correct type is required.

Executing this code will print version=100 on the standard output.
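
The write and read steps can also be combined into one in-memory round
trip, which is handy for experimenting without a temp.out file. The sketch
below assumes the TestSerial class from Listing 1; the RoundTrip class and
its deepCopy helper are hypothetical names introduced only for this
example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Same shape as the TestSerial class in Listing 1.
class TestSerial implements Serializable {
    public byte version = 100;
    public byte count = 0;
}

class RoundTrip {
    // Hypothetical helper: serializes an object to a byte array and
    // immediately deserializes it again, returning the rebuilt copy.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T original) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
                oos.writeObject(original);
            }
            try (ObjectInputStream oin = new ObjectInputStream(
                     new ByteArrayInputStream(buf.toByteArray()))) {
                return (T) oin.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        TestSerial original = new TestSerial();
        original.count = 7;
        TestSerial copy = deepCopy(original);
        // The copy carries the same state but is a distinct instance.
        System.out.println("version=" + copy.version + " count=" + copy.count);
        System.out.println("same instance: " + (copy == original));
    }
}
```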

The serialized format of an object

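What does the serialized version of the object look like? Remember, the
sample code in the previous section saved the serialized version of the
TestSerial object into the file temp.out. Listing 4 shows the contents of
temp.out, displayed in hexadecimal. (You need a hexadecimal editor to see
the output in hexadecimal format.) Note that the class name encoded in
the dump (53 65 72 69 61 6C 54 65 73 74) decodes to SerialTest rather
than TestSerial, so the dump appears to have come from a class with that
name; the layout is otherwise the same.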

Listing 4. Hexadecimal form of TestSerial

AC ED 00 05 73 72 00 0A 53 65 72 69 61 6C 54 65
73 74 A0 0C 34 00 FE B1 DD F9 02 00 02 42 00 05
63 6F 75 6E 74 42 00 07 76 65 72 73 69 6F 6E 78
70 00 64

If you look again at the actual TestSerial object, you'll see that it has
only two byte members, as shown in Listing 5.

Listing 5. TestSerial's byte members

public byte version = 100;
public byte count = 0;

The size of a byte variable is one byte, and hence the total size of
the object (without the header) is two bytes. But if you look at the
size of the serialized object in Listing 4, you'll see 51 bytes.
Surprise! Where did the extra bytes come from, and what is their
significance? They are introduced by the serialization algorithm, and
are required in order to re-create the object. In the next section,
you'll explore this algorithm in detail.
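
If you don't have a hexadecimal editor at hand, a few lines of Java can
produce a similar dump. This is only a sketch: HexDump, serialize, and
toHex are hypothetical names, and the serialVersionUID bytes you see may
differ from those in Listing 4.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Same shape as the TestSerial class in Listing 1.
class TestSerial implements Serializable {
    public byte version = 100;
    public byte count = 0;
}

class HexDump {
    // Serializes an object into a byte array instead of a file.
    static byte[] serialize(Object o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Formats bytes as uppercase hex pairs, 16 per line.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(i % 16 == 0 ? '\n' : ' ');
            }
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new TestSerial());
        System.out.println(toHex(bytes));
        // Far more than the two bytes of field data.
        System.out.println(bytes.length + " bytes");
    }
}
```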

Java's serialization algorithm

By now, you should have a pretty good knowledge of how to serialize
an object. But how does the process work under the hood? In general the
serialization algorithm does the following:

It writes out the metadata of the class associated with an instance.

It recursively writes out the description of the superclass until it
finds java.lang.Object.

Once it finishes writing the metadata information, it then starts
with the actual data associated with the instance. But this time, it
starts from the topmost superclass.

It recursively writes the data associated with the instance, starting
from the topmost superclass and ending with the most-derived class.
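
The first two steps can be approximated with the public
java.io.ObjectStreamClass API, which exposes the same class metadata that
ends up in the stream. The sketch below only mirrors the order in which
class descriptions are visited; it does not write a stream. Base, Derived,
and DescriptorWalk are hypothetical names used for illustration.

```java
import java.io.ObjectStreamClass;
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-level hierarchy.
class Base implements Serializable {
    int baseValue = 1;
}

class Derived extends Base {
    int derivedValue = 2;
}

class DescriptorWalk {
    // Lists "className typeCode:fieldName ..." for each serializable
    // class, starting at the most-derived class and moving upward.
    static List<String> describe(Class<?> type) {
        List<String> lines = new ArrayList<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            ObjectStreamClass desc = ObjectStreamClass.lookup(c);
            if (desc == null) {
                break; // not serializable, e.g. java.lang.Object
            }
            StringBuilder sb = new StringBuilder(desc.getName());
            for (ObjectStreamField f : desc.getFields()) {
                sb.append(' ').append(f.getTypeCode())
                  .append(':').append(f.getName());
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe(Derived.class)) {
            System.out.println(line);
        }
    }
}
```

The descriptions come out most-derived class first, which is the opposite
of the order in which the instance data is written.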

I've written a different example object for this section that covers
the main cases: a superclass, a primitive field, and a contained object.
The new sample object to be serialized is shown in Listing 6.

Listing 6. Sample serialized object

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// The lowercase class names are kept as written because the byte dump
// in Listing 7 encodes them.
class parent implements Serializable {
    int parentVersion = 10;
}

class contain implements Serializable {
    int containVersion = 11;
}

public class SerialTest extends parent implements Serializable {
    int version = 66;
    contain con = new contain();

    public int getVersion() {
        return version;
    }

    public static void main(String[] args) throws IOException {
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new FileOutputStream("temp.out"))) {
            SerialTest st = new SerialTest();
            oos.writeObject(st);
        }
    }
}

This example is a straightforward one. It serializes an object of type
SerialTest, which is derived from parent and holds a reference to another
object of type contain. The serialized format of this object is shown in
Listing 7.

Listing 7. Serialized form of sample object

AC ED 00 05 73 72 00 0A 53 65 72 69 61 6C 54 65
73 74 05 52 81 5A AC 66 02 F6 02 00 02 49 00 07
76 65 72 73 69 6F 6E 4C 00 03 63 6F 6E 74 00 09
4C 63 6F 6E 74 61 69 6E 3B 78 72 00 06 70 61 72
65 6E 74 0E DB D2 BD 85 EE 63 7A 02 00 01 49 00
0D 70 61 72 65 6E 74 56 65 72 73 69 6F 6E 78 70
00 00 00 0A 00 00 00 42 73 72 00 07 63 6F 6E 74
61 69 6E FC BB E6 0E FB CB 60 C7 02 00 01 49 00
0E 63 6F 6E 74 61 69 6E 56 65 72 73 69 6F 6E 78
70 00 00 00 0B

Figure 2 offers a high-level look at the serialization algorithm for this scenario.



Figure 2. An outline of the serialization algorithm

Let's go through the serialized format of the object in detail and
see what each byte represents. Begin with the serialization protocol
information:

AC ED: STREAM_MAGIC. Specifies that this is a serialization protocol.

00 05: STREAM_VERSION. The serialization version.

0x73: TC_OBJECT. Specifies that this is a new Object.
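
These three values are exposed as constants in
java.io.ObjectStreamConstants, so the start of a stream can be checked
programmatically. The following is a minimal sketch; StreamHeader, Sample,
and the helper method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical serializable class used only to produce a stream.
class Sample implements Serializable {
    int value = 1;
}

class StreamHeader {
    static byte[] serialize(Object o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // True if the bytes begin with AC ED, 00 05, and the TC_OBJECT tag.
    static boolean startsWithObject(byte[] bytes) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes))) {
            short magic = in.readShort();   // AC ED
            short version = in.readShort(); // 00 05
            byte tag = in.readByte();       // 0x73 for a new object
            return magic == ObjectStreamConstants.STREAM_MAGIC
                && version == ObjectStreamConstants.STREAM_VERSION
                && tag == ObjectStreamConstants.TC_OBJECT;
        } catch (IOException e) {
            return false; // too short to contain a header
        }
    }

    public static void main(String[] args) {
        System.out.println(startsWithObject(serialize(new Sample())));
    }
}
```

A serialized String would fail this check, because a top-level string is
tagged with TC_STRING (0x74) rather than TC_OBJECT.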

The first step of the serialization algorithm is to write the
description of the class associated with an instance. The example
serializes an object of type SerialTest, so the algorithm starts by
writing the description of the SerialTest class.

0x72: TC_CLASSDESC. Specifies that this is a new class.

00 0A: Length of the class name.

53 65 72 69 61 6C 54 65 73 74: SerialTest, the name of the class.

05 52 81 5A AC 66 02 F6: serialVersionUID, the serial version identifier
of this class.

0x02: Various flags. This particular flag (SC_SERIALIZABLE) says that the
object supports serialization.

00 02: Number of fields in this class.

Next, the algorithm writes the field int version = 66;.

0x49: Field type code. 0x49 is ASCII "I", which stands for int.

00 07: Length of the field name.

76 65 72 73 69 6F 6E: version, the name of the field.

And then the algorithm writes the next field, contain con = new
contain();. This is an object, so after the type code and the field name
it also writes the canonical JVM signature of the field's type.

0x4C: Field type code. 0x4C is ASCII "L", which marks an object type.

00 03: Length of the field name.

63 6F 6E: con, the name of the field.

0x74: TC_STRING. Represents a new string.

00 09: Length of the string.

4C 63 6F 6E 74 61 69 6E 3B: Lcontain;, the canonical JVM signature.

0x78: TC_ENDBLOCKDATA, the end of the optional block data for an object.
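
The layout described so far (class name, serialVersionUID, flags, field
count, then one entry per field) can be read back with a plain
DataInputStream. This is a rough sketch rather than a full parser: it
handles only the first class description of a stream like the one above,
and the names Sample, Contain, and ClassDescReader are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical classes: one int field and one object field.
class Contain implements Serializable {
    int containVersion = 11;
}

class Sample implements Serializable {
    int version = 66;
    Contain con = new Contain();
}

class ClassDescReader {
    static byte[] serialize(Object o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    private static void expect(byte actual, byte expected) {
        if (actual != expected) {
            throw new IllegalStateException(String.format(
                "expected 0x%02X but found 0x%02X", expected, actual));
        }
    }

    // Reads the header and the first class description only.
    static String describeTopClass(byte[] bytes) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes))) {
            in.readShort();                           // STREAM_MAGIC
            in.readShort();                           // STREAM_VERSION
            expect(in.readByte(), ObjectStreamConstants.TC_OBJECT);
            expect(in.readByte(), ObjectStreamConstants.TC_CLASSDESC);
            StringBuilder sb = new StringBuilder(in.readUTF()); // class name
            in.readLong();                            // serialVersionUID
            in.readByte();                            // flags
            int fieldCount = in.readShort();
            for (int i = 0; i < fieldCount; i++) {
                char typeCode = (char) in.readByte();
                sb.append(' ').append(typeCode).append(':')
                  .append(in.readUTF());              // field name
                if (typeCode == 'L' || typeCode == '[') {
                    // Assumes the first occurrence of this type string.
                    expect(in.readByte(), ObjectStreamConstants.TC_STRING);
                    sb.append('=').append(in.readUTF()); // JVM signature
                }
            }
            expect(in.readByte(), ObjectStreamConstants.TC_ENDBLOCKDATA);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describeTopClass(serialize(new Sample())));
    }
}
```

DataInputStream.readUTF() fits here because names in the stream are
stored as a two-byte length followed by the characters, exactly as the
byte-by-byte walk-through shows.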

The next step of the algorithm is to write the description of the parent
class, which is the immediate superclass of SerialTest.

0x72: TC_CLASSDESC. Specifies that this is a new class.

00 06: Length of the class name.

70 61 72 65 6E 74: parent, the name of the class.

0E DB D2 BD 85 EE 63 7A: serialVersionUID, the serial version identifier
of this class.

0x02: Various flags. This flag (SC_SERIALIZABLE) notes that the object
supports serialization.

00 01: Number of fields in this class.

Now the algorithm will write the field description for the parent class.
parent has one field, int parentVersion = 10;.

0x49: Field type code. 0x49 is ASCII "I", which stands for int.

00 0D: Length of the field name.

70 61 72 65 6E 74 56 65 72 73 69 6F 6E: parentVersion, the name of the
field.

0x78: TC_ENDBLOCKDATA, the end of block data for this object.

0x70: TC_NULL, which represents the fact that there are no more
superclasses because we have reached the top of the class hierarchy.

So far, the serialization algorithm has written the description of
the class associated with the instance and all its superclasses. Next,
it will write the actual data associated with the instance. It writes
the parent class members first:

00 00 00 0A: 10, the value of parentVersion.

Then it moves on to SerialTest.

00 00 00 42: 66, the value of version.

The next few bytes are interesting. The algorithm needs to write the
information about the contain object, shown in Listing 8.

Listing 8. The contain object

contain con = new contain();

Remember, the serialization algorithm hasn't written the class
description for the contain class yet. This is the opportunity to write
this description.

0x73: TC_OBJECT, designating a new object.

0x72: TC_CLASSDESC.

00 07: Length of the class name.

63 6F 6E 74 61 69 6E: contain, the name of the class.

FC BB E6 0E FB CB 60 C7: serialVersionUID, the serial version identifier
of this class.

0x02: Various flags. This flag (SC_SERIALIZABLE) indicates that this
class supports serialization.

00 01: Number of fields in this class.

Next, the algorithm must write the description for contain's only field,
int containVersion = 11;.

0x49: Field type code. 0x49 is ASCII "I", which stands for int.

00 0E: Length of the field name.

63 6F 6E 74 61 69 6E 56 65 72 73 69 6F 6E: containVersion, the name of
the field.

0x78: TC_ENDBLOCKDATA.

Next, the serialization algorithm checks to see if contain has any
serializable parent classes. If it did, the algorithm would start writing
that class; but in this case there is no serializable superclass for
contain, so the algorithm writes TC_NULL.

0x70: TC_NULL.

Finally, the algorithm writes the actual data associated with contain.

00 00 00 0B: 11, the value of containVersion.
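
Because the contained object is written last, its int value forms the
final four bytes of the stream, stored big-endian. The sketch below checks
this against classes shaped like those in Listing 6; TailDecode and its
helper methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Same shape and names as the classes in Listing 6.
class parent implements Serializable {
    int parentVersion = 10;
}

class contain implements Serializable {
    int containVersion = 11;
}

class SerialTest extends parent implements Serializable {
    int version = 66;
    contain con = new contain();
}

class TailDecode {
    static byte[] serialize(Object o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decodes the last four bytes as a big-endian int
    // (big-endian is ByteBuffer's default byte order).
    static int lastInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes, bytes.length - 4, 4).getInt();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new SerialTest());
        System.out.println(lastInt(bytes)); // 11, the value of containVersion
    }
}
```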

Conclusion

In this tip, you have seen how to serialize an object, and learned
how the serialization algorithm works in detail. I hope this article
gives you more detail on what happens when you actually serialize an
object.

About the author

Sathiskumar Palaniappan has more than four years of experience in the IT
industry, and has been working with Java-related technologies for more
than three years. Currently, he is working as a system software engineer
at the Java Technology Center, IBM Labs. He also has experience in the
telecom industry.

Resources

Read the Java object serialization specification. (Spec is a PDF.)

"Flatten your objects: Discover the secrets of the Java Serialization
API" (Todd M. Greanier, JavaWorld, July 2000) offers a look into the nuts
and bolts of the serialization process.

Chapter 10 of Java RMI (William Grosso, O'Reilly, October 2001) is also a
useful reference.

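Addendum:

Typical situations where serialization is used:

1. In multi-tier distributed programs, for example when a server sends
objects to a client, the objects need to be serializable.

2. When in-memory objects have to be saved to a storage device. In a
workflow system, for example, a process may take a week or longer to
complete. The usual approach is to serialize the process to a database or
file while it is not executing rather than keep it in memory, because
objects held in memory are lost if the machine restarts.

3. When different platforms exchange data or objects directly, for
example Java and .NET. In most such cases XML serialization is used: the
object is converted to an XML document that is passed between programs on
the different platforms. Web services are an example: other programs can
access a web service built with .NET because what the web service
transmits is an XML document. Alternatively, one platform can write the
data or objects to a database, and a program on the other platform can
read them back from it.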