Thursday, April 28, 2011

Efficient Java object graph serialization

What is the best approach for serializing Java object graphs?

My requirements for a serialization library are: 1) speed of deserialization; 2) size, as small as possible (smaller than Java's default serialization); 3) flexibility, where annotation-based definitions of what has to be serialized would be nice.

The underlying file format is not important.

I looked at Protocol Buffers and XStream, but the former is not flexible enough due to the need for mapping files, and the latter produces big files.

Any help appreciated.

From stackoverflow
  • I think default Java serialisation is going to be pretty small. Can you not usefully restrict what you want to serialise via the transient keyword? That would address your third issue (flexibility and annotations).
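
As a minimal sketch of the transient suggestion above (the Account class and its fields are hypothetical, invented only for illustration), a field marked transient is skipped by default Java serialization and comes back with its default value:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical example class; not taken from the original question.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;

    final String owner;            // written to the stream
    transient String cachedLabel;  // skipped by default serialization

    Account(String owner, String cachedLabel) {
        this.owner = owner;
        this.cachedLabel = cachedLabel;
    }
}

class TransientDemo {
    // Serializes the object to an in-memory byte array and reads it back.
    static Account roundTrip(Account in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream src =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Account) src.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account copy = roundTrip(new Account("alice", "Alice (cached)"));
        System.out.println(copy.owner);        // the non-transient field survives
        System.out.println(copy.cachedLabel);  // the transient field is reset to null
    }
}
```

Note that transient is a language keyword rather than an annotation, so it only partly covers the "annotation based" wish in the question.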

  • For serialization Hessian is one of the most efficient.

    This is about 2-3 times smaller and faster than Java Serialization, even using Externalizable classes.

    Whichever serialization you use, you can use compression fairly easily to make the data more compact.

    Beyond that you can write your own serialization. I wrote a serializer which writes to/from ByteBuffer which is about twice as fast and half the size of Hessian (about 5x faster/smaller than Java Serialization). This may be too much effort for little gain if existing serializations will do what you need. However, it is as customizable as you like ;)

    Esko Luontola : What kind of a serializer did you write? Does it work for any objects, or do you need to write custom serialization code for each class? Are cyclic object references allowed?
    Peter Lawrey : It is very much like Hessian. It can serialize any object except those which model real resources outside Java, like Threads, Sockets etc. You can write custom serialization, but as it uses some smart on-the-fly compression, custom serializers tend to be slower!
    Peter Lawrey : "Are cyclic object references allowed?" - I have an open source version which doesn't support this, and I wrote another version for work which does. ;)
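
The hand-written ByteBuffer approach mentioned in this answer can be sketched roughly as below. This is not Peter Lawrey's serializer, which is not shown in the thread; the Trade class and its byte layout are assumptions made up for illustration, and the sketch handles neither arbitrary object graphs nor cyclic references.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical value class used only for illustration.
class Trade {
    final long id;
    final double price;
    final String symbol;

    Trade(long id, double price, String symbol) {
        this.id = id;
        this.price = price;
        this.symbol = symbol;
    }

    // Assumed layout: 8-byte id, 8-byte price, 4-byte length, then UTF-8 bytes.
    void writeTo(ByteBuffer buf) {
        byte[] sym = symbol.getBytes(StandardCharsets.UTF_8);
        buf.putLong(id).putDouble(price).putInt(sym.length).put(sym);
    }

    // Reads the fields back in exactly the order they were written.
    static Trade readFrom(ByteBuffer buf) {
        long id = buf.getLong();
        double price = buf.getDouble();
        byte[] sym = new byte[buf.getInt()];
        buf.get(sym);
        return new Trade(id, price, new String(sym, StandardCharsets.UTF_8));
    }
}

class ByteBufferDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        new Trade(42L, 101.5, "ACME").writeTo(buf);
        int size = buf.position();  // number of bytes written so far
        buf.flip();                 // switch the buffer from writing to reading
        Trade copy = Trade.readFrom(buf);
        System.out.println(size + " bytes: " + copy.id + " " + copy.price + " " + copy.symbol);
    }
}
```

No class names or field descriptors are written, which is where the size saving comes from; the cost is that reader and writer must agree on the layout out of band.
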
  • For small objects, the Java serialised form is likely to be dominated by the description of the serialised classes.

    You may be able to write out serialised data for commonly used classes, and then use that as a common prefix for a series of serialised streams. Note that this is very fragile, and you'll probably want to recompute and check it for each class loader instance.
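
One way to see the class-description overhead described above is to compare a stream holding one small object with a stream holding two. The sketch below uses a hypothetical Point class; it only measures the overhead and does not implement the common-prefix trick.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical small class: only 8 bytes of actual field data.
class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class DescriptorOverheadDemo {
    // Returns the size in bytes of one stream containing `count` Point instances.
    static int streamSize(int count) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                out.writeObject(new Point(i, i));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        int one = streamSize(1);
        int two = streamSize(2);
        // The first object pays for the stream header and the class description;
        // the second reuses the description through a back-reference.
        System.out.println("first object:  " + one + " bytes");
        System.out.println("second object: " + (two - one) + " bytes more");
    }
}
```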

  • Would http://jserial.sourceforge.net/ suit your needs?

    Peter Lawrey : From their benchmark results it appears "Bubble" deserialization is *slower* than plain Java 1.4.2 serialization.
    Esko Luontola : In two of the benchmarks it's slower, in others it's faster. It depends on what is being serialized, and anyway 1.4.2 is ancient, so you should benchmark it with your own application and environment to see if it suits you.
  • I second the note about usefulness of compression -- all formats compress to about the same, i.e. bigger output compresses more.

    Beyond that and other recommendations, JSON with Jackson works quite well: much faster than XML (competitive with PB, Hessian) and a bit more compact; much more flexible than PB, easy to integrate with client-side JS (if that matters) and easy to troubleshoot.
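
The compression advice given in the answers can be sketched with the JDK's own GZIP streams wrapped around default Java serialization. The CompressionDemo class and its method names are made up for this example; the same wrapping should work for any serializer that writes to an OutputStream, and the actual gain depends on how repetitive the data is.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class CompressionDemo {
    // Serializes obj with default Java serialization, optionally through GZIP.
    static byte[] serialize(Serializable obj, boolean gzip) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream sink = gzip ? new GZIPOutputStream(bytes) : bytes;
             ObjectOutputStream out = new ObjectOutputStream(sink)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads back an object that was written with serialize(obj, true).
    static Object deserializeGzip(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Deliberately repetitive sample data, so it compresses well.
        ArrayList<String> items = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            items.add("item-" + (i % 10));
        }
        System.out.println("plain:   " + serialize(items, false).length + " bytes");
        System.out.println("gzipped: " + serialize(items, true).length + " bytes");
    }
}
```

Compression trades CPU time for size, so it is worth measuring against the deserialization-speed requirement from the question.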
