Den of Antiquity

Dusting off old ideas and passing them off as new.

Minimizing Hibernate Schema Complexity

In my applications I persist a lot of structured data into the database. Hibernate and other ORMs are great at making this easy to do. One problem is that they tend to map the entire object structure to rigidly structured database fields and tables. Quite often that’s just overkill.

Strings as a black box datatype

Take strings for instance. A VARCHAR makes perfect sense if you know the string will always be less than, say, 30 characters. Even if your not going to use that string field in a where clause, ever.

But if you won’t know the max size the string will be for every instance, or it could be a very large string and you’re never going to use it in a search, then your fine using a LOB column to store that string.

Modern ORMs will even read/write the whole string value into/out from the LOB column for you. Or give you the CLOB instance itself for better performance, whichever you prefer.

I’ve experienced many DBAs that don’t like the use of LOB columns. But I’ve never understood that, as this is exactly the purpose that LOB columns were meant to solve. The storage of arbitrary data in the database, no different than in a file, without any sort of internal random access, indexing, or searching. A Blackbox-column to the database server, if you will.

Complicated Object Graphs

Worse is when you have classes with lots of field and subobjects. This can get messy, even if they’re just value objects.

The multiple tables, table hierarchies, link tables, and huge column lists that can result, while entirely valuable for some classes, are just overkill for others. Especially when you won’t be indexing, searching or retrieving individually, any of those fields/rows/objects.

Persisting Serialized Classes

ORMs can persist a whole instance into one field. They do so using serialization of some form or another to turn the instance into flat data like a byte array. Then this data can be persisted as a BLOB value.

In the case of Hibernate, its as simple as marking a reference (non primitive) field with @Lob, just as with the strings. But in this case, by default, Hibernate will use the standard java serialization mechanism. Which some would take exception *cough* to.

You can provide your own serialization mechanism of course. I’m not sure how to do this across the board for hibernate. But my solution is just to create a custom hibernate UserType implementation. Then I can mark the fields with this custom UserType, that I want stores as non-entities.

This is surprisingly easy.

XML

For the serialization format, I chose XML. There are a number of good XML serializers for java objects. Some even work well on arbitrary classes, though you shouldn’t be putting just any classes in your databases columns.

I use XML because its easy to;

  • read, when debugging
  • manipulate if need be
  • migrate, on object structure changes

I chose xstream as my serializer because;

  • * its easy to setup and use.
  • * it maps most classes automatically
  • * is extremely flexible

Persisting Exceptions

Another reason we went with xstream, is that in some cases we serialize exceptions, for debugging and posterity. This can be an whole mess by itself. So I needed something that could serialize most arbitrary classes, without configuration. As you never know whats going to pop up in an instance field in an exception class/cause hierarchy.

But at the same time I didn’t want to use different serialization solutions, one for exceptions and another for my real data classes.

I’ll leave out the discussion about how to deal with bad/unserializable exception classes *cough oracle cough*. Truth be told, we’re phasing out persisting exceptions after all. Which is just common sense.

There are other good serialization libraries out there, though.

Lob compression

Another discussion I’m leaving out is the compression of the string XML CLOB data into a binary LOB. Which we do as an easy space saving measure. But once you’ve got the hibernate UserType setup, something like this is rather academic.

Whole Entity Serialization

I’ve considered serializing the whole entity to one field. Only pulling out specially annotated fields into individual table columns for indexing and searching.

But this doesn’t seem easy, let alone a good idea. That would leave you with redundant data, broken out into non-LOB columns and serialized in the LOB column at the same time. Which could lead to bugs. And the implementation of this would be complicated to begin with.

Metadata Objects

Its possible to cajole all your non-column fields into one Metadata object. So that they can be serialized into one LOB column in the table for your entity. But I find this is also overdoing it.

Databases can have multiple LOB columns per table. Having a few instead of just one isn’t going to have a major impact. I just design my classes naturally, and only resort to special Metadata objects like this in rare situation. Say, when I have a TON of fields in a class, and I won’t be using them in SQL.

That may mean a few other primitive fields get columns of their own, as well.

Nitty Gritty

Lets get down to some code then. Its pretty straight forward, if you check the javadocs for the hibernate UserType.

Of course Serializer is my singleton class providing the chosen serialization mechanism.

public class XMLUserType implements UserType {

  @Override
  public int[] sqlTypes() {
    return new int[]{ java.sql.Types.CLOB };
  }

  @Override
  public Class<Serializable> returnedClass() {
    return Serializable.class;
  }

  @Override
  public boolean isMutable() {
    return true;
  }

  @Override
  public Object deepCopy(Object value) {
    return Serializer.deserialize(Serializer.serialize((Serializable) value));
  }

  @Override
  public Serializable disassemble(Object value) {
    return (Serializable) value;
  }
  @Override
  public Object assemble(Serializable cached, Object owner) {
    return cached;
  }

  @Override
  public Object replace(Object original, Object target, Object owner){
    return deepCopy(original);
  }

  @Override
  public boolean equals(Object x, Object y) {
    if (x == null ) {
      return y == null;
    }
    else {
      return x.equals(y);
    }
  }

  @Override
  public int hashCode(Object x) {
    return x.hashCode();
  }

  @Override
  public Object nullSafeGet(ResultSet rs, String[] names, Object owner) throws HibernateException, SQLException {
      String columnName = names[0];

    try {

        Reader stream = rs.getCharacterStream(columnName);

        if ( stream == null )
          return null;

      // slurp reader to string.
      StringBuilder buffer = new StringBuilder();
      char[] c = new char[1024];
      int numRead = 0;
      while ( ( numRead = stream.read(c) ) != -1 ) {
        buffer.append(c, 0, numRead);
      }
      stream.close();

        return Serializer.deserialize(buffer.toString());

    } catch (IOException e) {
      throw new HibernateException("IOException while reading clob",e);
    }
    
  } 


  @Override
  public void nullSafeSet(PreparedStatement st, Object value, int index) throws HibernateException, SQLException {

    if( value == null ) {
      st.setNull( index, sqlTypes()[0] );
      return;
    }
   
    String xmlString = Serializer.serialize((Serializable) value);

    Reader reader = new StringReader(xmlString);
    st.setCharacterStream(index, reader, xmlString.length());
  }
 
}

notes

  • You can followup the equals() and hashcode() implementations by comparing the serialized data as well. This might make sense in some cases, if the serialized form is simpler than the runtime form of your objects. This can happen with lots of transient fields.
  • Oracle can put LOBs inlined in the table row data. When the lob is less than 4k. You’ll want to be certain whether its doing it in your table or not. In case that blows up row size.

Comments

Jason Stillwell
Good idea. I also understand YAML is popular.

Do Java<-> JSON serializers handle cyclic reference in the object graph well?
Toi
If I have to do that, I will, personally, choose JSON over XML.