Hibernate: Does MVC make LAZY a 4-letter word?

Let’s discuss how Hibernate, an ever-popular O/R mapping tool, handles data retrieval in enterprise Java applications. Usually, objects are represented in a relational database as trees. When you load an object, how far down the branches do you want to go to retrieve information? If you get the answer wrong, your retrieval operation will be inefficient. Consider a Customer object that has subordinate Address objects and Account objects (in a real application it may have dozens more, but for the purpose of this discussion these two will do). Your application asks to load a specific Customer. Should Hibernate load its Accounts too? If it does not, and you later attempt to access an Account, you are in for a big disappointment, because the Account is unavailable. You will probably receive a NullPointerException. I’m oversimplifying the situation and considering a naïve O/R mapper here. Real O/R mappers, such as Hibernate, have developed solutions to this conundrum, but my intention is to introduce the problem first and then explain how an O/R mapper approaches it.

If Hibernate misses your intent in the opposite way and loads Accounts you do not need, that is not very good either. You simply cannot load all Accounts and Addresses every single time. Doing so might trigger several SELECT queries against different tables when you only need data from one, which is as clear an example of waste as it gets. Even if there is only one SQL SELECT query, the database cost of retrieving all this information will be quite high, as the RDBMS has to traverse multiple tables to satisfy all the joins. Your application will also incur the cost of unmarshalling all the objects you never needed from wire format into Java. Heap memory will be allocated for all the subordinate objects you do not need. And you may not even be able to load the entire object tree at all, depending on the way your database is structured.

As you see, it’s pretty important to know which subordinate objects your application needs loaded. Hibernate addresses this by providing a number of fetching strategies for object associations. The default behavior Hibernate provides for collection-valued associations is lazy initialization: a collection is fetched when the application invokes an operation upon that collection. In our scenario, if you only want to access the Customer object, lazy initialization does a nice job and protects you from “overloading”: no unwanted associations will be fetched. But what about the other case, when you do want to load the accounts? Hibernate provides several options that help you tune the access, including regular lazy initialization with select fetching, which results in one SELECT for all accounts, and “extra-lazy” fetching suitable for large collections.

Now, let’s consider an MVC or n-tier application. In an application of this kind, the data retrieval operation is separated from the code that uses the data, either by architectural layering or even physically (in a different JVM). How does this change the way Hibernate’s Lazy Initialization works? Quite a bit.

The problem is that you need to access an Account for Hibernate to trigger a SELECT. If your application is multi-tiered or MVC, it usually accesses and processes data, such as accounts, in a totally different layer than the one that fetches it. A data access method returns a Customer object loaded from the database. Then a different piece of code, maybe a JSP in the View phase, accesses the Customer object and tries to navigate to its accounts. You’d need to keep the database connection open so that your code could trigger an additional DB query for the Account(s) AFTER the Customer object has been retrieved and returned from the data layer. When applied in an MVC application, this pattern is known as Open Session in View, because the Hibernate Session remains open even after control has returned to the View layer. I personally find Open Session in View unworkable on too many levels. Conceptually, it makes a joke of layering, as data access is now happening all over your code. DB connection and transaction management become more difficult, because you cannot close the connection at the end of the data access method. Now you need a framework for closing the connection reliably with some kind of “request finalizer”, such as a Servlet filter (check out an example in this article to see how ugly it is). From the often overlooked performance perspective, this dramatically extends the length of time the application holds on to connections, requiring a much larger connection pool.

If application tiers are physically separated, Open Session in View becomes impossible altogether (for obvious reasons, JDBC connections and Hibernate Sessions are not serializable/transportable to another JVM).

We see that lazy initialization by itself is not enough for an MVC/n-tier application. Because of layer separation, you cannot defer association data retrieval “until it is needed”. You must know, going into the data layer, how much data you need (which associations to retrieve). So you make your data retrieval method in the data layer “need-aware”. Your data retrieval method for the Customer object in the Controller (or data layer) will accept additional parameters indicating whether you also need Accounts and Addresses. The numbers game that happens next is very important. Does your application need a lot of different sets of associations? If yes, you will quickly realize that individual methods per association (e.g. getCustomer, getCustomerWithAccounts, getCustomerWithAddresses) are not sustainable. You cannot keep adding methods like getCustomerWithAccountsAndAddresses, since the number of methods grows exponentially, doubling with every new association. Six associations is a low number for a typical real-life object, and it still gives you a whopping 2^6 = 64 methods, which is impractical. If you only need 2-3 combinations (such as customer alone, customer with accounts, and customer with all associations), you are lucky and this article does not apply to your situation. But if you do need many different combinations of associations, please read on. An obvious solution to this puzzle is a single load method with a dynamic choice of associations to retrieve (we can dub it getCustomerWithAnyAssociations). The method takes an additional argument: an association filter object comprising a collection of flags, one per association. If you want a particular association to be fetched, you set the corresponding flag.
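One way such a filter could look is sketched below. This is only an illustration under stated assumptions: the enum CustomerAssociation, the Customer fields, and the CustomerDao class are hypothetical names, the data is stubbed, and no Hibernate API is involved. An EnumSet plays the role of the "collection of flags".

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// One flag per association. All names here are illustrative, not Hibernate API.
enum CustomerAssociation { ACCOUNTS, ADDRESSES }

class Customer {
    final long id;
    List<String> accounts;   // stays null when the association was not requested
    List<String> addresses;  // stays null when the association was not requested

    Customer(long id) { this.id = id; }
}

class CustomerDao {
    // Single entry point instead of getCustomerWithAccounts, getCustomerWithAddresses, ...
    Customer getCustomerWithAnyAssociations(long id, Set<CustomerAssociation> wanted) {
        Customer customer = new Customer(id);  // stands in for loading the base object
        if (wanted.contains(CustomerAssociation.ACCOUNTS)) {
            customer.accounts = Arrays.asList("checking", "savings");  // stubbed data
        }
        if (wanted.contains(CustomerAssociation.ADDRESSES)) {
            customer.addresses = Arrays.asList("home");                // stubbed data
        }
        return customer;
    }
}

class AssociationFilterDemo {
    public static void main(String[] args) {
        CustomerDao dao = new CustomerDao();
        // Ask for accounts only; addresses are deliberately left out.
        Customer c = dao.getCustomerWithAnyAssociations(
                42L, EnumSet.of(CustomerAssociation.ACCOUNTS));
        System.out.println("accounts loaded: " + (c.accounts != null));
        System.out.println("addresses loaded: " + (c.addresses != null));
    }
}
```

Adding a seventh association then means adding one enum constant and one branch, not doubling the number of load methods.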

An interesting corollary of this design is that it enables defensive programming around access to associations. Since you make an explicit decision on whether to retrieve each association, you can remember these decisions in the form of flags and use them to guard the business methods that need the data in the association.
if (customer.accountsAvailable()) {
    // do something that needs accounts
}
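A minimal sketch of how the object itself might remember those decisions follows. The class GuardedCustomer, the enum Association, and the choice to throw IllegalStateException are assumptions made for illustration; only accountsAvailable() comes from the snippet above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical flag type: one constant per association.
enum Association { ACCOUNTS, ADDRESSES }

class GuardedCustomer {
    private final Set<Association> loaded;  // which associations were requested at load time
    private final List<String> accounts;

    GuardedCustomer(Set<Association> loaded, List<String> accounts) {
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case explicitly.
        this.loaded = loaded.isEmpty()
                ? EnumSet.noneOf(Association.class)
                : EnumSet.copyOf(loaded);
        this.accounts = accounts;
    }

    boolean accountsAvailable() {
        return loaded.contains(Association.ACCOUNTS);
    }

    List<String> getAccounts() {
        // Fail loudly instead of returning null or an empty list that looks like real data.
        if (!accountsAvailable()) {
            throw new IllegalStateException("accounts were not requested at load time");
        }
        return Collections.unmodifiableList(accounts);
    }
}

class GuardDemo {
    public static void main(String[] args) {
        GuardedCustomer customer = new GuardedCustomer(
                EnumSet.of(Association.ACCOUNTS), Arrays.asList("checking", "savings"));
        if (customer.accountsAvailable()) {
            System.out.println(customer.getAccounts().size() + " accounts");
        }
    }
}
```

The guard makes "not loaded" distinguishable from "loaded but empty", which a plain null or empty collection cannot do.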

Next, let’s drill down into the body of the data retrieval method. For Lazy Initialization to trigger the load of an association, you need to access the association. It is possible to do this in a simplistic way: if you need Accounts, your code “touches” them by calling, e.g., customer.getAccounts().size(), which triggers loading of the Accounts. This approach is workable and I’ve seen it used in the real world. However, it has an obvious flaw: the entire design is built on calling a “touch” method solely for its side effect. You don’t really need the Accounts at that point and would be happy if they just sat there in the Customer object, but you have to call this method to force Hibernate to load the association. Relying on side effects for core functionality is not a good design approach, as it is difficult to understand and easy to break during maintenance.
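The mechanics of the “touch” can be imitated without Hibernate. In the sketch below, FakeSession and LazyAccounts are hypothetical stand-ins for a Session and a lazily initialized collection, and IllegalStateException stands in for the LazyInitializationException that Hibernate raises when an uninitialized collection is accessed after its session has closed.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for a Hibernate Session: tracks only open/closed state and simulated SELECTs.
class FakeSession {
    boolean open = true;
    int selects = 0;
    void close() { open = false; }
}

// Stand-in for a lazily initialized collection.
class LazyAccounts {
    private final FakeSession session;
    private List<String> rows;  // null until the first access

    LazyAccounts(FakeSession session) { this.session = session; }

    int size() { return rows().size(); }
    String get(int i) { return rows().get(i); }

    private List<String> rows() {
        if (rows == null) {
            if (!session.open) {
                throw new IllegalStateException("session closed before accounts were loaded");
            }
            session.selects++;  // the simulated SELECT happens here, on first access
            rows = Arrays.asList("checking", "savings");
        }
        return rows;
    }
}

class TouchDemo {
    // Data-layer method: the session is closed before the result reaches the caller.
    static LazyAccounts load(boolean touch) {
        FakeSession session = new FakeSession();
        LazyAccounts accounts = new LazyAccounts(session);
        try {
            if (touch) {
                accounts.size();  // called only for its loading side effect
            }
            return accounts;
        } finally {
            session.close();
        }
    }

    public static void main(String[] args) {
        System.out.println(load(true).get(0));  // works: data was loaded by the touch
        try {
            load(false).get(0);                 // fails: nothing was loaded in time
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Note how easy it would be for a maintainer to delete the seemingly pointless size() call and break the View layer without any compile-time warning.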

How can one improve on the “touch” approach? Give up on the idea of transparent persistence and think of Hibernate as a data service that performs explicitly modeled CRUD operations. Design your data loading methods the way you would design data services that retrieve data from a remote location. Thinking of the database as a separate and distinct tier of your infrastructure will give you better insight into the performance implications of your design.

So you construct the Hibernate query dynamically, explicitly specifying which associations to fetch. This is usually fairly easy to accomplish with either an HQL or a Criteria query. But look around: with this move, Lazy Initialization has disappeared from your design. At most, you are using it to avoid automatically loading any associations when you only need the base object. But you do not allow Hibernate to load associations on access. MVC/n-tier architecture has made it difficult to use Lazy Initialization in exactly the situation it was intended for: a large number of associations and a widely varying need to load them.
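As a rough sketch of the HQL variant, the query text can be assembled from the same kind of flags. The entity name Customer, the property names accounts and addresses, and the classes Fetch and CustomerQueryBuilder are assumptions about the mapping, not taken from a real project. The example only builds the string; in an application it would be handed to Session.createQuery with the :id parameter bound.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical flags, each carrying the mapped property name it corresponds to.
enum Fetch {
    ACCOUNTS("accounts"),
    ADDRESSES("addresses");

    final String property;

    Fetch(String property) { this.property = property; }
}

class CustomerQueryBuilder {
    // Builds an HQL string that join-fetches only the requested associations.
    static String buildHql(Set<Fetch> wanted) {
        StringBuilder hql = new StringBuilder("select distinct c from Customer c");
        for (Fetch f : Fetch.values()) {  // enum order keeps the output deterministic
            if (wanted.contains(f)) {
                hql.append(" left join fetch c.").append(f.property);
            }
        }
        hql.append(" where c.id = :id");
        return hql.toString();
    }
}

class QueryBuilderDemo {
    public static void main(String[] args) {
        System.out.println(CustomerQueryBuilder.buildHql(EnumSet.noneOf(Fetch.class)));
        System.out.println(CustomerQueryBuilder.buildHql(EnumSet.of(Fetch.ACCOUNTS)));
    }
}
```

One caveat worth keeping in mind: join-fetching several collections in a single query multiplies the rows returned, so for many associations it may be preferable to issue a separate query per requested collection.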
