- Wiegand.DK - https://www.wiegand.dk/wordpress -

How to build a domain model with JPA2 and Hibernate

I have been working with enterprise systems for a while, mainly as an Infrastructure Architect: the person who binds all things together into a complete working system. One of the many integrations in an enterprise system is the internal integration from the code (domain model) to the database. This is normally called ORM (http://en.wikipedia.org/wiki/Object-relational_mapping [1]), object-relational mapping. Over the many years I have worked with this, the integration has of course evolved from a simple and direct SQL approach to gigantic automatic frameworks. This is of course both a good and a bad thing.

The first question I normally get when talking about ORM is: What about the overhead, is it fast enough? And the simple answer is… NO!

Out of the box there are many pitfalls when using a big and heavy framework to handle the database integration and by default it is slow as hell. Most of the snippets on the Internet don’t care about the complete domain model and the main focus is only on how easy it is to “automate” the database integration.

But… you can of course do something to make it faster.

This is my take on how to use a heavy database integration framework, do’s and don’ts, best practice, optimization, etc.

I have chosen to use JPA2 (http://en.wikipedia.org/wiki/Java_Persistence_API) with Hibernate (http://en.wikipedia.org/wiki/Hibernate_(Java)) as the implementation. There is probably a better (and much smaller) framework out there, but the strength of picking a widely used framework is that you can easily Google information and get help. And there is a lot of muscle for hire (consultants) with experience in JPA2 and Hibernate.

As the core framework I use Spring. I will not go into how Spring is used or how the database setup for transactions, pools, etc. has been done. The Java version depends on the customer I work for, currently from Java 6 to the latest version (Java 8).

Where to begin

It is hard to start using a new framework and it gets even harder to use the framework correctly. A commonly used framework such as JPA2 generates a lot of information on the Internet and it is not easy to find the good information. There is of course some good information out there and I will try to add some more, so please comment if you have suggestions on how to make my text better, and remind me if I forgot key points or made an error or two.

DZone have made a good and easy overview of JPA2. They have done many good refcardz over the years that I have enjoyed reading. Download their JPA2 refcard to get started and take a look around their site for other information on JPA2 and Hibernate.

http://refcardz.dzone.com/refcardz/whats-new-jpa-20 [2]

The domain object

The domain model is the core of your application and should reflect your area of business as correctly as possible. Every domain object that should be stored in the database must be annotated with @Entity (javax.persistence.Entity) and @Table (javax.persistence.Table).

import javax.persistence.Entity;
import javax.persistence.Table;

@Entity
@Table(name = "PREFIX_FOO")
public class Foo {
	// The body
}

The @Entity annotation is obvious, but why use the @Table annotation? Why not just use the name of the class as the table name? The domain model and the data model are two different models and should of course be handled as two different models with a mapper in the middle. Another thing is that you don’t want to let the persistence framework handle and/or dictate your database versioning and refactoring. A good framework for handling the data model with care is Liquibase (http://en.wikipedia.org/wiki/Liquibase [3]). The persistence framework (Hibernate) can do it, but in my opinion it is not good enough in an enterprise world and over time it will make refactoring hard.

A key feature in a domain object is the primary key, the id of the domain object. One or more fields can define the id, but using more than one is not advisable and will make your life miserable. Everything gets more difficult, especially maintenance, and it also hurts performance. You want to make it easy to generate a new id, easy to reference a domain object and easy to optimize the database footprint when generating the next id. If you seldom create a new domain object you can live with the time it takes to get the next id. But if you create a lot of domain objects you can gain from getting the ids in chunks (batches).

I have landed on using table generation for domain object ids. Then I know how the id is created regardless of the database and I can set the number of ids I want generated at a time.

@Id
@GeneratedValue(strategy = GenerationType.TABLE, generator = "Foo.id")
@TableGenerator(name = "Foo.id", table = "PREFIX_DATABASE_SEQUENCE", pkColumnName = "SEQUENCE_NAME", valueColumnName = "NEXT_AVAILABLE_ID", pkColumnValue = "PREFIX_FOO.FOO_ID", allocationSize = 1)
@Column(name = "FOO_ID")
private long id;

All annotations etc. are from the package [javax.persistence]. For the id table strategy to work you have to create the table where the ids are stored.

Depending on the database vendor, you can use SQL like this:

create table PREFIX_DATABASE_SEQUENCE (
	SEQUENCE_NAME varchar(100) not null,
	NEXT_AVAILABLE_ID int not null,
	primary key(SEQUENCE_NAME)
)

Note: I prefix all the tables in the database to tie them to the domain. Sometimes the same prefix is used for all the tables. Sometimes different prefixes are used for the different areas of a big domain.

By default the allocationSize attribute should be set to 1, but if the domain object is created frequently it should be set to 100 or 1000.
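
To show why a larger allocationSize helps, here is a minimal plain-Java simulation of chunked id allocation. It is only a sketch of the idea and not Hibernate's actual generator: the classes SequenceTable and ChunkedIdGenerator are hypothetical names, and the "round trip" counter stands in for the select/update against PREFIX_DATABASE_SEQUENCE.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the PREFIX_DATABASE_SEQUENCE table: one counter per sequence name.
class SequenceTable {
    private final Map<String, Long> nextAvailableId = new HashMap<>();
    private int roundTrips = 0;

    // Reserves `size` ids in one simulated round trip and returns the first id of the block.
    long reserveBlock(String sequenceName, int size) {
        roundTrips++;
        long first = nextAvailableId.getOrDefault(sequenceName, 1L);
        nextAvailableId.put(sequenceName, first + size);
        return first;
    }

    int getRoundTrips() {
        return roundTrips;
    }
}

// Hands out ids from a locally reserved block and only goes back to the table when the block is used up.
class ChunkedIdGenerator {
    private final SequenceTable table;
    private final String sequenceName;
    private final int allocationSize;
    private long next = 0;
    private long endExclusive = 0;

    ChunkedIdGenerator(SequenceTable table, String sequenceName, int allocationSize) {
        this.table = table;
        this.sequenceName = sequenceName;
        this.allocationSize = allocationSize;
    }

    long nextId() {
        if (next == endExclusive) {
            // Block exhausted (or never reserved): fetch a new block from the table.
            next = table.reserveBlock(sequenceName, allocationSize);
            endExclusive = next + allocationSize;
        }
        return next++;
    }
}
```

In this simulation, generating 1000 ids costs 1000 round trips with an allocation size of 1 and 10 round trips with an allocation size of 100. The price is that ids left unused in a block are lost if the application stops, so the id series can have gaps.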

All attributes in a domain object should have the “column” mapping annotation that maps between the domain object and a table. For relations, the annotation is named according to the type of relation.

Relations between domain objects

There is not much fun in domain objects without relations, and there are many things that can be applied to a relation. The two main aspects that need your attention are how to fetch data and how to handle cascade between domain objects.

A normal domain model will quickly become complex and over time the complexity will rise. Furthermore, a normal domain model tends not to have one scenario for querying data but multiple (to accommodate the business requirements). Therefore it is hard to decide up front on a strategy that will cover your every need now and in the future.

With this in mind you don’t want to fetch more data from the database than is needed at a given time. All relations should by default be lazy fetched and there should be a very good reason to make a relation eager fetched, not just “I could not get it to work otherwise” (a common notion on the Internet). Now you can make a query (per business requirement) that only fetches the data needed and leaves the smallest footprint on the database. Furthermore, it forces you to decide what data you want to fetch. If you don’t load all the data needed up front you will run into a lazy initialization exception. The relations where you have to use lazy are:

@ManyToOne(fetch = FetchType.LAZY)
@OneToMany(fetch = FetchType.LAZY)
@OneToOne(fetch = FetchType.LAZY)
@ManyToMany(fetch=FetchType.LAZY)
// etc.

Note: Don’t rely on possible defaults that state that it should be lazy. Always state that it is lazy. Then there is no doubt when reviewing the code and it is easy for others to read. Read: It is not a coincidence, it is a choice.

The relations in the domain model are all about CRUD (http://en.wikipedia.org/wiki/Create,_read,_update_and_delete [4]): create, read, update and delete. They define how the domain model behaves by default when you do one of the four operations. It is up to you how you want to traverse your specific domain model, but I can point out the pitfalls when you define relations.

When you handle your domain model it should be done via repository classes, and all the repository classes should be transaction aware. I handle the transaction awareness with Spring (maybe I will do a blog about a good Spring setup in the near future :-). Every repository should have access to the persistence context used (javax.persistence.PersistenceContext) / entity manager (javax.persistence.EntityManager) via annotations or other enterprise magic.

@PersistenceContext
private EntityManager entityManager;

Create

You should be aware of when you are creating a domain object and when you are updating a domain object, and act accordingly. A lazy approach would be using merge for both create and update, but there will be an overhead when doing this.

An easy way of deciding between create and update is to look at the base domain object. If the ID is not set (mostly zero (0)) the persistence framework can only create the object. If the ID is set you probably have to update (if the object exists).
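
The decision can be sketched as a single save method. To keep the example runnable without JPA on the classpath, the interface PersistenceOps below is a hypothetical stand-in that mirrors the persist and merge signatures of javax.persistence.EntityManager. In a real repository you would call the injected entity manager directly. The class names and the convention that 0 means "no id yet" are assumptions for illustration.

```java
// Hypothetical stand-in for the two EntityManager operations used here.
interface PersistenceOps {
    void persist(Object entity);

    <T> T merge(T entity);
}

class Foo {
    private long id; // 0 means "not persisted yet" in this sketch

    long getId() {
        return id;
    }

    void setId(long id) {
        this.id = id;
    }
}

class FooRepository {
    private final PersistenceOps ops;

    FooRepository(PersistenceOps ops) {
        this.ops = ops;
    }

    // Creates the object when it has no id yet, otherwise updates it.
    Foo save(Foo foo) {
        if (foo.getId() == 0) {
            ops.persist(foo);
            return foo;
        }
        // merge hands back the managed copy, so return that instead of the argument
        return ops.merge(foo);
    }
}
```

This only covers the simple case. If an id is set but the row does not exist, merge will end up creating it, so the "if the object exists" part still has to be handled by your own rules.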

Persisting an object is pretty straightforward and there are not that many optimizations you can do. You should of course decide if a batch version of the method is needed so you don’t overwhelm the database with commits. The default persist method should look something like this:

public Object persist(Object domainObject) {
	entityManager.persist(domainObject);
	entityManager.flush();
	return domainObject;
}

Note: The domain object should be type safe and not of the general object type.

This approach will persist the domain object and its children/parents, etc. according to the specific relations you have set up, meaning that you will create at least one SQL insert statement and probably a handful. If this persistence approach fails you should probably look into how you have structured your domain model relations and restructure them.

Update

The big issue when you update is whether the domain object is detached or not. A domain object is mostly detached when working with web services or GUI implementations like Apache Wicket. If a domain object is detached, JPA needs to load the full domain object model and all children from the database in order to make the merge. This is not a problem if the model is a single domain object, but when the model becomes “deep” JPA will by default make a select query for each domain object in the model.

An example of this could be a mother with two daughter lists, a and b. One select will be made for the mother and one for each daughter list, a and b (3 queries).

An easy way of handling this problem with only one select is to instruct JPA to do so. In JPA this is done by joining the domain object daughters with a join fetch. For now let’s say that this is done in a single JPA query called QUERY_FIND_DEEP and that the query takes the ID of the domain object as argument.

The default merge method will then be something like this:

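public DomainObject merge(DomainObject domainObject) {
	// Load the managed object graph in one query so merge does not select each level separately
	entityManager.createNamedQuery(DomainObject.QUERY_FIND_DEEP, DomainObject.class)
		.setParameter("id", domainObject.getId())
		.getSingleResult();
	// merge returns the managed instance; the argument itself stays detached
	DomainObject managed = entityManager.merge(domainObject);
	entityManager.flush();
	return managed;
}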

If implemented correctly this should result in one query that fetches the domain object model from the database, after which JPA/Hibernate only executes the needed update and delete queries.

At this level you can also easily state attributes you don’t want to overwrite. An example could be a created attribute on the domain object.

public DomainObject merge(DomainObject domainObject) {
	DomainObject currentDomainObject = entityManager
		.createNamedQuery(DomainObject.QUERY_FIND_DEEP, DomainObject.class)
		.setParameter("id", domainObject.getId())
		.getSingleResult();
	// Keep the stored value so the caller cannot overwrite it
	domainObject.setCreated(currentDomainObject.getCreated());
	DomainObject managed = entityManager.merge(domainObject);
	entityManager.flush();
	return managed;
}

This is easy to read and understand when new developers are introduced to a project and every update will now have this constraint.

Note: Be careful with this approach if you introduce more than one update method for a given domain object.

Delete

Deletion (or removal) should be handled in almost the same manner as update. The most important thing is that, if it is not specified in JPA how to handle delete, the default behavior is an SQL delete for each table in the domain model where cascade delete is applied. By loading the domain object in question “deep”, only the needed deletions will be made.

public void remove(long id) {
	entityManager.remove(
		entityManager.createNamedQuery(
		DomainObject.QUERY_FIND_DEEP, DomainObject.class)
		.setParameter("id", id)
		.getSingleResult());
	entityManager.flush();
}

Read

The hardest operation to handle correctly is the read operation. The operation should be tailored to how the domain object is used, and only the data needed should be loaded to ease the load on the database.

As stated in the section on relations between domain objects, all relations should be lazy loaded. This means that the developer must specify the relations that need to be loaded in every read. The most common read methods (find in JPA) are:

The read queries are written in JPQL (http://en.wikipedia.org/wiki/Java_Persistence_Query_Language [5]), the Java Persistence Query Language. It is almost SQL syntax but don’t confuse it with SQL: you are handling objects, NOT tables and columns.

A new read query always has an owner (a root object). The query should always be attached to the root object as a named query. This is only possible for static queries. Dynamic queries should be avoided if possible, but in some cases they are the only option. The query is attached to the root object with the @NamedQuery annotation from the [javax.persistence] package. The named queries should be nested in the @NamedQueries annotation.

The name of the named query can easily be misspelled and it furthermore has to be unique within the application. This means that you need a naming standard to handle the names and a “type safe” way of referencing each query. This can be done in a thousand ways, but the simplest approach is in my opinion mostly the best.

To ensure that the named query is only stated in one place and that this reference is used throughout the application, I use a public static final string placed in the root object together with the JPQL query. I always prefix the name of this static attribute with QUERY to state that this is a query. Use good descriptive names or conventions like QUERY_FIND_DEEP, QUERY_FIND_ALL, QUERY_FIND_ALL_DEEP, etc. Then it is easy for developers to understand what is expected of the query.

It is a good idea to use the root object name together with the name of the constant as the named query name. It will of course conflict if you have two domain objects with the same name, but then you will probably also run into other problems. My suggestion is that the string should contain:

[the name of the root object].[the name of the query constant]

The last thing before we can make the query is how to get JPA to load the data needed. In JPA2 this is done with a fetch join. This means that you in your query join the attributes that are needed and state that they should be fetched in a single query. This is one of the most important things when optimizing JPA – only one query is made with only the data needed – beautiful 🙂

But… yes, there is always a but! You can get multiple instances of the root object. On some databases this can be fixed by applying a “distinct” in the query, but it doesn’t work on all databases. To fix it in general you have to use a set instead of a list when handling collections.

Below is an example of JPQL of a domain object with two children (domainObject – attribute a and b) that load the root object and both of the attributes.

@NamedQueries({
	@NamedQuery(
		name = DomainObject.QUERY_FIND_DEEP,
		query =
			"select o " +
			"from DomainObject o " +
			"left join fetch o.a " +
			"left join fetch o.b " +
			"where o.id = :id ")
})

@Entity
@Table(name = "PREFIX_DOMAIN_OBJECT")
public class DomainObject {
	public static final String QUERY_FIND_DEEP = "DomainObject.QUERY_FIND_DEEP";

	// ...
	private long id;

	// ...
	private A a;

	// ...
	private Set<B> b = new HashSet<B>();

	// ...
}

Note: Remember indexes in the database on foreign keys and on “where” / “group by” selections, otherwise the selections will be slow.

Join fetch Hibernate issue

When using a join fetch there is a small challenge. The challenge is that Hibernate by default returns multiple instances of the root domain object. This is nicely described by the Hibernate community here: https://community.jboss.org/wiki/HibernateFAQ-AdvancedProblems#jive_content_id_Hibernate_does_not_return_distinct_results_for_a_query_with_outer_join_fetching_enabled_for_a_collection_even_if_I_use_the_distinct_keyword [6]

But does Hibernate do the right thing when returning multiple instances? When you google the issue you will quickly find out that people are divided. I don’t agree or disagree on the matter but I have until now found a use for the multiple results.

As Hibernate describes, there are multiple ways of solving the matter. The common way of doing it is to convert the result list to a set (http://docs.oracle.com/javase/7/docs/api/java/util/Set.html [7]). Which set you want to use is up to the given situation and what your goal is. If you don’t have an opinion you should use the LinkedHashSet.
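
A minimal sketch of that conversion in plain Java, with ResultLists and distinctInOrder as hypothetical helper names. It relies on equals/hashCode of the elements. It assumes the repeated entries in the result list refer to the same root object within one persistence context, so the default identity-based equals is enough.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

class ResultLists {
    // Drops repeated entries while keeping the order in which they first appeared.
    static <T> List<T> distinctInOrder(List<T> rows) {
        return new ArrayList<T>(new LinkedHashSet<T>(rows));
    }
}
```

A repository method could then wrap its query result, for example `ResultLists.distinctInOrder(query.getResultList())`, before returning it.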

Another solution suggested by Hibernate is to use the distinct keyword in the query. Don’t do this… not all of their database implementations can handle it. For instance, DB2 will fail if you try to add distinct.

Another limitation when you use a join fetch query is that you can no longer filter the results with setFirstResult or setMaxResults (paging). If you try to use paging you will in most cases only get part of the domain object structure, because the filtering is based on SQL rows and not on root domain objects.
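
The effect can be shown with a small plain-Java simulation. This is only a sketch under the assumption that the limit is applied to the joined SQL rows as described above. It is not Hibernate code, and JoinPagingDemo and its data are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JoinPagingDemo {
    // Simulated rows of "mother left join daughter": {mother, daughter}.
    static List<String[]> joinedRows() {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"mother-1", "a"});
        rows.add(new String[] {"mother-1", "b"});
        rows.add(new String[] {"mother-1", "c"});
        rows.add(new String[] {"mother-2", "d"});
        return rows;
    }

    // Row-based limit, like a max-results setting applied to the SQL result.
    static List<String[]> limitRows(List<String[]> rows, int maxResults) {
        return rows.subList(0, Math.min(maxResults, rows.size()));
    }

    // Groups the rows back into mother -> daughters, the way objects are assembled from rows.
    static Map<String, List<String>> assemble(List<String[]> rows) {
        Map<String, List<String>> mothers = new LinkedHashMap<>();
        for (String[] row : rows) {
            mothers.computeIfAbsent(row[0], key -> new ArrayList<>()).add(row[1]);
        }
        return mothers;
    }
}
```

Limiting to two rows here gives one mother with only two of her three daughters, and the second mother is missing entirely, even though the caller probably meant "the first two mothers".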

Multiple Bags Hibernate issue

Relations involving @OneToMany or @ManyToMany are common when doing a domain model. In other words: you will have a lot of mother/daughter relations within your domain model. When joining this model together by join fetch, Hibernate only allows one bag (List) relation for each statement. This means that if you have two mother/daughter relations in your statement, only one of them can be stored in a bag (list). This is of course fixed by using a Set. But remember that a set is NOT the same as a list. If you want to preserve the order in a set you have to use the LinkedHashSet (as mentioned above). Another thing is that always using a list is just a bad habit.

@OneToMany(cascade = {},mappedBy = "myMother", fetch = FetchType.LAZY, orphanRemoval = true)
private Set<MyChild> myChildren = new LinkedHashSet<MyChild>();

Read – but no entity

In some cases special reads are necessary where an entity does not exist. This can be done in many ways, but it should be easy to understand, easy to maintain and, where possible, look and feel like a normal query.

A good option is to make a dummy entity that behaves as a normal entity. This dummy entity does not have any database reference to a table, but otherwise it uses the JPA annotations (for example @Column(name = "foo")) to handle the mapping.

The query can be done in JPQL or in SQL. If done in SQL it should be set up like a normal JPQL query, but the static string should contain the actual SQL and not a reference to the JPQL. Then the query can be referenced from a repository and handled as just another entity.

Note: When handling SQL queries from a repository the main difference is to use “createNativeQuery” instead of “createNamedQuery”.

Another approach to create a selection without an entity is to use the “select new NotAnEntity(a, b, c) …” constructor. Then you state that this is NOT an entity and you put the result into this new object called “NotAnEntity”.
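
A minimal sketch of what this could look like. The attributes id and name, the package com.example and the constant QUERY_FIND_SUMMARY are assumptions for illustration. JPQL constructor expressions need the fully qualified class name and a constructor matching the selected values.

```java
// Plain result holder, not an entity. In a real project it would be a public
// class in its own file inside the package named in the query (com.example is hypothetical).
class NotAnEntity {
    private final long id;
    private final String name;

    public NotAnEntity(long id, String name) {
        this.id = id;
        this.name = name;
    }

    public long getId() {
        return id;
    }

    public String getName() {
        return name;
    }
}

class NotAnEntityQueries {
    // The constructor arguments must match the select list in number, order and type.
    // o.name is a hypothetical attribute on DomainObject.
    static final String QUERY_FIND_SUMMARY =
            "select new com.example.NotAnEntity(o.id, o.name) "
            + "from DomainObject o "
            + "where o.id = :id";
}
```

From a repository the string can be passed to entityManager.createQuery(NotAnEntityQueries.QUERY_FIND_SUMMARY, NotAnEntity.class), and the result is read like any other typed query.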

I have used both over time and I am not sure that one of them is better than the other.

Enumerations

Enumerations are handled as a special case. I have written another blog just about how to handle enums in JPA2 – http://www.wiegand.dk/wordpress/?p=25 [8]

Verification

It is important to verify that the SQL you think is executed is actually executed. Many developers ignore the verification and just trust the framework to do the right thing. Don’t, don’t, DON’T!!! You must always verify that the framework does exactly what you expect.

The “easy” way of doing this is first of all to enable debug information from Hibernate. This is done with the Hibernate properties, which in a Spring application are stated like this:

<prop key="hibernate.show_sql">true</prop>
<prop key="hibernate.format_sql">true</prop>

You must Google this to find out how to enable this in your specific setup.

With the debug enabled you can build some test cases that run the specific queries. I use Spring JUnit 4 test classes for this.

Thanks for listening. This is all for now.

Best Regards

Rolf Wiegand Storgaard