Mind is Software

Ying’s thoughts about software and business

No Repository Pattern

This article explores the repository pattern and why you may not want to use it.

The Original Definition

The original definition of the repository pattern: “a repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection.”

The purpose of a repository is that it

isolates domain objects from details of the database access code. In such systems it can be worthwhile to build another layer of abstraction over the mapping layer where query construction code is concentrated. This becomes more important when there are a large number of domain classes or heavy querying. In these cases particularly, adding this layer helps minimize duplicate query logic.

It also gives hints about the implementation:

The Repository will carry out the appropriate operations behind the scenes. Conceptually, a Repository encapsulates the set of objects persisted in a data store and the operations performed over them, providing a more object-oriented view of the persistence layer. Repository also supports the objective of achieving a clean separation and one-way dependency between the domain and data mapping layers.

The Implementation and Benefits

Because data persistence is a common requirement in many applications, the pattern is widely implemented as part of so-called object-relational mapping (ORM). Java, C#, and Ruby all have popular repository implementations. In Java, the approach is even standardized as JPA. Repository interfaces built on JPA (Spring Data JPA, for example) define dozens of methods such as findById, findAll, count, delete, save, and saveAll. They abstract away details such as paging and sorting across different databases. An implementation may add search by column name and other common functions.

JPA “encapsulates the set of objects” and is used to “minimize duplicate query logic”. It achieves the originally defined goals, and more besides, such as caching.

Looking closely, you will find that the implementation is a collection of dumb objects that supports CRUD operations via a set of standard methods. The biggest benefit is ease of use: it is just a set of simple Java methods to call, and you can forget about the database.
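
To make the “set of standard methods” concrete, here is a minimal sketch of such a repository. It is not JPA or any real library: CrudRepo, Customer, and InMemoryCustomerRepo are hypothetical names chosen for illustration, and a map stands in for the database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// A plain data holder: the kind of "dumb object" a repository manages.
record Customer(long id, String name) {}

// Hypothetical interface; the method names echo those listed in the article.
interface CrudRepo<T, ID> {
    T save(T entity);
    Optional<T> findById(ID id);
    List<T> findAll();
    long count();
    void delete(T entity);
}

// In-memory stand-in for a database-backed implementation (an assumption).
class InMemoryCustomerRepo implements CrudRepo<Customer, Long> {
    private final Map<Long, Customer> rows = new LinkedHashMap<>();

    public Customer save(Customer c) { rows.put(c.id(), c); return c; }
    public Optional<Customer> findById(Long id) { return Optional.ofNullable(rows.get(id)); }
    public List<Customer> findAll() { return new ArrayList<>(rows.values()); }
    public long count() { return rows.size(); }
    public void delete(Customer c) { rows.remove(c.id()); }
}

class RepoDemo {
    public static void main(String[] args) {
        CrudRepo<Customer, Long> repo = new InMemoryCustomerRepo();
        repo.save(new Customer(1L, "Ada"));
        repo.save(new Customer(2L, "Linus"));
        System.out.println(repo.count());
        System.out.println(repo.findById(1L).map(Customer::name).orElse("none"));
    }
}
```

The caller sees only collection-like methods. Whether a call triggers one SQL statement, several, or none at all is hidden behind the interface, which is both the convenience and the source of the costs discussed next.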

Of course, it is not free, especially for a problem

The Expense

The repository tries to make a complex problem simple. The difficulty is documented in a lengthy and detailed blog post, “Object-Relational Mapping is the Vietnam of Computer Science”.

It has consequences. The three big issues are:

  • The implementation has to be leaky to do the job that the abstraction cannot do. Its creator, Gavin King, considers this a feature.
  • You have to know what is happening under the hood to use it correctly. This one is even worse than the first one: you deal with the cache (session), N+1 queries, transactions, etc.
  • It usually fetches more data than needed and is slow as a result, in many cases an order of magnitude slower.
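
The N+1 issue in the list above can be illustrated without a real database. The following is a toy sketch under stated assumptions: CountingDb is a hypothetical stand-in that only counts the queries it receives, and the customer and order data are made up. It contrasts lazy, per-parent loading with a single join-style query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a database; it counts every query it serves.
class CountingDb {
    private final Map<Long, List<String>> ordersByCustomer = Map.of(
            1L, List.of("book"),
            2L, List.of("pen", "ink"),
            3L, List.of("lamp"));
    int queries = 0;

    // Stands in for: one query returning all customer ids.
    List<Long> selectCustomerIds() {
        queries++;
        return List.of(1L, 2L, 3L);
    }

    // Stands in for: one query returning the orders of a single customer.
    List<String> selectOrdersFor(long customerId) {
        queries++;
        return ordersByCustomer.get(customerId);
    }

    // Stands in for: one join query returning every order at once.
    List<String> selectAllOrders() {
        queries++;
        List<String> all = new ArrayList<>();
        for (long id : List.of(1L, 2L, 3L)) {
            all.addAll(ordersByCustomer.get(id));
        }
        return all;
    }
}

class NPlusOneDemo {
    // Lazy-loading style: one query for the parents, then one per parent.
    static int lazyStyle(CountingDb db) {
        int total = 0;
        for (long id : db.selectCustomerIds()) {
            total += db.selectOrdersFor(id).size();
        }
        return total;
    }

    // Explicit-SQL style: a single query fetches everything.
    static int joinStyle(CountingDb db) {
        return db.selectAllOrders().size();
    }

    public static void main(String[] args) {
        CountingDb lazyDb = new CountingDb();
        CountingDb joinDb = new CountingDb();
        System.out.println(lazyStyle(lazyDb) + " orders, " + lazyDb.queries + " queries");
        System.out.println(joinStyle(joinDb) + " orders, " + joinDb.queries + " queries");
    }
}
```

With three customers, the lazy style issues four queries (1 + N) and the join style issues one, for the same four orders. With a real ORM, the per-parent queries are triggered implicitly when a lazy collection is accessed, which is why you need to know what is happening behind the simple method calls.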

Some debates are the following:

Nonetheless, ORM is not bad for a simple application where performance and precise control are not big issues.

The Solution

In fact, the popularity of Spring JdbcTemplate and MyBatis hints that a database-first approach is a good alternative to the ORM approach. However, writing raw SQL is painful. Luckily, tools such as jOOQ and QueryDSL provide typed SQL that can solve the problem nicely. Dapper is a similar tool for .NET.

The solution can be summarized as:

  • Embrace typed SQL to explicitly control your data access and transactions.
  • Build a data access layer (DAL) using data access objects (DAOs) to share common functions.
  • Use DTOs to fetch and update only the relevant data.
  • (Extra feature) Use async queries and compose the results.
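
The DAO, DTO, and async points above might look roughly like the following sketch. Everything here is an assumption made for illustration: UserDao, StubUserDao, SummaryService, the DTO records, and the table and column names in the comments are hypothetical, and the stub returns fixed values where a real DAO would run explicit (ideally typed) SQL.

```java
import java.util.concurrent.CompletableFuture;

// DTOs carry only the columns a use case needs.
record UserName(long id, String name) {}
record UserSummary(String name, int totalCents) {}

// Hypothetical DAO; each method would wrap exactly one explicit SQL statement.
interface UserDao {
    // e.g. SELECT id, name FROM users WHERE id = ?   (hypothetical schema)
    UserName findName(long userId);

    // e.g. SELECT SUM(amount_cents) FROM orders WHERE user_id = ?   (hypothetical schema)
    int sumOrderCents(long userId);
}

// Stub with fixed values, standing in for a JDBC-backed or typed-SQL implementation.
class StubUserDao implements UserDao {
    public UserName findName(long userId) { return new UserName(userId, "Ada"); }
    public int sumOrderCents(long userId) { return 4200; }
}

class SummaryService {
    // Runs the two independent queries concurrently and composes their results.
    static CompletableFuture<UserSummary> summary(UserDao dao, long userId) {
        CompletableFuture<UserName> name =
                CompletableFuture.supplyAsync(() -> dao.findName(userId));
        CompletableFuture<Integer> total =
                CompletableFuture.supplyAsync(() -> dao.sumOrderCents(userId));
        return name.thenCombine(total, (n, t) -> new UserSummary(n.name(), t));
    }

    public static void main(String[] args) {
        // join() blocks until both queries have completed.
        System.out.println(summary(new StubUserDao(), 1L).join());
    }
}
```

Each DAO method maps to one visible query and returns exactly the fields the caller asked for, so there is no hidden loading. The composition step is ordinary CompletableFuture code and does not depend on any particular SQL library.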

Disadvantages of the proposed solution:

  • Tight coupling to a relational database
  • Requires good SQL skills
  • A lot of data mapping
  • The DB-first model depends on the database schema