Modeling Reference Data in the Application Tier


(6 comments)
July 25th 2011


In every enterprise application, there exists reference data - those simple entities that live on the periphery of your data model, and are used to classify your master data. We often know these entities by other names (depending on our point of view), for instance "lookup tables", "configuration data", etc. - examples being "State", "CustomerType", or "ProductType".

Now there are some strong reasons to model these entities in separate tables in your relational database, and there is generally not much debate here. Where things get a little tricky is when this reference data propagates up into the application tier. In my experience, developers employ a number of different strategies to solve this problem, and so in this post I'll define some of the more popular approaches and describe the pros and cons of each. As always, I'd love to hear your thoughts, so please share. Here goes...

Types of Reference Data

Let's first start with an example. Imagine your system has customers, and every customer has some loyalty level (e.g. "Gold", "Silver", etc."). The tables could be designed simply as follows:

 

reference-data-erd1

 

...and the data in the LOYALTY_LEVEL table might be...

 

id name
G Gold
S Silver
B Bronze

 

This is a pretty vanilla example, so it would be tempting to jump right in and translate this into the application tier, but before we do it's important to stop and ask a few questions first:

  1. Does the reference data need to be retrieved independent of the master data (e.g. display all loyalty levels in a drop-down, etc.)?
  2. Are there other attributes beyond just the normal id-name pair? For example, "description", "abbreviation", or something with more business meaning like "percentage discount", etc.?
  3. Do specific values need to be hard-coded into the business logic? For example, could there be logic if (cust.loyaltyLevel == "Gold") { sendOvernightMail() } ...or something similar?
  4. Is the reference data at all dynamic? For example, could there be a new loyalty level at some point (e.g. "Diamond" or maybe "Coal"!)? Further, is this reference data very dynamic in that it could be added by a user via some UI, or would a new value be a configuration change?

Now I can hear the objections already: "hey, not all of these types of 'reference data' are really reference data!". And that's actually the point. Oftentimes, in my experience, we fail to appreciate these four distinctions, and we shoe-horn all entities that even roughly resemble reference data into the same solution, to sometimes pernicious consequences. For example, we may treat States the same as Loyalty Levels in the application tier, even though the former is probably static, not used in business logic, and will only ever have two-attributes, while the latter could be the exact opposite.

To best understand how to design for different types of reference data, it's helpful to lay out a few common solutions...

1. Flatten in Model Class

An unsophisticated (but not uncommon) approach is to flatten the reference data in the containing model object. For example, the customer class could look like this:

 public class Customer {
    private String name;
    private String loyaltyCode;
    private String loyaltyName;  // optional
  }
  

Obviously this will work, until a new attribute needs to be added to a Loyalty Level (e.g. "description")...or Loyalty Codes need to be retrieved independent of Customers (e.g. to populate some drop-down), in which case you'll probably want to define a...

2. Custom Class

A custom LoyaltyLevel class could be created with the two attributes "code" and "name"...

  public class LoyaltyLevel {
    private String code;
    private String name;
  }

...and then composed within the Customer class:

  public class Customer {
    private String name;
    private LoyaltyLevel level;
  }

On the up side, the LoyaltyLevel class could be fetched independent of Customers (e.g. to populate a drop-down, etc.), and attributes could easily be added to the LoyaltyLevel class without adding unnecessary bloat to the Customer class.

The problem comes when you want to create some business logic that refers to some specific loyalty level. Using the example above, it's possible that some business rule should be written such that customers at the "Gold" level are treated in special manner.

  if("G".equals(customer.getLoyaltyLevel().getCode())) {
    sendOvernightMail();
  }
  else {
    sendViaHorseAndBuggy();
  }

Again, this works, but there are a few problems. First, the "code" attribute in LoyaltyLevel is not type-safe, since it's just a String in the LoyaltyLevel class, and so there's not guarantee that an invalid value (e.g. "X") couldn't creep in. Second, we're hard-coding the level, "G", into our business logic, and while this could be ameliorated with a simple String constant, again, this is not type-safe (and a general anti-pattern). And finally, it's very easy to forget to write bugs where "G" is compared with the LoyaltyLevel object itself and not it's String code. For example:

  if("G".equals(customer.getLoyaltyLevel()) { // forgot .getCode()!
    sendOvernightMail()
  }

In other words, if specific LoyaltyLevels are to be used in the business logic, it's much preferable to keep them typesafe in an...

3. Enum

A LoyaltyLevel enum could be created in the application tier:

  public enum LoyaltyLevel {
    Gold("G", "Gold"),
    Silver("S", "Silver"),
    Bronze("B", "Bronze");

    ...
  }

...and a type handler construct in your ORM tool (like Hibernate's or IBatis, etc.) can be used to easily load this into the Customer object on retrieval from the database:

  public class Customer {
    private String name;
    private LoyaltyLevel level;
  }

Voila! This solves our problems, right? LoyaltyLevels can be retrieved independent of Customers, new attributes can be easily added without affecting the master data, and specific instances can be easily (and type-safely) referred to in business logic (e.g. "LoyaltyLevel.Gold == cust.getLoyaltyLevel()").

Not so fast! What happens when the business wants to add a "Diamond" level? Or wants to get rid of their "Bronze" level and replace it with an "Aluminum"? In either case, both the enum and the database need to change (since both redundantly define the Loyalty Levels), which obviously necessitates a code/configuration change (i.e. not something that could happen via some UI). In cases where the reference data is very static, this may be fine, but if Loyalty Levels are to be added with any fluidity, encapsulating the different values in Enums won't work.

Analysis and Recommendations

So I've presented 3 solutions, and as the table below shows, each solution has its drawbacks. Things are less clear now than before! What gives?

 

  Independent? +2 Attributes? Business Logic? Dynamic?
1. Flatten in Model Class Bad Bad Bad Good
2. Custom Class Good Good Bad Good
3. Enum Good Bad Good Bad

 

In the end, is there an good answer? Well, I've found the following two heuristics to be helpful:

1. If the reference data is static, has few attributes, and is used in business logic, use an Enum. The redundancy of reference data defined both in both an enum and a table in the database is relatively harmless (since the data is static!), and worth the cost for gaining the benefit of type safety, clarity, and ease of use (in my opinion).

2. In all other cases, use a Custom Class. In cases where the business logic seems to reference a specific value, rather than hard-code (e.g. "G".equals(code)), try to refactor this such that the data makes the decision rather than the code. For example, in the example of Loyalty Levels and shipping methods, a new column added to the Loyalty Level table...

id name ship_overnight
G Gold true
S Silver false
B Bronze false

...and this would effectively eliminate the dependency on a specific reference data value in the code and move it to the database.

3. If you absolutely need to hard-code the "code" values of the reference data for business logic, then do so with constants in one place and as close to the reference data class as possible (preferably in it). Do not sprinkle hard-coded values (e.g. "G", "S", etc.) around your business logic. (I know...duh!)

Anyway, I'd love to hear your thoughts. Am I missing something? How do you manage reference data in the application tier?

I'm an "old" programmer who has been blogging for almost 20 years now. In 2017, I started Highline Solutions, a consulting company that helps with software architecture and full-stack development. I have two degrees from Carnegie Mellon University, one practical (Information and Decision Systems) and one not so much (Philosophy - thesis here). Pittsburgh, PA is my home where I live with my wife and 3 energetic boys.
I recently released a web app called TechRez, a "better resume for tech". The idea is that instead of sending out the same-old static PDF resume that's jam packed with buzz words and spans multiple pages, you can create a TechRez, which is modern, visual, and interactive. Try it out for free!
Got a Comment?
Comments (6)
Ruudjah
November 14, 2016
Just an interface with concrete classes implementing them. You can extend by simply adding a class. ORM's support this easily, and you have the logic abstracted nicely. You delegate the responsibility to the correct class, no class with multiple responsibilities (like enum, or scattered business logic in the wrong classes). Furthermore, no switch statements needed (polymorphism, baby!). In your good/bad table, all signs are green.
Doug
November 14, 2016
How about code-genning the enum from the database as part of the build process?

Doug McClean
November 14, 2016
I know it's just an example, but in a case like that I would strongly consider adding a column to the Loyalty_Level table named Receives_Overnight_Mail.

Matt
November 14, 2016
In example 2, it looks like the main thing that a loyalty level does is decide the shipping type a certain order is sent by. So what I would do is something like make LoyaltyLevel abstract with a method that specifies the shipping type to use, or actually sends the shipment. Then Gold, Silver implementations of LoyaltyLevel have to specify what shipping type, so you get type-safe business logic. A LoyaltyLevel.Parse static method handles getting the right type from your database field or whatever. You can even make it's usage look like an enum (eg. LoyaltyLevel.Gold) by using private inner classes to implement and static getters (in .net at least)
Ben
November 14, 2016
Thanks for the comments!


@Ruudjah, Matt - good thought. Basically, the Strategy pattern (GoF), right? Essentially, if I understand, you'd have an LoyaltyLevel interface with a method "sendToCustomer" method, which is implemented appropriately depending on the level of the concrete sub-class (e.g. for "Gold", then send overnight). This could definitely work in some cases, though I'm not sure it's appropriate for all (for example, would you really want a separate class for all 50 states?).


@John - clever idea. Obviously you could never add Loyalty Levels on the fly (e.g. via a UI), but this is rarely necessary with reference data (IMHO). Also, source wouldn't necessarily compile initially (until the Enums were generated), but again, not a big deal. Thanks for sharing that.


@Doug - if I understand what you're saying, then I think that's something I described in the last section.
Manuel
November 14, 2016
I prupose a combination of class and enum, where the class encapsulates the enum. You can add attrbutes to the class and the business logic operates on the enum.