Service-oriented architectures (or microservices, as they're called now) add a new twist to an old problem. Imagine this simple scenario:
You are an architect at an online retailer. Your business owners have thrown down the gauntlet. Well, not really…they’ve just added a new requirement: they’d like to see a list of all orders for customers matching some specific criteria (e.g. city, name, registration date). For example, show me all orders for customers named ‘Holmes’ from ‘Pittsburgh, PA’. Further, this isn’t for a static report, but rather a dynamic screen in their admin UI.
Sounds simple, right? In the old days of integrated applications, it would be – just join the customer and order tables and apply said criteria. Voila!
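For the record, here is roughly what that old trick looks like. This is a minimal sketch using Python's built-in sqlite3 module; the table names, column names, and sample rows are my own assumptions for illustration, not a real schema.

```python
import sqlite3

# Hypothetical single-database schema: both tables live side by side,
# so one SQL join can answer the business question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id INTEGER PRIMARY KEY,
        last_name TEXT,
        city TEXT
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        item TEXT
    );
    INSERT INTO customer VALUES
        (1, 'Holmes', 'Pittsburgh, PA'),
        (2, 'Holmes', 'London'),
        (3, 'Watson', 'Pittsburgh, PA');
    INSERT INTO orders VALUES
        (10, 1, 'pipe'),
        (11, 1, 'magnifying glass'),
        (12, 2, 'violin'),
        (13, 3, 'notebook');
""")


def orders_for(conn, last_name, city):
    """Return (order id, item) for every customer matching the criteria."""
    return conn.execute(
        """
        SELECT o.id, o.item
        FROM orders AS o
        JOIN customer AS c ON c.id = o.customer_id
        WHERE c.last_name = ? AND c.city = ?
        ORDER BY o.id
        """,
        (last_name, city),
    ).fetchall()


matching_orders = orders_for(conn, "Holmes", "Pittsburgh, PA")
print(matching_orders)  # [(10, 'pipe'), (11, 'magnifying glass')]
```

One query, one round trip, and the database does the matching for you.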
Ahh…but it’s a brave new world of SOA, and some old tricks no longer apply. In your architecture, instead of one integrated application, you have two loosely coupled services, Customer and Order. Both services are deployed independently and have their own data – i.e. the Customer service has a Customer data store and the Order service an Order store.
The question stands then – how can you answer a business question about customers and orders, given that this data spans across multiple services?
Here are four solutions I’ve come across:
1. Retrieve from Service: A pure SOA may dictate that access to another service’s data goes through that service’s exposed interface – and this is where you come face-to-face with the n + 1 selects problem. In this architecture, your Customer service would first have to query the customer data store for the list of customers matching the criteria. It would then, for each customer, call the Order service to return all orders associated with that customer. Therefore, for n customers matching the criteria, n + 1 queries would need to be executed: one for the customers, plus one order query per customer (on top of n remote calls to the Order service!). The performance costs here grow with every matching customer, and I imagine they would be unsatisfactory in most cases.
2. Retrieve Directly: Another option would be to open read-only access to the Order data store from the Customer service. This would improve performance by saving the remote calls to the Order service (and perhaps the associated data binding), but still wouldn’t circumvent the “n + 1 selects” problem, since without a database join from customers to orders, a query of orders for each customer is still necessary.
3. Cache Locally: A third route would be to cache another service’s data locally (perhaps through a pub-sub or Master Data Management solution). In the scenario above, some customer-specific data could be cached in the Order store, such that finding all orders for customers living in ‘Pittsburgh’ could be handled by the Order service alone. Alternatively, there could be a special search service which caches both customer and order data locally, and would be used exclusively to handle queries across data stores.
4. Retrieve from Central Database: A fourth solution, not depicted on the diagram, would be to have only one master database which both services read from and write to. Doing so would improve performance and perhaps reduce complexity, but much of the loose coupling and separation of concerns that your SOA promises would not be realized – for example, a change to the orders table could have the effect of breaking the Customer service.
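To make the difference between options 1 and 3 a bit more concrete, here is a small in-memory sketch. Everything in it is hypothetical: the class names, the method names, the `on_customer_changed` event handler, and the sample data are assumptions of mine, and a plain counter stands in for real network calls. It only illustrates how the number of remote calls changes, not how a production service would be built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    id: int
    last_name: str
    city: str


@dataclass(frozen=True)
class Order:
    id: int
    customer_id: int
    item: str


class OrderService:
    """Hypothetical Order service that owns the order store."""

    def __init__(self, orders):
        self._orders = list(orders)
        self._customer_cache = {}  # option 3: local copy of customer data
        self.remote_calls = 0      # stand-in for network round trips

    def orders_for_customer(self, customer_id):
        # Option 1: called once per matching customer.
        self.remote_calls += 1
        return [o for o in self._orders if o.customer_id == customer_id]

    def on_customer_changed(self, customer):
        # Option 3: assumed pub-sub handler that keeps the cache current.
        self._customer_cache[customer.id] = customer

    def orders_for_customers_matching(self, last_name, city):
        # Option 3: one call, answered entirely from local data.
        self.remote_calls += 1
        result = []
        for order in self._orders:
            cached = self._customer_cache.get(order.customer_id)
            if cached and cached.last_name == last_name and cached.city == city:
                result.append(order)
        return result


class CustomerService:
    """Hypothetical Customer service that owns the customer store."""

    def __init__(self, customers, order_service):
        self._customers = list(customers)
        self._order_service = order_service

    def find(self, last_name, city):
        return [c for c in self._customers
                if c.last_name == last_name and c.city == city]

    def orders_via_service(self, last_name, city):
        # Option 1: one customer query, then one remote call per customer.
        orders = []
        for customer in self.find(last_name, city):
            orders.extend(self._order_service.orders_for_customer(customer.id))
        return orders


customers = [
    Customer(1, "Holmes", "Pittsburgh, PA"),
    Customer(2, "Holmes", "Pittsburgh, PA"),
    Customer(3, "Watson", "Pittsburgh, PA"),
]
orders = [Order(10, 1, "pipe"), Order(11, 2, "violin"),
          Order(12, 3, "notebook"), Order(13, 1, "magnifying glass")]

order_service = OrderService(orders)
customer_service = CustomerService(customers, order_service)

# Option 1: two matching customers, so two remote calls.
via_service = customer_service.orders_via_service("Holmes", "Pittsburgh, PA")
n_plus_one_calls = order_service.remote_calls

# Option 3: replay customer events into the cache, then ask once.
for c in customers:
    order_service.on_customer_changed(c)
order_service.remote_calls = 0
via_cache = order_service.orders_for_customers_matching("Holmes", "Pittsburgh, PA")
cached_calls = order_service.remote_calls

print(n_plus_one_calls, cached_calls)  # 2 1
```

Both paths return the same three orders here, but the first needs a remote call per matching customer while the second needs a single call. The price of the second is that the cached customer data is only as fresh as the last event the Order service received.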
You may argue that this is a straw-man example – in the real world, these services wouldn’t exist (e.g. wrong service granularity, etc.). Possibly true. However, any SOA will have multiple services, and some of these services will need access to data that spans other services, so in some form, the problem will still remain.
Anyway, thanks for reading, and I’d be very interested to hear your thoughts or solutions!