"Don't reinvent the wheel", they always say. And at first blush, it all seems so obvious. If you spend the time to build something once, why not reuse it? There's no down side, right? Well, it's not that easy. As a grizzled "old" programmer, I've seen organizations fall victim to this spurious ideal over and over again, paying the dues of up-front design and development, but never reaching the promise land of massive ROI via reuse. In fact, in my opinion, our overly optimistic expectation for the benefits and ease of reuse is one of the most pervasive and pernicious traps in software development.
The root of the problem, I believe, is what Daniel Kahnaman calls What You See Is All There Is., which explains, in a nutshell, that we're hard-wired to want to make decisions quickly, using only the evidence we have at hand and some basic heuristics. Slowing our thinking down would take time and discipline, and so instead we try to substitute complex problems we don't fully understand for simple ones we do.
In the case of reuse, our intuition is simple and strong, represented by that hard-to-shake physical world analogy of software as a "wheel" not to reinvent. And it's this comfortable mental model that we often fall back on when making decisions about reuse. The problem is that this basic intuition about reuse is flawed, or at least woefully incomplete. Let's see how...
(A quick caveat: my argument here is around large-scale reuse, not at the method or function level. I'm completely down with DRY at the lower level of granularity. Also I'm thinking of reuse as the leveraging of some service/library/etc. that was built internally vs. acquired externally. I'm not recommending building your own JS MVC framework! Ok, now back to the originally scheduled post...)
Imagine there is a system, A, that contains some logic, C, within it. Soon, a new system, B, is to be built, and this system too needs the same basic logic, C.
Now it stands to reason that if we just split C out from A, it could then be used by B without the need to re-implement. The savings is therefore equal to the cost of one instance of C - i.e. it only had to be built once, and not again for B.
Further, as more systems are identified to have this same common code, the benefits of this extraction and reuse will be extended to those systems as well in a linear fashion, such that for every new system that reuses C instead of re-implements it, there is another savings equal to the implementation cost of C.
Again, the logic here seems simple and seemingly unassailable - why would any organization choose to build multiple instances of C rather than just build it once and then reap the benefits of reuse. The problem is that there's more to this picture, and what can look like slam-dunk ROI up front can turn into an costly straight-jacket down the road. Here are a few ways in which the basic intution of reuse breaks down...
The first problem is that of extraction. Our intuition is that C should snap apart from A like a piece of lego - nice and easy. The reality, however, is that disentangling common code can be a little like pulling a piece of spaghetti from your bowl of pasta, only to find that the entire dish is just one long noodle. Of course it's not usually that bad, but in code, there are lots of hidden dependencies and connections, and the initial conception as to the scope of C grows as you begin to unwind it. The effort is almost never as easy as you expect.
Moreover, almost always, C needs other things to do its job (e.g. other libraries, utility functions, etc.). In some cases these are shared dependencies (i.e. both A and C need them) and some cases not. Either way, the simple picture of A, B, and C can begin to look less so. For the sake of this example, let's assume both A, B, and C each use a common library, L.
Another problem is that of variation: different consumers of C will often have slightly different requirements for what it should do. For example, there may be some function in C that needs to behave slightly differently if A called it than if B did. A common solution for these cases is parameterization: the given function takes some parameter which lets it know how to behave given who called it. This can work, but it increases complexity of C, and the logic gets messy as well, as the code gets riddled with if blocks like "if called from A then do this block of logic".
Now even if C is a case of perfect reuse for A and B, this type of variation almost always is necessary as new systems, say D and E, come along. They may want to use some of C the way it is, but then need other parts to change in subtle or not-so-subtle ways. Again, each new accommodation that needs to be made within C represents extra complexity, and so what used to be something fairly easy to grok from the perspective of a developer using it, now becomes a lot trickier as C morphs into something that must satisfy the needs of D, E, F, and so on. Which leads to the next problem...
As this complexity increases, developers have a tougher time groking what C does, and how to use it. For example, a developer of A might not understand some parameter for a function of C, since it's only relevant for systems E and F. In most cases, some level of API documentation is necessary (maybe Swagger, Javadoc, or more) to explain the inputs, outputs, exceptional conditions, and other SLAs/expectations. And while documentation in general is a good thing, it's not without its problems (e.g. keeping it up to date, etc.).
Another implication of increased complexity is that it can be harder to maintain quality. C now serves many masters, and so there are a lot of edge cases to test. Further, since C is now used by many other systems, the impact of any given bug is amplified, as it can pop up in any or all of the consuming systems. Often, for any change to C, it's not enough to test just that shared component/service, but some level of regression testing should be done as well for A, B, D, and all the other systems that depend on it (whether the change to C is actually used by that system or not!).
Again, since we're talking about reuse at a non-trivial scale, it's probably the case that C will need to be developed by a separate team of developers, which can lead to a loss of autonomy. Separate teams usually have their own release schedules, and sometimes their own development processes. The obvious implication is that if A's team needs some enhancement in C, it probably needs to work it through C's process - i.e. a champion of A needs to provide requirements, advocate for its priority, and help with the testing. In other words, team A is no longer in control of its own destiny with respect to the functionality that C implements - it is dependent on the team that delivers C.
Lastly, when C is upgraded, by definition there are now different versions. Depending on the nature of the reuse, this can present different issues. In the case of build-time reuse (e.g. library), the different systems (A, B, etc.) could stick with their working version, and choose to upgrade when optimal. The downside here is that there are different versions of C in the wild, and it's not unlikely that one bug might lurk in each of the versions, and need to be patched in each. In the case of run-time reuse (e.g. microservice), C would either have to support multiple versions of its API in one instance or it could just upgrade without consideration of backward compatibility and thus force A and B to upgrade along with it. In either case, the robustness and rigorousness of the processes and organizations to support this reuse are significantly increased.
In the end, my point is not that large-scale reuse should be avoided, but rather that it's not as easy as our intuition suggests. Reuse is hard, and while it may still deliver benefits that outweigh its costs, those costs need to be realistically considered and discussed up front.
Even after careful consideration, if large scale reuse is the right course of action, there's still a decision as to the nature of reuse. My experience tells me to be careful of the dependency arrows. Reuse where the "reuser" is in control is almost always easier to implement and manage than reuse where the reusable asset calls the system. In the example above, having C be a library or microservice puts A and B in the driver seat, and this, in my opinion makes for quicker implementation and less management/coordination in the long run.
Flipping the dependency arrows, and making C a framework or platform, changes the onus of control. Now A and B are beholden to C. This type of reuse is not only more difficult to build (and get right), but results in greater lock-in down the road (i.e. A and B are totally dependent on C). A great adage is that "a library is a tool, a framework is a way of life".
In the end, I'd love to hear your feedback or experience with large-scale reuse. When does it work out and when does it fail?