The Reality of Reuse


March 22nd, 2018


"Don't reinvent the wheel", they always say. And at first blush, it all seems so obvious. If you spend the time to build something once, why not reuse it? There's no down side, right? Well, it's not that easy. As a grizzled "old" programmer, I've seen organizations fall victim to this spurious ideal over and over again, paying the dues of up-front design and development, but never reaching the promise land of massive ROI via reuse. In fact, in my opinion, our overly optimistic expectation for the benefits and ease of reuse is one of the most pervasive and pernicious traps in software development.

The root of the problem, I believe, is what Daniel Kahneman calls What You See Is All There Is, which explains, in a nutshell, that we're hard-wired to make decisions quickly, using only the evidence we have at hand and some basic heuristics. Slowing our thinking down takes time and discipline, and so instead we swap the complex problems we don't fully understand for simple ones we do.

In the case of reuse, our intuition is simple and strong, represented by that hard-to-shake physical world analogy of software as a "wheel" not to reinvent. And it's this comfortable mental model that we often fall back on when making decisions about reuse. The problem is that this basic intuition about reuse is flawed, or at least woefully incomplete. Let's see how...

(A quick caveat: my argument here is about large-scale reuse, not reuse at the method or function level. I'm completely down with DRY at lower levels of granularity. Also, I'm thinking of reuse as the leveraging of some service/library/etc. that was built internally, not acquired externally. I'm not recommending building your own JS MVC framework! Ok, now back to the regularly scheduled post...)

The Intuition

Imagine there is a system, A, that contains some logic, C, within it. Soon, a new system, B, is to be built, and this system too needs the same basic logic, C.

Now it stands to reason that if we just split C out from A, it could then be used by B without the need to re-implement it. The savings are therefore equal to the cost of one instance of C - i.e. it only had to be built once, and not again for B.

Further, as more systems are identified that need this same common code, the benefits of this extraction and reuse extend to those systems as well, in linear fashion: every new system that reuses C instead of re-implementing it saves another implementation cost of C.
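To put some back-of-the-envelope math on that intuition (my own notation here, not a formal cost model): if n systems need the logic, and building one instance of C costs cost(C), then the expected payoff is

    savings(n) ≈ (n - 1) × cost(C)

Build it once, and every additional consumer is pure profit. That's the pitch, anyway.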

Again, the logic here seems simple and unassailable - why would any organization choose to build multiple instances of C rather than just build it once and then reap the benefits of reuse? The problem is that there's more to this picture, and what can look like slam-dunk ROI up front can turn into a costly straitjacket down the road. Here are a few ways in which the basic intuition of reuse breaks down...

The Reality

The first problem is that of extraction. Our intuition is that C should snap apart from A like a piece of lego - nice and easy. The reality, however, is that disentangling common code can be a little like pulling a piece of spaghetti from your bowl of pasta, only to find that the entire dish is just one long noodle. Of course it's not usually that bad, but in code there are lots of hidden dependencies and connections, and your initial conception of the scope of C grows as you begin to unwind it. The extraction is almost never as easy as you expect.
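To make that concrete, here's a contrived sketch of what the entanglement often looks like (all the class names here are hypothetical, stand-ins for system A's internals):

    import java.math.BigDecimal;

    // Inside system A. PriceCalculator looks like a nicely self-contained "C"...
    public class PriceCalculator {

        public BigDecimal finalPrice(Order order) {
            // ...but it quietly reaches into A's global configuration,
            BigDecimal taxRate = AppConfig.getInstance().taxRateFor(order.region());

            // leans on A's session state to find the current user's discount,
            BigDecimal discount = SessionContext.currentUser().discountRate();

            // and reports through A's bespoke audit subsystem.
            AuditLog.record("price-calc", order.id());

            BigDecimal subtotal = order.subtotal();
            return subtotal
                    .add(subtotal.multiply(taxRate))
                    .subtract(subtotal.multiply(discount));
        }
    }

Pull PriceCalculator out on its own, and suddenly AppConfig, SessionContext, and AuditLog (and everything they drag in) are either coming along for the ride or need to be abstracted away first.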

Moreover, almost always, C needs other things to do its job (e.g. other libraries, utility functions, etc.). In some cases these are shared dependencies (i.e. both A and C need them), and in some cases not. Either way, the simple picture of A, B, and C can begin to look less so. For the sake of this example, let's assume A, B, and C each use a common library, L.

Another problem is that of variation: different consumers of C will often have slightly different requirements for what it should do. For example, there may be some function in C that needs to behave slightly differently if A calls it than if B does. A common solution for these cases is parameterization: the given function takes some parameter that lets it know how to behave based on who called it. This can work, but it increases the complexity of C, and the logic gets messy as well, as the code gets riddled with if blocks like "if called from A, then do this block of logic".
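Here's a minimal sketch of where that parameterization tends to end up (the names are made up, and the callers are reduced to an enum for brevity):

    // The "reusable" formatter in C, now painfully aware of every consumer.
    public class ReportFormatter {

        public enum Caller { SYSTEM_A, SYSTEM_B, SYSTEM_D }

        public String format(String data, Caller caller) {
            // This parameter exists only so C can branch on who's asking.
            if (caller == Caller.SYSTEM_A) {
                // A wants the comment headers stripped.
                data = stripHeaders(data);
            } else if (caller == Caller.SYSTEM_B) {
                // B wants the legacy header block preserved verbatim.
                data = "LEGACY-HEADER\n" + data;
            } else if (caller == Caller.SYSTEM_D) {
                // D came along later and needs... both, sort of.
                data = stripHeaders("LEGACY-HEADER\n" + data);
            }
            return data.trim();
        }

        private String stripHeaders(String data) {
            // Drops any line beginning with '#'.
            return data.replaceAll("(?m)^#.*\\n?", "");
        }
    }

Every new consumer adds another branch, and before long C's logic is more about its callers than about the job itself.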

Now even if C is a case of perfect reuse for A and B, this type of variation almost always becomes necessary as new systems, say D and E, come along. They may want to use some of C the way it is, but then need other parts to change in subtle or not-so-subtle ways. Again, each new accommodation made within C represents extra complexity, and so what used to be fairly easy to grok from the perspective of a developer using it becomes a lot trickier as C morphs into something that must satisfy the needs of D, E, F, and so on. Which leads to the next problem...

As this complexity increases, developers have a tougher time grokking what C does and how to use it. For example, a developer of A might not understand some parameter for a function of C, since it's only relevant for systems E and F. In most cases, some level of API documentation is necessary (maybe Swagger, Javadoc, or more) to explain the inputs, outputs, exceptional conditions, and other SLAs/expectations. And while documentation in general is a good thing, it's not without its own problems (e.g. keeping it up to date, etc.).
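For a taste of the documentation burden, here's a hypothetical Javadoc for one of C's functions, with a parameter that most consumers can safely ignore - but can't know they can ignore without reading the docs:

    /**
     * Formats the given report data.
     *
     * @param data       the raw report body; must be non-empty
     * @param legacyMode if true, preserves the pre-2016 header block.
     *                   Only relevant for systems E and F; all other
     *                   callers should pass false.
     * @return the formatted report
     * @throws IllegalArgumentException if data is null or empty
     */
    public String format(String data, boolean legacyMode) {
        if (data == null || data.isEmpty()) {
            throw new IllegalArgumentException("data must be non-empty");
        }
        return legacyMode ? "LEGACY-HEADER\n" + data : data.trim();
    }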

Another implication of increased complexity is that it can be harder to maintain quality. C now serves many masters, and so there are a lot of edge cases to test. Further, since C is now used by many other systems, the impact of any given bug is amplified, as it can pop up in any or all of the consuming systems. Often, for any change to C, it's not enough to test just that shared component/service, but some level of regression testing should be done as well for A, B, D, and all the other systems that depend on it (whether the change to C is actually used by that system or not!).

Again, since we're talking about reuse at a non-trivial scale, it's probably the case that C will need to be developed by a separate team of developers, which can lead to a loss of autonomy. Separate teams usually have their own release schedules, and sometimes their own development processes. The obvious implication is that if A's team needs some enhancement in C, it has to work through C's process - i.e. a champion from team A needs to provide requirements, advocate for their priority, and help with the testing. In other words, team A is no longer in control of its own destiny with respect to the functionality that C implements - it is dependent on the team that delivers C.

Lastly, when C is upgraded, by definition there are now different versions. Depending on the nature of the reuse, this presents different issues. In the case of build-time reuse (e.g. a library), the different systems (A, B, etc.) can stick with their working version and choose to upgrade when optimal. The downside is that there are now different versions of C in the wild, and it's not unlikely that the same bug lurks in several of them and needs to be patched in each. In the case of run-time reuse (e.g. a microservice), C either has to support multiple versions of its API in one instance, or it can upgrade without consideration of backward compatibility and force A and B to upgrade along with it. Either way, the processes and organization required to support this reuse must become significantly more robust and rigorous.
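In the run-time case, "supporting multiple versions" often looks something like the following (a Spring-flavored sketch; the endpoints and response shapes are invented for illustration):

    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;

    // C as a shared service, now carrying two API versions at once.
    @RestController
    public class PricingController {

        // The response shape A and B were originally built against.
        record PriceV1(double amount) {}

        // A later consumer needed tax broken out, so a new shape was born.
        record PriceV2(double amount, double tax) {}

        // v1 can't be retired until A and B migrate - whenever that is.
        @GetMapping("/api/v1/price/{itemId}")
        public PriceV1 priceV1(@PathVariable String itemId) {
            return new PriceV1(lookupBasePrice(itemId));
        }

        @GetMapping("/api/v2/price/{itemId}")
        public PriceV2 priceV2(@PathVariable String itemId) {
            double base = lookupBasePrice(itemId);
            return new PriceV2(base, base * 0.07);
        }

        private double lookupBasePrice(String itemId) {
            return 10.0; // stand-in for the real price lookup
        }
    }

Every endpoint that can't be retired is a promise C's team has to keep testing and supporting.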

Conclusion

In the end, my point is not that large-scale reuse should be avoided, but rather that it's not as easy as our intuition suggests. Reuse is hard, and while it may still deliver benefits that outweigh its costs, those costs need to be realistically considered and discussed up front.

Even after careful consideration, if large-scale reuse is the right course of action, there's still a decision to make about the nature of the reuse. My experience tells me to be careful of the dependency arrows. Reuse where the "reuser" is in control is almost always easier to implement and manage than reuse where the reusable asset calls into the system. In the example above, having C be a library or microservice puts A and B in the driver's seat, and this, in my opinion, makes for quicker implementation and less management/coordination in the long run.

Flipping the dependency arrows and making C a framework or platform changes the onus of control. Now A and B are beholden to C. This type of reuse is not only more difficult to build (and get right), but also results in greater lock-in down the road (i.e. A and B are totally dependent on C). A great adage is that "a library is a tool, a framework is a way of life".
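The difference in dependency direction is easy to see in code. Here's a deliberately tiny sketch (hypothetical names) of the same functionality both ways:

    // Library-style reuse: A owns the control flow and calls into C.
    class ReportLib {
        static String format(String data) {
            return data.trim().toUpperCase();
        }
    }

    class SystemA {
        void run() {
            // A decides when and how to use C.
            System.out.println(ReportLib.format("  raw data  "));
        }
    }

    // Framework-style reuse: C owns the control flow and calls back into A.
    abstract class ReportFramework {
        final void run() {
            String data = load();                          // C drives the lifecycle...
            System.out.println(render(data));
        }
        protected abstract String load();                  // ...and A fills in the blanks,
        protected abstract String render(String data);     // on C's terms.
    }

    class SystemAOnFramework extends ReportFramework {
        protected String load() { return "  raw data  "; }
        protected String render(String data) { return data.trim().toUpperCase(); }
    }

Same behavior in both versions, but in the second, A lives inside C's lifecycle - that's the "way of life" part of the adage.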

In the end, I'd love to hear your feedback or experience with large-scale reuse. When does it work out and when does it fail?

I'm an "old" programmer who has been blogging for almost 20 years now. In 2017, I started Highline Solutions, a consulting company that helps with software architecture and full-stack development. I have two degrees from Carnegie Mellon University, one practical (Information and Decision Systems) and one not so much (Philosophy - thesis here). Pittsburgh, PA is my home where I live with my wife and 3 energetic boys.
I recently released a web app called TechRez, a "better resume for tech". The idea is that instead of sending out the same old static PDF resume that's jam-packed with buzzwords and spans multiple pages, you can create a TechRez, which is modern, visual, and interactive. Try it out for free!
Comments (11)
Michael Ciarlillo
March 22, 2018
One thing not even considered is the dependency issues that could arise. If, say, A and C had dependencies on L1, and B and C had dependencies on L2, then C would effectively force both A and B to have a dependency on L1 AND L2, whether or not they truly needed them. Just some additional food for thought here. Generally I'm a fan of reuse, but just like every problem in programming, it all depends.
John
March 22, 2018
Lots of good stuff here as always.  I think the dependency direction part at the end is a great point.  C should be agnostic of its consumers (A and B) and resist tailoring its logic to specifically what they each need.  This helps avoid complex logic that is concerned with the context of its consumers.  To achieve the different behaviors, couldn't A and B alter C's behavior using decorator or inheritance (in the case of build-time reuse)?  Do you think the lego vs. spaghetti problem when you're first extracting C is a sign of poor separation and a lack of boundaries within A?
Ben Northrop
March 22, 2018
@Michael - Great point. In the pictures I have L and L' to subtly point this out, but didn't explicitly mention it. This dependency issue definitely crops up often (in the case of build-time/library reuse). Thanks for the comment!

@John - Yup...great insight. Putting thought into a reply...will comment after lunch. :)
Joe
March 22, 2018
I find the best re-usable code was written with intent to be re-used, not just because it "might" be re-used. For example if you are working on V1 of a project and you know there's going to be a V2 with additional features that are similar to what you're working on now, re-use could save you loads of time in the future in both implementation and testing.
Ben
March 22, 2018
@John - Great point about shooting to externalize the variation out of C and into the consumer. This is optimal, and quite possible in the case of build-time reuse, like you mention. Despite being possible, it can still be hard, and I've seen plenty of instances where time pressures (or whatever) lead to just hacking it out and putting "if" blocks all over the place in C to handle this variation. Ugly. And good point about designing for separation/cohesion making it easier to snap things apart later. Design/architecture is so much about managing dependencies, and when you do it right, this type of refactoring should be easier.

@Joe - Good point. Some true "agilists" might push back a little and call YAGNI...arguing that you shouldn't really design reuse into V1 if you don't need it until V2 (because the direction might change), but I completely agree that when you get it right, it saves a lot of implementation and testing.
Flo
March 23, 2018
Nice article. Whole-heartedly agree. Just wanted to point out a typo here that made me trip: In the second paragraph of "The Intuition" it should read "it could then be used by B" instead of "C".
Ben
March 27, 2018
@Flo - thanks!
Nickolai Belakovski
May 03, 2018
As you describe C growing in scope due to potential additions of D and E, it seems to me that further refactoring of C would be necessary at that point, to avoid the issues you're talking about.

The most cogent thoughts that I've seen on how to structure a large project have been from John Lakos and his book "Large-Scale C++ Software Design". One thing he talks about is organizing software into levels, and being very strict about prohibiting circular dependencies. Levels here means that you'll have some modules at the lowest level of your project that contain print functions, and then at a higher level than that (higher levels depend on lower levels, but strictly not the other way around) you might have logging functions. The next level up might be something like database accessors, and then above all of that would be your application logic.

Of course, you probably won't have print/logging/database functions within your own project - you'll probably be using existing solutions for those things - but I bring them up just to show an example of how to structure things. Keep the simplest, most re-usable stuff at the lowest level of the stack. And if something needs to be re-used by a lower-level module, take that code out and put it into an even lower-level module.

Hope that's a useful comment. Lakos' book is getting dated, but I think it still has useful information and is an overlooked book within the space.
Jho
September 16, 2019
Wow, very impressive article. I had not even thought of half of these consequences of code reuse. Reuse is taught as a tautology in colleges. I think it bears keeping in mind that if reuse goes past the scope and purpose of the original use case, it should be reexamined. Having 10 applications depend on C seems like a bad idea depending on the context; having the original 2 seems ok. If we are talking about a web service that is being hit via web requests, though, then the dependency may make more sense than a library or some code that is more tightly coupled to the application. So the purpose of the code being reused, and how tightly coupled it will become to the applications, seems important. Sharing overall seems like it needs to be thought out from the start and then that use case frozen. I am not an architect though; I could be persuaded by seeing some large-scale use cases that seem to work with sharing C all over.
technocrat
January 15, 2020
I agree that there are practical issues (many of which have been solved) for reuse in an enterprise; however, in theory a design that can consolidate reusable pieces and snap them together like lego will go a long way in saving cost and reducing development time. In many cases, isolating the "C" reusable module as depicted in the diagram can be equivalent to pulling teeth, but the advantages of that modularity will benefit not just "A" and "B" but 2000 other components.
An example would be: https://www.npmjs.com/package/core-js

Now imagine distributing the already tiny module of core-js into multiple split repos across the 16913 dependencies. The comment by Nickolai Belakovski is apt, in that he has explored the modern way of dealing with dependencies and organizing projects in clear levels.

There's no better metric than the fact that every language has come up with its own library store to solve this problem of dependency resolution in particular and reuse in general.
Ben
January 22, 2020
Thanks @technocrat. Point taken. Reuse is a bedrock of software development, and we all benefit from it daily. No one manually implements sorts or opens sockets; instead, we use libraries. And maybe it's because we're all so aware of the ubiquity of reuse that we (sometimes) forget how hard it is to do.

That was my main point here, that we should think carefully before we attempt to achieve reuse in the large. In particular, at the time I wrote this I was in an organization where managers and many "astronaut architects" were imposing very unrealistic expectations about reuse on their teams - they wanted complex things to snap together like lego, and didn't understand why it wasn't working. Anyway, thanks for the thoughts.