We Programmers Should Be Engineers

David Jetter · Published in The Startup · Dec 8, 2019
Am I a developer or a programmer? Maybe a code ninja or a hacker? My LinkedIn profile boldly claims I am a “technologist” — which I thought was a pretty cool career title back when I wrote it, one that probably didn’t automatically pigeonhole me into any specific kind of role. The more time I spend writing code and maintaining systems in my career, the more convinced I am that I should rely on principles and philosophies drawn from the classical roots of the engineering disciplines.

Photo by Dan Meyers on Unsplash

A particularly pivotal manager and leader I had the opportunity to work with several years ago, someone who would know exactly who they are if they read this, tried to explain to me that the gravy train of software developers being inculpable for their messes would end one day. Today, the liability of writing unsafe or plainly broken code is often absorbed at the corporate level, and even when an internal root cause analysis is performed, I rarely see anyone directly involved in building the problem volunteering to explain their participation. Indeed, much of this makes a lot of sense. It takes more than one person to deliver a broken product, and it is often a multitude of mistakes and years of underinvestment that produce the unsafe environment that makes such a broken product possible in the first place. Can you really hold a developer accountable for introducing a regression when unit test coverage is at 50%, integration tests against a deployed environment don’t exist, and there is no continuous delivery pipeline?

As a relatively immature developer at the time, I did not embrace the thought of personal accountability for my work. As a now slightly less immature software engineer, I see the landscape of minimal test coverage and slowly degrading system reliability as a reality that needs to be managed. I’ll go for the extreme example: much the same way that I imagine the engineers for New York’s subway system still manage to plan upgrades and maintenance work, I am still obligated to operate with expertise in a less-than-picturesque technology landscape. Doing so requires surfacing some harsh realities about the results of past business decisions to people who may rather not hear them, and that message is often best delivered using understandable metrics that help communicate the difference between good and poor systems. In software, there are tons of metrics and patterns and philosophies that try to indicate whether we’re being good stewards of our systems, or whether we’re slowly threading the needle to create a nest of unintelligible dependencies and innumerable logical code paths.

The same manager I referenced before not only preached of impending doom to our playtime, but also provided a meaningful antidote against the plague of analysis paralysis that such a wide range of measurements might induce. The proposal was that the single most important concept to grasp when building a system, service, sub-system, component, module, class, or function was the volatility of the problem being solved. This isn’t something that can be universally measured like centimeters or kilograms, but I promise two things: (1) we’ll get to measurable items in a moment, and (2) you can readily invent an arbitrary system to compare the risk and complexity of volatility (I’m looking at you, agile story points) so long as you understand the nature of the business you’re involved in. And this is where we engineers need to start communicating these concepts with our business leaders: first to understand the likelihood of certain changes being introduced, and then to run some mental models of what it would take to adapt to those changes.
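
As a rough illustration of what such an invented scale might look like (the names and weights below are arbitrary, not a standard), a volatility score can be as simple as the likelihood of a change multiplied by how much of the system would have to react to it:

```python
from enum import IntEnum

class ChangeLikelihood(IntEnum):
    """How often the business expects this requirement to change."""
    RARE = 1        # set once at onboarding, effectively frozen
    OCCASIONAL = 2  # changes a few times a year
    FREQUENT = 3    # changes weekly, or every release

class BlastRadius(IntEnum):
    """How much of the system has to react when it does change."""
    LOCAL = 1       # one class or function
    SUBSYSTEM = 2   # one module or service boundary
    SYSTEMIC = 3    # crosses service and team boundaries

def volatility_score(likelihood: ChangeLikelihood, radius: BlastRadius) -> int:
    """Arbitrary, story-point-style score: the higher it is, the more the design should anticipate change."""
    return int(likelihood) * int(radius)

# Example: configuration data that rarely changes, but a change touches a whole subsystem.
print(volatility_score(ChangeLikelihood.RARE, BlastRadius.SUBSYSTEM))  # 2
```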

For example, my team is using Amazon’s DynamoDB to cache some data that we fetch from a third-party managed service used during transaction processing. An aside: I still find DynamoDB’s API to be byzantine, but I’ve actually enjoyed using it. Caching inherently introduces a CAP theorem trade-off, but we decided to accept the lack of consistency in this case. Why? Because after discussions with business and client team members, we understood that this data is generally only set at system configuration time and does not change once transactions begin, so a simple 24-hour refresh cycle was deemed sufficient. Problem solved… until the first time we actually start setting up a client. We forgot that the people configuring and testing the systems are stakeholders too, and their cycle time is far faster than the real world they’re building for. They will reuse entities, change mappings, fix spelling mistakes, and cycle through a setup that might normally exist for months in the span of a day. While these people are not the primary stakeholders of these features of our system, their overall approval of the configured environment is an absolute requirement before we can begin fulfilling real-world business needs.
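
To make the trade-off concrete, here is a minimal sketch of that refresh cycle using boto3 (the table name, key schema, and fetch_from_source callback are hypothetical stand-ins, not our production code):

```python
import time
import boto3

CACHE_TTL_SECONDS = 24 * 60 * 60  # the agreed-upon 24-hour refresh cycle

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("client-config-cache")  # hypothetical table name

def get_client_config(client_id: str, fetch_from_source) -> dict:
    """Return the cached config if it is fresh enough; otherwise refetch and re-cache it."""
    response = table.get_item(Key={"client_id": client_id})
    item = response.get("Item")
    now = int(time.time())

    # Cache hit: the entry is younger than 24 hours, so we accept possible staleness.
    if item and int(item["cached_at"]) + CACHE_TTL_SECONDS > now:
        return item["config"]

    # Cache miss or stale entry: go back to the third-party service and re-cache.
    config = fetch_from_source(client_id)
    table.put_item(Item={"client_id": client_id, "config": config, "cached_at": now})
    return config
```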

What does the blast radius in our system look like if we wanted to skip our caching altogether? What about only for some clients or some transactions, based on request headers? The best-case scenario is that we create a new implementation of our caching interface and add an IF statement or two to the factory / build function that supplies configurations to our business transaction components. The worst-case scenario is that we’ve riddled our transaction processing code with direct DynamoDB calls, and we need to introduce changes around every single one of them, each time risking the introduction of a bug to previously working business logic that is not readily testable, because cache hits and misses are almost always flow control indicators.
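
In that best case, the change really is about this small (the interface, classes, and header name below are illustrative, not our actual code):

```python
from typing import Protocol

class ConfigCache(Protocol):
    """The caching interface our transaction components depend on."""
    def get(self, client_id: str) -> dict: ...

class DynamoConfigCache:
    """The normal path: serve configuration from the DynamoDB-backed cache."""
    def get(self, client_id: str) -> dict:
        ...  # the get_item / put_item logic sketched earlier

class PassThroughConfigCache:
    """Skip caching entirely: always ask the source of truth."""
    def __init__(self, fetch_from_source):
        self._fetch = fetch_from_source

    def get(self, client_id: str) -> dict:
        return self._fetch(client_id)

def build_config_cache(request_headers: dict, fetch_from_source) -> ConfigCache:
    """The IF statement or two in the factory: pick an implementation per request."""
    if request_headers.get("X-Skip-Config-Cache") == "true":  # hypothetical header
        return PassThroughConfigCache(fetch_from_source)
    return DynamoConfigCache()
```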

I’ve seen the term “blast radius” gaining some popularity, mostly around security and network architecture (e.g. the one-cloud-sub-account-per-service model), and except for its inherent violent undertones, I like it. How much of your organization needs to react when an unplanned event is forced upon you? Well, I promised some measurable items, and I think there are two closely related concepts that will help: our good friends afferent and efferent coupling. For as much as I like them, I always mix up their definitions:

afferent (adjective): conducting or conducted inward or toward something (using the circulatory system’s heart as an example: the veins returning blood to the heart).

efferent (adjective): conducted or conducting outward or away from something (using the circulatory system’s heart as an example: the arteries carrying blood away from the heart).

Afferent coupling can be measured by the number of other classes, functions, or external processes that are directly aware of a component. In other words, how many customers does this function have? More importantly, who are they, and do we have the ability to influence them? If they’re other code in my repository that I can readily refactor, then I can choose if and when they demand changes. If they’re exposed through an external web service API that all of my paying customers use, then I have no control over when they choose to request changes. Conversely, your afferent couplings are also subject to any changes you introduce to the component. We can use API versioning to prevent the introduction of breaking interface changes (e.g. a new mandatory input), but that won’t protect customers from well-intentioned bug fixes gone bad or from changes to implementations (e.g. caching versus no caching). Generally, an increase in afferent coupling will be followed by an increase in the overall number of required test cases.
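
A toy illustration of why that matters (the function and its callers are invented): once three different consumers depend on one helper, a well-intentioned change to it silently changes the behavior of all three, and all three need new test cases.

```python
def to_cents(amount: str) -> int:
    """A helper with an afferent coupling count of three: billing, reporting, and the public API all call it."""
    # If a "bug fix" here swaps truncation for rounding, every caller's output changes with it.
    return round(float(amount) * 100)

def billing_total(line_items: list[str]) -> int:  # afferent coupling #1
    return sum(to_cents(item) for item in line_items)

def monthly_report_row(amount: str) -> dict:      # afferent coupling #2
    return {"amount_cents": to_cents(amount)}

def api_response(amount: str) -> dict:            # afferent coupling #3
    return {"charged": to_cents(amount)}
```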

Efferent coupling can be measured by the number of other classes, functions, or external processes that a component is aware of, and as you might imagine, we’re just playing the other side. The component is subject to any changes made by any of its dependencies, and it needs to orchestrate changes requested or required by one dependency with how it interfaces with the others. We can use strategy patterns and dependency injection to simplify some of these needs, but regardless, once a single class’s dependencies grow beyond four or five, it becomes difficult to mentally rationalize its behavior. As the number of efferent couplings increases, the likelihood and frequency of inherited changes will increase. Generally, an increase in efferent coupling will be followed by an increase in the number of mocks and the amount of setup code required to properly unit test a component.
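
That last symptom is easy to recognize in a test suite: the mock setup grows in lockstep with the constructor’s parameter list. A contrived sketch (none of these class names are real):

```python
from unittest.mock import Mock

class TransactionProcessor:
    """Five efferent couplings: every one becomes a mock in every unit test."""
    def __init__(self, config_cache, payment_gateway, ledger, notifier, metrics):
        self._cache = config_cache
        self._gateway = payment_gateway
        self._ledger = ledger
        self._notifier = notifier
        self._metrics = metrics

    def process(self, client_id: str, amount_cents: int) -> bool:
        config = self._cache.get(client_id)
        approved = self._gateway.charge(client_id, amount_cents, config)
        if approved:
            self._ledger.record(client_id, amount_cents)
            self._notifier.send_receipt(client_id)
        self._metrics.increment("transactions.processed")
        return approved

def test_process_records_approved_charge():
    # Five mocks of setup before a single line of business logic is exercised.
    cache, gateway, ledger, notifier, metrics = Mock(), Mock(), Mock(), Mock(), Mock()
    gateway.charge.return_value = True

    processor = TransactionProcessor(cache, gateway, ledger, notifier, metrics)

    assert processor.process("client-1", 1999) is True
    ledger.record.assert_called_once_with("client-1", 1999)
```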

There are tools for many programming languages that will measure these things for you. There’s also a more insidious sub-type of each of these couplings that is not so easily measurable, which I will call implicit transitive coupling. I think this is likely less common in statically typed languages, but a good example would be a dictionary of key-value pairs created by one function, passed to a second for mutation, and then passed to a third for consumption. While not directly measurable, the first function’s instantiation of the dictionary will absolutely impact the third function’s ability to consume it, and likewise, the third function’s change in requirements may ripple up to the first function’s instantiation. Let the first function’s instantiation code feed more than one direct afferent coupling, and you multiply the complexity to the point where you might create an instance of the irreducible-number-of-errors phenomenon.
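
Here is a deliberately small version of that dictionary hand-off (the domain and key names are invented): the third function quietly depends on keys that only the first function knows it is supposed to set, and no static coupling count will show it.

```python
def load_order(order_id: str) -> dict:
    # Function 1: instantiates the dictionary. Nothing declares which keys must exist.
    return {"order_id": order_id, "amount_cents": 1999, "currency": "USD"}

def apply_discount(order: dict) -> dict:
    # Function 2: mutates it in passing.
    order["amount_cents"] = int(order["amount_cents"] * 0.9)
    return order

def format_invoice(order: dict) -> str:
    # Function 3: consumes keys it implicitly assumes functions 1 and 2 preserved.
    return f"{order['order_id']}: {order['amount_cents']} {order['currency']}"

# Rename "currency" in load_order and only format_invoice breaks, two hops away.
print(format_invoice(apply_discount(load_order("A-123"))))
```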

Much the same way that our beloved NYC Subway transit engineers from before would not design a system where they need to shut down power to the entire line in order to replace a single section of third rail, software developers are responsible for creating systems that can adapt to required changes and sustain maintenance. If we use technology-agnostic and measurable terms, like degrees of coupling, to describe risk, and to measure the reduction of risk during tech debt payoff periods, the chances of business leaders seeing us as professional engineers rather than ninja hackers will rise. Just as there is no absolute degree of coupling that makes a class or a function safe, there is no guarantee that our carefully measured risk analysis will garner attention, but we should be empowering this type of thought and analysis. It will encourage understanding of otherwise unknown risks, and over time it will lead to more maintainable and adaptable systems. For those programmers turned engineers, you will be better prepared to personally accept accountability for the impact of the systems our society ends up relying on.
