The Architect Owns the Bricks

If you’re on or have been on the technology leadership path, then you might remember the first time you were asked to design and lead a project the same way I remember it. With a whole lot of good intentions, I focused on the “big picture” stuff, because leading a project makes you a “big picture” guy: measures of success, project decomposition, delivery sequence, milestones, estimates, and dependencies. Yep, along with some slick internal sales pitches for a budget and some extra padding for unknowns, the project should run smoothly as long as I do a good job on those things and keep focused on the goals. I mean, I took care of all of the boring stuff, so I can depend on my teammates to help deliver the fun parts and meet all of our goals, right?

As part of giving everyone on the team the benefit of the doubt, I think we all envision producing products that remind you of this when you look inside:

For some projects, yeah! This is the way it goes! You’ll have communicated all of the goals and intricacies of the project, and everyone will be on board. Some engineers on the team, who may have done some of that kind of work in their past, will appreciate having the bureaucratic parts of the project handled and will hone their focus on producing neatly organized bricks and walls. On the other hand, none of that planning inherently means you’re going to deliver a quality product, or adapt to changing requirements, or even build bricks that can sustain the weight of a wall. What I’ve experienced is that all of that planning often just gives you the opportunity to let your team start making bricks, and that a professional technical lead needs to be part of designing each brick and each wall.

Any given project, of any significant complexity and value, will end up having a few walls that look like this. Somewhere, two big pieces of work just didn’t quite line up, and you had to strip out some glue code and replace it to get everything working in order to ship.

🚀

If you’re doing it right, you know where this is: probably in a file named with a word like utils, helper, extensions, or, if you planned ahead, anti-corruption. And it’s paired with some backlogged tasks to refactor, clean up, add test coverage, and eliminate the need for the glue altogether. These sections of your building are never as strong or reliable, or even evaluated as much, as the well-structured parts. No one wants to work in or even look at these too much, except maybe to laugh at the “TODO: Fix this nasty hack” comments, so you know your priority is to replace them in your next iteration. You’re ready with your repointing crew and you’ve already talked to your product owner about how important this is.

Admission of guilt: I am not a mason. As part of writing this, I learned that the term for replacing the mortar in a brick wall is “repointing”.

If you’re doing it wrong, your building is littered with these sections, and it’s impossible to distinguish which sections are the throwaway glue and which sections are the pillars of your system. Maintaining any production system over the course of years is a balance between adding new wings to a building and maintaining the existing ones, but you’ve just put your team in the worst possible position: you don’t know what your maintenance schedule needs to be. Are you supposed to repoint your “utils.py” module every month? Every year? When it starts leaking? Do you even have the confidence to ship a second version of your system without exposing new cracks?

One of the traps of having success in one project is to assume the same success will come in the next project. There is no tech lead who is inherently successful all of the time, just as there is no tech lead who is inherently destined to produce brownfield after brownfield. I’m reminded of a project from some time ago, following a promotion of mine, presumably a reward for my awesome effectiveness, that highlights this. After about a month of it being in production, the reports from clients were ghastly. Bug after bug, documentation so inaccurate it could be called criminal, even a small potential for one client to access another’s data. How did my team succeed in producing such an abhorrent pile of garbage right after keeping things so clean the last few times? There were a lot of indicators, if I had been paying attention:

  • A tech stack and tool chain no one on the team was familiar with
  • A basis for the project on a prior version that had anemic test coverage (e.g., asserting that the response contains a particular field name, rather than checking the field’s value)
  • An overly prescriptive set of requirements — perhaps a “Solution Masquerading as Requirements”¹
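
To make the anemic-coverage indicator concrete, here is a minimal sketch in Python of the difference between the two kinds of assertion. The get_account function, its fields, and its deliberate bug are all hypothetical, invented purely for illustration; they are not from the project described here.

```python
def get_account(account_id: int) -> dict:
    """Hypothetical handler with a deliberate bug: the balance is always zero."""
    return {"id": account_id, "balance": 0}


def anemic_check(response: dict) -> bool:
    # Anemic: only confirms that the field name is present in the response.
    return "balance" in response


def meaningful_check(response: dict, expected_balance: int) -> bool:
    # Meaningful: confirms the field carries the value we expect.
    return response.get("balance") == expected_balance


response = get_account(42)
print(anemic_check(response))           # True: the bug slips through
print(meaningful_check(response, 100))  # False: the bug is caught
```

Both checks run against the same broken response, and only the second one notices that anything is wrong.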

But I wasn’t paying attention. As I prepared for the eventual retrospective, I found some annoying signs that I had failed the team. We translated the prescriptive requirements directly to tasks in our backlog. We did not do any training or spikes on the technologies being employed. And most telling, I did not contribute code reviews to any of the significant components of the project. We had poured time into estimation, sequencing, and delivery forecasting, and we did deliver on time! There was even ample time to have done it right, with confidence and good tests in place. The time spent fixing the bugs in the code and the documentation maybe amounted to a couple of days of engineering effort, and was dwarfed by the time required to explain to clients and organize the releases to address the problems after the fact.

So the planning was accurate. At least we got that going for us.

As I looked at all of the failures along the way, it became apparent that we had executed the project with the hubris that comes from prior success and the blind hope that it would continue. Certainly each discipline on the team had some fault and some lesson to learn. I won’t admit to laziness, but I’ll admit to being distracted, and involved in other tasks while the project was underway. At the time, I knew enough about retrospectives not to try to lay blame or chalk it up to something as thin as “human error”.

There were obviously improvements to be made around the room, but a similarly thin conclusion would be suggesting I needed to enforce some “rigor” or “discipline” on the team. Not that these are bad qualities, but they effectively transfer the blame, and don’t align with any notable philosophy or principle relating to process improvement. Having recently revisited the “Theory of Constraints”² with a few colleagues of mine, I am reminded that the only improvements that actually matter are those that improve the system as a whole, and that local optimization is meaningless in achieving any worthwhile goal. Said differently, for the developer in me: it’s meaningless to refactor some code for a marginal improvement of in-memory operations on a collection if you’re still doing N+1 database queries as you iterate through it.
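
For anyone who hasn’t met the N+1 pattern, here is a minimal sketch using Python’s built-in sqlite3 module. The authors and books tables and their contents are hypothetical, made up for illustration. The first function issues one query for the list and then one more per row; the second gets the same answer with a single JOIN.

```python
import sqlite3

# Hypothetical schema and data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO books VALUES (1, 1, 'Notes'), (2, 1, 'Engines'), (3, 2, 'Compilers');
""")


def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author (N+1)."""
    queries = 0
    result = {}
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries += 1
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_query(conn):
    """A single JOIN fetches every row; the grouping happens in memory."""
    rows = conn.execute(
        "SELECT a.name, b.title FROM authors a "
        "JOIN books b ON b.author_id = a.id ORDER BY a.id, b.id"
    ).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1


slow, slow_queries = titles_n_plus_one(conn)
fast, fast_queries = titles_single_query(conn)
print(slow == fast, slow_queries, fast_queries)  # True 3 1
```

With two authors the difference is three round trips versus one. Tuning the in-memory grouping in either function would not change that count, which is the local-optimization trap in miniature.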

We’re often so concerned with the time required to build a solution that we ignore all of the other steps of the process and their dependent events when projecting a delivery timeline. We outline, whether implicitly or explicitly, an SDLC that includes requirements gathering and grooming, test automation, and code review. We, if we’re lucky (professional?), have documented our delivery and testing process as a continuous delivery value stream. But do we account for the capacity of each of the actors at each point?

If code review is an important step, then how do we ensure that those responsible for code review are not over capacity? Humans don’t break down like machines do: a machine with a capacity of a single bolt at a time simply won’t produce more, or will cease to operate if you try to make it produce two at a time. How do we even know if a code reviewer has demands that exceed their capacity? Too many LGTMs floating through? And how do we prioritize their capacity? What is clear is that they certainly need to end up owning and living in the building that ends up being delivered, and that feedback cycle will help correct the mistakes made along the way.
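
One possible starting point for the capacity question, sketched in Python: compare each reviewer’s open review assignments against a limit the team has agreed on. The reviewer names, the assignment list, and the capacity numbers are all assumptions made up for illustration, and a simple count is only a rough proxy for real review effort.

```python
from collections import Counter

# Hypothetical data: the reviewer assigned to each currently open pull request.
open_reviews = ["dana", "dana", "lee", "dana", "sam", "dana"]

# Assumed limit: how many reviews each person can give careful attention at once.
capacity = {"dana": 2, "lee": 2, "sam": 3}


def over_capacity(assignments, capacity):
    """Return {reviewer: (load, limit)} for reviewers whose load exceeds their limit."""
    load = Counter(assignments)
    return {
        reviewer: (count, capacity.get(reviewer, 0))
        for reviewer, count in load.items()
        if count > capacity.get(reviewer, 0)
    }


print(over_capacity(open_reviews, capacity))  # {'dana': (4, 2)}
```

A check like this only surfaces the overload. Deciding what to do about it, and which reviews get priority, is still a conversation for the team.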

[1] Löwy, Juval. Righting Software: A Method of System and Project Design. (December 2019) Addison-Wesley

[2] Goldratt, Eliyahu M., Cox, Jeff. The Goal. (1984) North River Press

Tech guy with a business degree, I’ve worked in software engineering, QA automation, and product management. I live and work in NYC.
