> you lay out a huge specification that would fully work through all of the complexity in advance, then build it.
This has never happened and never will. You simply are not omniscient. Even if you're smart enough to figure everything out the requirements will change underneath you.
But I do still think there's a lot of value in coming up with a good plan before jumping in. A lot of software people like to jump in, and I see them portray the planning people as trying to figure everything out first. (I wonder if we reinforce the jump-in-head-first mentality because people figure out you can't plan everything.) A good plan helps you prevent changing specs and prepares you for hiccups. It helps to have others involved, but basically all you do is try to think of all the things that could go wrong. Write them down. Triage. If needed, escalate questions to the decision makers. Try a few small-scale tests. Then build out. But while building out you're always going to find things you didn't see. You can't plan forever, because you'll never resolve the unknown unknowns until you build, but good prep still makes for a smoother process. It's the reason engineers do the math before they build a bridge. Not because the math is a perfect representation and things won't change (despite common belief, a bridge isn't static), but because the plan is cheaper than the build, and having a plan lets you better track changes and helps you determine how far off the rails you've gone.
It is also perplexing to me that people think they can just plan everything out and hand it to an LLM. Do you really believe your manager knows everything that needs to be done when they assign jobs to you? Of course not; they couldn't. Half the job is figuring out what the actual requirements are.
>> you lay out a huge specification that would fully work through all of the complexity in advance, then build it.
> This has never happened and never will. You simply are not omniscient. Even if you're smart enough to figure everything out the requirements will change underneath you.
I am one of those "battle-scarred twenty-year+ vets" mentioned in the article, currently working on a large project for a multinational company that requires everything to be specified up-front, planned on JIRA, estimates provided and Gantt charts setup before they even sign the contract for the next milestone.
I've worked on this project for 18 months, and I can count on zero hands the times a milestone hasn't gone off the rails due to unforeseen problems, last-minute changes and incomplete specifications. It has been a growing headache for the engineers who have to deliver within these rigid structures, and it's now got to the point that management itself has noticed and is trying to convince the big bosses we need a more agile and iterative approach.
Anyone who claims upfront specs are the solution to all the complexity of software either has no real world experience, or is so far removed from actual engineering they just don't know what they're talking about.
Working on a project for 18 months doesn't give you enough insight into it to know what is good or not about it. You need several more years before you can usefully figure out which changes will help you make milestones (other than the trivially obvious things, which might be the low-hanging fruit - though sometimes those really are the better way to do things, and it's the underlying problem that makes them stand out instead).
Nothing will get you to hit every milestone. However, you can make progress if you have years of experience in that project and the company is willing to invest the needed time to make things better (they rarely are).
> A lot of software people like to jump in and I see them portray the planning people as trying to figure everything out first.
My approach, especially for a project with a lot of unknowns, is usually to jump in right away and try to build a prototype. Then iterate a few times. If it's a small enough thing, a few iterations is enough to have a good result.
If it's something bigger, this is the point where it's worth doing some planning, as many of the problems have already been surfaced, and the problem is much better understood.
One issue I've seen with this approach is that management will want to sell the prototype, bypassing the "rewrite from the lessons learned" step, and then every shortcut taken in the prototype will bite you, a lot.
And things like race conditions or lack of scalability due to improper threading architecture aren't especially easy to fix(!).
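To make the point about race conditions concrete, here is a minimal Python sketch (mine, not the commenter's) of the sort of bug a rushed prototype bakes in: an unsynchronized read-modify-write that can lose updates under threads, and the lock a later fix has to retrofit everywhere the shared state is touched.

```python
import threading

# Prototype-style shared counter: no locking; it "works" in light testing.
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        # Read-modify-write is not atomic: two threads can read the same
        # value and both write value + 1, losing an increment.
        current = self.value
        current += 1
        self.value = current

def hammer(counter, n):
    for _ in range(n):
        counter.increment()

if __name__ == "__main__":
    counter = Counter()
    threads = [threading.Thread(target=hammer, args=(counter, 100_000)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Expected 400000; depending on interpreter and scheduling, updates
    # can be lost and the total comes out lower.
    print("unsynchronized total:", counter.value)

# The narrow fix is a lock, but in a grown prototype the shared state is
# rarely this neatly contained in one class.
class SafeCounter(Counter):
    def __init__(self):
        super().__init__()
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            super().increment()
```

The fix is trivial here only because the whole design fits in thirty lines; once the prototype has grown, retrofitting the locking discipline is exactly the expensive part.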
The Anna Karenina principle looms large in software engineering projects. There is effectively an infinite number of failure modes that can occur due to small actions or wrong thinking by one or more influential people, but only one way to make large projects successful: the team has to have sufficient expertise to cover the surface area, and those individuals need enough trust from leadership to navigate the million known and unknown pitfalls that await.
Sometimes you don't know what needs to be built until you build it. These end-to-end prototypes are how to enhance your understanding and develop deeper intuition about possibilities, where risks lie, etc.
Exactly. On a yuuge project, I first identify the Risks. Then, evaluate the risks -- can <ZZ> actually be done? In XX time? For YY dollars? with acceptable bugs on 1st version?
It's sort of the old General Eisenhower quote: "In preparing for battle I have always found that plans are useless, but planning is indispensable."
I’m the same. Often the first step is a time-boxed exploration, just trying to make the key pieces work in any way to encounter major blockers as early as possible. No planning, no design, not following any best practices, often all in a single file. Then from there, either refactor/rewrite or just use it as input for planning.
Of course, it requires some discipline to not just yolo the prototype into production when that’s not appropriate.
I'd like you to go look at PRINCE2 and SSADM. Or read the original Royce paper - https://www.praxisframework.org/files/royce1970.pdf - which describes the antipattern we now call "Waterfall." (Note that Royce marks it as an antipattern.)
We are nearly 70 years into this discussion at this point. I'm sure Grace Hopper and John Mauchly were having discussions about this around UNIVAC programs.
The book "How Big Things Get Done" by Bent Flyvbjerg nicely answers all the concerns mentioned in this thread. I'll answer here to avoid littering replies everywhere.
> But I do still think there's a lot of value into coming up with a good plan before jumping in.
Definitely, with emphasis on a _good_ plan. Most "plans" are bad and don't deserve that name.
> be specified up-front, planned on JIRA
Making a plan up-front is a good approach. A specification should be part of that plan. One should be ready to adapt it when needed during execution, but one should also strive to make the spec good enough to avoid changing.
HOWEVER, the "up-front specification" you mentioned was likely written _before_ making a plan, which is a bad approach. It was probably written as part of something that was called "planning" and has nothing to do with actual planning. In that case, the spec is pure fiction.
> estimates provided
Unless this project is exceptional, the estimates are probably fiction too.
> and Gantt charts setup
Gantt charts are a model, not a plan. Modeling is good; it gives you insight into the project. But a model should not be confused with a plan. It is just one tiny fragment you need to build a plan, and Gantt charts are just one of many many many types of models needed to build a plan.
> before they even sign the contract for the next milestone
That's a good thing. Signing a contract is an irreversible decision. The only contract that should be signed before planning is done is the contract that employs the planners.
> Anyone who claims upfront specs are the solution
See above. A rigid upfront spec is usually not a plan, but pure fiction.
> My approach, especially for a project with a lot of unknowns, is usually to jump in right away and try to build a prototype.
Whether this is called planning or "jumping in" is a difference in terminology, not in the approach. The relevant clue is that you are experimenting with the problem to understand it, but you are NOT making irreversible decisions. By the terminology used in that book, you are _planning_, not _executing_.
> after the 2000 pages specification document was written, and passed down from the architects to the devs
If the 2,000-page spec was never passed to the devs while it was being written, it's not part of a plan; it's pure fiction. Trying to develop software against that spec is part of planning.
Yes it did, however it never works in practice when it comes to integration testing two years later, after the 2,000-page specification document was written and passed down from the architects to the devs.
2000 page specification documents are rarely useful (if ever?).
You need smaller documents: this is the core technology we are using; this is how one subsystem is designed (often this should live on a whiteboard, because once you get into the implementation details you need to change the plan, but the planning was still useful); this is how to use the core parts of the system so newcomers can start working quickly.
You need the discipline to accept that sometimes libfoo is the best way to solve a problem in isolation, but since libbar is used elsewhere and can also solve it, your local problem will use libbar even though it makes your local solution uglier. Having a small set of core technologies that everyone knows and uses is sometimes more valuable than using the best tool for the job - but only sometimes.
> This has never happened and never will. You simply are not omniscient. Even if you're smart enough to figure everything out the requirements will change underneath you.
My best project to date was a largely waterfall one - there were somewhere around 50-60 pages of A4 specs, a lot of which I helped the clients engineer. As with all plans, a lot of it changed during implementation; in fact, I figured out a way of implementing the same functionality but automating it to a degree where about 15 of those pages could be cut out.
Furthermore, it was immensely useful because by the time I actually started writing code, most of the questions that needed answers and would alter how it should be developed had already come up and could be resolved, in addition to me already knowing about some edge cases (at least when it came to how the domain translates into technology) and how the overall thing should work and look.
Contrast that to some cases where you're just asked to join a project and help out, and you jump into the middle of ongoing development, not knowing that much about any given system or the various things that the team has been focusing on in the past few weeks or months.
> It’s not hard to see that if they had a few really big systems, then a great number of their problems would disappear. The inconsistencies between data, security, operations, quality, and access were huge across all of those disconnected projects. Some systems were up-to-date, some were ancient. Some worked well, some were barely functional. With way fewer systems, a lot of these self-inflicted problems would just go away.
> Barbara has multiple "rings", or namespaces, but the default ring is more or less a single, global, object database for the entire bank. From the default ring you can pull out trade data, instrument data (as above), market data and so on. A huge fraction, the majority, of data used day-to-day comes out of Barbara.
> Applications also commonly store their internal state in Barbara - writing dataclasses straight in and out with only very simple locking and transactions (if any). There is no filesystem available to Minerva scripts and the little bits of data that scripts pick up has to be put into Barbara.
I know that we might normally think that fewer systems might mean something along the lines of fewer microservices and more monoliths, but it was so very interesting to read about a case of it being taken to the max - "Oh yeah, this system is our distributed database, file storage, source code manager, CI/CD environment, as well as web server. Oh, and there's also a proprietary IDE."
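Purely as illustration of the quoted "writing dataclasses straight in and out with only very simple locking" style, here is a rough Python sketch; every name in it (ObjectRing, put, get) is hypothetical and not the real system described in the article.

```python
from dataclasses import dataclass
import pickle
import threading

@dataclass
class TradeNote:
    trade_id: str
    notional: float
    comment: str

class ObjectRing:
    """Toy stand-in for a global object database: one flat namespace,
    pickled values, a single coarse lock, no real transactions."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def put(self, key, obj):
        with self._lock:
            self._data[key] = pickle.dumps(obj)

    def get(self, key):
        with self._lock:
            return pickle.loads(self._data[key])

# Application code reaches straight for the global store; in the model the
# article describes there is no filesystem, so even small bits of state end up here.
ring = ObjectRing()
ring.put("/notes/T123", TradeNote("T123", 1_000_000.0, "roll next month"))
print(ring.get("/notes/T123"))
```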
But no matter the project or system, I think being able to fit all of it in your head (at least on a conceptual level) is immensely helpful, the same way how having a more complete plan ahead of time can be helpful with a wide variety of assumptions vs "we'll decide in the next sprint".
Indeed. And writing out a design is actually a good method for thinking through the design. It helps uncover assumptions, including those that are flawed. It allows you to weigh various design options explicitly. It provides a place for identifying and resolving ambiguity and lack of clarity in the requirements. Contracts can be distilled in the process. Such design docs can also focus and direct implementation; you have a clearer picture of the parts and contours of your system. In a way, it is like programming, but at a conceptually higher, architectural level, where you work through and chew on the thing to flesh out and validate it in the very act of specifying.
And by doing this sort of exercise, you can avoid wasting time on dead ends, bad design, and directionless implementation. It's okay if requirements change or you discover something later on that requires rethinking. The point is to make your thinking more robust. You can always amend a design document and fill in relevant details later.
Furthermore, a mature design begins with the assumption that requirements (whether actual requirements or knowledge of them) may change. That will inform a design where you don't paint yourself into a corner, that is flexible enough to be adapted (naturally, if requirements change too dramatically, then we're not really talking about adaptation of a product, but a whole new product).
How much upfront design work you should do will depend on the project, of course. So there's a middle way between the caricature of waterfall and the caricature of agile.
“A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a working simple system.”
Gall’s Law
Some systems require a total commitment to the complexity because it is intrinsic. There is no "simple" form that also works, even if poorly. In many contexts, "systems thinking" is explicitly about the design of systems that are not reducible to simpler subsystems, which does come up in many types of engineering. Sometimes you have to eat the whole elephant.
There is a related phenomenon in some types of software where the cost of building an operational prototype asymptotically converges on the cost of just writing the production code. (This is always a fun one to explain to management that think building a prototype massively reduces delivery risk.)
This is the point we are at now with wide-scale societal technologies; combining the need for network effects with the product being the prototype, and no option but to work on the system live.
Some projects have been forced so far, by diverting resources (either public-funded or not-yet-profitable VC money), but these efforts have not proven to be self-sustaining. Humans will be perpetually stuck where we are as a species if we cannot integrate the currently opposing ideas of up-front planning vs. move fast and break things.
Society is slowly realizing the step-change in difficulty between projects in controlled conditions that admit simplified models and these irreducibly complex systems. Western doctors are facing an interesting parallel, becoming more aware that human beings need to be treated the same way--that we emerge from parts which can be simplified and understood individually, but which could never describe the overall system behavior. We are good examples of the intrinsic fault-tolerance required for such systems to remain stable.
I think the important part here is "from scratch". Typically when you're designing a new (second, third, whatever) system to replace the old one you actually take the good and the bad parts of the previous design into account, so it's no longer from scratch. That's what allows it to succeed (at least in my experience it usually did).
These days software has been done a lot. You should be able to find others who have done similar things and learn lessons from them. Considering microservices - there are lots of people who have done them and can tell you what worked well and what didn't. Considering using QT - lots of others have and can give you ideas. Considering writing your own framework - there are lots of others: look at what they do good and bad.
If you are doing a CRUD web app for a local small business - there are thousands of examples. If you are writing control software for a space station - you may not have access to code from NASA/Russia/China but you can at least look at generic software that does the things you need and learn some lessons.
This is often quoted, but I wonder whether it's actually strictly true, at least if you keep to a reasonable definition of "works". It's certainly not true in mechanical engineering.
The definition of a complex system is the qualifier for the quote. Many systems that are designed, implemented and found working are not complex systems. They may be complicated systems. To paraphrase Dr. Richard I. Cook's "How Complex Systems Fail": complex systems are inherently hazardous, operate near the edge of failure, and cannot be understood by analyzing individual components. These systems are not just complicated (like a machine with fixed parts) but dynamic, constantly evolving, and prone to multiple, coincidental failures.
A system of services that interact, where many of them are depending on each other in informal ways may be a complex system. Especially if humans are also involved.
Such a system is not something you design. You just happen to find yourself in it. Like the road to hell, the road to a complex system is paved with good intentions.
Then what precisely is the definition of complex? If "complex" just means "not designed", then the original quote that complex systems can't be designed is true but circular.
If the definition of "complex" is instead something more like "a system of services that interact", "prone to multiple, coincidental failures", then I don't think it's impossible to design them. It's just very hard. Manufacturing lines would be examples, they are certainly designed.
The manufacturing lines would be designed, and they'd be designed in an attempt to affect the "design" of the ultimate resulting supply chain they're a part of. But the relationship between the design of some lines and the behavior of the larger supply chain is non-linear, hard to predict, and ultimately undesigned, and therefore complex.
The design of the manufacturing lines and the resulting supply chain are not independent of each other -- you can trace features from one to the other -- but you cannot take apart the supply chain and analyze the designs of its constituent manufacturing lines and actually predict the behavior of the larger system.
AFAIK there's not a great definition of a complex system, just a set of traits that tend to indicate you're looking at one. Non-linearity, feedbacks, lack of predictability, resistance to analysis (the "you can't take it apart to reason about the whole" characteristic mentioned above). All of these traits are also kind of the same thing... they tend to come bundled with one another.
Consider systems that require continuous active stabilization to not fail because the system has no naturally stable equilibrium state even in theory. Some of our most sophisticated engineering systems have this property e.g. the flight control systems that allow a B-2 bomber to fly. In a software context you see these kinds of design problems in large-scale data infrastructure systems.
The set of system designs that exhibit naturally stable behavior doesn't overlap much with the set of system designs that deliver maximum performance and efficiency. The capability gap between the two can be large but most people choose easy/simple.
There is an enormous amount of low-hanging opportunity here but most people, including engineers, struggle with systems thinking.
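As a toy illustration of a system with no naturally stable equilibrium (my example, not the commenter's): a first-order plant x' = a·x with a > 0 diverges on its own, and stays bounded only while a feedback controller keeps actively pushing back.

```python
# Toy simulation: an open-loop-unstable plant needs continuous active control.
A = 0.5    # plant dynamics x' = A*x; A > 0 means it diverges if left alone
K = 2.0    # proportional feedback gain; K > A is needed to stabilize
DT = 0.01  # Euler integration step (seconds)

def simulate(steps: int, controlled: bool, x0: float = 1.0) -> float:
    x = x0
    for _ in range(steps):
        u = -K * x if controlled else 0.0  # feedback law u = -K*x, or nothing
        x += DT * (A * x + u)              # Euler step of x' = A*x + u
    return x

print("open loop after 10s:  ", simulate(1000, controlled=False))  # grows without bound
print("closed loop after 10s:", simulate(1000, controlled=True))   # decays toward 0
```

Stop running the controller and the system leaves its operating point on its own; that is the property the B-2 example points at.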
IMHO, the key is where you add complexity. In software you have different abstraction layers. If you make a layer too fat, it becomes unwieldy. A simple system evolves well if you're adding the complexity in the right layer, avoiding making a layer responsible for tasks outside its scope. It still "works" if you don't, but it becomes increasingly difficult to maintain.
The law is maybe a little too simplistic in its formulation, but it's fundamentally true.
You built this gear using the knowledge from your last gear. You didn't start with no knowledge, read a manual on operating a lathe, grab a hunk of metal and make a perfect gear the first time.
That's what happened though? First humans built sheds, then we built 2-story buildings, then taller and taller, until we built skyscrapers. Obviously it wasn't a single structure, but we did have to evolve our thinking on how to build things, we didn't just start building a skyscraper before we built a shed.
You can't do that. A small bike shed often just means putting some concrete blocks on the ground and building on top of them with wood. A proper house needs a stronger foundation at higher cost (sheds larger than a bike shed are built the same way), but is still made of wood. A skyscraper is built on a very different foundation and needs a steel frame that would not be affordable in a house. In between the two there are also buildings made of brick, which allows building taller than wood. (And there are lots of other options with different costs - engineered wood is different again.)
The point is, though, that eventually a system runs out of headroom. It works differently in programming than in physical construction, but the concept is the same: eventually you can't make a bad early design work anymore.
But you didn't upgrade the shed into a skyscraper. The iterative process you describe involves a human respecifying from scratch, using the knowledge developed building the previous instance and seeing its limitations first hand. That part can't be automated; no LLM is going to challenge your design assumptions by itself. Hence people pushing agent-built projects way past what their inherent architecture should support, delivering unmaintainable code spaghetti.
To put it another way than the other replies: you will get 100x more pushback on an arguably necessary ground-up rewrite than on "just add this new feature to the existing codebase", even when you (as an engineer) know full well why "just adding a feature" is probably a bad idea.
That's exactly why software is so bad. No one ever knows their shed would ultimately have to become a skyscraper, and management doesn't allocate any budget to lay stronger foundations when expectations change; you make do with what you have.
See also: "there is nothing more permanent than a temporary solution"
It's actually the opposite - you actually can. The feeling I get reading anti-AI sentiment is that people expect one-shot results out of limited context.
I'm pretty sure that you can't gradually upgrade a shed into a skyscraper unless you pour a skyscraper-ready foundation before even starting on the shed. But if you're doing that, why start with a shed and not with a skyscraper?
Not sure why you're trying to bring AI development into this.
You can: start by clearing and grading the site and get a shed up over your head. Then you can start the skyscraper next to it and work out of the shed.
But this is about the first systems? I tend to tell people, the fourth try usually sticks.
The first is too ambitious and ends in an unmaintainable pile around a good core idea.
The second tries to "get everything right" and suffers second system syndrome.
The third gets it right but now for a bunch of central business needs. You learned after all. It is good exactly because it does not try to get _everything_ right like the second did.
The fourth patches up some more features to scoop up B and C prios and calls it a day.
Sometimes, often in BigCorp:
Creators move on and it slowly deteriorates under maintenance...
> There should be some balanced path in the middle somewhere, but I haven’t stumbled across a formal version of it after all these decades.
That's very simple. The balanced path depends directly on how much of the requirements and assumptions are going to change during the lifetime of the thing you are building.
Engineering is helpful only to the extent you can foresee future changes. Anything beyond that requires evolution.
You are able to comment on the complexity of that large company only because you are standing 50 years in the future from when those things started to take shape. If you were designing it 50 years back, you would end up with the same complexity.
Nature's answer to this is: consolidate and compact. Everything that falls onto earth gets compacted into solid rock over time by a huge pressure of weight. All complexity and features are flattened out. Companies undergo similar dynamics, driven by pressures over time, not by big-bang engineering design upfront.
1. This sounds great in theory. In theory there is no difference between theory and practice, but in practice there is.
2. I would be more receptive to this argument if they had listed some famous examples of successful, large systems that were built like this. On the other hand, I can easily list many failures: FAA Advanced Automation System (1980s), IRS Tax Systems Modernization (1990s), UK NHS National Programme for IT (2000s).
3. Waterfall vs. agile is a continuum. Nobody plans everything, down to each if-statement, and nobody wings it without some kind of planned architecture (even if just inside one person's head). Where you are on the continuum depends on the nature of the problem (are all requirements known?), the nature of the team (have they done this before?), and the criteria for success (are there lives depending on this?).
4. The analogy to building a building is flawed. At large enough scale, software is like a city, and all successful cities have gradually evolved in complexity. Come back to me when someone builds a 1-million person arcology on some island in the Pacific.
5. Just as some PhDs are sensitive about being called "Doctor", some software engineers are sensitive about being "real engineers". Stop thinking about that. What we do as software engineers is immensely valuable and literally changing the world (usually, but not always, for the better). Let's stop worrying about whether or not what we do is "engineering" and focus on what we do best: building complex systems that have never before existed on earth.
Lots of wisdom in this post about some of the realities of software development.
The core point they're trying to make is that agile (or similar) practices are the incorrect way to approach consolidation of smaller systems into bigger ones when the overall system already works and is very large.
I agree with their assertion that being forced to address difficult problems earlier in the process results in ultimately better outcomes, but I think it ignores the reality that properly planning a re-write of a monumentally sized and already-in-use system is practically impossible.
It takes a long time (years?) to understand and plan all the essential details, but in the interim the systems you're wanting to rewrite are evolving and some parts of the plan you thought you had completed are no longer correct. In essence, the goal posts keep shifting.
In this light, the strangler fig pattern is probably the pragmatic approach for many of these re-writes. It's impossible to understand everything up front, so understand what you reasonably can for now, act on that, deliver something that works and adds value, then rinse and repeat. The problem is that for a sufficiently large system this will take decades, and few software architects stick around at a single company long enough to see it through.
A final remark I want to make is that, after only a few years of being a full-time software developer, "writing code" is one of the easiest parts of the job. The hard part is knowing what code needs to be written, this requires skills in effective communication with various people, including other software developers and (probably more importantly) non-technical people who understand how the business processes actually need to work. If you want to be a great software developer, learn how to be good at this.
> There are two main schools of thought in software development about how to build really big, complicated stuff.
> The most prevalent one, these days, is that you gradually evolve the complexity over time. You start small and keep adding to it.
> The other school is that you lay out a huge specification that would fully work through all of the complexity in advance, then build it.
I think AI will drive an interesting shift in how people build software. We'll see a move toward creating and iterating on specifications rather than implementations themselves.
In a sense, a specification is the most compact definition of your software possible. The knowledge density per "line" is much higher than in any programming language. This makes specifications easier to read, reason about, and iterate on—whether with AI or with peers.
I can imagine open source projects that will revolve entirely around specifications, not implementations. These specs could be discussed, with people contributing thoughts instead of pull requests. The more articulated the idea, the higher its chance of being "merged" into the working specification. For maintainers, reviewing "idea merge requests" and discussing them with AI assistants before updating the spec would be easier than reviewing code.
Specifications could be versioned just like software implementations, with running versions and stable releases. They could include addendums listing platform-specific caveats or library recommendations. With a good spec, developers could build their own tools in any language. One would be able to get a new version of the spec, diff it with the current one, and ask AI to implement the difference, or discuss what is needed for you personally and what is not. Similarly, it would be easier to "patch" the specification with your own requirements than to modify ready-made software.
Iceberg is, primarily, a spec [0]. It defines exactly what data is stored and how it is interacted with. The community debates broadly on spec changes first, see a recent one on cross-platform SQL UDFs [1].
We have yet to see a largely LLM-driven language implementation, but it is surely possible. I imagine it would be easier to tell the LLM to instead translate the Java implementation to whatever language you need. A vibe-coded language could do major damage to a company's data.
If I had a spec for something non-trivial, I probably would ask AI to create a test suite first. Or port tests from an existing system since each test is typically orders of magnitude easier to rewrite in any language, and then run AI in a loop until the tests pass.
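A minimal sketch of that "port the tests, then loop until they pass" idea, assuming a standard pytest layout; propose_patch is a placeholder for whatever model or agent call you actually use, not a real API.

```python
import subprocess

def run_tests() -> tuple[bool, str]:
    """Run the ported test suite and capture the output to feed back to the model."""
    result = subprocess.run(
        ["python", "-m", "pytest", "-q"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr

def propose_patch(failure_log: str) -> None:
    """Hypothetical: hand the failing output to your model of choice and
    apply whatever edits it proposes to the working tree."""
    raise NotImplementedError("wire up your model/agent here")

def loop_until_green(max_rounds: int = 10) -> bool:
    for _ in range(max_rounds):
        ok, log = run_tests()
        if ok:
            return True
        propose_patch(log)
    return False
```

The tests are the part worth keeping either way: they encode the spec in a form that is cheap to re-target at a new language.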
> I can imagine open source projects that will revolve entirely around specifications
This is a really good observation and I predict you will be correct.
There is a consequence of this for SaaS. You can imagine an example SaaS that one might need to vibecode to save money. The reason it's not possible now is not because Claude can't do it, it's because getting the right specs (like you suggested) is hard work. A well-written spec will not only contain the best practices for that domain of software but also all the legal compliance BS that comes along with it.
With a proper specification that is also modular, I imagine we will be able to see more vibecoded SaaS.
Interested in ideas for this. I've mulled over different compact DSLs for specs, but unstructured (beyond file-specific ownership boundaries) has served me better.
There are parallels of thought here to template and macro libraries.
One issue is that a spec without a working reference implementation is essentially the same as a pull request that's never been successfully compiled. Generalization is good but you can't get away from actually doing the thing at the end of the day.
I've run into this issue with C++ templates before. Throw a type at a template that it hasn't previously been tested with and it can fall apart in new and exciting ways.
> The WHATWG was based on several core principles, (..) and that specifications need to be detailed enough that implementations can achieve complete interoperability without reverse-engineering each other.
But in my experience you need more than a spec, because an implementation is not just something that implements a spec, it is also the result of making many architectural choices in how the spec is implemented.
Also, even with detailed specs, AI still needs additional guidance. For example, a couple of weeks ago Cursor unleashed thousands of agents with access to web standards and the shared WPT test suite: the result was total nonsense.
So the future might rather be like a Russian doll of specs: start with a high-level system description, and then support it with finer-grained specs of parts of the system. This could go down all the way to the code itself: existing architectural patterns provide a spec for how to code a feature that is just a variation of such a pattern. Then whenever your system needs to do something new, you have to provide the code patterns for it. The AI is then relegated to its strength: applying existing patterns.
TLA+ has a concept of refinement, which is kind of what I described above as Russian dolls but only applied to TLA+ specs.
Here is a quote that describes the idea:
There is no fundamental distinction between specifications and implementations. We simply have specifications, some of which implement other specifications. A Java program can be viewed as a specification of a JVM (Java Virtual Machine) program, which can be viewed as a specification of an assembly language program, which can be viewed as a specification of an execution of the computer's machine instructions, which can be viewed as a specification of an execution of its register-transfer level design, and so on.
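Not TLA+, but a loose Python analogy of that idea (assumptions and names mine): a coarse "spec" expressed as a reference model, and a lower-level implementation that refines it, checked by comparing observable traces.

```python
# High-level "spec": a reference model of the only behavior we care about.
def spec_counter(ops):
    value, trace = 0, []
    for op in ops:
        assert op == "inc"
        value += 1
        trace.append(value)
    return trace

# Lower-level "implementation": keeps extra internal state (a pending buffer),
# but must produce the same observable trace, i.e. refine the spec.
class BatchedCounter:
    def __init__(self):
        self._pending = 0
        self._value = 0

    def inc(self) -> int:
        self._pending += 1
        self._flush()  # flush eagerly so each observation matches the spec
        return self._value

    def _flush(self):
        self._value += self._pending
        self._pending = 0

ops = ["inc"] * 5
impl = BatchedCounter()
impl_trace = [impl.inc() for _ in ops]
assert impl_trace == spec_counter(ops)  # this implementation refines the spec
print("refinement holds for this trace:", impl_trace)
```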
I disagree with most of this article, but this part stood out:
> the size of the iterations matters, a whole lot. If they are tiny, it is because you are blindly stumbling forward. If you are not blindly stumbling forward, they should be longer, as it is more effective.
You are not blindly stumbling forward, you're moving from (working software + tiny change) to (working software including change). And repeat. If there's a problem, you learn about it immediately. To me that's the opposite of moving blindly.
> you really should stop and take stock after each iteration.
Who is not taking stock after every iteration? This is one of the fundamental principles of agile/lean/devops/XP/scrum. This one sentence drastically lowers my impression of the author's ability to comment on the subject.
> The faster people code, the more cleanup that is required. The longer you avoid cleaning it up, the worse it gets, on basically an exponential scale.
Unsafe tempo is as likely to happen in big-spec design projects as in small iterations. In fact, working in careful small iterations helps us manage a realistic tempo because we know we can't move faster than we can get things into production and evaluate.
The terrible outcomes listed in the same paragraph are linked to unwise practice and have nothing to do with small iteration size.
Exactly - and this is what good agile attempts to address: implement (importantly) user-facing functionality in (also important) exactly the smallest size possible. With those two things you can build, and identify issues / questions / problems, in the fastest feedback loop possible.
Indeed, I would argue 'big iterations' are the ones where all the problems the author mentions crop up in the first place!
100%. I came here to find something new from a field that I don't know but imagine has some good lessons for software. Instead I found someone commenting on small iterations vs big design which is quite ho-hum by comparison.
Warning, this post is not covering the ‘systems thinking’ that most of you will be expected to know in staff level jobs, it is using the term for up front design.
In a more typical modern sense systems thinking is more about relationships and wholes, rather than isolating parts, which is the traditional engineering approach.
While much of the base material on systems thinking is based around cybernetics, it is really a complement to traditional engineering, used in parallel to identify more natural complexity boundaries and to help avoid confusion and accidental complexity.
Gregor Hohpe’s Architect Elevator is probably a good place to start on why this change in perspective is important and why investing in flexibility is crucial when there is uncertainty.
While you may have to accept this article’s definition in some groups, accepting the more modern definition will help you get jobs in places that are nicer to work.
This type of false dichotomy that is presented in the article is a warning that there is soft work to be done.
People mentioning mechanical engineering in this thread are possibly the people who may benefit most from examining the material. I encourage you to see if this is a path forward for your needs.
Waterfall specifications have never worked as advertised.
The 1980s and 90s were full of DOD-497 multi-kilogram documents being analyzed atomically to determine the specification, and they rarely came in close on any of the 3 main dimensions of success: time, quality, or cost.
On the other hand, neither has Agile with a capital A, with the ceremony of documents replaced with the ceremony of JIRA tickets and t-shirts.
I've only ever built something that worked by first building a couple of things that didn't. No amount of theory or specification can replace what you learn by actually building and interacting with a system. Accept that there will be a version 2.
>> If you ignore a dependency and try to fix it later, it will be more expensive. More time, more effort, more thinking. And it will require the same level of coordination that you tried to avoid initially.
Would add that, if you only address fixing these dependencies one by one, as they manifest, i.e. continue in the evolutionary way, you risk resolving those parts of your Big System into some local minima; over time, you go from lots of little presumed-independent bubbles, to an intermediate stage with fewer but larger medium sized bubbles. When those get into conflict, the pain will be correspondingly greater.
>In a sense, it is the difference between the way an entrepreneur might approach doing a startup versus how we build modern skyscrapers. Evolution versus Engineering.
There are core differences in software engineering compared to construction work:
- making changes is often cheaper
- we might not know beforehand everything that is needed to be built, especially unknown unknowns
I would still agree that the truth is somewhere in between, but I would argue that, for software, it's closer to the evolutionary approach.
"It’s not that you could cut the combined complexity in half, but more likely that you could bring it down to at least one-tenth of what it is today, if not even better. It would function better, be more reliable, and would be far more resilient to change. It would likely cost far less and require fewer employees as well. All sorts of ugly problems that they have now would just not exist."
Incidentally this highlights a problem when using chatbots to build large software projects that are intended to be used for a long period of time.
The key is not how much code you can add but how little you can get away with.
Chatbots' only solution ever is to ADD code. They're not good at NOT writing code, or even deleting it, because after all the training set for the lines of code that do not exist is an empty set. Therefore it's impossible to train a robot to not write code.
What's better than generating 10kloc really fast? Not having it in the first place.
> you lay out a huge specification that would fully work through all of the complexity in advance, then build it.
I have tried this a couple of times, even for small projects (a few sprints), and it never worked out. I'd argue it never works out for non-systems-programming projects, has only a theoretical non-zero possibility of working out for systems programming projects, and perhaps a 5-10% chance for very critical, no-patch-possible projects (like a moon landing).
Because requirements always change. Humans always change. That's it. No need to elaborate.
Nah I’m good. I’ve watched system architecture framework views be developed. Years of prep and planning. System is released and half the employees that had requirements no longer work there and the business already pivoted to a new industry focus.
There’s a reason we went this way in software development a quarter century ago.
Everything that is touching hardware, for example. Bluetooth stack, HDMI, you name it.
Everything W3C does. Go is evolving through specs first. Probably every other programming language these days.
People already do that for humankind-scale projects where there have to be multiple implementations that can talk to each other. Iteration is inevitable for anything that gains traction, but it still can be iteration on specs first rather than on code.
Prototypes and specs go hand in hand. Write a spec - prove you can implement it. Write an implementation - write a spec so we can talk about what is important (versus the details of how you implemented it, which someone else is allowed to implement differently). Often the parts that are "obvious" are not implemented, or only the trivial version is implemented. You need to do both.
> Also, the other side of it is that evolutionary projects are just more fun. I’ve preferred them. You’re not loaded down with all those messy dependencies. Way fewer meetings, so you can just get into the work and see how it goes. Endlessly arguing about fiddly details in a giant spec is draining, made worse if the experience around you is weak.
IMO the problem isn't discussing the spec per se. It's that the spec doesn't talk back the way actual working code does. On a "big upfront design" project, there is a high chance you're spending a lot of time on moot issues and irrelevant features.
Making a good spec is much harder than making working software, because the spec may not be right AND the spec may not describe the right thing.
I suppose it's primarily a matter of experience. And as the article alludes, it's very important to deeply understand the subject matter. I highly value some of my non-programmer colleagues responsible for documentation, but I can't put my finger on what exactly they brought to the table that made their prose exceptionally good (clear, concise, spot on)...
Something I see pop up in large orgs and software solutions is as follows:
- you create a large number of working small apps.
- you create a spec from these apps.
- you create one huge app.
- you make a DSL to make it extensible.
- you extend the DSL to fit what you need in the future.
- you optimize the DSL, removing obvious N+1 stuff.
The hard part is throwing away the code at each step. Both management and devs can't stomach the reality that the code is useless at each stage prior to the DSL. They can't molt and discard the shell, and hence the project dies.
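For the "make a DSL to make it extensible" step, a deliberately tiny internal-DSL sketch in Python (the shape and names are mine, not the commenter's): the behavior distilled from the small apps becomes declarative steps interpreted by one engine, and new needs are added as new step types rather than new apps.

```python
# A tiny internal DSL: pipelines are plain data, steps are pluggable handlers.
HANDLERS = {}

def step(name):
    """Register a handler so new step types can be added without touching
    the interpreter below."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register

@step("load")
def load(state, source):
    state["rows"] = list(source)
    return state

@step("filter")
def keep(state, predicate):
    state["rows"] = [r for r in state["rows"] if predicate(r)]
    return state

@step("count")
def count(state):
    state["result"] = len(state["rows"])
    return state

def run(pipeline):
    """Interpret a pipeline described as data."""
    state = {}
    for name, *args in pipeline:
        state = HANDLERS[name](state, *args)
    return state

# What used to be one of the small apps, now expressed in the DSL.
program = [
    ("load", range(100)),
    ("filter", lambda n: n % 7 == 0),
    ("count",),
]
print(run(program)["result"])  # 15 multiples of 7 below 100
```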
Big upfront designs are obviously based on big upfront knowledge which nobody has.
When they turn out to be based on false assumptions of simplicity the fallout is that the whole thing can't go forward because of one of the details.
Evolutionary systems at least always work to some degree even if you can look after the fact and decide that there's a lot of redundancy. Ideally you would then refactor the most troublesome pieces.
Big upfront design always tries to design too many things that should be implementation details. Meanwhile the things that are really important are often ignored - because you don't even realize they are important at the time.
The Evolution method outlined also seems born from the Continuous Delivery paradigm that was required for subscription business models. I would argue Engineering is the superior approach, as the Lean/Agile methods of production were born from physical engineering projects whose end result was complete. Evolution seems even more chaotic because an improper 'devops' paradigm was imposed rather than emerging organically, as one would expect with an evolving method.
AI assistance would seem to favor the engineering approach, as the friction of teams and personalities is reduced in favor of quick feasibility testing and complete planning.
I think that a comparison with Engineering is not that helpful for software.
Software has zero construction cost, but what it does have is extremely complicated behavior.
Take a bridge for example: the use case is being able to walk or drive or ride a train across it. It essentially provides a surface to travel on. The complications of providing this depend on the terrain, length, etc. and are not to be dismissed, but there's relatively little doubt about what a bridge is expected to do. We don't iterate bridge design because we don't need to know much from the users of the bridge (does it fulfill their needs, is it "easy to use", etc.) AND because construction of a bridge is extremely expensive, so iteration is also incredibly costly. We do, however, not build all bridges the same, and people develop styles over time which they repeat for successive bridges; we iterate that way.
In essence, cycling is about discovering more accurately what is wanted because it is so often the case that we don't know precisely at the start. It allows one to be far more efficient because one changes the requirements as one learns.
> There are two main schools of thought in software development about how to build really big, complicated stuff.
That feels like a straw man to me. This is not a binary question. For each small design decision you have a choice about how much uncertainty you accept.
There are no "two schools". There is at least a spectrum between two extremes and no real project was ever at either of the very ends of it. Actually, I don't think spectrum is a proper word even because this is not just a single dimension. For example, speed and risk often correlate but they are also somewhat independent and sometimes they anti-correlate.
Grady Booch said that any large system that works is invariably found to have evolved from a smaller system that worked. I've seen this cited as Gall's Law, from John Gall's 2012 book Systemantics, but I read it in a book by Booch back in the late 80's/early 90's. At that time the "waterfall model" was the conventional wisdom: to the extent possible, gather all the requirements, then do all the design, then do all the coding, then do all the testing, doing the minimum of rework at each step.
It didn't work, even for the "large" systems of that time: and Booch had worked on more than a few. The kind of "system" the OP is describing is vastly larger, and vastly more complex. Even if you could successfully apply the waterfall model to a system built over two or three years, you certainly can't for a system of systems built over 50 years: the needs of the enterprise are evolving, the software environment is evolving, the hardware platform is evolving.
What you can do, if you're willing to pay for it, is ruthlessly attack technical debt across your system of systems as a disciplined, on-going activity. Good luck with that.
The paradox in the post is resolved by limiting planning to Russian-doll-like nested timeframes and scopes: upgrading from endless 2-week sprints and quarterly or annual "planning" to cycles within cycles, scoped to human magnitudes of time, and re-planned just-in-time at the Nyquist interval of each cycle by those at the corresponding level of the enterprise org chart - who must also be domain leads with mastery at that level, who retain the ability to probe two levels below, and who are practiced at bringing along at least one level up.
The 1970 Royce paper was about how waterfall didn't work, and most "Agile" is a subset of DSDM, each flavor missing a necessary thing or two whether working in large systems or growing them greenfield from nothing. But DSDM wasn't "little a" agile (and SAFE just isn't). There is a middle way.
If you like applying this stuff (e.g. you've chatted with Gene Kim, follow Will Larsen, whatever, sure, but you've deliberately iterated your approach based on culture and outcome observability), feel free to drop me a note to user at Google's thing.
A major factor supporting evolution over big up-front design is the drift in system requirements over time. Even on large military-style projects, apparently there's "discovery"--and the more years that pass, the more the requirements change.
This isn't my experience. Requirements tend to settle over time (unless they're stupidly written). Users tend to like things to stay the same, with perhaps some improvement to performance here and there.
But if anything, all development is the search for the requirements. Some just value writing them down.
> There should be some balanced path in the middle somewhere, but I haven’t stumbled across a formal version of it after all these decades.
Well, there isn't a formal version of it, because the answer is not formal, it is cultural.
In enterprise software, you have an inherent tension between good software engineering culture, where you follow the Boy Scouts' principle of leaving a codebase cleaner than you found it, and SOC2 compliance requirements that expect every software change to be tracked and approved.
If every kind of cleanup requires a ticket that has to be exhaustively filled out, then waiting for a prioritization meeting, then waiting for the managers and the bean counters to hem and haw while they contemplate whether or not it's worth spending man-hours on non-functional work - and then, if you're lucky, they decide you can spend some time on it three weeks from now, and if you're unlucky they decide nope, you gotta learn to work within an imperfect system - then after once or twice of trying to work By The Book, most engineers with an ounce of self-respect will decide "fuck it, clearly The System doesn't care," and those with two ounces of self-respect will look for work elsewhere.
Or, you get together with the members of your team and decide, you know what, the program managers and the bean counters, they're not reading any of the code, not doing any of the reviews, and they have no idea how any of this works anyway. So you collectively decide to treat technical debt as the internal concern that it anyway was in the first place - you take an extra half hour, an extra day, however long it takes to put in the extra cleaning or polish, and just tack it on to an existing ticket. You give a little wink and you get a little nod and you help the gears turn a little more smoothly, which is all the stakeholders actually care about anyway.
You cannot replace culture with process. All attempts to replace culture with process will fail. People are not interchangeable cogs in the machine. If you try to treat them as such, they will gum up and get stuck. Ownership and autonomy are the grease that allows the human flywheel to spin freely. That means allowing people to say, "I'm going to do this because I think that it is Right And Good For My System Which I Own", and allowing them to be responsible for the consequences. To pass SOC2, that means treating people like adults and allowing them to sometimes say, instead of "can I get this reviewed because I legit need another set of eyes to take a serious look?", simply "can I get a quick rubber-stamp on this please?"
Software cannot be built like skyscrapers because the sponsors know about the malleability of the medium and treat it like a lump of clay that by adding water can be shaped to something else.
You're mixing up design and manufacturing. A skyscraper is first completely designed (on paper, cad systems, prototypes), before it is manufactured. In software engineering, coding is often more a design phase than a manufacturing phase.
Designers need malleability, that is why they all want digital design systems.
Yep! Manufacturing is the running of the software, either via testing or via deployment. That’s when you’ll find bugs or design defects. Operational errors (misconfigurations, under allocation of resources) are not related to the design of the software itself.
Splitting coding and design is a bad idea. It’s like asking engineers not to draw and measure.
But software is in fact not very malleable at all. It's true the medium supports change, it's just a bunch of bits, but change is actually hard and expensive, perhaps more than other mediums.
How rapidly has business software changed since COVID? Yet how many skyscrapers remain partially unoccupied in big cities like London, because of the recent arrival of widespread hybrid working?
The buildings are structurally unchanged and haven't been demolished to make way for buildings that better support hybrid working. Sure office fit outs are more oriented towards smaller simultaneous attendance with more hot desking. Also a new industry boom around team building socials has arrived. Virtual skeet shooting or golf, for example.
On the whole, engineered cities are unchanged, their ancient and rigid specifications lacking the foresight to include the requirements that accommodate hybrid working. Software meanwhile has adapted and as the OP says, evolved.
Borrowing from mechanical/electrical engineering etc.: limit the number of things you can build with. An example in the comments here was a gear. You make a new gear based on examples of gears that work. So what's the software equivalent of a gear? An axle, a bearing, etc.? Using OO or some ABI, you specify that an object is a gear and behaves like a gear, and magically you know how it does or doesn't fit together with other objects. I know this idea has been used before, but I'm wondering if there's a well-known software framework or library. We have things like the STL in C++ or built-in libraries in Python, but I'm thinking of a higher-level abstraction.
> if theres a well known software framework or library
Those are called data structures and design patterns (not only the ones in the GoF book). If you have a good understanding of those, and you know your data and the operations you will apply to it, it's easy to model your data using those structures. Software is a specification of a state machine. Knowing how to model states, and thus figure out the transitions, is helpful. And there are a lot of samples out there.
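One way to read that in Python terms (my sketch, not a well-known library): declare the "gear-ness" as a structural interface, so anything that satisfies it composes with anything that expects it, the way standardized parts mesh in mechanical design.

```python
from typing import Protocol

class Gear(Protocol):
    """The 'shape' a part must have in order to mesh with other parts."""
    teeth: int
    def torque_out(self, torque_in: float) -> float: ...

class SpurGear:
    def __init__(self, teeth: int, efficiency: float = 0.98):
        self.teeth = teeth
        self.efficiency = efficiency

    def torque_out(self, torque_in: float) -> float:
        return torque_in * self.efficiency

def mesh(driver: Gear, driven: Gear, torque_in: float) -> float:
    """Any two Gear-shaped objects can be combined; the gear ratio comes
    from the interface, not from the concrete classes."""
    ratio = driven.teeth / driver.teeth
    return driven.torque_out(driver.torque_out(torque_in)) * ratio

print(mesh(SpurGear(20), SpurGear(60), torque_in=10.0))  # about 3x torque, minus losses
```

The limited-parts-catalog idea then becomes: keep the set of such interfaces small and stable, and let implementations vary behind them.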
This misses the point: evolution is literally the way you build things. There is no other way. You don't really know what is actually needed or what might work. You try things and then compress later. If you can try bigger things, bigger leaps, great.
> There are two main schools of thought in software development about how to build really big, complicated stuff.
> The most prevalent one, these days, is that you gradually evolve the complexity over time. You start small and keep adding to it.
> The other school is that you lay out a huge specification that would fully work through all of the complexity in advance, then build it.
I doubt many people (if they stop to think about it) actually fit into either of these two schools of thought, they're both extremist positions. It's like claiming that the US population is filled with people believing in either big government autocracy and communism or small government libertarianism and free markets. That's an absurd position to take, just as these opening paragraphs are absurd.
The most generous interpretation is that the author is setting up a strawman.
These are extremist positions. No one except a fool would think that they can design a truly large system from scratch before ever writing a piece of code, and no one but a fool thinks they can write the code for a large system without ever thinking about the design.
The reality is that it sits in between. The author figures this out by the end, fortunately, but wants something they can't have:
> There should be some balanced path in the middle somewhere, but I haven’t stumbled across a formal version of it after all these decades.
First, they at least admit that they're stumbling. That's good, but groping around in the dark is not an effective way to find an answer; turn on the lights. You aren't the only one thinking about this subject.
Second, for small projects and simple projects, or rehashes of projects you've done before, the approach often doesn't matter. This only matters for large, complex, and/or novel projects.
Software development is a design process.
Again, in case this was missed: Software development is a design process.
The idea of separating design from development is foolish (BDUF). The idea of separating development from design is equally foolish (extreme take on Agile, see Ron Jeffries failing at sudoku because he tries to use a development technique, TDD, without thinking about the design).
Take the "over 3000 active systems" from paragraph 4. There is no way anyone could have designed all 3000 active systems (either as the 3k systems or in a compressed form) from scratch in a reasonable amount of time. The only reason the author can think of a better design is because they have a design, even if it's not formally documented. The existing software is the design [0] that they can draw from to come up with the better design.
But wait, the foolish BDUF people would not try to refine the system, they'd try and build a new system from scratch. Don't be a fool.
The foolish extreme Agile people would not look at the whole (or a large enough section) and think about refining it, they'd just add to it or change the existing systems.
> Since the foundations like tech stacks, frameworks, and libraries are always changing rapidly these days, there are few accepted best practices, and most issues are incorrectly believed to be subjective.
Huh? At least in web, the "big ones" (Angular/React/Vue/Svelte) have been around for YEARS at this point and IMO have mostly stabilized (though I still don't understand why Angular needs to release a breaking-change version every 6-12 months).
The major 'issue' is often near-zero understanding of the fundamentals. I'm talking super basic: what is an onclick function, how do we get our website to talk to our backend. If you can have clean domain and abstraction cuts, the rest really is a 'technical detail' - i.e. language / framework / technologies truly are all subjective. There are probably over 1000 valid tech stacks for a standard "show me table entries in a web dashboard" - not one is 'more correct' or 'more wrong' than another; rather, it's a given _organization_ or a _given set of devs_ etc. that makes it 'more wrong'. The tech doesn't care; it's the team of humans, and HOW that team of humans interacts with the stack, that is where things can go off the rails.
There are so many ways to skin a cat, and there ARE tradeoffs (positive AND negative) to each of these tech decisions... I'm not sure what the author is getting at - he seems to hint there are a select few sets of known best practices and tech choices, but fails to list them explicitly... this is dubious at best, and in some ways counteracts his claim that it is "incorrect" that tech choices are subjective.
If you want to be non-subjective, be objective: name exactly the tech decisions and best practices you are talking about!
My approach, especially for a project with a lot of unknowns, is usually to jump in right away and try to build a prototype. Then iterate a few times. If it's a small enough thing, a few iterations is enough to have a good result.
If it's something bigger, this is the point where it's worth doing some planning, as many of the problems have already been surfaced, and the problem is much better understood.
And things like "race conditions" or a lack of scalability due to an improper threading architecture aren't especially easy to fix(!).
Also, there's a certain point where you can't avoid management sabotaging things.
It's sort of the old General Eisenhower quote: "In preparing for battle I have always found that plans are useless, but planning is indispensable."
Of course, it requires some discipline to not just yolo the prototype into production when that’s not appropriate.
I discussed some of this in https://www.ebiester.com/agile/2023/04/22/what-agile-alterna... and it gives a little bit of history of the methods.
We are nearly 70 years into this discussion at this point. I'm sure Grace Hopper and John Mauchly were having discussions about this around UNIVAC programs.
> But I do still think there's a lot of value into coming up with a good plan before jumping in.
Definitely, with emphasis on a _good_ plan. Most "plans" are bad and don't deserve that name.
> be specified up-front, planned on JIRA
Making a plan up-front is a good approach. A specification should be part of that plan. One should be ready to adapt it when needed during execution, but one should also strive to make the spec good enough to avoid changing.
HOWEVER, the "up-front specification" you mentioned was likely written _before_ making a plan, which is a bad approach. It was probably written as part of something that was called "planning" and has nothing to do with actual planning. In that case, the spec is pure fiction.
> estimates provided
Unless this project is exceptional, the estimates are probably fiction too.
> and Gantt charts setup
Gantt charts are a model, not a plan. Modeling is good; it gives you insight into the project. But a model should not be confused with a plan. It is just one tiny fragment you need to build a plan, and Gantt charts are just one of many many many types of models needed to build a plan.
> before they even sign the contract for the next milestone
That's a good thing. Signing a contract is an irreversible decision. The only contract that should be signed before planning is done is the contract that employs the planners.
> Anyone who claims upfront specs are the solution
See above. A rigid upfront spec is usually not a plan, but pure fiction.
> My approach, especially for a project with a lot of unknowns, is usually to jump in right away and try to build a prototype.
Whether this is called planning or "jumping in" is a difference in terminology, not in the approach. The relevant clue is that you are experimenting with the problem to understand it, but you are NOT making irreversible decisions. By the terminology used in that book, you are _planning_, not _executing_.
> after the 2000 pages specification document was written, and passed down from the architects to the devs
If the 2000 page spec has never been passed to the devs while writing it, it's not part of a plan, it's pure fiction. Trying to develop software against that spec is part of planning.
You need smaller documents: this is the core technology we are using; this is how one subsystem is designed (often this should be on a whiteboard, because once you get into the implementation details you need to change the plan, but the planning was still useful); this is how to use the core parts of the system so newcomers can start working quickly.
You need discipline to accept that sometimes libfoo is the best way to solve a problem in isolation, but since libbar is used elsewhere and can also solve the problem, your local problem will use libbar, even though it makes your local solution uglier. Having a small set of core technologies that everyone knows and uses is sometimes more valuable than using the best tool for the job - but only sometimes.
My best project to date was a largely waterfall one - there were somewhere around 50-60 pages of A4 specs, a lot of which I helped the clients engineer. As with all plans, a lot of it changed during implementation; in fact, I figured out a way of implementing the same functionality but automating it to a degree where about 15 of those pages could be cut.
Furthermore, it was immensely useful because by the time I actually started writing code, most of the questions that needed answers and would alter how it should be developed had already come up and could be resolved, in addition to me already knowing about some edge cases (at least when it came to how the domain translates into technology) and how the overall thing should work and look.
Contrast that to some cases where you're just asked to join a project and help out, and you jump into the middle of ongoing development, not knowing that much about any given system or the various things that the team has been focusing on in the past few weeks or months.
> It’s not hard to see that if they had a few really big systems, then a great number of their problems would disappear. The inconsistencies between data, security, operations, quality, and access were huge across all of those disconnected projects. Some systems were up-to-date, some were ancient. Some worked well, some were barely functional. With way fewer systems, a lot of these self-inflicted problems would just go away.
Also this reminds me of https://calpaterson.com/bank-python.html
In particular, this bit:
> Barbara has multiple "rings", or namespaces, but the default ring is more or less a single, global, object database for the entire bank. From the default ring you can pull out trade data, instrument data (as above), market data and so on. A huge fraction, the majority, of data used day-to-day comes out of Barbara.
> Applications also commonly store their internal state in Barbara - writing dataclasses straight in and out with only very simple locking and transactions (if any). There is no filesystem available to Minerva scripts and the little bits of data that scripts pick up has to be put into Barbara.
I know that we might normally think that fewer systems might mean something along the lines of fewer microservices and more monoliths, but it was so very interesting to read about a case of it being taken to the max - "Oh yeah, this system is our distributed database, file storage, source code manager, CI/CD environment, as well as web server. Oh, and there's also a proprietary IDE."
But no matter the project or system, I think being able to fit all of it in your head (at least on a conceptual level) is immensely helpful, the same way how having a more complete plan ahead of time can be helpful with a wide variety of assumptions vs "we'll decide in the next sprint".
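For a flavour of the "one global object database" usage pattern quoted above, here is a toy in-memory analogue in Python. To be clear, this is not Barbara's actual API (the linked article only describes it loosely); the ObjectStore class, the key format, and the Trade dataclass are all invented for illustration of the pattern: apps write their dataclasses straight into a shared namespace with only very simple locking.

```python
# Toy, in-memory analogue of a shared "default ring" object store (Python 3.9+).
# NOT Barbara's real API - just a sketch of the usage pattern described.
import threading
from dataclasses import dataclass


class ObjectStore:
    """A single shared namespace of plain Python objects."""

    def __init__(self) -> None:
        self._data: dict = {}
        self._lock = threading.Lock()

    def __setitem__(self, key: str, value: object) -> None:
        with self._lock:
            self._data[key] = value

    def __getitem__(self, key: str) -> object:
        with self._lock:
            return self._data[key]


# The "default ring": one global store everything hangs off.
default_ring = ObjectStore()


@dataclass
class Trade:
    trade_id: str
    instrument: str
    quantity: int


# Application code just writes its state straight in...
default_ring["/trades/2024-05-01/ABC123"] = Trade("ABC123", "VOD.L", 1_000)

# ...and any other script in the firm can read it back out.
trade = default_ring["/trades/2024-05-01/ABC123"]
```

The appeal is obvious (everything is one lookup away); so are the costs the article hints at, since every consumer now couples to one global namespace.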
And by doing this sort of exercise, you can avoid wasting time on dead ends, bad design, and directionless implementation. It's okay if requirements change or you discover something later on that requires rethinking. The point is to make your thinking more robust. You can always amend a design document and fill in relevant details later.
Furthermore, a mature design begins with the assumption that requirements (whether actual requirements or knowledge of them) may change. That will inform a design where you don't paint yourself into a corner, that is flexible enough to be adapted (naturally, if requirements change too dramatically, then we're not really talking about adaptation of a product, but a whole new product).
How much upfront design work you should do will depend on the project, of course. So there's a middle way between the caricature of waterfall and the caricature of agile.
There is a related phenomenon in some types of software where the cost of building an operational prototype asymptotically converges on the cost of just writing the production code. (This is always a fun one to explain to management that think building a prototype massively reduces delivery risk.)
Some projects have been forced so far, by diverting resources (either public-funded or not-yet-profitable VC money), but these efforts have not proven to be self-sustaining. Humans will be perpetually stuck where we are as a species if we cannot integrate the currently opposing ideas of up-front planning vs. move fast and break things.
Society is slowly realizing the step-change in difficulty between projects in controlled conditions, which can have simplified models, and these irreducibly complex systems. Western doctors are facing an interesting parallel, becoming more aware of the need to treat human beings in the same way - as something that emerges from parts which can be simplified and understood individually, but which could never describe the overall system behavior. We are good examples of the intrinsic fault-tolerance required for such systems to remain stable.
If you are doing a CRUD web app for a local small business - there are thousands of examples. If you are writing control software for a space station - you may not have access to code from NASA/Russia/China but you can at least look at generic software that does the things you need and learn some lessons.
A system of services that interact, where many of them are depending on each other in informal ways may be a complex system. Especially if humans are also involved.
Such a system is not something you design. You just happen to find yourself in it. Like the road to hell, the road to a complex system is paved with good intentions.
If the definition of "complex" is instead something more like "a system of services that interact", "prone to multiple, coincidental failures", then I don't think it's impossible to design them. It's just very hard. Manufacturing lines would be examples, they are certainly designed.
The design of the manufacturing lines and the resulting supply chain are not independent of each other -- you can trace features from one to the other -- but you cannot take apart the supply chain and analyze the designs of its constituent manufacturing lines and actually predict the behavior of the larger system.
AFAIK there's not a great definition of a complex system, just a set of traits that tend to indicate you're looking at one. Non-linearity, feedbacks, lack of predictability, resistance to analysis (the "you can't take it apart to reason about the whole" characteristic mentioned above). All of these traits are also kind of the same thing... they tend to come bundled with one another.
(And no, this is not "my" definition, it's how it's defined in the systems-related disciplines.)
The set of system designs that exhibit naturally stable behavior doesn't overlap much with the set of system designs that deliver maximum performance and efficiency. The capability gap between the two can be large but most people choose easy/simple.
There is an enormous amount of low-hanging opportunity here but most people, including engineers, struggle with systems thinking.
The law is maybe a little too simplistic in its formulation, but it's fundamentally true.
Care to exemplify?
The point is, though, that eventually some system runs out of ability. It works differently in programming than in physical construction, but the concept is the same: eventually you can't make a bad early design work anymore.
See also: "there is nothing more permanent than a temporary solution"
In this sense, web applications haven't changed so much in the last twenty years: client, server, database...
Not sure why you're trying to bring AI development into this.
The first is too ambitious and ends in an unmaintainable pile around a good core idea.
The second tries to "get everything right" and suffers second system syndrome.
The third gets it right but now for a bunch of central business needs. You learned after all. It is good exactly because it does not try to get _everything_ right like the second did.
The fourth patches up some more features to scoop up B and C prios and calls it a day.
Sometimes, often in BigCorp: the creators move on and it will slowly deteriorate under maintenance...
That's very simple. The balanced path depends directly on how much of the requirements and assumptions are going to change during the lifetime of the thing you are building.
Engineering is helpful only to the extent you can foresee the future changes. Anything beyond that requires evolution.
You are able to comment on the complexity of that large company only because you are standing in the future, 50 years from when those things started to take shape. If you were designing it 50 years back, you would end up with the same complexity.
Nature's answer to it is: consolidate and compact. Everything that falls onto earth gets compacted into solid rock over time by the huge pressure of its weight. All complexity and features are flattened out. Companies undergo similar dynamics, driven by pressures over time, not by big-bang engineering design upfront.
2. I would be more receptive to this argument if they had listed some famous examples of successful, large systems that were built like this. On the other hand, I can easily list many failures: FAA Advanced Automation System (1980s), IRS Tax Systems Modernization (1990s), UK NHS National Programme for IT (2000s).
3. Waterfall vs. agile is a continuum. Nobody plans everything, down to each if-statement, and nobody wings it without some kind of planned architecture (even if just inside one person's head). Where you are on the continuum depends on the nature of the problem (are all requirements known?), the nature of the team (have they done this before?), and the criteria for success (are there lives depending on this?).
4. The analogy to building a building is flawed. At large enough scale, software is like a city, and all successful cities have gradually evolved in complexity. Come back to me when someone builds a 1-million person arcology on some island in the Pacific.
5. Just as some PhDs are sensitive about being called "Doctor", some software engineers are sensitive about being "real engineers". Stop thinking about that. What we do as software engineers is immensely valuable and literally changing the world (usually, but not always, for the better). Let's stop worrying about whether or not what we do is "engineering" and focus on what we do best: building complex systems that have never before existed on earth.
The core point they're trying to make is that agile (or similar) practices are the incorrect way to approach consolidation of smaller systems into bigger ones when the overall system already works and is very large.
I agree with their assertion that being forced to address difficult problems earlier on in the process results in ultimately better outcomes, but I think it ignores the reality that properly planning a re-write of a monumentally sized and already-in-use system is practically impossible.
It takes a long time (years?) to understand and plan all the essential details, but in the interim the systems you're wanting to rewrite are evolving and some parts of the plan you thought you had completed are no longer correct. In essence, the goal posts keep shifting.
In this light, the strangler fig pattern is probably the pragmatic approach for many of these re-writes (a minimal sketch of the idea follows below). It's impossible to understand everything up front, so understand what you reasonably can for now, act on that, deliver something that works and adds value, then rinse and repeat. The problem is that for a sufficiently large system this will take decades, and few software architects stick around at a single company long enough to see it through.
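For readers unfamiliar with it, the strangler fig pattern boils down to putting a routing layer in front of the old system and moving capabilities behind it one at a time. Here is a minimal Python sketch under invented names (the capabilities, functions, and routing table are all hypothetical):

```python
# Strangler fig sketch: callers go through one facade; each capability is
# routed to either the legacy system or its rewritten replacement, and the
# routing table is migrated one entry at a time.
from typing import Callable


def legacy_billing(order_id: str) -> str:
    return f"legacy billing for {order_id}"


def new_billing(order_id: str) -> str:
    return f"new billing service for {order_id}"


def legacy_reporting(order_id: str) -> str:
    return f"legacy reporting for {order_id}"


# The "fig": which implementation currently owns each capability.
ROUTES: dict = {
    "billing": new_billing,        # already migrated
    "reporting": legacy_reporting, # still strangling this one
}


def handle(capability: str, order_id: str) -> str:
    # Callers never know (or care) which side serves them, so each
    # migration is an internal change that can ship independently.
    handler: Callable[[str], str] = ROUTES[capability]
    return handler(order_id)


print(handle("billing", "ORD-42"))
print(handle("reporting", "ORD-42"))
```

Each routing change is small, reversible, and delivers working software, which is exactly why the approach survives shifting goal posts better than a big-bang rewrite.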
A final remark I want to make is that, after only a few years of being a full-time software developer, "writing code" is one of the easiest parts of the job. The hard part is knowing what code needs to be written, and this requires skills in effective communication with various people, including other software developers and (probably more importantly) non-technical people who understand how the business processes actually need to work. If you want to be a great software developer, learn how to be good at this.
I highly applaud this idea. IMO this is why big upfront design is so risky.
> The most prevalent one, these days, is that you gradually evolve the complexity over time. You start small and keep adding to it.
> The other school is that you lay out a huge specification that would fully work through all of the complexity in advance, then build it.
I think AI will drive an interesting shift in how people build software. We'll see a move toward creating and iterating on specifications rather than implementations themselves.
In a sense, a specification is the most compact definition of your software possible. The knowledge density per "line" is much higher than in any programming language. This makes specifications easier to read, reason about, and iterate on—whether with AI or with peers.
I can imagine open source projects that will revolve entirely around specifications, not implementations. These specs could be discussed, with people contributing thoughts instead of pull requests. The more articulated the idea, the higher its chance of being "merged" into the working specification. For maintainers, reviewing "idea merge requests" and discussing them with AI assistants before updating the spec would be easier than reviewing code.
Specifications could be versioned just like software implementations, with running versions and stable releases. They could include addendums listing platform-specific caveats or library recommendations. With a good spec, developers could build their own tools in any language. One would be able to get a new version of the spec, diff it with the current one, and ask AI to implement the difference, or discuss what is needed for you personally and what is not. Similarly, it would be easier to "patch" the specification with your own requirements than to modify ready-made software.
Interesting times.
We have yet to see a largely LLM-driven language implementation, but it is surely possible. I imagine it would be easier to tell the LLM to instead translate the Java implementation to whatever language you need. A vibe-coded language could do major damage to a company's data.
[0] https://iceberg.apache.org/spec/ [1] https://lists.apache.org/thread/whbgoc325o99vm4b599f0g1owhgw...
This is a really good observation and I predict you will be correct.
There is a consequence of this for SaaS. You can imagine an example SaaS that one might need to vibecode to save money. The reason it's not possible now is not because Claude can't do it; it's because getting the right specs (like you suggested) is hard work. A well-written spec will not only contain the best practices for that domain of software but also all the legal compliance BS that comes along with it.
With a proper specification that is also modular, I imagine we will be able to see more vibecoded SaaS.
Overall I think your prediction is really strong.
One issue is that a spec without a working reference implementation is essentially the same as a pull request that's never been successfully compiled. Generalization is good but you can't get away from actually doing the thing at the end of the day.
I've run into this issue with C++ templates before. Throw a type at a template that it hasn't previously been tested with and it can fall apart in new and exciting ways.
> The WHATWG was based on several core principles, (..) and that specifications need to be detailed enough that implementations can achieve complete interoperability without reverse-engineering each other.
But in my experience you need more than a spec, because an implementation is not just something that implements a spec, it is also the result of making many architectural choices in how the spec is implemented.
Also even with detailed specs AI still needs additional guidance. For example couple of weeks ago Cursor unleashed thousands of agents with access to web standards and the shared WPT test suite: the result was total nonsense.
So the future might rather be like a Russian doll of specs: start with a high-level system description, and then support it with finer-grained specs of parts of the system. This could go down all the way to the code itself: existing architectural patterns provide a spec for how to code a feature that is just a variation of such a pattern. Then whenever your system needs to do something new, you have to provide the code patterns for it. The AI is then relegated to its strength: applying existing patterns.
TLA+ has a concept of refinement, which is kind of what I described above as Russian dolls but only applied to TLA+ specs.
Here is a quote that describes the idea:
There is no fundamental distinction between specifications and implementations. We simply have specifications, some of which implement other specifications. A Java program can be viewed as a specification of a JVM (Java Virtual Machine) program, which can be viewed as a specification of an assembly language program, which can be viewed as a specification of an execution of the computer's machine instructions, which can be viewed as a specification of an execution of its register-transfer level design, and so on.
Source: https://cseweb.ucsd.edu/classes/sp05/cse128/ (chapter 1, last page)
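A rough way to picture refinement outside of TLA+ is trace inclusion: every behaviour the lower-level description can exhibit must be one the higher-level description allows. The Python sketch below is only an illustration of that idea with invented names (spec_allows, impl_step, refines); it is not TLA+ tooling and not from the quoted source.

```python
# Refinement as trace inclusion (illustrative only, Python 3.9+).
# High-level spec: an observable counter may only stutter or grow by 1 per step.
def spec_allows(before: int, after: int) -> bool:
    return after == before or after == before + 1


# Lower-level "implementation": increments via an intermediate pending flag.
def impl_step(state: tuple) -> list:
    count, pending = state
    if not pending:
        return [(count, True)]      # request an increment
    return [(count + 1, False)]     # apply it


def refines(start: tuple, steps: int) -> bool:
    """Brute-force check that every impl step maps to a spec-allowed step."""
    frontier = [start]
    for _ in range(steps):
        nxt = []
        for st in frontier:
            for succ in impl_step(st):
                # Abstraction function: only the counter is observable.
                if not spec_allows(st[0], succ[0]):
                    return False
                nxt.append(succ)
        frontier = nxt
    return True


print(refines((0, False), steps=6))  # True: the implementation refines the spec
```

The same check can be stacked: the "spec" here could itself be the implementation of an even coarser description, which is the Russian-doll structure described above.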
> the size of the iterations matters, a whole lot. If they are tiny, it is because you are blindly stumbling forward. If you are not blindly stumbling forward, they should be longer, as it is more effective.
You are not blindly stumbling forward, you're moving from (working software + tiny change) to (working software including change). And repeat. If there's a problem, you learn about it immediately. To me that's the opposite of moving blindly.
> you really should stop and take stock after each iteration.
Who is not taking stock after every iteration? This is one of the fundamental principles of agile/lean/devops/XP/scrum. This one sentence drastically lowers my impression of the author's ability to comment on the subject.
> The faster people code, the more cleanup that is required. The longer you avoid cleaning it up, the worse it gets, on basically an exponential scale.
Unsafe tempo is as likely to happen in big-spec design projects as in small iterations. In fact, working in careful small iterations helps us manage a realistic tempo because we know we can't move faster than we can get things into production and evaluate.
The terrible outcomes listed in the same paragraph are linked to unwise practice and have nothing to do with small iteration size.
Indeed, I would argue 'big iterations' are the ones where all the problems the author mentions crop up in the first place!
That's certainly my experience.
In a more typical modern sense systems thinking is more about relationships and wholes, rather than isolating parts, which is the traditional engineering approach.
While much of the base material on systems thinking is based around cybernetics, it is really a complement to traditional engineering, used in parallel to identify more natural complexity boundaries and to help avoid confusion and accidental complexity.
Gregor Hohpe’s Architect Elevator is probably a good place to start on why this change in perspective is important and why investing in flexibility is crucial when there is uncertainty.
While you may have to accept this article’s definition in some groups, accepting the more modern definition will help you get jobs in places that are nicer to work.
This type of false dichotomy that is presented in the article is a warning that there is soft work to be done.
People mentioning mechanical engineering in this thread are possibly the people who may benefit most from examining the material. I encourage you to see if this is a path forward for your needs.
The 1980s and 90s were full of DOD-497 multi-kilogram documents being analyzed atomically to determine the specification, and they rarely came in close on any of the 3 main dimensions of success: time, quality, or cost.
On the other hand, neither has Agile with a capital A, with the ceremony of documents replaced with the ceremony of JIRA tickets and t-shirts.
There are core differences in software engineering compared with construction work:
- making changes is often cheaper
- we might not know beforehand everything that is needed to be built, especially unknown unknowns
I would still agree that the truth is somewhere in between, but I would argue that, for software, it's closer to the evolutionary approach.
Incidentally this highlights a problem when using chatbots to build large software projects that are intended to be used for a long period of time.
The key is not how much code you can add but how little you can get away with.
Chatbots' only solution, ever, is to ADD code. They're not good at NOT writing code, or even at deleting it, because after all the training set for the lines of code that do not exist is an empty set. Therefore it's impossible to train a robot to not write code.
What's better than generating 10kloc really fast? Not having it in the first place.
Because requirements always change. Humans always change. That's it. No need to elaborate.
In short: the tension described in "systems thinking" is the same one as that between "spec driven" and "iterative prompting".
Nah I’m good. I’ve watched system architecture framework views be developed. Years of prep and planning. System is released and half the employees that had requirements no longer work there and the business already pivoted to a new industry focus.
There’s a reason we went this way in software development a quarter century ago.
Software is not a skyscraper.
Everything W3C does. Go is evolving through specs first. Probably every other programming language these days.
People already do that for humankind-scale projects where there have to be multiple implementations that can talk to each other. Iteration is inevitable for anything that gains traction, but it still can be iteration on specs first rather than on code.
In fact, this is how you build an aerospace program, a satellite, and more.
It is even possible to develop software with agile processes in such a framework, even though strictly speaking it’s not fully agile.
IMO the problem isn't discussing the spec per se. It's that the spec doesn't talk back the way actual working code does. On a "big upfront design" project, there is a high chance you're spending a lot of time on moot issues and irrelevant features.
Making a good spec is much harder than making working software, because the spec may not be right AND the spec may not describe the right thing.
I suppose it's primarily a matter of experience. And as the article alludes, it's very important to deeply understand the subject matter. I highly value some of my non-programmer colleagues responsible for documentation, but I can't put my finger on what exactly they brought to the table that made their prose exceptionally good (clear, concise, spot on)...
- you create a large number of small working apps.
- you create a spec from these apps.
- you create a huge app.
- you make a DSL to make it extensible.
- you extend the DSL to fit what you need in the future.
- you optimize the DSL and remove obvious N+1 stuff.
The hard part is throwing away the code at each step. Both management and devs can't stomach the reality that the code is useless at each stage prior to the DSL. They can't molt and discard the shell, and hence the project dies.
When they turn out to be based on false assumptions of simplicity, the fallout is that the whole thing can't go forward because of one of the details.
Evolutionary systems at least always work to some degree even if you can look after the fact and decide that there's a lot of redundancy. Ideally you would then refactor the most troublesome pieces.
AI assistance would seem to favor the engineering approach, as the friction of teams and personalities is reduced in favor of quick feasibility testing and complete planning.
Software has zero construction cost, but what it does have is extremely complicated behavior.
Take a bridge for example: the use case is being able to walk or drive or ride a train across it. It essentially provides a surface to travel on. The complications of providing this depend on the terrain, length, etc., and are not to be dismissed, but there's relatively little doubt about what a bridge is expected to do. We don't iterate bridge design because we don't need to learn much from the users of the bridge (does it fulfill their needs, is it "easy to use", etc.) AND because construction of a bridge is extremely expensive, so iteration is also incredibly costly. We do, however, not build all bridges the same, and people develop styles over time which they repeat for successive bridges, and we iterate that way.
In essence, cycling is about discovering more accurately what is wanted because it is so often the case that we don't know precisely at the start. It allows one to be far more efficient because one changes the requirements as one learns.
That feels like a straw man to me. This is not a binary question. For each small design decision you have a choice about how much uncertainty you accept.
There are no "two schools". There is at least a spectrum between two extremes and no real project was ever at either of the very ends of it. Actually, I don't think spectrum is a proper word even because this is not just a single dimension. For example, speed and risk often correlate but they are also somewhat independent and sometimes they anti-correlate.
It didn't work, even for the "large" systems of that time: and Booch had worked on more than a few. The kind of "system" the OP is describing is vastly larger, and vastly more complex. Even if you could successfully apply the waterfall model to a system built over two or three years, you certainly can't for a system of systems built over 50 years: the needs of the enterprise are evolving, the software environment is evolving, the hardware platform is evolving.
What you can do, if you're willing to pay for it, is ruthlessly attack technical debt across your system of systems as a disciplined, on-going activity. Good luck with that.
The 1970 Royce paper was about how waterfall didn't work, and most "Agile" is a subset of DSDM, each flavor missing a necessary thing or two, whether working in large systems or growing them greenfield from nothing. But DSDM wasn't "little a" agile (and SAFe just isn't). There is a middle way.
If you like applying this stuff (e.g. you've chatted with Gene Kim, follow Will Larsen, whatever, sure, but you've deliberately iterated your approach based on culture and outcome observability), feel free to drop me a note to user at Google's thing.
Comments on actual blog post: 0
Why are people so afraid to leave replies on the author’s OC
But if anything, all development is the search for the requirements. Some just value writing them down.
Well, there isn't a formal version of it, because the answer is not formal, it is cultural.
In enterprise software, you have an inherent tension between good software engineering culture, where you follow the Boy Scouts' principle of leaving a codebase cleaner than you found it, and SOC2 compliance requirements that expect every software change to be tracked and approved.
If every kind of cleanup requires a ticket that has to be exhaustively filled out, then wait for a prioritization meeting, then wait for the managers and the bean counters to hem and haw while they contemplate whether or not it's worth it to spend man-hours on non-functional work, and then, if you're lucky, they decide you can spend some time on it three weeks from now, and if you're unlucky they decide nope, you gotta learn to work within an imperfect system. After once or twice of trying to work By The Book, most engineers with an ounce of self-respect will decide "fuck it, clearly The System doesn't care," and those with two ounces of self-respect will look for work elsewhere.
Or, you get together with the members of your team and decide, you know what, the program managers and the bean counters, they're not reading any of the code, not doing any of the reviews, and they have no idea how any of this works anyway. So you collectively decide to treat technical debt as the internal concern that it anyway was in the first place - you take an extra half hour, an extra day, however long it takes to put in the extra cleaning or polish, and just tack it on to an existing ticket. You give a little wink and you get a little nod and you help the gears turn a little more smoothly, which is all the stakeholders actually care about anyway.
You cannot replace culture with process. All attempts to replace culture with process will fail. People are not interchangeable cogs in the machine. If you try to treat them as such, they will gum up and get stuck. Ownership and autonomy are the grease that allows the human flywheel to spin freely. That means allowing people to say, "I'm going to do this because I think that it is Right And Good For My System Which I Own", and allowing them to be responsible for the consequences. To pass SOC2, that means treating people like adults and allowing them to sometimes say "can I get a quick rubber-stamp on this, please?" instead of "can I get this reviewed because I legit need another set of eyes to take a serious look?"
Designers need malleability, that is why they all want digital design systems.
It was discussed here just 2 days ago intensively.
https://news.ycombinator.com/item?id=46881543