What a fantastic list. I'll be saving it to show the junior developers.
My only nitpick is that "reliability" should have been a point by itself. All the other "ilities" can be appropriately sacrificed in some context, but I've never seen unreliable software being praised for its code quality.
Which is part of why LLMs are so frustrating. They're extremely useful and extremely unreliable.
> Code has always been expensive. Producing a few hundred lines of clean, tested code takes most software developers a full day or more. Many of our engineering habits, at both the macro and micro level, are built around this core constraint.
> ...
> Writing good code remains significantly more expensive
I think this is a bad argument. Code was expensive because you were trying to write the expensive good code in the first place.
When you drop your standards, then writing generated code is quick, easy and cheap. Unless you're willing to change your standard, getting it back to "good code" is still an equivalent effort.
There are alternative ways to make the argument for agentic coding; this is just a really, really bad argument to kick it off with.
In my experience, it’s even more effort to get good code with an agent. When writing by hand, I fully understand the rationale for each line I write. With AI, I have to assess every clause and think about why it’s there. Even when code reviewing juniors, there’s a level of trust that they had a reason for including each line (assuming for a moment that they’re not using AI too); that’s not at all my experience with Codex.
Last month I did the majority of my work through an agent, and while I did review its work, I’m now finding edge cases and bugs of the kind that I’d never have expected a human to introduce. Obviously it’s on me to better review its output, but the perceived gains of just throwing a quick bug ticket at the ai quickly disappear when you want to have a scalable project.
I hear you, but it seems quicker to predict whether the agent's solution is correct/sound before running it than to compose and "start" coding yourself. Understanding something that's already there seems like less effort. But I guess it highly depends on what you are doing and its level of complexity and how much you're offloading your authority and judgment.
I was careful to say "Good code still has a cost" and "delivering good code remains significantly more expensive than [free]" rather than the more aesthetically pleasing "Good code is expensive."
I chose these words because I don't think good code is nearly as expensive with coding agents as it was without them.
You still have to actively work to get good code, but it takes so much less time when you have a coding agent who can do the fine-grained edits on your behalf.
I firmly believe that agentic engineering should produce better code. If you are moving faster but getting worse results it's worth stopping and examining if there are processes you could fix.
Totally agreed. I’ve been reverse engineering Altium’s file format to enable agents to vibe-engineer electronics and though I’m on my third from scratch rewrite in as many weeks, each iteration improves significantly in quality as the previous version helps me to explore the problem space and instruct the agent on how to do red/green development [1]. Each iteration is tens of thousands of lines of code which would have been impossible to write so fast before so it’s been quite a change in perspective, treating so much code as throw away experimentation.
I’m using a combination of 100s of megabytes of Ghidra-decompiled Delphi DLLs and millions of lines of decompiled C# code to do this reverse engineering. I can’t imagine even trying such a large project before LLMs, so while a good implementation is still taking a lot of time, it’s definitely a lot cheaper than before.
[1] I saw your red/green TDD article/book chapter and I don’t think you go far enough. Since we have agents, you can generalize red/green development to a lot of things that would be impractical to implement in tests. For example I have agents analyze binary diffs of the file format to figure out where my implementation is incorrect without being bogged down by irrelevant details like the order or encoding of parameters. This guides the agent loop instead of tests.
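A minimal sketch of how a binary diff can act as the red/green signal described above, assuming a known-good reference file and the bytes produced by the implementation under test. The names `DiffRegion`, `diff_regions` and `is_green` are hypothetical, and this is only the raw comparison: deciding which differences are irrelevant (parameter order, encoding) is the part the comment leaves to the agent.

```python
from dataclasses import dataclass


@dataclass
class DiffRegion:
    offset: int      # byte offset where the two files start to differ
    expected: bytes  # bytes from the reference file
    actual: bytes    # bytes from the implementation's output


def diff_regions(expected: bytes, actual: bytes) -> list[DiffRegion]:
    """Group differing bytes into contiguous regions an agent can inspect."""
    regions: list[DiffRegion] = []
    common = min(len(expected), len(actual))
    start = None
    for i in range(common):
        if expected[i] != actual[i]:
            if start is None:
                start = i  # a new differing region begins here
        elif start is not None:
            regions.append(DiffRegion(start, expected[start:i], actual[start:i]))
            start = None
    if start is not None:
        regions.append(DiffRegion(start, expected[start:common], actual[start:common]))
    # A length mismatch is reported as one trailing region.
    if len(expected) != len(actual):
        regions.append(DiffRegion(common, expected[common:], actual[common:]))
    return regions


def is_green(expected: bytes, actual: bytes) -> bool:
    """Green means the output is byte-identical to the reference."""
    return not diff_regions(expected, actual)
```

The list of regions, rather than a bare pass/fail, is what gives the agent something concrete to reason about on each iteration.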
> I was careful to say "Good code still has a cost" and "delivering good code remains significantly more expensive than [free]" rather than the more aesthetically pleasing "Good code is expensive."
Which is nuance that will get overlooked or waved away by upper management who see the cost of hiring developers, know that developers "write code", and can compare the developer salary with a Claude/Codex/whatever subscription. If the correction comes, it will be late and at the expense of rank and file, as usual. (And don't be naive: if an LLM subscription can let you employ fewer developers, that subscription plus offshore developers will enable even more cost saving. The name of the game is cost saving, and has been for a long time.)
Code is cheaper. Simple code is cheap. More complex code may not be cheaper.
The reason you pay attention to details is because complexity compounds and the cheapest cleanup is when you write something, not when it breaks.
This last part is still not fully fleshed out.
For now. Is there any reason to not expect things to improve further?
Regardless, a lot of code is cheap now and building products is fun regardless, but I doubt this will translate into more than very short-term benefits. When you lower the bar you get 10x more stuff, 10x more noise, etc. You lower it more you get 100x and so on.
I think the cost and work remain the same. What has changed is efficiency. Previously people had to manually program byte after byte. Then came C and streamlined it, allowing faster development.
With python I can write a simple debugging UI server with a few lines.
There are frameworks that allow me to complete certain tasks in hours.
You do not need to program everything from scratch.
The more code, the faster everything gets, since the job is mostly done.
We are accelerating, but we still work 9 to 5 jobs.
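For a sense of scale, here is a rough sketch of the kind of "few lines" debugging server mentioned above, using only Python's standard-library `http.server`. The `DEBUG_STATE` dictionary, `render_state`, `DebugHandler` and `serve` are hypothetical names for illustration; a real debugging UI would expose whatever state the application actually has.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state that we want to inspect while debugging.
DEBUG_STATE = {"queue_depth": 0, "last_error": None}


def render_state(state: dict) -> bytes:
    """Serialize the state as pretty-printed JSON for the browser."""
    return json.dumps(state, indent=2, sort_keys=True).encode("utf-8")


class DebugHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Every GET request returns the current state as JSON.
        body = render_state(DEBUG_STATE)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve the debug view on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), DebugHandler).serve_forever()
```

Calling `serve()` and opening `http://127.0.0.1:8000/` shows the state. Nearly all of the real work (sockets, HTTP parsing) is done by existing library code.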
C, Python, and frameworks don't generate all-new code for every task: you're taking advantage of stuff that's thoroughly tested. That simple debugging UI server is probably using some well-tested libraries, which you can reasonably trust to be bug-free (and which can be updated later to fix any bugs, without breaking your code that relies on them). With AI-generated code, this isn't the case.
Definitely the market incentives for "good code" have never been worse, but I wouldn't be so sure the cost of migrating decent pieces of generated code to good code is worse than writing good code from whole cloth.
I find that implementing a sound solution from scratch is generally lower effort than taking something that already exists and making it sound.
The former: 1) understand the problem, 2) solve the problem.
The latter: 1) understand the problem, 2) solve the problem, 3) understand how somebody or something else understood & solved the problem, 4) diff those two, 5) plan a transition from that solution to this solution, 6) implement that transition (ideally without unplanned downtime and/or catastrophic loss of data).
This is also why I’m not a fan of code reviews. Code review is basically steps 1–4 from the second approach, plus having to verbally explain the diff, every time.
That's specious reasoning. Code reviews are a safeguard against cowboy coding, and a tool to enforce shared code ownership. You might believe you know better than most of your team members, but odds are a fresh pair of eyes can easily catch issues you snuck in your code that you couldn't catch due to things like PR tunnel vision.
And if your PR is sound, you certainly don't have a problem explaining what you did and why you did it.
Code reviews have their place. I just personally don’t like being the reviewer, because it’s more effort on your part than just writing the damn thing from scratch while someone else gets the credit for the result[0]. Of course, having multiple pairs of eyes on the code and multiple people who understand it is crucial.
[0] Reviews are OK if I enjoy working with the person whose work I’m reviewing and I feel like I’m helping them grow.
Every modern (and not so modern) software development method hinges on one thing: requirements are not known, and even if known they'll change over time. From this you get the goal of "good" code, which is "easy to change" code.
Do current LLM-based agents generate code which is easy to change? My gut feeling is a no at the moment. Until they do, I'd argue code generated by agents is only good for prototypes. Once you can ask your agent to change a feature and be 100% sure they won't break other features then you don't care about what the code looks like.
All the hype is on how fast it is to produce code. But the actual bottleneck has always been the cost of specifying intent clearly enough that the result is changeable, testable, and correct AND that you build something that brings value.
I'd add in "code is easier to write than it is to read" - hence abstraction layers designed to present us with higher level code, hiding the complex implementations.
But LLMs are both really good at writing code _and_ reading code. However, they're not great at knowing when to stop - either finishing early and leaving stuff broken, over-engineering and adding in stuff that's not needed or deciding it's too hard and just removing stuff it deems unimportant.
I've found a TDD approach (with not just unit tests but high-level end-to-end behaviour-driven tests) works really well with them. I give them a high-level feature specification (remember Gherkin specifications?) and tell it to make that pass (with unit tests for any intermediate code it writes), make sure it hasn't broken anything (by running the other high-level tests) then, finally, refactor. I've also just started telling it to generate screenshots for each step in the feature, so I can quickly evaluate the UI flow (inspired by Simon Willison's Rodney tool).
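As a rough illustration of the workflow above, a Gherkin-style feature can be reduced to an ordered list of named steps that either all pass (green) or stop at the first failure (red). This is a plain-Python sketch rather than a real Gherkin runner, and the `Cart` feature, the step functions and `run_feature` are all hypothetical names.

```python
from typing import Callable

# A step is a human-readable name plus a function that acts on shared context.
Step = tuple[str, Callable[[dict], None]]


class Cart:
    """Hypothetical code under test; in practice the agent writes this part."""

    def __init__(self) -> None:
        self.items: dict[str, int] = {}

    def add(self, sku: str, quantity: int = 1) -> None:
        self.items[sku] = self.items.get(sku, 0) + quantity

    def total_items(self) -> int:
        return sum(self.items.values())


def given_an_empty_cart(ctx: dict) -> None:
    ctx["cart"] = Cart()


def when_two_widgets_are_added(ctx: dict) -> None:
    ctx["cart"].add("widget", 2)


def then_the_cart_holds_two_items(ctx: dict) -> None:
    assert ctx["cart"].total_items() == 2


# The feature specification the human reviews, in Given/When/Then order.
FEATURE: list[Step] = [
    ("Given an empty cart", given_an_empty_cart),
    ("When two widgets are added", when_two_widgets_are_added),
    ("Then the cart holds two items", then_the_cart_holds_two_items),
]


def run_feature(steps: list[Step]) -> list[tuple[str, bool]]:
    """Run steps in order and stop at the first failure (the 'red' step)."""
    ctx: dict = {}
    results: list[tuple[str, bool]] = []
    for name, step in steps:
        try:
            step(ctx)
        except Exception:
            results.append((name, False))
            break  # later steps depend on earlier ones
        results.append((name, True))
    return results
```

The reviewer then only has to check that the step names and assertions match the intended behaviour, which is the point being made about not reading every line of the implementation.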
Now I don't actually need to care if the code is easy to read or easy to change - because the LLM handles the details. I just need to make sure that when it says "I have implemented Feature X" that the steps it has written for that feature actually do what is expected and the UI fits the user's needs.
> Once you can ask your agent to change a feature and be 100% sure they won't break other features then you don't care about what the code looks like.
That bar is unreasonably high.
Right now, if I ask a senior engineer to change a feature in a mature codebase, I only have perhaps 70% certainty they won't break other features. Tests help, but only so far.
But if push comes to shove, any other engineer can come in and debug your senior engineer's code. That's why we insist on people creating easy-to-change code.
With auto generated code which almost no one will check or debug by hand, you want at least compiler level exactitude. Then changing "the code" is as easy as asking your code generator for new things. If people have to debug its output, then it does not help in making maintainable software unless it also generates "good" code.
This bar only seems high because the bar in most companies is already unreasonably low. We had decades of research into functional programming, formal methods and specification languages. However, code monkey culture was cheaper and much more readily available. Enterprise software development has always been a race to the bottom, and the excitement for "vibe coding" is just the latest manifestation of its careless, thoughtless approach to programming.
I am constantly getting LLMs to change features and fix bugs. The key is to micromanage the LLM and its context, and read the changes. It's slower than vibe coding but faster than coding by hand, and it results in working, maintainable software.
We won't be able to be 100% sure with LLMs, but maybe proper engineering around evals gets us to an acceptable level of quality based on the blast radius/safety profile.
I'd also argue that we should be pushing towards tracer bullets as a development concept, rather than prototypes that are nice but meant to be thrown away, which people might not actually do.
Clean-room auto-porting after a messy exploratory prototyping session would be a nice pattern, nonetheless.
Code generation is cheap in the same way talk is cheap.
Every human can string words together, but there's a world of difference between words that raise $100M and words that get you slapped in the face.
The raw material was always cheap. The skill is turning it into something useful. Agentic engineering is just the latest version of that. The new skill is mastering the craft of directing cheap inputs toward valuable outcomes.
> The new skill is mastering the craft of directing cheap inputs toward valuable outcomes.
Strongly agree with this. It took me a while to realize that "agentic engineering" wasn't about writing software; it was about being able to very quickly iterate on bespoke tools for solving a very specific problem you have.
However, as soon as you start unblocking yourself from the real problem you want to solve, the agentic engineering part is no longer interesting. It's great to be solving a problem and then realize you could improve it very quickly with a quick request to an agent, but you should largely be focused on solving the problem.
Yet I see so many people talking about running multiple agents and just building something without much effort spent using that thing, as though the agentic code itself is where the value lies. I suspect this is a hangover from decades where software was valuable (we still have plenty of highly valued, unprofitable software companies as a testament to this).
I'm reminded a bit of Alan Watts' famous quote in regards to psychedelics:
> If you get the message, hang up the phone.
If you're really leveraging AI to do something unique and potentially quite disruptive, very quickly the "AI" part should become fairly uninteresting and not the focus of your attention.
That's a great insight about iterating on bespoke tools. I have seen the most speed up when diving into new tools, or making new tools as AI can make the initial jump quite painless, and I can get straight to the problem solving. But I get barely any speedup using it on legacy projects in tools I know well. Often enough it slows me down so net benefit is nil or worse.
Another commenter said it makes the easy part easy and the hard part harder, which resonates with me at the moment.
I am pretty excited by being able to jump deep into real problems without code being the biggest bottleneck. I love coding but I love solving problems more, and coding for fun is very different to coding for outcomes.
That's my observation / fear as well. It makes delivering something that sort of works easy. It makes doing that well more difficult by obscuring the problem domain from the humans and expanding the standard library of tools into patterns of using said standard library. Hope they're correct for your use case.
There's also the question of the true cost of all the hardware, electricity, and potential output that's being tossed onto the pyres. We aren't getting the real Cortana from the books / games; we're getting GIR trained on the corpus of fallible human code, prompted by fallible humans.
It's funny that so many people are using AI and it still hasn't really shown up in productivity numbers or product quality yet. I'm going to be really confused if this is still the case at the end of the year. A whole year of access to these latest agentic models has to produce visible economic changes or something is wrong.
My intuition from talking to people across different parts of the industry is that adoption at bigger companies is really limited or slow, or totally banned. Additionally, some developers are not seeing it help their specific roles all that much anyway. This is hard to square with the success other people are having, but software is a super broad discipline, which I think explains a lot of the mixed success stories.
It seems to depend a lot on the industry and niche you're in. Working at an agency, I get experience across many different projects and industries, and sometimes you are just at the edge of the AI's training and it can get very unhelpful. Given that many if not most companies are working on proprietary code for domain-specific problems, that isn't all that surprising either.
>funny that so many people are using AI and it still hasn't really shown up in productivity numbers or product quality yet.
That's because the threat is now not other businesses, but your own users who decide to vibe-code their own "Claw" product instead of using your company's vibeslop, so there are no buyers for your single-week product. All these new harness developers are engaging in resume-driven development to save their own asses. The only ones that are not naked when the tide recedes are the ones that are able to jump to the next layer of abstraction on the infinite staircase, until the next tide comes five seconds later.
I used to think this was a sign that AI code isn't really useful, but I've changed my tune (also I believe these numbers have changed in the last few months).
As an example: One of my most promising projects I was discussing with a friend and we realized together we could potentially use these tools to build a two person agency with no need to hire anyone ever. If this were to work, it could theoretically make nice revenue and it shouldn't show up in any metric anywhere.
Additionally I've heard of countless teams cancelling their contracts with outsourced engineers because cheap but bad coders in India are worse than an LLM and still cost more. I'm not sure if there's a number around this activity, but again, these types of changes don't show up in the usual places.
My current belief is not that AI will replace traditional software engineering, but that it will replace a good chunk of the entire model of software.
>One of my most promising projects I was discussing with a friend and we realized together we could potentially use these tools to build a two person agency with no need to hire anyone ever...My current belief is not that AI will replace traditional software engineering, but that it will replace a good chunk of the entire model of software
You're not following your last line to its logical conclusion regarding your own prospects: no one is going to buy the vibeslop your two person agency is selling because they'd rather create and maintain their own vibeslop instead of dealing with yours.
If you follow some of your thoughts to their logical conclusion you'll realize the parent is right: there will be limited productivity that ends up fueling the economy when nobody is buying each other's vibeslop.
We're not selling vibe slop; the "vibe slop" tools which work for one person enable automation of tasks for the services we sell. Whether or not we use AI behind the scenes is entirely irrelevant to the service we're providing, other than that it allows our margins to be higher and our speed of implementation to be faster.
I absolutely agree that it's not logical to think "oh we'll sell our AI stuff", that's the old model (which is just a variation on SaaS). I suspect a lot of HNers can't imagine a "product" that isn't code, but that's not at all what I'm describing.
The products that most people on HN have traditionally built are used by other companies to make money by allowing those processes to be scaled. AI, in many new cases, eliminates the need for a 'software' middle man. The case I'm describing is "I know how to make money doing X if only I could scale it up without hiring people" and my offering is "I can scale it up without hiring people".
This is increasingly where I think the future of work is headed, and it's more than fine if you aren't convinced.
> it allows our margins to be higher and our speed of implementation to be faster
Faster than what? You will be faster than your previous self, just like all of your competitors. Where’s the net gain here? Even if you somehow managed to capture more value for yourself, you’ve stopped providing value to 5-10x that many employees who are no longer employed.
When costs approach zero on a large scale, margins do not increase. Low costs = you’re not paying anyone = your competitors aren’t paying anyone = your customers no longer have money = your revenue follows your costs straight to zero.
Companies that provide physical services can’t scale without hiring. A one-man “crew” isn’t putting a roof on a data center.
I want to be wrong. Tell me why you think any of this is wrong.
Correct me if I'm wrong, but if two people create a SaaS that can replace a 50-person SaaS, compete on price, and force the competitor out of the market, wouldn’t this show up as a reduction in GDP? Efficiency (GDP/time_worked) should be up though, and AFAIK it isn’t.
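The arithmetic behind that question can be sketched with made-up numbers. Everything below is a hypothetical assumption for illustration: the revenue figures, the 2,000 hours per person per year, and the simplification that revenue stands in for measured output and that displaced staff produce nothing measured elsewhere.

```python
def productivity(output: float, hours_worked: float) -> float:
    """Output per hour worked, the GDP/time_worked ratio from the comment."""
    return output / hours_worked


HOURS_PER_PERSON = 2_000  # assumed working hours per person per year

# Hypothetical incumbent: 50 people, 5M in yearly revenue.
incumbent_output = 5_000_000
incumbent_hours = 50 * HOURS_PER_PERSON

# Hypothetical challenger: 2 people undercutting on price, 2M in revenue.
challenger_output = 2_000_000
challenger_hours = 2 * HOURS_PER_PERSON

before = productivity(incumbent_output, incumbent_hours)   # 50.0 per hour
after = productivity(challenger_output, challenger_hours)  # 500.0 per hour

# Measured output falls while output per hour rises tenfold.
output_change = challenger_output - incumbent_output       # -3_000_000
```

Under these assumptions the total shrinks while the ratio grows, which is why the two metrics can move in opposite directions.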
>One of my most promising projects I was discussing with a friend and we realized together we could potentially use these tools to build a two person agency with no need to hire anyone ever. If this were to work, it could theoretically make nice revenue and it shouldn't show up in any metric anywhere.
potentially...if this were to work...theoretically
shouldn't show up? I would worry that something with so many variables wouldn't show up.
This is actually an old syndrome with technology. It takes a long time for the effect to be reliably measured. Famously, it took many years for the internet itself to show up in significant productivity gains ("if the internet is actually useful, why don't the numbers show that?" was a common comment in the 1990s and 2000s). So it seems to me we're just seeing the usual dynamic here. Productivity in trillion-dollar economies does not turn on a dime.
>Famously, it took many years for the internet itself to show up in significant productivity gains
Yeah, but the actual productivity gains that the internet and software tools introduced have had diminishing returns after a while.
Like, are people more productive today when they use Outlook and Slack than they were 20 years ago when using IBM Lotus Notes and IBM Sametime? I'm not. Are people more productive with the Excel of today than with Excel 2003/2007? I'm not. Are Windows 11 and macOS Tahoe making people more productive than Windows 7 and Snow Leopard? Not me. Are IDEs of today offering so much more of a productivity boost than what Visual Studio, CodeWarrior and Borland Delphi did back in the day? Don't think so.
To me it seems that at least on the productivity side, we've mostly been reinventing the wheel "but in Rust/Electron" for the last 15 or so years, and the biggest productivity gains came IMHO from increased compute power due to semiconductor advancement, so that the same tasks finished faster today than 20 years ago, but not that the SW or the internet got so much more capable since then.
I think if you're doing front-end development AI is good. If you are reading a db and sending a json to said webpage AI is decent, if you are doing literally anything else AI is next to useless.
I wouldn't say it hasn't shown up. The number of ShowHN's per weekend has definitely gone up, and while that isn't rigorous scientific proof, I'd consider it a leading-edge indicator of something. Unfortunately, we as an industry have yet to agree on anything approaching a scientific measure of productivity, other than to collectively agree that Lines of Code is a terrible one. Thus even if someone were able to quantify that, say, they're having days where they generate 5000 LoC when previously they were getting O(500) LoC, that's not something we could agree upon as improved productivity.
So then the question is, is there anything other than feels to say productivity has or has not gone up? What would we accept as actual evidence one way or another? Commits-per-day is not a good measure either. Jira tickets and T-shirt sizes? We don't have a good measure, so while ShowHN's per weekend is equally dumb, it's also equally good in the bag of lies, damn lies, and statistics.
There was a post a few days ago about how the quality of ShowHN had gone down, with people asking how they could block this category of submissions, so I wouldn't be too quick to equate an increase in ShowHN with anything positive.
Or another way of looking at it: just because digging a ditch became cheap and fast with the backhoe doesn't mean you can just dig a bunch of ditches and become rich.
Indeed: The act of actually typing the code into an editor was never the hard or valuable part of software engineering. The value comes from being able to design applications that work well, with reasonable performance and security properties.
It wasn't the hard or valuable part of software engineering, but it was a very time-consuming part. That's what's interesting about this new era - the time-consuming-but-easy bit has suddenly stopped being time-consuming.
Because designing systems that work well is difficult. It takes years of experience to develop the muscle memory behind quality systems architecture. Writing the code is an implementation detail (albeit a large one).
Because coding bootcamps and CS programs were churning out squillions of people who could type the code but had poor design and analytical skills, because there was a time where being able to implement Dijkstra on a whiteboard would get you 400k at a FAANG.
Bootcamp grads are basically obsolete now. The real skill has always been the ability to make good design decisions and that's still the case in the LLM era.
I beg to differ. I know for a fact that some companies started hiring people with LLM experience, whose only expertise is spending all the Copilot enterprise account tokens in their first week on the job and then whining that the lack of tokens was stifling their creativity.
Say what you may about boot camps, but at least the people getting hired could do things and understand what they are doing.
I mean, Juicero got the money instead of the slaps in the face it deserved. And there are thousands of startups like that. I think VCs are terrible at picking and a die would probably do a better job.
A raise is random noise, not signal, based on a confidence game within the VC ecosystem. LP capital call -> GP gamble based on (waves arms around), considering VC underperforms as an asset class [1] [2] even when accounting for the grand slam returns. It's 0DTE options gambling dressed up as skill and an art. But, you know [3] [4] [5], the lottery still pays out sometimes.
I think we’re falling into a trap of overestimating the value of incrementally directing it. The output is all coming from the same brain so what stops someone just getting lucky with a prompt and generation that one-shots the whole thing you spent time breaking down and thinking about. The code quality will be the same, and unless you’re directing it to the point where you may as well be coding the old way, the decision-making is the same too.
I basically fully agree with this. I am not sure how to handle the ramifications of this in my day-to-day work yet. But I find that even though the cost of writing code is immensely cheap, reviewing and validating that it works in certain codebases (like the multi-million-line monorepo I work in at my job) is extremely high. So one habit I have been forming is to think through, and improve, our testability, so that a change of a few hundred lines that modifies the DB really can be a couple of hours of work.
Also, I do want to note that these little "Here is how I see the world of SWE given current model capabilities and tooling" posts are MUCH appreciated, given how much you follow the landscape. When a major hype wave is happening and I feel like I am getting drowned on twitter, I tend to wonder "What would Simon say about this?"
> I find that even though the cost of writing code is immensely cheap, reviewing and validating that it works in certain codebases (like the multi-million-line monorepo I work in at my job) is extremely high.
That is my observation as well. Churning code is easy, but making sure the code is not total crap is a completely new challenge and concern.
It's not like code reviews didn't require work prior to LLMs. Far from it. It's just that the code is now generated in a completely different way, and in some cases with barely any oversight from vibecoders who are trying to punch way above their weight. So they generate these massive volumes of changes that fail in obvious and subtle ways, and the flow is relentless.
> What tremendously helps is asking the LLM to add a lot of explanations by adding comments to each and every line or function.
No, it doesn't. It's completely useless and unhelpful. These machine-generated comments are only realizations of the context that already outputted crap. Dumping volumes of this output adds more work for reviewers, who have to parse through it to figure out the mess presented by vibecoders who didn't even bother to check what they are generating.
The cost of code never lived in the typing — it lived in the intent, the constraints, and the reasoning that shaped it.
LLMs make the typing cheap, but they don’t make the reasoning cheap.
So the economics shift, but the bottleneck doesn’t disappear.
For most non-hobby projects, the cost of code was in breaking a working system (whether by a bona fide bug, or a change in some unspecified implicit assumption). That made changes to code incredibly expensive - often much more than the original implementation.
It sounds harsh, but over the lifetime of a project, 10-lines/person/day is often a high estimate of the number of lines produced. It’s not because humans type so slow - it is because after a while, it’s all about changing previously written lines in ways that don’t break things.
LLMs are much better at that than humans, if the constraints and tests are reasonably well specified.
The human in that case is not "so slow", but in the current state of things a human is slower than an LLM, as simple as that.
The difference comes in confidence that the solution works and can be maintained in the future, but in terms of purely making the decisions and applying the changes, an LLM is faster when it has all the required info available.
Because humans need to type with a keyboard, then click around with a mouse.
In that time the LLM has made a change, ran tests, committed, pushed, checked that the CI build failed, looked at the CI logs, fixed the issue and the PR is now passing.
It doesn't disappear, but it does ease up some instances of fighting configuration, documentation, syntax, or even comparing three approaches that are similar but whose full effects you don't know.
I think it's a very fun space, finally being able to empower many people who in the past would've been bottlenecked unless they were using very simple tools for their domain and had upskilled enough. Those two things will still be true, but work at the exploration layer and other layers now happens significantly faster.
Other problems like entropy/slop, security, system testing, lack of automation fundamentals arise but it's a good problem to start tackling.
I'm very focused on evals [1] because that is what allows me not to be the bottleneck for the economists I want to empower to code end to end. I'd like that mental shift to happen for anyone becoming a builder, so that non-traditional developers and developers by trade have a common language for product building [2]. Speaking to different audiences and combating hype that promises to do everything for you, versus the intent that's actually needed, is hard, but trying gets you quite far.
> LLMs make the typing cheap, but they don’t make the reasoning cheap.
LLMs lower the cost of copy/pasting code around, or troubleshooting issues using standard error messages.
Instead of going through Stack Overflow to find how to use a framework to do some specific thing, you prompt a model. You don't even need to know a thing about the language you are using to leverage a feedback loop.
LLMs lower the cost of a multitude of drudge work in developing software, such as having to read the docs to learn how a framework should be used to achieve a goal. You still need to know what you are doing, but you don't need to reinvent the wheel.
I'm going to shill my own writing here [1] but I think it addresses this post in a different way. Because we can now write code so much faster and quicker, everything downstream from that is just not ready for it. Right now we might have to slow down, but medium and long term we need to figure out how to build systems in a way that it can keep up with this increased influx of code.
> The challenge is to develop new personal and organizational habits that respond to the affordances and opportunities of agentic engineering.
I don't think it's the habits that need to change, it's everything. From how accountability works, to how code needs to be structured, to how languages should work. If we want to keep shipping at this speed, no stone can be left unturned.
The focus is on downstream, but is upstream ready for this speed up?
The linked blog post draws comparisons to the industrial revolution; however, in the industrial revolution the speedup caused innovation upstream, not downstream.
The first innovation was mechanical weaving. The bottleneck was then yarn. This was automated so the bottleneck became cotton production, which was then mechanised.
So perhaps the real bottleneck of being able to write code faster is upstream.
Can the requirements for what to build keep up with the pace of delivering it?
Nice to see you here (Just reached out on bluesky over sandboxing - gandolin). I follow your work and agree and am hoping that you and others who have well earned audiences based on awesome open source work, can help with the advocacy on mental shifts, not just for developers but also non Devs that become builders.
I'm very focused on their minimalistic building experience as a way to keep me and other traditional developers from being the bottleneck, and to empower them end to end.
I think AI evals [1] are a big part of that route and hope that different disciplines can finally have probable product design stories [2] instead of there being big gaps of understanding between them.
I don’t think we can expect all workers at all companies to just adopt a new way of working. That’s not how competition works.
If agentic AI is a good idea and if it increases productivity we should expect to see some startup blowing everyone out of the water. I think we should be seeing it now if it makes you say ten times more productive. A lot of startups have had a year of agentic AI now to help them beat their competitors.
We're already seeing eye-watering, blistering growth from the new hot applied AI startups and labs
Imo the wave of top down 'AI mandates' from incumbent companies is a direct result of the competitive pressure, although it probably won't work as well as the execs think it will
that being said even Dario claims a 5-20% speedup from coding agents, 10x productivity only exists in microcosm prototypes, or if someone was so unskilled oneshotting a localhost web app is a 10x for them
Only a personal anecdote, but the humans I know that have used it are all aware of how buggy it is. It feels like it was made in 2 weeks.
Which gets back to the outsourcing argument: it’s always been cheap to make buggy code. If we were able to solve this, outsourcing would have been ubiquitous. Maybe LLMs change the calculus here too?
That's certainly a good example of a tool developed quickly thanks to AI assistance.
But coding assistance tools must themselves be evaluated by what they produce. We won't see significant economic growth through using AI tools to build other AI tools recursively unless there are companies using these tools to make enough money to justify the whole stack.
I believe there are teams out there producing software that people are willing to pay for faster than they did before. But if we were on the verge of rapid economic growth, I would expect HN commenters to be able to rattle these off by the dozen.
AI has been a lifesaver for my low performing coworkers. They’re still heavily reliant on reviews, but their output is up. One of the lowest output guys I ever worked with is a massive LinkedIn LLM promoter.
Not sure how long it’ll last though. With the time I spend on reviews I could have done it myself, so if they don’t start learning…
OpenClaw is not going to be a thing in 6 months. The core idea might exist but that codebase is built on a house of cards and is being replicated in 10% of the code.
I don’t think anyone is arguing against code agents being good at prototypes, which is a great feat, but most SWE work is built on maintaining code over time.
But that only gets you to a philosophical argument about what "value" is. Many would argue that being able to get your thing into a Super Bowl commercial is extremely valuable. I definitely have never built anything that did.
It's very much imperfect, but the only consistently agreed upon and useful definition of "value" we have in the West is monetary value, and in that sense, we have at least a few major examples of AI generating value rapidly.
One of the most interesting aspects is when LLMs are cheap and small enough so that apps can ship with a builtin one so that it can adjust code for each user based on input/usage patterns.
The clear intent is to stop allowing regular people to be able to compute...anything. Instead, you'll be given a screen that only connects to $LLM_SERVER and the only interface will be voice/text in which you ask it to do things. It then does those things non-deterministically, and slower than they would be done right now. But at least you won't have control over how it works!
Whether or not the intent is as nefarious as you suggest, that type of UI is going to be a boon for a lot of people. Most people on the planet are incredibly computer illiterate.
If this could ever happen, there will be no point in GUI apps anymore, your AI assistant or what have you will just interact with everything on your behalf and/or present you with some kind of master interface for everything.
I don't see a bunch of small agents in the future, instead just one per device or user. Maybe there will be a fleeting moment for GUI/local apps to tie into some local, OS LLM library (or some kind of WebLLM spec) to leverage this local agent in your app.
>If this could ever happen, there will be no point in GUI apps anymore, your AI assistant or what have you will just interact with everything on your behalf and/or present you with some kind of master interface for everything.
sort of how the hammer is the most useful tool ever and all we have to do is to make every thing that needs doing look like a nail.
Agents will still have to communicate with each other, the communication protocols, how data is stored, presented and queried will be important for us to decide?
Will we stop using web browsers as we understand them today in the next few decades in favor of only interacting with agents? Maybe.
I've heard this referenced multiple times and I have yet to hear the value be clearly articulated.
Are you saying that every user would eventually be using a different app? Wouldn't it eventually get to the point that negates the need for the app developer anyways since you would eventually be unable to offer any kind of support, or are we just talking design changing while the actual functionality stays the same? How would something like this actually behave in reality?
These are valid points, taken to the extreme we will have apps that cannot be supported.
In short term, we already have SQL/reports being automated. Lovable etc is experimenting with generating user interfaces from prompts, soon we will have complete working apps from a prompt. Why not have one core that you can expand via a prompt?
I am currently studying and depending heavily on Anki, and it's been amazing to use Claude Code to add new functionality on the fly. It's a holy mess of inconsistent/broken UX but it so clearly gives me value over the core version. Sometimes it breaks, but CC can usually fix it within a prompt or two.
>but medium and long term we need to figure out how to build systems in a way that it can keep up with this increased influx of code.
Why? Why do we need to "write code so much faster and quicker" to the point we saturate systems downstream? I understand that we can, but just because we can doesn't mean we should.
But that's point of TFA, no? Now that writing code is no longer the bottleneck, the upstream and downstream processes have become the new bottlenecks, and we need to figure out how to widen them.
As I see it, the end goal for all of this is generating software at the speed of thought, or at least at the speed of speech. I want the digital butler to whom I could just say - "I'm not happy with the way things happened today, please change it so that from here on, it'll be like x" - and it'll just respond with "As you wish", and I'll have confidence that it knows me well enough and is capable enough to have actually implemented the best possible interpretation of what I asked for, and that the few miscommunications that do occur would be easy to fix.
We're obviously not close to that yet, but why shouldn't we build towards it?
If we want to continue to ship at that speed we will have to. I’m not sure if we should, but seemingly we are. And it causes a lot of problems right now downstream.
We were already rushing and churning products and code of inferior quality before AI (let's e.g. consider the sorry state of macOS and Windows in the past decade).
Using AI to ship more and more code faster, instead of to make code more mature, will make this worse.
Less code isn't as important as it used to be, because the cost of maintaining (simple) code has gone down as well.
With coding agent projects I find that investing in DRY doesn't really help very much. Needing to apply the same fix in two places is a waste of time as a human. An agent will spot both places with grep and update them almost as fast as if there was just one.
It's another case where my existing programming instincts appear to not hold as well as I would expect them to.
When you talk about maintaining code, do you mean having the LLM do it and you maintain a write-only codebase? Because if you're reading the code yourself and you have a bloated tangled codebase it would make things much harder right?
Is the goal basically a codebase where your interactions are mediated through an LLM?
I'm betting on it meaning the product quality going down - and technical debt increasing, which will be dealt with by more AI in a downward spiral. Meanwhile college CS majors won't ever bother learning the basics (as AI will handle their coursework, and even their hobby work). Then future AI will train on previous AI output, with the degradation that brings...
I was having this conversation at work, where if the promise of AI coding becomes true and we see it in delivery speed, we would need to significantly increase the throughput of all other aspects of the business.
Totally agree - that's what I was trying to get at with "organizational habits". The way we plan, organize and deliver software projects is going to radically change.
I'm not ready to write about how radically though because I don't know myself!
The linked article is worth reading alongside this one.
The thing I'd add from running agents in actual production (not demos, but workflows executing unattended for weeks): the hard part isn't code volume or token cost. It's state continuity.
Agents hallucinate their own history. Past ~50-60 turns in a long-running loop, even with large context windows, they start underweighting earlier information and re-solving already-solved problems. File-based memory with explicit retrieval ends up being more reliable than in-context stuffing - less elegant but more predictable across longer runs.
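To make "file-based memory with explicit retrieval" concrete, here is a minimal sketch rather than anything a particular framework ships. The `FileMemory` class, its method names, the JSON Lines layout, and the keyword-matching rule are all assumptions made for illustration: the agent appends notes to disk and later pulls back only the entries matching a query, instead of carrying the whole history in context.

```python
import json
from pathlib import Path
from typing import Dict, List


class FileMemory:
    """Hypothetical append-only note store, queried by keyword."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def remember(self, topic: str, note: str) -> None:
        # One JSON object per line, so writes are simple appends.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"topic": topic, "note": note}) + "\n")

    def recall(self, keyword: str, limit: int = 5) -> List[Dict[str, str]]:
        # Explicit retrieval: only matching entries go back into the prompt.
        if not self.path.exists():
            return []
        needle = keyword.lower()
        hits: List[Dict[str, str]] = []
        with self.path.open("r", encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                text = (entry["topic"] + " " + entry["note"]).lower()
                if needle in text:
                    hits.append(entry)
        return hits[-limit:]  # keep the most recent matches
```

A real setup would likely want better matching than a substring test, but the point stands either way: what the agent "knows" at turn 60 is whatever it explicitly read back from disk, not whatever survived in the context window.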
Second hard part: failure isolation. If an agent workflow errors at step 7 of 12, you want to resume from step 6, not restart from zero. Most frameworks treat this as an afterthought. Checkpoint-and-resume with idempotent steps is dramatically more operationally stable.
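A rough sketch of what checkpoint-and-resume can look like, under stated assumptions: `run_workflow`, the checkpoint file layout, and the convention that each step takes and returns a JSON-serializable dict are all hypothetical, not taken from any specific agent framework. Completed step names are persisted after every step, so a rerun skips them and picks up at the step that failed.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List, Tuple

# A step is a (name, function) pair; the function maps state -> new state.
Step = Tuple[str, Callable[[Dict], Dict]]


def load_checkpoint(path: str) -> Dict:
    """Return saved progress, or a fresh record if no checkpoint exists."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    return {"done": [], "state": {}}


def save_checkpoint(path: str, checkpoint: Dict) -> None:
    """Write to a temp file and rename, so a crash cannot leave a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(checkpoint, f)
    os.replace(tmp, path)


def run_workflow(steps: List[Step], path: str) -> Dict:
    checkpoint = load_checkpoint(path)
    for name, fn in steps:
        if name in checkpoint["done"]:
            continue  # finished in an earlier run, do not redo it
        checkpoint["state"] = fn(checkpoint["state"])  # may raise
        checkpoint["done"].append(name)
        save_checkpoint(path, checkpoint)  # persist after every step
    return checkpoint["state"]
```

The idempotency requirement is still on the step author: if a step crashes after its side effect but before the checkpoint is written, it will run again on resume, so it has to tolerate being repeated.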
Agree it's not just habits - the infrastructure mental model has to change too. You're not writing programs so much as engineering reliability scaffolding around code that gets regenerated anyway.
I don't agree that the code is cheap.
It doesn't require a pipeline of people to be trained and that is huge, but it's not cheap.
Tokens are expensive.
We don't know what the actual cost is yet.
We have startups, who aren't turning a profit, buying up all the capacity of the supply chain.
There are so many impacts here that we don't have the data on.
You completely ignored the post you're replying to.
To recap, the author disagrees that writing code is cheap, because we've collectively invested trillions of dollars and redirected entire supply chains into automating code generation. The externalities will be paid for generations to come by all of humanity; it's just not reflected in your Claude subscription.
GP is not totally ignoring the post he replied to: we have models that are basically 6 months behind closed SOTA models and that we can run in the cloud, and we know exactly how much these cost to run.
The cat is out of the bag: compute will keep getting cheaper, as it has for 60 years or so.
It's always been maintenance that's been the killer and GP is totally right about that.
And if we look at a company like Cloudflare who basically didn't have any serious outage for five years then had five serious outages in six months since they drank the AI kool-aid, we kinda have a first data point on how amazing AI is from a maintenance point of view.
We all know we're generating more lines of underperforming, insecure, probably buggy, code than ever before.
Maintaining it is becoming more costly. The increasing burden of review on FOSS maintainers is one example. AWS going down because an agent decided to re-write a piece of critical infrastructure is another. We are rapidly creating new kinds of liability.
Unlikely. FOSS is mostly driven by zero-cost maintenance, but AI tools need money to burn. So only a few FOSS projects will receive sponsored tools, and some will definitely refuse to use them for ideological reasons (for example, they could be considered a poison pill from a copyright perspective).
We kind of do? Local models (though not state of the art) set a floor on this.
Even if prices are subsidized now (they are) that doesn't mean they will be more expensive later. e.g. if there's some bubble deflation then hardware, electricity, and talent could all get cheaper.
I like using the analogy of 'living in a small apartment' when building systems with a small team. You need to choose carefully what furniture you can fit into your apartment, and that choice depends a lot on how you live your life. Do you want a large table to host friends, or a comfortable couch to fall asleep on in front of the TV? If you get both the space will probably be cluttered.
The same applies to a small software project - you need to choose what features you can fit. And while the cost of building is part of the consideration, I'd say most of it is about the cost of maintaining features, not only in code, but also in product coherence and other incidental 'costs' like documentation and user support.
Be careful of building too many features and ending up being overwhelmed by the maintenance, or worse, diluting the product's value to a point where you lose users.
Dollars to donuts that at some point someone is going to discover that senior engineers spend just as much time reviewing, fixing, and dealing with blowups caused by shitty AI-generated code produced by more junior coders as they did providing various forms of mentoring to said junior coders, except that those junior coders become better developers in the latter case, whereas the AI generates the same shitty results or, even worse, code of inconsistent quality.
Yeah, it’s odd watching the outsourcing debate play out again. The results are gonna be the same.
Which is a shame, cause I think LLMs have a lot more use for software dev than writing code. And that’s really what’s going to shift the industry - not just the part willing to cut on quality.
The interesting thing nobody's talking about here is that cheap code generation actually makes throwaway prototypes viable. Before, you'd agonize over architecture because rewriting was expensive. Now you can build three different approaches in a day and pick the one that works.
The real cost was never the code itself. It was the decision-making around what to build. That hasn't gotten cheaper at all.
I think the prototype thing is absolutely true but breaks down, like all prototypes, at the level of collaborating, sharing and evolving while handling entropy through simplicity, UNLESS you know what you're doing or the agent steers you with very opinionated tooling customized to your context. I'm thinking about empowering people to be builders, and less so about a software developer who can make the right tradeoffs.
Empowering people to work Tracer bullet style after they've selected their prototype of choice and thrown it away might be a powerful pattern that actually gets us into a nice collaborative space.
This feels to me like peak sfba mentality on par with "move fast and break things". Outside of trying to create a unicorn, is this really how people create things?
It seems to me that in order to obtain the ability to build things that other people like, you need to go through the process of creating things they won't. Like a painter needs to paint a bunch of crappy paintings to learn how to create a good painting. If you have the LLM create these throwaway prototypes, how will you even know when you come across a good idea and how will you be able to build it.
> It seems to me that in order to obtain the ability to build things that other people like, you need to go through the process of creating things they won't.
Okay, granted. What does that have to do with how the code is written? Do people generally care if a web app is running from nicely formatted JS or minified JS? Is a product manager not getting better at building things people like because they're not iterating on the code themselves?
Without agreeing or disagreeing with the premise, I think a relevant metaphor* here is that the painter can practice and iterate and go from creating crappy paintings to creating good paintings, without needing to make their own paint and canvas and brushes. If they're particular, they can have their assistant go to the supply shop and get just the right things they want, with increasing specificity as needed, but they don't need to manufacture them by hand.
* Like most metaphors, it's not perfect; please try to understand the intent.
But you need to actually be the one doing the iterating, you can't outsource it. The entire point to doing the iteration is the process, not the artefacts.
Hmm interesting, I didn't realise people were using it as a typing replacement instead of having it work agentically. Does that mean when you want to change a line of code somewhere, you just prompt the LLM to replace line 334 with your changes etc? So do you not use the LLM autonomously at all then? Sounds like it since you're still doing the iteration yourself.
It's like the allegory of the retired consultant's $5000 invoice (hitting the thing with a hammer: $5, knowing where to hit it: $4995).
Yeah, coding is cheaper now, but knowing what to code has always been the more expensive piece. I think AI will be able to help there eventually, but it's not as far along on that vector yet.
Possibly even more important than knowing where to hit it (what to code), is knowing where not to hit it (what not to code). Hitting the thing in the wrong place can lead to catastrophe. Making a code change you don't need can blow up production or paint your architecture into a corner.
AIs so far seem to prefer addition by addition, not addition by subtraction or addition by saying "are you sure?".
This doesn't mean that "code is cheap" is bad. Rather, it means that soon our primary role will be to guide AIs to produce a high proportion of "code that was cheap", while being able to quickly distinguish, prevent, and reject "cheap code".
I am pushing for this judicious Eval [1] Driven Development where tech/non tech users use intent when coding to minimally design some known aspects and try to build simply and using common standards across the team (that should be in the context of the agent) to produce the minimal amount of human readable and clean code that would do the job. The more the building blocks they work with are simple, easy to validate and rely on the proper unit tests, integration tests, data tests the more chance that things can be "one shotted".
One huge barrier is fighting entropy. You should be wary of prototypes which create false expectations and don't help product evolution whereas tracer bullets [2] might be better if you want to quickly show something and adjust.
Testing and testability are concepts that aren't intuitive or easy until you develop a feel for them so we should be preaching feeling that pain and moving slowly and with intent and working minimally [3] when you actually want to share or maintain your coding artifact. There should be no difference between judicious human and computer code. Don't suddenly start putting What instead of why in comments or repeating everything.
Helping non tech people become builders or sharers is a challenge beyond "vibe coding" and the agent skills [4] space is fascinating for that. Like most things AI (LLM), UX matters more than almost anything else.
1. The time spent to think and iteratively understand what you want to build
2. The time spent to spell out how you want to build it
The cost for #2 is nearly zero now. The cost for #1 too is slashed substantially because instead of thinking in abstract terms or writing tests you can build a version of the thing and then ground your reasoning in that implementation and iterate until you attain the right functionality.
However, once that thing is complex enough you still need to burn time on identifying the boundaries of the various components and their interplay. There is no gain from building "a browser" and then iterating on the whole thing until it becomes "the browser". You'll be up against combinatorial complexity. You can perhaps deal with that complexity if you have a way to validate every tiny detail, which some are doing very well in porting software for example.
Code you can’t just throw away is a liability because you have to keep supporting it / servicing it. Claude Code and friends also change that part of the cost equation:
You might not get gcc/llvm level optimization from a newly built compiler - but if you had a home-built one, which took $15,000/month engineer to support (for years!) you can now get a new one for $20,000 every 3 months, for a 50% cost saving, each time changing your requirements (which you couldn’t do before).
Code used to be a liability, like a car or an apartment for the average person. Now it’s a liability, like a car or apartment for Bill Gates.
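For anyone who wants to check the numbers in that comparison, here is the arithmetic spelled out. The dollar figures come straight from the comment; the variable names and the assumption that one engineer-month maps to one month of support are mine.

```python
ENGINEER_PER_MONTH = 15_000   # cost of keeping the home-built compiler supported
REBUILD_COST = 20_000         # cost of regenerating it from scratch
REBUILD_INTERVAL_MONTHS = 3   # how often it gets regenerated

# Supporting the old compiler over one rebuild interval.
maintain_per_cycle = ENGINEER_PER_MONTH * REBUILD_INTERVAL_MONTHS

# Fraction saved by rebuilding instead of maintaining.
saving = 1 - REBUILD_COST / maintain_per_cycle

print(f"maintain: ${maintain_per_cycle:,} per cycle")  # maintain: $45,000 per cycle
print(f"saving: {saving:.0%}")                          # saving: 56%
```

So the quoted 50% is, if anything, a slightly conservative round number for these inputs. It says nothing about review or validation time on the regenerated compiler, which the comparison leaves out.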
I would normally agree, but I think the "code is a liability" quote assumes that humans are reading and modifying the code. If AI tools are also reading and modifying their own code, is that still true?
You have to be able to express the change you want in natural language. This is not always possible due to ambiguity.
Next to that, eventually you run into the same issue that we humans run into: no more context windows.
But we as software engineers have learned to abstract away components, to reduce the cognitive load when writing code. E.g., when you write a file you don't deal with syscalls anymore.
This is different with AI. It doesn't abstract away things, which means you requesting a change might make the AI make a LOT of changes to the same pattern, but this can cause behavior to change in ways you haven't anticipated, haven't tested, or haven't seen yet.
And because it's so much code to review, it doesn't get the same scrutiny.
No I said what I meant. Code is a liability, though to your point, code you don't understand is an even bigger liability.
Even if I understand all my code, when I go to make changes, if it's 100k lines of code vs 2k lines of code, it's going to take more time and be more error prone.
Even if I understand all my code, the intern I hired last week won't and I'll have to teach it to them.
Even if I understand all my code, I don't remember everything all the time and I can forget about an edge case handled in thousands of lines of code.
Even if I understand all my code, I don't understand my co-workers' code, and they don't understand mine.
Even if I understand all my code, I might not want to work for this company the rest of my life.
I've worked at so many places in my career that "not understanding code" is not an excuse. It is a skill to be able to read and follow code and get up to speed quickly, even on shit codebases. But "AI" generated code makes that so much more difficult, and the "AI" isn't going to walk you through it, and neither will your new coworkers. We aren't in a race to the bottom with "AI", we're in a speedrun to the bottom, and I don't think it's going to end up going too well for whatever developers are left in a few years.
I'm very curious to see how this will affect the job market. All the recent CS grads, all the coding bootcamp graduates - where will they end up? And then there are the mid-level/senior engineers who will have to change how they work to oversee the hordes of AI agents that all the hype evangelists are pushing on the industry.
This is the thing I don't really get. I enjoy tinkering with AI and seeing what it comes up with to solve problems. But when I need to write working code that does anything beyond simple CRUD, it's faster for me to write the code than it is to (1) describe the problem in English with sufficient detail and working theory, then (2) check the AI's work, understand what it's written, de-duplicate and dry it out.
I guess if I skipped step 2, it might save time, but it would be completely irresponsible to put it into production, so that's not an option in any world where I maintain code quality and the trust of my clients.
Plus, having AI code mixed into my projects also leaves me with an uneasy sense of being less able to diagnose future bugs. Yes, I still know where everything is, but I don't know it as well as if I'd written it myself. So I find myself going back and re-reviewing AI-written code, re-familiarizing myself with it, in order to be sure I still have a full handle on everything.
To the extent that it may save me time as an engineer, I don't mind using it. But the degree to which the evangelists can peddle it to the management of a company as a replacement for human coders seems highly correlated with whether that company's management understood the value of safe code in the first place. If they didn't, then their infrastructure may have already been garbage, but it will now become increasingly unusable garbage. At some point, I think there will be a backlash when the results in reality can no longer be denied, and engineers who can come in and clean up the mess will be in high demand. But maybe that's just wishful thinking.
I'm in the same boat. Too often for me it feels easier to write code that I want to see by myself instead of opening some AI tool where I would have to describe what I need in plain English. After which I'd still have to review the code to make sure it does do what was requested.
Perhaps you have to be a certain type of person, or work in a peculiar company where the second step (review) can be skipped as long as the AI says the code does what was requested. Hardcore YOLO life.
the top % of talent is still extremely hard to get, perhaps moreso
saw an article recently where every sector is seeing a reduction in IT/devs except for tech and ai companies
if your company is in a sector where eng is a cost-center and the product is not directly tied to your engineers / your company is pushing for efficiency it's an employer's market
> - It’s simple and minimal - it does only what’s needed, in a way that both humans and machines can understand now and maintain in the future.
But do the humans need to actually understand the code? A "yes" means the bottleneck is understanding (code review, code inspection). A "no" means you can go faster, but at some risk.
OpenAI is implying that code may no longer be human readable in some circumstances.
> The resulting code does not always match human stylistic preferences, and that’s okay. As long as the output is correct, maintainable, and legible *to future agent runs*, it meets the bar.
> But do the humans need to actually understand the code? A "yes" means the bottleneck is understanding (code review, code inspection). A "no" means you can go faster, but at some risk.
I always thought of things like code reviews as semi pseudo-science in most cases. I've sat through meetings where developers obviously understood the code that they were reviewing, but didn't understand anything about the system as a whole. If your perfect function pulls in 800 external dependencies that you trust only because it's too much of a hassle to go through them, I'd argue that you don't understand your code at all. I don't think it matters and I certainly don't think I'm better than anyone else in this regard. I only know how things work when it matters.
If anything, I think AI will increase human understanding without the need to write computer unfriendly code like "Clean Code", "DRY" and so on.
True, however as these products have been designed and coded by LLMs from the ground up in 2025+, they are generally using modern (typed even) languages, the latest version of third party libraries, usually have documentation of sorts... sometimes they even have test suites.
As such, they can often be improved as easily as one can prompt, which is much faster and easier than before. Notably in the FOSS world where one had to ask the maintainer, get ghosted for a year and have them go back with a "close: wontfix (too tedious)".
I've tried very earnestly to use opus 4.5 to get rid of some backlog tasks that were too tedious to do manually. It turns out that they're still extremely tedious because I have to make every single non trivial decision for the model, unless I don't care one iota about the long term sustainability of the code base. And by long term, I mean more than a week. They're good for saving keystrokes or doing fuzzy searches for me. "Design"? No, that is an anthropomorphism.
Better languages do not necessarily mean better architectural decisions, or even better performance, unless the humans pressure for that and burn tokens on that. With no engineer in the room, more technical issues will be left unnoticed and unaddressed.
Compare it to visual arts. With guidance from an artist, AI tools can help create wonderful pictures. Without such guidance, or at least expert prompting, a typical one-shot image from Gemini is... well, at best recognizable as such.
> Code has always been expensive. Producing a few hundred lines of clean, tested code takes most software developers a full day or more. Many of our engineering habits, at both the macro and micro level, are built around this core constraint.
> At the macro level we spend a great deal of time designing, estimating and planning out projects, to ensure that our expensive coding time is spent as efficiently as possible. Product feature ideas are evaluated in terms of how much value they can provide in exchange for that time - a feature needs to earn its development costs many times over to be worthwhile!
Maybe I am spending my life working at the wrong corporations (not FAANG/direct tech related), but that doesn't match my experience at all. The `design` phase was reduced to something more akin to a sketch in order to iterate on products faster. Obviously, as you create and debate more iterations, the time spent writing code increases (as you build more stuff that is discarded). What is that discarded time used for? Well, it's the way new people learn the system/business domain. It's how we build the knowledge to support the product in production. It's how the business learns what the limits/features are, why they are there, what they can offer, what they must ask the regulators etc.
Realistically, if you only count the time required to develop the feature as described, it is basically nothing. Most of the time is spent on edge cases that are not written down anywhere. You start coding something and 15 minutes in you discover 5-10 cases not handled in any way. You ask business people, they ask other people. You start checking regulation docs/examples, etc. Maybe there are no docs available, so you just push a version and test whether your assumptions are correct (most likely not... so go again and again). At the end of this process everyone gains a better understanding of how the business works, why, and what you can further improve.
Can AI speedrun this? Sure, but then how will all the people around gain the knowledge required to advance things? We learn through trial and error. Previously this was a shared experience for everyone in the business, now it becomes more and more a solitary experience of just speaking with AI.
I think there's a good parallel with AI images - generating pictures has gotten ridiculously easy and simple, yet producing art that is meaningful or wanted by anyone has gotten only mildly easier.
Despite the explosion of AI art, the amount of meaningful art in the world has increased only by a tiny amount.
But the amount of pleasing, useful art has gone up 1000x. If I had a blog, I would now have access to art that would be a perfect fit for my words, whereas 5 years ago I would have had to make do with my own (talent-lacking) doodles.
Would some people prefer no art/illustration to AI generated art? Sure. But even more would prefer no art to my doodles.
Putting text into a file is cheaper than before. Everything else remains the same cost in a well designed project, rather than a vibe coded one where you just tell the LLM to "make a todo list app"
the real shift is that throwaway code became viable for production workflows. i used to spend days writing reusable utility libraries. now i generate single-purpose scripts, run them once, and delete them. the economics of code reuse have fundamentally changed when generation is cheaper than comprehension.
the downstream bottleneck is real though. built a video production pipeline recently - generating the python glue code took maybe 10% of total project time. the other 90% was testing edge cases, tuning ffmpeg parameters, and figuring out why API responses were subtly different between providers. cheap code just means you hit the hard problems faster.
This. All the LLM code I have seen so far has had so much abstraction that it’s hard to maintain.
It is testable for sure, but the complexity cost is so high.
Something else that is not addressed in the article is working within enterprise environments, where new technologies are adopted at a much slower pace than in startups. LLMs come with strange and complicated patterns to solve these problems, which is understandable, as I would imagine all the training and tuning followed structured frameworks.
If coding is so cheap, I hope people start vibing Rust. If the machine can do the work, please have it output in a performant language. I do not need more JS/Python utilities that require embarrassing amounts of RAM.
It's already happening, particularly with "Ladybird Browser adopts Rust" [0] being at the top of HN today. It's now feasible to quickly iterate on a system's design with a dynamic language like Python, and then, once you're happy with the design, have AI rewrite it into something like Rust or Zig. I can even foresee a future where we intentionally maintain two parallel implementations, with machine-defined translation between them, such that we're able to do massive changes on the higher-level implementation in minutes, and then once we finish iterating, have it run overnight to reimplement (or rewrite) it in the performant language. A bit like the difference between an unoptimized debugging version of a project and the highly optimized one, but on steroids.
Yes writing code is easier than ever, my problem is that understanding it still costs the same if not more [0]. I get that when people use agents, understanding code is not the concern because it's not exactly catering to people, it's for other agents. But when maintaining applications that have been running for years now, I still believe we need to fully understand code before we commit.
> Delivering new code has dropped in price to almost free... but delivering good code remains significantly more expensive than that.
Writing code was always cheap to start with. Just outsource it to the lowest bidder. Writing good code remains as expensive.
The same when programmers from different languages are considered. How many Scala/Haskell engineers can I find compared to Java is not the question. It is about how many good engineers you can hire. With Haskell that pool is definitely denser.
One of the biggest challenges right now in my opinion is disambiguating what processes _were_ necessary from those that are _still_ necessary and useful in light of exactly this.
But that's the thing - changing course is suddenly no longer hard. We've already reached a state where I can have AI generate a decent set of tests from an existing codebase (or better yet, I'd already have them ahead of time), and to then do a massive refactoring or even a full rewrite while I get a good night's sleep. There is nothing "has always been" about this.
Writing code was always cheap! You could outsource for inconsequential amounts of money and get massive amounts of code in return. Yet, the vast majority of companies do not do so. Because coding is not the hard part of being a software engineer / programmer.
That's like saying that photography killed painting because it saved you from having to draw things. Drawing is basically free now, I just take the photo. But the number of painters (and by that I mean, artists who paint) is dramatically higher today than in 1800. Artists didn't die because of mechanical reproduction, they flourished, because that wasn't the problem they were solving.
The second chapter is more of a classic pattern, it describes how saying "Use red/green TDD" is a shortcut for kicking the coding agent into test-first development mode which tends to get really good results: https://simonwillison.net/guides/agentic-engineering-pattern...
I believe the ChatGPT code has a bug, in that it accepts three spaces or tabs before a code fence, while the Google Markdown spec says up to three spaces, and does not allow a tab there.
I also see that the tests generated by ChatGPT are far too few for the code features implemented. They cannot be the result of actual red/green TDD, where the test comes before the feature is added.
For example, 1) the code allows "~~~" but only tests for "```", 2) there are no tests for when len(fence) < fence_len nor for when len(fence) > fence_len, and 3) there are no tests for leading spaces.
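To make those three gaps concrete, here is a minimal sketch of fence matching plus the kind of checks that would cover them. It assumes CommonMark-style rules: an opening fence is three or more backticks or tildes, indented by at most three spaces and never by a tab, and a closing fence uses the same character and is at least as long as the opener. The names parse_fence_open, is_fence_close and run_checks are hypothetical; none of this is taken from the ChatGPT-generated code being discussed.

```python
import re
from typing import Optional, Tuple

# Assumed rule: 0-3 leading spaces (no tabs), then 3+ backticks or 3+ tildes.
_FENCE_OPEN = re.compile(r"^( {0,3})(`{3,}|~{3,})(.*)$")


def parse_fence_open(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (fence_char, fence_len, info_string), or None if not an opener."""
    m = _FENCE_OPEN.match(line.rstrip("\n"))
    if m is None:
        return None
    fence, info = m.group(2), m.group(3).strip()
    # A backtick fence may not have backticks in its info string.
    if fence[0] == "`" and "`" in info:
        return None
    return fence[0], len(fence), info


def is_fence_close(line: str, fence_char: str, fence_len: int) -> bool:
    """True if the line closes a fence opened with fence_char * fence_len."""
    stripped = line.rstrip(" \t\n")
    indent = len(stripped) - len(stripped.lstrip(" "))
    body = stripped[indent:]
    return indent <= 3 and len(body) >= fence_len and set(body) == {fence_char}


def run_checks() -> None:
    # 1) tilde fences, not only backtick fences
    assert parse_fence_open("~~~python") == ("~", 3, "python")
    # 2) closing fence shorter / longer than the opener
    assert not is_fence_close("``", "`", 3)
    assert is_fence_close("````", "`", 3)
    # 3) leading spaces: three are fine, four or a tab are not
    assert parse_fence_open("   ```") == ("`", 3, "")
    assert parse_fence_open("    ```") is None
    assert parse_fence_open("\t```") is None


run_checks()
```

In a real red/green loop each of those asserts would be written, and seen to fail, before the branch that satisfies it exists.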
There's also duplicate code. The function _strip_closing_hashes is used once, in the line:
    text = _strip_closing_hashes(m.group("text")).strip()
The function is:
    def _strip_closing_hashes(s: str) -> str:
        s = s.rstrip()
        # remove trailing " ###" style closers
        s = re.sub(r"[ \t]+#+\s*$", "", s).rstrip()
        return s
The ".rstrip()" is unneeded as the ".strip()" does both lstrip and rstrip.
I think that rstrip() should be replaced with a strip(), the function renamed to "_get_inline_content", and used as "text = _get_inline_content(m.group("text"))".
The Google spec also says "A sequence of # characters with anything but spaces following it is not a closing sequence, but counts as part of the contents of the heading:" so is it really correct to use "\s*" in that regex, instead of "[ ]*"? And does it matter, since the input was rstrip'ped already?
So perhaps:
    def _get_inline_content(s: str) -> str:
        s = s.rstrip(" ")  # remove trailing spaces
        s = s.rstrip("#")  # remove "#" style closers
        return s.strip()  # remove leading and trailing whitespace
would be more correct, readable, and maintainable?
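One edge case worth checking with the rstrip-based version: a bare rstrip("#") also strips hashes that are attached to the text, so a heading like "C#" would come out as "C". A minimal sketch that only treats a "#" run as a closer when it is preceded by a space follows. The name get_inline_content and the regex are my assumptions, spaces only (not tabs) are handled, following the sentence quoted above, and backslash-escaped "#" characters are ignored.

```python
import re

# Assumption: a closing sequence is a run of "#" that is preceded by at least
# one space (or makes up the whole text) and is followed only by spaces.
_CLOSING_SEQUENCE = re.compile(r"(?:^|[ ]+)#+[ ]*$")


def get_inline_content(text: str) -> str:
    """Drop an optional closing "#" run, then trim surrounding whitespace."""
    return _CLOSING_SEQUENCE.sub("", text).strip()


assert get_inline_content("foo ##") == "foo"       # ordinary closer
assert get_inline_content("foo ###   ") == "foo"   # closer followed by spaces
assert get_inline_content("foo#") == "foo#"        # no space before "#": kept
assert get_inline_content("C#") == "C#"            # same rule, real-world case
assert get_inline_content("foo ## b") == "foo ## b"  # text after "#": kept
```

Whether this is more readable than three chained strips is debatable, but each assert doubles as a test that could have been written first.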
I see a lot of comments downplaying the significance of this, but outside of very large and/or mission-critical infrastructure roles, your "taste and experience" is going to become cheap, just like code.
White collar workers and artists currently still hold the notion that they bring "taste" to the experience, but eventually AI will come for that as well. It may or may not be LLMs, and I'm not sure about timelines.
Even as we speak, when I read through HN comments, I always ask: "Did an AI write this?" or "Did someone use AI to help write their response?" This goes beyond HN: I now ask the same question about any photo, drawing or music I come across. But eventually nobody will care, because we are climbing out of the uncanny valley very quickly.
Exactly. I feel like the latest models are basically a couple MCP servers away from just doing the whole thing. You just say, here’s what I want the system to do, and it’ll just do everything. No knowledge required. You need only know how to ask.
I wonder if they will make code fast in the way that McDonalds made food fast. For many business needs, knowing when a project will finish would be equally or even more valuable than knowing that it will contain more code or employ fewer programmers.
The rule of good fast cheap still applies the same as always, but business leaders consistently choose to ignore this reality and insist upon fast and cheap without acknowledging that it will come at the cost of good.
What's worse is that these decisions are usually made on a short-term, quarterly basis. They never consider that slowing down today might save us time and money in the long term. Better code means fewer bugs and faster bug fixes. LLMs only exacerbate the business leader's worst tendencies.
The Airbus A320neo can already take off, ascend, cruise, descend, and land all by autopilot. It can even download your flight plan from the airline's servers.
But you still need the pilots because the system can only handle the happy path. As soon as there's any blockade or strong weather change, the autopilot will just turn off. And then you need the pilots.
I would say software engineering with AI is similar: The AI can handle CRUD just fine. But once things get messy, you need someone who can actually think.
To fly a plane with 300+ passengers you still only need 2-3 pilots. That has remained consistent with the invention of autopilot. While we might still need a few human engineer experts, maybe we only need a few for small to medium sized companies? That may not eliminate the career for the top % but it effectively does for the vast majority of engineers.
We do automate lots about flying, not just take-off and landing. It's why a 4-engine aircraft in the 1960s required flight crews of 6-8 people just to fly the thing when they can be routinely flown with 2-3 today.
"Writing" code is cheap but this just scratches the surface. It's a completely different paradigm. All forms of digital generation are cheap and on the verge of being fully automated, which comes with self-recursion loops.
The “typing” part used to dominate the cost structure, so we optimized around it (architecture upfront, DRY everywhere, extreme caution). Now the expensive part is clarity of intent and orchestrating the iteration: deciding what to build next, what to cut, what to validate, what to trust, and where to add guardrails (tests, invariants, observability).
If your requirements are fuzzy, the agent will happily generate 5k lines of very confident nonsense. If your domain model + constraints are crisp, results can be shockingly good.
So the scarce skill isn’t “can you write good code?”
It’s “can you interrogate reality well enough to produce a precise model—and then continuously steer the agent against that model?”
the interesting shift is where the time goes. before: thinking + typing. now: thinking + reviewing. the thinking part didn't get cheaper -- domain knowledge, edge cases, integration constraints -- none of that is free. what changed is you now review AI output instead of type your own, which is genuinely faster but not as different as it sounds. the hard part was always understanding what to build, not the keystrokes.
> the interesting shift is where the time goes. before: thinking + typing. now: thinking + reviewing.
It's widely accepted that you can't learn just by reading, you have to write. So only thinking and reviewing is a great way to lose all the business domain knowledge.
> the thinking part didn't get cheaper -- domain knowledge, edge cases, integration constraints -- none of that is free. what changed is you now review AI output instead of type your own, which is genuinely faster but not as different as it sounds
It's very different - you lose business domain knowledge if you're only reading.
Sometimes it feels what we are seeing is Code becoming just like any other "asset" in the globalised economy: cheap - but not quality; just like the priors of clothing (disintegrating after a few washes), consumer electronics (cheap materials), furniture (Instagram-able but utterly impracticable), etc: all made for quick turn-overs to rake in more profit and generate more waste but none made to last long.
Put another way: “reading code costs the same as it always did”, arguably more when you consider that the cost of reading goes down as the ability to read goes up. In other words, if you wrote the thing, it is likely you can read it fast. But reading someone else's stuff is harder.
Scathophagidae are flies that really like eating shit. We know how to cheaply produce massive amounts of shit.
But that doesn't mean we solved world hunger. In the same way, AIs churning out millions of lines of code doesn't mean we have solved software engineering.
Actually, I would argue that high LOCs are a liability, not an asset. We have found a very fast way of turning money into slop, which will then need maintenance and delay every future release. Unless, of course, you have an expert code reviewer who checks the AI output. But in that case, the productivity gains will be max 10%. Because thoroughly reviewing code is almost the same amount of work as writing it.
For everyone who is responding to the "Writing code is cheap now" heading without reading the article, I'd encourage you to scroll down to the "Good code still has a cost" section.
I chose these words because I don't think good code is nearly as expensive with coding agents as it was without them.
You still have to actively work to get good code, but it takes so much less time when you have a coding agent who can do the fine-grained edits on your behalf.
I firmly believe that agentic engineering should produce better code. If you are moving faster but getting worse results it's worth stopping and examining if there are processes you could fix.
I’m using a combination of 100s of megabytes of Ghidra-decompiled Delphi DLLs and millions of lines of decompiled C# code to do this reverse engineering. I can’t imagine even trying such a large project before LLMs, so while a good implementation is still taking a lot of time, it’s definitely a lot cheaper than before.
[1] I saw your red/green TDD article/book chapter and I don’t think you go far enough. Since we have agents, you can generalize red/green development to a lot of things that would be impractical to implement in tests. For example I have agents analyze binary diffs of the file format to figure out where my implementation is incorrect without being bogged down by irrelevant details like the order or encoding of parameters. This guides the agent loop instead of tests.
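As a rough illustration of what a binary-diff signal for an agent loop could look like, here is a minimal sketch that reports the byte ranges where a generated file differs from a reference file. It is not the commenter's tooling: diff_ranges is a hypothetical name, and a real setup would presumably normalise things like parameter order before comparing.

```python
from typing import List, Tuple


def diff_ranges(expected: bytes, actual: bytes) -> List[Tuple[int, int]]:
    """Return half-open (start, end) offset ranges where the inputs differ."""
    ranges: List[Tuple[int, int]] = []
    start = None
    common = min(len(expected), len(actual))
    for i in range(common):
        if expected[i] != actual[i]:
            if start is None:
                start = i  # a new mismatching run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended just before offset i
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(expected) != len(actual):
        # Everything past the shorter input counts as one differing tail.
        longest = max(len(expected), len(actual))
        if ranges and ranges[-1][1] == common:
            ranges[-1] = (ranges[-1][0], longest)
        else:
            ranges.append((common, longest))
    return ranges


assert diff_ranges(b"abcdef", b"abXYef") == [(2, 4)]
assert diff_ranges(b"abc", b"abcde") == [(3, 5)]
assert diff_ranges(b"same", b"same") == []
```

An empty list plays the role of "green"; anything else gives the agent concrete offsets to investigate instead of a bare pass/fail.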
Which is nuance that will get overlooked or waved away by upper management who see the cost of hiring developers, know that developers "write code", and can compare the developer salary with a Claude/Codex/whatever subscription. If the correction comes, it will be late and at the expense of rank and file, as usual. (And don't be naive: if an LLM subscription can let you employ fewer developers, that subscription plus offshore developers will enable even more cost saving. The name of the game is cost saving, and has been for a long time.)
The reason you pay attention to details is because complexity compounds and the cheapest cleanup is when you write something, not when it breaks.
This last part is still not fully fleshed out.
For now. Is there any reason to not expect things to improve further?
Regardless, a lot of code is cheap now and building products is fun regardless, but I doubt this will translate into more than very short-term benefits. When you lower the bar you get 10x more stuff, 10x more noise, etc. You lower it more you get 100x and so on.
With python I can write a simple debugging UI server with a few lines.
There are frameworks that allow me to complete certain tasks in hours.
You do not need to program everything from scratch.
The more code, the faster everything gets, since the job is mostly done.
We are accelerating, but we still work 9 to 5 jobs.
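For what it's worth, the "debugging UI server in a few lines" claim is easy to sketch with the standard library alone. This is a minimal example under my own assumptions: STATE, DebugHandler and start_debug_server are hypothetical names, and the "UI" is just a JSON dump of an in-memory dict.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory state that the debug page exposes.
STATE = {"requests_seen": 0}


class DebugHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Count the request, then return the whole state as JSON.
        STATE["requests_seen"] += 1
        body = json.dumps(STATE, indent=2).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def start_debug_server(port: int = 0) -> HTTPServer:
    """Serve on 127.0.0.1 in a daemon thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), DebugHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling start_debug_server(8000) (the port number is arbitrary) and opening http://127.0.0.1:8000/ in a browser shows the current state; server.shutdown() stops it.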
The former: 1) understand the problem, 2) solve the problem.
The latter: 1) understand the problem, 2) solve the problem, 3) understand how somebody or something else understood & solved the problem, 4) diff those two, 5) plan a transition from that solution to this solution, 6) implement that transition (ideally without unplanned downtime and/or catastrophic loss of data).
This is also why I’m not a fan of code reviews. Code review is basically steps 1–4 from the second approach, plus having to verbally explain the diff, every time.
That's specious reasoning. Code reviews are a safeguard against cowboy coding, and a tool to enforce shared code ownership. You might believe you know better than most of your team members, but odds are a fresh pair of eyes can easily catch issues you snuck in your code that you couldn't catch due to things like PR tunnel vision.
And if your PR is sound, you certainly don't have a problem explaining what you did and why you did it.
[0] Reviews are OK if I enjoy working with the person whose work I’m reviewing and I feel like I’m helping them grow.
Do current LLM-based agents generate code which is easy to change? My gut feeling is a no at the moment. Until they do, I'd argue code generated by agents is only good for prototypes. Once you can ask your agent to change a feature and be 100% sure it won't break other features, then you don't care what the code looks like.
But LLMs are both really good at writing code _and_ reading code. However, they're not great at knowing when to stop - either finishing early and leaving stuff broken, over-engineering and adding in stuff that's not needed or deciding it's too hard and just removing stuff it deems unimportant.
I've found a TDD approach (with not just unit tests but high-level end-to-end behaviour-driven tests) works really well with them. I give them a high-level feature specification (remember Gherkin specifications?) and tell it to make that pass (with unit tests for any intermediate code it writes), make sure it hasn't broken anything (by running the other high-level tests) then, finally, refactor. I've also just started telling it to generate screenshots for each step in the feature, so I can quickly evaluate the UI flow (inspired by Simon Willison's Rodney tool).
Now I don't actually need to care if the code is easy to read or easy to change - because the LLM handles the details. I just need to make sure that when it says "I have implemented Feature X" that the steps it has written for that feature actually do what is expected and the UI fits the user's needs.
They do. I am no longer writing code, everything I commit is 100% generated using an agent.
And it produces code depending on the code already in my code-base and based on my instructions, which tell it about clean-code, good-practices.
If you don't get maintainable code from an LLM it's for this reason: Garbage in, garbage out.
That bar is unreasonably high.
Right now, if I ask a senior engineer to change a feature in a mature codebase, I only have perhaps 70% certainty they won't break other features. Tests help, but only so far.
With auto generated code which almost no one will check or debug by hand, you want at least compiler level exactitude. Then changing "the code" is as easy as asking your code generator for new things. If people have to debug its output, then it does not help in making maintainable software unless it also generates "good" code.
https://news.ycombinator.com/item?id=44522772
I'd also argue that we should be pushing towards tracer bullets as a development concept, and less towards prototypes that are nice but meant to be thrown away, since people might not actually throw them away.
The clean room auto porting, after a messy exploratory prototyping session would be a nice pattern, nonetheless.
Every human can string words together, but there's a world of difference between words that raise $100M and words that get you slapped in the face.
The raw material was always cheap. The skill is turning it into something useful. Agentic engineering is just the latest version of that. The new skill is mastering the craft of directing cheap inputs toward valuable outcomes.
Strongly agree with this. It took me a while to realize that "agentic engineering" wasn't about writing software; it was about being able to very quickly iterate on bespoke tools for solving a very specific problem you have.
However, as soon as you start unblocking yourself from the real problem you want to solve, the agentic engineering part is no longer interesting. It's great to be solving a problem and then realize you could improve it very quickly with a quick request to an agent, but you should largely be focused on solving the problem.
Yet I see so many people talking about running multiple agents and just building something without much effort spent using that thing, as though the agentic code itself is where the value lies. I suspect this is a hangover from decades where software was valuable (we still have plenty of highly valued, unprofitable software companies as a testament to this).
I'm reminded a bit of Alan Watts' famous quote in regards to psychedelics:
> If you get the message, hang up the phone.
If you're really leveraging AI to do something unique and potentially quite disruptive, very quickly the "AI" part should become fairly uninteresting and not the focus of your attention.
Another commenter said it makes the easy part easy and the hard part harder, which resonates with me at the moment.
I am pretty excited by being able to jump deep into real problems without code being the biggest bottleneck. I love coding but I love solving problems more, and coding for fun is very different to coding for outcomes.
There's also the question of the true cost of all the hardware, electricity, and potential output that's being tossed onto the pyres. We aren't getting the real Cortana from the books / games; we're getting GIR trained on the corpus of fallible human code, prompted by fallible humans.
It seems to depend a lot on the industry and niche you're in. Working at an agency, I get experience across many different projects and industries, and sometimes you are just at the edge of the AI's training and it can get very unhelpful. Given that many if not most companies are working on proprietary code for domain-specific problems, that isn't all that surprising either.
That's because the threat is now not other businesses, but your own users who decide to vibe-code their own "Claw" product instead of using your company's vibeslop, so there are no buyers for your single-week product. All these new harness developers are engaging in resume-driven development to save their own asses. The only ones that are not naked when the tide recedes are the ones that are able to jump to the next layer of abstraction on the infinite staircase, until the next tide comes five seconds later.
As an example: I was discussing one of my most promising projects with a friend, and together we realized we could potentially use these tools to build a two-person agency with no need to hire anyone, ever. If this were to work, it could theoretically make nice revenue and it shouldn't show up in any metric anywhere.
Additionally, I've heard of countless teams cancelling their contracts with outsourced engineers because cheap but bad coders in India are worse than an LLM and still cost more. I'm not sure if there's a number around this activity, but again, these types of changes don't show up in the usual places.
My current belief is not that AI will replace traditional software engineering, but that it will replace a good chunk of the entire model of software.
You're not following your last line to its logical conclusion regarding your own prospects: no one is going to buy the vibeslop your two person agency is selling because they'd rather create and maintain their own vibeslop instead of dealing with yours.
If you follow some of your thoughts to their logical conclusion you'll realize the parent is right: there will be limited productivity that ends up fueling the economy when nobody is buying each other's vibeslop.
I absolutely agree that it's not logical to think "oh we'll sell our AI stuff", that's the old model (which is just a variation on SaaS). I suspect a lot of HNers can't imagine a "product" that isn't code, but that's not at all what I'm describing.
The products that most people on HN have traditionally built are used by other companies to make money by allowing those processes to be scaled. AI, in many new cases, eliminates the need for a 'software' middle man. The case I'm describing is "I know how to make money doing X if only I could scale it up without hiring people" and my offering is "I can scale it up without hiring people".
This is increasingly where I think the future of work is headed, and it's more than fine if you aren't convinced.
Faster than what? You will be faster than your previous self, just like all of your competitors. Where’s the net gain here? Even if you somehow managed to capture more value for yourself, you’ve stopped providing value to 5-10x that many employees who are no longer employed.
When costs approach zero on a large scale, margins do not increase. Low costs = you’re not paying anyone = your competitors aren’t paying anyone = your customers no longer have money = your revenue follows your costs straight to zero.
Companies that provide physical services can’t scale without hiring. A one-man “crew” isn’t putting a roof on a data center.
I want to be wrong. Tell me why you think any of this is wrong.
Except production GDP, the standard measure of economic activity.
What are the 48 other people doing now? Presumably some other economic activity.
potentially...if this were to work...theoretically
shouldn't show up? I would worry that something with so many variables wouldn't show up.
Yeah, but the actual productivity gains that the internet and software tools introduced have had diminishing returns after a while.
Like, are people more productive today when they use Outlook and Slack than they were 20 years ago when using IBM Lotus Notes and IBM Sametime? I'm not. Are people more productive with the Excel of today than with Excel 2003/2007? I'm not. Are Windows 11 and macOS Tahoe making people more productive than Windows 7 and Snow Leopard? Not me. Are the IDEs of today offering that much more of a productivity boost than what Visual Studio, CodeWarrior and Borland Delphi did back in the day? Don't think so.
To me it seems that at least on the productivity side, we've mostly been reinventing the wheel "but in Rust/Electron" for the last 15 or so years, and the biggest productivity gains came IMHO from increased compute power due to semiconductor advancement, so that the same tasks finished faster today than 20 years ago, but not that the SW or the internet got so much more capable since then.
At least, in my own experience.
So then the question is, is there anything other than feels to say productivity has or has not gone up? What would we accept as actual evidence one way or another? Commits-per-day is similarly not a good measure either. Jira tickets and t-shirt sizes? We don't have a good measure, so while Show HNs per weekend is equally dumb, it's also equally good in the bag of lies, damn lies, and statistics.
As a specialization? Sure. But the ditch diggers moved since to machine operators, handymen and the like.
In the past there were sysadmins. Do we have less software engineers since sysadmins ceased to be a thing?
All of them? What if they liked digging ditches?
> In the past there were sysadmins. Do we have less software engineers since sysadmins ceased to be a thing?
Software Engineers were never sysadmins in the past, you’re thinking DevOps maybe?
Bootcamp grads are basically obsolete now. The real skill has always been the ability to make good design decisions and that's still the case in the LLM era.
For now maybe yes but the goal is totally removing the human from the decision loop regarding technical stuff.
I beg to differ. I know for a fact that some companies started hiring people with LLM experience, whose only expertise is spending all the Copilot enterprise account tokens in their first week on the job and then proceeding to whine that the lack of tokens is stifling their creativity.
Say what you may about boot camps, but at least the people getting hired could do things and understand what they are doing.
TLDR: A raise is not a robust signal in this regard.
[1] https://news.ycombinator.com/item?id=7260137
[2] https://www.linkedin.com/posts/peterjameswalker_most-venture...
[3] https://en.wikipedia.org/wiki/There%27s_a_sucker_born_every_...
[4] https://en.wikipedia.org/wiki/Overconfidence_effect
[5] https://en.wikipedia.org/wiki/Survivorship_bias
Also, I do want to note that these little "Here is how I see the world of SWE given current model capabilities and tooling" posts are MUCH appreciated, given how much you follow the landscape. When a major hype wave is happening and I feel like I am getting drowned on twitter, I tend to wonder "What would Simon say about this?"
That is my observation as well. Churning code is easy, but making sure the code is not total crap is a completely new challenge and concern.
It's not like code reviews didn't require work prior to LLMs. Far from it. It's just that the code is now generated in a completely different way, and in some cases with barely any oversight from vibecoders who are trying to punch way above their weight. So they generate these massive volumes of changes that fail in obvious and subtle ways, and the flow is relentless.
You can remove those comments afterwards if you feel they are too much, but it helps a lot with reviewing.
More a trick than a silver bullet but it's nice.
No, it doesn't. It's completely useless and unhelpful. These machine-generated comments are only realizations of the context that already outputted crap. Dumping volumes of this output adds more work for reviewers, who have to parse through it to figure out the mess presented by vibecoders who didn't even bother to check what they are generating.
It sounds harsh, but over the lifetime of a project, 10-lines/person/day is often a high estimate of the number of lines produced. It’s not because humans type so slow - it is because after a while, it’s all about changing previously written lines in ways that don’t break things.
LLMs are much better at that than humans, if the constraints and tests are reasonably well specified.
if they are, then why would a human be so slow? You're not comparing the same situation.
The difference comes in confidence that the solution works and can be maintained in the future, but in terms of purely making the decisions and applying the changes, an LLM is faster when it has all the required info available.
In that time the LLM has made a change, ran tests, committed, pushed, checked that the CI build failed, looked at the CI logs, fixed the issue and the PR is now passing.
I think it's a very fun space, finally being able to empower many people who in the past would've been bottlenecked unless they were using very simple tools for their domain and upskilled enough. Those 2 things will still be true, but the speed at which some things can happen at the exploration and other layers has seen a significant speedup.
Other problems like entropy/slop, security, system testing, lack of automation fundamentals arise but it's a good problem to start tackling.
I'm very focused on evals [1] because that is what allows me not to be the bottleneck with economists who I want to empower to code end to end, and I'd like that mental shift to happen for anyone becoming a builder, so that non-traditional developers and developers by trade have a common language for product building [2]. That part of speaking to different audiences and combating hype that promises to do everything for you vs. the intent that's actually needed is hard, but trying gets you to advance quite a lot.
[1] https://alexhans.github.io/posts/series/evals/measure-first-...
[2] https://ai-evals.io
LLMs lower the cost of copy/pasting code around, or troubleshooting issues using standard error messages.
Instead of going through Stack Overflow to find how to use a framework to do some specific thing, you prompt a model. You don't even need to know a thing about the language you are using to leverage a feedback loop.
LLMs lower the cost of a multitude of drudge work in developing software, such as having to read the docs to learn how a framework should be used to achieve a goal. You still need to know what you are doing, but you don't need to reinvent the wheel.
> The challenge is to develop new personal and organizational habits that respond to the affordances and opportunities of agentic engineering.
I don't think it's the habits that need to change, it's everything. From how accountability works, to how code needs to be structured, to how languages should work. If we want to keep shipping at this speed, no stone can be left unturned.
[1]: https://lucumr.pocoo.org/2026/2/13/the-final-bottleneck/
The linked blog post draws comparisons to the industrial revolution however in the industrial revolution the speed up caused innovation upstream not downstream.
The first innovation was mechanical weaving. The bottleneck was then yarn. This was automated so the bottleneck became cotton production, which was then mechanised.
So perhaps the real bottleneck of being able to write code faster is upstream.
Can requirements of what to build keep up with pace to deliver it?
I'm very focused on their minimalistic building experience as a way to make me and other traditional developers not the bottleneck, and to empower them end to end.
I think AI evals [1] are a big part of that route and hope that different disciplines can finally have probable product design stories [2] instead of there being big gaps of understanding between them.
[1] https://alexhans.github.io/posts/series/evals/measure-first-...
[2] https://ai-evals.io
If agentic AI is a good idea and if it increases productivity we should expect to see some startup blowing everyone out of the water. I think we should be seeing it now if it makes you say ten times more productive. A lot of startups have had a year of agentic AI now to help them beat their competitors.
Imo the wave of top down 'AI mandates' from incumbent companies is a direct result of the competitive pressure, although it probably won't work as well as the execs think it will
that being said even Dario claims a 5-20% speedup from coding agents, 10x productivity only exists in microcosm prototypes, or if someone was so unskilled oneshotting a localhost web app is a 10x for them
Could you give us a few examples?
Which gets back to the outsourcing argument: it’s always been cheap to make buggy code. If we were able to solve this, outsourcing would have been ubiquitous. Maybe LLMs change the calculus here too?
But coding assistance tools must themselves be evaluated by what they produce. We won't see significant economic growth through using AI tools to build other AI tools recursively unless there are companies using these tools to make enough money to justify the whole stack.
I believe there are teams out there producing software that people are willing to pay for faster than they did before. But if we were on the verge of rapid economic growth, I would expect HN commenters to be able to rattle these off by the dozen.
ant 10xing ARR, oai
harvey legora sierra decagon 11labs glean(ish) base10(infra) modal(infra) gamma mercor(ish) parloa cognition
regulated industries giving these companies 7/8-fig contracts less than 2 years from incorporation
Not sure how long it’ll last though. With the time I spend on reviews I could have done it myself, so if they don’t start learning…
Then? Your job is still to review their code. If they are your coworker, you can not fire them.
(Whether you think OpenClaw is good software is kind of beside the point.)
I don’t think anyone is arguing against code agents being good at prototypes, which is a great feat, but most SWE work is built on maintaining code over time.
It's very much imperfect, but the only consistently agreed upon and useful definition of "value" we have in the West is monetary value, and in that sense, we have at least a few major examples of AI generating value rapidly.
In any case, I agree with the grandparent post about the distinction between being successful and good.
I don't see a bunch of small agents in the future, instead just one per device or user. Maybe there will be a fleeting moment for GUI/local apps to tie into some local, OS LLM library (or some kind of WebLLM spec) to leverage this local agent in your app.
sort of how the hammer is the most useful tool ever and all we have to do is to make every thing that needs doing look like a nail.
Will we stop using web browsers as we understand them today in the next few decades in favor of only interacting with agents? Maybe.
These are valid points, taken to the extreme we will have apps that cannot be supported.
In short term, we already have SQL/reports being automated. Lovable etc is experimenting with generating user interfaces from prompts, soon we will have complete working apps from a prompt. Why not have one core that you can expand via a prompt?
I am currently studying and depending heavily on Anki, its been amazing to use Claude Code to add new functionality on the fly. Its a holy mess of inconsistent/broken UX but it so clearly gives me value over the core version. Sometimes it breaks, but CC can usually fix it within a prompt or two.
Me too, and I see this as _incredibly_ wasteful.
Why? Why do we need to "write code so much faster and quicker" to the point we saturate systems downstream? I understand that we can, but just because we can, doesn't mean we should.
But that's the point of TFA, no? Now that writing code is no longer the bottleneck, the upstream and downstream processes have become the new bottlenecks, and we need to figure out how to widen them.
As I see it, the end goal for all of this is generating software at the speed of thought, or at least at the speed of speech. I want the digital butler to whom I could just say - "I'm not happy with the way things happened today, please change it so that from here on, it'll be like x" - and it'll just respond with "As you wish", and I'll have confidence that it knows me well enough and is capable enough to have actually implemented the best possible interpretation of what I asked for, and that the few miscommunications that do occur would be easy to fix.
We're obviously not close to that yet, but why shouldn't we build towards it?
I think it’s contestable that writing the code was ever the main bottleneck.
> As I see it, the end goal for all of this is generating software at the speed of thought, or at least at the speed of speech.
The question is what distinguishes that from having AGI, and if the answer is “nothing”, then that will change the whole game entirely again.
Using AI to ship more and more code faster, instead of to make code more mature, will make this worse.
With coding agent projects I find that investing in DRY doesn't really help very much. Needing to apply the same fix in two places is a waste of time as a human. An agent will spot both places with grep and update them almost as fast as if there was just one.
It's another case where my existing programming instincts appear to not hold as well as I would expect them to.
Is the goal basically a codebase where your interactions are mediated through an LLM?
I'm not ready to write about how radically though because I don't know myself!
Do we? Spewing features like explosive diarrhea is not something I want.
The thing I'd add from running agents in actual production (not demos, but workflows executing unattended for weeks): the hard part isn't code volume or token cost. It's state continuity.
Agents hallucinate their own history. Past ~50-60 turns in a long-running loop, even with large context windows, they start underweighting earlier information and re-solving already-solved problems. File-based memory with explicit retrieval ends up being more reliable than in-context stuffing - less elegant but more predictable across longer runs.
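To make the file-based memory idea concrete, here is a minimal sketch. It assumes an append-only JSONL file keyed by a free-form topic string; the `FileMemory` class, its method names, and the example notes are all hypothetical, not taken from any particular agent framework.

```python
import json
import tempfile
from pathlib import Path


class FileMemory:
    """Hypothetical append-only JSONL notes that an agent loop queries explicitly."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def remember(self, topic: str, note: str) -> None:
        # Append one JSON record per line so earlier notes are never rewritten.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"topic": topic, "note": note}) + "\n")

    def recall(self, topic: str) -> list[str]:
        # Explicit retrieval: return only the notes filed under the requested topic.
        if not self.path.exists():
            return []
        notes = []
        with self.path.open(encoding="utf-8") as f:
            for line in f:
                record = json.loads(line)
                if record["topic"] == topic:
                    notes.append(record["note"])
        return notes


memory = FileMemory(Path(tempfile.mkdtemp()) / "memory.jsonl")
memory.remember("auth-bug", "Already fixed by rotating the token; do not re-solve.")
memory.remember("deploy", "Staging deploy passed on the second attempt.")
solved = memory.recall("auth-bug")
```

The point of the design is that the loop asks for what it needs at each turn instead of relying on the model to weight turn 3 correctly at turn 60.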
Second hard part: failure isolation. If an agent workflow errors at step 7 of 12, you want to resume from step 6, not restart from zero. Most frameworks treat this as an afterthought. Checkpoint-and-resume with idempotent steps is dramatically more operationally stable.
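A rough sketch of checkpoint-and-resume, under the assumption that progress is stored as a single step index in a JSON file. `run_workflow`, `make_step`, and the simulated failure are illustrative names only. Note that this version re-runs the step that failed, resuming from the checkpoint written after step 6.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_workflow(steps: list[Callable[[], None]], checkpoint: Path) -> None:
    """Run steps in order, skipping any that a previous run already completed."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else 0
    for index in range(done, len(steps)):
        # Steps should be idempotent: a crash between the step and the
        # checkpoint write below would cause the step to run again.
        steps[index]()
        checkpoint.write_text(json.dumps(index + 1))  # record progress after success


log: list[int] = []
fail_once = {"armed": True}


def make_step(n: int) -> Callable[[], None]:
    def step() -> None:
        if n == 7 and fail_once["armed"]:
            fail_once["armed"] = False
            raise RuntimeError("simulated failure at step 7")
        log.append(n)
    return step


steps = [make_step(n) for n in range(1, 13)]
checkpoint = Path(tempfile.mkdtemp()) / "checkpoint.json"

try:
    run_workflow(steps, checkpoint)
except RuntimeError:
    pass  # first run dies at step 7; steps 1-6 are already checkpointed

run_workflow(steps, checkpoint)  # second run picks up at step 7, not step 1
```

After the second call, `log` holds each of the 12 steps exactly once, which is the property you want when the steps have side effects.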
Agree it's not just habits - the infrastructure mental model has to change too. You're not writing programs so much as engineering reliability scaffolding around code that gets regenerated anyway.
Tokens are expensive. We don't know what the actual cost is yet. We have startups, who aren't turning a profit, buying up all the capacity of the supply chain. There are so many impacts here that we don't have the data on.
Code is still liability but it's undeniable that going from thought to running code is very cheap today.
To recap, the author disagrees that writing code is cheap, because we've collectively invested trillions of dollars and redirected entire supply chains into automating code generation. The externalities will be paid for generations to come by all of humanity; it's just not reflected in your Claude subscription.
The cat is out of the bag: compute shall keep getting cheaper as it's always been since 60 years or something.
It's always been maintenance that's been the killer and GP is totally right about that.
And if we look at a company like Cloudflare who basically didn't have any serious outage for five years then had five serious outages in six months since they drank the AI kool-aid, we kinda have a first data point on how amazing AI is from a maintenance point of view.
We all know we're generating more lines of underperforming, insecure, probably buggy, code than ever before.
We're in for a wild ride.
We kind of do? Local models (though not state of the art) set a floor on this.
Even if prices are subsidized now (they are) that doesn't mean they will be more expensive later. e.g. if there's some bubble deflation then hardware, electricity, and talent could all get cheaper.
The same applies to a small software project - you need to choose what features you can fit. And while the cost of building is part of the consideration, I'd say most of it is about the cost of maintaining features, not only in code, but also in product coherence and other incidental 'costs' like documentation and user support.
Be careful of building too many features and ending up being overwhelmed by the maintenance, or worse, diluting the product's value to a point where you lose users.
Writing good software is still expensive.
It's going to take everybody a while to figure that out (just like with outsourcing)
Which is a shame, cause I think LLMs have a lot more use for software dev than writing code. And that’s really what’s going to shift the industry - not just the part willing to cut on quality.
The real cost was never the code itself. It was the decision-making around what to build. That hasn't gotten cheaper at all.
Empowering people to work Tracer bullet style after they've selected their prototype of choice and thrown it away might be a powerful pattern that actually gets us into a nice collaborative space.
It seems to me that in order to obtain the ability to build things that other people like, you need to go through the process of creating things they won't. Like a painter needs to paint a bunch of crappy paintings to learn how to create a good painting. If you have the LLM create these throwaway prototypes, how will you even know when you come across a good idea and how will you be able to build it.
Okay, granted. What does that have to do with how the code is written? Do people generally care if a web app is running from nicely formatted JS or minified JS? Is a product manager not getting better at building things people like because they're not iterating on the code themselves?
Without agreeing or disagreeing with the premise, I think a relevant metaphor* here is that the painter can practice and iterate and go from creating crappy paintings to creating good paintings, without needing to make their own paint and canvas and brushes. If they're particular, they can have their assistant go to the supply shop and get just the right things they want, with increasing specificity as needed, but they don't need to manufacture them by hand.
* Like most metaphors, it's not perfect; please try to understand the intent.
The cost of iterating (with software) dropped by a few orders of magnitude in the last few months.
Yeah, coding is cheaper now, but knowing what to code has always been the more expensive piece. I think AI will be able to help there eventually, but it's not as far along on that vector yet.
AIs so far seem to prefer addition by addition, not addition by subtraction or addition by saying "are you sure?".
This doesn't mean that "code is cheap" is bad. Rather, it means that soon our primary role will be to guide AIs to produce a high proportion of "code that was cheap", while being able to quickly distinguish, prevent, and reject "cheap code".
One huge barrier is fighting entropy. You should be wary of prototypes which create false expectations and don't help product evolution whereas tracer bullets [2] might be better if you want to quickly show something and adjust.
Testing and testability are concepts that aren't intuitive or easy until you develop a feel for them so we should be preaching feeling that pain and moving slowly and with intent and working minimally [3] when you actually want to share or maintain your coding artifact. There should be no difference between judicious human and computer code. Don't suddenly start putting What instead of why in comments or repeating everything.
Helping non tech people become builders or sharers is a challenge beyond "vibe coding" and the agent skills [4] space is fascinating for that. Like most things AI (LLM), UX matters more than almost anything else.
[1] https://ai-evals.io
[2] concept from the Pragmatic Programmer, https://www.aihero.dev/tracer-bullets
[3] https://alexhans.github.io/posts/series/evals/measure-first-...
[4] https://alexhans.github.io/posts/series/evals/building-agent...
1. The time spent to think and iteratively understand what you want to build 2. The time spent to spell out how you want to build it
The cost for #2 is nearly zero now. The cost for #1 too is slashed substantially because instead of thinking in abstract terms or writing tests you can build a version of the thing and then ground your reasoning in that implementation and iterate until you attain the right functionality.
However, once that thing is complex enough you still need to burn time on identifying the boundaries of the various components and their interplay. There is no gain from building "a browser" and then iterating on the whole thing until it becomes "the browser". You'll be up against combinatorial complexity. You can perhaps deal with that complexity if you have a way to validate every tiny detail, which some are doing very well in porting software for example.
You might not get gcc/llvm level optimization from a newly built compiler - but if you had a home-built one, which took a $15,000/month engineer to support (for years!) you can now get a new one for $20,000 every 3 months, for a roughly 50% cost saving, each time changing your requirements (which you couldn’t do before).
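For what it's worth, the arithmetic behind that saving can be checked in a few lines, using only the two figures from the comment (the variable names are mine):

```python
monthly_engineer_cost = 15_000  # dollars per month to support the home-built compiler
rebuild_cost = 20_000           # dollars for a freshly generated compiler
months_per_rebuild = 3

old_cost = monthly_engineer_cost * months_per_rebuild  # 45,000 per quarter
saving = 1 - rebuild_cost / old_cost                   # fraction saved per quarter
saving_percent = round(saving * 100, 1)
```

That works out to about 55.6% per quarter, so the quoted 50% is, if anything, slightly conservative.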
Code used to be a liability, like a car or an apartment for the average person. Now it’s a liability, like a car or apartment for Bill Gates.
Next to that, eventually you run into the same issue that we humans run into: no more context windows.
But we as software engineers have learned to abstract away components, to reduce the cognitive load when writing code. E.g., when you write a file you don't deal with syscalls anymore.
This is different with AI. It doesn't abstract away things, which means you requesting a change might make the AI make a LOT of changes to the same pattern, but this can cause behavior to change in ways you haven't anticipated, haven't tested, or haven't seen yet.
And because it's so much code to review, it doesn't get the same scrutiny.
Then "AI" code is even more of a liability.
But please correct me if I'm wrong.
Even if I understand all my code, when I go to make changes, if it's 100k lines of code vs 2k lines of code, it's going to take more time and be more error prone.
Even if I understand all my code, the intern I hired last week won't and I'll have to teach it to them.
Even if I understand all my code, I don't remember everything all the time and I can forget about an edge case handled in thousands of lines of code.
Even if I understand all my code, I don't understand my co-workers' code, and they don't understand mine.
Even if I understand all my code, I might not want to work for this company the rest of my life.
Not an employee market, that's for sure.
This is the thing I don't really get. I enjoy tinkering with AI and seeing what it comes up with to solve problems. But when I need to write working code that does anything beyond simple CRUD, it's faster for me to write the code than it is to (1) describe the problem in English with sufficient detail and working theory, then (2) check the AI's work, understand what it's written, de-duplicate and dry it out.
I guess if I skipped step 2, it might save time, but it would be completely irresponsible to put it into production, so that's not an option in any world where I maintain code quality and the trust of my clients.
Plus, having AI code mixed into my projects also leaves me with an uneasy sense of being less able to diagnose future bugs. Yes, I still know where everything is, but I don't know it as well as if I'd written it myself. So I find myself going back and re-reviewing AI-written code, re-familiarizing myself with it, in order to be sure I still have a full handle on everything.
To the extent that it may save me time as an engineer, I don't mind using it. But the degree to which the evangelists can peddle it to the management of a company as a replacement for human coders seems highly correlated with whether that company's management understood the value of safe code in the first place. If they didn't, then their infrastructure may have already been garbage, but it will now become increasingly unusable garbage. At some point, I think there will be a backlash when the results in reality can no longer be denied, and engineers who can come in and clean up the mess will be in high demand. But maybe that's just wishful thinking.
Perhaps you have to be a certain type of person or work in a peculiar company where the second step (review) can be ignored as long as the AI says that it works. Hardcore YOLO life.
saw an article recently where every sector is seeing a reduction in IT/devs except for tech and ai companies
if your company is in a sector where eng is a cost-center and the product is not directly tied to your engineers / your company is pushing for efficiency it's an employer's market
> [...]
> - It’s simple and minimal - it does only what’s needed, in a way that both humans and machines can understand now and maintain in the future.
But do the humans need to actually understand the code? A "yes" means the bottleneck is understanding (code review, code inspection). A "no" means you can go faster, but at some risk.
> The resulting code does not always match human stylistic preferences, and that’s okay. As long as the output is correct, maintainable, and legible *to future agent runs*, it meets the bar.
https://openai.com/index/harness-engineering/
I always thought of things like code reviews as semi pseudo-science in most cases. I've sat through meetings where developers obviously understand the code that they are reviewing, but where they didn't understand anything about the system as a whole. If your perfect function pulls on 800 external dependencies that you trust, and you trust them only because it's too much of a hassle to go through them, I'd argue that in this situation you don't understand your code at all. I don't think it matters and I certainly don't think I'm better than anyone else in this regard. I only know how things work when it matters.
If anything, I think AI will increase human understanding without the need to write computer unfriendly code like "Clean Code", "DRY" and so on.
How?
As such, they can often be improved as easily as one can prompt, which is much faster and easier than before. Notably in the FOSS world where one had to ask the maintainer, get ghosted for a year and have them go back with a "close: wontfix (too tedious)".
Compare it to visual arts. With guidance from an artist, AI tools can help create wonderful pictures. Without such guidance, or at least expert prompting, a typical one-shot image from Gemini is... well, at best recognizable as such.
Owning code is getting more and more expensive.
SWEs sacrificed their jobs so that SREs could have unlimited job security.
> At the macro level we spend a great deal of time designing, estimating and planning out projects, to ensure that our expensive coding time is spent as efficiently as possible. Product feature ideas are evaluated in terms of how much value they can provide in exchange for that time - a feature needs to earn its development costs many times over to be worthwhile!
Maybe I am spending my life working at the wrong corporations (not FAANG/direct tech related), but that doesn't match my experience at all. The `design` phase was reduced to something more akin to a sketch in order to iterate on products faster. Obviously, now that you create and debate over more iterations, the time spent writing code has increased (as you build more stuff that is discarded). What is that discarded time used for? Well, it's the way new people learn the system/business domain. It's how we build the knowledge to support the product in production. It's how the business learns what the limits/features are, why they are there, what they can offer, what they must ask the regulators, etc.
Realistically, if you only count the time required to develop the feature as described, it is basically nothing. Most of the time is spent on edge cases that are not written anywhere. You start coding something and 15m in you discover 5-10 cases not handled in any way. You ask business people, they ask other people. You start checking regulation docs/examples, etc. Maybe there are no docs available, so you just push a version and test if your assumptions are correct (most likely not... so go again and again). At the end of this process everyone gains a better understanding of how the business works, why, and what you can further improve.
Can AI speedrun this? Sure, but then how will all the people around gain the knowledge required to advance things? We learn through trial and error. Previously this was a shared experience for everyone in the business, now it becomes more and more a solitary experience of just speaking with AI.
Despite the explosion of AI art, the amount of meaningful art in the world has increased only by a tiny amount.
Would some people prefer no art/illustration to AI generated art? Sure. But even more would prefer no art to my doodles.
Thus, "Code" is a liability; Producing excess liabilities 'cheaply' is still a loss.
You only ever want to have just enough code to accomplish the task at hand.
LLMs may help you get to just enough faster, but you'll only know that you are there after doing the second 90%.
the downstream bottleneck is real though. built a video production pipeline recently - generating the python glue code took maybe 10% of total project time. the other 90% was testing edge cases, tuning ffmpeg parameters, and figuring out why API responses were subtly different between providers. cheap code just means you hit the hard problems faster.
This. All LLM code I saw so far was lots of abstraction to the point that it’s hard to maintain.
It is testable for sure, but the complications cost is so high.
Something else that is not addressed in the article is working within an enterprise env, where new technologies are adopted at a much slower pace compared to startups. LLMs come with strange and complicated patterns to solve these problems, which is understandable, as I would imagine all training and tuning followed structured frameworks.
When it’s trained on enough APL/K code, you’ll get minimal abstraction.
Turned it into a Stripe revenue dashboard and notifier.
Even bought a couple more, flashed them, and gave to my cofounders, complete with AI written (personally tested, though) setup instructions!
[0] https://news.ycombinator.com/item?id=47120899
[0]: https://idiallo.com/blog/writing-code-is-easy-reading-is-har...
> Delivering new code has dropped in price to almost free... but delivering good code remains significantly more expensive than that.
Writing code was always cheap to start with. Just outsource it to the lowest bidder. Writing good code remains as expensive.
The same when programmers from different languages are considered. How many Scala/Haskell engineers can I find compared to Java is not the question. It is about how many good engineers you can hire. With Haskell that pool is definitely denser.
That's like saying that photography killed painting because it saved you from having to draw things. Drawing is basically free now, I just take the photo. But the number of painters (and by that I mean, artists who paint) is dramatically higher today than in 1800. Artists didn't die because of mechanical reproduction, they flourished, because that wasn't the problem they were solving.
The second chapter is more of a classic pattern, it describes how saying "Use red/green TDD" is a shortcut for kicking the coding agent into test-first development mode which tends to get really good results: https://simonwillison.net/guides/agentic-engineering-pattern...
I also see that the tests generated by ChatGPT are far too few for the code features implemented. That cannot be the result of actual red/green TDD where the test comes before the feature is added.
For examples, 1) the code allows "~~~" but only tests for "```", 2) there are no tests when len(fence) < fence_len nor when len(fence) > fence_len, and 3) there are no tests for leading spaces.
There's also duplicate code. The function _strip_closing_hashes is used once, in the line:
The function is:
The ".rstrip()" is unneeded as the ".strip()" does both lstrip and rstrip. I think that rstrip() should be replaced with a strip(), the function renamed to "_get_inline_content", and used as "text = _get_inline_content(m.group("text"))".
Also, the Google spec also says "A sequence of # characters with anything but spaces following it is not a closing sequence, but counts as part of the contents of the heading:" so is it really correct to use "\s*" in that regex, instead of "[ ]*"? And does it matter, since the input was rstrip'ped already?
So perhaps:
would be more correct, readable, and maintainable?
Currently, white collar workers and artists still hold the notion that they bring "taste" to the experience, but eventually AI will come for that as well; it may or may not be an LLM, and I'm not sure about timelines.
Even as we speak, when I read through HN comments, I always ask: "Did an AI write this?" or "Did someone use AI to help write their response?" This goes beyond HN: for any photo, drawing, or music I encounter now, I ask the same question, but eventually nobody will care because we are climbing out of the uncanny valley very quickly.
What's worse, is that these decisions are usually made on a short-term, quarterly basis. They never consider that slowing down today might save us time and money in the long-term. Better code means less bugs and faster bug-fixes. LLMs only exacerbate the business leader's worst tendencies.
We have autopilot, and I'm sure if we tried we could automate takeoff and landing of commercial flights.
But we will keep pilots on planes long after they are needed.
But you still need the pilots because the system can only handle the happy path. As soon as there's any blockade or strong weather change, the autopilot will just turn off. And then you need the pilots.
I would say software engineering with AI is similar: The AI can handle CRUD just fine. But once things get messy, you need someone who can actually think.
Automated intelligence is now cheap....
The real bottleneck isn’t writing (or even reviewing) code anymore. It’s:
1. extracting knowledge from domain experts
2. building a coherent mental model of the domain
3. making product decisions under ambiguity / tradeoffs
4. turning that into clear, testable requirements and steering the loop as reality pushes back
The workflow is shifting to:
Understand domain => Draft PRD/spec (LLM helps) => Prompt agent to implement => Evaluate against intent + constraints => Refine (requirements + tests + code) => Repeat
The “typing” part used to dominate the cost structure, so we optimized around it (architecture upfront, DRY everywhere, extreme caution). Now the expensive part is clarity of intent and orchestrating the iteration: deciding what to build next, what to cut, what to validate, what to trust, and where to add guardrails (tests, invariants, observability).
If your requirements are fuzzy, the agent will happily generate 5k lines of very confident nonsense. If your domain model + constraints are crisp, results can be shockingly good.
So the scarce skill isn’t “can you write good code?” It’s “can you interrogate reality well enough to produce a precise model—and then continuously steer the agent against that model?”
It's widely accepted that you can't learn just by reading; you have to write. So only thinking and reviewing is a great way to lose all the business domain knowledge.
> the thinking part didn't get cheaper -- domain knowledge, edge cases, integration constraints -- none of that is free. what changed is you now review AI output instead of type your own, which is genuinely faster but not as different as it sounds
It's very different - you lose business domain knowledge if you're only reading.
But that doesn't mean we solved world hunger. In the same way, AIs churning out millions of lines of code doesn't mean we have solved software engineering.
Actually, I would argue that high LOCs are a liability, not an asset. We have found a very fast way of turning money into slop, which will then need maintenance and delay every future release. Unless, of course, you have an expert code reviewer who checks the AI output. But in that case, the productivity gains will be 10% at most, because thoroughly reviewing code is almost the same amount of work as writing it.
Code is cheap. Show me the talk
https://news.ycombinator.com/item?id=46823485
And LLMs aren’t half as good at maintaining code as they are at generating it in the first place. At least not yet.