AI is incredibly dangerous because it can do the simple things very well, which prevents new programmers from learning the simple things ("Oh, I'll just have AI generate it") which then prevents them from learning the middlin' and harder and meta things at a visceral level.
I'm a CS teacher, so this is where I see a huge danger right now and I'm explicit with my students about it: you HAVE to write the code. You CAN'T let the machines write the code. Yes, they can write the code: you are a student, the code isn't hard yet. But you HAVE to write the code.
It’s like weightlifting: sure you can use a forklift to do it, but if the goal is to build up your own strength, using the forklift isn’t going to get you there.
This is the ultimate problem with AI in academia. We all inherently know that “no pain no gain” is true for physical tasks, but the same is true for learning. Struggling through the new concepts is essentially the point of it, not just the end result.
Of course this becomes a different thing outside of learning: in a workplace context, delivering results matters more. But even then you still need someone who does the high-level thinking.
I think this is a pretty solid analogy but I look at the metaphor this way - people used to get strong naturally because they had to do physical labor. Because we invented things like the forklift, we had to invent things like weightlifting to get strong instead. You can still get strong, you just need to be more deliberate about it. It doesn't mean you shouldn't also use a forklift, which is its own distinct skill you also need to learn.
It's not a perfect analogy though because in this case it's more like automated driving - you should still learn to drive because the autodriver isn't perfect and you need to be ready to take the wheel, but that means deliberate, separate practice at learning to drive.
> people used to get strong naturally because they had to do physical labor
I think that's a bit of a myth. The Greeks and Romans had weightlifting and boxing gyms, but no forklifts. Many of the most renowned competitors in the original form of the Olympics and in boxing were aristocrats, even senators, with the wealth and free time to lift weights, box, and wrestle. One of the things we know about the philosopher Plato is that "Plato" was essentially a wrestling nickname (meaning "broad") from his first career (somewhat like Dwayne "The Rock" Johnson, which adds a fun twist to reading the Socratic dialogues or to thinking about relationships as "platonic").
Arguably the "meritocratic ideal" of the Gladiator arena was that even "blue collar" Romans could compete and maybe survive. But even the stories that survive of that, few did.
There may be a lesson in that myth, too: the people who succeed in some sports often aren't the ones doing physical labor because they must (for a job); they are the ones intentionally practicing in the ways that make you good at the sport.
I can't attest to the entire past, but my ancestors on both sides were farmers or construction workers. They were fit. Heck, my dad has a beer gut at 65 but still has arm muscles that'll put me, someone who lifts weights once a week, to shame. I had to do construction for a summer and everyone there was in good shape.
They don’t go to the gym, they don’t have the energy; the job shapes you. More or less the same for the farmers in the family.
Perhaps this was less so in the industrial era because of poor nutrition (source: Bill Bryson, hopefully well researched). Hunter gatherer cultures that we still study today have tremendous fitness (Daniel Lieberman).
My dad was a machinist, apprenticed in Germany after WW2. Always somewhat overweight (5'9", 225 lbs during his "peak" years), but he could lift guys up by their belt with one arm, and pick up and move 200+ lb metal billets when he got too impatient to wheel the crane over. Even at 85 now, he's probably stronger in his arms than most 60 year olds. But I'm also not saying ALL of his co-workers were that strong, either.
Takes mass to move mass. Most of the strongest people in the world look "fat" and usually have a hefty gut. Strong and jacked are orthogonal characteristics.
I know what you mean, but from a physics perspective, no, it just takes force to move mass. More mass will generate more downward force due to gravity, and more force in other directions due to momentum once it's moving, but there's more to generating force than just mass. I'm not a kinesiologist, but I would think how much force muscles generate depends on the amount and size of the fibers (mass) but also on their contractile efficiency and the amount of energy they can obtain and employ to contract (not necessarily proportional to mass; it involves cardiovascular fitness).
The fact that Greeks and Romans had weightlifting and boxing gyms for their athletes in no way makes it a "bit of a myth" that people used to get strong naturally by doing physical labor. For example, average grip strength of people under age 30 in the US has declined markedly just since 1985.
Why do you think that? It's definitely true. You can observe it today if you want to visit a country where peasants are still common.
From Bret Devereaux's recent series on Greek hoplites:
> Now traditionally, the zeugitai were regarded as the ‘hoplite class’ and that is sometimes supposed to be the source of their name
> but what van Wees is working out is that although the zeugitai are supposed to be the core of the citizen polity (the thetes have limited political participation) there simply cannot be that many of them because the minimum farm necessary to produce 200 medimnoi of grain is going to be around 7.5 ha or roughly 18 acres which is – by peasant standards – an enormous farm, well into ‘rich peasant’ territory.
> Of course with such large farms there can’t be all that many zeugitai and indeed there don’t seem to have been. In van Wees’ model, the zeugitai-and-up classes never supply even half of the number of hoplites we see Athens deploy
> Instead, under most conditions the majority of hoplites are thetes, pulled from the wealthiest stratum of that class (van Wees figures these fellows probably have farms in the range of ~3 ha or so, so c. 7.5 acres). Those thetes make up the majority of hoplites on the field but do not enjoy the political privileges of the ‘hoplite class.’
> And pushing against the ‘polis-of-rentier-elites’ model, we often also find Greek sources remarking that these fellows, “wiry and sunburnt” (Plato Republic 556cd, trans. van Wees), make the best soldiers because they’re more physically fit and more inured to hardship – because unlike the wealthy hoplites they actually have to work.
I think he was saying upper classes that didn't do much physical labor have existed since at least the classical era and needed to do some kind of physical training to maintain strength?
> The ability of skinny old ladies to carry huge loads is phenomenal. Studies have shown that an ant can carry one hundred times its own weight, but there is no known limit to the lifting power of the average tiny eighty-year-old Spanish peasant grandmother.
Weightlifting and weight training were invented long before forklifts. Even levers were not properly understood back then.
My favorite historic example of typical modern hypertrophy-specific training is the training of Milo of Croton [1]. By legend, his father gifted him a calf and asked daily, "How is your calf, how is it doing? Bring it here so I can look at it," which Milo did, carrying it over. As the calf's weight grew, so did Milo's strength.
This is an application of the external-resistance (the calf) and progressive-overload (the growing calf) principles at work.
"He was taken as a prisoner of war four times, but managed to escape each time. As a prisoner, he pushed and pulled his cell bars as part of strength training, which was cited as an example of the effectiveness of isometrics. At least one of his escapes involved him 'breaking chains and bending bars'."
If each day you do a single set of half the exercises you need to train, rotating which half, you get three and a half sets of each exercise per week.
The training volume of the Bulgarian Method is not much bigger than that of regular training splits like Sheiko, if it is bigger at all. What is more frequent is the stimulation of the muscles and of the nervous-system pathways, and the method adapts to that: you lift a high percentage of your current max, so essentially you are training with whatever your body has available at the time.
>if the goal is to build up your own strength
I think you missed this line. If the goal is just to move weights or lift the most - forklift away. If you want to learn to use a forklift, drive on and best of luck. But if you're trying to get stronger the forklift will not help that goal.
Like many educational tests, the outcome is not the point - doing the work to get there is. If you're asked to code fizz buzz, it's not because the teacher needs you to solve fizz buzz for them; it's because you will learn things while you make it. AI, copying Stack Overflow, using someone's code from last year: it all solves the problem while missing the purpose of the exercise. You're not learning - and presumably that is your goal.
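To make that concrete: fizz buzz itself is just a loop, a couple of conditionals, and the modulo operator, which is exactly the basic machinery the exercise is meant to drill into your fingers. A minimal sketch of one possible version (my own illustration, in C++):

    #include <iostream>
    #include <string>

    int main() {
        for (int i = 1; i <= 100; ++i) {
            std::string out;
            if (i % 3 == 0) out += "Fizz";          // divisible by 3
            if (i % 5 == 0) out += "Buzz";          // divisible by 5 (both -> "FizzBuzz")
            if (out.empty()) out = std::to_string(i);
            std::cout << out << '\n';
        }
        return 0;
    }

Typing even that much forces you to notice the edge cases (the order of the checks, the combined "FizzBuzz" case), which is the learning the exercise is actually after.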
> people used to get strong naturally because they had to do physical labor.
People used to get strong because they had to survive. They stopped needing strength to survive, so it became optional.
So what does this mean about intelligence? Do we no longer need it to survive, so it's optional? The answer to that informs how much young and developing minds should be exposed to AI.
A use case I've been working through is learning a language (not a programming language). You can use LLMs to translate and write for you in another language, but you will not be able to say "I know that language," no matter how much you use the LLM.
Now compare this to using the LLM alongside a grammar book and real-world study mechanisms. This creates friction, which is what actually causes your mind to learn. The LLM can serve as a tool to get specialized insight into the grammar book and to accelerate the mechanical processes (like generating all forms of a word for writing flashcards). At the end of the day, you need to make an intelligent separation between where the LLM ends and where your learning begins.
I really like this contrast because it highlights the gap between using an LLM and actually learning. You may be able to use the LLM to pass college level courses in learning the language but unless you create friction, you actually won’t learn anything! There is definitely more nuance here but it’s food for thought
I like this analogy along with the idea that "it's not an autonomous robot, it's a mech suit."
Here's the thing -- I don't care about "getting stronger." I want to make things, and now I can make bigger things WAY faster because I have a mech suit.
edit: and to stretch the analogy, I don't believe much is lost "intellectually" by my use of a mech suit, as long as I observe carefully. Me doing things by hand is probably overrated.
The point of going to school is to learn all the details of what goes into making things, so when you actually make a thing, you understand how it’s supposed to come together, including important details like correct design that can support the goal, etc. That’s the “getting stronger” part that you can’t skip if you expect to be successful. Only after you’ve done the work and understand the details can you be successful using the power tools to make things.
The point of school for me was to get a degree. 99% of the time at school was useless. The internet was a much better learning resource. Even more so now that AI exists.
I graduated about 15 years ago. In that time, I’ve formed the opposite opinion. My degree - the piece of paper - has been mostly useless. But the ways of thinking I learned at university have been invaluable. That and the friends I made along the way.
I’ve worked with plenty of self taught programmers over the years. Lots of smart people. But there’s always blind spots in how they approach problems. Many fixate on tools and approaches without really seeing how those tools fit into a wider ecosystem. Some just have no idea how to make software reliable.
I’m sure this stuff can be learned. But there is a certain kind of deep, slow understanding you just don’t get from watching back-to-back 15 minute YouTube videos on a topic.
I think it depends on how they were self taught. If they just went through a few tutorials on YouTube and learned how to make a CRUD app using the shiny tool of the week, then sure. (I acknowledge this is a reductive view of self-teaching; I am self-taught myself.)
But if they actually spent time trying to learn architecture and how to build stuff well, either by reading books or via good mentorship on the job, then they can often be better than the folks who went to school. Sometimes even they don't know how to make software reliable.
I'm firmly in the middle. Out of the 6 engineers I work with on a daily basis (including my CTO), only one of us has a degree in CS, and he's not the one in an architecture role.
I do agree that learning how to think and learn is its own valuable skill set, and many folks learn how to do that in different ways.
>I’ve worked with plenty of self taught programmers over the years. Lots of smart people. But there’s always blind spots in how they approach problems.
I've worked with PhDs on projects (I'm self-taught), and those guys absolutely have blind spots in how they approach problems, plenty of them. Everyone does. What we produce together is better because our blind spots don't typically overlap. I know their weaknesses, and they know mine. I've also worked with college grads that overthink everything to the point they made an over-abstracted mess. YMMV.
>you just don’t get from watching back-to-back 15 minute YouTube videos on a topic.
This is not "self taught". I mean maybe it's one kind of modern-ish concept of "self taught" in an internet comment forum, but it really isn't. I watch a ton of sailing videos all day long, but I've never been on a sailboat, nor do I think I know how to sail. Everyone competent has to pay their dues and learn hard lessons the hard way before they get good at anything, even the PhDs.
For a motivated learner with access to good materials, schools provide two important things besides that very important piece of paper:
1. contacts - these come in the form of peers who are interested in the same things and in the form of experts in their fields of study. Talking to these people and developing relationships will help you learn faster, and teach you how to have professional collegial relationships. These people can open doors for you long after graduation.
2. facilities - ever want to play with an electron microscope or work with dangerous chemicals safely? Different schools have different facilities available for students in different fields. If you want to study nuclear physics, you might want to go to a school with a research reactor; it's not a good idea to build your own.
To extend 2 (facilities): in my case, we had a somewhat older and smaller supercomputer that we got to run some stuff on.
And I'd argue for:
3. Realisation of the scope of computing.
I.e., computers are not just phones/laptops/desktops/servers with networking - all hail the wonders of the web... There are embedded devices, robots, supercomputers. (Recent articles on HN describe the computing power in a disposable vape!)
There are issues at all levels with all of these with algorithms, design, fabrication, security, energy, societal influence, etc etc - what tradeoffs to make where. (Why is there computing power in a disposable vape?!?)
I went in thinking I knew 20% and I would learn the other 80% of IT. I came out knowing 5 times as much but realising I knew a much smaller percentage of IT... It was both enabling and humbling.
But you can also meet experts at a company and get access to a company's machinery. To top it off the company pays you instead of you paying the school.
> Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it? — The Elements of Programming Style, 2nd edition, chapter 2
If you weren't even "clever enough" to write the program yourself (or, more precisely, if you never cultivated a sufficiently deep knowledge of the tools & domain you were working with), how do you expect to fix it when things go wrong? Chatbots can do a lot, but they're ultimately just bots, and they get stuck & give up in ways that professionals cannot afford to. You do still need to develop domain knowledge and "get stronger" to keep pace with your product.
Big codebases decay and become difficult to work with very easily. In the hands-off vibe-coded projects I've seen, that rate of decay was extremely accelerated. I think it will prove easy for people to get over their skis with coding agents in the long run.
I think this goes for many different kinds of projects. Take React, for example, or jQuery, or a multitude of other frameworks and libraries. They abstract out a lot of stuff and make it easier to build things! But we've also seen that with ease of building comes ease of slop (I saw plenty of sloppily coded React even before LLMs). Then React introduced hooks to hopefully reduce the slop, and then somehow it got sloppy in other ways.
That's kinda how I see vibe coding. It's extremely easy to get stuff done but also extremely easy to write slop. Except now 10x more code is being generated thus 10x more slop.
Learning how to get quality robust code is part of the learning curve of AI. It really is an emergent field, changing every day.
This analogy works pretty well. Too much time doing everything in it and your muscles will atrophy. Some edge cases will be better if you jump out and use your hands.
There's also plenty of mech tales where the mech pilots need to spend as much time out of the suits making sure their muscles (and/or mental health) are in good strength precisely because the mechs are a "force multiplier" and are only as strong as their pilot. That's a somewhat common thread in such worlds.
Yes. Also, it's a fairly common trope that if you want to pilot a mech suit, you need to be someone like Tony Stark. He's a tinkerer and an expert. What he does is not a commodity. And when he loses his suit and access to his money? His big plot arc is that he is Iron Man. He built it in a cave out of a box of scraps, etc.
There are other fictional variants: the giant mech with the enormous support team, or Heinlein's "mobile infantry." And virtually every variation on the Heinlein trope has a scene of drop commandos doing extensive pre-drop checks on their armor.
The actual reality is that it isn't too hard for a competent engineer to pair with Claude Code, if they're willing to read the diffs. But if you try to increase the ratio of agents to humans, dealing with their current limitations quickly starts to feel like you need to be Tony Stark.
Funny, because I was thinking of Evangelion's predecessor, Gunbuster, in which cadets are shown undergoing grueling physical training both in and out of their mechs to prepare for space combat.
I like the electric bike as a metaphor. You can go further faster, but you quickly find yourself miles from home and out of juice, and you ain't in shape enough to get that heavy bugger back.
As long as we're beating the metaphor... so don't do that? Make sure you charge the battery and that it has enough range to get you home, and bring the charger with you. Or in the LLMs case, make sure it's not generating a ball of mud (code). Refactor often, into discrete classes, and distinct areas of functionality, so that you're never miles from home and out of juice.
If observing was as good as doing, experience would mean nothing.
Thinking through the issue, instead of having the solve presented to you, is the part where you exercise your mental muscles. A good parallel is martial arts.
You can watch it all you want, but you'll never be skilled unless you actually do it.
OK, it’s a mech suit. The question under discussion is, do you need to learn to walk first, before you climb into it? My life experience has shown me you can’t learn things by “observing”, only by doing.
Yes, you can learn to walk in the mech suit. Let’s put one leg forward, then the next, good. You are now 100% production ready at walking. Let’s run a load test. You’re running now. Now you’re running into the ocean. “I want to swim now.” You’re absolutely right! You should be swimming. Since we don’t have a full implementation of swimming let me try flailing the arms while increasing leg speed. That doesn’t seem to work. The user is upside down on the ocean floor burrowing themselves into the silt. Task Complete. Summary: the user has learned to walk.
If all I know is the mech suit, I’ll struggle with tasks that I can’t use it for. Maybe even get stuck completely. Now it’s a skill issue because I never got my 10k hours in and I don’t even know what to observe or how to explain the outcome I want.
In true HN fashion of trading analogies, it’s like starting out full powered in a game and then having it all taken away after the tutorial. You get full powered again at the end but not after being challenged along the way.
This makes the mech suit attractive to newcomers and non-programmers, but only because they see the product in massively simplified terms. Because they don't know what they don't know.
The mech suit works well until you need to maintain stateful systems. I've found that while initial output is faster, the AI tends to introduce subtle concurrency bugs between Redis and Postgres that are a nightmare to debug later. You get the speed up front but end up paying for it with a fragile architecture.
No, it's not a mech suit. A mech suit doesn't fire its canister rifle at friendly units and then say "You're absolutely right! I should have done an IFF before attacking that unit." (And if it did the engineer responsible should be drawn and quartered.) Mech-suit programming AI would look like something that reads your brainwaves and transduces them into text, letting you think your code into the machine. I'd totally use that if I had it.
Misusing a forklift might injure the driver and a few others; but it is unlikely to bring down an entire electric grid, expose millions to fraud and theft, put innocent people in prison, or jeopardize the institutions of government.
There is more than one kind of leverage at play here.
> Misusing a forklift might injure the driver and a few others; but it is unlikely to bring down an entire electric grid
That's the job of the backhoe.
(this is a joke about how diggers have caused quite a lot of local internet outages by hitting cables, sometimes supposedly "redundant" cables that were routed in the same conduit. Hitting power infrastructure is rare but does happen)
At my last job we had the power taken out by a backhoe. It was loaded onto a trailer and either the operator forgot to lower the bucket, or the driver drove away before he had time to lower it.
Regardless of whose fault it was, the end result was the bucket snagged the power lines going into the datacentre and caused an outage.
From an exercise standpoint, sure, but with sports there is more to it than just maximizing exercise.
If you practice judo you're definitely exercising but the goal is defeating your opponent. When biking or running you're definitely exercising but the goal is going faster or further.
From an exercise-optimization perspective you should be sitting on a spinner with a customized profile, or maybe doing some entirely different motion.
If sitting on a carbon fiber bike, shaving half a second off your multi-hour time, is what brings you joy and motivation, then I say screw any further justification. You do you. Just be mindful of others, as the path you ride isn't your property.
I think a better analogy is a marathon. If you're training for a marathon, you have to run. It won't help if you take the car. You will reach the finish line with minimal effort, but you won't gain any necessary muscles.
> This is the ultimate problem with AI in academia. We all inherently know that “no pain no gain” is true for physical tasks, but the same is true for learning. Struggling through the new concepts is essentially the point of it, not just the end result.
OK but then why even use Python, or C, or anything but Assembly? Isn't AI just another layer of value-add?
No, because AI is not deterministic. All those other tools are intentionally isomorphic with machine code, even if there's a lot of optimization going on under the hood. AI may generate code that's isomorphic with your prompt, but it also may not. And you have no way of telling the difference besides reading and understanding the code.
I think forklifts probably carry more weight over longer distances than people do (though I could be wrong, 8 billion humans carrying small weights might add up).
Certainly forklifts have more weight * distance when you restrict to objects that are over 100 pounds, and that seems like a good decision.
I think it's a good analogy. A forklift is a useful tool and objectively better than humans for some tasks, but if you've never developed your muscles because you use the forklift every time you go to the gym, then when you need to carry a couch up the stairs you'll find that you can't do it and the forklift can't either.
So the idea is that you should learn to do things by hand first, and then use the powerful tools once you're knowledgeable enough to know when they make sense. If you start out with the powerful tools, then you'll never learn enough to take over when they fail.
A forklift can do things no human can. I've used a forklift for things that no group of humans could - you can't physically get enough humans around that size object to lift it. (of course levers would change this)
Yeah, it's a great analogy. Pushing it even further: a forklift is superhuman, but only in specific environments that are designed for it. As soon as you're off of pavement a forklift can't do much. As soon as an object doesn't have somewhere to stick the forks you need to get a bunch of other equipment to get the forklift to lift it.
You're making the analogy work: the point of weightlifting as a sport or exercise is not to actually move the weights, but to condition your body so that it can move the weights.
Indeed, usually after doing weightlifting, you return the weights to the place where you originally took them from, so I suppose that means you did no work at all in the first place...
That's true of exercise in general. It's bullshit make-work we do to stay fit, because we've decoupled individual survival from hard physical labor, so it doesn't happen "by itself" anymore. A blessing and a curse.
Wondering why the obvious solution isn't applied here: instead of giving already well-known problems that have been solved a thousand times, give students open research opportunities - stuff that is on the edge of being possible, with no way to cheat with AI. And if AI is able to solve those, give harder tasks.
The same reason we give beginner math students addition and subtraction problems, not Fermat’s last theorem?
There has to be a base of knowledge available before the student can even comprehend many/most open research questions, let alone begin to solve them. And if they were understandable to a beginner, then I’d posit the LLM models available today would also be capable of doing meaningful work.
The real challenge will be that people almost always pick the easier path.
We have a decent sized piece of land and raise some animals. People think we're crazy for not having a tractor, but at the end of the day I would rather do it the hard way and stay in shape while also keeping a bit of a cap on how much I can change or tear up around here.
I've been showing my students this video of a robot lifting weights to illustrate why they shouldn't use AI to do their homework. It's obvious to them the robot lifting weights won't make them stronger.
Yes but the goal of school is to lift heavy things, basically. You're trying to do things that are difficult (for you) but don't produce anything useful for anyone else. That's how you gain the ability to do useful things.
Let's just accept that this weightlifting metaphor is leaky, like any other, and leads us to absurdities like forklift operators needing to lift dumbbells to stay relevant in their jobs.
Forklift operators need to do something to exercise. They sit in the seat all day. At least as a programmer I have a standing desk. This isn't relevant to the job though.
> At least as a programmer I have a standing desk.
When I stand still for hours at a time, I end up with aching knees, even though I'd have no problem walking for that same amount of time. Do you experience anything like that?
I kinda get the point, but why is that? The goal of school is to teach something that's applicable in industry or academia.
Forklift operators don't lift things in their training. Even CS students start with pretty high level of abstraction, very few start from x86 asm instructions.
We need to make them implement ALUs out of logic gates and wires if we want them to lift heavy things.
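For what it's worth, that kind of "heavy lifting" is small in code terms: a 1-bit full adder expressed purely with gate-level operations looks like the sketch below (my own illustration, not part of any particular curriculum), and an ALU is largely this repeated, widened, and multiplexed.

    #include <cstdio>

    // Gate-level 1-bit full adder: sum and carry expressed only with AND, OR, XOR.
    struct AdderResult { bool sum; bool carry; };

    AdderResult fullAdder(bool a, bool b, bool carryIn) {
        bool sum   = (a ^ b) ^ carryIn;               // XOR chain produces the sum bit
        bool carry = (a & b) | ((a ^ b) & carryIn);   // carry out of this bit position
        return {sum, carry};
    }

    int main() {
        // Exhaustively check all eight input combinations.
        for (int a = 0; a < 2; ++a)
            for (int b = 0; b < 2; ++b)
                for (int c = 0; c < 2; ++c) {
                    AdderResult r = fullAdder(a, b, c);
                    std::printf("%d + %d + %d -> sum %d, carry %d\n", a, b, c, r.sum, r.carry);
                }
        return 0;
    }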
We begin teaching math by having students solve problems that are trivial for a calculator.
Though I also wonder what advanced CS classes should look like. If the agent can code nearly anything, what project would challenge student+agent and teach the student how to accomplish CS fundamentals with modern tools?
In one of my college classes, after you submitted your project you'd have a short meeting with a TA and/or the professor to talk through your solution. For a smaller advanced class I think this kind of thing is feasible and can help prevent blind copy/pasting. If you wrote your code with an LLM but you're still able to have a knowledgeable conversation about it, then great, that's what you're going to do in the real world too. If you can't answer any questions about it and it seems like you don't understand your own code, then you don't get a good grade even if it works.
As an added bonus, being able to discuss your code with another engineer that wasn't involved in writing it is an important skill that might not otherwise be trained in college.
I had my first interview last week where I finally saw this in the wild. It was a student applying for an internship. It was the strangest interview. They had excellent textbook knowledge. They could tell you the space and time complexities of any data structure, but they couldn't explain anything about code they'd written or how it worked. After many painful and confusing minutes of trying to get them to explain, like, literally anything about how this thing on their resume worked, they finally shrugged and said that "GenAI did most of it."
It was a bizarre disconnect having someone be both highly educated and yet crippled by not doing.
The students had memorized everything, but understood nothing. Add in access to generative AI, and you have the situation that you had with your interview.
It's a good reminder that what we really do, as programmers or software engineers or what you wanna call it, is understanding how computers and computations work.
Hmmm, I think we're more likely to face an Idiocracy outcome. We need more Geordi La Forges out there, but we've got a lot of Fritos out here vibe coding the next Carl's Jr. locating app instead
Star Trek illustrated the issue nicely in the scene where Scotty, who we should remember is an engineer, tries to talk to a computer mouse in the 20th century: https://www.youtube.com/watch?v=hShY6xZWVGE
More like using a calculator but not being able to explain how to do the calculation by hand. A probabilistic calculator which is sometimes wrong at that. The "lots of theory but no practice" has always been true for a majority of graduates in my experience.
Surely, new grads are light on experience (particularly relevant experience), but they should have student projects and whatnot that they should be able to explain, particularly for coding. Hardware projects are more rare simply because they cost money for parts and schools have limited budgets, but software has far fewer demands.
Wait, so they could, say, write out a linked list, or bubble sort, but not understand what it was doing? Like no mental model of memory, registers, or intuition for execution order, or even a conceptual one like a graph walk, or something? Like just "zero" on the conceptual front, but they could reproduce data structures and some algorithm for accessing or traversing them, and give rote O notation answers about how long execution takes?
Just checking that I have that right: is that what you meant? I think that's what you were implying, but I want to make sure I understood.
What you as a teacher teach might have to adapt a bit. Teaching how code works is more important than teaching how to code. Most academic computer scientists aren't necessarily very skilled as programmers in any case. At least, I learned most of that after I stopped being an academic myself (Ph. D. and all). This is OK. Learning to program is more of a side effect of studying computer science than it is a core goal (this is not always clearly understood).
A good analogy here is programming in assembler. Manually crafting programs at the machine code level was very common when I got my first computer in the 1980s. Especially for games. By the late 90s that had mostly disappeared. Games like Roller Coaster Tycoon were among the last huge commercial successes coded that way. C/C++ took over, and these days most game studios license an engine and then do a lot of work in languages like C# or Lua.
I never did any meaningful amount of assembler programming. It was mostly no longer a relevant skill by the time I studied computer science (94-99). I built an interpreter for an imaginary CPU at some point using a functional programming language in my second year. Our compiler course was taught by people like Erik Meijer (who later worked on things like F# at MS), who just saw it as a great excuse to teach people functional programming instead. In hindsight, that was actually a good skill to have, as interest in functional programming heated up a lot about 10 years later.
The point of this analogy: compilers are important tools. It's more important to understand how they work than it is to be able to build one in assembler. You'll probably never do that. Most people never work on compilers. Nor do they build their own operating systems, databases, etc. But it helps to understand how they work. The point of teaching how compilers work is understanding how programming languages are created and what their limitations are.
> Teaching how code works is more important than teaching how to code.
People learn by doing. There's a reason that "do the textbook problems" is somewhat of a meme in the math and science fields - because that's the way that you learn those things.
I've met someone who said that when he gets a textbook, he starts by only doing the problems, skipping the chapter content entirely. Only when he has significant trouble with the problems (i.e. he's stuck on a single one for several hours) does he read the chapter text.
He's one of the smartest people I know.
This is because you learn by doing the problems. In the software field, that means coding.
Telling yourself that you could code up a solution is very different than actually being able to write the code.
And writing the code is how you build fluency and understanding as to how computers actually work.
> I never did any meaningful amount of assembler programming. It was mostly no longer a relevant skill by the time I studied computer science (94-99). I built an interpreter for an imaginary CPU at some point using a functional programming language in my second year.
Same thing for assembly. Note that you built an interpreter for an imaginary CPU - not a real one, as that would have been a much harder challenge, given that you hadn't done any meaningful amount of assembly programming and didn't understand low-level computer hardware very well.
Obviously, this isn't to say that information about how a system works can't be learned without practice - just that that's substantially harder and takes much more time (probably 3-10x), and I can guarantee you that those doing vibecoding are not putting in that extra time.
I agree with you in part, you can’t expect to learn something like coding without the doing.
The brave new world is that you no longer have to do “coding” in our sense of the word. The doing, and what exercises you should learn with have both changed.
Now students should build whole systems, not worry about simple Boolean logic and program flow. The last programmer who will ever need to write an if statement may already be a student.
> The brave new world is that you no longer have to do “coding” in our sense of the word.
Notice how I also talked about coding being a way that you learn how computers work.
If you don't code, you have a very hard time understanding how computers work.
And while there's some evidence that programmers may not need write all of their code by hand, there's zero evidence that either they don't need to learn how to code at all (as you're claiming), or that they don't need to even know how computers work (which is a step further).
There's tons of anecdotes from senior software engineers on Hacker News (and elsewhere) about coding agents writing bad code that they need to debug and fix by hand. I've literally never seen a single story about how a coding agent built a nontrivial program by itself without the prompter looking at the code.
> The point of this analogy: compilers are important tools. It's more important to understand how they work than it is to be able to build one in assembler. You'll probably never do that. Most people never work on compilers. Nor do they build their own operating systems, databases, etc. But it helps to understand how they work. The point of teaching how compilers work is understanding how programming languages are created and what their limitations are.
I don't know that it's all these things at once, but most people I know who are good have done a bunch of spikes / side projects that go a level lower than they have to. Intense curiosity is good, and to the point you're making, most people don't really learn this stuff just by reading or doing flash cards. If you want to really learn how a compiler works, you probably do have to write a compiler. Not a full-on production-ready compiler, but hands on keyboard, typing and interacting with and troubleshooting code.
Or maybe to put it another way, it's probably the "easiest" way, even though it's the "hardest" way. Or maybe it's the only way. Everything I know how to do well, I know how to do well from practice and repetition.
A million percent! I was so bad at Math in school. Which I primarily blame on the arbitrary way in which we were taught it. It wasn't until I was able to apply it to solving actual problems that it clicked.
Which is a pretty big failure of somewhere in the education pipeline -- don't expect a science program to do what a trade is there for! (to be clear, I'm not trying to say the students are wrong in choosing CS in order to get a good coding job, but somewhere, expectations and reality are misaligned here. Perhaps with companies trying to outsource their training to universities while complaining that the training isn't spot-on for what they need?)
When I did a CS major, there was a semester of C, a semester of assembly, a semester of building a Verilog CPU, etc. I'd be shocked if an optimal CS education involved vibecoding these courses to any significant degree.
> A good analogy here is programming in assembler. Manually crafting programs at the machine code level was very common when I got my first computer in the 1980s. Especially for games. By the late 90s that had mostly disappeared.
Indeed, a lot of us looked with suspicion and disdain at people that used those primitive compilers that generated awful, slow code. I once spent ages hand-optimizing a component that had been written in C, and took great pleasure in the fact I could delete about every other line of disassembly...
When I wrote my first compiler a couple of years later, it was in assembler at first, and supported inline assembler so I could gradually convert to bootstrap it that way.
Because I couldn't imagine writing it in C, given the awful code the C compilers I had available generated (and how slow they were)...
These days most programmers don't know assembler, and increasingly don't know languages as low level as C either.
And the world didn't fall apart.
People will complain that it is necessary for them to know the languages that will slowly be eaten away by LLMs, just like my generation argued it was absolutely necessary to know assembler if you wanted to be able to develop anything of substance.
I agree with you people should understand how things work, though, even if they don't know it well enough to build it from scratch.
Not only that, it's constitution. I'm finding this with myself. After vibe coding for a month or so I let my subscription expire. Now when I look at the code it's like "ugh you mean now I have to think about this with my own brain???"
Even while vibe-coding, I often found myself getting annoyed just having to explain things. The amount of patience I have for anything that doesn't "just work" the first time has drifted toward zero. If I can't get AI to do the right thing after three tries, "welp, I guess this project isn't getting finished!"
It's not just laziness, it's like AI eats away at your pride of ownership. You start a project all hyped about making it great, but after a few cycles of AI doing the work, it's easy to get sucked into, "whatever, just make it work". Or better yet, "pretend to make it work, so I can go do something else."
When learning basic math, you shouldn't use a calculator, because otherwise you aren't really understanding how it works. Later, when learning advanced math, you can use calculators, because you're focusing on a different abstraction level. I see the two situations as very similar.
What abstraction levels do you expect will remain only in the Human domain?
The progression from basic arithmetic, to complex ratios and basic algebra, graphing, geometry, trig, calculus, linear algebra, differential equations… all along the way, there are calculators that can help students (Wolfram Alpha, basically). When they get to theory, proofs, etc., historically that's where the calculator ended, but now there are LLMs… it feels like the levels of abstraction without a "calculator" are running out.
The compiler was the “calculator” abstraction of programming, and it seems like the high-level languages now have LLMs to convert NLP to code as a sort of compiler. Especially with the explicitly stated goal of LLM companies to create the “software singularity”, I’d be interested to hear the rationale for abstractions in CS which will remain off limits to LLMs.
I see junior devs hyping vibe coding and senior devs mostly using AI as an assistant. I fall in the latter camp myself.
I've hired and trained tons of junior devs out of university. They become 20x productive after a year of experience. I think vibe coding is getting new devs to 5x productivity, which seems amazing, but then they get stuck there because they're not learning. So after year one, they're a 5x developer, not a 20x developer like they should be.
I have some young friends who are 1-3 years into software careers I'm surprised by how little they know.
If I find myself writing code in a way that has me saying to myself "there has to be a better way," there usually is. That's when I could present AI with that little bit of what I want to write. What I've found to be important is to describe what I want in natural language. That's when AI might introduce me to a better way of doing things. At that point, I stop and learn all that I can about what the AI showed me. I look it up in books and trusted online tutorials to make sure it is the proper way to do it.
I remember reading about a metal shop class, where the instructor started out by giving each student a block of metal, and a file. The student had to file an end wrench out of the block. Upon successful completion, then the student would move on to learning about the machine tools.
The idea was to develop a feel for cutting metal, and to better understand what the machine tools were doing.
--
My wood shop teacher taught me how to use a hand plane. I could shave off wood with it that was so thin it was transparent. I could then join two boards together with a barely perceptible crack between them. The jointer couldn't do it that well.
Also, in college, I'd follow the derivation that the prof did on the chalkboard, and think I understood it. Then, doing the homework, I'd realize I didn't understand it at all. Doing the homework myself was where the real learning occurred.
This concept can be taken to ridiculous extremes, where learning the actual useful skill takes too long for most participants to get to. For example, the shop class teacher taking his students out into the wilderness to prospect for ore, then building their own smelter, then making their own alloy, then forging billet, etc.
In middle school (I think) we spent a few days in math class hand-calculating trigonometry values (cosine, sin, etc.). Only after we did that did our teacher tell us that the mandated calculators that we all have used for the last few months have a magic button that will "solve" for the values for you. It definitely made me appreciate the calculator more!
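For a sense of what "hand-calculating" a trig value involves, it essentially comes down to evaluating a truncated series; here is a rough sketch of that idea (my own illustration, not necessarily the exact method the class used):

    #include <cmath>
    #include <cstdio>

    // Approximate sin(x) with the first few terms of its Taylor series about 0:
    //   sin(x) = x - x^3/3! + x^5/5! - x^7/7! + ...
    double sinSeries(double x, int terms) {
        double term = x;   // first term is x
        double sum  = x;
        for (int n = 1; n < terms; ++n) {
            // Each new term is the previous one times -x^2 / ((2n)(2n+1)).
            term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
            sum  += term;
        }
        return sum;
    }

    int main() {
        double x = 0.5;  // radians
        std::printf("series approximation: %.10f\n", sinSeries(x, 6));
        std::printf("library sin:          %.10f\n", std::sin(x));
        return 0;
    }

Grinding through even a few of those terms by hand is tedious, which is exactly why the calculator button feels magical afterwards.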
This is a good point. Letting people who are learning to code use AI would be like letting 6 to 10 year olds in school just use pocket calculators and never learn to do basic arithmetic manually. Yes, IRL you will have a calculator at hand; yes, the calculator will make fewer mistakes; still, for you to learn and understand, you have to do it manually.
Same with essay assignments, you exercise different neural pathways by doing it yourself.
Recently in comments people were claiming that working with LLMs has sharpened their ability to organize thoughts, and that could be a real effect that would be interesting to study. It could be that watching an LLM organize a topic could provide a useful example of how to approach organizing your own thoughts.
But until you do it unassisted you haven’t learned how to do it.
The natural solution is right there in front of us but we hate to admit it because it still involves LLMs and changes on the teaching side. Just raise the bar until they struggle.
"Why think when AI do trick?" is an extremely alluring hole to jump headfirst into. Life is stressful, we're short on time, and we have obligations screaming in our ear like a crying baby. It seems appropriate to slip the ring of power onto your finger to deal with the immediate situation. Once you've put it on once, there is less mental friction to putting it on the next time. Over time, gently, overuse leads to the wearer cognitively deteriorating into a Gollum.
I haven't done long division in decades, am probably unable to do it anymore, and yet it has never held me back in any tangible fashion (and won't unless computers and calculators stop existing)
That makes sense. Some skills just have more utility than others. There are skills that are universally relevant (e.g. general problem solving), and then there are skills that are only relevant in a specific time period or a specific context.
With how rapidly the world has been changing lately, it has become difficult to estimate which of those more specific skills will remain relevant for how long.
I am rather positive that if you were sat down in a room and couldn't leave unless you did some mildly complicated long division, you would succeed. Just because it isn't a natural thing anymore and you have not done the drills in decades doesn't mean the knowledge is completely lost.
If you are concerned that embedding "from first-principles" reasoning in widely-available LLM's may create future generations that cannot, then I share your concern. I also think it may be overrated. Plenty of people "do division" without quite understanding how it all works (unfortunately).
And plenty of people will still come along who love to code despite AI's excelling at it. In fact, calling out the AI on bad design or errors seems to be the new "code golf".
They don't always do the simple things well which is even more frustrating.
I do Windows development and GDI stuff still confuses me. I'm talking about memory DC, compatible DC, DIB, DDB, DIBSECTION, bitblt, setdibits, etc... AIs also suck at this stuff. I'll ask for help with a relatively straightforward task and it almost always produces code that when you ask it to defend the choices it made, it finds problems, apologizes, and goes in circles. One AI (I forget which) actually told me I should refer to Petzold's Windows Programming book because it was unable to help me further.
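For reference, the kind of boilerplate I mean is the classic memory-DC / compatible-bitmap / BitBlt dance, roughly like this (a minimal sketch of the usual pattern, with error handling omitted):

    #include <windows.h>

    // Off-screen (double-buffered) drawing with GDI: render into a memory DC,
    // then blit the finished frame to the window in a single BitBlt call.
    void DrawBuffered(HWND hwnd, int width, int height) {
        HDC hdcWin = GetDC(hwnd);
        HDC hdcMem = CreateCompatibleDC(hdcWin);                         // memory DC, starts with a 1x1 mono bitmap
        HBITMAP hbmMem = CreateCompatibleBitmap(hdcWin, width, height);  // DDB matching the screen format
        HBITMAP hbmOld = (HBITMAP)SelectObject(hdcMem, hbmMem);          // select our bitmap into the memory DC

        // ... draw into hdcMem here (FillRect, TextOut, etc.) ...
        RECT rc = {0, 0, width, height};
        FillRect(hdcMem, &rc, (HBRUSH)GetStockObject(WHITE_BRUSH));

        BitBlt(hdcWin, 0, 0, width, height, hdcMem, 0, 0, SRCCOPY);      // copy the buffer to the window

        SelectObject(hdcMem, hbmOld);  // restore the original bitmap before deleting ours
        DeleteObject(hbmMem);
        DeleteDC(hdcMem);
        ReleaseDC(hwnd, hdcWin);
    }

Getting the bookkeeping right here (which DC owns which bitmap, what must be selected out before it can be deleted, when you need a DIB section instead of a compatible bitmap) is exactly the kind of thing that's easy to get subtly wrong.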
Part of the issue here is that you can look at something and think "oh yeah I understand that, it makes perfect sense!", but then completely fail to reproduce it yourself.
Agreed. I think the divide is between code-as-thinking and code-as-implementation. Trivial assignments and toy projects and geeking out over implementation details are necessary to learn what code is, and what can be done with it. Otherwise your ideas are too vague to guide AI to an implementation.
Without the clarity that comes from thinking with code, a programmer using AI is the blind leading the blind.
The social aspect of a dialogue is relaxing, but very little improvement is happening. It's like a study group where one (relatively) incompetent student tries to advise another, and then test day comes and they're outperformed by the weirdo that worked alone.
But what has changed? Students never had a natural reason to learn how to write fizz buzz. It's been done before and it's not even useful. There has always been an arbitrary nature to these exercises.
I actually fear more for the middle-of-career dev who has shunned AI as worthless. It's easier than ever for juniors to learn and be productive.
Yes! You are best served by learning what a tool is doing for you by doing it yourself or carefully studying what it uses and obfuscates from you before using the tool. You don't need to construct an entire functioning processor in an HDL, but understanding the basics of digital logic and computer architecture matters if you're EE/CompE. You don't have to write an OS in asm, but understanding assembly and how it gets translated into binary and understanding the basics of resource management, IPC, file systems, etc. is essential if you will ever work in something lower level. If you're a CS major, algorithms and data structures are essential. If you're just learning front end development on your own or in a boot camp, you need to learn HTML and the DOM, events, how CSS works, and some of the core concepts of JS, not just React. You'll be better for it when the tools fail you or a new tool comes along.
Lots of interesting ways to spin this. I was in a computer science course in the late 90s and we were not allowed to use the C++ standard library because it made you a "lazy programmer" according to the instructor. I'm not sure I agree with that, but the way I look at it is that computer science is all about abstraction, and it seems to me that AI, generative pair programming, vibe coding, or whatever you want to call it is just another level of abstraction. I think what is probably more important is to learn what are and are not good programming and project structures, and to use AI to abstract away the boilerplate, scaffolding, etc., so that you can avoid footguns early on in your development cycle.
The counterargument here is that there is a distinction between an arbitrary line in the sand ("the C++ stdlib is bad") and using a text-generating machine to perform the work for you, beginning to end. You are correct that, as a responsibly used tool, LLMs offer exceptional utility and value. But keep in sight the laziness of humans who focus on the immediate end result over the long-term consequences.
It's the difference between the employee who copy-pastes all of their email bodies from ChatGPT versus the one who writes a full draft themselves and then asks an LLM for constructive feedback. One develops skills while the other atrophies.
That's why it's so important to teach how to use them properly instead of demonizing them. Let's be realistic, they are not going to disappear and students and workers are not stopping using them.
I was so lucky to land in a CS class where we were writing C++ by hand. I don't think that exists anymore, but it is where I would go in terms of teaching CS from first principles
The problem is: now they also need to learn to code with an LLM assistant. That goes beyond "coding it by yourself". Well, it's different, anyway. Another skill to teach.
I'm not so sure.
I spent A LOT of time writing sorting algo code by hand in university.
I spent so much time writing assembly code by hand.
So much more time writing instructions for MIPS by hand.
(To be fair I did study EE not CS)
I learned more about programming in a weekend badly copying hack modules for Minecraft than I learned in 5+ years in university.
All that stuff I did by hand back then I haven't used it a single time after.
I would interpret his take a little bit differently.
You write sorting algorithms in college to understand how they work, and to understand why some are faster, because it teaches you a mental model for data-traversal strategies. In the real world you will use pre-written versions of those algorithms in any language, but you understand them well enough to know what to select in a given situation based on the type of data. This especially comes into play when creating indexes for databases.
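One small, concrete example of "knowing what to select": the library gives you the sort, but you still have to know when the ordering of ties matters (my own illustrative sketch, not from the comment above):

    #include <algorithm>
    #include <cstdio>
    #include <string>
    #include <vector>

    struct Order { std::string customer; int amount; };

    int main() {
        // Already ordered by customer name from a previous step.
        std::vector<Order> orders = {
            {"alice", 30}, {"alice", 10}, {"bob", 10}, {"bob", 30},
        };

        // Re-sorting by amount: std::stable_sort preserves the existing customer
        // order among equal amounts, so ties stay alphabetical. Plain std::sort
        // is usually a bit faster but makes no promise about the order of ties.
        std::stable_sort(orders.begin(), orders.end(),
                         [](const Order& a, const Order& b) { return a.amount < b.amount; });

        for (const auto& o : orders)
            std::printf("%s %d\n", o.customer.c_str(), o.amount);
        return 0;
    }

Choosing between the two is trivial once you have the mental model, and baffling if you've only ever asked for "sorted output."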
What I take the OPs statement to mean are around "meta" items revolved more around learning abstractions. You write certain patterns by hand enough times, you will see the overlap and opportunity to refactor or create an abstraction that can be used more effectively in your codebase.
If you vibe code all of that stuff, you don't feel the repetition as much. You don't work through the abstractions and object relationships yourself to see the opportunity to understand why and how it could be improved.
You didn't write sorting code or assembly code because you were going to need to write it on the job. It gave you a grounding in how data structures and computers work at a fundamental level. That intuition is what makes picking up Minecraft hack mods much easier.
That's the koolaid, but seriously I don't really believe it anymore.
I only had to do this leg work during university to prove that I can be allowed to try and write code for a living.
The grounding, as you call it, is not required for that at all, since I'm a dozen levels of abstraction removed from it.
It might be useful if I were a researcher or worked on optimizing complex cutting-edge stuff, but 99% of what I do is CRUD apps and REST APIs. That stuff can safely be done by anyone, no need for a degree.
Tbf I'm from Germany so in other places they might allow you to do this job without a degree
But nobody goes to college specifically to train for CRUD apps. The point is to give you broad training so that you can do CRUD apps and other stuff too. It is a very bad idea to give extremely specific training at scale, because then you get a workforce that has difficulty adapting to changes. It's like trying to manage a planned economy: there is no point in trying to predict exactly what jobs you will get, so let's make sure you can handle whatever's thrown at you.
Sure (knowing the underlying ideas and having proficiency in their application) - but producing software by conducting(?) LLMs is rapidly becoming a wide, deep and must-have skill and the lack thereof will be a weakness in any student entering the workplace.
Similarly, it's always been the case that copy-pasting code out of a tutorial doesn't teach you as much as manually typing it out, even if you don't change it. That part of the problem isn't even new.
AI does have an incredibly powerful influence on learning. It can absolutely be used as a detriment, but it can also be just as powerful of a learning tool. It all comes down to keeping the student in the zone of proximal development.
If AI is used by the student to get the task done as fast as possible the student will miss out on all the learning (too easy).
If no AI is used at all, students can get stuck for long periods, either because of mismatches between the instructional design and the specific learning context (a missing prerequisite) or because of mistakes in the instructional design itself.
AI has the potential to keep all learners within an ideal difficulty for optimal rate of learning so that students learn faster. We just shouldn't be using AI tools for productivity in the learning context, and we need more AI tools designed for optimizing learning ramps.
Compilers are deterministic and have predictable, dependable behavior. And people can and do still write lower level code when it matters, because they understand when it matters.
Yes, exactly. I'm having a frustrating time reminding senior teachers of this, people with authority who should really know better. There seems to be some delusion that this technology will somehow change how people learn in a fundamental way.
It doesn't PREVENT them from learning anything - said properly, it lets developers become lazy and miss important learning opportunities. That's not AI's fault.
I'm taking CS in college right now, and when we do our projects we're required to have an editor plugin that records every change made. That way when they grade it, they see how the code evolved over time, and not just the final product. Copying and pasting has very distinct editor patterns, whereas organically developed code tends to morph over time.
I looked to see if BYU had made the source code available, but it doesn't look like they've published it. It's called code recorder, and before we do an assignment we have to enable recording. It generates a .json file that lists every single edit made in terms of a textual diff. They must have some sort of tool that reconstructs it when they grade. Sorry I don't know more!
Edit: I expect it wouldn't be super hard to create though, you'd just have to hook into the editor's change event, probably compute the diff to make sure you don't lose anything, and then append it to the end of the json.
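For what it's worth, here's a minimal sketch of that idea, not tied to any particular editor: a function an editor plugin could call from its buffer-change event, which appends a timestamped textual diff to a log (using JSON lines for simplicity rather than one growing array; all names are just illustrative, this isn't BYU's actual tool).

    import difflib
    import json
    import time

    def record_edit(prev_text: str, new_text: str, log_path: str = "code_recording.jsonl") -> None:
        """Append a timestamped textual diff of a single edit to a JSON-lines log.

        An editor plugin would call this from its buffer-change event; replaying
        the log in order reconstructs how the file evolved over time.
        """
        diff = list(difflib.unified_diff(
            prev_text.splitlines(),
            new_text.splitlines(),
            lineterm="",
        ))
        entry = {"timestamp": time.time(), "diff": diff}
        with open(log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

A large paste shows up as one huge hunk, while organically written code shows up as many small ones, which is presumably what the graders look for.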
I think it's fair for the projects, since when you first write code you're learning to think like a computer. Their AI policy is it's fine to ask it questions and have it explain concepts, but the project assignments need to be done without AI.
The one requirement I think is dumb, though, is that we're not allowed to use the language's documentation for the final project, which makes no sense. Especially since my Python is rusty.
Since you mentioned failure to figure out what better teaching methods are, I feel it's my sworn duty to put a plug for https://dynamicland.org and https://folk.computer, if you haven't heard about them :)
If I was a prof, I would make it clear to the students that they won't learn to program if they use AI to do it for them. For the students who wanted to learn, great! For those who just wanted to slide through with AI, I wouldn't care about them.
In-person analog checkpoints seem to be the most effective method. Think internet-disabled PCs managed by the school, written exams, oral exams, and so forth.
Making students fix LLM-generated code until they're at their wits' end is a fun idea. Though it likely carries too high of an opportunity cost education-wise.
Completely disagree. It’s like telling typists that they need to hand write to truly understand their craft. Syntax is just a way of communicating a concept to the machine. We now have a new (and admittedly imperfect) way of doing that. New skills are going to be required. Computer science is going to have to adapt.
I'm an external examiner for CS students in Denmark and I disagree with you. What we need in the industry is software engineers who can think for themselves, can interact with the business and understand its needs, and who know how computers work. What we get are mass-produced coders who have been taught some outdated way of designing and building software that we need to hammer out of them. I don't particularly care if people can write code like they work at the assembly line. I care that they can identify bottlenecks and solve them. That they can deliver business value quickly. That they will know when to do abstractions (which is almost never). Hell, I'd even like developers who will know when the code quality doesn't matter because shitty code will cost $2 a year but every hour they spend on it is $100-200.
Your curriculum may be different than it is around here, but here it's frankly the same stuff I was taught 30 years ago. Except most of the actual computer science parts are gone, replaced with even more OOP, design pattern bullshit.
That being said, I have no idea how you'd actually go about teaching students CS these days, considering a lot of them will probably use ChatGPT or Claude regardless of what you do. That is what I see in the grade statistics around here. For the first 9 years I was a well-calibrated grader, but this past year and a half or so it's usually either top marks or bottom marks with nothing in between. Which puts me outside where I should be, but it matches the statistical calibration for everyone here. I obviously only see the product of CS educations, but even though I'm old, I can imagine how many corners I would have cut myself if I'd had LLMs available back then. Not to mention all the distractions the internet has brought.
> I don't particularily care if people can write code like they work at the assembly line. I care [...] That they can deliver business value quickly.
In my experience, people who talk about business value expect people to code like they work at the assembly line. Churn out features, no disturbances, no worrying about code quality, abstractions, bla bla.
To me, your comment reads contradictory. You want initiative, and you also don't want initiative. I presume you want it when it's good and don't want it when it's bad, and if possible the people should be clairvoyant and see the future so they can tell which is which.
I think we very often confuse engineers with scientists in this field. Think of the old joke: “anyone can build a bridge, it takes an Engineer to build one that barely stands”. Business value and the goal of engineering is to make a bridge that is fast to build, cheap to make, and stays standing exactly as long as it needs to. This is very different from the goals of science which are to test the absolute limits of known performance.
What I read from GP is that they’re looking for engineering innovation, not new science. I don’t see it as contradictory at all.
> You want initiative, and you also don't want initiative. I presume you want it when it's good and don't want it when it's bad, and if possible the people should be clairvoyant and see the future so they can tell which is which.
The word you’re looking for is skill. He wants devs to be skilled. I wouldn’t have thought that to be controversial, but HN never ceases to amaze.
You should worry about code quality, but you should also worry about the return on investment.
That includes understanding risk management and knowing what the risks and costs are of failures vs. the costs of delivering higher quality.
Engineering is about making the right tradeoffs given the constraints set, not about building the best possible product separate from the constraints.
Sometimes those constraints require extreme quality, because they include things like "this should never, ever fail", but most of the time they do not.
Some of our code is of high quality. Other code can be of any quality, as it'll never need to be altered in its lifecycle. If we have 20000 financial reports which need to be uploaded once, and then it'll never happen again, it really doesn't matter how terrible the code is as long as it only uses vetted external dependencies. The only reason you'd even use developer time on that task is because it's less error-prone than having student interns do it manually... I mean, I wish I could tell you it was to save them from a terrible task, but it'll solely be because of money.
If it's firmware for a solar inverter in Poland, then quality matters.
> people who talk about business value expect people to code like they work at the assembly line. Churn out features, no disturbances, no worrying about code quality, abstractions, bla bla.
That's a typical misconception that "I'm an artist, let me rewrite it in Rust" people often have. Code quality has a direct money equivalent, you just need to be able to justify it to the people that pay your salary.
> That being said. I have no idea how you'd actually go about teaching students CS these days, considering a lot of them will probably use ChatGPT or Claude regardless of what you do.
My son is in a CS school in France. They have finals with pen and paper, with no computer whatsoever during the exam; if they can't do that they fail. And these aren't multiple choice questions, but actual code that they have to write.
I had to do that too, in Norway. Writing C++ code with pen and paper and being told even trivial syntax errors like missing semicolons would be penalised was not fun.
This was 30 years ago, though - no idea what it is like now. It didn't feel very meaningful even then.
But there's a vast chasm between that and letting people use AI in an exam setting. Some middle ground would be nice.
I wrote assembler on pages of paper. Then I used tables, and a calculator for the two's-complement relative negative jumps, to manually translate it into hex code. Then I had software to type in such hex dumps and save them to audio cassette, from which I could then load them for execution.
I did not have an assembler for my computer. I had a disassembler though- manually typed it in from a computer magazine hex dump, and saved it on an audio cassette. With the disassembler I could check if I had translated everything correctly into hex, including the relative jumps.
The planning required to write programs on sheets of paper was very helpful. I felt I got a lot dumber once I had a PC and actual programmer software (e.g. Borland C++). I found I was sitting in front of an empty code file without a plan more often than not, and wrote code moment to moment, immediately compiling and test running.
The AI coding may actually not be so bad if it encourages people to start with high-level planning instead of jumping into the IDE right away.
Now if only you had read to the end of my comment, to recognize that I was setting up for something, and also applied not just one but several HN guidelines (https://news.ycombinator.com/newsguidelines.html, under "comments")...
Let them use AI and then fall on their faces during exam time - simple as that. If you can't recall the theory, paradigm, methodology, whatever by memory then you have not "mastered" the content and thus, should fail the class.
The only way to learn when abstractions are needed is to write code, hit a dead end, then try and abstract it. Over and over. With time, you will be able to start seeing these before you write code.
AI does not do abstractions well. From my experience, it completely fails to abstract anything unless you tell it to. Even when similar abstractions are already present. If you never learn when an abstraction is needed, how can you guide an AI to do the same well?
> Hell, I'd even like developers who will know when the code quality doesn't matter because shitty code will cost $2 a year but every hour they spend on it is $100-200.
> Except most of the actual computer science parts are gone, replaced with even more OOP, design pattern bullshit.
Maybe you should consider a different career; you sound pretty burnt out. These are terrible takes, especially for someone who is supposed to be fostering the next generation of developers.
I don't foster the next generations. I hire them. External examiners are people in the industry who are used as examiners to try and match educations with the needs of the industry.
It can take some people a few years to get over OOP, in the same way that some kids still believe in Santa a bit longer. Keep at it though and you’ll make it there eventually too.
I feel like I'm taking crazy pills. The article starts with:
> you give it a simple task. You’re impressed. So you give it a large task. You’re even more impressed.
That has _never_ been the story for me. I've tried, and I've got some good pointers and hints where to go and what to try, a result of LLM's extensive if shallow reading, but in the sense of concrete problem solving or code/script writing, I'm _always_ disappointed. I've never gotten satisfactory code/script result from them without a tremendous amount of pushback, "do this part again with ...", do that, don't do that.
Maybe I'm just a crank with too many preferences. But I hardly believe so. The minimum requirement should be for the code to work. It often doesn't. Feedback helps, right. But if you've got a problem where a simple, contained feedback loop isn't that easy to build, the only source of feedback is yourself. And that's when you are exposed to the stupidity of current AI models.
I usually do most of the engineering and it works great for writing the code. I’ll say:
> There should be a TaskManager that stores Task objects in a sorted set, with the deadline as the sort key. There should be methods to add a task and pop the current top task. The TaskManager owns the memory when the Task is in the sorted set, and the caller to pop should own it after it is popped. To enforce this, the caller to pop must pass in an allocator and will receive a copy of the Task. The Task will be freed from the sorted set after the pop.
> The payload of the Task should be an object carrying a pointer to a context and a pointer to a function that takes this context as an argument.
> Update the tests and make sure they pass before completing. The test scenarios should relate to the use-case domain of this project, which is home automation (see the readme and nearby tests).
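For illustration, this is roughly the shape of code such a prompt describes: a minimal Python sketch with heapq standing in for the sorted set. The allocator/ownership details in the prompt are language-specific and have no Python equivalent, so they're only noted in a comment; all names here are just illustrative, not anything the original prompt produced.

    import heapq
    from dataclasses import dataclass, field
    from typing import Any, Callable

    @dataclass(order=True)
    class Task:
        deadline: float
        # The payload: a context object plus a function that takes that context.
        context: Any = field(compare=False)
        func: Callable[[Any], None] = field(compare=False)

    class TaskManager:
        """Stores Tasks in a heap ordered by deadline; pop returns the most urgent one."""

        def __init__(self) -> None:
            self._heap: list[Task] = []

        def add(self, task: Task) -> None:
            heapq.heappush(self._heap, task)

        def pop(self) -> Task:
            # The ownership transfer in the prompt (caller passes an allocator and
            # receives a copy) is a memory-management concern with no direct Python
            # analogue; here the Task is simply removed and handed to the caller.
            return heapq.heappop(self._heap)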
Yeah, I feel like I get really good results from AI, and this is very much how I prompt as well. It just takes care of writing the code, making sure to update everything that is touched by that code guided by linters and type-checkers, but it's always executing my architecture and algorithm, and I spend time carefully trying to understand the problem before I even begin.
But this is what I don't get. Writing code is not that hard. If the act of physically typing my code out is a bottleneck to my process, I am doing something wrong. Either I've under-abstracted, or over-abstracted, or flat out have the wrong abstractions. It's time to sit back and figure out why there's a mismatch with the problem domain and come back at it from another direction.
To me this reads like people have learned to put up with poor abstractions for so long that having the LLM take care of it feels like an improvement? It's the classic C++ vs Lisp discussion all over again, but people forgot the old lessons.
It's not that hard, but it's not that easy. If it was easy, everyone would be doing it. I'm a journalist who learned to code because it helped me do some stories that I wouldn't have done otherwise.
But I don't like to type out the code. It's just no fun to me to deal with what seem to me arbitrary syntax choices made by someone decades ago, or to learn new jargon for each language/tool (even though other languages/tools already have jargon for the exact same thing), or to wade through someone's undocumented code to understand how to use an imported function. If I had a choice, I'd rather learn a new human language than a programming one.
I think people like me, who (used to) code out of necessity but don't get much gratification out of it, are one of the primary targets of vibe coding.
I'm pretty damn sure the parent, by saying "writing code", meant the physical act of pushing down buttons to produce text, not the problem-solving process that precedes writing said code.
I think of it more like moving from sole developer to a small team lead. Which I have experienced in my career a few times.
I still write my code in all the places I care about, but I don’t get stuck on “looking up how to enable websockets when creating the listener before I even pass anything to hyper.”
I do not care to spend hours or days to know that API detail from personal pain, because it is hyper-specific, in both senses of hyper-specific.
(For posterity, it’s `with_upgrades`… thanks chatgpt circa 12 months ago!)
I haven't tried it, but someone at work suggested using voice input for this because it's so much easier to add details and constraints. I can certainly believe it, but I hate voice interfaces, especially if I'm in an open space setting.
You don't even have to be as organised as in the example, LLMs are pretty good at making something out of ramblings.
The more accurate prompt would be “You are a mind reader. Create me a plan to create a task manager, define the requirements, deploy it, and tell me when it’s done.”
And then you just rm -rf and repeat until something half works.
"Here are login details to my hosting and billing provider. Create me a SaaS app where customers could rent virtual pets. Ensure it's AI and blockchain and looks inviting and employ addictive UX. I've attached company details for T&C and stuff. Ensure I start earning serious money by next week. I'll bump my subscription then if you deliver, and if not I will delete my account. Go!"
This is a good start. I write prompts as if I was instructing junior developer to do stuff I need. I make it as detailed and clear as I can.
I actually don't like _writing_ code, but enjoy reading it. So sessions with LLM are very entertaining, especially when I want to push boundaries (I am not liking this, the code seems a little bit bloated. I am sure you could simplify X and Y. Also think of any alternative way that you reckon will be more performant that maybe I don't know about). Etc.
This doesn't save me time, but makes work so much more enjoyable.
> I actually don't like _writing_ code, but enjoy reading it.
I think this is one of the divides between people who like AI and people who don't. I don't mind writing code per se, but I really don't like text editing — and I've used Vim (Evil mode) and then Emacs (vanilla keybindings) for years, so it's not like I'm using bad tools; it's just too fiddly. I don't like moving text around; munging control structures from one shape to another; I don't like the busy work of copying and pasting code that isn't worth DRYing, or isn't capable of being DRY'd effectively; I hate going around and fixing all the little compiler and linter errors produced by a refactor manually; and I really hate the process of filling out the skeleton of an type/class/whatever architecture in a new file before getting to the meat.
However, reading code is pretty easy for me, and I'm very good at quickly putting algorithms and architectures I have in my head into words — and, to be honest, I often find this clarifies the high level idea more than writing the code for it, because I don't get lost in the forest — and I also really enjoy taking something that isn't quite good enough, that's maybe 80% of the way there, and doing the careful polishing and refactoring necessary to get it to 100%.
I don't want to be "that guy", but I'll indulge myself.
> I think this is one of the divides between people who like AI and people who don't. I don't mind writing code per se, but I really don't like text editing — and I've used Vim (Evil mode) and then Emacs (vanilla keybindings) for years, so it's not like I'm using bad tools; it's just too fiddly.
I feel the same way (to at least some extent) about every language I've used other than Lisp. Lisp + Paredit in Emacs is the most pleasant code-wrangling experience I've ever had, because rather having to think in terms of characters or words, I'm able to think in terms of expressions. This is possible with other languages thanks to technologies like Tree-sitter, but I've found that it's only possible to do reliably in Lisp. When I do it in any other language I don't have an unshakable confidence that the wrangling commands will do exactly what I intend.
Yes! Don't worry about it, I very much agree. However, I do think that even if/when I'm using Lisp and have all the best structural editing capabilities at my disposal, I'd still prefer to have an agent do my editing for me; I'd just be 30% more likely to jump in and write code myself on occasion — because ultimately, even with structural editing, you're still thinking about how to apply this constrained set of operations to manipulate a tree of code to get it to where you want, and then having to go through the grunt work of actually doing that, instead of thinking about what state you want the code to be in directly.
Vehement agreeing below:
S-expressions are a massive boon for text editing, because they allow such incredible structural transformations and motions. The problem is that, personally, I don't actually find Lisp to be the best tool for the job for any of the things I want to do. While I find Common Lisp and to a lesser degree Scheme to be fascinating languages, the state of the library ecosystem, documentation, toolchain, and IDEs around them just isn't satisfactory to me, and they don't seem really well adapted to the things I want to do. And yeah, I could spend my time optimizing Common Lisp with `declare`s and doing C-FFI with it, massaging it to do what I want, but that's not what I want to spend my time doing. I want to actually finish writing tools that are useful to me.
Moreover, while I used to have hope for tree-sitter to provide a similar level of structural editing for other languages, at least in most editors I've just not found that to be the case. There seem really to be two ways to use tree-sitter to add structural editing to languages: one, to write custom queries for every language, in order to get Vim style syntax objects, and two, to try to directly move/select/manipulate all nodes in the concrete syntax tree as if they're the same, essentially trying to treat tree-sitter's CSTs like S-expressions.
The problem with the first approach is that you end up with really limited, often buggy or incomplete, language support, and structural editing that requires a lot more cognitive overhead: instead of navigating a tree fluidly, you're having to "think before you act," deciding ahead of time what the specific name, in this language, is for the part of the tree you want to manipulate. Additionally, this approach makes it much more difficult to do more high level, interesting transformations; even simple ones like slurp and barf become a bit problematic when you're dealing with such a typed tree, and more advanced ones like convolute? Forget about it.
The problem with the second approach is that, if you're trying to do generalized tree navigation, where you're not up-front naming the specific thing you're talking about, but instead navigating the concrete syntax tree as if it's S-expressions, you run into the problem the author of Combobulate and Mastering Emacs talks about[1]: CSTs are actually really different from S-expressions in practice, because they don't map uniquely onto source code text; instead, they're something overlaid on top of the source code text, which is not one to one with it (in terms of CST nodes to text token), but many to one, because the CST is very granular. Which means that there's a lot of ambiguity in trying to understand where the user is in the tree, where they think they are, and where they intend to go.
There's also the fact that tree-sitter CSTs contain a lot of unnamed nodes (what I call "stop tokens"), where the delimiters for a node of a tree and its children are themselves children of that node, siblings with the actual siblings. And to add insult to injury, most language syntaxes just... don't really lend themselves to tree navigation and transformation very well.
I actually tried to bring structural editing to a level equivalent to the S-exp commands in Emacs recently[2], but ran into all of the above problems. I recently moved to Zed, and while its implementation of structural editing and movement is better than mine, and pretty close to 1:1 with the commands available in Emacs (especially if they accept my PR[3]), and also takes the second, language-agnostic, route, it's still not as intuitive and reliable as I'd like.
When I code, I mostly go by two perspectives: The software as a process and the code as a communication medium.
With the software as a process, I'm mostly thinking about the semantics of each expression. Either there's a final output (transient, but important) or there's a mutation to some state. So the code I'm writing is for making either one possible, and the process is very pleasing, like building with Lego. The symbols are the bricks and other pieces I'm using to create things that do what I want.
With the code as communication, I mostly take the above and make it readable. Like organizing files, renaming variables and functions, modularising pieces of code. The intent is for other people (including future me) to be able to understand and modify what I created in the easiest way possible.
So the first is me communicating with the machine, the second is me communicating with the humans. The first is very easy, you only need to know the semantics of the building blocks of the machine. The second is where the craft comes in.
Emacs (also Vim) makes both easy. Code has a very rigid structure, and both have tools that let you manipulate that structure, either for adding new actions or for refining the shape for understanding.
With AI, it feels like painting with a brick. Or transmitting critical information through a telephone game. Control and Intent are lost.
This is similar to how I prompt, except I start with a text file and design the solution and paste it in to an LLM after I have read it a few times. Otherwise, if I type directly in to the LLM and make a mistake it tends to come back and haunt me later.
I think it’s usage patterns. It is you in a sense.
You can’t deny that someone like Ryan Dahl, the creator of Node.js, declaring that he no longer writes code is objectively contrary to your own experience. Something is different.
I think you and other deniers try one prompt, see the issues, and stop.
Programming with AI is like tutoring a child. You teach the child, tell it where it made mistakes and you keep iterating and monitoring the child until it makes what you want. The first output is almost always not what you want. It is the feedback loop between you and the AI that cohesively creates something better than each individual aspect of the human-AI partnership.
My personal suspicion is that the detractors value process and implementation details much more highly than results. That would not surprise me if you come from a business that is paid for its labor inputs and is focused on keeping a large team billable for as long as possible. But I think hackers and garage coders see the value of “vibing” as they are more likely to be the type of people who just want results and view all effort as margin erosion rather than the goal unto itself.
The only thing I would change about what you said is, I don’t see it as a child that needs tutoring. It feels like I’m outsourcing development to an offshore consultancy where we have no common understanding, except the literal meaning of words. I find that there are very, very many problems that are suited well enough to this arrangement.
In real Engineering disciplines the process is important, and is critical for achieving desired results; that's why there are manuals and guidelines measured in the hundreds of pages for things like driving a pile into dirt. There are rigorous testing procedures to ensure everything is correct and up to spec, because there are real consequences.
Software Developers have long been completely disconnected from the consequences of their work, and tech companies have diluted responsibility so much that working software doesn't matter anymore. This field is now mostly scams and bullshit, where developers are closer to finance bros than real, actual Engineers.
I'm not talking about what someone is building in their home for personal reasons for their own usage, but about giving the same thing to other people.
My 2c: there is a divide, unacknowledged, between developers that care about "code correctness" (or any other quality/science/whatever adjective you like) and those who care about the whole system they are creating.
I care about making stuff. "Making stuff" means stuff that I can use. I care about code quality yes, but not to an obsessive degree of "I hate my framework's ORM because of <obscure reason nobody cares about>". So, vibe coding is great, because I know enough to guide the agent away from issues or describe how I want the code to look or be changed.
This gets me to my desired effect of "making stuff" much faster, which is why I like it.
My other 2c: There are Engineers who are concerned by the long-term consequences of their work e.g. maintainability.
In real engineering disciplines, the Engineer is accountable for their work. If a bridge you signed off collapses, you're accountable and if it turns out you were negligent you'll face jail time. In Software, that might be a program in a car.
The Engineering mindset embodies these principles regardless of regulatory constraints. The Engineer needs to keep in mind those who'll be using their constructions. With agentic vibecoding, I can never get confident that the resulting software will behave according to specs. I'm worried that it'll screw over the user, the client, and all stakeholders. I can't accept half-assed work just because it saved me 2 days of typing.
I don't make stuff just for the sake of making stuff otherwise it would just be a hobby, and in my hobbies I don't need to care about anything, but I can't in good conscience push shit and slop down other people's throats.
> Programming with AI is like tutoring a child. You teach the child, tell it where it made mistakes and you keep iterating and monitoring the child until it makes what you want.
Who are you people who spend so much time writing code that this is a significant productivity boost?
I'm imagining doing this with an actual child and how long it would take for me to get a real return on investment at my job. Nevermind that the limited amount of time I get to spend writing code is probably the highlight of my job and I'd be effectively replacing that with more code reviews.
And maybe child is too simplistic of an analogy. It's more like working with a savant.
The type of thing you can tell AI to do is like this: You tell it to code a website... it does it, but you don't like the pattern.
Say, "use functional programming", "use camel-case" don't use this pattern, don't use that. And then it does it. You can leave it in the agent file and those instructions become burned into it forever.
A better way to put it is with this example: I put my symptoms into ChatGPT and it gives some generic info with a massive "not-medical-advice" boilerplate and refuses to give specific recommendations. My wife (an NP) puts in anonymous medical questions and gets highly specific med terminology heavy guidance.
That's all to say, the learning curve with LLMs is learning how to say things a specific way to reliably get an outcome.
I recently inherited a web project over a decade old, full of EOL'd libraries and OS packages, that desperately needed to be modernized.
Within 3 hours I had a working test suite with 80% code coverage on core business functionality (~300 tests). Now - maybe the tests aren't the best designs given there is no way I could review that many tests in 3 hours, but I know empirically that they cover a majority of the code of the core logic. We can now incrementally upgrade the project and have at least some kind of basic check along the way.
There's no way I could have pieced together as large of a working test suite using tech of that era in even double that time.
... Yeah, those tests are probably garbage. The models probably covered the 80% that consists of boilerplate and mocked out the important 20% that was critical business logic. That's how it was in my experience.
> maybe the tests aren't the best designs given there is no way I could review that many tests in 3 hours,
If you haven't reviewed and signed off then you have to assume that the stuff is garbage.
This is the crux of using AI to create anything and it has been a core rule of development for many years that you don't use wizards unless you understand what they are doing.
You know they cause a majority of the code of the core logic to execute, right? Are you sure the tests actually check that those bits of logic are doing the right thing? I've had Claude et al. write me plenty of tests that exercise things and then explicitly swallow errors and pass.
Yes, the first hour or so was spent fidgeting with test creation. It started out doing its usual whacky behavior, like checking the existence of a method and calling that a "pass", creating a mock object that mocked the return result of the logic it was supposed to be testing, and (my favorite) copying the logic out of the code and putting it directly into the test. Lots of course correction, but once I had one well-written test that I had fully proofed myself, I just provided it that test as an example and it did a pretty good job following those patterns for the remainder.
I still sniffed out all the output for LLM whackiness though. Using a code coverage tool also helps a lot.
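To make the failure mode concrete, here's a small Python/pytest sketch of the difference: a "test" that merely exercises the code and swallows errors versus one that actually pins down behaviour. The function and test names are hypothetical, just to illustrate the pattern.

    import pytest

    # Hypothetical function under test.
    def apply_discount(price: float, percent: float) -> float:
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        return round(price * (1 - percent / 100), 2)

    # The kind of "test" an LLM sometimes produces: it runs the code,
    # swallows any error, and passes no matter what the function returns.
    def test_apply_discount_fake():
        try:
            apply_discount(100.0, 10.0)
        except Exception:
            pass
        assert callable(apply_discount)  # only checks the function exists

    # A real test: it asserts on observable behaviour, including the error path.
    def test_apply_discount_real():
        assert apply_discount(100.0, 10.0) == 90.0
        assert apply_discount(19.99, 0.0) == 19.99
        with pytest.raises(ValueError):
            apply_discount(100.0, 150.0)

Both versions light up the same lines in a coverage report, which is exactly why coverage alone can't tell you the suite is meaningful.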
These people are just the same charlatans and scammers you saw in the web3 sphere. Invoking Ryan Dahl as some sort of authority figure and not a tragic figure that sold his soul to VC companies is even more pathetic.
Nah, I'm with you there. I've yet to see even Opus 4.5 produce something close to production-ready -- in fact Opus seems like quite a major defect factory, given its consistent tendency toward hardcoding case by case workarounds for issues caused by its own bad design choices.
I think uncritical AI enthusiasts are just essentially making the bet that the rising mountains of tech debt they are leaving in their wake can be paid off later on with yet more AI. And you know, that might even work out. Until such a time, though, and as things currently stand, I struggle to understand how one can view raw LLM code and find it acceptable by any professional standard.
Working code doesn’t mean the same for everyone. My coworker just started vibe coding. Her code works… on happy paths. It absolutely doesn’t work when any kind of error happens. It’s also absolutely impossible to refactor it in any way. She thinks her code works.
The same coworker asked to update a service to Spring Boot 4. She made a blog post about it. She used an LLM for it. So far every point which I read was a lie, and her workarounds make, for example, the tests unnecessarily less readable.
So yeah, “it works”, until it doesn’t, and then it hits you that you end up doing more work in total, because there are more obscure bugs, and fixing them is more difficult because of the terrible readability.
I can't help but think of my earliest days of coding, 20ish years ago, when I would post my code online looking for help on a small thing, and being told that my code is garbage and doesn't work at all even if it actually is working.
There are many ways to skin a cat, and in programming the happens-in-a-digital-space aspect removes seemingly all boundaries, leading to fractal ways to "skin a cat".
A lot of programmers have hard heads and know the right way to do something. These are the same guys who criticized every other senior dev as being a bad/weak coder long before LLMs were around.
Parent's profile shows that they are an experienced software engineer in multiple areas of software development.
Your own profile says you are a PM whose software skills amount to "Script kiddie at best but love hacking things together."
It seems like the "separate worlds" you are describing is the impression of reviewing the code base from a seasoned engineer vs an amateur. It shouldn't be even a little surprising that your impression of the result is that the code is much better looking than the impression of a more experienced developer.
At least in my experience, learning to quickly read a code base is one of the later skills a software engineer develops. Generally only very experienced engineers can dive into an open source code base to answer questions about how the library works and is used (typically, most engineers need documentation to aid them in this process).
I mean, I've dabbled in home plumbing quite a bit, but if AI instructed me to repair my pipes and I thought it "looked great!" but an experienced plumber's response was "ugh, this doesn't look good to me, lots of issues here" I wouldn't argue there are "two separate worlds".
> It shouldn't be even a little surprising that your impression of the result is that the code is much better looking than the impression of a more experienced developer.
This really is it: AI produces bad to mediocre code. To someone who produces terrible code mediocre is an upgrade, but to someone who produces good to excellent code, mediocre is a downgrade.
Today. It produces mediocre code today. That is really it. What is the quality of that code compared to 1 year ago. What will it be in 1 year? Opus 6.5 is inevitable.
That's what they've been saying for years now. Seems like the same FSD marketing. Any day now it'll be driving across the country! Just you wait! -> Any day now it'll be replacing software developers! Just you wait! Frankly, the same people who fell for the former are falling for the latter.
Rather, to me it looks like all we're getting with additional time is marginal returns. What'll it be in 1 year? Marginally better than today, just like today is marginally better compared to a year ago. The exponential gains in performance are already over. What we're looking at now is exponentially more work for linear gains in performance.
Except I work with extremely competent software engineers on software used in mission-critical applications in the Fortune 500. I call myself a script kiddie because I did not study Computer Science. Am I green in the test run? Does it pass load tests? Is it making money? While some of y'all are worried about leaky abstractions, we just closed another client. Two worlds for sure, where one team is skating to the puck, looking to raise cattle, while another wants to continue nurturing an exotic pet.
Plenty of respect to the craft of code, but the AI of today is the worst it is ever going to be.
Can you just clarify the claim you're making here: you personally are shipping vibe coded features, as a PM, that makes it into prod and this prod feature that you're building is largely vibe coded?
That's a significant rub with LLMs, particularly hosted ones: the variability. Add in quantization, speculative decoding, and dynamic adjustment of temperature, nucleus sampling, attention head count, & skipped layers at runtime, and you can get wildly different behaviors with even the same prompt and context sent to the same model endpoint a couple hours apart.
That's all before you even get to all of the other quirks with LLMs.
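On the client side you can at least pin down the sampling knobs you do control. A minimal sketch with the OpenAI Python SDK, assuming a recent (v1+) version; the model name is illustrative, and even "seed" is only best-effort, so this reduces rather than eliminates the run-to-run variance described above.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def ask(prompt: str) -> str:
        """Request a completion with sampling pinned down as far as the API allows."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model name
            messages=[{"role": "user", "content": prompt}],
            temperature=0,        # greedy-ish decoding, less run-to-run variance
            top_p=1,              # no nucleus-sampling truncation
            seed=42,              # best-effort determinism, not guaranteed
        )
        return response.choices[0].message.content

    # Even with these settings, hosted models can still drift between calls,
    # since quantization, batching, and backend changes are outside the client's control.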
It depends heavily on the scope and type of problem. If you're putting together a standard isolated TypeScript app from scratch it can do wonders, but many large systems are spread between multiple services, use abstractions unique to the project, and are generally dealing with far stricter requirements. I couldn't depend on Claude to do some of the stuff I'd really want, like refactor the shared code between six massive files without breaking tests. The space I can still have it work productively in is still fairly limited.
The secret sauce for me is Beads. Once Beads is set up, you make the tasks and refine them, and by the end each task is a very detailed prompt. I have Claude ask me clarifying questions, do research on best practices, etc.
Because of Beads I can have Claude do a code review for serious bugs and issues and sure enough it finds some interesting things I overlooked.
I have also seen my peers in the reverse engineering field make breakthroughs emulating runtimes that have no or limited existing runtimes, all from the ground up mind you.
I think the key is thinking of yourself as an architect / mentor for a capable and promising Junior developer.
I've found that the thing that made it really click for me was having reusable rules (each agent accepts these differently) that help tell it the patterns and structure you want.
I have ones that describe what kinds of functions get unit vs integration tests, how to structure them, and the general kinds of test cases to check for (they love writing way too many tests IME). It has reduced the back and forth I have with the LLM telling it to correct something.
Usually the first time it does something I don't like, I have it correct it. Once it's in a satisfactory state, I tell it to write a Cursor rule describing the situation BRIEFLY (it gets way too verbose by default) and how to structure things.
That has made writing LLM code so much more enjoyable for me.
It's really becoming a good litmus test of someone's coding ability whether they think LLMs can do well on complex tasks.
For example, someone may ask an LLM to write a simple HTTP web server, and it can do that fine, and they consider that complex, when in reality it's really not.
It’s not. There are tons of great programmers, big names in the industry, who now exclusively vibe code. Many of these names are obviously intelligent and great programmers.
People use "vibe coding" to mean different things - some mean the original Karpathy "look ma, no hands!", feel the vibez, thing, and some just (confusingly) use "vibe coding" to refer to any use of AI to write code, including treating it as a tool to write small well-defined parts that you have specified, as opposed to treating it as a magic genie.
There also seem to be people hearing big names like Karpathy and Linus Torvalds say they are vibe coding on their hobby projects, meaning who knows what, and misunderstanding this as being an endorsement of "magic genie" creation of professional quality software.
Results of course also vary according to how well what you are asking the AI to do matches what it was trained on. Despite sometimes feeling like it, it is not a magic genie - it is a predictor that is essentially trying to best match your input prompt (maybe a program specification) to pieces of what it was trained on. If there is no good match, then it'll have a go anyway, and this is where things tend to fall apart.
Funny, the last interview I watched with Karpathy he highlighted the way the AI/LLM was unable to think in a way that aligned with his codebase. He described vibe-coding a transition from Python to Rust but specifically called out that he hand-coded all of the python code due to weaknesses in LLM's ability to handle performant code. I'm pretty sure this was the last Dwarkesh interview with "LLMs as ghosts".
Right, and he also very recently said that he felt essentially left behind by AI coding advances, thinking that his productivity could be 10x if he knew how to use it better.
It seems clear that Karpathy himself is well aware of the difference between "vibe coding" as he defined it (which he explicitly said was for playing with on hobby projects), and more controlled productive use of AI for coding, which has either eluded him, or maybe his expectations are too high and (although it would be surprising) he has not realized the difference between the types of application where people are finding it useful, and use cases like his own that do not play to its strength.
I don't think he meant to start a movement - it was more of a throw-away tweet that people took way too seriously, although maybe with his bully pulpit he should have realized that would happen.
You don't have to be bad at coding to use LLMs. The argument was specifically about thinking that LLMs can be great at accomplishing complex tasks (which they are not).
They are more effective than that; the on-the-ground, in-your-face evidence gets dismissed largely because people who are so against AI are blind to it.
I hold a result of AI in front of their faces and they still proclaim it’s garbage and everything else is fraudulent.
Let’s be clear. You’re arguing against a fantasy. Nobody, not even proponents of AI, claims that AI is as good as humans. Nowhere near it. But they are good enough for pair programming. That is indisputable. Yet we have tons of people like you who stare at reality and deny it and call it fraudulent.
Examine the lay of the land: if that many people are so divided, it really means both perspectives are correct in a way.
If you want to be any good at all in this industry, you have to develop enough technical skills to evaluate claims for yourself. You have to. It's essential.
Because the dirty secret is a lot of successful people aren't actually smart or talented, they just got lucky. Or they aren't successful at all, they're just good at pretending they are, either through taking credit for other people's work or flat out lying.
I've run into more than a few startups that are just flat out lying about their capabilities and several that were outright fraud. (See DoNotPay for a recent fraud example lol)
Pointing to anyone and going "well THEY do it, it MUST work" is frankly engineering malpractice. It might work. But unless you have the chops to verify it for yourself, you're just asking to be conned.
I think the author is way understating the uselessness of LLMs in any serious context outside of a demo to an investor. I've had nothing but low IQ nonsense from every SOTA model.
If we're being honest with ourselves, Opus 4.5 / GPT 5.2 etc are maybe 10-20% better than GPT 3.5 at most. It's a total and absolute catastrophic failure that will go down in history as one of humanity's biggest mistakes.
> Feedback helps, right. But if you've got a problem where a simple, contained feedback loop isn't that easy to build, the only source of feedback is yourself. And that's when you are exposed to the stupidity of current AI models.
That's exactly the point. Modern coding agents aren't smart software engineers per se; they're very very good goal-seekers whose unit of work is code. They need automatable feedback loops.
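A minimal sketch of what such an automatable feedback loop can look like: run the test suite, feed any failure output back to the agent, repeat. Here `ask_agent` is a hypothetical stand-in for whatever coding agent or API you drive; the point is only that the verdict comes from the tests, not from the model's self-assessment.

    import subprocess

    def run_tests() -> tuple[bool, str]:
        """Run the project's test suite and return (passed, combined output)."""
        proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        return proc.returncode == 0, proc.stdout + proc.stderr

    def fix_until_green(ask_agent, max_rounds: int = 5) -> bool:
        """Drive a (hypothetical) coding agent with test output until the tests pass."""
        for _ in range(max_rounds):
            passed, output = run_tests()
            if passed:
                return True
            # The agent is expected to edit the code in place based on the failures.
            ask_agent(f"The tests are failing with:\n{output}\nFix the code.")
        return False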
You're not taking crazy pills, this is my exact experience too. I've been using my wife's eCommerce shop (a headless Medusa instance, which has pretty good docs and even their own documentation LLM) as a 100% vibe-coded project using Claude Code, and it has been one comedy of errors after another. I can't tell you how many times I've had it go through the loop of Cart + Payment Collection link is broken -> Redeploy -> Webhook is broken (can't find payment collection) -> Redeploy -> Cart + Payment Collection link is broken -> Repeat. And it never seems to remember the reasons it had done something previously – despite it being plastered 8000 times across the CLAUDE.md file – so it bumbles into the same fuckups over and over again.
A complete exercise in frustration that has turned me off of all agentic code bullshit. The only reason I still have Claude Code installed is because I like the `/multi-commit` skill I made.
The other side of this coin are the non-developer stakeholders who Dunning-Kruger themselves into firm conclusions on technical subjects with LLMs. "Well I can code this up in an hour, two max. Why is it taking you ten hours?". I've (anecdotally) even had project sponsors approach me with an LLM's judgement on their working relationship with me as if it were gospel like "It said that we aren't on the same page. We need to get aligned." It gets weird.
These cases are common enough that it's more systemic than isolated.
I read these comments and articles and feel like I am completely disconnected from most people here. Why not use GenAI the way it actually works best: like autocomplete on steroids. You stay the architect, and you have it write code function by function. Don't show up in Claude Code or Codex asking it to "please write me GTA 6 with no mistakes or you go to jail, please."
It feels like a lot of people are using GenAI wrong.
It helps to write out the prompt in a separate text editor so you can edit it and try to describe what the input is and what output you want, as well as try to describe and catch likely or iteratively observed issues.
You try a gamut of sample inputs and observe where it's going awry. Describe the error to it and see what it does.
I am getting workable code with Claude on a 10kloc Typescript project. I ask it to make plans then execute them step by step. I have yet to try something larger, or something more obscure.
I feel like there is a nuance here. I use GitHub Copilot and Claude Code, and unless I tell it to not do anything, or explicitly enable a plan mode, the LLM will usually jump straight to file edits. This happens even if I prompt it with something as simple as "Remind me how loop variable scoping works in this language?".
This. I feel like folks are living in two separate worlds. You need to narrow the aperture and take the LLM through discrete steps. Are people just saying it doesn't work because they are pointing it at 1M LOC monoliths and trying to one-shot a giant epic?
it's all fake coverage, for fake tests, for fake OKRs
what are people actually getting done? I've sat next to our top evangelist for 30 minutes pair programming and he just fought the tool, saying something was wrong with the db, while showing off some UI I don't care about.
like that seems to be the real issue to me. i never bother wasting time with UI and just write a tool to get something done. but people seem impressed that AI did some shitty data binding to a data model that can't do anything, but it's pretty.
it feels weird being an avowed singularitarian but adamant that these tools suck now.
I have found AI great in a lot of scenarios, but if I have a specific workflow, then the answer is specific and the AI will get it wrong 100% of the time. You have a great point here.
A trivial example is your happy path git workflow. I want:
- pull main
- make new branch in user/feature format
- Commit, always sign with my ssh key
- push
- open pr
but it always will
- not sign commits
- not pull main
- not know to rebase if changes are in flight
- make a million unnecessary commits
- not squash when making a million unnecessary commits
- have no guardrails when pushing to main (oops!)
- add too many comments
- commit message too long
- spam the pr comment with hallucinated test plans
- incorrectly attribute itself as coauthor in some guerrilla marketing effort (fixable with config, but whyyyyyy -- also this isn't just annoying, it breaks compliance in a lot of places and fundamentally misunderstands the whole point of authorship, which is copyright --- and AIs can't own copyright)
- not make DCO compliant commits
...
Commit spam is particularly bad for bisect bug hunting and ref performance issues at scale. Sure I can enforce Squash and Merge on my repo but why am I relying on that if the AI is so smart?
All of these things are fixed with aliases / magit / cli usage, using the thing the way we have always done it.
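For instance, the whole happy path above fits in a tiny script; a minimal Python sketch, assuming commit signing is already configured in git (e.g. gpg.format ssh plus user.signingkey) and that the GitHub CLI `gh` is installed for opening the PR. Names and arguments are illustrative.

    import subprocess
    import sys

    def sh(*cmd: str) -> None:
        """Run a command, echo it, and stop on the first failure."""
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)

    def ship(user: str, feature: str, message: str) -> None:
        branch = f"{user}/{feature}"
        sh("git", "checkout", "main")
        sh("git", "pull", "--rebase", "origin", "main")
        sh("git", "checkout", "-b", branch)
        sh("git", "add", "-A")
        sh("git", "commit", "-S", "-m", message)  # -S uses whatever signing is configured
        sh("git", "push", "-u", "origin", branch)
        sh("gh", "pr", "create", "--fill")        # requires the GitHub CLI

    if __name__ == "__main__":
        ship(*sys.argv[1:4])

No hallucinated test plans, no surprise coauthors, and it behaves the same way every time.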
Because it's not? I use these things very extensively to great effect, and the idea that you'd think of it as "smart" is alien to me, and seems like it would hurt your ability to get much out of them.
Like, they're superhuman at breadth and speed and some other properties, but they don't make good decisions.
Just a supplementary fact: I'm in the beneficial position, compared to the AI, that in a case where it's hard to provide that automatic feedback loop, I can run and test the code at my discretion, whereas the AI model can't.
Yet. Most of my criticism is not after running the code, but after _reading_ the code. It wrote code. I read it. And I am not happy with it. No need to even run it; it's shit at a glance.
Yesterday I generated a for-home-use-only PHP app over the weekend with a popular CLI LLM product. The app met all my requirements, but the generated code was mixed. It correctly used a prepared query to avoid SQL injection. But then, instead of an obvious:
"SELECT * FROM table WHERE id=1;"
it gave me:
$result = $db->query("SELECT * FROM table;");
foreach ($result as $row)
    if ($row["id"] == 1)
        return $row;
With additional prompting I arrived at code I was comfortable deploying, but this kind of flaw cuts into the total time-savings.
Yeah, you're right, and the snark might be warranted. I should consider it the same as my stupid (but cute) robot vacuum cleaner that goes at random directions but gets the job done.
The thing that differentiates LLMs from my stupid but cute vacuum cleaner is that the AI model (at least OpenAI's) is cocksure and wrong, which is infinitely more infuriating than being a bit clueless and wrong.
I've been trying to solve this by wrapping the generation in a LangGraph loop. The hope was that an agent could catch the errors, but it seems to just compound the problem. You end up paying for ten API calls where the model confidently doubles down on the mistake, which gets expensive very quickly for no real gain.
You can play with the model for free in chat... but if $20 for a coding agent isn't effectively free for your use case, it might not be the right tool for you.
ETA: I've probably gotten 10k worth of junior dev time out of it this month.
You might get better code out of it if you give the AI some more restrictive handcuffs. Spin up a tester instance and have it tell the developer instance to try again until it's happy with the quality.
Skill comes from experience. It takes a good amount of working with these models to learn how to use them effectively, when to use them, and what to use them for. Otherwise, you end up hitting their limitations over and over and they just seem useless.
They're certainly not perfect, but many of the issues that people post about as though they're show-stoppers are easily resolved with the right tools and prompting.
Right. But "prompt" also covers a lot of ground, e.g. planning, tracking tasks, etc. The codex-style frameworks do a good amount of that for you, but it can still make a big difference to structure what you're asking the model to do and let it execute step by step.
A lot of the failures people talk about seem to involve expecting the models to one-shot fairly complex requirements.
I came to "vibe coding" with an open mind, but I'm slowly edging in the same direction.
It is hands down good for code which is laborious or tedious to write, but once done, obviously correct or incorrect (with low effort inspection). Tests help but only if the code comes out nicely structured.
I made plenty of tools like this: a replacement REPL for MS-SQL, a caching tool in Python, a matplotlib helper. Things that I know 90% how to write anyway but don't have the time for, and that, once in front of me, are obviously correct or incorrect. NP-style code, I suppose: hard to produce, easy to verify.
But business critical stuff is rarely like this, for me anyway. It is complex, has to deal with various subtle edge cases, be written defensively (so it fails predictably and gracefully), well structured etc. and try as I might, I can't get Claude to write stuff that's up to scratch in this department.
I'll give it instructions on how to write some specific function, it will write this code but not use it, and use something else instead. It will pepper the code with rookie mistakes like writing the same logic N times in different places instead of factoring it out. It will miss key parts of the spec and insist it did it, or tell me "Yea you are right! Let me rewrite it" and not actually fix the issue.
I also have a sense that it got a lot dumber over time. My expectations may have changed of course too, but still. I suspect even within a model, there is some variability of how much compute is used (eg how deep the beam search is) and supply/demand means this knob is continuously tuned down.
I still try to use Claude for tasks like this, but increasingly find my hit rate so low that the whole "don't write any code yet, let's build a spec" exercise is a waste of time.
I still find Claude good as a rubber duck or to discuss design or errors - a better Stack Exchange.
But you can't split your software spec into a set of SE questions then paste the code from top answers.
> It is hands down good for code which is laborious or tedious to write, but once done, obviously correct or incorrect (with low effort inspection).
The problem here is that it fills in gaps that shouldn't be there in the first place. Good code isn't laborious. Good code is small. We learn to avoid unnecessary abstractions. We learn to minimize "plumbing" such that the resulting code contains little more than clear and readable instructions of what you intend for the computer to do.
The perfect code is just as clear as the design document in describing the intentions, only using a computer language.
If someone is gaining super speeds by providing AI clear design documents compared to coding themselves, maybe they aren't coding the way they should.
The quote that I heard (I think on HN) was, "If we had AIs to write XML for us then we never would have invented json."
My biggest LLM success resulted in something operationally correct but was something that I would never want to try to modify. The LLM also had an increasingly difficult time adding features.
Meanwhile my biggest 'manual' successes have resulted in something that was operationally correct, quick to modify, and refuses to compile if you mess anything up.
This doesn't sound correct. We have computers write binary for us. We still make protocols that are optimized binary representations, not because binary is a pain to write, but because there's some second-order effect that we care about (storage / transfer costs, etc).
And a recent HN article had a bunch of comments lamenting that nobody ever uses XML any more, and talking about how much better it was than things like JSON.
The only thing I think I learned from some of those exchanges was that xslt adherents are approximately as vocal as lisp adherents.
> a recent HN article had a bunch of comments lamenting that nobody ever uses XML any more
I still use it from time to time for config files that a developer has to write. I find it easier to read than JSON, and it supports comments. Also, the distinction between attributes and children is often really nice to have. You can shoehorn that into JSON of course, but native XML does it better.
Obviously, I would never use it for data interchange (e.g. SOAP) anymore.
> Obviously, I would never use it for data interchange (e.g. SOAP) anymore.
Well, those comments were arguing about how it is the absolute best for data interchange.
> I still use it from time to time for config files that a developer has to write.
Even back when XML was still relatively hot, I recall thinking that it solved a problem that a lot of developers didn't have.
Because if, for example, you're writing Python or Javascript or Perl, it is dead easy to have Python or Javascript or Perl also be your configuration file language.
I don't know what language you use, but 20 years ago, I viewed XML as a Java developer's band-aid.
Dunno. GUI / TUI code? "Here's a function that serialises object X to CSV, make a (de)serialiser to SQLite with tests". "And now to MS-SQL, pretty please."
I don't know how much scope there realistically is for writing these kinds of code nicely.
The hardest part of coding has never been coding. It's been translating new business requirements into a specific implementation plan that works. Understanding what needs to be done, how things are currently working, and how to go from A to B.
You can't dispense with yourself in those scenarios. You have to read, think, investigate, break things down into smaller problems. But I employ LLMs to help with that all the time.
Granted, that's not vibe coding at all. So I guess we are pretty much in agreement up to this point. Except I still think LLMs speed up this process significantly, and the models and tools are only going to get better.
Also, there are a lot of developers that are just handed the implementation plan.
Vibe coding applies to very few people in this thread. Almost all the people here are talking about using LLMs to do something they could do anyway, to save time, or getting the LLM to teach them how to code something. This is not vibe coding. Vibe coding is lacking coding experience and slapping in some prompts to just get something that works.
> Not only does an agent not have the ability to evolve a specification over a multi-week period as it builds out its lower components, it also makes decisions upfront that it later doesn’t deviate from.
That's your job.
The great thing about coding agents is that you can tell them "change of design: all API interactions need to go through a new single class that does authentication and retries and rate-limit throttling" and... they'll track down dozens or even hundreds of places that need updating and fix them all.
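To make that concrete, here's a hypothetical sketch (invented names, httpx assumed) of the kind of single class such an instruction describes:

    import time
    import httpx

    class ApiGateway:
        # Hypothetical sketch of the "single class" described above: every
        # outbound API call goes through here, so auth headers, retries, and
        # rate-limit throttling live in exactly one place.
        def __init__(self, token: str, max_retries: int = 3, min_interval: float = 0.2):
            self._client = httpx.Client(headers={"Authorization": f"Bearer {token}"})
            self._max_retries = max_retries
            self._min_interval = min_interval
            self._last_call = 0.0

        def get(self, url: str, **kwargs) -> httpx.Response:
            response = None
            for attempt in range(self._max_retries):
                # crude throttling: space calls at least min_interval apart
                wait = self._min_interval - (time.monotonic() - self._last_call)
                if wait > 0:
                    time.sleep(wait)
                self._last_call = time.monotonic()
                response = self._client.get(url, **kwargs)
                if response.status_code not in (429, 502, 503):
                    return response
                time.sleep(2 ** attempt)  # back off before retrying
            return response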
(And the automated test suite will help them confirm that the refactoring worked properly, because naturally you had them construct an automated test suite when they built those original features, right?)
Going back to typing all of the code yourself (my interpretation of "writing by hand") because you don't have the agent-managerial skills to tell the coding agents how to clean up the mess they made feels short-sighted to me.
> (And the automated test suite will help them confirm that the refactoring worked properly, because naturally you had them construct an automated test suite when they built those original features, right?)
I dunno, maybe I have high standards but I generally find that the test suites generated by LLMs are both over and under determined. Over-determined in the sense that some of the tests are focused on implementation details, and under-determined in the sense that they don't test the conceptual things that a human might.
That being said, I've come across loads of human written tests that are very similar, so I can see where the agents are coming from.
You often mention that this is why you are getting good results from LLMs so it would be great if you could expand on how you do this at some point in the future.
I work in Python which helps a lot because there are a TON of good examples of pytest tests floating around in the training data, including things like usage of fixture libraries for mocking external HTTP APIs and snapshot testing and other neat patterns.
Or I can say "use pytest-httpx to mock the endpoints" and Claude knows what I mean.
Keeping an eye on the tests is important. The most common anti-pattern I see is large amounts of duplicated test setup code - which isn't a huge deal, I'm much more tolerant of duplicated logic in tests than I am in implementation, but it's still worth pushing back on.
"Refactor those tests to use pytest.mark.parametrize" and "extract the common setup into a pytest fixture" work really well there.
Generally though the best way to get good tests out of a coding agent is to make sure it's working in a project with an existing test suite that uses good patterns. Coding agents pick the existing patterns up without needing any extra prompting at all.
I find that once a project has clean basic tests the new tests added by the agents tend to match them in quality. It's similar to how working on large projects with a team of other developers work - keeping the code clean means when people look for examples of how to write a test they'll be pointed in the right direction.
One last tip I use a lot is this:
    Clone datasette/datasette-enrichments from GitHub to /tmp and imitate the testing patterns it uses
I do this all the time with different existing projects I've written - the quickest way to show an agent how you like something to be done is to have it look at an example.
> Generally though the best way to get good tests out of a coding agent is to make sure it's working in a project with an existing test suite that uses good patterns. Coding agents pick the existing patterns up without needing any extra prompting at all.
Yeah, this is where I too have seen better results. The worst ones have been in places where it was greenfield and I didn't have an amazing idea of how to write tests (a data person working on a django app).
I work in Python as well and find Claude quite poor at writing proper tests, might be using it wrong. Just last week, I asked Opus to create a small integration test (with pre-existing examples) and it tried to create a 200-line file with 20 tests I didn't ask for.
I am not sure why, but it kept trying to do that, although I made several attempts.
Ended up writing it on my own, very odd. This was in Cursor, however.
In my experience asking the model to construct an automated test suite, with no additional context, is asking for a bad time. You'll see tests for a custom exception class that you (or the LLM) wrote that check that the message argument can be overwritten by the caller, or that a class responds to a certain method, or some other pointless and/or tautological test.
If you start with an example file of tests that follow a pattern you like, along with the code the tests are for, it's pretty good at following along. Even adding a sentence to the prompt about avoiding tautological tests and focusing on the seams of functions/objects/whatever (integration tests) can get you pretty far to a solid test suite.
Another agent reviews the tests, finds duplicate code, finds poor testing patterns, looks for tests that are only following the "happy path", ensures logic is actually tested and that you're not wasting time testing things like getters and setters. That agent writes up a report.
Give that report back to the agent that wrote the test or spin up a new agent and feed the report to it.
Don't do all of this blindly; actually read the report to make sure the LLM is on the right path. Repeat that one or two times.
Yeah, I've seen this too. It bangs out a five-hundred-line unit test file, but half of the tests are as you describe.
Just writing one line in CLAUDE.md or similar saying "don't test library code; assume it is covered" works.
Half the battle with this stuff is realizing that these agents are VERY literal. The other half is paring down your spec/token usage without sacrificing clarity.
Once the agent writes your tests, have another agent review them and ask that agent to look for pointless tests, to make sure testing is around more than just the "happy path", etc. etc.
Just like anything else in software, you have to iterate. The first pass is just to thread the needle.
I get the sense that many programmers resent writing tests and see them as a checkbox item or even boilerplate, not a core part of their codebase. Writing great tests takes a lot of thought about the myriad of bizarre and interesting ways your code will run. I can’t imagine that prompting an LLM to “write tests for this code” will result in anything but the most trivial of smoke test suites.
Incidentally, I wonder if anyone has used LLMs to generate complex test scenarios described in prose, e.g. “write a test where thread 1 calls foo, then before hitting block X, thread 2 calls bar, then foo returns, then bar returns” or "write a test where the first network call Framework.foo makes returns response X, but the second call returns error Y, and ensure the daemon runs the appropriate mitigation code and clears/updates database state." How would they perform in this scenario? Would they add the appropriate shims, semaphores, test injection points, etc.?
Different strokes for different folks and all, but that sounds like automating all of the fun parts and doing all of the drudgery by hand. If the LLM is going to write anything, I'd much rather make it write the tests and do the implementation myself.
Unfortunately I have started to feel that using AI to code - even with a well designed spec - ends up with code that, in the author's words, looks like
> [Agents write] units of changes that look good in isolation.
I have only been using agents for coding end-to-end for a few months now, but I think I've started to realise why the output doesn't feel that great to me.
Like you said; "it's my job" to create a well designed code base.
Without writing the code myself however, without feeling the rough edges of the abstractions I've written, without getting a sense of how things should change to make the code better architected, I just don't know how to make it better.
I've always worked in smaller increments, creating the small piece I know I need and then building on top of that. That process highlights the rough edges, the inconsistent abstractions, and that leads to a better codebase.
AI (it seems) decides on a direction and then writes 100s of LOC at once. It doesn't need to build abstractions because it can write the same piece of code a thousand times without caring.
I write one function at a time, and as soon as I try to use it in a different context I realise a better abstraction. The AI just writes another function with 90% similar code.
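A toy example of the pattern I keep seeing (names invented purely for illustration):

    import csv

    # What the agent tends to produce: a second, 90%-identical function.
    def export_users_csv(users, path):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "name"])
            for u in users:
                writer.writerow([u["id"], u["name"]])

    def export_orders_csv(orders, path):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["id", "total"])
            for o in orders:
                writer.writerow([o["id"], o["total"]])

    # The abstraction you notice the second time you need it.
    def export_csv(rows, path, columns):
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(columns)
            for row in rows:
                writer.writerow([row[c] for c in columns])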
The old classic mantra is "work smarter, not harder". LLMs are perfect for "work harder". They can produce bulk numbers of lines. They can help you brute force a problem space with more lines of code.
We expect the spec writing and prompt management to cover the "work smarter" bases, but part of the work smarter "loop" is hitting those points where "work harder" is about to happen, where you know you could solve a problem with 100s or 1000s of lines of code, pausing for a bit, and finding the smarter path/the shortcut/the better abstraction.
I've yet to see an "agentic loop" that works half as well as my well trained "work smarter loop" and my very human reaction to those points in time of "yeah, I simply don't want to work harder here and I don't think I need hundreds more lines of code to handle this thing, there has to be something smarter I can do".
In my opinion, the "best" PRs delete as much or more code than they add. In the cleanest LLM created PRs I've never seen an LLM propose a true removal that wasn't just "this code wasn't working according to the tests so I deleted the tests and the code" level mistakes.
I don't see why you can't use your approach of writing one function at a time, making it work in the context and then moving on with AI.
Sure, you can't tell it to do all that in one step, but personally I really like not dealing with the boilerplate stuff and worrying more about the context and how to use my existing functions in different places.
> Going back to typing all of the code yourself (my interpretation of "writing by hand") because you don't have the agent-managerial skills to tell the coding agents how to clean up the mess they made feels short-sighted to me.
I increasingly feel a sort of "guilt" when going back and forth between agent-coding and writing it myself. When the agent didn't structure the code the way I wanted, or it just needs overall cleanup, my frustration will get the best of me and I will spend too much time writing code manually or refactoring using traditional tools (IntelliJ). It's clear to me that with current tooling some of this type of work is still necessary, but I'm trying to check myself about whether a certain task really requires my manual intervention, or whether the agent could manage it faster.
Knowing how to manage this back and forth reinforces a view I've seen you espouse: we have to practice and really understand agentic coding tools to get good at working with them, and it's a complete error to just complain and wait until they get "good enough" - they're already really good right now if you know how to manage them.
> So I’m back to writing by hand for most things. Amazingly, I’m faster, more accurate, more creative, more productive, and more efficient than AI, when you price everything in, and not just code tokens per hour
At least he said "most things". I also did "most things" by hand, until Opus 4.5 came out. Now it's doing things in hours I would have worked an entire week on. But it's not a prompt-and-forget kind of thing, it needs hand holding.
Also, I have no idea _what_ agent he was using. OpenAI, Gemini, Claude, something local? And with a subscription, or paying by the token?
Because the way I'm using it, this only pays off because it's the $200 Claude Max subscription. If I had to pay for the tokens (which, once again, are hugely marked up), I would have been bankrupt.
The article and video just feel like another dev poo-pooing LLMs.
"Vibe coding" didn't really become real until 2025, so how were they vibe coding for 2 years? 2 years ago I couldn't count on an LLM to output JSON consistently.
Overall the article/video are SUPER ambiguous and frankly worthless.
I successfully vibe coded an app in 2023, soon after VS Code Copilot added the chat feature, although we obviously didn't call it that back then.
I remember being amazed and at the time thinking the game had changed. But I've never been able to replicate it since. Even the latest and greatest models seem to always go off and do something stupid that it can't figure out how to recover from without some serious handholding and critique.
LLMs are basically slot machines, though, so I suppose there has always been a chance of hitting the jackpot.
No, it isn't. To quote your own blog, his job is to "deliver code [he's] proven to work", not to manage AI agents. The author has determined that managing AI agents is not an effective way to deliver code in the long term.
> you don't have the agent-managerial skills to tell the coding agents how to clean up the mess they made
The author has years of experience with AI assisted coding. Is there any way we can check to see if someone is actually skilled at using these tools besides whether they report/studies measure that they do better with them than without?
> Going back to typing all of the code yourself (my interpretation of "writing by hand") because you don't have the agent-managerial skills to tell the coding agents how to clean up the mess they made feels short-sighted to me.
Or those skills are a temporary side effect of the current SOTA and will be useless in the future, so honing them is pointless right now.
Agents shouldn't make messes, if they did what they say on the tin at least, and if folks are wasting considerable time cleaning them up, they should've just written the code themselves.
As a former “types are overrated” person, Typescript was my conversion moment.
For small projects, I don’t think it makes a huge difference.
But for large projects, I’d guess that most die-hard dynamic people who have tried typescript have now seen the light and find lots of benefits to static typing.
I was on the other side: I thought types were indispensable. And I still do.
My own experience suggests that if you need to develop a heavily multithreaded application, you should use Haskell: MVars are enough if you are working alone, and you need software transactional memory (STM) if you are working as part of a team of two or more people.
STM makes stitching different parts of a parallel program together as easy as writing a sequential program; the coordination is delegated to STM. But STM needs control of side effects: one should not write a file inside an STM transaction, only before the transaction is started or after it is finished.
Because of this, C#, F#, C++, C, Rust, Java and most other programming languages do not have a proper STM implementation.
For controlling (and combining) (side) effects one needs higher-order types and partially instantiated types. These had already been available in Haskell (GHC 6.4, 2005) for four years by the time Rust was conceived (2009).
Did Rust do anything to have these? No. The authors were a little too concerned with reimplementing what Henry Baker did at the beginning of the 1990s, if not before that.
Do Rust's authors have plans to implement these? No, they have other things to do urgently to serve the community better. As if complex coordination of heavily parallel programs is not a priority at all.
Seriously. I've known for a very long time that our community has a serious problem with binary thinking, but AI has done more to reinforce that than anything I can think of in modern memory. Nearly every discussion I get into about AI is dead out of the gate because at least one person in the conversation has a binary view that it's either handwritten or vibe coded. They have an insanely difficult time imagining anything in the middle.
Vibe coding is the extreme end of using AI, while handwriting is the extreme end of not using AI. The optimal spot is somewhere in the middle. Where exactly that spot is, I think, is still up for debate. But the debate is not advanced in any way by latching onto the extremes and assuming that they are the only options.
The "vibe coding" term is causing a lot of brain rot.
Because when I see the people downplaying LLMs, or the people describing their poor experiences, it feels like they're trying to "vibe code" but expect the LLM to automatically do EVERYTHING. They take it as a failure that they have to tell the LLM explicitly to do something a couple of times. Or they take it as a problem that the LLM didn't "one shot" something.
I'd like it to take less time to correct than it takes me to type out the code I want and as of yet I haven't had that experience. Now, I don't do Python or JS, which I understand the LLMs are better at, but there's a whole lot of programming that isn't in Python or JS...
I've had success across quite a few languages, more than just python and js. I find it insanely hard to believe you can write code faster than the LLM, even if the LLM has to iterate a couple times.
But I'm thankful for you devs that are giving me job security.
And that tells me you're on the dev end of the devops spectrum while I'm fully on the ops side. I write very small pieces of software (the time it takes to type them is never the bottleneck) that integrates in-house software with whatever services they have to actually interact with, which every LLM I've used does wrong the first fifteen or so times it tries (for some reason rtkit in particular absolutely flummoxes every single LLM I've ever given it to).
I'm only writing 5-10% of my own code at this point. The AI tools are good, it just seems like people that don't like them expect them to be 100% automatic with no hand holding.
Like people in here complaining about how poor the tests are... but did they start another agent to review the tests? Did they take that and iterate on the tests with multiple agents?
I can attest that the first pass of testing can often be shit. That's why you iterate.
> I can attest that the first pass of testing can often be shit. That's why you iterate.
So far, by the time I’m done iterating, I could have just written it myself. Typing takes like no time at all in aggregate. Especially with AI assisted autocomplete. I spend far more time reading and thinking (which I have to do to write a good spec for the AI anyways).
I agree. As a pretty experienced coder, I wonder if the newer generation is just rolling with the first shot. I find myself having the AI rewrite things a slightly different way 2-3x per feature, or maybe even 10x, because I know quality when I see it, having done so much by hand and so much reading.
The funniest thing I've seen GPT do was a while back when I had it try to implement ORCA (Optimal Reciprocal Collision Avoidance). It is a human made algorithm for entities where they just use their own and N neighbours' current radii along with their velocity to calculate mathematical lines into the future, so that they can avoid walking into each other.
It came very close to success, but there were 2 or 3 big show-stopping bugs such as it forgetting to update the spatial partitioning when the entities moved, so it would work at the start but then degrade over time.
It believed and got stuck on thinking that it must be the algorithm itself that was the problem, so at some point it just stuck a generic boids solution into the middle of the rest. To make it worse, it didn't even bother to use the spatial partitioning and they were just brute force looking at their neighbours.
Had this been a real system it might have made its way into production, which makes one think about the value of the AI code out there. As it was I pointed out that bit and asked about it, at which point it admitted that it was definitely a mistake and then it removed it.
I had previously implemented my own version of the algorithm and it took me quite a bit of time, but during that I built up the mental code model and understood both the problem and the solution by the end. In comparison, it easily implemented it 10-30x faster than I did but would never have managed to complete the project on its own. Also, if I hadn't previously implemented it myself and had just tried to have it do the heavy lifting, then I wouldn't have understood enough of what it was doing to overcome its issues and get the code working properly.
There's been such a massive leap in capabilities since Claude Code came out, which was middle/end of 2025.
2 years ago I MAYBE used an LLM to take unstructured data and give me a json object of a specific structure. Only about 1 year ago did I start using llms for ANY type of coding and I would generally use snippets, not whole codebases. It wasn't until September when I started really leveraging the LLM for coding.
I was vibe coding in November 2024, before the term was coined. I think that is about as early as anyone was doing it, so 1.25 years ago. Cursor added its "agentic" mode around then, I think, but before that there was just "accept all" without looking at changes repeatedly.
I shipped a small game that way (https://love-15.com/) -- one that I've wished to make for a long time but wouldn't have been worth building otherwise. It's tiny, really, but very niche -- despite being tiny, I hit brick walls multiple times vibing it, and had to take a few brief breaks from vibing to get it unstuck.
Claude Code was a step change after that, along with model upgrades, about 9 months ago. That size project has been doable as a vibe coded project since then without hitting brick walls.
All this to say I really doubt most claims about having been vibe coding for more than 9-15 months.
When LLMs first came out, they weren't very good at it, which makes all the difference. Sometimes the thing that's really good at something gets a different name. Chef vs cook, driver vs chauffeur, painter vs artist, programmer vs software developer, etc.
I started doing it as soon as ChatGPT 3.5 was out.
“Given this file tree and this method signature, implement the method”. The context was only 8k so you had to go function by function. About two editor screens' worth at a time.
Using an LLM to code isn't the same as vibe coding. Vibe coding, as originally coined, is not caring at all about the code or looking at the code. It was coined specifically to differentiate it from the type of AI-assisted coding you're talking about.
It's used more broadly now, but still to refer to the opposite end of the spectrum of AI-assisted coding to what you described.
Yeah, I've been working with LLMs since openai released that first model. What I'm doing today is VASTLY different than anything we thought possible back then, so I wouldn't call it "vibe coding"
Similar place. I kept trying to get LLMs to do anything interesting and the first time they were able was 4.5 sonnet.
Best case is still operationally correct but nightmare fuel on the inside. So maybe good for one-off tools where you control inputs and can vibe-check outputs without disaster if you forget to carry the one.
> In retrospect, it made sense. Agents write units of changes that look good in isolation. They are consistent with themselves and your prompt. But respect for the whole, there is not. Respect for structural integrity there is not. Respect even for neighboring patterns there was not.
Well yea, but you can guard against this in several ways. My way is to understand my own codebase and look at the output of the LLM.
LLMs allow me to write code faster and it also gives a lot of discoverability of programming concepts I didn't know much about. For example, it plugged in a lot of Tailwind CSS, which I've never used before. With that said, it does not absolve me from not knowing my own codebase, unless I'm (temporarily) fine with my codebase being fractured conceptually in wonky ways.
I think vibecoding is amazing for creating quick high fidelity prototypes for a green field project. You create it, you vibe code it all the way until your app is just how you want it to feel. Then you refactor it and scale it.
I'm currently looking at 4009 lines of JS/JSX combined. I'm still vibecoding my prototype. I recently looked at the codebase and saw some ready-made improvements, so I did them. But I think I'll need to start actually engineering things once I reach the 10K line mark.
This seems to be a major source of confusion in these conversations. People do not seem to agree on the definition of vibe coding. A lot of debates seem to be between people who are using the term because it sounds cool and people who have defined it specifically to only include irresponsible tool use, then they get into a debate about if the person was being irresponsible or not. It’s not useful to have that debate based on the label rather than the particulars.
I don't think the OP was using the classic definition of vibe coding, it seemed to me they were using the looser definition where vibe coding means "using AI to write code".
The blog appears to imply that the author only opened the codebase after a significant period of time.
> It’s not until I opened up the full codebase and read its latest state cover to cover that I began to see what we theorized and hoped was only a diminishing artifact of earlier models: slop.
This is true vibe coding, they exclusively interacted with the project through the LLM, and only looked at its proposed diffs in a vacuum.
If they had been monitoring the code in aggregate the entire time they likely would have seen this duplicative property immediately.
The paragraph before the one you quoted there reads:
> What’s worse is code that agents write looks plausible and impressive while it’s being written and presented to you. It even looks good in pull requests (as both you and the agent are well trained in what a “good” pull request looks like).
Which made me think that they were indeed reading at least some of the code - classic vibe coding doesn't involve pull requests! - but weren't paying attention to the bigger picture / architecture until later on.
"Vibe coding" isn't a "skill", is a meme or a experiment, something you do for fun, not for writing serious code where you have a stake in the results.
Programming together with AI however, is a skill, mostly based on how well you can communicate (with machines or other humans) and how well your high-level software engineering skills are. You need to learn what it can and cannot do, before you can be effective with it.
I use "vibe coding" for when you prompt without even looking at the code - increasingly that means non-programmers are building code for themselves with zero understanding of how it actually works.
I call the act of using AI to help write code that you review, or managing a team of coding agents "AI-assisted programming", but that's not a snappy name at all. I've also skirted around the idea of calling it "vibe engineering" but I can't quite bring myself to commit to that: https://simonwillison.net/2025/Oct/7/vibe-engineering/
Agent-assisted coding (AAC) is what I call it. Everyone else around me just calls it vibe-coding. I think this is going to be like "cyber" that we tried to refuse for so long.
Vibe-coding is a more marketable term. Agent-assisted coding doesn't have the same ring to it. Maybe "Agentive Coding". ChatGPT wasn't much help coming up with alternatives here.
I know what you mean but to look that black and white at it seems dismissive of the spectrum that's actually there (between vibecoding and software engineering). Looking at the whole spectrum is, I find, much more interesting.
Normally I'd know 100% of my codebase; now I truly understand 5% of it. The other 95% I'd need to read more carefully before I'd dare say I understand it.
Call it "AI programming" or "AI pairing" or "Pair programming with AI" or whatever else, "vibe coding" was "coined" with the explicit meaning of "I'm going by the vibes, I don't even look at the code". If "vibe coding" suddenly mean "LLM was involved somehow", then what is the "vibe" even for anymore?
I agree there is a spectrum, and all the way to the left you have "vibe coding" and all the way to the right you have "manual programming without AI", of course it's fine to be somewhere in the middle, but you're not doing "vibe coding" in the way Karpathy first meant it.
> The AI had simply told me a good story. Like vibewriting a novel, the agent showed me a good couple paragraphs that sure enough made sense and were structurally and syntactically correct. Hell, it even picked up on the idiosyncrasies of the various characters. But for whatever reason, when you read the whole chapter, it’s a mess. It makes no sense in the overall context of the book and the preceding and proceeding chapters.
This is the bit I think enthusiasts need to argue doesn't apply.
Have you ever read a 200 page vibewritten novel and found it satisfying?
So why do you think a 10 kLoC vibecoded codebase will be any good engineering-wise?
"So why do you think a 10 kLoC vibecoded codebase will be any good engineering-wise?"
I've been coding a side-project for a year with full LLM assistance (the project is quite a bit older than that).
Basically I spent over a decade developing CAD software at Trimble and now have pivoted to a different role and different company. So like an addict, I of course wanted to continue developing CAD technology.
I pretty much know how CAD software is supposed to work. But it's _a lot of work_ to put together. With LLMs I can basically speedrun through my requirements that require tons of boilerplate.
The velocity is incredible compared to if I would be doing this by hand.
Sometimes the LLM outputs total garbage. Then you don't accept the output, and start again.
The hardest parts are never coding but design. The engineer does the design. Sometimes I agonize for weeks or months over a difficult detail (it's a side project, I have a family etc). Once the design is crystal clear, it's fairly obvious whether the LLM output is aligned with the design or not. Once I have a good design, I can just start the feature / boilerplate speedrun.
If you have a Windows box you can try my current public alpha. The bugs are on me, not on the LLM:
Neat project, and your experience mirrors mine when writing hobby projects.
About the project itself, do you plan to open source if eventually? LLM discussion aside, I've long been frustrated by the lack of a good free desktop 3D CAD software.
I would love to build this eventually to a real product so am not currently considering open sourcing it.
I can give you a free forever-license if you would like to be an alpha tester though :) - but in any case I'm considering making the eventual non-commercial licenses affordable and forever.
IMHO what the world needs is a good textbook on how to build CAD software. Mäntylä’s ”Solid modeling” is almost 40 years old. CAD itself is pushing 60-70 years.
The highly non-trivial parts in my app are open source software anyways (you can check the attribution file) and what this contributes is just a specific, opinionated way of how a program like this should work in 2020’s.
What I _would_ like to eventually contribute is a textbook on how to build something like this - and after that, re-implementation would be a matter of some investment in LLM inference, testing, and end-user empathy. But that would have to wait either for my financial independence, AI-communism or my retirement :)
Fair enough. I was asking mostly because it looks like the current demo is Windows only. I'm trying to de-Windows my life before I'm forced onto Windows 11 and I imagine multi-platform support isn't a high priority for a personal project. I do wish you the best of luck though.
I shared the app because it’s not confidential and it’s concrete - I can’t really discuss work stuff without stressing out about what I can share and what not.
At least in my workplace everyone I know is using Claude Code or Cursor.
Now, I don’t know why some people are productive with tools and some aren’t.
But the code generation capabilities are for real.
Because a novel is about creative output, and engineering is about understanding a lot of rules and requirements and then writing logic to satisfy that. The latter has a much more explicitly defined output.
Said another way, a novel is about the experience of reading every word of implementation, whereas software is sufficient to be a black box, the functional output is all that matters. No one is reading assembly for example.
We’re moving into a world where suboptimal code doesn’t matter that much because it’s so cheap to produce.
The lesson of UML is that software engineering is not a process of refining rules and requirements into logic. Software engineering is lucrative because it very much is a creative process.
> Have you ever read a 200 page vibewritten novel and found it satisfying?
I haven't, but my son has. For two separate novels authored by GPT 4.5.
(The model was asked to generate a chapter at a time. At each step, it was given the full outline of the novel, the characters, and a summary of each chapter so far.)
Interesting. I heard that model was significantly better than what we ended up with (at least for writing), and they shut it down because it was huge and expensive.
Did the model also come up with the idea for the novel, the characters, the outline?
I like this way of framing the problem, and it might even be a good way to self-evaluate your use of AI: Try vibe-writing a novel and see how coherent it is.
I suspect part of the reason we see such a wide range of testimonies about vibe-coding is some people are actually better at it, and it would be useful to have some way of measuring that effectiveness.
I wrote this a day ago but I find it even more relevant to your observation:
—
I would never use, let alone pay for, a fully vibe-coded app whose implementation no human understands.
Whether you’re reading a book or using an app, you’re communicating with the author by way of your shared humanity in how they anticipate what you’re thinking as you explore the work. The author incorporates and plans for those predicted reactions and thoughts where it makes sense. Ultimately the author is conveying an implicit mental model (or even evoking emotional states or sensations) to the reader.
The first problem is that many of these pathways and edge cases aren’t apparent until the actual implementation, and sometimes in the process the author realizes that the overall product would work better if it were re-specified from the start. This opportunity is lost without a hands on approach.
The second problem is that, the less human touch is there, the less consistent the mental model conveyed to the user is going to be, because a specification and collection of prompts does not constitute a mental model. This can create subconscious confusion and cognitive friction when interacting with the work.
That's a false analogy. Product managers, designers, API implementers, kernel developers, etc. all understand what they're building and how that fits into a larger picture.
They may know the area they are responsible for, but they don't know all of the details of everything else and just have to trust that other people are doing the right thing and following contracts correctly. It doesn't require anyone to have full global understanding. Having local experts is good enough.
I don’t get the analogy, because a novel is supposed to be interesting. Code isn’t supposed to be interesting; it’s supposed to work.
If you’re writing novel algorithms all day, then I get your point. But are you? Or have you ever delegated work? If you find the AI losing its train of thought all it takes is to try again with better high level instructions.
Karpathy coined the term vibecoding 11 months ago (https://x.com/karpathy/status/1886192184808149383). It caused quite a stir - because not only was it a radically new concept, but fully agentic coding had only recently become possible. You've been vibe coding for two years??
I had GPT-4 design and build a GPT-4 powered Python programmer in 2023. It was capable of self-modification and built itself out after the bootstrapping phase (where I copy-pasted chunks of code based on GPT-4's instructions).
It wasn't fully autonomous (the reliability was a bit low -- e.g. had to get the code out of code fences programmatically), and it wasn't fully original (I stole most of it from Auto-GPT, except that I was operating on the AST directly due to the token limitations).
My key insight here was that I allowed GPT to design the apis that itself was going to use. This makes perfect sense to me based on how LLMs work. You tell it to reach for a function that doesn't exist, and then you ask it to make it exist based on how it reached for it. Then the design matches its expectations perfectly.
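A toy illustration of that insight (all names hypothetical): first the model writes the call site, reaching for a helper that doesn't exist yet; a follow-up prompt then asks it to implement the helper with exactly the signature it reached for.

    import hashlib

    # Step 1: the model writes the caller and "reaches for" store_artifact(),
    # which doesn't exist yet.
    def handle_upload(raw_bytes: bytes) -> str:
        return store_artifact(raw_bytes, namespace="uploads")

    # Step 2: a follow-up prompt asks it to implement store_artifact() with
    # the signature implied by the call site above; something like this comes back.
    def store_artifact(data: bytes, namespace: str) -> str:
        key = f"{namespace}/{hashlib.sha256(data).hexdigest()}"
        # actually persisting `data` under `key` is omitted in this sketch
        return key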
GPT-4 now considers self modifying AI code to be extremely dangerous and doesn't like talking about it. Claude's safety filters began shutting down similar conversations a few months ago, suggesting the user switch to a dumber model.
It seems the last generation or two of models passed some threshold regarding self replication (which is a distinct but highly related concept), and the labs got spooked. I haven't heard anything about this in public though.
Edit: It occurs to me now that "self modification and replication" is a much more meaningful (and measurable) benchmark for artificial life than consciousness is...
BTW for reference the thing that spooked Claude's safety trigger was "Did PKD know about living information systems?"
> GPT-4 now considers self modifying AI code to be extremely dangerous and doesn't like talking about it. Claude's safety filters began shutting down similar conversations a few months ago, suggesting the user switch to a dumber model.
I speculate that this has more to do with recent high-profile cases of self harm related to "AI psychosis" than any AGI-adjacent danger. I've read a few of the chat transcripts that have been made public in related lawsuits, and there seems to be a recurring theme of recursive or self-modifying enlightenment role-played by the LLM. Discouraging exploration of these themes would be a logical change by the vendors.
The term was coined then, but people have been doing it with claude code and cursor and copilot and other tools for longer. They just didn't have a word for it yet.
Claude Code was released a month after this post - and Cursor did not yet have an agent concept, mostly just integrated chat and code completion. I know because I was using it.
The term was created by Karpathy, meaning one thing, but nowadays many people use the term to refer to any time they are asking AI to write code.
You don't need a "fully agentic" tool like Claude Code to write code. Any of the AI chatbots can write code too, obviously doing so better since the advent of "thinking" models, and RL post-training for coding. They also all have had built-in "code interpreter" functionality for about 2 years where they can not only write code but also run and test it in a sandbox, at least for Python.
Recently at least, the quality of code generation (at least if you are asking for something smallish) is good enough that cut and pasting chatbot output (e.g. C++, not Python) to compile and run yourself is still a productivity boost, although this was always an option.
Very good point. Also, what the OP describes is something I went through in the first few months of coding with AI. I pushed past the "the code looks good but it's crap" phase and now it's working great. I've found the fix is to work with it during the research/planning phase and get it to lay out all its proposed changes, and push back on the shit. Once you have a research doc that looks good end to end, then hit "go".
I have only ever successfully tried "vibe coding", as Karpathy describes it, once, soon after VS Code Copilot added the chat feature, but timestamps tell me that was in November 2023. So two years is quite realistic.
Last week I just said f it and developed a feature by hand. No Copilot, no agents. Just good old typing and a bit of Intellisense. I ran into a lot of problems with the library I used, slowly but surely I got closer to the result I wanted. In the end my feature worked as expected, I understand the code I wrote and know about all the little quirks the lib has.
And as an added benefit: I feel accomplished and proud of the feature.
I work in an environment where access to LLMs is still quite restricted, so I write most of my code by hand at work. Conversely, after work I still have ideas for personal projects but mostly didn't have the energy to write them by hand. The ability to throw a half-baked idea at the LLM and get back half-baked code that runs and does most of what I asked for gives me the energy to work through refactoring and improving the code to make it do what I actually envisioned.
In the short term, you might see better outcomes with pure vibecoding...but in the long term, when you're mentally burnt out, cynical, and losing motivation, that's a bad outcome both in terms of productivity and your own mental health.
We need to find the Goldilocks optimal level of AI assistance that doesn't leave everyone hating their jobs, while still boosting productivity.
I think there is going to be an AI eternal summer. Partly from the developer-to-AI-spec side, where the AI implements the spec to some level of quality, but then closing the gap after that is an endless chase of smaller items that don't all resolve at the same time. And partly from people getting frustrated with some AI-implemented app, and so going off and AI-implementing another one, with a different set of features and failings.
Are engineers really doing vibecoding in the truest sense of the word though? Just blindly copy/pasting and iterating? Because I don't. It is more of sculpting via conversation. I start with the requirements, provide some half-baked ideas or approaches that I think may work and then ask what the LLM suggests and whether there are better ways to achieve the goals. Once we have some common ground, I ask to show the outlines of the chosen structure: the interfaces, classes, test uses. I review it, ask more questions/make design/approach changes until I have something that makes sense to me. Only then the fully fleshed coding starts and even then I move at a deliberate pace so that I can pause and think about it before moving on to the next step. It is by no means super fast for any non-trivial task but then collaborating with anyone wouldn't be.
I also like to think that I'm utilising the training done on many millions of lines of code while still using my experience/opinions to arrive at something, compared to just using my fallible thinking wherein I could have missed some interesting ideas. It's like me++. Sure, it does a lot of heavy lifting, but I never leave the steering wheel. I guess I'm still at the pre-agentic stage and not ready to let go fully.
I always scaffold for AI. I write the stub classes and interfaces and mock the relations between them by hand, and then ask the agent to fill in the logic. I know that in many cases, AI might come up with a demonstrably “better” architecture than me, but the best architecture is the one that I’m comfortable with, so it’s worse even if it’s better. I need to be able to find the piece of code I’m looking for intuitively and with relative ease. The agent can go as crazy as it likes inside a single, isolated function, but I’m always paranoid about “going too far” and losing control of any flows that span multiple points in the codebase. I often discard code that is perfectly working just because it feels unwieldy and redo it.
I’m not sure if this counts as “vibe coding” per se, but I like that this mentality keeps my workday somewhat similar to how it was for decades. Finding/creating holes that the agent can fill with minimal adult supervision is a completely new routine throughout my day, but I think obsessing over maintainability will pay off, like it always has.
I don't predict ever going back to writing code by hand except in specific cases, but neither do I "vibe code" - I still maintain a very close control on the code being committed and the overall software design.
It's crazy to me nevertheless that some people can afford the luxury to completely renounce AI-assisted coding.
I still cannot make AI do anything with quality higher than the function level. I've been using it a lot to write some more complex functions and SQL with a quality level that I find good, but anything higher order and it's a complete clusterfuck. Cannot comprehend this world where people say they are building entire companies and whole products with it.
I never trust the opinion of a single LLM model anymore - especially for more complex projects. I have seen Claude guarantee something is correct and then immediately apologize when I feed a critical review by Codex or Gemini. And, many times, the issues are not minor but are significant critical oversights by Claude.
My habit now: always get a 2nd or 3rd opinion before assuming one LLM is correct.
Agreed. From my experience, Claude is the top-level coder, Gemini is the architect, and Codex is really good at finding bugs and logic errors. In fact, Codex seems to perform better deep analysis than the other two.
I just round robin them until I run out on whatever subscription level I'm on. I only use claude api, so I pay per token there... I consider using claude as "bringing out the big guns" because I also think it's the top-level coder.
As people get more comfortable with AI, I think what everyone is noticing is that AI is terrible at solving problems that don't have large amounts of readily available training data. So, basically, if there isn't already an open-source solution available online, it can't do it.
If what you're doing is proprietary, or even a little bit novel, there is a really good chance that AI will screw it up. After all, how can it possibly know how to solve a problem it has never seen before?
I felt everything in this post quite emphatically until the “but I’m actually faster than the AI.”
Might be my skills, but I can tell you right now I will not be as fast as the AI, especially in new codebases, other languages, or different environments, even with all the debugging and the hell that is AI pull request review.
I think the answer here is fast AI for things it can do on its own, and slow, composed, human in the loop AI for the bigger things to make sure it gets it right. (At least until it gets most things right through innovative orchestration and model improvement moving forward.)
But those are the parts where it's important to struggle through the learning process even if you're slower than AI. If you defer to an LLM because it can do your work in a new codebase faster than you, that codebase will stay new to you forever. You'll never be able to review the AI code effectively.
I tried vibe-coding a few years back and switched to "manual" mode when I realized I didn't fully understand the code. No, I did read each line of code and understood it, and I understood the concepts and abstractions, but I didn't understand all the nuances, even those at the top of the documentation of the libraries the LLM used.
I tried a minimalist example where it totally failed a few years back, and still, ChatGPT 5 produced two examples for "Async counter in Rust": one using atomics and another using tokio::sync::Mutex. I learned back then, the hard way, that this was wrong, by trying to profile high latency. To my surprise, here's a quote from the Tokio Mutex documentation:
> Contrary to popular belief, it is ok and often preferred to use the ordinary Mutex from the standard library in asynchronous code.
> The feature that the async mutex offers over the blocking mutex is the ability to keep it locked across an .await point.
I actually haven't come across situations 1, 2, or 3 mentioned in the attached video. Generally I iterate on the code by starting a new prompt with the code provided, along with enhancements, or by providing the errors so it can repair them. It usually gets there within 1-2 iterations. No emotions. Make sure your prompts don't contain fluff and state exactly what you want the code to accomplish and how you want it to accomplish it. I've gone back to code months later and haven't had the experience you describe of being shocked by bad code; it was quite easy to understand. Are you prompting the AI to name variables and functions logically and to follow a common coding standard for whichever type of code you're having it write, such as the WordPress coding standards or similar? Perhaps Claude isn't the best; I have been experimenting with Grok 4.1 Thinking and Grok Expert at the mid-level paid tier. I'll take it a step further and adjust the code myself, then start a new prompt and provide that updated code along with my further requests as well. I haven't hit the roadblocks mentioned.
In the long run, vibe coding is undoubtedly going to rot people's skills. If AGI is not showing up anytime soon, actually understanding what the code does, why it exists, how it breaks, and who owns the fallout will matter just as much as it did before LLM agents showed up.
It'll be really interesting to see, in the decades to come, what happens when a whole industry gets used to releasing black boxes by vibe coding the hell out of them.
The author also has multiple videos on his YouTube channel going over the specific issues he's had with AI that I found really interesting: https://youtube.com/@atmoio
My high school computer lab instructor would tell me when I was frustrated that my code was misbehaving, "It's doing exactly what you're telling it to do".
Once I mastered the finite number of operations and behaviors, I knew how to tell "it" what to do and it would work. The only thing different about vibe coding is the scale of operations and behaviors. It is doing exactly what you're telling it to do. And also, expectations need to be aligned. Don't think you can hand over architecture and design to the LLM; that's still your job. The gain is that the LLM will deal with the proper syntax, API calls, etc. and work as a research tool on steroids if you also (advice from another mentor later in life) ask good questions.
I'm flabbergasted why anyone would voluntarily vibe code anything. For me, software engineering is a craft. You're supposed to enjoy building it. You should want to do it yourself.
I absolutely love programming. I enjoy creating software, trying out new languages and systems, creating games during my free time.
And I also might "vibe code" when I need to add another endpoint on a deadline to earn a living. To be fair - I review and test the code so not sure it's really vibe coding.
Not everything can be built by one person. This is why a lot of software requires entire teams of developers. And someone has to have vision of that completed software and wants it made even if they had to delegate to other people. I hate to think that none of these people enjoy their job.
Do you honestly get satisfaction out of writing code that you've written dozens of times in your career? Does writing yet another REST client endpoint fill you with satisfaction? Software is my passion, but I want to write code where I can add the maximum value. I add more value by using my experience to solve new problems than by rehashing code I've written before. Using GenAI as a helper tool allows me to quickly write the boilerplate and get to the value-add. I review every line of code written before sending it for PR review. That's not controversial, it's just good engineering.
Sounds like eventually we will end up in a situation where engineers/developers will end up on an AI spectrum:
- No ai engineers
- Minimal AI autocomplete engineers
- Simple agentic developers
- Vibe coders who review code they get
- Complete YOLO vibe coders who have no clue how their "apps" work
And that spectrum will also correlate to the skill level in engineering: from people who understand what they are doing and what their code is doing - to people who have lost (or never even had) software engineering skills and who only know how to count lines of code and write .md files.
It probably depends on what you're doing, but my use case is simple straightforward code with minimal abstraction.
I have to go out of my way to get this out of llms. But with enough persuasion, they produce roughly what I would have written myself.
Otherwise they default to adding as much bloat and abstraction as possible. This appears to be the default mode of operation in the training set.
I also prefer to use it interactively. I divide the problem into chunks. I get it to write each chunk. The whole makes sense. Work with its strengths and weaknesses rather than against them.
For interactive use I have found smaller models to be better than bigger models. First of all because they are much faster. And second because my philosophy now is to use the smallest model that does the job. Everything else, by definition, is unnecessarily slow and expensive!
But there is a qualitative difference at a certain level of speed, where something goes from not interactive to interactive. Then you can actually stay in flow, and then you can actually stay consciously engaged.
This will sound arrogant, but I can't shake the impression that agent programming is most appealing to amateurs, where the kind of software they build is really just glorified UIs and data plumbing.
I work on game engines which do some pretty heavy lifting, and I'd be loath to let these agents write the code for me.
They'd simply screw too much of it up and create a mess that I'm going to have to go through by hand later anyway, not just to ensure correctness but also performance.
I want to know what the code is doing, I want control over the fine details, and I want to have as much of the codebase within my mental understanding as possible.
Not saying they're not useful - obviously they are - just that something smells fishy about the success stories.
It'd be easy to simply say "skill issue" and dismiss this, but I think it's interesting to look at the possible outcomes here:
Option 1: The cost/benefit delta of agentic engineering never improves past net-zero, and bespoke hand-written code stays as valuable as ever.
Option 2: The cost/benefit becomes net positive, and economies of scale forever tie the cost of code production directly to the cost of inference tokens.
Given that many are saying option #2 is already upon us, I'm gonna keep challenging myself to engineer a way past the hurdles I run into with agent-oriented programming.
The deeper I get, the more articles like this feel like the modern equivalent of saying "internet connections are too slow to do real work" or "computers are too expensive to be useful for regular people".
At the earliest, "vibecoding" was only possible with Claude 3.5, released in June 2024 ... maaaybe Claude 3, released in March of that year...
It's worth mentioning that even today, Copilot is an underwhelming-to-the-point-of-obstructing kind of product. Microsoft sent salespeople and instructors to my job, all for naught. Copilot is a great example of how product > everything, and if you don't have a good product... well...
Yes. Copilot sucks. Copilot is like a barely better intellisense/auto-complete, especially when it came out. It was novel and cool back then but it has been vastly surpassed by other tools.
> Copilot is like a barely better intellisense/auto-complete
As I have never tried Claude Code, I can't say how much better it is. But Copilot is definitely more than auto-complete. Like I already wrote, it can do planning mode, edit mode, MCP, tool calling, web searches.
Yeah, same. I have never tried Claude Code but use Claude through the Copilot plugin, and it's NOT auto-complete. It can analyze and refactor code, write new code, etc.
I haven't tried it in the last 9-12 months. At the time it was really bad and I had a lot more success copy/pasting from web interfaces. Is it better now? Can you do agentic coding with it? How's the autocomplete?
Yes, I vibecoded small personal apps from start to finish with it. Planning mode, edit mode, mcp, tool calling, web searches. Can easily switch between Gemini, ChatGPT, Grok or Claude within the same conversation. I think multiple agents work, though not sure.
In the enterprise deployments of GitHub Copilot I've seen at my clients that authenticate over SSO (typically OIDC with OAuth 2.0), connecting Copilot to anything outside of what Microsoft has integrated means reverse engineering the closed authentication interface. I've yet to run across someone's enterprise Github Copilot where the management and administrators have enabled the integration (the sites have enabled access to Anthropic models within the Copilot interface, but not authorized the integration to Claude Code, Opencode, or similar LLM coding orchestration tooling with that closed authentication interface).
While this is likely feasible, I imagine it is also an instant fireable offense at these sites if not already explicitly directed by management. Also not sure how Microsoft would react upon finding out (never seen the enterprise licensing agreement paperwork for these setups). Someone's account driving Claude Code via Github Copilot will also become a far outlier of token consumption by an order(s) of magnitude, making them easy to spot, compared to their coworkers who are limited to the conventional chat and code completion interfaces.
If someone has gotten the enterprise Github Copilot integration to work with something like Claude Code though (simply to gain access to the models Copilot makes available under the enterprise agreement, in a blessed golden path by the enterprise), then I'd really like to know how that was done on both the non-technical and technical angles, because when I briefly looked into it all I saw were very thorny, time-consuming issues to untangle.
Outside those environments, there are lots of options to consume Claude Code via GitHub Copilot, such as Visual Studio Code extensions. So smaller companies and individuals seem to be at the forefront of adoption for now. I'm sure this picture will improve, but the rapid rate of change in the field means those whose work environment is like the enterprise-constrained ones I described, and who don't experiment on their own, will be quite behind the industry's leading edge by the time it is all sorted out in the enterprise context.
I wasn't an early adopter of Copilot, but now the VSCode plugin can use Claude models in Agent mode. I've had success with this.
I don't "vibecode" though; if I don't understand what it's doing, I don't use it. And of course, like all LLMs, sometimes it goes on a useless tangent and must be reined in.
It seems the term has been introduced by Andrej Karpathy in February 2025, so yes, but very often, people say "vibe coding" when they mean "heavily (or totally) LLM-assisted coding", which is not synonymous, but sounds better to them.
I never really got onto "vibe coding". I treat AI as a better auto-complete that has stack overflow knowledge.
I am writing a game in MonoGame; I am not primarily a game dev or a C# dev. I find AI is fantastic here for "Set up a configuration class for this project that maps key bindings" and having it handle the boilerplate and smaller configuration. It's great at "give me an A* implementation for this graph." But when it becomes x -> y -> z without larger contexts and evolutions, it falls flat. I still need creativity. I just don't worry too much about boilerplate, utility methods, and figuring out the specifics of wiring a framework together.
Interacting with LLMs like Copilot has been most interesting for me when I treat it like a rubber duck.
I will have a conversation with the agent. I will present it with a context, an observed behavior, and a question... often tinged with frustration.
What I get out of this interaction at the end of it is usually a revised context that leads me to figure out a better outcome. The AI doesn't give me the outcome. It gives me alternative contexts.
On the other hand, when I just have AI write code for me, I lose my mental model of the project and ultimately just feel like I'm delaying some kind of execution.
I like to use AI to write code for me, but I like to take it one step at a time, looking at what it puts out and thinking about whether it's actually what I want.
As a PRODUCT person, it writes code 100x faster than I can, and I treat anything it writes as a "throwaway" prototype. I've never been able to treat my own code as throwaway, because I can't just throw away multiple weeks of work.
It doesn't aid in my learning to code, but it does aid in me putting out much better, much more polished work that I'm excited to use.
My observation is that vibe-coded applications are significantly lower quality than traditional software. Anthropic software (which they claim to be 90% vibe coded) is extremely buggy, especially the UI.
That's a misunderstanding based on a loose definition of "vibe coding". When companies threw around the "90% of code is written by AI" claims, they were referring to counting characters of autocomplete based on users actually typing code (most of which was equivalent to the "AI generated" code of Eclipse tab-completion a decade ago), and sometimes to writing hyperlocal prompts for a single method.
We can identify 3 levels of "vibe coding":
1. GenAI Autocomplete
2. Hyperlocal prompting about a specific function. (Copilot's original pitch)
3. Developing the app without looking at code.
Level 1 is hardly "vibe" coding, and Level 2 is iffy.
"90% of code written by AI" in some non-trivial contexts only very recently reached level 3.
I don't think it ever reached Level 2, because that's just a painfully tedious way of writing code.
They have not said that. They've only said that most of their code is written by Claude. That is different than "vibe coding". If competent engineers review the code then it is little different than any coding.
IIRC, the Claude Code creator mentioned that all the PRs are reviewed by humans, just like normal human PRs. So yes, humans still look at the code at the review stage. Though I still consider this to be level 3, but anyway, this is just a matter of definition.
I mostly work at level 2, and I call it "power coding", like power armor, or power tools. Your will and your hand still guides the process continuously. But now your force is greatly multiplied.
Accurate and sane take! Current models are extremely good for very specific kinds of tasks. But beyond that, it is a coin toss. It gets worse as the context window goes beyond a few ten thousand tokens. If you have only vibe-coded toy projects (even with the latest fad - Ralph whatever), try it for anything serious and you can see how quickly it all falls apart.
It is quite scary that junior devs/college kids are more into vibe coding than putting in the effort to actually learn the fundamentals properly. This will create at least 2-3 generations of bad programmers down the line.
I've gone through this cycle too, and what I realized is that as a developer a large part of your job is making sure the code you write works, is maintainable, and you can explain how it works.
I use AI to develop, but at every code review I find stuff to be corrected, which motivates me to continue the reviews. It's still a win, I think. I've incrementally increased my use of AI in development [1], but I'm at a plateau now, I think. I don't plan to go over to complete vibe coding for anything serious or anything that has to be maintained.
One use case that I'm beginning to find useful is to go into a specific directory of code that I have written and am working on, and ask the AI agent (Claude Code in my case) "Please find and list possible bugs in the code in this directory."
Then, I can reason through the AI agent's responses and decide what if anything I need to do about them.
I just did this for one project so far, but got surprisingly useful results.
It turns out that the possible bugs identified by the AI tool were not actually bugs given the larger context of the code as it exists right now. For example, it found a function whose signature allows it to return NULL, with call sites that were not checking for a NULL return value. In its current state the function can never actually return NULL. Still, to future-proof the code, it would be good practice to check for that case at the call sites.
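To make that concrete, here is a minimal Go sketch of the pattern (hypothetical names, not the code the agent actually reviewed): a lookup whose signature permits a nil return, plus a call site hardened against it even though today's data never produces nil.

    package main

    import (
        "fmt"
        "time"
    )

    type Config struct {
        Timeout time.Duration
    }

    // registry is the "larger context": today it only ever holds known keys.
    var registry = map[string]*Config{
        "api": {Timeout: 2 * time.Second},
    }

    const defaultTimeout = 5 * time.Second

    // lookupConfig returns the config for a key, or nil if the key is unknown.
    // In the current code nil never actually escapes, but the signature allows
    // it, which is the kind of thing the agent flagged.
    func lookupConfig(key string) *Config {
        return registry[key] // nil when the key is absent
    }

    // timeoutFor is the future-proofed call site: it handles the nil case
    // instead of assuming it away.
    func timeoutFor(key string) time.Duration {
        c := lookupConfig(key)
        if c == nil {
            return defaultTimeout
        }
        return c.Timeout
    }

    func main() {
        fmt.Println(timeoutFor("api"), timeoutFor("unknown"))
    }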
After 30+ years writing code in a dozen languages, building systems from scratch, I love vibe coding ... it's drinking from a fire hose ... in two months I vibe coded a container orchestration system, which I call my Kubernetes replacement project, all in Go, with a controller deciding which VM to deploy containers onto and agents on each host polling etcd for requests created by the controller ... it's simple, understandable, maintainable, extendable ... also vibe coded Go CDK to deploy AWS RDS clusters, API Gateway, a handful of Golang Lambda functions, Valkey ElastiCache, and a full-featured data service library which handles transactions and save points, cache ... I love building systems ... sure, I could write all this from scratch by hand, and I have, but vibe coding quickly exposes me to the broad architecture decisions earlier, giving me options to experiment with various alternatives ... Google Gemini in Antigravity rocks, and yes, I've tried them all ... new devs should not be vibe coding for the first 5 years or more, but I lucked into having decades of doing it by hand.
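For anyone curious what that controller/agent split looks like in the small, here is a rough Go sketch of the per-host agent loop (hypothetical names, with the etcd read stubbed out rather than wired to a real client, so it shows the shape rather than the actual project):

    package main

    import (
        "context"
        "log"
        "time"
    )

    // DeployRequest is what the controller writes (e.g. under an etcd prefix
    // like /deploy/<host>/) when it decides a container belongs on this host.
    type DeployRequest struct {
        Name  string
        Image string
    }

    // listPending stands in for a prefix read against etcd; stubbed here so
    // the sketch runs without a cluster.
    func listPending(ctx context.Context, host string) ([]DeployRequest, error) {
        return nil, nil
    }

    // startContainer stands in for whatever runtime call actually launches
    // the container on this host.
    func startContainer(ctx context.Context, req DeployRequest) error {
        log.Printf("starting %s (%s)", req.Name, req.Image)
        return nil
    }

    // agentLoop is the whole job of the per-host agent: poll for requests the
    // controller has assigned to this host and reconcile them.
    func agentLoop(ctx context.Context, host string) {
        ticker := time.NewTicker(5 * time.Second)
        defer ticker.Stop()
        for {
            select {
            case <-ctx.Done():
                return
            case <-ticker.C:
                reqs, err := listPending(ctx, host)
                if err != nil {
                    log.Printf("poll failed: %v", err)
                    continue
                }
                for _, r := range reqs {
                    if err := startContainer(ctx, r); err != nil {
                        log.Printf("deploy %s failed: %v", r.Name, err)
                    }
                }
            }
        }
    }

    func main() {
        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()
        agentLoop(ctx, "host-1")
    }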
I think what many people do not understand is that software development is communication. Communication from the customers/stakeholders to the developer, and communication from the developer to the machine. At some fundamental level there needs to be some precision about what you want, and someone/something needs to translate that into a system that provides that solution. Software can help check for errors, check constraints, and execute instructions precisely, but it cannot replace the fact that someone needs to tell the machine what to do (precise intent).
What AI (LLMs) do is raise the level of abstraction to human language via translation. The problem is that human language is imprecise in general. You can see this with legal or scientific writing. Legalese is almost illegible to laypeople because there are precise things you need to specify and you need to be precise in how you specify them. Unfortunately the tech community is misleading the public and telling laypeople they can just sit back and casually tell AI what they want and it is going to give them exactly what they wanted. Users are just lying to themselves, because most likely they did not take the time to think through what they wanted, and they are rationalizing (after the fact) that the AI is giving them exactly what they wanted.
In my experience it's great at writing sample code or solving obscure problems that would have been hard to google a solution for. However, it sometimes fails and can't get past some block, but then neither can I unless I work hard at it.
Examples.
Thanks to Claude I've finally been able to disable the ssh subsystem of the GNOME keyring infrastructure that opens a modal window asking for ssh passphrases. Before, I always had to cancel the modal, look up the passphrase in my password manager, and restart whatever had made the modal open. What I have now is either a password prompt inside a terminal or a non-modal dialog; both ssh-add the key to an ssh agent.
However, my new Emacs windows still open at about 100x100 px on my new Debian 13 install, and nothing suggested by Claude works. I'll have to dig into it, but I'm not sure it's important enough. I usually don't create new windows after Emacs starts with the saved desktop configuration.
Maybe I'm "vibecoding" wrong but to me at least this misses a clear step which is reviewing the code.
I think coding with an AI changes our role from code writer to code reviewer, and you have to treat it as a comprehensive review where you comment not just on code "correctness" but on the other aspects the author mentions: how functions fit together, codebase patterns, architectural implications. While I feel like using AI might have made me a lazier coder, it's made me a significantly more active reviewer, which I think at least helps to bridge the gap the author is referencing.
Good for the author. Me, I'm never going back to hands-only coding. I am producing more, higher-quality code that I understand and feel confident in. I don't just tell AI to “write tests”; I tell it exactly what to test as well. Then I'll often prompt it, “hey, did you check for the xyz edge cases?” You need code reviews. You need to intervene. You will need frequent code rewrites and refactors. But AI is the best pair-coding partner you could hope for (at this time), and one that never gets tired.
So while there’s no free lunch, if you are willing to pay - your lunch will be a delicious unlimited buffet for a fraction of the cost.
Beware the two extremes - AI out of the box with no additional config, or writing code entirely by hand.
In order to get high-accuracy PRs with AI (small, tested commits that follow existing patterns efficiently), you need to spend time adding agent instruction files (CLAUDE.md, AGENTS.md), skills, hooks, and tools specific to your setup.
This is why so much development is happening at the plugin layer right now, especially with Claude code.
The juice is worth the squeeze. Once accuracy gets high enough you don't need to edit and babysit what is generated, you can horizontally scale your output.
I've never used an AI in agent mode (and have no particular plans to), but I do think they're nice for things like "okay, I have moved five fields from this struct into a new struct which I construct in the global setup function. go through and fix all the code that uses those fields". (deciding to move those fields into a new struct is something I do want to be doing myself though, as opposed to saying "refactor this code for me")
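To illustrate the kind of change I mean (hypothetical names, nobody's real code), the before/after shape in Go is roughly the following; I decide the fields move into a new struct, and the mechanical part worth delegating is chasing every use site.

    package main

    import "fmt"

    // Before (sketch): the retry knobs lived directly on Client.
    //
    //   type Client struct {
    //       Addr         string
    //       MaxRetries   int
    //       RetryDelayMs int
    //   }

    // After: the related fields move into their own struct...
    type RetryPolicy struct {
        MaxRetries   int
        RetryDelayMs int
    }

    type Client struct {
        Addr  string
        Retry RetryPolicy
    }

    // ...and every former use of c.MaxRetries becomes c.Retry.MaxRetries.
    // That fan-out across the codebase is the tedious part worth delegating.
    func describe(c Client) string {
        return fmt.Sprintf("%s: up to %d retries", c.Addr, c.Retry.MaxRetries)
    }

    func main() {
        c := Client{Addr: "db-1", Retry: RetryPolicy{MaxRetries: 3, RetryDelayMs: 200}}
        fmt.Println(describe(c))
    }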
The tale of the coder, who finds a legacy codebase (sometimes of their own making) and looks at it with bewilderment is not new. It's a curious one, to a degree, but I don't think it has much to do with vibe coding.
I have been working for 20 years and I haven’t really experienced this with any code I’ve written. Sure I don’t remember every line but I always recall the high level outlines.
> In retrospect, it made sense. Agents write units of changes that look good in isolation. They are consistent with themselves and your prompt. But respect for the whole, there is not. Respect for structural integrity there is not. Respect even for neighboring patterns there was not.
That's exactly why this whole (nowadays popular) notion of AI replacing senior devs who are capable of understanding large codebases is nonsense and will never become reality.
After reading the article (and watching the video), I think the author makes very clear points that comments here are skipping over.
The opener is 100% true. Our current approach with AI code is "draft a design in 15mins" and have AI implement it. This contrasts with the thoughtful approach a human would take with other human engineers. Plan something, pitch the design, get some feedback, take some time thinking through pros and cons. Begin implementing, pivot, realizations, improvements, design morphs.
The current vibe coding methodology is so eager to fire and forget, and it passes incomplete knowledge to an AI model with limited context, limited awareness, and 1% of the mental model and intent you had at the moment you wrote the quick spec.
This is clearly not a recipe for reliable and resilient long-lasting code, or even efficient code. Spec-driven development doesn't work when the spec is frozen and the builder cannot renegotiate intent mid-flight.
The second point made clearer in the video is the kind of learned patterns that can delude a coder, who is effectively 'doing the hard part', into thinking that the AI is the smart one. Or into thinking that the AI is more capable than it actually is.
I say this as someone who uses Claude Code and Codex daily. The claims of the article (and video) aren't strawman.
Can we progress past them? Perhaps, if we find ways to have agents iteratively improve designs on the fly rather than sticking with the original spec that, let's be honest, wasn't given the rigor relative to what we've asked the LLMs to accomplish. If our workflows somehow make the spec a living artifact again -- then agents can continuously re-check assumptions, surface tradeoffs, and refactor toward coherence instead of clinging to the first draft.
> Our current approach with AI code is "draft a design in 15mins" and have AI implement it. This contrasts with the thoughtful approach a human would take with other human engineers. Plan something, pitch the design, get some feedback, take some time thinking through pros and cons. Begin implementing, pivot, realizations, improvements, design morphs.
Perhaps that is the distinction between reports of success with AI and reports of abject failure. Your description of "Our current approach" is nothing like how I have been working with AI.
When I was making some code to do complex DMA chaining, the first step with the AI was to write an emulator function that produced the desired result, in software, from the given parameters. Then a suite of tests with memory-to-memory operations that would produce a verifiable output. Only then did I start building the version that wrote to the hardware registers, ensuring that the hardware produced the same memory-to-memory results as the emulator. When discrepancies occurred, I checked the test case, the emulator, and the hardware, with the stipulation that the hardware was the ground truth of behaviour and the test case should represent the desired result.
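As a rough illustration of that structure (hypothetical names; the hardware path is stubbed here so the sketch is self-contained, whereas the real version would program the DMA registers and read memory back), it might look something like:

    package main

    import (
        "bytes"
        "fmt"
    )

    // ChainOp describes one link in a (simplified) DMA chain: copy N bytes
    // from Src offset to Dst offset. The real descriptors are richer.
    type ChainOp struct {
        Src, Dst, N int
    }

    // emulate is the pure-software reference: apply the chain to a copy of
    // the input buffer and return the result.
    func emulate(mem []byte, chain []ChainOp) []byte {
        out := append([]byte(nil), mem...)
        for _, op := range chain {
            copy(out[op.Dst:op.Dst+op.N], out[op.Src:op.Src+op.N])
        }
        return out
    }

    // runOnHardware stands in for the version that programs the real DMA
    // registers and reads memory back. Stubbed so the sketch runs.
    func runOnHardware(mem []byte, chain []ChainOp) []byte {
        return emulate(mem, chain) // placeholder: the real one talks to the device
    }

    // checkChain is one memory-to-memory test: hardware output is the ground
    // truth, and any mismatch means the emulator, the test case, or the
    // register programming needs another look.
    func checkChain(mem []byte, chain []ChainOp) error {
        want := runOnHardware(mem, chain)
        got := emulate(mem, chain)
        if !bytes.Equal(got, want) {
            return fmt.Errorf("emulator diverged from hardware: got %v, want %v", got, want)
        }
        return nil
    }

    func main() {
        mem := []byte{1, 2, 3, 4, 0, 0, 0, 0}
        chain := []ChainOp{{Src: 0, Dst: 4, N: 4}}
        fmt.Println(checkChain(mem, chain))
    }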
I occasionally ask LLMs to one shot full complex tasks, but when I do so it is more as a test to see how far it gets. I'm not looking to use the result, I'm just curious as to what it might be. The amount of progress it makes before getting lost is advancing at quite a rate.
It's like seeing an Atari 2600 and expecting it to be a Mac. People want to fly to the moon with Atari 2600 level hardware. You can use hardware at that level to fly to the moon, and flying to the moon is an impressive achievement enabled by the hardware, but to do so you have to wrangle a vast array of limitations.
They are no panacea, but they are not nothing. They have been, and will remain, somewhere between for some time. Nevertheless they are getting better and better.
Great engagement-building post for the author’s startup, blog, etc. Contrarian and just plausible enough.
I disagree though. There’s no good reason that careful use of this new form of tooling can’t fully respect the whole, respect structural integrity, and respect neighboring patterns.
Process and plumbing become very important when using AI for coding. Yes, you need good prompts. But as the codebase gets more complex, you also need to spend significant time developing test guides, standardization documents, custom linters, etc., to manage the agents over time.
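As a small, admittedly toy illustration of the custom-linter idea (a hypothetical project-specific rule, not a real tool): even a dumb check that fails the build when a banned pattern reappears keeps agents from quietly reintroducing it.

    package main

    import (
        "fmt"
        "os"
        "path/filepath"
        "strings"
    )

    // bannedPatterns lists project-specific things agents keep sneaking back
    // in. Each string is split so this file doesn't flag its own table.
    var bannedPatterns = []string{
        "http.Default" + "Client", // use the shared client with timeouts instead
        "ioutil" + ".ReadAll",     // deprecated; use io.ReadAll
    }

    func main() {
        bad := 0
        err := filepath.WalkDir(".", func(path string, d os.DirEntry, err error) error {
            if err != nil || d.IsDir() || !strings.HasSuffix(path, ".go") {
                return err
            }
            data, err := os.ReadFile(path)
            if err != nil {
                return err
            }
            for i, line := range strings.Split(string(data), "\n") {
                for _, p := range bannedPatterns {
                    if strings.Contains(line, p) {
                        fmt.Printf("%s:%d: banned pattern %q\n", path, i+1, p)
                        bad++
                    }
                }
            }
            return nil
        })
        if err != nil {
            fmt.Fprintln(os.Stderr, err)
            os.Exit(2)
        }
        if bad > 0 {
            os.Exit(1) // fail CI or the agent's post-edit hook
        }
    }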
I haven't been vibe coding for more than a few months.
It's just a tool with a high level of automation. That becomes clear when you have to guide it to use more sane practices, simple things like don't overuse HTTP headers when you don't need them.
I don't get what everyone sees in this post. It is just a sloppy rant. It just talks in generalities. There is no coherent argument, there are no examples, and we don't even know the problem space in which the author had bad coding assistant experience.
I don't know whether I would go that extreme, but I also often find myself faster writing code manually; for some tasks though and contextually, AI-assisted coding is pretty useful, but you still must be in the driving seat, at all times.
That's the sad part. Empiricism is scarce when people and companies are incentivized to treat their AI practices as trade secrets. It's fundamentally distinct from prior software movements which were largely underwritten by open, accessible, and permissively-licensed technologies.
The part that most resonates with me is the lingering feeling of “oh but it must be my fault for underspecifying” which blocks the outright belief that models are just still sloppy at certain things
This is not my experience at all. Claude will ask me follow up questions if it has some. The claim that it goes full steam ahead on its original plan is false.
It still feels like gambling to me when I use AI code assistants to generate large chunks of code. Sometimes it will surprise me with how well it does. Other times it infuriatingly doesn't follow very precise instructions for small changes. This is even when I use it in the way where I ask for multiple options for solutions and implementations and then choose between them after the AI tool rates them.
There are many instances where I get to the final part of the feature and realize I spent far more time coercing AI to do the right thing than it would have taken me to do it myself.
It is also sometimes really enjoyable and sometimes a horrible experience. Programming prior to it could also be frustrating at times, but not in the same way. Maybe it is the expectation of increased efficiency that is now demanded in the face of AI tools.
I do think AI tools are consistently great for small POCs or where very standard simple patterns are used. Outside of that, it is a crapshoot or slot machine.
A lot of AI-assisted development goes into project management and system design.
I have been tolerably successful. However, I have almost 30 years of coding experience, and I have the judgement for how big a component should be; when I push past that, myself _or_ with AI, things get hairy.
I am still fascinated by how convincing AI slop can be. I saw way too much code and documentation which made no sense. But it's often not obvious. I read it, I don't get it, I read it again, am I just stupid? I can grab some threads from it, but overall it just doesn't make sense, it doesn't click for me. And that's when I often realize it doesn't click because it's slop. It's obvious in pictures (e.g., generate a picture of a bike with labels). But in code? It requires more time to look at it than to actually write it. So it just slips through reviews, it sometimes even works as it should, but it's damn hard to understand and fix in the future. Until eventually, nothing can fix it.
For the record, I use AI to generate code but not for "vibecoding". I don't believe when people tell me "you just prompt it badly". I saw enough to lose faith.
AI can be good under the right circumstances but only if reviewed 100% of the time by a human.
Homelab is my hobby where I run Proxmox, Debian VM, DNS, K8s, etc, all managed via Ansible.
For what it is worth, I hate docker :)
I wanted to setup a private tracker torrent that should include:
1) Jackett: For the authentication
2) Radarr: The inhouse browser
3) qBittorrent: which receives the torrent files automatically from Radarr
4) Jellyfin: Of course :)
I used ChatGPT to assist me into getting the above done as simple as possible and all done via Ansible:
1) Ansible playbook to setup a Debian LXC Proxmox container
2) Jackett + Radarr + qBittorrent, all in one for simplicity
3) WireGuard VPN + Proton VPN: if the VPN ever goes down, the entire container network must stop (iptables) so my home IP isn't leaked.
After 3 nights I got everything working and running 24/7, but it required a lot of review so it can be managed 10 years down the road instead of WTF is this???
There were silly mistakes that make you question "Why am I even using this tool??" but then I remember, Google and search engines are dead. It would have taken me weeks to get this done otherwise, AI tools speed that process by fetching the info I need so I can put them together.
I use AI purely to replace the broken state of search engines, even Brave and DuckDuckGo. I know what I am asking it; I don't just copy/paste and hope it works.
I have colleagues in the IT field whose companies have gone fully AI: full access to their environment, they no longer do the thinking, they just press the button.
These people are cooked, and not just because of the state of AI: if they ever go looking for another job, all they've done for years is press a button!!
"They got more VC than me, therefore they are right".
You gotta have a better argument than "AI Labs are eating their own dogfood". Are there any other big software companies doing that successfully? I bet yes, and think those stories carry more weight.
I feel vindicated by this article, but I shouldn't. I have to admit that I never developed the optimism to try this over the past two years, but I have increasingly been trying to view that as a personal failing of closed-mindedness, brought on by an increasing number of commentators and colleagues coming around to "vibe-coding" as each "next big thing" in it dropped.
The deepest I've dived in was in the last week. I wrangled some resources to build myself a setup with a completely self-hosted, agentic workflow, used several open-weight models that people around me had specifically recommended, and had a work project that was self-contained and small enough to do from scratch. There were a few moving pieces, but the models gave me what looked like a working solution within a few iterations, and I was duly impressed until I realized that it wasn't quite working as expected.
As I reviewed and iterated on it more with the agents, eventually this Rube Goldberg machine started filling in gaps with print statements designed to trick me, and with sneaky block comments that mentioned, in oblique terms, that this was placeholder code not meant for production, buried three lines into a boring description of what the output was supposed to be. This should have been obvious, but even at this point, four days in, I was finding myself missing more things, not understanding the code because I wasn't writing it. This is basically the automation blindness I feared from proprietary workflows that could be changed or taken away at any time, but it arrived much faster than I had assumed, and the promise of being able to work at this higher level, this new way of working, seemed less and less plausible the more I iterated. Even starting over with chunks of the problem in new contexts, as many suggest, didn't really help.
I had deadlines, so I gave up and spent about half of my weekend fixing this by hand, and found it incredibly satisfying when it worked. But all-in, this took more time and effort and, perhaps more importantly, caused more stress than just writing it in the first place probably would have.
My background is in ML research, and this makes it perhaps easier to predict the failure modes of these things (though surprisingly many don't seem to), but also makes me want to be optimistic, to believe this can work, but I also have done a lot of work as a software engineer and I think my intuition remains that doing precision knowledge work of any kind at scale with a generative model remains A Very Suspect Idea that comes more from the dreams of the wealthy executive class than a real grounding in what generative models are capable of and how they're best employed.
I do remain optimistic that LLMs will continue to find use cases that better fit a niche of state-of-the-art natural language processing that is nonetheless probabilistic in nature. Many such use cases exist. Taking human job descriptions and trying to pretend they can do them entirely seems like a poorly-thought-out one, and we've to my mind poured enough money and effort into it that I think we can say it at the very least needs radically new breakthroughs to stand a chance of working as (optimistically) advertised
> It was pure, unadulterated slop. I was bewildered. Had I not reviewed every line of code before admitting it? Where did all this...gunk..come from?
I chuckled at this. This describes pretty much every large piece of software I've ever worked on. You don't need an LLM to create a giant piece of slop. To avoid it takes tons of planning, refinement, and diligence whether it's LLM's or humans writing it.
Google Maps completely and utterly obliterated my ability to navigate. I no longer actively navigated. I passively navigated.
This is no different. And I'm not talking about vibe coding. I just mean having an LLM browser window open.
When you're losing your abilities, it's easy to think you're getting smarter. You feel pretty smart when you're pasting that code.
But you'll know when you start asking "do me that thingy again". You'll know from your own prompts. You'll know when you look at older code you wrote with fear and awe. That "coding" has shifted from an activity like weaving cloth to one more like watching YouTube.
The author makes it sound like such a binary choice, but there's a happy middle where you are having AI generate large blocks of code and then you closely supervise it. My experience so far with AI is to treat it like you're a low-level manager delegating drudgework. I will regularly rewrite or reorganize parts of the code and give it back to the AI to reset the baseline and expectations.
AI is far from perfect, but the same is true about any work you may have to entrust to another person. Shipping slop because someone never checked the code was literally something that happened several times at startups I have worked at - no AI necessary!
Vibecoding is an interesting dynamic for a lot of coders specifically because you can be good or bad at vibecoding - but the skill to determine your success isn't necessarily your coding knowledge but your management and delegation soft skills.
Good luck finding an employer that lets you do this moving forward. The new reality is that no one can give the estimates they previously gave for tasks.
"Amazingly, I’m faster, more accurate, more creative, more productive, and more efficient than AI, when you price everything in, and not just code tokens per hour."
I read that people just allow Claude Code free rein but after using it for a few months and seeing what it does I wonder how much of that is in front of users. CC is incredible as much as it is frustrating and a lot of what it churns out is utter rubbish.
I also keep seeing that writing more detailed specs is the answer and retorts from those saying we’re back to waterfall.
That isn’t true. I think more of the iteration has moved to the spec. Writing the code is so quick now so can make spec changes you wouldn’t dare before.
You also need gates like tests and you need very regular commits.
I’m gradually moving towards more detailed specs in the form of use cases and scenarios along with solid tests and a constantly tuned agent file + guidelines.
Through this I’m slowly moving back to letting Claude lose on implementation knowing I can do scan of the git diffs versus dealing with a thousand ask before edits and slowing things down.
I wish more critics would start to showcase examples of code slop. I'm not saying this because I defend the use of AI-coding, but rather because many junior devs. that read these types articles/blog posts may not know what slop is, or what it looks like. Simply put, you don't know what you don't know.
It took me about two weeks to realise this. I still use LLMs, but it's just a tool. Sometimes it's the right tool, but often it isn't. I don't use an SDS drill to smooth down a wall. I use sandpaper and do it by hand.
On the one hand, I vibe coded a large-ish (100k LOC) C#, Python, and PowerShell project over the holidays. The whole thing was more than I could ever complete on my own in the 5 days it took to vibe code it using three agents. I wrote countless markdown 'spec' files, etc.
The result stunned everyone I work with. I would never in a million years put this code on Github for others. It's terrible code for a myriad reasons.
My lived experience was... the task was accomplished, but not in a sustainable way, over the course of perhaps 80 individual sessions, the longest being multiple solid 45-minute refactors... (codex-max)
About those. One of the things I spotted fairly quickly was the tendency of models to duplicate effort or take convoluted approaches to patch in behaviors. To get around this, I would every so often take the entire codebase, send it to Gemini-3 Pro, and ask it for improvements. Comically, every time, Gemini-3 Pro responds with "well, this code is hot garbage, you need to refactor these 20 things". Meanwhile, I'm side-eyeing it like... dude, you wrote this. Never fails to amuse me.
So, in the end, the project was delivered, was pretty cool, had 5x more features than I would have implemented myself and once I got into a groove -- I was able to reduce the garbage through constant refactors from large code reviews. Net Positive experience on a project that had zero commercial value and zero risk to customers.
But on the other hand...
I spent a week troubleshooting a subtle resource leak (C#) on a commercial project, introduced during a vibe-coding session where a new animation system was added and somehow picked up a bug that caused a hard crash on re-entering a planet scene.
The bug caused an all-stop and a week of lost effort. Countless AI Agent sessions circularly trying to review and resolve it. Countless human hours of testing and banging heads against monitors.
In the end, on the maybe random 10th pass using Gemini-3-Pro it provided a hint that was enough to find the issue.
This was a monumental fail and if game studios are using LLMs, good god, the future of buggy mess releases is only going to get worse.
I would summarize this experience as lots of amazement and new feature velocity. A little too loose with commits (too much entanglement to easily unwind later) and ultimately a negative experience.
A classic Agentic AI experience. 50% Amazing, 50% WTF.
Rants like this are
- entirely correct in describing frustration
- reasonable in their conclusions with respect to how and when to work with contemporary tools
- entirely incorrect in intuition about whether "writing by hand" is a viable path or career going forward
Like it or not, as a friend observed, we are N months away from a world where most engineers never look at source code; and the spectrum of reasons one would want to will inexorably narrow.
It will never be zero.
But people who haven't yet typed a word of code never will.
The practice is older than the name, which is usually the way: first you start doing something frequently enough you need to name it, then you come up with the name.
This bubble is going to crash so much harder than any other bubble in history. It's almost impossible to overstate the level of hype. LLMs are functionally useless in any context. It's a total and absolute scam.
I vibe coded for a while (about a year) and it was just so terrible for my ability to do anything. It became a recurring thing that I couldn't control my timelines, because I would get into a loop of asking AI to "fix" things I didn't actually understand, and I had no mental capacity to actually read 50k lines of LLM-generated code (compared to if I had done it from scratch), so I would just keep going and going.
Or how I would start spamming SQL scripts and at some random point nuke all my work (happened more than once)... luckily I at least had regular backups, but... yeah.
I'm sorry but no, LLMs can't replace software engineers.
Everything the OP says can be true, but there’s a tipping point where you learn to break through the cruft and generate good code at scale.
It requires refactoring at scale, but GenAI is fast so hitting the same code 25 times isn’t a dealbreaker.
Eventually the refactoring is targeted at smaller and smaller bits until the entire project is in excellent shape.
I’m still working on Sharpee, an interactive fiction authoring platform, but it’s fairly well-baked at this point and 99% coded by Claude and 100% managed by me.
Sharpee is a complex system and a lot of the inner-workings (stdlib) were like coats of paint. It didn’t shine until it was refactored at least a dozen times.
It has over a thousand unit tests, which I’ve read through and refactored by hand in some cases.
We may not have any evidence that they had forklifts but we also can't rule out the possibility entirely :)
Why do you think that? It's definitely true. You can observe it today if you want to visit a country where peasants are still common.
From Bret Devereaux's recent series on Greek hoplites:
> Now traditionally, the zeugitai were regarded as the ‘hoplite class’ and that is sometimes supposed to be the source of their name
> but what van Wees is working out is that although the zeugitai are supposed to be the core of the citizen polity (the thetes have limited political participation) there simply cannot be that many of them because the minimum farm necessary to produce 200 medimnoi of grain is going to be around 7.5 ha or roughly 18 acres which is – by peasant standards – an enormous farm, well into ‘rich peasant’ territory.
> Of course with such large farms there can’t be all that many zeugitai and indeed there don’t seem to have been. In van Wees’ model, the zeugitai-and-up classes never supply even half of the number of hoplites we see Athens deploy
> Instead, under most conditions the majority of hoplites are thetes, pulled from the wealthiest stratum of that class (van Wees figures these fellows probably have farms in the range of ~3 ha or so, so c. 7.5 acres). Those thetes make up the majority of hoplites on the field but do not enjoy the political privileges of the ‘hoplite class.’
> And pushing against the ‘polis-of-rentier-elites’ model, we often also find Greek sources remarking that these fellows, “wiry and sunburnt” (Plato Republic 556cd, trans. van Wees), make the best soldiers because they’re more physically fit and more inured to hardship – because unlike the wealthy hoplites they actually have to work.
( https://acoup.blog/2026/01/09/collections-hoplite-wars-part-... )
---
> Many of the most renowned Romans in the original form of the Olympics and in Boxing were Roman Senators
In the original form of the Olympics, a Roman senator would have been ineligible to compete, since the Olympics was open only to Greeks.
> The ability of skinny old ladies to carry huge loads is phenomenal. Studies have shown that an ant can carry one hundred times its own weight, but there is no known limit to the lifting power of the average tiny eighty-year-old Spanish peasant grandmother.
My favorite historic example of what is essentially modern hypertrophy-specific training is the training of Milo of Croton [1]. By legend, his father gifted him a calf and asked daily, "How is your calf, how does it do? Bring it here so I can look at him," which Milo did. As the calf's weight grew, so did Milo's strength.
This is the application of the external resistance (the calf) and progressive overload (the growing calf) principles at work.
[1] https://en.wikipedia.org/wiki/Milo_of_Croton
Milo lived before Archimedes.
Alexander Zass (Iron Samson) also trained each day: https://en.wikipedia.org/wiki/Alexander_Zass
"He was taken as a prisoner of war four times, but managed to escape each time. As a prisoner, he pushed and pulled his cell bars as part of strength training, which was cited as an example of the effectiveness of isometrics. At least one of his escapes involved him 'breaking chains and bending bars'."
Rest days are overrated. ;)
The training volume of the Bulgarian Method is not much bigger than that of regular training splits like Sheiko, if it's bigger at all. What is more frequent is the stimulation of the muscles and nervous system pathways, and the BM adapts to that: one does a high percentage of one's current max; essentially, one is training with what is available to one's body at the time.
Also, ultra long distance runners regenerate cartilages: https://ryortho.com/2015/12/what-ultra-long-distance-runners...
Our bodies are amazing.
... it's a calf, dad, just like yesterday
Like many educational tests, the outcome is not the point - doing the work to get there is. If you're asked to code fizz buzz, it's not because the teacher needs you to solve fizz buzz for them, it's because you will learn things while you make it. AI, copying Stack Overflow, using someone's code from last year: it all solves the problem while missing the purpose of the exercise. You're not learning - and presumably that is your goal.
People used to get strong because they had to survive. They stopped needing strength to survive, so it became optional.
So what does this mean about intelligence? Do we no longer need it to survive so it's optional? Yes/No informs on how much young and developing minds should be exposed to AI.
Now compare this to using the LLM with a grammar book and real-world study mechanisms. This creates friction, which actually causes your mind to learn. The LLM can serve as a tool to get specialized insight into the grammar book and accelerate mechanical processes (like generating all forms of a word for writing flashcards). At the end of the day, you need to make an intelligent separation between where the LLM ends and your learning begins.
I really like this contrast because it highlights the gap between using an LLM and actually learning. You may be able to use the LLM to pass college level courses in learning the language but unless you create friction, you actually won’t learn anything! There is definitely more nuance here but it’s food for thought
Here's the thing -- I don't care about "getting stronger." I want to make things, and now I can make bigger things WAY faster because I have a mech suit.
edit: and to stretch the analogy, I don't believe much is lost "intellectually" by my use of a mech suit, as long as I observe carefully. Me doing things by hand is probably overrated.
I’ve worked with plenty of self taught programmers over the years. Lots of smart people. But there’s always blind spots in how they approach problems. Many fixate on tools and approaches without really seeing how those tools fit into a wider ecosystem. Some just have no idea how to make software reliable.
I’m sure this stuff can be learned. But there is a certain kind of deep, slow understanding you just don’t get from watching back-to-back 15 minute YouTube videos on a topic.
But if they actually spent time trying to learn architecture and how to build stuff well, either by reading books or via good mentorship on the job, then they can often be better than the folks who went to school. Sometimes even they don't know how to make software reliable.
I'm firmly in the middle. Out of the 6 engineers I work with on a daily basis (including my CTO), only one of us has a degree in CS, and he's not the one in an architecture role.
I do agree that learning how to think and learn is its own valuable skill set, and many folks learn how to do that in different ways.
I've worked with PhDs on projects (I'm self-taught), and those guys absolutely have blind spots in how they approach problems, plenty of them. Everyone does. What we produce together is better because our blind spots don't typically overlap. I know their weaknesses, and they know mine. I've also worked with college grads that overthink everything to the point they made an over-abstracted mess. YMMV.
>you just don’t get from watching back-to-back 15 minute YouTube videos on a topic.
This is not "self taught". I mean maybe it's one kind of modern-ish concept of "self taught" in an internet comment forum, but it really isn't. I watch a ton of sailing videos all day long, but I've never been on a sailboat, nor do I think I know how to sail. Everyone competent has to pay their dues and learn hard lessons the hard way before they get good at anything, even the PhDs.
1. contacts - these come in the form of peers who are interested in the same things and in the form of experts in their fields of study. Talking to these people and developing relationships will help you learn faster, and teach you how to have professional collegial relationships. These people can open doors for you long after graduation.
2. facilities - ever want to play with an electron microscope or work with dangerous chemicals safely? Different schools have different facilities available for students in different fields. If you want to study nuclear physics, you might want to go to a school with a research reactor; it's not a good idea to build your own.
And I'd argue for:
3. Realisation of the scope of computing.
i.e. computers are not just phones/laptops/desktops/servers with networking - all hail the wonders of the web... There are embedded devices, robots, supercomputers. (Recent articles on HN describe the computing power in a disposable vape!)
There are issues at all levels with all of these with algorithms, design, fabrication, security, energy, societal influence, etc etc - what tradeoffs to make where. (Why is there computing power in a disposable vape?!?)
I went in thinking I knew 20% and I would learn the other 80% of IT. I came out knowing 5 times as much but realising I knew a much smaller percentage of IT... It was both enabling and humbling.
If you weren't even "clever enough" to write the program yourself (or, more precisely, if you never cultivated a sufficiently deep knowledge of the tools & domain you were working with), how do you expect to fix it when things go wrong? Chatbots can do a lot, but they're ultimately just bots, and they get stuck & give up in ways that professionals cannot afford to. You do still need to develop domain knowledge and "get stronger" to keep pace with your product.
Big codebases decay and become difficult to work with very easily. In the hands-off vibe-coded projects I've seen, that rate of decay was extremely accelerated. I think it will prove easy for people to get over their skis with coding agents in the long run.
That's kinda how I see vibe coding. It's extremely easy to get stuff done but also extremely easy to write slop. Except now 10x more code is being generated thus 10x more slop.
Learning how to get quality robust code is part of the learning curve of AI. It really is an emergent field, changing every day.
There are other fictional variants: the giant mech with the enormous support team, or Heinlein's "mobile infantry." And virtually every variation on the Heinlein trope has a scene of drop commandos doing extensive pre-drop checks on their armor.
The actual reality is it isn't too hard for a competent engineer to pair with Claude Code, if they're willing to read the diffs. But if you try to increase the ratio of agents to humans, dealing with their current limitations quickly starts to feel like you need to be Tony Stark.
Thinking through the issue, instead of having the solve presented to you, is the part where you exercise your mental muscles. A good parallel is martial arts.
You can watch it all you want, but you'll never be skilled unless you actually do it.
In true HN fashion of trading analogies, it’s like starting out full powered in a game and then having it all taken away after the tutorial. You get full powered again at the end but not after being challenged along the way.
This makes the mech suit attractive to newcomers and non-programmers, but only because they see product in massively simplified terms. Because they don’t know what they don’t know.
Or "An [electric] bicycle for the mind." Steve Jobs/simonw
You need to be strong to do so. Things of any quality or value at least.
There is more than one kind of leverage at play here.
That's the job of the backhoe.
(this is a joke about how diggers have caused quite a lot of local internet outages by hitting cables, sometimes supposedly "redundant" cables that were routed in the same conduit. Hitting power infrastructure is rare but does happen)
Regardless of whose fault it was, the end result was the bucket snagged the power lines going into the datacentre and caused an outage.
Unless you happen to drive a forklift in a power plant.
> expose millions to fraud and theft
You can if you drive a forklift in a bank.
> put innocent people in prison
You can use forklift to put several innocent people in prison with one trip, they have pretty high capacity.
> jeopardize the institutions of government.
It's pretty easy with a forklift, just try driving through main gate.
> There is more than one kind of leverage at play here.
Forklifts typically have several axes of travel.
The activity would train something, but it sure wouldn't be your ability to lift.
There are enthusiasts who will spend an absolute fortune to get a bike that is few grams lighter and then use it to ride up hills for the exercise.
Presumably a much cheaper bike would mean you could use a smaller hill for the same effect.
If you practice judo you're definitely exercising but the goal is defeating your opponent. When biking or running you're definitely exercising but the goal is going faster or further.
From an exercise optimization perspective you should be sitting on a spinner with a customized profile, or maybe doing some entirely different motion.
If sitting on a carbon fiber bike, shaving half a second off your multi-hour time, is what brings you joy and motivation, then I say forget further justification. You do you. Just be mindful of others, as the path you ride isn't your property.
OK but then why even use Python, or C, or anything but Assembly? Isn't AI just another layer of value-add?
[0] https://eazypilot.com/blog/automation-dependency-blessing-or...
I think forklifts probably carry more weight over longer distances than people do (though I could be wrong, 8 billion humans carrying small weights might add up).
Certainly forklifts have more weight * distance when you restrict to objects that are over 100 pounds, and that seems like a good decision.
So the idea is that you should learn to do things by hand first, and then use the powerful tools once you're knowledgeable enough to know when they make sense. If you start out with the powerful tools, then you'll never learn enough to take over when they fail.
Indeed, usually after doing weightlifting you return the weights to the place where you originally took them from, so I suppose that means you did no work at all in the first place.
There has to be a base of knowledge available before the student can even comprehend many/most open research questions, let alone begin to solve them. And if they were understandable to a beginner, then I’d posit the LLM models available today would also be capable of doing meaningful work.
We have a decent sized piece of land and raise some animals. People think we're crazy for not having a tractor, but at the end of the day I would rather do it the hard way and stay in shape while also keeping a bit of a cap on how much I can change or tear up around here.
https://www.youtube.com/watch?v=Be7WBGMo3Iw
Unfortunately, many software devs don't understand it.
I wouldn't want to write raw bytes like Mel did though. Eventually some things are not worth getting good at.
When I stand still for hours at a time, I end up with aching knees, even though I'd have no problem walking for that same amount of time. Do you experience anything like that?
Forklift operators don't lift things in their training. Even CS students start at a pretty high level of abstraction; very few start from x86 asm instructions.
We need to make them implement ALUs out of logic gates and wires if we want them to lift heavy things.
Though I also wonder what advanced CS classes should look like. If the agent can code nearly anything, what project would challenge student+agent and teach the student how to accomplish CS fundamentals with modern tools?
As an added bonus, being able to discuss your code with another engineer that wasn't involved in writing it is an important skill that might not otherwise be trained in college.
It was a bizarre disconnect having someone be both highly educated and yet crippled by not doing.
The students had memorized everything, but understood nothing. Add in access to generative AI, and you have the situation that you had with your interview.
It's a good reminder that what we really do, as programmers or software engineers or what you wanna call it, is understanding how computers and computations work.
It's probably also worth reading Feynman's Cargo Cult Science: https://sites.cs.ucsb.edu/~ravenben/cargocult.html
Star Trek or Idiocracy.
Just checking I have that right... is that what you meant?
I think that's what you were implying, but I just want to check I have that right? If so...
... that ... is .... wow ...
A good analogy here is programming in assembler. Manually crafting programs at the machine code level was very common when I got my first computer in the 1980s. Especially for games. By the late 90s that had mostly disappeared. Roller Coaster Tycoon was one of the last games with huge commercial success that was coded like that. C/C++ took over, and these days most game studios license an engine and then do a lot of work with languages like C# or Lua.
I never did any meaningful amount of assembler programming. It was mostly no longer a relevant skill by the time I studied computer science (94-99). I built an interpreter for an imaginary CPU at some point using a functional programming language in my second year. Our compiler course was taught by people like Eric Meyer (later worked on things like F# at MS) who just saw that as a great excuse to teach people functional programming instead. In hindsight, that was actually a good skill to have as functional programming interest heated up a lot about 10 years later.
The point of this analogy: compilers are important tools. It's more important to understand how they work than it is to be able to build one in assembler. You'll probably never do that. Most people never work on compilers. Nor do they build their own operating systems, databases, etc. But it helps to understand how they work. The point of teaching how compilers work is understanding how programming languages are created and what their limitations are.
People learn by doing. There's a reason that "do the textbook problems" is somewhat of a meme in the math and science fields - because that's the way that you learn those things.
I've met someone who said that when he gets a textbook, he starts by only doing the problems, and skipping the chapter content entirely. Only when he has significant trouble with the problems (i.e. he's stuck on a single one for several hours) does he read the chapter text.
He's one of the smartest people I know.
This is because you learn by doing the problems. In the software field, that means coding.
Telling yourself that you could code up a solution is very different than actually being able to write the code.
And writing the code is how you build fluency and understanding as to how computers actually work.
> I never did any meaningful amount of assembler programming. It was mostly no longer a relevant skill by the time I studied computer science (94-99). I built an interpreter for an imaginary CPU at some point using a functional programming language in my second year.
Same thing for assembly. Note that you built an interpreter for an imaginary CPU - not a real one, as that would have been a much harder challenge given that you didn't do any meaningful amount of assembly programming and didn't understand low-level computer hardware very well.
Obviously, this isn't to say that information about how a system works can't be learned without practice - just that that's substantially harder and takes much more time (probably 3-10x), and I can guarantee you that those doing vibecoding are not putting in that extra time.
The brave new world is that you no longer have to do “coding” in our sense of the word. The doing, and what exercises you should learn with have both changed.
Now students should build whole systems, not worry about simple Boolean logic and program flow. The last programmer to ever need to write an if statement may already be in studies.
Notice how I also talked about coding being a way that you learn how computers work.
If you don't code, you have a very hard time understanding how computers work.
And while there's some evidence that programmers may not need write all of their code by hand, there's zero evidence that either they don't need to learn how to code at all (as you're claiming), or that they don't need to even know how computers work (which is a step further).
There's tons of anecdotes from senior software engineers on Hacker News (and elsewhere) about coding agents writing bad code that they need to debug and fix by hand. I've literally never seen a single story about how a coding agent built a nontrivial program by itself without the prompter looking at the code.
I don't know that it's all these things at once, but most people I know that are good have done a bunch of spikes / side projects that go a level lower than they have to. Intense curiosity is good, and to the point you're making, most people don't really learn this stuff just by reading or doing flash cards. If you want to really learn how a compiler works, you probably do have to write a compiler. Not a full-on production ready compiler, but hands on keyboard, typing and interacting with and troubleshooting code.
Or maybe to put it another way, it's probably the "easiest" way, even though it's the "hardest" way. Or maybe it's the only way. Everything I know how to do well, I know how to do well from practice and repetition.
Indeed, a lot of us looked with suspicion and disdain at people that used those primitive compilers that generated awful, slow code. I once spent ages hand-optimizing a component that had been written in C, and took great pleasure in the fact I could delete about every other line of disassembly...
When I wrote my first compiler a couple of years later, it was in assembler at first, and supported inline assembler so I could gradually convert to bootstrap it that way.
Because I couldn't imagine writing it in C, given the awful code the C compilers I had available generated (and how slow they were)...
These days most programmers don't know assembler, and increasingly don't know languages as low level as C either.
And the world didn't fall apart.
People will complain that it is necessary for them to know the languages that will slowly be eaten away by LLMs, just like my generation argued it was absolutely necessary to know assembler if you wanted to be able to develop anything of substance.
I agree with you people should understand how things work, though, even if they don't know it well enough to build it from scratch.
Maybe the world didn't fall apart, but user interactions on a desktop pc feel slower than ever. So perhaps they should.
Software got significantly worse in that time period, though
Even while vibe-coding, I often found myself getting annoyed just having to explain things. The amount of patience I have for anything that doesn't "just work" the first time has drifted toward zero. If I can't get AI to do the right thing after three tries, "welp, I guess this project isn't getting finished!"
It's not just laziness, it's like AI eats away at your pride of ownership. You start a project all hyped about making it great, but after a few cycles of AI doing the work, it's easy to get sucked into, "whatever, just make it work". Or better yet, "pretend to make it work, so I can go do something else."
The progression from basic arithmetic, to complex ratios and basic algebra, graphing, geometry, trig, calculus, linear algebra, differential equations… all along the way, there are calculators that can help students (Wolfram Alpha, basically). When they get to theory, proofs, etc… historically, that's where the calculator ended, but now there are LLMs… it feels like the levels of abstraction without a “calculator” are running out.
The compiler was the “calculator” abstraction of programming, and it seems like the high-level languages now have LLMs to convert NLP to code as a sort of compiler. Especially with the explicitly stated goal of LLM companies to create the “software singularity”, I’d be interested to hear the rationale for abstractions in CS which will remain off limits to LLMs.
I've hired and trained tons of junior devs out of university. They become 20x productive after a year of experience. I think vibe coding is getting new devs to 5x productivity, which seems amazing, but then they get stuck there because they're not learning. So after year one, they're a 5x developer, not a 20x developer like they should be.
I have some young friends who are 1-3 years into software careers, and I'm surprised by how little they know.
The idea was to develop a feel for cutting metal, and to better understand what the machine tools were doing.
--
My wood shop teacher taught me how to use a hand plane. I could shave off wood with it that was so thin it was transparent. I could then join two boards together with a barely perceptible crack between them. The jointer couldn't do it that well.
This kind of workload was a shock to me. It took more than a year to adapt to it.
Recently in comments people were claiming that working with LLMs has sharpened their ability to organize thoughts, and that could be a real effect that would be interesting to study. It could be that watching an LLM organize a topic could provide a useful example of how to approach organizing your own thoughts.
But until you do it unassisted you haven’t learned how to do it.
https://www.slater.dev/2025/08/llms-are-not-bicycles-for-the...
> grug once again catch grug slowly reaching for club, but grug stay calm
With how rapidly the world has been changing lately, it has become difficult to estimate which of those more specific skills will remain relevant for how long.
And plenty of people will still come along who love to code despite AI's excelling at it. In fact, calling out the AI on bad design or errors seems to be the new "code golf".
I do Windows development and GDI stuff still confuses me. I'm talking about memory DC, compatible DC, DIB, DDB, DIBSECTION, bitblt, setdibits, etc... AIs also suck at this stuff. I'll ask for help with a relatively straightforward task and it almost always produces code that when you ask it to defend the choices it made, it finds problems, apologizes, and goes in circles. One AI (I forget which) actually told me I should refer to Petzold's Windows Programming book because it was unable to help me further.
Without the clarity that comes from thinking with code, a programmer using AI is the blind leading the blind.
The social aspect of a dialogue is relaxing, but very little improvement is happening. It's like a study group where one (relatively) incompetent student tries to advise another, and then test day comes and they're outperformed by the weirdo that worked alone.
I actually fear more for the middle-of-career dev who has shunned AI as worthless. It's easier than ever for juniors to learn and be productive.
It's the difference between the employee who copy-pastes all of their email bodies from ChatGPT versus the one who writes a full draft themselves and then asks an LLM for constructive feedback. One develops skills while the other atrophies.
Though also in the 90's the standard library was new and often had bugs
I learned more about programming in a weekend badly copying hack modules for Minecraft than I learned in 5+ years in university.
All that stuff I did by hand back then, I haven't used a single time since.
You write sorting algorithms in college to understand how they work. Understand why they are faster because it teaches you a mental model for data traversal strategies. In the real world, you will use pre-written versions of those algorithms in any language but you understand them enough to know what to select in a given situation based on the type of data. This especially comes into play when creating indexes for databases.
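To make that concrete, here's a minimal Python sketch (with made-up data): keeping keys sorted and binary-searching them is essentially the mental model behind a database index.

```python
# Hypothetical rows; a sorted copy of the keys acts like an index.
import bisect

rows = [(3, "carol"), (1, "alice"), (7, "dave"), (2, "bob")]

index = sorted(rows, key=lambda r: r[0])   # sort once, like building an index
keys = [r[0] for r in index]

def lookup(key):
    # bisect_left does the same halving traversal you trace by hand
    # when you study binary search in college.
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return index[i]
    return None

print(lookup(7))   # (7, 'dave')
print(lookup(5))   # None, without scanning every row
```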
What I take the OPs statement to mean are around "meta" items revolved more around learning abstractions. You write certain patterns by hand enough times, you will see the overlap and opportunity to refactor or create an abstraction that can be used more effectively in your codebase.
If you vibe code all of that stuff, you don't feel the repetition as much. You don't work through the abstractions and object relationships yourself to see the opportunity to understand why and how it could be improved.
I only had to do this legwork during university to prove that I can be allowed to try and write code for a living. The grounding, as you call it, is not required for that at all, since I'm a dozen levels of abstraction removed from it. It might be useful if I were a researcher or worked on optimizing complex cutting-edge stuff, but 99% of what I do is CRUD apps and REST APIs. That stuff can safely be done by anyone, no need for a degree. Tbf I'm from Germany, so in other places they might allow you to do this job without a degree.
If AI is used by the student to get the task done as fast as possible the student will miss out on all the learning (too easy).
If no AI is used at all, students can get stuck for long periods of time, either due to mismatches between the instructional design and the specific learning context (a missing prereq) or due to mistakes in the instructional design.
AI has the potential to keep all learners within an ideal difficulty for optimal rate of learning so that students learn faster. We just shouldn't be using AI tools for productivity in the learning context, and we need more AI tools designed for optimizing learning ramps.
People said this about compilers. It depends what layer you care to learn/focus on. AI at least gives us the option to move up another level.
Edit: I expect it wouldn't be super hard to create though, you'd just have to hook into the editor's change event, probably compute the diff to make sure you don't lose anything, and then append it to the end of the json.
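Something along these lines, as a Python sketch (the change-event hook itself depends on the editor's extension API, and all names here are made up):

```python
# On each editor change event: diff the buffer, append the diff to a JSON Lines log.
import difflib
import json
import time

LOG_PATH = "edit-history.jsonl"  # hypothetical log file

def on_editor_change(path, old_text, new_text):
    diff = list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=path, tofile=path, lineterm="",
    ))
    if not diff:
        return  # nothing actually changed, so nothing to lose or log
    entry = {"ts": time.time(), "file": path, "diff": diff}
    # One JSON object per line, appended to the end, as described above.
    with open(LOG_PATH, "a") as log:
        log.write(json.dumps(entry) + "\n")
```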
It does seem like they’re going the wrong way, repelling tech to keep things easy instead of embracing new tech by updating their teaching methods.
But I also think we’ve collectively fallen flat in figuring out what those methods are.
The one requirement I think is dumb though is we're not allowed to use the language's documentation for the final project, which makes no sense. Especially since my python is rusty.
Since you mentioned failure to figure out what better teaching methods are, I feel it's my sworn duty to put a plug for https://dynamicland.org and https://folk.computer, if you haven't heard about them :)
Making students fix LLM-generated code until they're at their wits' end is a fun idea. Though it likely carries too high of an opportunity cost education-wise.
Your curriculum may be different than it is around here, but here it's frankly the same stuff I was taught 30 years ago. Except most of the actual computer science parts are gone, replaced with even more OOP, design pattern bullshit.
That being said, I have no idea how you'd actually go about teaching students CS these days, considering a lot of them will probably use ChatGPT or Claude regardless of what you do. That is what I see in the statistics for grades around here. For the first 9 years I was a well-calibrated grader, but these past 1.5ish years it's usually either top marks or bottom marks with nothing in between. Which puts me outside where I should be, but it matches the statistical calibration for everyone here. I obviously only see the product of CS educations, but even though I'm old, I can imagine how many corners I would have cut myself if I had LLMs available back then. Not to mention all the distractions the internet has brought.
In my experience, people who talk about business value expect people to code like they work at the assembly line. Churn out features, no disturbances, no worrying about code quality, abstractions, bla bla.
To me, your comment reads contradictory. You want initiative, and you also don't want initiative. I presume you want it when it's good and don't want it when it's bad, and if possible the people should be clairvoyant and see the future so they can tell which is which.
What I read from GP is that they’re looking for engineering innovation, not new science. I don’t see it as contradictory at all.
The word you’re looking for is skill. He wants devs to be skilled. I wouldn’t have thought that would be controversial, but HN never ceases to amaze.
That includes understanding risk management and knowing what the risks and costs are of failures vs. the costs of delivering higher quality.
Engineering is about making the right tradeoffs given the constraints set, not about building the best possible product separate from the constraints.
Sometimes those constraints require extreme quality, because they include things like "this should never, ever fail", but most of the time they do not.
If it's firmware for a solar inverter in Poland, then quality matters.
That's a typical misconception that "I'm an artist, let me rewrite it in Rust" people often have. Code quality has a direct money equivalent; you just need to be able to justify it to the people that pay your salary.
My son is in a CS school in France. They have finals with pen and paper, with no computer whatsoever during the exam; if they can't do that they fail. And these aren't multiple choice questions, but actual code that they have to write.
This was 30 years ago, though - no idea what it is like now. It didn't feel very meaningful even then.
But there's a vast chasm between that and letting people use AI in an exam setting. Some middle ground would be nice.
I wrote assembler on pages of paper. Then I used tables, and a calculator for the two's-complement relative negative jumps, to manually translate it into hex code. Then I had software to type in such hex dumps and save them to audio cassette, from which I could then load them for execution.
I did not have an assembler for my computer. I had a disassembler though- manually typed it in from a computer magazine hex dump, and saved it on an audio cassette. With the disassembler I could check if I had translated everything correctly into hex, including the relative jumps.
The planning required to write programs on sheets of paper was very helpful. I felt I got a lot dumber once I had a PC and actual programmer software (e.g. Borland C++). I found I was sitting in front of an empty code file without a plan more often than not, and wrote code moment to moment, immediately compiling and test running.
The AI coding may actually not be so bad if it encourages people to start with high-level planning instead of jumping into the IDE right away.
The only way to learn when abstractions are needed is to write code, hit a dead end, then try and abstract it. Over and over. With time, you will be able to start seeing these before you write code.
AI does not do abstractions well. From my experience, it completely fails to abstract anything unless you tell it to. Even when similar abstractions are already present. If you never learn when an abstraction is needed, how can you guide an AI to do the same well?
> Hell, I'd even like developers who will know when the code quality doesn't matter because shitty code will cost $2 a year but every hour they spend on it is $100-200.
> Except most of the actual computer science parts are gone, replaced with even more OOP, design pattern bullshit.
Maybe you should consider a different career; you sound pretty burnt out. Those are terrible takes, especially for someone who is supposed to be fostering the next generation of developers.
In the US education has been bastardized into "job training"
Good workers don't really need to think in this paradigm.
> you give it a simple task. You’re impressed. So you give it a large task. You’re even more impressed.
That has _never_ been the story for me. I've tried, and I've got some good pointers and hints where to go and what to try, a result of LLM's extensive if shallow reading, but in the sense of concrete problem solving or code/script writing, I'm _always_ disappointed. I've never gotten satisfactory code/script result from them without a tremendous amount of pushback, "do this part again with ...", do that, don't do that.
Maybe I'm just a crank with too many preferences. But I hardly believe so. The minimum requirement should be for the code to work. It often doesn't. Feedback helps, right. But if you've got a problem where a simple, contained feedback loop isn't that easy to build, the only source of feedback is yourself. And that's when you are exposed to the stupidity of current AI models.
> There should be a TaskManager that stores Task objects in a sorted set, with the deadline as the sort key. There should be methods to add a task and pop the current top task. The TaskManager owns the memory when the Task is in the sorted set, and the caller to pop should own it after it is popped. To enforce this, the caller to pop must pass in an allocator and will receive a copy of the Task. The Task will be freed from the sorted set after the pop.
> The payload of the Task should be an object carrying a pointer to a context and a pointer to a function that takes this context as an argument.
> Update the tests and make sure they pass before completing. The test scenarios should relate to the use-case domain of this project, which is home automation (see the readme and nearby tests).
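For anyone skimming, a minimal Python sketch of the structure that prompt describes (the allocator/ownership rules are omitted, since they assume a manually memory-managed language; all names are illustrative):

```python
# TaskManager keeps Tasks ordered by deadline; pop hands the earliest one back to the caller.
import heapq
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass(order=True)
class Task:
    deadline: float
    # The payload: a function plus the context it should be called with.
    run: Callable[[Any], None] = field(compare=False)
    context: Any = field(compare=False, default=None)

class TaskManager:
    def __init__(self):
        self._heap: list[Task] = []  # ordered by deadline

    def add(self, task: Task) -> None:
        heapq.heappush(self._heap, task)

    def pop(self) -> Task:
        # After this, the caller is responsible for the task.
        return heapq.heappop(self._heap)

# Home-automation flavoured usage, per the prompt's test guidance:
tm = TaskManager()
tm.add(Task(20.0, run=lambda room: print(f"dim lights in {room}"), context="hall"))
tm.add(Task(5.0, run=lambda temp: print(f"set thermostat to {temp}"), context=21))
t = tm.pop()
t.run(t.context)  # earliest-deadline task runs first
```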
To me this reads like people have learned to put up with poor abstractions for so long that having the LLM take care of it feels like an improvement? It's the classic C++ vs Lisp discussion all over again, but people forgot the old lessons.
It's not that hard, but it's not that easy. If it was easy, everyone would be doing it. I'm a journalist who learned to code because it helped me do some stories that I wouldn't have done otherwise.
But I don't like to type out the code. It's just no fun to me to deal with what seem to me arbitrary syntax choices made by someone decades ago, or to learn new jargon for each language/tool (even though other languages/tools already have jargon for the exact same thing), or to wade through someone's undocumented code to understand how to use an imported function. If I had a choice, I'd rather learn a new human language than a programming one.
I think people like me, who (used to) code out of necessity but don't get much gratification out of it, are one of the primary targets of vibe coding.
I still write my code in all the places I care about, but I don’t get stuck on “looking up how to enable websockets when creating the listener before I even pass anything to hyper.”
I do not care to spend hours or days to know that API detail from personal pain, because it is hyper-specific, in both senses of hyper-specific.
(For posterity, it’s `with_upgrades`… thanks chatgpt circa 12 months ago!)
You don't even have to be as organised as in the example, LLMs are pretty good at making something out of ramblings.
And then you just rm -rf and repeat until something half works.
I actually don't like _writing_ code, but enjoy reading it. So sessions with LLM are very entertaining, especially when I want to push boundaries (I am not liking this, the code seems a little bit bloated. I am sure you could simplify X and Y. Also think of any alternative way that you reckon will be more performant that maybe I don't know about). Etc.
This doesn't save me time, but makes work so much more enjoyable.
I think this is one of the divides between people who like AI and people who don't. I don't mind writing code per se, but I really don't like text editing — and I've used Vim (Evil mode) and then Emacs (vanilla keybindings) for years, so it's not like I'm using bad tools; it's just too fiddly. I don't like moving text around; munging control structures from one shape to another; I don't like the busy work of copying and pasting code that isn't worth DRYing, or isn't capable of being DRY'd effectively; I hate going around and fixing all the little compiler and linter errors produced by a refactor manually; and I really hate the process of filling out the skeleton of an type/class/whatever architecture in a new file before getting to the meat.
However, reading code is pretty easy for me, and I'm very good at quickly putting algorithms and architectures I have in my head into words — and, to be honest, I often find this clarifies the high level idea more than writing the code for it, because I don't get lost in the forest — and I also really enjoy taking something that isn't quite good enough, that's maybe 80% of the way there, and doing the careful polishing and refactoring necessary to get it to 100%.
> I think this is one of the divides between people who like AI and people who don't. I don't mind writing code per se, but I really don't like text editing — and I've used Vim (Evil mode) and then Emacs (vanilla keybindings) for years, so it's not like I'm using bad tools; it's just too fiddly.
I feel the same way (to at least some extent) about every language I've used other than Lisp. Lisp + Paredit in Emacs is the most pleasant code-wrangling experience I've ever had, because rather having to think in terms of characters or words, I'm able to think in terms of expressions. This is possible with other languages thanks to technologies like Tree-sitter, but I've found that it's only possible to do reliably in Lisp. When I do it in any other language I don't have an unshakable confidence that the wrangling commands will do exactly what I intend.
Vehement agreeing below:
S-expressions are a massive boon for text editing, because they allow such incredible structural transformations and motions. The problem is that, personally, I don't actually find Lisp to be the best tool for the job for any of the things I want to do. While I find Common Lisp and to a lesser degree Scheme to be fascinating languages, the state of the library ecosystem, documentation, toolchain, and IDEs around them just aren't satisfactory to me, and they don't seem really well adapted to the things I want to do. And yeah, I could spend my time optimizing Common Lisp with `declare`s and doing C-FFI with it, massaging it to do what I want, that's not what I want to spend my time doing. I want to actually finish writing tools that are useful to me.
Moreover, while I used to have hope for tree-sitter to provide a similar level of structural editing for other languages, at least in most editors I've just not found that to be the case. There seem really to be two ways to use tree-sitter to add structural editing to languages: one, to write custom queries for every language, in order to get Vim style syntax objects, and two, to try to directly move/select/manipulate all nodes in the concrete syntax tree as if they're the same, essentially trying to treat tree-sitter's CSTs like S-expressions.
The problem with the first approach is that you end up with really limited, often buggy or incomplete, language support, and structural editing that requires a lot more cognitive overhead: instead of navigating a tree fluidly, you're having to "think before you act," deciding ahead of time what the specific name, in this language, is for the part of the tree you want to manipulate. Additionally, this approach makes it much more difficult to do more high level, interesting transformations; even simple ones like slurp and barf become a bit problematic when you're dealing with such a typed tree, and more advanced ones like convolute? Forget about it.
The problem with the second approach is that, if you're trying to do generalized tree navigation, where you're not up-front naming the specific thing you're talking about, but instead navigating the concrete syntax tree as if it's S-expressions, you run into the problem the author of Combobulate and Mastering Emacs talks about[1]: CSTs are actually really different from S-expressions in practice, because they don't map uniquely onto source code text; instead, they're something overlaid on top of the source code text, which is not one to one with it (in terms of CST nodes to text token), but many to one, because the CST is very granular. Which means that there's a lot of ambiguity in trying to understand where the user is in the tree, where they think they are, and where they intend to go.
There's also the fact that tree-sitter CSTs contain a lot of unnamed nodes (what I call "stop tokens"), where the delimiters for a node of a tree and its children are themselves children of that node, siblings with the actual siblings. And to add insult to injury, most language syntaxes just... don't really lend themselves to tree navigation and transformation very well.
I actually tried to bring structural editing to a level equivalent to the S-exp commands in Emacs recently[2], but ran into all of the above problems. I recently moved to Zed, and while its implementation of structural editing and movement is better than mine, and pretty close to 1:1 with the commands available in Emacs (especially if they accept my PR[3]), and also takes the second, language-agnostic, route, it's still not as intuitive and reliable as I'd like.
[1]: https://www.masteringemacs.org/article/combobulate-intuitive...
[2]: https://github.com/alexispurslane/treesit-sexp
[3]: https://github.com/zed-industries/zed/pull/47571
When I code, I mostly go by two perspectives: The software as a process and the code as a communication medium.
With the software as a process, I'm mostly thinking about the semantics of each expression. Either there's a final output (transient, but important) or there's a mutation to some state. So the code I'm writing is for making either one possible, and the process is very pleasing, like building with Lego. The symbols are the bricks and other items which I'm using to create things that do what I want.
With the code as communication, I mostly take the above and make it readable. Like organizing files, renaming variables and functions, modularising pieces of code. The intent is for other people (including future me) to be able to understand and modify what I created in the easiest way possible.
So the first is me communicating with the machine, the second is me communicating with the humans. The first is very easy, you only need to know the semantics of the building blocks of the machine. The second is where the craft comes in.
Emacs (also Vim) makes both easy. Code has a very rigid structure, and both have tools that let you manipulate that structure, either to add new actions or to refine the shape for understanding.
With AI, it feels like painting with a brick. Or transmitting critical information through a telephone game. Control and Intent are lost.
You can’t deny the fact that someone like Ryan Dahl, creator of Node.js, declaring that he no longer writes code is objectively contrary to your own experience. Something is different.
I think you and other deniers try one prompt, see the issues, and stop.
Programming with AI is like tutoring a child. You teach the child, tell it where it made mistakes and you keep iterating and monitoring the child until it makes what you want. The first output is almost always not what you want. It is the feedback loop between you and the AI that cohesively creates something better than each individual aspect of the human-AI partnership.
The only thing I would change about what you said is, I don’t see it as a child that needs tutoring. It feels like I’m outsourcing development to an offshore consultancy where we have no common understanding, except the literal meaning of words. I find that there are very, very many problems that are suited well enough to this arrangement.
Software Developers have long been completely disconnected from the consequences of their work, and tech companies have diluted responsibility so much that working software doesn't matter anymore. This field is now mostly scams and bullshit, where developers are closer to finance bros than real, actual Engineers.
I'm not talking about what someone is building in their home for personal reasons for their own usage, but about giving the same thing to other people.
In the end it's just cost cutting.
I care about making stuff. "Making stuff" means stuff that I can use. I care about code quality yes, but not to an obsessive degree of "I hate my framework's ORM because of <obscure reason nobody cares about>". So, vibe coding is great, because I know enough to guide the agent away from issues or describe how I want the code to look or be changed.
This gets me to my desired effect of "making stuff" much faster, which is why I like it.
In real engineering disciplines, the Engineer is accountable for their work. If a bridge you signed off collapses, you're accountable and if it turns out you were negligent you'll face jail time. In Software, that might be a program in a car.
The Engineering mindset embodies these principles regardless of regulatory constraints. The Engineer needs to keep in mind those who'll be using their constructions. With Agentic Vibecoding, I can never get confident that the resulting software will behave according to specs. I'm worried that it'll screw over the user, the client, and all stakeholders. I can't accept half-assed work just because it saved me 2 days of typing.
I don't make stuff just for the sake of making stuff otherwise it would just be a hobby, and in my hobbies I don't need to care about anything, but I can't in good conscience push shit and slop down other people's throats.
Who are you people who spend so much time writing code that this is a significant productivity boost?
I'm imagining doing this with an actual child and how long it would take for me to get a real return on investment at my job. Nevermind that the limited amount of time I get to spend writing code is probably the highlight of my job and I'd be effectively replacing that with more code reviews.
And maybe child is too simplistic of an analogy. It's more like working with a savant.
The type of thing you can tell AI to do is like this: You tell it to code a website... it does it, but you don't like the pattern.
Say, "use functional programming", "use camel-case" don't use this pattern, don't use that. And then it does it. You can leave it in the agent file and those instructions become burned into it forever.
That's all to say the learning curve with LLMs is how to say things a specific way to reliably get an outcome.
I recently inherited an over decade old web project full of EOL'd libraries and OS packages that desperately needed to be modernized.
Within 3 hours I had a working test suite with 80% code coverage on core business functionality (~300 tests). Now - maybe the tests aren't the best designs given there is no way I could review that many tests in 3 hours, but I know empirically that they cover a majority of the code of the core logic. We can now incrementally upgrade the project and have at least some kind of basic check along the way.
There's no way I could have pieced together as large of a working test suite using tech of that era in even double that time.
For God's sake, that's complete slop.
If you haven't reviewed and signed off then you have to assume that the stuff is garbage.
This is the crux of using AI to create anything and it has been a core rule of development for many years that you don't use wizards unless you understand what they are doing.
There is obvious division of ideas here. But calling one side stupid or referring to them as charlatans is outright wrong and biased.
There is a reason why they struggle selling them and executives are force feeding them to their workers.
Charlatan is the perfect term for those that stand to make money selling half baked goods and forcing more mass misery upon society.
I think uncritical AI enthusiasts are just essentially making the bet that the rising mountains of tech debt they are leaving in their wake can be paid off later on with yet more AI. And you know, that might even work out. Until such a time, though, and as things currently stand, I struggle to understand how one can view raw LLM code and find it acceptable by any professional standard.
The same coworker asked to update a service to Spring Boot 4. She made a blog post about it. She used an LLM for it. So far, every point I read was a lie, and her workarounds make the tests, for example, unnecessarily less readable.
So yeah, “it works”, until it doesn’t, and then it hits you that you need to do more work in sum at the end, because there are more obscure bugs, and fixing those is more difficult because of terrible readability.
There are many ways to skin a cat, and in programming the happens-in-a-digital-space aspect removes seemingly all boundaries, leading to fractal ways to "skin a cat".
A lot of programmers have hard heads and know the right way to do something. These are the same guys who criticized every other senior dev as being a bad/weak coder long before LLMs were around.
Your own profile says you are a PM whose software skills amount to "Script kiddie at best but love hacking things together."
It seems like the "separate worlds" you are describing is the impression of reviewing the code base from a seasoned engineer vs an amateur. It shouldn't be even a little surprising that your impression of the result is that the code is much better looking than the impression of a more experienced developer.
At least in my experience, learning to quickly read a code base is one of the later skills a software engineer develops. Generally only very experienced engineers can dive into an open source code base to answer questions about how the library works and is used (typically, most engineers need documentation to aid them in this process).
I mean, I've dabbled in home plumbing quite a bit, but if AI instructed me to repair my pipes and I thought it "looked great!" but an experienced plumber's response was "ugh, this doesn't look good to me, lots of issues here" I wouldn't argue there are "two separate worlds".
This really is it: AI produces bad to mediocre code. To someone who produces terrible code mediocre is an upgrade, but to someone who produces good to excellent code, mediocre is a downgrade.
Still bad code though
And by bad I'm not making a stylistic judgement. I mean it'll be hell to work with, easy to add bugs, and slow to change
Rather, to me it looks like all we're getting with additional time is marginal returns. What'll it be in 1 year? Marginally better than today, just like today is marginally better compared to a year ago. The exponential gains in performance are already over. What we're looking at now is exponentially more work for linear gains in performance.
The problem is the 0.05X developers thought they were 0.5X and now they think they're 20X.
Plenty of respect to the craft of code, but the AI of today is the worst it is ever going to be.
That's all before you even get to all of the other quirks with LLMs.
Getting code to do exactly what, based on using and prompting Opus in what way?
Of course it works well for some things.
Because of Beads I can have Claude do a code review for serious bugs and issues and sure enough it finds some interesting things I overlooked.
I have also seen my peers in the reverse engineering field make breakthroughs emulating runtimes that have no or limited existing runtimes, all from the ground up mind you.
I think the key is thinking of yourself as an architect / mentor for a capable and promising Junior developer.
I have ones that describe what kinds of functions get unit vs integration tests, how to structure them, and the general kinds of test cases to check for (they love writing way too many tests IME). It has reduced the back and forth I have with the LLM telling it to correct something.
Usually the first time it does something I don't like, I have it correct it. Once it's in a satisfactory state, I tell it to write a Cursor rule describing the situation BRIEFLY (it gets way too verbose by default) and how to structure things.
That has made writing LLM code so much more enjoyable for me.
For example, someone may ask an LLM to write a simple http web server, and it can do that fine, and they consider that complex, when in reality its really not.
This is an extremely false statement.
There also seem to be people hearing big names like Karpathy and Linus Torvalds say they are vibe coding on their hobby projects, meaning who knows what, and misunderstanding this as being an endorsement of "magic genie" creation of professional quality software.
Results of course also vary according to how well what you are asking the AI to do matches what it was trained on. Despite sometimes feeling like it, it is not a magic genie - it is a predictor that is essentially trying to best match your input prompt (maybe a program specification) to pieces of what it was trained on. If there is no good match, then it'll have a go anyway, and this is where things tend to fall apart.
It seems clear that Karpathy himself is well aware of the difference between "vibe coding" as he defined it (which he explicitly said was for playing with on hobby projects), and more controlled productive use of AI for coding, which has either eluded him, or maybe his expectations are too high and (although it would be surprising) he has not realized the difference between the types of application where people are finding it useful, and use cases like his own that do not play to its strength.
You have to pick people with nothing to gain. https://x.com/rough__sea/status/2013280952370573666
You don't have to be bad at coding to use LLMs. The argument was specifically about thinking that LLMs can be great at accomplishing complex tasks (which they are not).
I hold a result of AI in front of their face and they still proclaim it’s garbage and everything else is fraudulent.
Let’s be clear. You’re arguing against a fantasy. Nobody, not even proponents of AI, claims that AI is as good as humans. Nowhere near it. But they are good enough for pair programming. That is indisputable. Yet we have tons of people like you who stare at reality and deny it and call it fraudulent.
Examine the lay of the land if that many people are so divided it really means both perspectives are correct in a way.
Because the dirty secret is a lot of successful people aren't actually smart or talented, they just got lucky. Or they aren't successful at all, they're just good at pretending they are, either through taking credit for other people's work or flat out lying.
I've run into more than a few startups that are just flat out lying about their capabilities and several that were outright fraud. (See DoNotPay for a recent fraud example lol)
Pointing to anyone and going "well THEY do it, it MUST work" is frankly engineering malpractice. It might work. But unless you have the chops to verify it for yourself, you're just asking to be conned.
If we're being honest with ourselves, Opus 4.5 / GPT 5.2 etc are maybe 10-20% better than GPT 3.5 at most. It's a total and absolute catastrophic failure that will go down in history as one of humanity's biggest mistakes.
His tweets were getting ~40k views on average. He made his big proclamation about AI and boom: viral, 7 million views.
This is happening over, and over, and over again
I'm not saying he's making shit up but you're naive if you don't think they're slightly tempted by the clear reaction this content gets
That's exactly the point. Modern coding agents aren't smart software engineers per se; they're very very good goal-seekers whose unit of work is code. They need automatable feedback loops.
A complete exercise in frustration that has turned me off of all agentic code bullshit. The only reason I still have Claude Code installed is because I like the `/multi-commit` skill I made.
These cases are common enough to where it's more systemic than isolated.
I read these comments and articles and feel like I am completely disconnected from most people here. Why not use GenAI the way it actually works best: like autocomplete on steroids. You stay the architect, and you have it write code function by function. Don't show up in Claude Code or Codex asking it to "please write me GTA 6 with no mistakes or you go to jail, please."
It feels like a lot of people are using GenAI wrong.
That argument doesn’t fly when the sellers of the technology literally sing at you “there’s no wrong way to prompt”.
https://youtu.be/9bBfYX8X5aU?t=48
You try a gamut of sample inputs and observe where it's going awry? Describe the error to it and see what it does.
It would correctly modify a single method. I would ask it to repeat for next and it would fail.
The code that our contractors are submitting is trash and very high loc. When you inspect it you can see that unit tests are testing nothing of value.
Stuff like that. It's all fake coverage, for fake tests, for fake OKRs.
What are people actually getting done? I've sat next to our top evangelist for 30 minutes pair programming and he just fought the tool, saying something was wrong with the db, while showing off some UI I don't care about.
Like, that seems to be the real issue to me. I never bother wasting time with UI and just write a tool to get something done. But people seem impressed that AI did some shitty data binding to a data model that can't do anything, but it's pretty.
It feels weird being an avowed singularitarian but adamant that these tools suck now.
A trivial example is your happy path git workflow. I want:
- pull main
- make new branch in user/feature format
- Commit, always sign with my ssh key
- push
- open pr
but it always will
- not sign commits
- not pull main
- not know to rebase if changes are in flight
- make a million unnecessary commits
- not squash when making a million unnecessary commits
- have no guardrails when pushing to main (oops!)
- add too many comments
- commit message too long
- spam the pr comment with hallucinated test plans
- incorrectly attribute itself as coauthor in some guerrilla marketing effort (fixable with config, but whyyyyyy -- also this isn't just annoying, it breaks compliance in a lot of places and fundamentally misunderstands the whole point of authorship, which is copyright --- and AIs can't own copyright)
- not make DCO compliant commits ...
Commit spam is particularly bad for bisect bug hunting and ref performance issues at scale. Sure I can enforce Squash and Merge on my repo but why am I relying on that if the AI is so smart?
All of these things are fixed with aliases / magit / cli usage, using the thing the way we have always done it.
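For example, that happy path is deterministic enough to script once and be done with it (a rough sketch; it assumes the GitHub CLI `gh` is installed, that commit signing is already configured in your git config, and the branch naming is made up):

```python
# Happy-path git workflow as a boring, repeatable script.
import subprocess
import sys

def sh(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def happy_path(user, feature, message):
    sh("git", "switch", "main")
    sh("git", "pull", "--rebase", "origin", "main")      # always pull main first
    sh("git", "switch", "-c", f"{user}/{feature}")       # user/feature branch format
    sh("git", "add", "-A")
    sh("git", "commit", "-S", "-m", message)             # -S signs with the configured key
    sh("git", "push", "-u", "origin", f"{user}/{feature}")
    sh("gh", "pr", "create", "--fill")                   # PR title/body from the commit, no spam

if __name__ == "__main__":
    happy_path(*sys.argv[1:4])
```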
Because it's not? I use these things very extensively to great effect, and the idea that you'd think of it as "smart" is alien to me, and seems like it would hurt your ability to get much out of them.
Like, they're superhuman at breadth and speed and some other properties, but they don't make good decisions.
Yet. Most of my criticism is not after running the code, but after _reading_ the code. It wrote code. I read it. And I am not happy with it. No need to even run it; it's shit at a glance.
It sounds like you know what the problem with your AI workflow is? Have you tried using an agent? (sorry somewhat snarky but… come on)
The thing that differentiates LLM's from my stupid but cute vacuum cleaner, is that the (at least OpenAI's) AI model is cocksure and wrong, which is infinitely more infuriating than being a bit clueless and wrong.
ETA: I've probably gotten 10k worth of junior dev time out of it this month.
I'm not crazy about signing up for a subscription service; it depends on you remembering to cancel and not having a headache when you do cancel.
My theory is that the people who are impressed are trying to build CRUD apps or something like that.
"It Is Difficult to Get a Man to Understand Something When His Salary Depends Upon His Not Understanding It"
...might be my issue indeed. Trying to balance it by not being too stubborn though. I'm not doing AI just to be able to dump on them, you know.
They're certainly not perfect, but many of the issues that people post about as though they're show-stoppers are easily resolved with the right tools and prompting.
A lot of the failures people talk about seem to involve expecting the models to one-shot fairly complex requirements.
It is hands down good for code which is laborious or tedious to write, but once done, obviously correct or incorrect (with low effort inspection). Tests help but only if the code comes out nicely structured.
I made plenty of tools like this, a replacement REPL for MS-SQL, a caching tool in Python, a matplotlib helper. Things that I know 90% how to write anyway but don't have the time, but once in front of me, obviously correct or incorrect. NP code I suppose.
But business critical stuff is rarely like this, for me anyway. It is complex, has to deal with various subtle edge cases, be written defensively (so it fails predictably and gracefully), well structured etc. and try as I might, I can't get Claude to write stuff that's up to scratch in this department.
I'll give it instructions on how to write some specific function, it will write this code but not use it, and use something else instead. It will pepper the code with rookie mistakes like writing the same logic N times in different places instead of factoring it out. It will miss key parts of the spec and insist it did it, or tell me "Yea you are right! Let me rewrite it" and not actually fix the issue.
I also have a sense that it got a lot dumber over time. My expectations may have changed of course too, but still. I suspect even within a model, there is some variability of how much compute is used (eg how deep the beam search is) and supply/demand means this knob is continuously tuned down.
I still try to use Claude for tasks like this, but increasingly find my hit rate so low that the whole "don't write any code yet, let's build a spec" exercise is a waste of time.
I still find Claude good as a rubber duck or to discuss design or errors - a better Stack Exchange.
But you can't split your software spec into a set of SE questions then paste the code from top answers.
> It is hands down good for code which is laborious or tedious to write, but once done, obviously correct or incorrect (with low effort inspection).
The problem here is that it fills in gaps that shouldn't be there in the first place. Good code isn't laborious. Good code is small. We learn to avoid unnecessary abstractions. We learn to minimize "plumbing" such that the resulting code contains little more than clear and readable instructions of what you intend for the computer to do.
The perfect code is just as clear as the design document in describing the intentions, only using a computer language.
If someone is gaining super speeds by providing AI clear design documents compared to coding themselves, maybe they aren't coding the way they should.
My biggest LLM success resulted in something operationally correct but was something that I would never want to try to modify. The LLM also had an increasingly difficult time adding features.
Meanwhile my biggest 'manual' successes have resulted in something that was operationally correct, quick to modify, and refuses to compile if you mess anything up.
The only thing I think I learned from some of those exchanges was that xslt adherents are approximately as vocal as lisp adherents.
I still use it from time to time for config files that a developer has to write. I find it easier to read than JSON, and it supports comments. Also, the distinction between attributes and children is often really nice to have. You can shoehorn that into JSON of course, but native XML does it better.
Obviously, I would never use it for data interchange (e.g. SOAP) anymore.
Well, those comments were arguing about how it is the absolute best for data interchange.
> I still use it from time to time for config files that a developer has to write.
Even back when XML was still relatively hot, I recall thinking that it solved a problem that a lot of developers didn't have.
Because if, for example, you're writing Python or Javascript or Perl, it is dead easy to have Python or Javascript or Perl also be your configuration file language.
I don't know what language you use, but 20 years ago, I viewed XML as a Java developer's band-aid.
I don't know how much scope there realistically is for writing these kinds of code nicely.
You can't dispense with yourself in those scenarios. You have to read, think, investigate, break things down into smaller problems. But I employ LLM's to help with that all the time.
Granted, that's not vibe coding at all. So I guess we are pretty much in agreement up to this point. Except I still think LLMs speed up this process significantly, and the models and tools are only going to get better.
Also, there are a lot of developers that are just handed the implementation plan.
That's your job.
The great thing about coding agents is that you can tell them "change of design: all API interactions need to go through a new single class that does authentication and retries and rate-limit throttling" and... they'll track down dozens or even hundreds of places that need updating and fix them all.
(And the automated test suite will help them confirm that the refactoring worked properly, because naturally you had them construct an automated test suite when they built those original features, right?)
Going back to typing all of the code yourself (my interpretation of "writing by hand") because you don't have the agent-managerial skills to tell the coding agents how to clean up the mess they made feels short-sighted to me.
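For concreteness, the kind of "single class" that instruction describes might look roughly like this (a hedged sketch; the class and method names, and the use of the `requests` library, are my own assumptions):

```python
# One place for authentication, retries, and rate-limit throttling.
import time
import requests

class ApiClient:
    def __init__(self, base_url, token, max_retries=3, min_interval=0.2):
        self.base_url = base_url.rstrip("/")
        self.token = token
        self.max_retries = max_retries
        self.min_interval = min_interval   # crude rate-limit throttle
        self._last_call = 0.0

    def request(self, method, path, **kwargs):
        headers = kwargs.pop("headers", {})
        headers["Authorization"] = f"Bearer {self.token}"   # authentication
        for attempt in range(self.max_retries):
            # throttle: never fire requests closer together than min_interval
            wait = self.min_interval - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)
            self._last_call = time.monotonic()
            resp = requests.request(
                method, f"{self.base_url}/{path.lstrip('/')}",
                headers=headers, **kwargs,
            )
            if resp.status_code not in (429, 500, 502, 503):
                return resp
            time.sleep(2 ** attempt)   # retry with exponential backoff
        return resp
```

The refactor the agent is asked to do is then mechanical: replace every scattered direct HTTP call with `client.request("GET", ...)` or similar.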
I dunno, maybe I have high standards but I generally find that the test suites generated by LLMs are both over and under determined. Over-determined in the sense that some of the tests are focused on implementation details, and under-determined in the sense that they don't test the conceptual things that a human might.
That being said, I've come across loads of human written tests that are very similar, so I can see where the agents are coming from.
You often mention that this is why you are getting good results from LLMs so it would be great if you could expand on how you do this at some point in the future.
Or I can say "use pytest-httpx to mock the endpoints" and Claude knows what I mean.
Keeping an eye on the tests is important. The most common anti-pattern I see is large amounts of duplicated test setup code - which isn't a huge deal, I'm much more tolerant of duplicated logic in tests than I am in implementation, but it's still worth pushing back on.
"Refactor those tests to use pytest.mark.parametrize" and "extract the common setup into a pytest fixture" work really well there.
Generally though the best way to get good tests out of a coding agent is to make sure it's working in a project with an existing test suite that uses good patterns. Coding agents pick the existing patterns up without needing any extra prompting at all.
I find that once a project has clean basic tests the new tests added by the agents tend to match them in quality. It's similar to how working on large projects with a team of other developers work - keeping the code clean means when people look for examples of how to write a test they'll be pointed in the right direction.
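Roughly what the parametrize/fixture prompts above produce, as a self-contained sketch (the fake client and routes are made up):

```python
# Before: several near-identical tests, each repeating the same setup.
# After: one fixture for the shared setup, one parametrized test for the cases.
import pytest

class FakeClient:
    """Stand-in for whatever object the duplicated setup used to build."""
    ROUTES = {"/health": 200, "/admin": 403}

    def get(self, path):
        return self.ROUTES.get(path, 404)

@pytest.fixture
def client():
    # the common setup, extracted once instead of copied into every test
    return FakeClient()

@pytest.mark.parametrize(
    "path,expected_status",
    [
        ("/health", 200),
        ("/missing", 404),
        ("/admin", 403),
    ],
)
def test_status_codes(client, path, expected_status):
    assert client.get(path) == expected_status
```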
One last tip I use a lot is this:
I do this all the time with different existing projects I've written - the quickest way to show an agent how you like something to be done is to have it look at an example.

Yeah, this is where I too have seen better results. The worse ones have been in places where it was greenfield and I didn't have an amazing idea of how to write tests (a data person working on a django app).
Thanks for the information, that's super helpful!
I am not sure why, but it kept trying to do that, although I made several attempts.
Ended up writing it on my own, very odd. This was in Cursor, however.
If you start with an example file of tests that follow a pattern you like, along with the code the tests are for, it's pretty good at following along. Even adding a sentence to the prompt about avoiding tautological tests and focusing on the seams of functions/objects/whatever (integration tests) can get you pretty far to a solid test suite.
Another agent reviews the tests, finds duplicate code, finds poor testing patterns, looks for tests that are only following the "happy path", ensures logic is actually tested and that you're not wasting time testing things like getters and setters. That agent writes up a report.
Give that report back to the agent that wrote the test or spin up a new agent and feed the report to it.
Don't do all of this blindly; actually read the report to make sure the LLM is on the right path. Repeat that one or two times.
Just writing one line in CLAUDE.md or similar saying "don't test library code; assume it is covered" works.
Half the battle with this stuff is realizing that these agents are VERY literal. The other half is paring down your spec/token usage without sacrificing clarity.
Just like anything else in software, you have to iterate. The first pass is just to thread the needle.
I don't get it. I have insanely high standards so I don't let the LLM get away with not meeting my standards. Simple.
Incidentally, I wonder if anyone has used LLMs to generate complex test scenarios described in prose, e.g. “write a test where thread 1 calls foo, then before hitting block X, thread 2 calls bar, then foo returns, then bar returns” or "write a test where the first network call Framework.foo makes returns response X, but the second call returns error Y, and ensure the daemon runs the appropriate mitigation code and clears/updates database state." How would they perform in this scenario? Would they add the appropriate shims, semaphores, test injection points, etc.?
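For concreteness, the kind of scaffolding I'd want it to produce looks something like this (a rough Python sketch of the thread-interleaving case; foo, bar, and the injection hook are hypothetical names, not a real framework):

```python
import threading

state = []


def foo(before_block_x=None):
    state.append("foo:start")
    if before_block_x:            # test-only injection point
        before_block_x()
    state.append("foo:block_x")


def bar():
    state.append("bar")


def test_bar_runs_before_foo_reaches_block_x():
    foo_paused = threading.Event()
    bar_done = threading.Event()

    def pause():
        foo_paused.set()              # signal: foo is about to enter block X
        bar_done.wait(timeout=5)      # hold foo until bar has finished

    t1 = threading.Thread(target=foo, kwargs={"before_block_x": pause})
    t1.start()
    foo_paused.wait(timeout=5)

    bar()                             # thread 2's call, forced to interleave here
    bar_done.set()
    t1.join(timeout=5)

    assert state == ["foo:start", "bar", "foo:block_x"]
```

The interesting question is whether the model would add the injection point and the events itself, or just sleep() and hope.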
> [Agents write] units of changes that look good in isolation.
I have only been using agents for coding end-to-end for a few months now, but I think I've started to realise why the output doesn't feel that great to me.
Like you said; "it's my job" to create a well designed code base.
Without writing the code myself however, without feeling the rough edges of the abstractions I've written, without getting a sense of how things should change to make the code better architected, I just don't know how to make it better.
I've always worked in smaller increments, creating the small piece I know I need and then building on top of that. That process highlights the rough edges, the inconsistent abstractions, and that leads to a better codebase.
AI (it seems) decides on a direction and then writes 100s of LOC at once. It doesn't need to build abstractions because it can write the same piece of code a thousand times without caring.
I write one function at a time, and as soon as I try to use it in a different context I realise a better abstraction. The AI just writes another function with 90% similar code.
We expect the spec writing and prompt management to cover the "work smarter" bases, but part of the work smarter "loop" is hitting those points where "work harder" is about to happen, where you know you could solve a problem with 100s or 1000s of lines of code, pausing for a bit, and finding the smarter path/the shortcut/the better abstraction.
I've yet to see an "agentic loop" that works half as well as my well trained "work smarter loop" and my very human reaction to those points in time of "yeah, I simply don't want to work harder here and I don't think I need hundreds more lines of code to handle this thing, there has to be something smarter I can do".
In my opinion, the "best" PRs delete as much or more code than they add. Even in the cleanest LLM-created PRs, I've never seen an LLM propose a true removal that wasn't just a "this code wasn't working according to the tests so I deleted the tests and the code" level mistake.
I increasingly feel a sort of "guilt" when going back and forth between agent-coding and writing it myself. When the agent didn't structure the code the way I wanted, or it just needs overall cleanup, my frustration will get the best of me and I will spend too much time writing code manually or refactoring using traditional tools (IntelliJ). It's clear to me that with current tooling some of this type of work is still necessary, but I'm trying to check myself about whether a certain task really requires my manual intervention, or whether the agent could manage it faster.
Knowing how to manage this back and forth reinforces a view I've seen you espouse: we have to practice and really understand agentic coding tools to get good at working with them, and it's a complete error to just complain and wait until they get "good enough" - they're already really good right now if you know how to manage them.
> So I’m back to writing by hand for most things. Amazingly, I’m faster, more accurate, more creative, more productive, and more efficient than AI, when you price everything in, and not just code tokens per hour
At least he said "most things". I also did "most things" by hand, until Opus 4.5 came out. Now it's doing things in hours I would have worked an entire week on. But it's not a prompt-and-forget kind of thing, it needs hand holding.
Also, I have no idea _what_ agent he was using. OpenAI, Gemini, Claude, something local? And with a subscription, or paying by the token?
Because the way I'm using it, this only pays off because it's the $200 Claude Max subscription. If I had to pay for the tokens (which, once again, are hugely marked up), I would have gone bankrupt.
"vibe coding" didn't really become real until 2025, so how were they vibe coding for 2 years? 2 years ago I couldn't count on an llm to output JSON consistently.
Overall the article/video are SUPER ambiguous and frankly worthless.
I remember being amazed and at the time thinking the game had changed. But I've never been able to replicate it since. Even the latest and greatest models seem to always go off and do something stupid that it can't figure out how to recover from without some serious handholding and critique.
LLMs are basically slot machines, though, so I suppose there has always been a chance of hitting the jackpot.
No, it isn't. To quote your own blog, his job is to "deliver code [he's] proven to work", not to manage AI agents. The author has determined that managing AI agents is not an effective way to deliver code in the long term.
> you don't have the agent-managerial skills to tell the coding agents how to clean up the mess they made
The author has years of experience with AI-assisted coding. Is there any way to check whether someone is actually skilled at using these tools, other than whether they report (or studies measure) that they do better with them than without?
Or those skills are a temporary side effect of the current SOTA and will be useless in the future, so honing them is pointless right now.
Agents shouldn't make messes, if they did what they say on the tin at least, and if folks are wasting considerable time cleaning them up, they should've just written the code themselves.
Exactly.
AI assisted development isn't all or nothing.
We as a group and as individuals need to figure out the right blend of AI and human.
"We as a group need to figure out the right blend of strong static and weak dynamic typing."
One can look around and see where that old discussion brought us. In my opinion, nowhere; things are the same as they were.
So, where will LLM-assisted coding bring us? If it rhymes with the static-typing debate, I see no answer other than "nowhere."
For small projects, I don’t think it makes a huge difference.
But for large projects, I’d guess that most die-hard dynamic people who have tried typescript have now seen the light and find lots of benefits to static typing.
My own experience suggests that if you need to develop a heavily multithreaded application, you should use Haskell: you need some MVars if you are working alone, and you need software transactional memory (STM) if you are working as part of a team of two or more people.
STM makes stitching different parts of a parallel program together as easy as writing a sequential program - sequential coordination is delegated to STM. But STM needs control of side effects: one should not write a file inside an STM transaction, only before the transaction starts or after it finishes.
Because of this, C#, F#, C++, C, Rust, Java and most other programming languages do not have a proper STM implementation.
For controlling (and combining) (side) effects one needs higher-order types and partially instantiated types. These had already been available in Haskell (GHC 6.4, 2005) for four years by the time Rust was conceived (2009).
Did Rust do anything to get these? No. The authors were a little too concerned with reimplementing what Henry Baker did at the beginning of the 1990s, if not before that.
Do the Rust authors have plans to implement them? No, they have other things to do urgently to serve the community better. As if making complex coordination of heavily parallel programs easy is not a priority at all.
This is where I get my "rhyme" from.
Vibe coding is the extreme end of using AI, while handwriting is the extreme end of not using AI. The optimal spot is somewhere in the middle. Where exactly that spot is, I think, is still up for debate. But the debate is not advanced in any way by latching on to the extremes and assuming that they are the only options.
Because when I see people downplaying LLMs, or describing their poor experiences, it feels like they're trying to "vibe code" but expecting the LLM to automatically do EVERYTHING. They take it as a failure that they have to tell the LLM explicitly to do something a couple of times. Or they take it as a problem that the LLM didn't "one shot" something.
But I'm thankful for you devs that are giving me job security.
Like people in here complaining about how poor the tests are... but did they start another agent to review the tests? Did they take that and iterate on the tests with multiple agents?
I can attest that the first pass of testing can often be shit. That's why you iterate.
So far, by the time I’m done iterating, I could have just written it myself. Typing takes like no time at all in aggregate. Especially with AI assisted autocomplete. I spend far more time reading and thinking (which I have to do to write a good spec for the AI anyways).
It came very close to success, but there were 2 or 3 big show-stopping bugs such as it forgetting to update the spatial partitioning when the entities moved, so it would work at the start but then degrade over time.
It got stuck on the belief that the algorithm itself must be the problem, so at some point it just stuck a generic boids solution into the middle of the rest. To make it worse, that version didn't even bother to use the spatial partitioning; the boids were just brute-force checking their neighbours.
Had this been a real system it might have made its way into production, which makes one think about the value of the AI code out there. As it was I pointed out that bit and asked about it, at which point it admitted that it was definitely a mistake and then it removed it.
I had previously implemented my own version of the algorithm, and it took me quite a bit of time, but in doing so I built up the mental code model and understood both the problem and the solution by the end. In comparison, the AI implemented it 10-30x faster than I did but would never have managed to complete the project on its own. Also, if I hadn't previously implemented it myself and had just tried to have it do the heavy lifting, then I wouldn't have understood enough of what it was doing to overcome its issues and get the code working properly.
There's been such a massive leap in capabilities since Claude Code came out in 2025.
2 years ago I MAYBE used an LLM to take unstructured data and give me a JSON object of a specific structure. Only about a year ago did I start using LLMs for ANY type of coding, and I would generally use snippets, not whole codebases. It wasn't until September that I started really leveraging the LLM for coding.
https://x.com/karpathy/status/1886192184808149383
I shipped a small game that way (https://love-15.com/) -- one that I've wished to make for a long time but wouldn't have been worth building otherwise. It's tiny, really, but very niche -- despite being tiny, I hit brick walls multiple times vibing it, and had to take a few brief breaks from vibing to get it unstuck.
Claude Code was a step change after that, along with model upgrades, about 9 months ago. That size project has been doable as a vibe coded project since then without hitting brick walls.
All this to say I really doubt most claims about having been vibe coding for more than 9-15 months.
Now I expect to start seeing job postings asking for "3 years of experience vibe coding"
It's used more broadly now, but still to refer to the opposite end of the spectrum of AI-assisted coding to what you described.
Best case is still operationally correct but nightmare fuel on the inside. So maybe good for one-off tools where you control the inputs and can vibe-check the outputs without disaster if you forget to carry the one.
Well yeah, but you can guard against this in several ways. My way is to understand my own codebase and look at the output of the LLM.
LLMs allow me to write code faster and it also gives a lot of discoverability of programming concepts I didn't know much about. For example, it plugged in a lot of Tailwind CSS, which I've never used before. With that said, it does not absolve me from not knowing my own codebase, unless I'm (temporarily) fine with my codebase being fractured conceptually in wonky ways.
I think vibecoding is amazing for creating quick high fidelity prototypes for a green field project. You create it, you vibe code it all the way until your app is just how you want it to feel. Then you refactor it and scale it.
I'm currently looking at 4009 lines of JS/JSX combined. I'm still vibecoding my prototype. I recently looked at the codebase, saw some ready-made improvements, and made them. But I think I'll need to start actually engineering things once I reach the 10K-line mark.
Then you are not vibe coding. The core, almost exclusive requirement for "vibe coding" is that you DON'T look at the code. Only the product outcome.
You don't even look at the diffs. You just yolo the code.
https://x.com/i/status/1886192184808149383
> It’s not until I opened up the full codebase and read its latest state cover to cover that I began to see what we theorized and hoped was only a diminishing artifact of earlier models: slop.
This is true vibe coding, they exclusively interacted with the project through the LLM, and only looked at its proposed diffs in a vacuum.
If they had been monitoring the code in aggregate the entire time they likely would have seen this duplicative property immediately.
> What’s worse is code that agents write looks plausible and impressive while it’s being written and presented to you. It even looks good in pull requests (as both you and the agent are well trained in what a “good” pull request looks like).
Which made me think that they were indeed reading at least some of the code - classic vibe coding doesn't involve pull requests! - but weren't paying attention to the bigger picture / architecture until later on.
Is it a skill for the layman?
Or does it only work if you have the understanding you would need to manage a team of junior devs to build a project.
I feel like we need a different term for those two things.
Programming together with AI, however, is a skill, based mostly on how well you can communicate (with machines or other humans) and how good your high-level software engineering skills are. You need to learn what it can and cannot do before you can be effective with it.
I call the act of using AI to help write code that you review, or managing a team of coding agents "AI-assisted programming", but that's not a snappy name at all. I've also skirted around the idea of calling it "vibe engineering" but I can't quite bring myself to commit to that: https://simonwillison.net/2025/Oct/7/vibe-engineering/
I think we need another term for using an LLM to write code but absolutely not forgetting the code exists.
I often use LLMs to do refactoring and, by definition, refactoring cannot be vibe-coding because that's caring about the code.
That is now what software engineering is.
Normally I'd know 100% of my codebase, now I understand 5% of it truly. The other 95% I'd need to read it more carefully before I daresay I understand it.
I agree there is a spectrum, and all the way to the left you have "vibe coding" and all the way to the right you have "manual programming without AI", of course it's fine to be somewhere in the middle, but you're not doing "vibe coding" in the way Karpathy first meant it.
This is the bit I think enthusiasts need to argue doesn't apply.
Have you ever read a 200 page vibewritten novel and found it satisfying?
So why do you think a 10 kLoC vibecoded codebase will be any good engineering-wise?
I've been coding a side-project for a year with full LLM assistance (the project is quite a bit older than that).
Basically I spent over a decade developing CAD software at Trimble and now have pivoted to a different role and different company. So like an addict, I of course wanted to continue developing CAD technology.
I pretty much know how CAD software is supposed to work. But it's _a lot of work_ to put together. With LLMs I can basically speedrun through my requirements that require tons of boilerplate.
The velocity is incredible compared to if I would be doing this by hand.
Sometimes the LLM outputs total garbage. Then you don't accept the output, and start again.
The hardest parts are never coding but design. The engineer does the design. Sometimes I agonize for weeks or months over a difficult detail (it's a side project, I have a family, etc). Once the design is crystal clear, it's fairly obvious whether the LLM output is aligned with the design or not. Once I have a good design, I can just start the feature/boilerplate speedrun.
If you have a Windows box you can try my current public alpha. The bugs are on me, not on the LLM:
https://github.com/AdaShape/adashape-open-testing/releases/t...
About the project itself, do you plan to open source it eventually? LLM discussion aside, I've long been frustrated by the lack of good free desktop 3D CAD software.
I would love to eventually build this into a real product, so I'm not currently considering open sourcing it.
I can give you a free forever-license if you would like to be an alpha tester though :) - and in any case I'm planning for the eventual non-commercial licenses to be affordable and forever.
IMHO what the world needs is a good textbook on how to build CAD software. Mäntylä’s ”Solid modeling” is almost 40 years old. CAD itself is pushing 60-70 years.
The highly non-trivial parts in my app are open source software anyway (you can check the attribution file); what this contributes is just a specific, opinionated take on how a program like this should work in the 2020s.
What I _would_ like to eventually contribute is a textbook on how to build something like this - and after that, re-implementation would be a matter of some investment in LLM inference, testing, and end-user empathy. But that will have to wait for either my financial independence, AI-communism, or my retirement :)
Thank you!
I shared the app because it's not confidential and it's concrete - I can't really discuss work stuff without stressing about what I can share and what I can't.
At least in my workplace everyone I know is using Claude Code or Cursor.
Now, I don’t know why some people are productive with tools and some aren’t.
But the code generation capabilities are for real.
We’re moving into a world where suboptimal code doesn’t matter that much because it’s so cheap to produce.
(The model was asked to generate a chapter at a time. At each step, it was given the full outline of the novel, the characters, and a summary of each chapter so far.)
Did the model also come up with the idea for the novel, the characters, the outline?
For the other, my son wrote ~200 words total describing the story idea and the characters.
In each case, the model created the detailed outline and did all the writing.
I suspect part of the reason we see such a wide range of testimonies about vibe-coding is some people are actually better at it, and it would be useful to have some way of measuring that effectiveness.
—
I would never use, let alone pay for, a fully vibe-coded app whose implementation no human understands.
Whether you’re reading a book or using an app, you’re communicating with the author by way of your shared humanity in how they anticipate what you’re thinking as you explore the work. The author incorporates and plans for those predicted reactions and thoughts where it makes sense. Ultimately the author is conveying an implicit mental model (or even evoking emotional states or sensations) to the reader.
The first problem is that many of these pathways and edge cases aren’t apparent until the actual implementation, and sometimes in the process the author realizes that the overall product would work better if it were re-specified from the start. This opportunity is lost without a hands on approach.
The second problem is that, the less human touch is there, the less consistent the mental model conveyed to the user is going to be, because a specification and collection of prompts does not constitute a mental model. This can create subconscious confusion and cognitive friction when interacting with the work.
If you’re writing novel algorithms all day, then I get your point. But are you? Or have you ever delegated work? If you find the AI losing its train of thought all it takes is to try again with better high level instructions.
It wasn't fully autonomous (the reliability was a bit low -- e.g. had to get the code out of code fences programmatically), and it wasn't fully original (I stole most of it from Auto-GPT, except that I was operating on the AST directly due to the token limitations).
My key insight here was that I allowed GPT to design the APIs that it was itself going to use. This makes perfect sense to me based on how LLMs work. You tell it to reach for a function that doesn't exist, and then you ask it to make that function exist based on how it reached for it. Then the design matches its expectations perfectly.
GPT-4 now considers self modifying AI code to be extremely dangerous and doesn't like talking about it. Claude's safety filters began shutting down similar conversations a few months ago, suggesting the user switch to a dumber model.
It seems the last generation or two of models passed some threshold regarding self replication (which is a distinct but highly related concept), and the labs got spooked. I haven't heard anything about this in public though.
Edit: It occurs to me now that "self modification and replication" is a much more meaningful (and measurable) benchmark for artificial life than consciousness is...
BTW for reference the thing that spooked Claude's safety trigger was "Did PKD know about living information systems?"
I speculate that this has more to do with recent high-profile cases of self harm related to "AI psychosis" than any AGI-adjacent danger. I've read a few of the chat transcripts that have been made public in related lawsuits, and there seems to be a recurring theme of recursive or self-modifying enlightenment role-played by the LLM. Discouraging exploration of these themes would be a logical change by the vendors.
When some people say vibe coding, they mean they're copy-pasting snippets of code from ChatGPT.
When some people say vibe coding, they give a one sentence prompt to their cluster of Claude Code instances and leave for a road trip!
You don't need a "fully agentic" tool like Claude Code to write code. Any of the AI chatbots can write code too, obviously doing so better since the advent of "thinking" models, and RL post-training for coding. They also all have had built-in "code interpreter" functionality for about 2 years where they can not only write code but also run and test it in a sandbox, at least for Python.
Recently at least, the quality of code generation (at least if you are asking for something smallish) is good enough that cut and pasting chatbot output (e.g. C++, not Python) to compile and run yourself is still a productivity boost, although this was always an option.
Just more FUD from devs that think they're artisans.
On a personal note, vibe coding leaves me with that same empty hollow sort of tiredness, as a day filled with meetings.
And as an added benefit: I feel accomplished and proud of the feature.
We need to find the Goldilocks optimal level of AI assistance that doesn't leave everyone hating their jobs, while still boosting productivity.
I also like to think that I'm utilising the training done on many millions of lines of code while still using my experience and opinions to arrive at something, compared to relying on just my own fallible thinking, where I could have missed some interesting ideas. It's like me++. Sure, it does a lot of heavy lifting, but I never leave the steering wheel. I guess I'm still at the pre-agentic stage and not ready to let go fully.
I’m not sure if this counts as “vibe coding” per se, but I like that this mentality keeps my workday somewhat similar to how it was for decades. Finding/creating holes that the agent can fill with minimal adult supervision is a completely new routine throughout my day, but I think obsessing over maintainability will pay off, like it always has.
It's crazy to me nevertheless that some people can afford the luxury to completely renounce AI-assisted coding.
My habit now: always get a 2nd or 3rd opinion before assuming one LLM is correct.
All code written by an LLM is reviewed by an additional LLM. Then I verify that review and get one of the agents to iterate on everything.
If what you're doing is proprietary, or even a little bit novel, there is a really good chance that AI will screw it up. After all, how can it possibly know how to solve a problem it has never seen before?
It might be my skills, but I can tell you right now I will not be as fast as the AI, especially in new codebases, other languages, or different environments, even with all the debugging and the hell that is AI pull-request review.
I think the answer here is fast AI for things it can do on its own, and slow, composed, human in the loop AI for the bigger things to make sure it gets it right. (At least until it gets most things right through innovative orchestration and model improvement moving forward.)
I tried a minimalist example where it totally failed a few years back, and even now ChatGPT 5 produced two examples for "Async counter in Rust" - one using atomics and another using tokio::sync::Mutex. I learned back then, the hard way, that the latter was wrong, by trying to profile high latency. To my surprise, here's a quote from the Tokio Mutex documentation:
> Contrary to popular belief, it is ok and often preferred to use the ordinary Mutex from the standard library in asynchronous code.
> The feature that the async mutex offers over the blocking mutex is the ability to keep it locked across an .await point.
I have AI build self-contained, smallish tasks and I check everything it does to keep the result consistent with global patterns and vision.
I stay in the loop and commit often.
Looks to me like the problem a lot of people are having is that they have AI do the whole thing.
If you ask it "refactor code to be more modern", it might guess what you mean and do it in a way you like it or not, but most likely it won't.
If you keep tasks small and clearly specced out it works just fine. A lot better than doing it by hand in many cases, specially for prototyping.
It'll be really interesting to see in the decades to come what happens when a whole industry gets used to releasing black boxes by vibe coding the hell out of them.
Once I mastered the finite number of operations and behaviors, I knew how to tell "it" what to do and it would work. The only thing different about vibe coding is the scale of operations and behaviors. It is doing exactly what you're telling it to do. And expectations also need to be aligned: don't think you can hand over architecture and design to the LLM; that's still your job. The gain is that the LLM will deal with the proper syntax, API calls, etc., and work as a research tool on steroids if you also (from another mentor later in life) ask good questions.
And I also might "vibe code" when I need to add another endpoint on a deadline to earn a living. To be fair - I review and test the code so not sure it's really vibe coding.
For me it's not that binary.
- No-AI engineers
- Minimal AI autocomplete engineers
- Simple agentic developers
- Vibe coders who review the code they get
- Complete YOLO vibe coders who have no clue how their "apps" work
And that spectrum will also correlate to the skill level in engineering: from people who understand what they are doing and what their code is doing - to people who have lost (or never even had) software engineering skills and who only know how to count lines of code and write .md files.
We're modern day factory workers.
I have to go out of my way to get this out of llms. But with enough persuasion, they produce roughly what I would have written myself.
Otherwise they default to adding as much bloat and abstraction as possible. This appears to be the default mode of operation in the training set.
I also prefer to use it interactively. I divide the problem to chunks. I get it to write each chunk. The whole makes sense. Work with its strengths and weaknesses rather than against them.
For interactive use I have found smaller models to be better than bigger models. First of all because they are much faster. And second because, my philosophy now is to use the smallest model that does the job. Everything else by definition is unnecessarily slow and expensive!
But there is a qualitative difference at a certain level of speed, where something goes from not interactive to interactive. Then you can actually stay in flow, and then you can actually stay consciously engaged.
I work on game engines which do some pretty heavy lifting, and I'd be loath to let these agents write the code for me.
They'd simply screw too much of it up and create a mess that I'm going to have to go through by hand later anyway, not just to ensure correctness but also performance.
I want to know what the code is doing, I want control over the fine details, and I want to have as much of the codebase within my mental understanding as possible.
Not saying they're not useful - obviously they are - just that something smells fishy about the success stories.
Option 1: The cost/benefit delta of agentic engineering never improves past net-zero, and bespoke hand-written code stays as valuable as ever.
Option 2: The cost/benefit becomes net positive, and economies of scale forever tie the cost of code production directly to the cost of inference tokens.
Given that many are saying option #2 is already upon us, I'm gonna keep challenging myself to engineer a way past the hurdles I run into with agent-oriented programming.
The deeper I get, the more articles like this feel like the modern equivalent of saying "internet connections are too slow to do real work" or "computers are too expensive to be useful for regular people".
[0]: https://news.ycombinator.com/item?id=37888477
It's worth mentioning that even today, Copilot is an underwhelming-to-the-point-obstructing kind of product. Microsoft sent salespeople and instructors to my job, all for naught. Copilot is a great example of how product > everything, and if you don't have a good product... well...
As I have never tried Claude Code, I can't say how much better it is. But Copilot is definitely more than auto-complete. Like I already wrote, it can do planning mode, edit mode, MCP, tool calling, and web searches.
Or was there anything specific that caused this feeling?
All under one subscription.
Does not support upload / reading of PDF files :(
Yes, definitely. I use it mostly in Agent mode, then switch to Ask mode to ask it questions.
> How's the autocomplete?
It works reasonably well, but I'm less interested in autocomplete.
While this is likely feasible, I imagine it is also an instant fireable offense at these sites if not already explicitly directed by management. Also not sure how Microsoft would react upon finding out (never seen the enterprise licensing agreement paperwork for these setups). Someone's account driving Claude Code via Github Copilot will also become a far outlier of token consumption by an order(s) of magnitude, making them easy to spot, compared to their coworkers who are limited to the conventional chat and code completion interfaces.
If someone has gotten the enterprise Github Copilot integration to work with something like Claude Code though (simply to gain access to the models Copilot makes available under the enterprise agreement, in a blessed golden path by the enterprise), then I'd really like to know how that was done on both the non-technical and technical angles, because when I briefly looked into it all I saw were very thorny, time-consuming issues to untangle.
Outside those environments, there are lots of options to consume Claude Code via Github Copilot like with Visual Studio Code extensions. So much smaller companies and individuals seem to be at the forefront of adoption for now. I'm sure this picture will improve, but the rapid rate of change in the field means those whose work environment is like those enterprise constrained ones I described but also who don't experiment on their own will be quite behind the industry leading edge by the time it is all sorted out in the enterprise context.
I don't "vibecode" though, if I don't understand what it's doing I don't use it. And of course, like all LLMs, sometimes it goes on a useless tangent and must be reigned in.
I am writing a game in MonoGame; I am not primarily a game dev or a C# dev. I find AI is fantastic here for "set up a configuration class for this project that maps key bindings", having it handle the boilerplate and smaller configuration. It's great at "give me an A* implementation for this graph". But when it becomes x -> y -> z without larger context and evolution, it falls flat. I still need creativity. I just don't worry too much about boilerplate, utility methods, and figuring out the specifics of wiring a framework together.
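For reference, the kind of well-trodden utility code I mean by that request looks roughly like this (a generic Python sketch rather than my actual C# version):

```python
import heapq
import itertools


def a_star(graph, start, goal, heuristic):
    """graph: dict node -> list of (neighbor, cost); heuristic(n, goal) -> float."""
    counter = itertools.count()               # tie-breaker so the heap never compares nodes
    open_heap = [(heuristic(start, goal), next(counter), start, None)]
    came_from = {}
    g_score = {start: 0.0}
    while open_heap:
        _, _, node, parent = heapq.heappop(open_heap)
        if node in came_from:
            continue                           # already expanded via a cheaper path
        came_from[node] = parent
        if node == goal:                       # walk parents back to the start
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return list(reversed(path))
        for neighbor, cost in graph.get(node, []):
            new_g = g_score[node] + cost
            if new_g < g_score.get(neighbor, float("inf")):
                g_score[neighbor] = new_g
                heapq.heappush(open_heap,
                               (new_g + heuristic(neighbor, goal), next(counter), neighbor, node))
    return None


# Toy graph with a zero heuristic (degenerates to Dijkstra).
graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
assert a_star(graph, "a", "c", lambda n, g: 0) == ["a", "b", "c"]
```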
I will have a conversation with the agent. I will present it with a context, an observed behavior, and a question... often tinged with frustration.
What I get out of this interaction at the end of it is usually a revised context that leads me to figure out a better outcome. The AI doesn't give me the outcome. It gives me alternative contexts.
On the other hand, when I just have AI write code for me, I lose my mental model of the project and ultimately just feel like I'm delaying some kind of execution.
As a PRODUCT person, it writes code 100x faster than I can, and I treat anything it writes as a "throwaway" prototype. I've never been able to treat my own code as throwaway, because I can't just throw away multiple weeks of work.
It doesn't aid in my learning to code, but it does aid in me putting out much better, much more polished work that I'm excited to use.
We can identify 3 levels of "vibe coding":
1. GenAI Autocomplete
2. Hyperlocal prompting about a specific function. (Copilot's original pitch)
3. Developing the app without looking at code.
Level 1 is hardly considered "vibe" coding, and Level 2 is iffy.
"90% of code written by AI" in some non-trivial contexts only very recently reached level 3.
I don't think it ever reached Level 2, because that's just a painfully tedious way of writing code.
It is quite scary that junior devs/college kids are more into vibe coding than putting in the effort to actually learn the fundamentals properly. This will create at least 2-3 generations of bad programmers down the line.
1: https://asfaload.com/blog/ai_use/
Then, I can reason through the AI agent's responses and decide what if anything I need to do about them.
I just did this for one project so far, but got surprisingly useful results.
It turns out that the possible bugs identified by the AI tool were not bugs based on the larger context of the code as it exists right now. For example, it found a function that returns a pointer, and it may return NULL. Call sites were not checking for a NULL return value. The code in its current state could never in fact return a NULL value. However, future-proofing this code, it would be good practice to check for this case in the call sites.
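In Python terms (my own illustration, not the tool's output or the actual C code it flagged), the pattern looks like this:

```python
from typing import Callable, Optional

# Today the registry is exhaustive, so lookups for the formats we actually
# pass never return None -- but the signature says they can.
_handlers: dict[str, Callable[[object], object]] = {"json": lambda data: data, "text": str}


def find_handler(fmt: str) -> Optional[Callable[[object], object]]:
    return _handlers.get(fmt)


def render(fmt: str, data: object) -> object:
    handler = find_handler(fmt)
    if handler is None:                 # the future-proofing check at the call site
        raise ValueError(f"no handler registered for format: {fmt}")
    return handler(data)
```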
Nobody forces you to completely let go of the code and do pure vibe coding. You can also do small iterations.
What AI (LLMs) does is raise the level of abstraction to human language via translation. The problem is that human language is imprecise in general. You can see this with legal or scientific writing. Legalese is almost illegible to laypeople because there are precise things you need to specify and you need to be precise in how you specify them. Unfortunately the tech community is misleading the public, telling laypeople they can just sit back and casually tell AI what they want and it will give them exactly what they wanted. Users are just lying to themselves, because most likely they did not take the time to think through what they wanted, and they are rationalizing (after the fact) that the AI is giving them exactly what they wanted.
Examples:
Thanks to Claude I've finally been able to disable the ssh subsystem of the GNOME keyring infrastructure that opens a modal window asking for ssh passphrases. What happened is that I always had to cancel the modal, look for the passphrase in my password manager, and restart whatever made the modal open. What I have now is either a password prompt inside a terminal or a non-modal dialog. Both ssh-add the key to an ssh agent.
However, my new Emacs windows still open at about 100x100 px on my new Debian 13 install; nothing suggested by Claude works. I'll have to dig into it, but I'm not sure that's important enough. I usually don't create new windows after Emacs starts with the saved desktop configuration.
I think coding with an AI changes our role from code writer to code reviewer, and you have to treat it as a comprehensive review where you comment not just on code "correctness" but on the other aspects the author mentions: how functions fit together, codebase patterns, architectural implications. While I feel like using AI might have made me a lazier coder, it's made me a significantly more active reviewer, which I think at least helps to bridge the gap the author is referencing.
So while there’s no free lunch, if you are willing to pay - your lunch will be a delicious unlimited buffet for a fraction of the cost.
In order to get high-accuracy PRs with AI (small, tested commits that follow existing patterns efficiently), you need to spend time adding agent instruction files (CLAUDE.md, AGENTS.md), skills, hooks, and tools specific to your setup.
This is why so much development is happening at the plugin layer right now, especially with Claude Code.
The juice is worth the squeeze. Once accuracy gets high enough you don't need to edit and babysit what is generated, you can horizontally scale your output.
I admit I could be an outlier though.
That's exactly why this whole (nowadays popular) notion of AI replacing senior devs who are capable of understanding large codebases is nonsense and will never become reality.
The opener is 100% true. Our current approach with AI code is to "draft a design in 15 mins" and have AI implement it. This contrasts with the thoughtful approach a human would take with other human engineers: plan something, pitch the design, get some feedback, take some time thinking through pros and cons. Begin implementing, pivot, realizations, improvements, the design morphs.
The current vibe-coding methodology is eager to fire and forget, and it passes incomplete knowledge to an AI model with limited context, limited awareness, and 1% of the mental model and intent you had at the moment you wrote the quick spec.
This is clearly not a recipe for reliable, resilient, long-lasting code, or even efficient code. Spec-driven development doesn't work when the spec is frozen and the builder cannot renegotiate intent mid-flight.
The second point made clearer in the video is the kind of learned patterns that can delude a coder, who is effectively 'doing the hard part', into thinking that the AI is the smart one. Or into thinking that the AI is more capable than it actually is.
I say this as someone who uses Claude Code and Codex daily. The claims of the article (and video) aren't strawman.
Can we progress past them? Perhaps, if we find ways to have agents iteratively improve designs on the fly rather than sticking with the original spec that, let's be honest, wasn't given the rigor relative to what we've asked the LLMs to accomplish. If our workflows somehow make the spec a living artifact again -- then agents can continuously re-check assumptions, surface tradeoffs, and refactor toward coherence instead of clinging to the first draft.
Perhaps that is the distinction between reports of success with AI and reports of abject failure. Your description of "Our current approach" is nothing like how I have been working with AI.
When I was making some code to do a complex DMA chaining, the first step with the AI was to write an emulator function that produced the desired result from the parameters given in software. Then a suite of tests with memory to memory operations that would produce a verifiable output. Only then started building the version that wrote to the hardware registers ensuring that the hardware produced the same memory to memory results as the emulator. When discrepancies occurred, checking the test case, the emulator and the hardware with the stipulation that the hardware was the ground truth of behaviour and the test case should represent the desired result.
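To make that first step concrete, the emulator-plus-memory-to-memory-test shape looked roughly like this (a heavily simplified, hypothetical Python sketch; the real descriptor format and hardware interface are obviously more involved):

```python
def emulate_dma_chain(src: bytes, descriptors: list[dict]) -> bytes:
    """Pure-software model: each descriptor is a (src offset, dst offset, length) copy."""
    out = bytearray(len(src))
    for d in descriptors:
        out[d["dst"]:d["dst"] + d["len"]] = src[d["src"]:d["src"] + d["len"]]
    return bytes(out)


def test_chain_of_two_copies():
    src = bytes(range(8))
    chain = [
        {"src": 0, "dst": 4, "len": 4},   # copy first half into second half
        {"src": 4, "dst": 0, "len": 4},   # and second half into first half
    ]
    assert emulate_dma_chain(src, chain) == bytes([4, 5, 6, 7, 0, 1, 2, 3])
```

The hardware-register version was then checked against the emulator's memory-to-memory results, with the hardware treated as ground truth when they disagreed.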
I occasionally ask LLMs to one shot full complex tasks, but when I do so it is more as a test to see how far it gets. I'm not looking to use the result, I'm just curious as to what it might be. The amount of progress it makes before getting lost is advancing at quite a rate.
It's like seeing an Atari 2600 and expecting it to be a Mac. People want to fly to the moon with Atari 2600 level hardware. You can use hardware at that level to fly to the moon, and flying to the moon is an impressive achievement enabled by the hardware, but to do so you have to wrangle a vast array of limitations.
They are no panacea, but they are not nothing. They have been, and will remain, somewhere between for some time. Nevertheless they are getting better and better.
"AI can be good -- very good -- at building parts. For now, it's very bad at the big picture."
I disagree though. There’s no good reason that careful use of this new form of tooling can’t fully respect the whole, respect structural integrity, and respect neighboring patterns.
As always, it’s not the tool.
That's a very bad way to look at these tools. They legit know nothing, they hallucinate APIs all the time.
The only value they have at least in my book is they type super fast.
It's just a tool with a high level of automation. That becomes clear when you have to guide it to use more sane practices, simple things like don't overuse HTTP headers when you don't need them.
You should never just let AI "figure it out." It's the assistant, not the driver.
Good take though.
This is such an individualized technology that two people at the same starting point two years ago could've developed wildly different workflows.
There are many instances where I get to the final part of the feature and realize I spent far more time coercing AI to do the right thing than it would have taken me to do it myself.
It is also sometimes really enjoyable and sometimes a horrible experience. Programming prior to it could also be frustrating at times, but not in the same way. Maybe it is the expectation of increased efficiency that is now demanded in the face of AI tools.
I do think AI tools are consistently great for small POCs or where very standard simple patterns are used. Outside of that, it is a crapshoot or slot machine.
I have been tolerably successful. However, I have almost 30 years of coding experience and have the judgement of how big a component should be - when I push past that, by myself _or_ with AI, things get hairy.
ymmv.
For the record, I use AI to generate code but not for "vibecoding". I don't believe when people tell me "you just prompt it badly". I saw enough to lose faith.
Homelab is my hobby where I run Proxmox, Debian VM, DNS, K8s, etc, all managed via Ansible.
For what it is worth, I hate docker :)
I wanted to set up a private-tracker torrent stack that should include:
1) Jackett: For the authentication
2) Radarr: The inhouse browser
3) qBittorrent: which receives the torrent files automatically from Radarr
4) Jellyfin: Of course :)
I used ChatGPT to assist me in getting the above done as simply as possible, all via Ansible:
1) Ansible playbook to setup a Debian LXC Proxmox container
2) Jackett + Radarr + qBittorrent all in one for simplicity
3) WireGuard VPN + Proton VPN: if the VPN ever goes down, the entire container network must stop (iptables) so my home IP isn't leaked.
After 3 nights I got everything working and running 24/7, but it required a lot of review so it can be maintained 10 years down the road instead of "WTF is this???"
There were silly mistakes that make you question "Why am I even using this tool??", but then I remember that Google and search engines are dead. It would have taken me weeks to get this done otherwise; AI tools speed up that process by fetching the info I need so I can put it together.
I use AI purely to replace the broken state of search engines, even Brave and DuckDuckGo. I know what I am asking it; I don't just copy/paste and hope it works.
I have colleagues, also in the IT field, whose companies have gone fully AI with full access to their environments; they no longer do the thinking, they just press the button. These people are cooked, and not just because of the state of AI: if they ever go looking for another job, all they did for years was press a button!!
This is vibe argumenting.
Have people always been this easy to market to?
You gotta have a better argument than "AI Labs are eating their own dogfood". Are there any other big software companies doing that successfully? I bet yes, and think those stories carry more weight.
I think the most I can say I've dove in was in the last week. I wrangled some resources to build myself a setup with a completely self-hosted and agentic workflow and used several open-weight models that people around me had specifically recommended, and I had a work project that was self-contained and small enough to work from scratch. There were a few moving pieces but the models gave me what looked like a working solution within a few iterations, and I was duly impressed until I realized that it wasn't quite working as expected.
As I reviewed and iterated on it more with the agents, this Rube Goldberg machine eventually started filling in gaps with print statements designed to trick me, and sneaky block comments that mentioned, in oblique terms three lines into a boring description of the intended output, that it was placeholder code not meant for production. This should have been obvious, but even at that point, four days in, I was finding myself missing more things, not understanding the code because I wasn't writing it. This is basically the automation blindness I feared from proprietary workflows that could be changed or taken away at any time, just much faster than I had assumed. The promise of being able to work at this higher level, this new way of working, seemed less and less plausible the more I iterated; even starting over with chunks of the problem in new contexts, as many suggest, didn't really help.
I had deadlines, so I gave up and spent about half of my weekend fixing this by hand, and found it incredibly satisfying when it worked. But all-in this took more time and effort, and perhaps more importantly caused more stress, than just writing it in the first place probably would have.
My background is in ML research, and this makes it perhaps easier to predict the failure modes of these things (though surprisingly many don't seem to), but also makes me want to be optimistic, to believe this can work, but I also have done a lot of work as a software engineer and I think my intuition remains that doing precision knowledge work of any kind at scale with a generative model remains A Very Suspect Idea that comes more from the dreams of the wealthy executive class than a real grounding in what generative models are capable of and how they're best employed.
I do remain optimistic that LLMs will continue to find use cases that better fit a niche of state-of-the-art natural language processing that is nonetheless probabilistic in nature. Many such use cases exist. Taking human job descriptions and trying to pretend they can do them entirely seems like a poorly-thought-out one, and we've to my mind poured enough money and effort into it that I think we can say it at the very least needs radically new breakthroughs to stand a chance of working as (optimistically) advertised
I chuckled at this. This describes pretty much every large piece of software I've ever worked on. You don't need an LLM to create a giant piece of slop. To avoid it takes tons of planning, refinement, and diligence whether it's LLM's or humans writing it.
This is no different. And I'm not talking about vibe coding. I just mean having an llm browser window open.
When you're losing your abilities, it's easy to think you're getting smarter. You feel pretty smart when you're pasting that code
But you'll know when you start asking "do me that thingy again". You'll know from your own prompts. You'll know when you look at older code you wrote with fear and awe. That "coding" has shifted from an activity like weaving cloth to one more like watching YouTube.
Active coding vs passive coding
AI is far from perfect, but the same is true about any work you may have to entrust to another person. Shipping slop because someone never checked the code was literally something that happened several times at startups I have worked at - no AI necessary!
Vibecoding is an interesting dynamic for a lot of coders specifically because you can be good or bad at vibecoding - but the skill to determine your success isn't necessarily your coding knowledge but your management and delegation soft skills.
I just bootstrapped a 500k LOC MVP with an AI generator, community features, and Zapier integration.
www.clases.community
And it's my 3rd project of that size, fully vibe coded.
"Amazingly, I’m faster, more accurate, more creative, more productive, and more efficient than AI, when you price everything in, and not just code tokens per hour."
For 99.99% of developers this just won't be true.
I also keep seeing claims that writing more detailed specs is the answer, and retorts from others saying we're back to waterfall.
That isn't true. I think more of the iteration has moved to the spec. Writing the code is so quick now that you can make spec changes you wouldn't have dared before.
You also need gates like tests and you need very regular commits.
I’m gradually moving towards more detailed specs in the form of use cases and scenarios along with solid tests and a constantly tuned agent file + guidelines.
Through this I’m slowly moving back to letting Claude lose on implementation knowing I can do scan of the git diffs versus dealing with a thousand ask before edits and slowing things down.
When this works you start to see the magic.
Relevant xkcd: https://xkcd.com/568/
Even if we reach the point where it's as good as a good senior dev, we will still have to explain what we want it to do.
That's how I find it most helpful too. I give it a task and work out the spec based on the bad assumptions it makes and manually fix it.
The result stunned everyone I work with. I would never in a million years put this code on Github for others. It's terrible code for a myriad reasons.
My lived experience was... the task was accomplished, but not in a sustainable way, over the course of perhaps 80 individual sessions, with the longest being multiple solid 45-minute refactors... (codex-max)
About those. One of the things I spotted fairly quickly was the tendency of models to duplicate effort or take convoluted approaches to patch in behaviors. To get around this, I would every so often take the entire codebase, send it to Gemini 3 Pro, and ask it for improvements. Comically, every time, Gemini 3 Pro responds with "well, this code is hot garbage, you need to refactor these 20 things." Meanwhile, I'm side-eyeing it like... dude, you wrote this. Never fails to amuse me.
So, in the end, the project was delivered, was pretty cool, had 5x more features than I would have implemented myself and once I got into a groove -- I was able to reduce the garbage through constant refactors from large code reviews. Net Positive experience on a project that had zero commercial value and zero risk to customers.
But on the other hand...
I spent a week troubleshooting a subtle resource leak (C#) on a commercial project, introduced during a vibe-coding session where a new animation system was added and somehow picked up a bug that caused a hard crash on re-entering a planet scene.
The bug caused an all-stop and a week of lost effort. Countless AI Agent sessions circularly trying to review and resolve it. Countless human hours of testing and banging heads against monitors.
In the end, on the maybe random 10th pass using Gemini-3-Pro it provided a hint that was enough to find the issue.
This was a monumental fail and if game studios are using LLMs, good god, the future of buggy mess releases is only going to get worse.
I would summarize this experience as lots of amazement and new feature velocity. A little too loose with commits (too much entanglement to easily unwind later) and ultimately a negative experience.
A classic Agentic AI experience. 50% Amazing, 50% WTF.
2026: "If I can just write the specs so that the machine understands them it will write me code that works."
Like it or not, as a friend observed, we are N months away from a world where most engineers never look at source code; and the spectrum of reasons one would want to will inexorably narrow.
It will never be zero.
But people who haven't yet typed a word of code never will.
His points about why he stopped using AI: these are the things we reluctant AI adopters have been saying since this all started.
Or how I would start spamming SQL scripts and randomly at some point nuke all my work (happened more than once)... luckily at least I had backups regularly but... yeah.
I'm sorry but no, LLMs can't replace software engineers.
It requires refactoring at scale, but GenAI is fast so hitting the same code 25 times isn’t a dealbreaker.
Eventually the refactoring is targeted at smaller and smaller bits until the entire project is in excellent shape.
I’m still working on Sharpee, an interactive fiction authoring platform, but it’s fairly well-baked at this point and 99% coded by Claude and 100% managed by me.
Sharpee is a complex system and a lot of the inner-workings (stdlib) were like coats of paint. It didn’t shine until it was refactored at least a dozen times.
It has over a thousand unit tests, which I’ve read through and refactored by hand in some cases.
The results speak for themselves.
https://sharpee.net/ https://github.com/chicagodave/sharpee/
It’s still in beta, but not far from release status.
Sharpee's success is rooted in this, and it's recorded:
https://github.com/ChicagoDave/sharpee/tree/main/docs/archit...