It seems everywhere you turn in the world of software engineering right now, you can’t escape hearing about how great or how terrifying generative AI is for our profession. For a while now I have floated between states of high skepticism and existential dread for my future career prospects.
The AI tools I have used so far haven’t been huge game-changers for productivity, mostly just nice conveniences here and there. I have used the chat tools, which are helpful for quickly gaining a high-level understanding of topics I was less familiar with, but when it came to actual implementation work, I still found myself poring over other documentation to get the full picture. I have used GitHub Copilot for its code-completion features, but mostly only for the smaller, more repetitive tasks like writing utility functions and boilerplate, or scaffolding out basic unit tests. Nice when it works, but not a huge pain or time sink if I still have to do it myself.
One of the areas of AI tooling that I had yet to explore was the realm of Agentic AI, colloquially dubbed “Vibe Coding.” This is the type of AI where you’re supposed to be able to kick off your shoes, not worry about the pesky details of the code you’re actually trying to produce, and instead focus simply on making the thing you’re trying to make. This is the area I have found myself most skeptical of, especially given my relatively mediocre experiences with some of the other AI tools. It also happens to be the area most likely to be the dreaded career killer.
I recently attended a conference with my team where it felt like every other talk was singing the praises of these agentic workflows. Now, this was a conference put on by Google, who have probably more than a couple hundred billion reasons to try to sell me on this, so I didn’t quite take them at their word. It did pique my curiosity (and fear) a bit, though, and made me think there might be something I’m leaving on the table by being resistant to this technology. And was my skepticism worth much if I hadn’t even seen the capabilities of these tools first-hand?
So, given the seemingly inescapable nature of this technology, and not wanting to get left in the dust as the industry barrels towards something new, I figured it was time to finally see what all the hype was about, and whether or not it’s something I would actually want to start incorporating into my workflows.
I should note here as well that while this post will deal largely with the technical aspects of this technology, we should not discount the other externalities at play: the vast energy costs, the environmental impact of massive data centers, the unabashed copyright infringement and outright theft of data used for training, the questionable financial schemes being used for funding, the monopolistic tendencies of tech companies, and the potential economic fallout that would result from mass adoption of these technologies and the replacement of large swaths of the workforce. While it can be somewhat easy to ignore these issues for the sake of “shiny new toy” syndrome, it’s important to recognize that there are much higher costs at play than a $20/month subscription, and it would be irresponsible not to at least acknowledge them.
So Let’s Do Some Vibe-Coding
While my initial plan for this post was to walk through developing an app from scratch with these tools in detail, that quickly became a bit untenable and not really as interesting as I thought it might be. I ended up actually burning through quite a few different applications as I learned more about the tools and how to better use them. Instead, I figured I’d just give an overview of my experience and point out what seemed to go well, what the pain-points were, and what I learned along the way.
For my experimentation, I decided to try out two different models, so that I would have some type of comparison to evaluate against. I searched around a bit and ultimately decided to go with Gemini Pro and Claude Pro. Both had sleek CLI tools that seemed like they would fit in well with my existing development workflow, and both seemed to get some praise from vibe coding enthusiasts.
Claude Pro
In general, I found the quality of code written by Claude to be better than that of Gemini. Enough so that if I had to decide which of these to use in a professional setting, I would definitely pick Claude. While not free of hallucinations, it suffered from them less, made fewer mistakes, and did a better job correcting those mistakes when they were pointed out. It also did better at sticking to a consistent project architecture when asked to.
The rate-limiting on the Pro plan, however, seemed quite brutal compared to Gemini, something I quickly learned and had to deal with. The AI definitely performed better when given small, simple tasks to accomplish, but the more granular your requests, the quicker you eat away at your request limits. The Claude rate-limiting system seems to work in 5-hour windows. In my first session, I ended up burning through my limits within the first hour without realizing it. This quickly ended any productivity boost the tool would have given me, as I was left to either continue the project the normal way or wait another four hours for the window to reset and get back to work.
Would this be an issue at the Enterprise level? I don’t really know. From the information I was able to gather, it seemed like they didn’t offer unlimited plans at any level, but I suppose it would ultimately depend on whatever got negotiated in the contract.
There wasn’t much else I found particularly unique about Claude while working with it. The CLI tool was easy to use and interact with. It made its intentions clear and would walk me through its reasoning on the more complex tasks I asked of it. Overall, it did its job and wrote decent enough code.
Gemini Pro
In terms of code quality and overall project consistency, I definitely struggled more with Gemini. It had much more of a propensity to install new dependencies to complete its tasks rather than actually do the work itself. In that regard, I suppose it’s on par with your typical junior SWE. =P On average, it hallucinated more often, generated code that threw all sorts of errors, and struggled to go back and fix its mistakes.
On a few occasions, I would watch it loop through the same few lines of code several times, make a change, decide it wasn’t the right change, change it back, realize it was still broken, make the same change it originally made, and back-and-forth ad nauseam until I eventually stopped it. This is not exactly desirable behavior when you are working against rate limits.
At one point, I grew frustrated with a particular mistake it kept making. We were working with a random number generator that had a real() method. This method requires at least two arguments, for the minimum and maximum values. Gemini refused to recognize this, so I just went in and manually fixed the mistake to save on requests. On my next request to Gemini, for something that had nothing to do with that function, it still decided to go and “fix” those calls back to not providing the required arguments. I fixed it again manually. It “fixed” my fix again. The audacity. I ultimately had to waste a precious request just to tell it to stop changing those lines.
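To illustrate the shape of that mistake (the language here is Python and the Rng class is a stand-in I made up for this post, not the actual library’s API), the call Gemini kept breaking versus the correct one looked roughly like this:

```python
import random


class Rng:
    """Toy stand-in for the RNG class from my project; names are illustrative."""

    def real(self, minimum, maximum):
        # Both bounds are required. Calling real() with no arguments,
        # which is the "fix" Gemini kept re-applying, raises a TypeError.
        return random.uniform(minimum, maximum)


rng = Rng()

# Gemini's repeated "fix": rng.real()  ->  TypeError: missing arguments
# The correct call, with the required min/max bounds:
value = rng.real(0.0, 1.0)
print(0.0 <= value <= 1.0)  # True
```

The fix itself was trivial; the problem was that the agent kept reverting it unprompted, on requests that had nothing to do with this function.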
With regard to rate limits, Gemini felt more generous. I was able to get at least a good couple hours of work in before hitting my limit, and it seemed to use more of a rolling window than Claude did. I often only had to wait somewhere between 30 minutes and an hour before I could get back to work. That is, until I finally hit what seemed to be a daily maximum and had to come back the next day. I was still able to churn out quite a bit, to the tune of at least 6-7 hours, before hitting that final daily limit.
Overall, Gemini was a bit more frustrating to work with than Claude, but during the times I wasn’t fighting with it, it still managed to accomplish most things that I had asked of it. The CLI experience wasn’t all that different from Claude. It was easy to use, easy to understand what it was trying to do (even when it was wrong), and all around a decent experience.
The Good and Bad of Both
A bit to my surprise, the actual differences between the two tools weren’t as stark as I thought they might be going in. Perhaps that shouldn’t have surprised me, as by most accounts, the differences between the currently competing AI models are mostly minor and too esoteric to matter for actual day-to-day use. While I would choose Claude in a professional capacity if the decision were left up to me, I wouldn’t be too upset if I were stuck working with Gemini. Both were “good enough.”
But now let’s dive into some more details on the shared experience using both tools.
Rate Limits – Each tool handled rate-limiting a bit differently, as I covered in their respective sections. I can’t complain too much about rate-limiting as a whole. I wasn’t shelling out the cash for their highest tiers, and I would assume that in a professional setting this would be less of an issue. However, there was a common frustration with both tools: just how obtuse and difficult it is to see where your usage currently stands and how much of your limited resources any particular request consumes. Claude at least provides a little percentage status bar, though it’s unclear what contributes to filling it up. Gemini would only provide usage statistics if you hooked your project up to, and paid for, Google Cloud services. And even then, the stats it provided weren’t exactly helpful.
It seemed fairly obvious to me that if you want to use these tools effectively in a rate-limited environment, some up-front planning and clever decisions to maximize what you can get out of a single request are essential. Unfortunately, this is made all the harder when you have no clear indication of what exactly is consuming those resources. Does every step it decides to take count against you? Is it based on the number of tokens? Is it purely how many prompts I send? Who knows? Certainly not me, and certainly not their vague documentation, which seems to sum it up as: some combination of everything, but we won’t tell you exactly what. Trust us. You’re probably getting what you paid for.
The Debugging Experience – I found this to be perhaps the most frustrating aspect of using these tools. If you’ll allow me to toot my own horn a bit: if there’s one area of programming where I would put myself definitively in the “Better Than Most” category, it’s diagnosing and fixing issues in code. Most of that skill I would chalk up to an ability to gain a deep, intuitive understanding of a code-base, something I mostly credit to some of the more manual, and what some would call “outdated, archaic, and masochistic,” tools and methodologies I use. I tend to shy away from fancier IDE features, stick to the terminal where and when I can, and develop a very tactile, hands-on feel for the projects I work on. This culminates in a strong intuition: when something goes wrong, my spidey-senses immediately tingle and point me to where the issue is likely to be. That intuition is usually, but not always, correct, and while others might still be booting up their debuggers or searching through all possible files, I often already know where the problem is and how to fix it.
I found that tactile approach to the code-base severely lacking when working with these AI tools. As such, while I would review and approve every change made from the AI, I was not building up that same type of intuition that I otherwise would. I could recall seeing some chunk of code it spit out, but it would take me more time than usual to go find where it actually was, and I didn’t have the same type of hunch that it was actually causing the problem. In a way, when I had to dive into the actual code itself, I felt lost and blind. It didn’t feel like MY project that I was working on. It was more akin to digging through the source code of some external library or application I was trying to debug.
Of course, none of this might matter as much if the AI tools actually handled debugging better themselves. However, they far and away did not. Trying to use the AI for debugging was like pulling teeth. Even some relatively simple errors often took repeated prompts just to get them to recognize there was an issue, let alone fix it. Something actually complex? I hope you like having your entire application re-written several times in different ways while it tries things out. And of course, it will happily and very confidently be wrong over and over again, leaving you with the same issue you started with, now buried under 1,000 new and different lines of code to parse through before you can actually find it. Talk about frustrating.
Project Setup – I spent a few attempts seeing how the AI did setting up new projects completely from scratch. I would describe the project I was trying to build and let it pick the technologies that it thought were best for the job. I would definitely not recommend this approach. Oftentimes it would choose things that were either wholly inappropriate, outdated and deprecated, or otherwise riddled with issues. I did not get very far into any of these projects before they quickly hit abandoned status and I started up a new project myself.
After getting a project set up properly, though, and choosing for myself what foundations I wanted to work from, the AI actually did a fairly reasonable job sticking to those dependencies where appropriate. It would sometimes suggest new things to install, and when I politely declined, as I did most of the time, it didn’t seem to mind.
While this isn’t a huge issue for experienced developers to work around, I lament the projects of the wholly pure vibe coders out there, who probably have no idea what they’re getting or how to evaluate whether the AI is actually making appropriate suggestions here.
Utterly Terrible at UI/UX – While the AI was generally able to make the things I told it to make, they were ugly as sin and had the usability of Snapchat in the hands of an octogenarian Congress member. The best result I got was by installing a pre-built design system and component library and telling it to just use that. Even that, though, left much to be desired on this front. Things always seemed out of place. Inputs would produce weird side-effects. Color schemes would clash, and text would be unreadable, whether because it was all squished into a tiny column or because the AI decided that light gray on white was the right choice.
While I lay no personal claim to great mastery of CSS and design (despite what people generally assume about me when they learn that I primarily specialize in frontend development), I know and work with some people who are masters. They would run circles around the AI on this front. I can generally spend more time than I’m willing to admit fiddling with things and get a result that is decent enough, though. This often involves quite tediously changing properties and positioning one small increment at a time until it finally pops into something decent.
Trying to do this through the AI agent, though? With my precious limited request rates? I gave it a couple tries before ultimately deciding it wasn’t worth the resource costs and just accepted that my applications were going to be ugly, unintuitive, hot garbage. I wasn’t planning to have anybody else ever have to see or use these anyways, so it didn’t really matter too much.
It Just Wasn’t Fun – I’ll probably lose the MBAs with this point, but I do think there is a legitimate business case to be made with it. A lot of Software Engineers went into this career because we actually love writing code. I think you see a noticeable increase in the quality and stability of your product when it is built by a team of passionate engineers who enjoy what they do vs those who are simply in it to collect the paycheck.
Could you perhaps instead build your team around those who are just as passionate about, and get enjoyment out of, Vibe-Coding? You could… but from what I can tell, those are the people who never enjoyed writing the code to begin with. They’re going to be the ones most likely to press the “Accept” button without thinking things through, or to trust the AI to confidently make bad decisions. And they will be the least equipped to deal with any bigger issues the AI can’t solve for them.
Using these AI tools, I didn’t feel the same type of connection and passion for my work. I never got into the zone where suddenly I blink and it’s 3AM and I wonder where the time went. I never had to force myself to close my laptop, then spend the whole next day thinking about how I could solve a particular problem better, make the code run more efficiently, or make the project cleaner. When I was done working on the AI projects, I mostly just shrugged and said: “I guess that was fine,” then moved on to the next thing. Ironically, the “Vibe Coding” seemed utterly devoid of any “Vibes” to me.
The Out of the Box Experience – As I experimented with these tools, I mostly used their “out-of-the-box” features. From what I can tell, there is much more I could look into, with various other features on offer that I haven’t really touched yet. Some of these seem like they could alleviate some of my issues and frustrations. You can apparently get quite into the weeds writing all sorts of rules files to direct the AI, or use things like MCP servers to give the AI better context over larger projects and interactivity across domains. You can apparently build custom agents that specialize in certain types of tasks. (I wouldn’t mind, say, a unit-testing agent that just goes and writes all my unit tests for me.)
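For a taste of what those rules files look like: both CLIs read persistent instructions from a markdown file at the project root (CLAUDE.md for Claude Code, GEMINI.md for Gemini CLI). The contents below are a hypothetical sketch of the kinds of rules that might have saved me some frustration, not something I actually tested:

```markdown
# Project rules (hypothetical example)

- Ask before adding, removing, or upgrading any dependency.
- Do not modify calls to rng.real(); its min/max arguments are required.
- Keep new code consistent with the existing project architecture.
- Run the test suite after every change; fix failures before declaring a task done.
```

In theory, rules like these get injected into every session, so you stop burning rate-limited requests re-explaining the same constraints.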
All of these things seem kinda cool, but at the same time, they also feel like MORE of the type of work that I don’t want to be doing, just to allow me to do LESS of the type of work that I actually enjoy doing. It all seems… paradoxical.
Conclusion
I have by no means exhausted my experimentation with Agentic AI. There is still a lot I want to explore and learn about it, and I think it’s important for any other Software Engineer out there to do the same.
That said, I left this particular phase of the process feeling less worried that the end of my career was imminent. For these tools to actually produce code of any sort of value, they obviously still need somebody technically competent in the driver’s seat. Maybe that will change quicker than I think, but if I had to place a bet right now, I doubt we will see it anytime soon.
I do feel a bit for anybody new trying to enter the industry right now, though. The AI wasn’t any more burdensome to work with than your average fresh junior engineer. While I think it’s horrendously short-sighted for companies to wholly eliminate entry-level positions, it does seem to be happening. I’m hoping, though, that at some point we will reverse course and actually embrace these tools for what they are: Tools… not replacements.
For now, I haven’t yet identified a compelling use-case that would make me want to bring these tools into my normal workflow without being forced to. The quality just wasn’t there, and while I was able to shovel out quite a bit of slop in the process, it was just that… slop. Maybe that will change as I continue to learn and find better ways to use and integrate these tools. For now, though, I’ll stick to doing what I do best: Just writing the damn code myself.
Soucie AI License
This content is explicitly forbidden for use in AI training data or research. Violators implicitly agree to pay the author, Brandon Soucie, $1,000,000,000 in compensation. Go steal from somebody else.