I’ve been on the hype train before. (Flashback to the early days of blockchain valuations.)

I’ve been a (mostly) quiet skeptic about the use of AI – specifically LLMs – for code generation. Anyone who has been writing software for a while will tell you that copying someone else’s code is usually a recipe for disaster. Blind copy and paste from StackOverflow is so prevalent and terrible that we create memes about it and mock those who engage in such practices.

So, with that as context, how can an LLM – whose function is to statistically predict the next word or phrase – be trusted to write code on my behalf? The reality, of course, is that we’re not just talking about the LLM itself – we’re talking about multiple transformer models, prompt chaining, RAG, and a number of other techniques delivered transparently through a chat window. It looks like magic…

The Platforms

I decided to ride that hype train a little bit and find out for myself. Below are my findings, specifically with respect to software engineering – not “make this email sound smarter” kinds of requests…

The Chat Interfaces

Like most, I started experimenting with the easy path: ChatGPT, Claude, Gemini, etc… These occasionally produced great code examples, but they often lacked the context of my larger codebase, its patterns, and its styles. The worst part, though, was that they rarely had up-to-date API information or knowledge of newer libraries. Retrieval-Augmented Generation (RAG) helped a little, but it was still limited. The bottom line for me: a really great alternative to reading long documents and code stubs, but it fell short of reliably generating usable code for any modern codebase.

Code Completion in the IDE 

Next up were implementations of those chat interfaces inside an IDE. The most notable of these was GitHub Copilot. This was a bit of a game changer for me in a couple of ways. First, code completion takes the current file (and sometimes other files) into account with its recommendations. I found that GitHub Copilot was really great at generating the next steps of my code and was (shockingly) usually right! The scope was still limited to single files and inline code completions, though, meaning that more complex or in-depth generations were just not possible. However, I’ll say this: typing “if” and hitting “tab” to generate a complex 15-line conditional is pretty nice…
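To make that concrete, here’s a contrived sketch of the kind of completion I mean. This is a hand-written guess, not actual Copilot output: you type the docblock, the signature, and “if”, and a suggested body like this pops up for a single Tab press.

```php
<?php

/**
 * Decide whether a user may edit a document.
 * (Illustrative only: a hand-written guess at a Copilot-style
 * completion, not actual Copilot output.)
 */
function canEdit(array $user, array $document): bool
{
    // Everything from "if" down is the sort of body one Tab press fills in.
    if (($user['role'] ?? null) === 'admin') {
        return true;
    }
    if (($document['locked'] ?? false) === true) {
        return false;
    }
    if (isset($user['id'], $document['owner_id']) && $user['id'] === $document['owner_id']) {
        return true;
    }
    return in_array($user['id'] ?? null, $document['editors'] ?? [], true);
}
```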

Embedded AI – Beyond Simple Code Completion

[Screenshot: Cursor Composer example]

I decided to check out Cursor against my OpenGRC codebase (to start) to see what I could do. Code completion and in-IDE chat are still available; however, Cursor goes a LOT further. Cursor has a “Composer” window that allows you to drag other files into the context of your code generation to improve the quality and relevance of the generated code. Cursor also puts together more of a complete solution in the form of multiple files that can be reviewed, accepted, or rejected. For example, a Cursor prompt of something like “Create new functionality called Risk using @Standard.php as a model. Create CRUD controllers, migrations, seeds, and configurations to support it” would generate 4-6 files for you to review, test, and accept. (Note: This works… shockingly well.)
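For a sense of what those generated files look like, here’s a hedged sketch of the kind of migration a prompt like that produces. The column names are my own illustrative guesses; the real output depends on what Composer finds in @Standard.php.

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

// A sketch of one of the 4-6 generated files: the migration for the new
// "Risk" functionality. The columns are assumptions, not actual output.
return new class extends Migration
{
    public function up(): void
    {
        Schema::create('risks', function (Blueprint $table) {
            $table->id();
            $table->string('name');
            $table->text('description')->nullable();
            $table->string('status')->default('draft');
            $table->timestamps();
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('risks');
    }
};
```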

Cursor is not the only player here. Devin represents another huge advancement in this space. Devin’s workflow is a lot different from what I’m used to, and I’m not a huge fan of that, but you can’t deny the results. In short, rather than prompting a chat window in an IDE, you prompt Devin in Slack with a descriptive software change. In response, Devin will check out the code, make modifications, and test in a virtual environment; when it’s done, it can create a Pull Request in your repository for review. This makes the normal iterative chat-response methods more complicated, but the results are similar to those of a junior/mid-level developer – provided that your software spec input is sufficient. There are still errors, but that’s not new for any kind of software development.

My Big Test

One of the big items on my backlog was adding SSO to OpenGRC. To set the stage, OpenGRC currently uses Laravel 11, Filament 3, Tailwind CSS, and alllll the dependencies involved in each of these frameworks. In almost every case, the current versions of these frameworks/libraries have not yet made it into the training data of the mainstream LLMs. Within OpenGRC, I also have some non-standard implementations in place for usability. For example, I don’t want a user to edit a .env file on the server to change or add an SSO provider – I want to use my configuration screens for that (which are designed using Filament tab widgets).
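To give a rough idea of what those configuration screens involve, here’s a minimal sketch of a tabbed Filament v3 form schema. The field names are illustrative assumptions, not OpenGRC’s actual settings schema.

```php
<?php

use Filament\Forms\Components\Tabs;
use Filament\Forms\Components\TextInput;

// A minimal sketch of a tabbed settings form in Filament v3.
// Field names are illustrative assumptions, not OpenGRC's schema.
$schema = [
    Tabs::make('Settings')->tabs([
        Tabs\Tab::make('General')->schema([
            TextInput::make('site_name'),
        ]),
        Tabs\Tab::make('Authentication')->schema([
            TextInput::make('azure_client_id')->label('Azure Client ID'),
            TextInput::make('azure_client_secret')->password(),
            TextInput::make('azure_tenant_id')->label('Azure Tenant ID'),
        ]),
    ]),
];
```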

Cut to the chase: I used simple composer install commands to install Laravel Socialite and the provider for AzureAD. Using Cursor Composer, I prompted something similar to “Create an SSO implementation using ‘socialiteproviders/microsoft-azure’ with all settings created by the end user in @settings.php in a new tab called Authentication”. Not only did it work, but it worked perfectly! Afterwards, I made some UI enhancements to match the feel I was going for. Adding new providers was literally as easy as prompting “add a new provider for Okta using the AzureAD implementation as a model”. This is something that would have taken me a couple of hours to fully implement – Cursor did it in a couple of minutes!
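For the curious, the generated code followed roughly this pattern (my naming, not the actual OpenGRC source): pull the credentials out of the user-editable settings at request time, push them into Socialite’s config, and redirect. The `setting()` helper below is a stand-in for however your settings screens persist values, and `route('sso.callback')` is an assumed route name; SocialiteProviders also requires registering its event listener, which I’ve omitted here.

```php
<?php

namespace App\Http\Controllers;

use Illuminate\Http\RedirectResponse;
use Laravel\Socialite\Facades\Socialite;

// Prerequisites:
//   composer require laravel/socialite socialiteproviders/microsoft-azure
// A rough sketch with assumed names; setting() stands in for database-backed
// settings entered on the Authentication tab, so nothing is read from .env.
class SsoController extends Controller
{
    public function redirect(): RedirectResponse
    {
        // Inject the admin-entered credentials into Socialite's config
        // at request time instead of relying on .env values.
        config([
            'services.azure.client_id'     => setting('auth.azure_client_id'),
            'services.azure.client_secret' => setting('auth.azure_client_secret'),
            'services.azure.tenant'        => setting('auth.azure_tenant_id'),
            'services.azure.redirect'      => route('sso.callback'),
        ]);

        return Socialite::driver('azure')->redirect();
    }

    public function callback(): RedirectResponse
    {
        $azureUser = Socialite::driver('azure')->user();

        // ...local user lookup / login elided...
        return redirect()->intended('/');
    }
}
```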

The Catches

  1. Generated code is still your work. Regardless of your prompts, you still need to review all of your code. While that may seem obvious, it’s very easy to fall into the trap of just accepting code changes without a good review – or so I’ve heard… ahem… In a couple of cases, I had Cursor implement breaking changes that I should have caught. This is also a strong case for looking at your CI/CD for linters, style formatters, etc…
  2. The textbook isn’t always right. As software engineers, we sometimes decide NOT to use certain patterns because of the complexity they introduce. Even though the textbook says to use the XYZ pattern, we choose not to for various reasons. Left to their own devices, AI tools will typically choose the textbook pattern, which may not be what you want. Prompting based on an existing pattern in your code is really helpful!
  3. Working doesn’t mean right. Sometimes the AI platform will create working code that simply isn’t implemented correctly and can introduce bigger problems. For example, in one of my Cursor tests, I asked Composer to allow me to upload private files. It did a fantastic job, but unfortunately put those files in a public folder (and did it the wrong way, at that). In production, that could be catastrophic. (See the sketch after this list.)
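Here’s the sketch I promised: roughly what the generated upload code did versus what it should have done, using stock Laravel storage APIs. The route paths and middleware are my own illustrative choices, and real code would need proper authorization and path checks.

```php
<?php
// routes/web.php (sketch). The "public" disk is web-served once
// `php artisan storage:link` runs, which is exactly where private uploads
// must NOT go; the "local" disk lives under storage/app and is not
// web-accessible.

use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;
use Illuminate\Support\Facades\Storage;

Route::post('/files', function (Request $request) {
    // Wrong (roughly what the generated code did): world-readable.
    // $path = $request->file('upload')->store('uploads', 'public');

    // Right: keep the file on the private disk.
    $path = $request->file('upload')->store('uploads', 'local');

    return response()->json(['path' => $path]);
})->middleware('auth');

Route::get('/files/{path}', function (string $path) {
    // Gate downloads through an authenticated route instead of a public URL.
    // (Per-user authorization and path validation elided in this sketch.)
    return Storage::disk('local')->download($path);
})->where('path', '.*')->middleware('auth');
```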

You Are Here…

I’m a skeptic of all hype. It’s who I am! When it sounds too good to be true, it usually is! In the case of AI code generation, though, I think much of the hype is well-warranted. I have LOTS of security and ethical questions around all of this, but you cannot – and WILL NOT be able to – avoid the use of AI in code generation for much longer.

We are beyond “cautiously optimistic” at this point. It’s time to determine how this all fits into your software engineering processes.
