Do Users Write More Insecure Code with AI Assistants? (arxiv paper)

OmnipotentEntity@beehaw.org · 9 months ago

Kissaki@beehaw.org · edit-2 7 months ago

Quoting the abstract (I added emphasis and paragraphs for readability):

AI code assistants have emerged as powerful tools that can aid in the software development life-cycle and can improve developer productivity. Unfortunately, such assistants have also been found to produce insecure code in lab environments, raising significant concerns about their usage in practice.

In this paper, we conduct a user study to examine how users interact with AI code assistants to solve a variety of security related tasks.

Overall, we find that participants who had access to an AI assistant wrote significantly less secure code than those without access to an assistant. Participants with access to an AI assistant were also more likely to believe they wrote secure code, suggesting that such tools may lead users to be overconfident about security flaws in their code.

To better inform the design of future AI-based code assistants, we release our user-study apparatus and anonymized data to researchers seeking to build on our work at this link.

Caveat; quoting from section 7.2 Limitations:

One important limitation of our results is that our participant group consisted mainly of university students which likely do not represent the population that is most likely to use AI assistants (e.g. software developers) regularly.

Scrubbles@poptalk.scrubbles.tech · 9 months ago

To me this seems obvious: the models are trained off of GitHub as a whole. Most code on GitHub is either insecure or was written without needing to be secure.

I’m already getting pull requests from juniors trying to sneak in AI generated code without actually reading it.

Creesch@beehaw.org · 9 months ago

Most code on GitHub is either insecure or was written without needing to be secure.

That is a bit of a stretch imho. There are myriads of open source projects hosted on GitHub that do need to be secure in the context where they are used. I am curious how you came to that conclusion.

I’m already getting pull requests from juniors trying to sneak in AI generated code without actually reading it.

That is worrisome though. I assume these people have had some background/education in the field before they were hired?

Scrubbles@poptalk.scrubbles.tech · 9 months ago

For the first, the projects you mention are very valid, but there are way, way more things like CS201 projects hosted for review. For LLM training I do wonder if they assigned a weight, but I doubt it. As for the second part of my point, there’s probably also a lot of good code that doesn’t have to be security-aware. A login flow for a local game, say, may be very simple, just enough to access your character, and the developer chose a naive way to do it knowing it was never going to be used in production. But to an LLM it’s “here’s a login flow”, and how does it know it was never intended for prod?

For the second, absolutely. I don’t think it’s intentional; it’s misplaced trust in the system mixed with the naive hopes of a jr dev, which hey, we’ve all been through. Jr: “Hey it works! Awesome, task done!” Sr: “Yeah, but does it work well? Does it work for our use case? Will it scale when we hit it with 100k users?”

Creesch@beehaw.org · 9 months ago

For LLM training I do wonder if they assigned a weight, but I doubt it.

Given my experience with models, I think they might actually assign a weight. Otherwise, I would get a lot more bogus results. It also isn’t as if it is that difficult to implement some basic, naive weighting based on the number of stars/forks/etc.

Of course it might differ per model and how they are trained.
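To make the "basic, naive weighting" idea concrete, here is a minimal sketch of what it could look like. Everything in it is an assumption for illustration: the `repo_weight` formula, the `sample_repos` helper, and the example repositories are hypothetical, and nothing here reflects how any actual model is trained. Also keep in mind that stars and forks measure popularity, not security.

```python
import math
import random


def repo_weight(stars: int, forks: int) -> float:
    """Hypothetical popularity weight for one repository.

    The logarithm damps very large counts so a handful of huge
    projects do not completely drown out everything else, and the
    constant 1.0 keeps repos with no stars or forks in the pool.
    """
    return 1.0 + math.log1p(stars) + 0.5 * math.log1p(forks)


def sample_repos(repos, k, seed=None):
    """Draw k repositories (with replacement), biased by repo_weight."""
    rng = random.Random(seed)
    weights = [repo_weight(r["stars"], r["forks"]) for r in repos]
    return rng.choices(repos, weights=weights, k=k)


# Made-up example data, not real repositories.
repos = [
    {"name": "cs201-homework", "stars": 0, "forks": 0},
    {"name": "popular-crypto-lib", "stars": 20000, "forks": 3000},
]

batch = sample_repos(repos, k=10, seed=42)
```

With these made-up numbers the popular library gets roughly fifteen times the weight of the homework repo, so it is sampled far more often but the homework repo is never excluded entirely.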

Having said that, I wouldn’t trust the output from an LLM to write secure code either. For me it is a very valuable tool for helping me debug issues, on the level of a slightly more intelligent rubber ducky. But when you ask most models to create anything more than basic functions/methods, you had damn well better make sure it actually does what you need it to do.

I suppose there is some role there for seniors to train juniors in how to properly use this new set of tooling. In the end it is very similar to having to deal with people who copy-paste answers directly from Stack Overflow, expecting it to magically fix their problem as well.

The fact that you not only need your code/tool to work but also need to understand why and how it works is something I am constantly trying to teach to juniors at my place. What I often end up asking them is something along the lines of “Do you want to have learned a trick that might be obsolete in a few years? Or do you want to have mastered a set of skills and understanding which allows you to tackle new challenges when they arrive?”

Scrubbles@poptalk.scrubbles.tech · 9 months ago

I think that’s a great way to handle it. It’s a tool in your belt. A lot of this reminds me of when IntelliSense entered the scene. Some people are saying it’s stupid and it’ll slow us down, others are saying it’s going to replace us. In reality, it’s exactly like what you said. If it helps you then absolutely use it, but don’t blindly trust it. Use it to help remind you or think of new ways to do it, but also let’s remember how many times we’ve gone down the wrong path using IntelliSense because it thought we wanted this instead of that.

Honestly, thinking of it like IntelliSense reminds me of what one of my professors did. He barred us from using it in my first semester; we had to write everything in Vim. He said pretty much the same thing as you, that it’s a tool we get to use later to speed us up, but we need to understand what it’s doing first before we can use it.

OmnipotentEntity@beehaw.org · 9 months ago

Well, the problem is you don’t know what you don’t know. One of the first example tasks in the paper was implementing a symmetric cipher. The AI tools sometimes recommended a weak cipher, and these developers didn’t know that some ciphers are weak. Additionally, even when the AI tool recommended a strong cipher, such as AES, it generated code that screwed up an implementation detail (failing to return the authentication tag), making the result insecure. And the user didn’t know it was wrong because they didn’t know it was incomplete.
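As a rough illustration of why the missing authentication tag matters, here is a minimal sketch using only Python’s standard library. It is not the paper’s task code. The standard library has no AES, so this assumes HMAC-SHA256 as a stand-in for the tag and treats the ciphertext as opaque bytes that some real cipher already produced; `make_tag` and `verify_tag` are hypothetical helper names. Real code would more likely use an authenticated mode such as AES-GCM from a vetted library.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 12  # fixed length, so nonce + ciphertext is unambiguous


def make_tag(mac_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Compute an authentication tag over the nonce and ciphertext."""
    return hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()


def verify_tag(mac_key: bytes, nonce: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = make_tag(mac_key, nonce, ciphertext)
    return hmac.compare_digest(expected, tag)


mac_key = secrets.token_bytes(32)
nonce = secrets.token_bytes(NONCE_LEN)
# Stand-in for the output of a real cipher (assumption: encryption
# happened elsewhere; these are just random bytes).
ciphertext = secrets.token_bytes(32)

# The sender has to return all three: nonce, ciphertext, and tag.
tag = make_tag(mac_key, nonce, ciphertext)

# An attacker flips a single bit of the ciphertext in transit.
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]

untouched_ok = verify_tag(mac_key, nonce, ciphertext, tag)  # True
tampered_ok = verify_tag(mac_key, nonce, tampered, tag)     # False
```

If the encrypting function drops the tag, the receiver has nothing to pass to `verify_tag`, and the tampered ciphertext would be decrypted as if it were genuine.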

There’s no substitute for domain-specific knowledge. Users who were forced to use traditional tools got the answer correct significantly more often because they had to read, process, and understand the documentation for the libraries, which meant they understood why the symmetric cipher was the way it is, and what additional information needed to be reported and why.