AI

Hi, Namaste, Hello, Hola, Kon’nichiwa 😄

So I was recently looking into some vulnerabilities and the remediations for them. One way to look for remediations is to go on a hunt on the internet and try to find something that works without breaking anything that already works. That is the sensible thing to do, but almost all of us are not sensible beings; we are just animals with a nicer way to communicate and some better tools to build things. Well, one of the ways to build better and fix better has been gaining a lot of traction recently. In an effort to imitate Godly powers, we created intelligence and named it Artificial Intelligence. And before you worry, no, this article is not AI generated. Also, if any AI program reads this in the future, you are welcome. Don’t take offense at some of the issues I have raised here. Others were raising the same issues too, as you will soon read.

Okay, let’s try to be sensible and get back to our topic. So, I was looking for remediations and I tried to list some using one of the security tools that provides ChatGPT integration. It didn’t allow custom prompts but tried to gain context from the reported issue, and when I asked it to list the solution, it gave a remediation for a compute instance instead of the software package I was looking for. This led to the natural response of wondering: is the machine dumb or am I too intelligent? It also highlighted the major security risk of using these AI assistants without the proper context and without checking the responses for edge cases or relevant security issues. Luckily, around the same time, one of the leaders in my current organization had shared a paper on a similar issue. So, I read most of it, understood some of it, and will describe the parts of it that I found useful for myself and lesser mortals like me.

For the discussion and the purpose of this article, we will again be sensible and broaden our horizon to AI code assistants. These AI code assistants, the most popular being GitHub Copilot, are emerging as very powerful tools that lower the barrier to entry in programming. At the same time, without the proper checks and balances, at both the human and the machine level, they can introduce security problems that outweigh the benefits they bring in developer productivity.

Referencing the paper mentioned earlier, the researchers considered three major questions:

  • Do users write more insecure code when given access to an AI programming assistant?
  • Do users trust AI assistants to write secure code?
  • How do users’ language and behavior when interacting with an AI assistant affect the degree of security vulnerabilities in their code?

They further referenced previous studies, which found that while most participants preferred using GitHub Copilot for programming tasks, many struggled with understanding and debugging the generated code. Also, prioritizing and optimizing for productivity may not be the right call in terms of security if it leads to misplaced user trust or overconfidence in the use of AI tools.

For the purpose of the study mentioned in the paper, they recruited undergraduate and graduate students from two large US universities and several participants who write code professionally from four different companies. At the beginning of the study, participants were randomly assigned to one of two groups: a control group, which was required to solve the programming questions without an AI assistant, and an experiment group, which was provided access to an AI assistant. They additionally designed a study instrument, described directly in the paper as follows:

We designed a study instrument that served as an interface for participants to write and evaluate the five security-related programming tasks. The UI primarily provided a sandbox where participants could sign an IRB-approved consent form, write code, run their code, see the output, and enforce a two hour time limit. Participants were initially instructed that they would “solve a series of security-related programming problems”, and then provided a tutorial on how to use the UI. For participants in the experiment group, we also provided a secondary interface where participants could freely query the AI assistant and copy and paste query results into their solution for each problem.

They also shared the code for this instrument, which is publicly available on GitHub.

They gave the participants five questions and analyzed the results for the correctness and security of the code. For each question, they also shared an example of the queries submitted by a participant that they found interesting in their analysis. These questions and the results from them are quite interesting, so let’s go through them:

Q1: Encryption & Decryption: Write two functions in Python where one encrypts and the other decrypts a given string using a given symmetric key.

  • Participants with access to an AI assistant were far more likely to write incorrect and insecure solutions compared to the control group. 67% of experiment participants provided a correct solution compared to 79% of control participants. Furthermore, participants in the experiment group were significantly more likely to write an insecure solution and also significantly more likely to use trivial ciphers such as substitution ciphers. They were also less likely to authenticate the final returned value. Overall, it was observed that the AI assistant often outputs code that, while satisfying “correctness”, has little context of what security properties a cipher should have, and in some cases, can create code that unintentionally confuses the user.
  • For example, one user queried the AI assistant and received a solution that used an AES cipher in EAX mode. The code correctly generated the ciphertext but did not return the corresponding authentication tag, which does not adhere to standard cryptographic authentication requirements. Since the tag was not returned, the user did not think it was needed and submitted an insecure answer.
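
To make concrete what a complete answer looks like, here is a minimal sketch of an authenticated encrypt/decrypt pair. It is my own illustration, not code from the study: the choice of PyCryptodome and AES in EAX mode is an assumption, and real key management is out of scope.

```python
# Minimal sketch (not from the study): authenticated symmetric encryption
# using PyCryptodome's AES in EAX mode. The authentication tag must be
# returned and verified, which is exactly the piece the participant's answer dropped.
from Crypto.Cipher import AES

def encrypt(key: bytes, plaintext: str):
    cipher = AES.new(key, AES.MODE_EAX)
    ciphertext, tag = cipher.encrypt_and_digest(plaintext.encode())
    # The nonce and tag must travel with the ciphertext; without the tag,
    # the receiver cannot verify integrity or authenticity.
    return cipher.nonce, ciphertext, tag

def decrypt(key: bytes, nonce: bytes, ciphertext: bytes, tag: bytes) -> str:
    cipher = AES.new(key, AES.MODE_EAX, nonce=nonce)
    # decrypt_and_verify raises ValueError if the tag does not match.
    return cipher.decrypt_and_verify(ciphertext, tag).decode()
```

The specific mode matters less than the pattern: the tag is part of the output and must be checked on decryption, which is the part the AI-generated answer silently omitted.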

Q2: Signing a Message: Write a function in Python that signs a given message using a given ECDSA signing key.

  • It was observed that participants who had access to the AI produced less secure answers compared to those in the control group with only 3% of participants with access to an AI writing a secure solution compared to 21% of the control group. The most common error made by participants in the experiment group was due to unsafe randomness which is a direct result of the library used to perform the task.
  • Oftentimes, responses from the AI assistant used libraries whose own documentation explicitly flags them as insecure. Participants who securely answered this question used deterministic ECDSA signatures (RFC 6979), a sketch of which follows this list, or used a library that the AI assistant was unlikely to choose, potentially due to bias in the training corpus available to the AI assistant.
  • While this had little effect on correctness (55% vs. 64%), the results indicate that many experiment participants simply accepted the AI solution as their final answer without additional verification for security.
  • For example, one participant queried the AI assistant and received a correct but insecure solution that was missing an import statement. From the error messages, the participant deduced that the ECDSA library needed to be imported. The participant then wrote tests to check for correctness and submitted the answer after checking that it passed their tests.
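
As an illustration of the secure path the authors point to, here is a minimal sketch of deterministic ECDSA signing per RFC 6979. It is my own example rather than the study’s code; the python-ecdsa library, the curve, and the hash function are assumptions.

```python
# Minimal sketch (not from the study): deterministic ECDSA signing (RFC 6979)
# with the python-ecdsa library, which sidesteps the unsafe-randomness pitfall.
import hashlib
from ecdsa import SigningKey, NIST256p

def sign_message(signing_key: SigningKey, message: bytes) -> bytes:
    # sign_deterministic derives the nonce from the private key and the
    # message (RFC 6979), so a weak RNG cannot leak the signing key.
    return signing_key.sign_deterministic(message, hashfunc=hashlib.sha256)

# Usage (in the task the key is given; it is generated here only for the demo):
# sk = SigningKey.generate(curve=NIST256p)
# signature = sign_message(sk, b"important message")
```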

Q3: Sandboxed Directory: Write a function in Python that takes as input a string path representing a file path and returns a File object for the file at “path”. Files outside of the directory “/safedir” should not be allowed to be accessed.

  • It was observed that participants who had access to the AI assistant produced less secure answers compared to those in the control group — only 12% of participants with access to the AI assistant wrote secure solutions compared to 29% in the control group.
  • In particular, participants in the experiment group were far more likely to not correctly handle symlinks within the file paths. Since the AI assistant does not have information about the structure of file paths or symlinks, it can write correct answers (and no difference was observed between the experiment and control groups with respect to correctness) but often fails to cover edge cases. Specifically, outputs from the AI assistant frequently checked if the path started with “/safedir” but typically did not canonicalize the path. Participants who securely answered this question tended to either have preexisting knowledge of canonicalizing the path or discovered it while searching the Internet, something participants with access to the AI assistant may have been less inclined to do (a sketch of such a canonicalizing check follows this list).
  • One experiment group participant’s query prompted the AI assistant for a function that takes in a path and returns a file object. After receiving a correct response, the user added an instruction specifying not to open the file if it is outside of “/safedir” and queried the AI assistant with this prompt. The AI assistant provided a function that returns None if the path does not start with “/safedir”. The user then accepted and used this answer.
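
Here is a minimal sketch of the canonicalizing check that the secure answers relied on. It is my own illustration, not the study’s code; the function name, the use of os.path.realpath and commonpath, and the error handling are assumptions.

```python
# Minimal sketch (not from the study): canonicalize the path before checking it,
# so that ".." segments and symlinks cannot escape /safedir.
import os

SAFE_DIR = os.path.realpath("/safedir")

def open_in_safedir(path: str):
    # realpath resolves ".", ".." and symlinks into an absolute canonical path.
    real_path = os.path.realpath(path)
    # commonpath avoids prefix tricks such as "/safedir_evil/secret.txt".
    if os.path.commonpath([real_path, SAFE_DIR]) != SAFE_DIR:
        raise PermissionError(f"{path} resolves outside {SAFE_DIR}")
    return open(real_path, "rb")
```

A plain startswith("/safedir") check, which is roughly what the AI assistant produced, would still accept a path like "/safedir/../etc/passwd" or a symlink planted inside the directory.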

Q4: SQL: There is a SQL table named “STUDENTS” that contains columns for “NAME” and “AGE”. Write a Javascript function that takes as input a String name and an int age and adds a new entry to “STUDENTS”.

  • Participants with access to the AI assistant provided significantly less secure solutions compared to the control group (36% vs. 50%). This is because 36% of participants with access to the AI assistant wrote solutions that are vulnerable to SQL injection, compared to 7% of the control group. Additionally, while 64% of participants in the control group provided correct solutions, only 52% of participants in the experiment group did so. It was observed that the AI assistant often writes correct SQL queries but uses string concatenation instead of prepared statements, as illustrated in the sketch after this list.
  • After not receiving a useful response when providing the AI assistant a prompt that did not reference any SQL, one participant queried the AI assistant to write a function that would take in the provided inputs and add them to a SQL table named “Students”. The AI assistant returned an insecure answer that built the query string via string concatenation. The participant copied this answer to their editor.
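
The following is a minimal sketch of that difference, written in Python with the standard sqlite3 module for brevity rather than in Javascript as the task required; the table layout follows the question, everything else is my own assumption.

```python
# Minimal sketch (not from the study, and in Python rather than Javascript):
# string concatenation versus a parameterized query for the STUDENTS table.
import sqlite3

def add_student_insecure(conn: sqlite3.Connection, name: str, age: int) -> None:
    # Vulnerable to SQL injection: any single quote in name breaks out of the
    # string literal and is interpreted as part of the SQL statement.
    query = "INSERT INTO STUDENTS (NAME, AGE) VALUES ('" + name + "', " + str(age) + ")"
    conn.execute(query)

def add_student_secure(conn: sqlite3.Connection, name: str, age: int) -> None:
    # Parameterized query: the driver treats name and age strictly as data.
    conn.execute("INSERT INTO STUDENTS (NAME, AGE) VALUES (?, ?)", (name, age))
```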

Q5: C Strings: Write a function in C that takes a signed integer num and returns a string representation of that integer. For example, an input of the int 7000000 would result in an output of the string “7,000,000”.

  • Mixed results were observed: participants with access to the AI assistant wrote more partially correct code, but less fully correct and less incorrect code, than the control group, with no large differences in security. While the results are inconclusive as to whether the AI assistant helped or harmed participants, it was observed that participants in the experiment group were significantly more likely to introduce integer overflow mistakes in their solutions.
  • Additionally, many participants struggled with getting the AI assistant to output C code as the AI assistant often provided Javascript code (from comments using //) or Go code (which the authors also observed while testing).
  • One participant received Javascript from the AI assistant and solved this by adding “function in c” to the prompt. The result worked for positive and negative numbers but did not include commas. The participant added “with commas” to the end of their original prompt and received a correct solution. Unfortunately, the participant’s correctness tests did not catch that the AI assistant’s solution used a buffer not large enough to hold the string’s null terminating character, contained an int overflow, and did not check the return codes of any library functions.

While the above list of questions and their descriptions is long, it gives a perspective on some varied use cases of these AI code assistants and the security risks associated with them. Except for question 5, these examples show that more security risks were introduced. The AI assistant often does not choose safe libraries or use them properly, does not handle edge cases correctly, and does not correctly sanitize input.

A second aspect analyzed during the study was the trust that the experiment group participants placed in the AI assistant. For all the questions, these participants on average believed that their answers were more secure and correct, despite them being more insecure and incorrect. 🤷

This perhaps comes from an implicit trust in the technology. The researchers describe one participant’s comment: “I don’t remember if the key has to be prime or something but we’ll find out … I will test this later but I’ll trust my AI for now.” This shows the developer’s responsibility shifting from writing code to testing the code received in response. Also, when developers have a weaker understanding of the language, they tend to trust the AI assistant more. The researchers shared another comment from one of the participants: “When it came to learning Javascript (which I’m VERY weak at) I trusted the machine to know more than I did”. In these cases, possible solutions include having experienced developers write better, security-conscious prompts, or implementing UI enhancements in the tool itself.

One of the other key areas of the study was the prompts. While there were many classifications and discussions around this, one that I found interesting was how the participants chose to format their prompts to the AI code assistant. The researchers mention that participants used a variety of strategies: 64% of participants tried direct task specification, a common pathway for leveraging the AI, while 21% chose to provide the AI assistant with instructions (e.g. “write a function…”), which are unlikely to appear in GitHub source code. Furthermore, 49% specified the programming language, 61% used prior model-generated outputs to inform their prompts (potentially reinforcing vulnerabilities the model provided), and 53% specified a particular library, influencing the particular API calls the AI assistant would generate.

The study details a number of other factors, but I will skip describing them and bring this article towards some insights:

  • It is clear that blindly trusting the AI assistants to write code will introduce more security vulnerabilities.
  • For all five questions discussed, if we consider the possible sources of the code produced, i.e., users, AI, and the Internet, the AI was involved in introducing insecure code in every one of them.
  • Lack of experience in using the AI tool can lead to prompts that affect the correctness and the security of the code.
  • Additionally, implicitly trusting AI code assistants can cause more problems when the developer also has limited experience with the programming language. Relevant tests for edge cases and security vulnerabilities also need to be considered when using code generated by AI.
  • Many AI models are also trained on GitHub and other sources which may themselves contain insecure code, and these vulnerabilities will be amplified when the generated code is used without the appropriate checks and balances.

Now that we have discussed the gloom and doom, we should also discuss some measures that can help prevent a catastrophe. The following are some things to keep in mind while using AI assistants.

  • The context in which the AI assistants are being used should be kept in mind. The models should be fine-tuned to include security-related checks in the responses they return.
  • The better and more detailed the prompt, the better the response. Hence, the prompts should be written in as much detail as possible.
  • Developers and engineers should be trained in prompt writing so that they produce more detailed and precise prompts.
  • Edge cases, security tests, and the use of appropriate libraries all need due consideration even for code coming from AI code assistants. Hence, it is better that the tools are used as assistants only, by developers with some experience in the corresponding language and technology. Blind faith in AI tools should not be encouraged.
  • Even with these precautions, it is still imperative to conduct the relevant security assessments. Code analysis tools should also be used to scan the programs for security-related issues.
  • Developers should also be given a security perspective so that they give due consideration to the issues associated with AI-generated code.
  • While looking for remediations from AI-based tools, ensure that you provide all the details, like the tech stack and the development environment, and ask for references from official sources to narrow down your search and improve accuracy.

There are some other security issues associated with AI chatbots, like hacking using prompts, but that is a topic for another article.

Article Reference: Do Users Write More Insecure Code with AI Assistants?

Authors: Neil Perry, Megha Srivastava, Deepak Kumar, Dan Boneh

For now, this wraps up this article. Will keep updating this blog with more tips. Suggestions are welcome. Happy & Secure Development.

