The Opportunities and Risks of Creating Code with AI
This week, Katherine Forrest and Anna Gressel look at some of the benefits, potential risks and safeguards that companies may want to consider when using AI to generate code.
Katherine Forrest: Hey, good morning, and welcome to another episode of “Waking Up With AI,” a Paul, Weiss podcast. I’m Katherine Forrest.
Anna Gressel: And I’m Anna Gressel.
Katherine Forrest: And before we get started, I just wanted to ask you how your Thanksgiving was. Even though this episode will air when we’re even closer to the winter holidays, as we record this, you and I are just coming off our Thanksgiving break.
Anna Gressel: It was lovely. I mean, I got to stay in New York. It was very cozy and rainy outside, and my dog ate far too much, as we all did actually.
Katherine Forrest: Your dog ate far too much?
Anna Gressel: We spoiled her.
Katherine Forrest: Well, we do a traditional Thanksgiving with lots of people in the kitchen, and we do it out in upstate New York where we have turkey and sweet potatoes and rolls. But there’s always — this is the thing I love — there’s always a roll emergency. And the rolls don’t rise right. And it always requires a roll emergency to be declared because nothing can happen without these rolls. And then afterwards we play a lot of games, a lot of family games, and it’s a lot of fun.
Anna Gressel: That sounds super fun.
Katherine Forrest: So now we’re going to leave behind the more mundane things, like lovely old-fashioned Thanksgiving traditions, and turn to something happening in the high-tech world, as usual: using AI to create code.
Anna Gressel: Yeah, and we’ll focus on how companies can mitigate risks associated with AI coding tools.
Katherine Forrest: It’s a funny thing with coding and technological advances in the digital area. Until the past couple of years, I had always assumed that coding jobs, held by computer programmers as we used to call them, were among the most secure jobs, and that in times of rapid tech change, those were the folks who were always going to have work. But these days, this is one of the areas seeing tremendous transformation.
Anna Gressel: Yeah, I mean, there are millions of coding jobs out there, but as it turns out, AI tools are particularly good at coding and are taking over some of the low-hanging-fruit coding tasks, and now some of the more difficult ones too.
Katherine Forrest: And one of the use cases we hear about from our clients all the time, and where there’s really been a transformative impact, although at varying levels of sophistication, is the use of AI to create code, whether with a chatbot or with far more sophisticated tools. So I’m not sure whether AI coding is a net job eliminator, but it’s certainly an area, as I’ve said, that is seeing a lot of transformative change.
Anna Gressel: Yeah, and that’s true in terms of the number of tools out there as well. We started with a few, but now it really feels like AI coding is becoming ubiquitous, kind of across providers, really.
Katherine Forrest: Right, and some of the big developers are making more bespoke and specialized coding tools. That’s in addition to the many businesses building their own homegrown coding tools on GitHub and other platforms.
Anna Gressel: So there are some positives, as well as some risks, with using AI to generate code. On the positive side of the ledger, which is a huge set of positives for many companies, this array of coding tools can allow generation of code to occur much, much, much faster than it used to.
And from the design phase to coding and testing and debugging, we’re really seeing coding tools kind of come to the fore there. This is, you know, in addition to writing code from scratch, for example. So we’re seeing other coding tools that are able to review existing code and offer improvements or identify faulty code or vulnerabilities. So it’s a bunch of different capabilities.
Katherine Forrest: Well that all sounds like a huge set of positives.
Anna Gressel: Yeah, I mean, it’s basically for those reasons that 60% of all new code is now generated by AI. But there are risks in this area that are important to flag.
Katherine Forrest: Yeah, and the code that’s generated from AI has two important caveats that companies allowing it to be generated from some of these tools need to be aware of. And we’re going to talk about more of them in the episode. But one is that the sophistication of the code generated by the AI tool can outstrip the ability of some developers working on the coding project to actually understand the code that’s being created. The code can get really complex, really fast, and outstrip certain coders’ abilities. Of course, we’ve got coders at all different levels, and so I don’t want to make any across-the-board statement, but if you’ve got a brand new coder, and then you’ve got a coding tool that’s making incredibly sophisticated code, you can have a mismatch there. So that’s one issue.
And then the second is that with this kind of code, as with all code, there’s some possibility that you introduce a vulnerability into the software, and a company really needs to be cognizant of that as well.
Anna Gressel: So yeah, we’ll focus on those two points in a little bit more detail. But for now, let’s turn to what some people in the industry are saying about AI code generation.
Katherine Forrest: Yeah, one IBM study found that after deploying AI coding tools, one team saw a 90% time savings on code explanation.
Anna Gressel: Yeah, let’s pause on that, because I’m not sure everyone knows what code explanation means. It refers to literally creating an explanation of what a particular piece of code is supposed to do, its functionality. So it involves taking each part of the code and describing it in layperson’s terms. That might include things like the purpose of the code, its structure, its logic or how it’s set up to do what it does. That’s code explanation.
Katherine Forrest: Right, and a 90% savings of time on that code explanation is a lot of time savings.
Anna Gressel: For sure, and that same IBM study indicated that there was an average of 59% time savings on code documentation.
Katherine Forrest: And now I’ll explain a little bit about what code documentation is. It’s the written text or the comments included alongside a program’s source code. The source code itself is not always easy for a human to understand, but next to it you have short explanations in natural language that describe the functionality, the usage and the structure of that piece of code. Good documentation allows others who look at the code later to understand it better, so it can enhance collaboration and maintainability, among other things.
Anna Gressel: And there are some findings from GitHub’s Copilot research that coding tools lead to 55% faster coding velocity.
Katherine Forrest: I always love these phrases, “faster coding velocity.” What that really means is the speed of writing new code: you can write new code very, very quickly.
Anna Gressel: Yeah, for sure. And this GitHub Copilot survey found that 97% of its respondents reported having used AI coding tools at work, and 88% said their companies support the use of AI in the coding process.
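To make the idea concrete, here is a small, hypothetical Python function showing what documentation looks like in practice: a docstring describing the purpose, inputs and output, plus inline comments explaining each step. The function name, the fee rule and the default values are invented purely for illustration.

```python
def late_fee(days_overdue: int, daily_rate: float = 1.50, cap: float = 25.00) -> float:
    """Return the late fee owed on an overdue item.

    Purpose: charge a flat rate for each day past the due date,
    but never more than a fixed maximum.

    Args:
        days_overdue: Whole days past the due date; zero or negative means on time.
        daily_rate: Fee charged per overdue day.
        cap: Maximum total fee.

    Returns:
        The fee in dollars, between 0 and ``cap``.
    """
    # Items returned on time (or early) owe nothing.
    if days_overdue <= 0:
        return 0.0
    # Otherwise charge per day, stopping at the cap.
    return min(days_overdue * daily_rate, cap)


print(late_fee(3))   # 4.5
print(late_fee(30))  # 25.0
```

The code alone says *what* is computed; the docstring and comments say *why* and *how to use it*, which is the part an AI tool is reported to speed up.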
Katherine Forrest: Yes, so now we know that the coding can be done with these tools. It can be very sophisticated; it can save a lot of time. But what are those risks? Let’s get back to those risks that we just touched on at the very beginning.
Anna Gressel: The first one is that if the code outstrips the humans, it can result in unreliable code, or code that’s ill-suited to its purpose, and that code might not be discovered right away. So you can have a time gain when the code is generated, but a time loss down the road, because it takes time to fix, if it doesn’t cause even bigger problems than that. There are also security and privacy issues you really have to stay on top of when you’re using an AI tool to generate code. For instance, you don’t want the newly generated code to open a door to the outside and create a security vulnerability.
Katherine Forrest: Right, and there have even been efforts to feed malicious training data into a model, data that’s meant to cause the model to generate vulnerable code. In other words, training the model to generate vulnerable code. That’s something folks have spotted as one of the potential downsides.
Anna Gressel: Yeah, and another type of AI cyber risk is downstream risk. For example, an AI code-generating tool may mistakenly presume that just because certain code appears frequently in its training data, it can be treated as trusted code to output in response to prompts.
Katherine Forrest: And then there’s, of course, the risk of something called model collapse, meaning that the AI coding tool can favor more generic, one-size-fits-all code over less common code that would be better tailored to, and more effective for, a particular task or solution.
Anna Gressel: Yep, and lastly, as with a lot of AI-generated material these days, there are some IP concerns. How do you know that what you created is original? Could it be memorized from code that was part of a training repository, for example copyrighted code or code that’s subject to open source licenses? That’s a hard one to figure out unless you have some sort of complex matching test, and some companies offer that as a software service. But even that’s hard, because there are many ways to write code that does the same thing.
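As one generic illustration of how a small piece of code can “open a door,” consider SQL injection. The sketch below is hypothetical and not drawn from any particular AI tool: it uses Python’s built-in `sqlite3` module with an invented `users` table, and contrasts a query built by pasting user input into the SQL text with a parameterized query that treats input strictly as data.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory demo database with two users (illustrative data only).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "a-secret"), ("bob", "b-secret")])
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: user input becomes part of the SQL statement itself.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # SAFER: the ? placeholder makes sqlite3 bind input as a value, never as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
attack = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, attack)))  # 2: every row leaks
print(len(find_user_safe(conn, attack)))    # 0: no user has that literal name
```

Both functions “work” on ordinary input, which is exactly why this kind of flaw can slip past a reviewer who only checks that generated code runs.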
Katherine Forrest: And I guess the question for companies is this: there are so many benefits from AI tools that can generate code, save time and do these very complex tasks. But we’ve named some of the risks, and should we actually worry about how AI coding tools can make a bad actor’s job a little bit easier?
Anna Gressel: Absolutely. Every bit of AI coding tooling that saves legitimate companies time and money really does the same thing for bad actors seeking to access or disrupt others’ tech systems without authorization. Not to mention that bad actors will try to jailbreak legitimate LLMs to aid their hacking efforts, in ways that the LLM designers meant to forbid through content output filters or other guardrails.
Katherine Forrest: That really makes you think. But going back to your prior point, let’s talk about these IP issues just for a second. I know that they’ve been a focus in the courts lately, as so many IP issues are right now with AI.
Anna Gressel: So, one issue that comes up with AI coding and licensing is what we might call open source tainting. And that occurs when a code generator is trained on open source content that’s available in public repositories. I mean, a lot of open source content is publicly available online and is sometimes used for training purposes. But if that open source content is then generated by the tool on the output side, and then it’s integrated into a company’s proprietary software, there is some real risk here. It could taint the proprietary source code, and we should talk a little bit about what that means. But at a very high level, it basically means subjecting the company’s proprietary source code to the licensing regime of the borrowed code. That would be the open source license.
So why does that matter? It doesn’t matter all the time, but certain kinds of open source licenses can actually affect the downstream code that’s generated and then integrated. Among those are copyleft licenses, which are pretty common. Copyleft licenses basically say: you can use this code freely, as long as the resulting code is made available under the same conditions, which means it also has to be made freely available. That can really be a challenge for companies that are commercializing code. It’s commercial code that they’re generating, but they then have to subject it to an open source licensing regime that says it has to be made freely available. So you can see how the IP regimes the code is subject to can clash, and that can create a lot of issues.
Katherine Forrest: Thanks for that breakdown, Anna. It reminds me that courts, going back to 2008 and a Federal Circuit case called Jacobson v. Katzer, have held that these non-exclusive licensing arrangements are enforceable. In Jacobson, the court said that copyright holders who engage in open source licensing have the right to control the modification and distribution of copyrighted material. So users of AI-generated code really need to make sure they’re not incorporating any code that’s subject to similar restrictions on downstream use. Or, if they are, they need to understand whatever risks are associated with that, if any, and have a 360-degree view of it.
Anna Gressel: Yeah, so Katherine, all of these risks raise the question: what can companies that want to reap the benefits of AI tools in developing new code do to mitigate those risks and make sure they’re using AI coding tools responsibly and effectively?
Katherine Forrest: Well, there’s no one-size-fits-all answer to that question, but there are certain best practices. One is to undertake a manual code review, an automated code review, or both, which is something we’ve touched on already. Manual review is hard because the code can be really, really long and really, really complicated. With automated review, you need to make sure the automation tool is programmed correctly so that it can actually identify code that may have come from another source, and things of that nature. In addition, there are tools and services that will check whether code generated for your organization is actually part of a public repository. Basically, you need a tool that will analyze the code that’s been generated for you and for your business.
Anna Gressel: Yeah, and another general risk mitigation guideline is to implement appropriate safeguards and controls to ensure the quality, security and privacy of the code created and the data involved, such as testing, validation, verification, encryption, et cetera. These are general best practices, but they certainly apply in the context of AI tools. It’s really helpful to have the right governance process and the right guardrails in place. And as a company, you should take the time to monitor and review the performance of the AI coding tool and the code it generates, making sure that code is adjusted or fixed if it needs to be, and that you have the right kind of feedback flowing through the organization, because if something is broken, that’s really the best way to find it and fix it. Those are just general best practices, but it’s important that companies take those steps when they’re using AI-generated code.
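One narrow, mechanical check in this direction is scanning a codebase for SPDX license identifiers, the standardized `SPDX-License-Identifier:` comment many open source projects put at the top of each file. The sketch below is a hypothetical helper, not a compliance tool: the `COPYLEFT` set is a small illustrative subset, the file contents are invented, and AI-generated output usually carries no license header at all, so a clean scan does not mean no license obligations apply.

```python
import re

# Illustrative subset of SPDX identifiers for copyleft licenses (strong and weak).
COPYLEFT = {
    "GPL-2.0-only", "GPL-2.0-or-later", "GPL-3.0-only", "GPL-3.0-or-later",
    "AGPL-3.0-only", "AGPL-3.0-or-later", "LGPL-2.1-only", "LGPL-3.0-only", "MPL-2.0",
}


def spdx_ids(source: str) -> set[str]:
    """Collect license identifiers from SPDX-License-Identifier lines."""
    ids = set()
    for match in re.finditer(r"SPDX-License-Identifier:\s*(.+)", source):
        # Split expressions like "(MIT OR GPL-3.0-only)" into identifiers,
        # dropping the operators and any trailing comment characters.
        for tok in re.findall(r"[A-Za-z0-9.+-]+", match.group(1)):
            if tok not in {"OR", "AND", "WITH"}:
                ids.add(tok)
    return ids


def copyleft_hits(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each file path to the copyleft identifiers it declares (others omitted)."""
    hits = {}
    for path, text in files.items():
        found = spdx_ids(text) & COPYLEFT
        if found:
            hits[path] = found
    return hits


files = {
    "vendor/parser.c": "/* SPDX-License-Identifier: GPL-2.0-or-later */\nint parse(void);",
    "src/util.py": "# SPDX-License-Identifier: MIT\ndef f(): pass",
    "src/gen.py": "def g(): pass",
}
print(copyleft_hits(files))  # {'vendor/parser.c': {'GPL-2.0-or-later'}}
```

A flagged file is a prompt for legal review, not a conclusion; whether a given license is actually triggered depends on how the code is used and distributed.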
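The automated side of that review can be sketched, very roughly, as fingerprint matching: break code into overlapping runs of tokens, hash each run, and measure how much of the generated code also appears in a reference corpus of known public code. Everything below is an illustrative assumption (the function names, the five-token window, the crude comment stripping); real scanning tools use more robust techniques such as winnowing and large indexed corpora. And, as noted earlier, functionally identical code can be written many different ways, which simple matching like this will miss.

```python
import hashlib
import re


def normalize_tokens(source: str) -> list[str]:
    """Crude normalization: drop '#' comments, then split into word/number/symbol tokens."""
    no_comments = re.sub(r"#.*", "", source)
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", no_comments)


def fingerprints(source: str, k: int = 5) -> set[str]:
    """Hash every run of k consecutive tokens (a 'shingle')."""
    toks = normalize_tokens(source)
    return {
        hashlib.sha1(" ".join(toks[i:i + k]).encode()).hexdigest()
        for i in range(len(toks) - k + 1)
    }


def overlap(generated: str, reference: str, k: int = 5) -> float:
    """Fraction of the generated code's shingles that also occur in the reference."""
    gen = fingerprints(generated, k)
    if not gen:
        return 0.0
    return len(gen & fingerprints(reference, k)) / len(gen)


known_public = """
def gcd(a, b):
    while b:
        a, b = b, a % b
    return a
"""

generated = """
def gcd(a, b):  # greatest common divisor
    while b:
        a, b = b, a % b
    return a
"""

unrelated = """
def add(x, y):
    return x + y
"""

print(overlap(generated, known_public))  # 1.0: same code, only a comment added
print(overlap(unrelated, known_public))  # 0.0: no shared five-token runs
```

A high score only says the generated code closely resembles something in the reference set; deciding what that means for licensing is still a human judgment.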
Katherine Forrest: Right, and it’s clear that AI coding tools can be powerful and incredibly positive for a company, but they also carry risks that you have to monitor, and one way to do that is to use AI to check the AI. We’re going to start to see that, and we’re already seeing bits and pieces of it in other areas, where an AI agent checks the output of another AI tool. So that’s all we’ve got time for today. I’m Katherine Forrest.
Anna Gressel: And I’m Anna Gressel. Thanks for joining us.