AI, Security & Software Governance: How to Stay in Control Of AI Software Development
Dave Erickson 0:00
Human developers have enough trouble dealing with cybersecurity issues when writing code, so won't AI have the same issues writing secure code? On this ScreamingBox podcast, we're going to discuss AI security and preventing software chaos, using robots to make software better. Please like our podcast and subscribe to our channel to get notified when the next podcast is released.
Dave Erickson 0:45
When many people think of cybersecurity and AI, they think of hackers using AI to steal their data. But what about using AI to defend your systems? Welcome to the ScreamingBox technology and business rundown podcast. In this podcast I, Dave Erickson, and my very human co-host, Botond Seres, are going to hack our way to understanding AI and cybersecurity from a development perspective with Nir Valtman, co-founder and CEO of Arnica. Nir is a security executive whose work is focused on making software security seamless for developers, rather than a bottleneck. Before founding Arnica, he led security strategies at Finastra, NCR and Kabbage, and he is a regular speaker at Black Hat, DEF CON and RSA cybersecurity events. With seven patents and several open-source projects to his name, Nir brings first-hand experience bridging enterprise needs and developer-driven security. Nir's company Arnica is breaking new ground by embedding security directly into developer workflows, including for AI-generated code. Nir, welcome to the podcast.
Nir Valtman 1:51
Hey Dave and Botond, and thanks for the invite.
Dave Erickson 1:55
To begin with, how did you start your technology journey, and how did you get drawn into cybersecurity in the first place?
Nir Valtman 2:02
Well, that's an interesting piece. So when I was 13, my parents actually sent me to a Visual Basic course, and I really liked it. But at some point it just got a little bit boring, and we had a huge lab, and we shared the games between each other. So I decided to write a piece of code that actually deletes a few interesting files from the operating system, you know, bootloaders and so on, because I got bored. And obviously, well, maybe not obviously, but they kicked me out of this course after a few times, and I said, oh, you know, that's interesting. So I like to do that. And since then, you know, I've been a script kiddie, doing things that I didn't know were legal or illegal. And when I went to the Israeli Air Force back then, you know, when you grow up, I ended up learning how to do it more professionally. And this is pretty much how my career started.
Botond Seres 3:11
That's great. Started really early. And about our topic today, I was wondering if you could tell us a bit about, like, the big-picture framing of things, Nir. Specifically, when we talk about governing software in the AI era, what actually changes compared to traditional software governance?
Nir Valtman 3:40
Oh, wow, there's a lot. So if you think about traditional software governance, you know, every engineering team has its own development lifecycle, essentially. They have their own process. Some of them will have one peer review, some of them will not have any peer review, and some of them will require two people, and it's a must-have in certain areas to deploy code. And not only that, they will manage tickets differently, with different details in the tickets, different tags, different workflows, and even the release lifecycle will look different. Now, some companies or some teams will have maybe one type of PRD, and other teams will maybe have just a ticket that says, just build the damn thing and go for it, right? But in the AI era, it makes certain things scale to the extent that you can't really even see how every team operates. So think about it this way. There is research that shows that roughly 59% of teams utilize two or more AI coding tools. Well, that by itself already changes how code is being delivered, because the different tools actually produce different results. Certain tools will be driven by PRDs, some of them will be driven without. Some of them will have almost no context on what is required to be built or what the foundations would be. And some of them will say, you know, this is exactly how we build, this is exactly how we write code, and you'll have those agents build your software as it develops. So obviously, also here, we have different maturity levels. So with all of that and the growth of AI coding, you get to the place where there's less and less consistency and way more risk coming into production. And by calling it risk, by the way, yes, I am a security guy, and I would like to talk about security risks. But if you think about the developer side, there is an operational risk. There's a risk of not having enough tests. There is a risk of not having the docs right.
There are multiple operational risks, not only in security, that get to the point in which, where you have such huge productivity gains, you also have a big footprint of risks. Now you can say, okay, let's actually have a more rigorous code review process, because now we have more code and we want to set our eyes on more of those reviews. So instead of developers spending roughly 15% of their time today on code reviews, give or take, maybe six hours a week, now you will get to the point where developers spend more and more of their time on code reviews. And if you think about the developer state of mind, how much time can you actually spend on code reviews? It's mentally difficult.
Botond Seres 6:54
Indeed it is
Dave Erickson 6:55
Some developers think of code reviews and going to the dentist as the same thing.
Botond Seres 7:00
Personally, I do like them. I might be the outlier here, but…
Nir Valtman 7:06
Okay, but can you spend 100% of your time only reviewing code?
Botond Seres 7:11
Oh, personally? For many days.
Nir Valtman 7:14
That's amazing. That's amazing. I don't know how you're doing this.
Botond Seres 7:18
Well, it's very simple. When you have colleagues who push 2,000 lines of code in a single PR, it's very simple to review it for two or three days straight.
Nir Valtman 7:31
Yeah, of course, you know, if they're small PRs, it makes things way easier. And at the end of the day, even if you are, you know, the outlier, then try to think about taking this problem and scaling it across an enterprise. And it doesn't need to be, you know, a Fortune 1000 company. Think about, you know, a company with 100 devs. How do you scale this problem across 100 devs? And this is where you want to get your governance straight, before the PR, not at the PR.
Botond Seres 8:11
Well, one of the things I've seen more and more of is we use AI to write code, and then we use another AI to review the code that someone else's AI wrote. What do you think of that, Nir?
Nir Valtman 8:22
Well, they're two completely different problems that need to be solved. I know that most of the companies that provide AI code gen also have AI code review capabilities. And this is, I think, a reflection of the reality in which developers actually choose two or more tools, because eventually, right, developers want best of breed in AI code generation, but they also want best of breed in AI code review. Because at the end of the day, if you get too many false positives in the review stage, you're missing the point, right, of the velocity that you can get with best of breed. Actually, do you want an interesting fact around code reviews? I can send you the source for this, but I found some university research on AI code reviews, and they found that 61% of all issues flagged by AI code review tools are dismissed by developers as, you know, not accurate or anything like this. So you get a lot of comments, but then you dismiss them because they're irrelevant for you. So you're trying to optimize the time for review, but you're getting the opposite.
Dave Erickson 9:52
Yeah, well, again, people keep forgetting that AI doesn't think, right? And so in code review, when it's trying to decide, is this a valid issue or not, it has a hard time thinking its way through that, so it just flags it. It says, this is a problem, because it doesn't fit its formula of logic. And the developer will look at that and go, no, that's not a problem, right? Or at least they can decide that. But I think part of the problem that people have with AI, even developers using these developer tools, is that it comes back to, you know, even these tools need a prompt, right? And developers can also be lazy, just like other people, and they put in a very broad, nondescript prompt, and they start generating code. From your perspective, how does the prompt that developers give these tools affect the outcome?
Nir Valtman 10:50
I think that at the end of the day, developers also evolve with what is expected to build a better product. So you know, initially developers started using this as, you know, trial and error with prompts. And those developers that really learn how AI works and how to prompt it, they'll learn how to provide it very specific requirements. For example, you know, this is how we build the code, or this is how we expose an API endpoint, or this is how we decorate our functions, and so on and so on. And on the flip side, you don't want to repeat exactly the same instructions every single time. And this is where you can see developers coming up with, you know, Cursor rules and Claude skills, and in Copilot you have the Copilot instructions. You have the repetitive work added as context in the repo, and then it is reusable by the rest of the devs trying to prompt the agents to do the work. Now, that's great when you want to build consistently. The thing is, if you think about not having this, the agents are designed to become more general-purpose agents, and they learn how your code is built in different ways, and eventually they are designed to make the feature that you requested workable, to make it compilable. You know, if they can build and run it, then they can build and run it, but eventually the entire purpose is tweaked towards a workable product. But then the question is, what are you missing by tweaking the logic like that? You're missing quality checks, you're missing security, of course, maybe you're missing the big picture of the architecture. How should things be built? How should you access the database? Instead of, for example, building a connection string, utilize this internal library that we already have for connecting to the database.
These are the pieces that need to be embedded into the requirements of every prompt that the developers actually write the code with. And this is, for example, why we released the capability to push security requirements to every repo, because one of the annoying factors with security tools is that you need security people to review the vulnerabilities, and it's like a different process in the company. When you have security vulnerabilities, it's a risk; it maybe requires executive approval. But if you already had the rules that tell you this is how you write software securely by default, you don't need to involve anyone. The devs kind of take security off their back by simply embedding those requirements into every repo. So that changes everything. Now think about the world in which you need to write code by yourself. You go back to IDE plugins, even to MCPs; both actually have an adoption problem. Think about an IDE plugin that adds a security scanner and asks the developers to scan. Well, it's opt-in functionality. The devs need to install it, to connect to that security tool. It's already a difficult problem, right? And if you look at MCPs, it's pretty much the same thing. Even if you enable it across the enterprise, you need to authenticate to get the data properly, and that's not consistent across the enterprise. So you have those challenges that are simply impossible to scale unless you have very special tweaks on the developer workstations across the company, which is already difficult by itself. So, back to your question about how to see that difference: the best way in my mind to govern developers, to get the code to adhere to your governance requirements, is by simply pushing those requirements to them, to every repo.
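The repo-level push Nir describes can be sketched in a few lines. This is a minimal illustration, not Arnica's implementation: the rules text and the Copilot instructions file path are assumptions made for the example (Cursor and Claude use their own file conventions, e.g. `.cursorrules` or `CLAUDE.md`).

```python
from pathlib import Path

# Hypothetical shared security requirements every repo should inherit.
SECURITY_RULES = """\
# Security requirements (apply to all generated code)
- Expose new API endpoints only behind the existing auth middleware.
- Never build raw connection strings; use the internal database library.
- Parameterize all SQL queries; never interpolate user input.
"""

def push_rules(repo_root: Path, rules: str = SECURITY_RULES) -> list[Path]:
    """Write the shared rules file into every repo directory under repo_root."""
    written = []
    for repo in sorted(p for p in repo_root.iterdir() if p.is_dir()):
        # Copilot reads .github/copilot-instructions.md as repo-level context.
        target = repo / ".github" / "copilot-instructions.md"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(rules)
        written.append(target)
    return written
```

In practice a tool would push this via the source-control API rather than the filesystem, but the effect is the same: the agent picks up the requirements automatically, with no opt-in step per developer.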
Botond Seres 15:38
Right, yeah, that makes a lot of sense about those requirements. So I personally know that, especially when we are talking about security requirements, we can have tons and tons of very specific ones, and those prompts can get incredibly long, sometimes longer than the entire context window that our AI tool has. So can you share some insight into how we can split these up? I don't know, like some for just a database access layer and some for our APIs.
Nir Valtman 16:17
I think that what we have learned, after experimenting with this for quite a lot of time now, is that in very specific cases it would be helpful to be language-specific. And those very specific cases are, and this is why the window gets long, right, the tokens get, you know, exponentially higher. It's very specific to a few cases. One, if you are using a cheap model for software development, which today developers really don't want; they want state of the art, right, so you can put that aside unless they use open source. And the second scenario for being language-specific is if you have your own custom ways to do things, okay? Other than that, you can actually use a general-purpose prompt. So for example, if you tell it to expose an API endpoint in a way that requires authorization every time that you see elevated functionality, like an admin endpoint, that prompt will work on Java and COBOL exactly the same way, okay? And therefore, if you have general-purpose requirements, think about it like this: you take OWASP ASVS, which is the Application Security Verification Standard, and you codify it into a few thousand tokens. Let's say sub-10k. By the way, we have our own; it's 8k tokens. That's it. Now, the interesting part is…
Botond Seres 18:07
Sorry, I need to stop you there for a moment. What is one token in this scenario?
Nir Valtman 18:13
Four characters. Yes. Okay, but what, what is a token in your context?
Botond Seres 18:22
I was thinking one word is one token. But I'm not very deep into the whole AI thing, so
Nir Valtman 18:29
I'm not sure; that's what the AI told us about how to calculate it.
But yeah, that's pretty much the way to think about it. Now, the magic here in the context is not the input tokens, but the output tokens. The input tokens are generally cheap. If you think about the cost differences, an input token will be between a third and a fifth of the cost of an output token. So what you really want to provide, you want to provide as many tokens as you can without getting outside of the token window, of course. And at least what we have seen work very well, you know, 50, 60k tokens performs very well, but as you go further and further, it loses some of the context. Which brings me to the point in which you need those instructions to take care of most of the lifting with, let's say, default security requirements, but you also need a security scanner, you know, on the push, on the PR, to flag those things that could not be addressed by the agent. So that gives you kind of both sides: the development lifecycle, where you have it as a gate, and when the agent generates code.
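The arithmetic here is easy to sketch. A minimal illustration assuming the rough four-characters-per-token rule and an input price of a quarter of the output price (somewhere between the third and the fifth mentioned above); the per-1k prices are made-up placeholders, not any provider's actual rates.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb."""
    return len(text) // 4

def prompt_cost(input_tokens: int, output_tokens: int,
                output_price_per_1k: float = 0.015,
                input_discount: float = 0.25) -> float:
    """Illustrative cost: input tokens billed at a fraction of the output price."""
    input_price_per_1k = output_price_per_1k * input_discount
    return (input_tokens / 1000) * input_price_per_1k + \
           (output_tokens / 1000) * output_price_per_1k

# An 8k-token requirements file (about 32,000 characters) fits comfortably
# inside the 50-60k-token window that reportedly performs well.
requirements = "x" * 32_000
print(estimate_tokens(requirements))  # 8000
```

The point of the asymmetry: stuffing a large requirements file into the prompt is comparatively cheap, because the expensive side of the bill is the generated output.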
Dave Erickson 20:13
As companies start focusing on augmenting their development teams with AI (most developers are already doing this on their own, but now companies are doing it too), you also get a situation where developers are using different tools within the same team. You know, you indicated that part of this governance, or this bigger picture of producing higher-quality code but also more secure code, is kind of a unified agreement within the team: these are the ways you want to approach these systems, these are the ways you want to do cybersecurity. The reality may be a little different, in the sense that people don't always agree, and team members may think, okay, even though they say do it this way, I'm going to do it my way, because I don't believe in that way. How big a problem is that, within teams and enterprise situations, for being able to get a high-quality output as well as a secure output?
Nir Valtman 21:21
I think that there are certain things that everyone can agree on, and there are certain things that every developer can configure for themselves. So for example, when it comes to security, it's pretty much standardized. These are the requirements. There are certain things you need to agree on when you prompt the agent, such as, you know, you must have a secure way to store your passwords, for example using a specific, strong, you know, hashing algorithm. But if the software already has something, then the prompt should say, but if I have something, don't break my software, okay? So obviously there is a craft to how you prompt it, to make a general-purpose prompt for everyone to write code that's secure by default. Now, where is the difference? The difference is not necessarily in the central requirements, but in the steps to get to the central requirements. For example, a central rule or a central skill can be: this is how you build the service, and it will build it every single time exactly the same way. There is no disagreement on this part. But one developer can simply prompt what is required to build, and here it can be very specific. For example, I build an e-commerce website, and I'm very specific about the fact that you cannot call the checkout API until you add items to the basket. I'm just throwing something out there. This is very specific domain knowledge, where you expect the developer to bring this input by themselves. On the flip side, you can say, maybe I'm a less experienced developer, or I want to ideate with my agent. Well, that's fine. You will not necessarily use the same agent for the coding task. You will use the agent to say, hey, you know, help me think about this feature, how to build this. And the agent will need to go to your code to analyze the architecture.
The agent will kind of analyze, you know, maybe the other processes that you have, or maybe other services that it has access to, if you have a cross-enterprise context agent. And therefore, you will end up with a result that may be higher quality from someone that knows less. And this is where everyone can utilize agents in their own ways. One way is: just prompt whatever you want, and you know what to expect. Versus: I have this requirement of developing a service, I don't know what I'm expected to do from an architecture standpoint, but let me ideate and then give it to the agent to build.
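A rule like the one Nir mentions, "store passwords with a specific strong hashing algorithm," would steer the agent toward code along these lines. This is a sketch using Python's standard-library scrypt; the function names and parameters are illustrative, not something from the conversation.

```python
import hashlib
import hmac
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Hash a password with a fresh random salt using scrypt, a memory-hard KDF."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
    return hmac.compare_digest(candidate, digest)
```

Per the "don't break my software" caveat above, a real rule would also tell the agent to leave any existing hashing scheme in place rather than silently migrate it.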
Botond Seres 24:38
So we talked quite a lot about how we can use AI to make secure applications. But in my mind, there is another hidden layer of risk in just using someone else's AI. This might be a hot take, but in my personal opinion, if an enterprise decides to use AI, they really should not go to Copilot, Gemini, ChatGPT, whatever else; they really should at least explore the option to self-host. And that's something I can't understand why no one, well, almost no one, is trying to do, especially when we compare the costs. It's like incredibly cheap to buy 10 GPUs, plug them into one PC with the craziest CPU you can find, and you have self-hosted agents for every developer at every hour of the day.
Nir Valtman 25:35
I think that's something I've actually had a few conversations about. And, by the way, there are companies where that's all they do: they host models for you. Companies like Together AI, which I like, you know, they take open-source models. They may not necessarily have the latest and greatest models, but they have good-enough models for you to utilize and host. Because, again, even buying GPUs, or, you know, if you go to Google to run it on TPUs, it still gets to the point in which you need to maintain it and you need to address the needs of the developers. Now, the vast majority of the developers that I spoke with, they want the latest and greatest models, and getting the latest and greatest model is possible when you go to a service provider, right? And naturally, there is a middle ground somewhere, because if you look at Azure OpenAI, for example, or Azure AI, you can actually get either the latest or latest-minus-one, and eventually you will pay for a hosted model. It's still SaaS, but it's your own instance of the model. And we see a lot of enterprises with Microsoft agreements actually going this route. It makes sense financially. But yet, we have not seen a company where all developers are using it, because developers want the latest and greatest. And like now, with the announcements of Codex 5.3 and Opus 4.6, the context window is suddenly a million tokens. They want to do more. You can build a Linux compiler, right? And that opens up the appetite for the devs to do more. Eventually they don't want the latest minus one.
Botond Seres 27:46
I don't know, a million tokens is a lot. Like, I remember the very first instances of these models, and when you tried to put anywhere close to that number into the context window, they just fell apart. That's a lot of words too, but I'm not reading them.
Nir Valtman 28:04
It's a lot of words, and it loses the context, obviously. And the main question is where exactly it performs the best. So for example, and I don't know yet what the answer would be for 4.6, it's still in research on our side, but let's say that you increase the tokens from, let's say, 40k where it's optimal to, you know, 250k tokens where it's optimal. Do we need to provide more security requirements? No. But can we work on a faster delivery? Yes. Can we work on more strict requirements that will be addressed within this context? Yes. So obviously, certain things don't need more tokens added, but now we can handle multi-file in a better way. You can read a full repo in certain cases and have a full scan in, like, one shot. So it changes a lot in code generation.
Botond Seres 29:15
To read the whole repo is the holy grail of AI development in general, because many, many, many companies have just vast amounts of legacy code that they would like to bring these AI agents in to work on, to fix, to document, to do whatever with, and it's been impossible so far.
Nir Valtman 29:35
We're talking now, and we have a million. What will happen by the end of the year?
Dave Erickson 29:43
Well, if they can build enough nuclear power plants, then they might be able to expand the code window to a billion.
Nir Valtman 29:51
Maybe, and you know what? When that happens, a lot of things will change, right? Suddenly, you will not have a file or multi-file context, or even a repo context; you will have an enterprise context. And obviously it comes with risks, and, like, sending the enterprise context every single time to the model may be challenging, but at the same time it may be super accurate, super aligned with your governance as well. Because you can say, these are the sample repos that I care about; I know that these are my templated repos with all of my requirements: security, testing, architecture and so on. Now I have an idea for a service. Here's an API. You can create a repo and deploy that by the time I go home, and that is possible, or will be possible.
Dave Erickson 30:53
So maybe we can, I don't know if taking a step back is the right word for it, but maybe we can kind of look at this. Let's say we're a company, an enterprise. We have a team of, I don't know, 100 developers, and we would like to implement AI. We would like to get a productivity boost, 20%, 30%, by using AI tools for development. But the department has not really had a lot of experience with using AI development. If you were going to be leading that department into implementing AI development, what would be kind of the basic things you would want to think about, from not just an operations or production standpoint, but also a security standpoint? How would a team that does not have much experience with AI-assisted development start doing it, and what are the things they need to be aware of or look out for?
Nir Valtman 31:55
I think that, well, there's a lot of unpacking to do here. So I would say, you know, first of all, show them what's possible. You can get consultants, you can get someone who's excited about this thing. For example, in our company, we have a few folks that are next level in building agents, and they operate multiple agents, and they just do lunch-and-learns, or they kind of share with the others, live, how they build features. For example, just on Friday, we had a session about a feature that one of our customers requested. We thought that it would be a really big feature to build. And we said, let's just try. Let's see, you know, what would that take? And in four hours, it's not one prompt, right, in four hours the entire team sat down and, first of all, built a full PRD based on the customer requirements. Now, how do you build a PRD? You go to your agents, and you give them access to your source code, to your docs. Obviously, you have your own skills, and you distribute the work to kind of collect all of those requirements. Now, as part of this, you can also say, let's ideate, give me, you know, back-and-forth questions, so eventually you can build a full PRD. Then, when all of the ideation was completed and the PRD was built, we just said, you know what, let's just try to see what the agents can do. So at Arnica, we use pretty much all the state-of-the-art models, because we need to; we're developing software that integrates with them. So we spun it up with, you know, Claude and Copilot and Cursor, and kind of really tried to figure out which would build, you know, the best piece. So you end up with sub-10 pull requests; you just load the branch locally, and you test it, you see if it works. All of this journey happened, first of all, with people that already have some education about AI, right?
It's not new to our development team, but it's really interesting to see how you have a pilot driving, and you ideate with the pilot developer, and you see how you can stretch the capabilities with the AI. So that's one thing. Just really get a consultant, get one of the devs that are excited about this, and build stuff together. That's like the 101. Like going and asking your ChatGPT, or whatever chat that you are interacting with, "what can you do for me?" will always yield the same result. It will put you in the box. You won't know what it can do until you see someone actually do it. So for the teams, I believe that in order to get the teams to start using AI, do many lunch-and-learns, do many sessions where someone drives every single time, and you will learn from each other how to build this better. So that's definitely one. Now, obviously, there are enterprise concerns that may come into play. You can say, okay, so how do you add those? Or how do you copy and paste those requirements across every piece in the product? We spoke about this, like the rule files where you can add security and testing and so on and so on. You can also think about kind of taking your history of sessions per developer, or, you know, it can be per developer, per session, and just ask your bot to summarize your entire session and build this into a requirements file, and you will see that the things you are not thinking about are also added automatically. So essentially, it's a flywheel, in which the moment that you start and you add more context, the better it becomes, and the easier it becomes for others to build on top of it. But the first steps are the ones that are most important.
Dave Erickson 36:50
And I would assume that anybody who's starting to move into AI-assisted development, you know, you mentioned this in the beginning, that you can ask the prompts, or you can set things up so that the cybersecurity needs are kind of defined on the front end. Can you talk a little bit about what a company would need to actually put together to make sure that AI development is done in a secure way?
Nir Valtman 37:19
Yeah, I think that it's pretty much the same for every company; it's just that their requirements may have some tweaks in them. So for example, you know, let's say that every company that writes code with AI agents and exposes any services will need to have at least OWASP ASVS-type requirements: you know, write the service securely, and it goes up to the level of even, you know, the throttling that you need to have in your APIs. Some of that is security, some of that is operational. Now, the main difference is what you want to apply towards more specific areas of the business. So for example, let's say that you have a product that handles personal health information, PHI. Well, what you typically want to do there is you want it to write the APIs, but you want to then tweak not only the prompt, but also the scanner that you have. So for example, within the prompt, you can say, hey, if there is any PHI handled in this feature, make sure that you encrypt the data in the code before it goes to the database. I'm just throwing something out there. It's very specific to PHI, but the agent, even if it has a million kajillion tokens, will not always understand it. And therefore, you must have another piece, the scanner, that will actually be able to identify the broader context, to say, hey, you know what, now that I review the code and I see what has been developed there after multiple prompts, let's see if I'm handling any PHI, and if I am, I want to flag this as a finding that requires a review by a security or compliance professional, right? So the involvement of the human in the loop is something that will typically happen at the scanner level, at the PR, at the push, while the requirements definition, and kind of adhering to those requirements, would be at the agent level.
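The scanner-level check Nir describes, flagging PHI handling for human review, could be sketched like this. The PHI term list and function name are hypothetical, and a real scanner would use far richer context (data flow, schema metadata) than keyword matching on a diff.

```python
import re

# Hypothetical identifiers that suggest PHI is being handled.
PHI_TERMS = ["patient_name", "diagnosis", "medical_record", "ssn"]

def flag_phi_findings(diff_text: str) -> list[str]:
    """Return PHI-suggestive terms found in a code diff, so the finding can
    be routed to a security or compliance reviewer instead of auto-merged."""
    return [term for term in PHI_TERMS
            if re.search(rf"\b{term}\b", diff_text, re.IGNORECASE)]
```

On a diff like `ALTER TABLE visits ADD COLUMN diagnosis TEXT`, this would surface `diagnosis` as a finding at the PR or push, which is exactly where the human in the loop comes in.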
Botond Seres 39:51
Nir, you mentioned a little bit earlier that when interacting with AI agents, or rather evaluating them, we shouldn't ask them what they can do for us; we should actually test what they can do for us. And that brings us really nicely to a topic I believe you would also like to talk about: there seems to be a gap of understanding between people who actually do AI-assisted development and people who tell them to do AI-assisted development. I'm specifically referring to decision-makers at high levels at companies. We set targets and quotas, and I do have my personal experience with this, but I would like it if you could talk about this a little bit, and maybe we can bring a little bit more understanding. What I see many, many, many times is decision-makers ask AI what it can do for us, and then they take that at face value.
Nir Valtman 40:56
I think that there are two ways to look at that. First of all, if you look at it top down, decision makers setting goals to utilize AI, I think that is actually the right thing to do. If you are in the position of a CTO and you're not showing metrics of increased spend on AI scanning or AI code generation, you're doing something wrong, unless maybe you're in the federal space and you need, I don't know, isolation that is not fully ready yet. But in most companies, that's the trend: you spend more money on AI, and you spend less money on other tools and other pieces of the budget that you had. Now, the leaders can ask what AI can do for us. But typically, if you look at the leader of, let's say again, a 100-person company, that company will have a CTO who maybe has at least two or three people in their squad for special projects. They will tell the CTO exactly what they want to drive across the company. So it will go top down, but it goes one level bottom-up first, then top-down. The expectation is not for the CTO or the VP of engineering to be the most forward-thinking about AI. The expectation is to say, hey, we need to solve whatever business problem: productivity, faster delivery, maybe autonomous development. You know, 2030 is going to be the year of autonomous development. Okay, so how do I start planning for less friction from humans? How do I plan my monitoring in production to make sure that things are actually working properly? And if I need to revert, then I'll revert. You start thinking strategically about how you drive toward this goal within the next two or three years, in which you ship a feature from your phone, okay, or from your Tesla. You get into a Tesla and you talk with the Tesla, and you have a feature. Okay, whatever.
Now, when it comes from the bottom, we'll call it bottom-up, although it's flipped if you look at the tree, where the leaves are at the top. If you look at the developers, eventually developers will use whatever they want. Let's just be very realistic about that: developers will utilize whatever is available to them. We have seen so many companies where you have, for example, GitHub Copilot mandated across the company, and then, when our customers see the inventory of agentic rules they have across the company and so on, they suddenly see that they have almost all the tools in the company. Because if I'm more productive with, I don't know, Cursor, and it costs me 20 bucks a month to be more productive, I'll just pay the 20 bucks per month, sure. So this is why it's important to have a certain goal and a certain strategy top down. But at the end of the day, the devs are the ones where, you know, you need to win their hearts in order to win their wallets later, right? It boils down to this.
Botond Seres 44:52
Yeah, that's absolutely right. I'm not trying to diminish the usefulness of AI. What I'm saying is that sometimes the implementation is not ideal. I'm going to throw a curveball out here, but MIT did report that over 95% of companies who use AI see no measurable return on investment, yet we have so much enthusiasm, so much drive. As you said, it's like a flywheel; it just keeps spinning. Everybody is using AI for everything, and yet so little return. But on a personal level, I did see that it can be an amazing tool for productivity in very specific cases. It just doesn't seem to be a magic wand that solves everything.
Nir Valtman 45:46
That's correct. And I think that if we are referring to the same research, it also slices and dices the data between senior developers and junior developers. Again, if this is the same report we're talking about, it demonstrated that for senior developers AI actually diminishes their productivity, and for junior developers it actually spikes their productivity. And that's quite interesting, because senior developers essentially get married to the concept of how things should be built, and they have so much domain knowledge. Therefore, sometimes, because they have so much domain knowledge, they get to the point at which, when they review the generated code, it's not good enough for them, so they need to re-prompt, retest, and so on. Junior developers, because they don't know that much yet, feel like, oh yeah, of course, I'll accept all of this stuff, right? Yeah, it looks great. Yeah, it will work. I tested it locally; it compiles on my computer, it works. So that's one piece of it. Now, the other piece that is important to measure is the entire development life cycle, and where you put AI. Because we are used to AI coding just fine; you put the governance at the code review stage and make that work. But what about specifying what you need, like building the right ticket, or grooming the ticket? It's product-manager-type work, but if the product manager does it solely, does the developer know what to do with it? Well, no, you need to groom it from the developer side too. So I believe that if you start investing in other areas, not only in the two areas that we know, you actually have the opportunity to see better outcomes.
Because if the design is right, the downstream cost goes down, okay? And at the same time, if you, I don't know, maybe have a magic way to read your APM logs from production, see why you have bugs, and then ship this context back to the agents, then you may also end up with a more stable system, which, again, will reduce the operational cost for you. But it will be hard to measure: did the AI contribute to this, or the dev?
Dave Erickson 49:06
I mean, for your own company, you obviously use AI to assist your development. What has your experience been bringing AI in to assist your developers?
Nir Valtman 49:19
Today I think it's very clear to all of our developers that there is no other way. My co-founders and CTO required all developers to utilize AI, and the more invoices we're getting, the happier we are. Okay.
Dave Erickson 49:39
Okay. And how was that transition internally? Did it meet resistance initially, or were they all excited about it? I assume it hasn't gone 100% smoothly, because nothing in business ever goes 100% smoothly. But hopefully it was easy to deal with, or just took a learning curve.
Nir Valtman 50:00
So obviously, in every company you will always have those people who don't want you to move their cheese, right? But eventually, what people in Arnica understood, I think, is that if they're not going to use AI, they're not going to have a job in a few years, so they have the best opportunity to do it here, because we are enabling them to do AI. And if a company doesn't enable them to do AI, they should find another place to work. Because, you know, it's a bummer when, let's say, you're a front-end developer and you have your craft, and you know it takes a few days to develop every piece, every component, and make it beautiful. And now a prompt does that in 30 minutes, with everything. I had this conversation with one of them, and it's like, hey, it's a bummer that all of that work I did can now be translated into a prompt. But on the flip side, what an opportunity we have now to be way more productive, to learn the craft of prompting, because prompting is a skill, right?
Dave Erickson 51:29
They have to change their craft, using their experience: from writing the code to writing a prompt, and then, after the prompt generates the code, focusing on reviewing that code, making sure it's correct, and making adjustments. So it's not so much that you're taking a job away from them; you're having them shift their work and experience into those other areas, right?
Nir Valtman 51:54
Yep. And you remember, let's say 15 years ago, you could see developers idle in front of their workstations, and they would look at the computer and say "building" or "compiling" or anything like that. So actually, a couple of months ago, I was sitting in a room with a few of our developers, and I saw them standing and chatting with each other and looking at the monitor every minute or two, and I'm like, what's going on? And they're like, I don't know, it's writing code; whenever it finishes, I'll get back to it. They had spun up multiple agents on their laptops, and the agents were writing code, so they couldn't touch the computer. So okay, great, you're doing the right thing. So it definitely changes everything, and I think developers must be enabled to do AI. If you can, let them use whatever they like; and if you can't, if you're in a large enterprise, you'll just need to figure out a way to get into an agreement with a vendor or two and, with an underline on the "try," try to enforce those specific tools.
Dave Erickson 53:24
You know, for developers who are developing and using AI to develop, from their own perspective, how should they, or how can they, use AI now to make the systems they're developing even more secure from hackers who are using AI to hack in?
Nir Valtman 53:47
I think there are multiple pieces of context that developers need to be aware of. There is the context of the software itself, which is the piece they can control. Maybe they can also control some of the infrastructure that is deployed; maybe they can control their Dockerfile; maybe they control whatever environment it goes to. But in most cases, developers can't really control the full production environment, and that's the reality. Maybe you have an SRE team, maybe you have some centralized center of excellence that controls the production environment. So given that the limit of the context is essentially the code, I think one thing is that developers really need to understand the context of where it is going to be deployed, not necessarily in software terms. Just understand the business case, understand what it interacts with, do a mini threat model if it's security-related, or just model how it will operate. And if you can model how it will operate, you can come up with those assumptions and bring them to your prompt when it builds the software. So for example, let's go back to the throttling example. Let's say that you expose an API, but you know that this API endpoint is behind an API management system that handles all of the throttling for you, and maybe that system also handles all of the authentication and authorization, and all you need is, I don't know, claims to check. Then add these assumptions to your prompt, and you will build a workable product more consistently. So the first point is really not only security awareness, but awareness of where it will operate; if you have this awareness, you know what to prompt. The second piece is that we know developers are not security experts, and as a matter of fact, they shouldn't be security experts. But we do know one thing.
We know that developers have good intent. They want workable software; none of them wants to introduce security vulnerabilities deliberately. Therefore, if you provide them with the right skills, codable skills, it can be the Cursor rules or the AGENTS.md file that you modify for them to include the security requirements, you are pretty much done on that front. Are there business-logic risks that can still happen? Of course, but it's really hard to find all of those business-logic risks, and therefore you should validate as you go through the development lifecycle. So for example, let's say that you again exposed an admin API endpoint that is available to a certain role, because you read it from the requirements and you know it's implemented in a certain way. You need another tool that scans the changes, the code diffs. It can be when the code is pushed, it can be at the PR itself, and at that point you want to provide very narrow, very focused feedback to the dev based on the code. Now, there's more to that. If you have an AI security scanner, like in our case: what we built is a rule-based scanner that scans for very specific patterns, and then we added a hybrid multi-agent scanner on top of it. So it uses the rules very deterministically, but there are things that are not deterministic that need to be scanned by a meaning-based scanner, as opposed to a rule-based scanner. And when we built this, we were really trying to understand what context we need to look for, what those, let's call them up-the-stack issues are, which are not necessarily the SQL injections of the world. And when we built it, we also realized that it can't be one-size-fits-all.
So because of that, and it's an iterative process we had to go through, we then opened a piece of our prompt to the customers and said, hey, if you have anything specific you want to tell us, this is the place. And this is where we see customers adding a custom prompt that says, generate a PoC for me for a potential risk; or, this is a repo with PII or PHI, show me the endpoints and route them through a security approval. All of these things need to be customizable, either at the code review stage or, if you really want to go through the hurdle of integrating this across the company, in dynamic environments, and that will be way more challenging, because it kind of needs to live in a runtime environment. If you really need to apply all of your security requirements at runtime, there's a lot of value to this, but scaling it across the enterprise is almost impossible.
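As an illustration of the "codable skills" idea from this exchange (the wording below is invented for the example, not Arnica's or any vendor's actual rules), a security section in a repo's AGENTS.md might encode requirements like these:

```markdown
## Security requirements for generated code

- Every new API endpoint sits behind the API gateway, which already
  handles authentication and throttling; check role claims in code,
  but do not re-implement auth or rate limiting.
- If a change handles PHI or PII, encrypt those fields in application
  code before they reach the database, and flag the pull request for
  security review.
- Never hardcode secrets; read them from environment variables.
```

Rules like these work at the agent level, before the code exists; the scanner at push or PR time then acts as the deterministic backstop for whatever the agent missed.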
Botond Seres 1:00:01
Nir, you very quickly mentioned an AGENTS.md file. Is MD referring to markdown? Yes? Oh, thank you. I've been telling this to every single person I know who's less technically inclined than myself: you need to write markdown to properly prompt the AI. I feel like I'm going crazy, because literally no one that I know does it, and it gives so much better results. It's incredible. Thank you.
Nir Valtman 1:00:31
Yes. One thing about AGENTS.md: if you go to agents.md, there's a website for that, and it will tell you exactly how you should prompt within this agents file. The thing is that every agent has its own format as well. They will respect AGENTS.md, but, for example, in Copilot it would be .github/copilot-instructions.md, and in Cursor it will be MDC, which is markdown with comments.
Botond Seres 1:01:13
Lots of flavors of markdown. So you're saying that different agents use different flavors of markdown?
Nir Valtman 1:01:19
Exactly. And back to governance: if you apply it only in AGENTS.md, other skills or other instructions may come up first, before AGENTS.md, and therefore you want to be specific to every ecosystem and every agent when you write your requirements, right? So if, let's say, across the enterprise you have both Claude and Cursor...
Botond Seres 1:01:51
Okay, cool
Nir Valtman 1:01:52
...you actually need Claude and Cursor files in every repo where Claude and/or Cursor is used. So that's a piece of what we realized and what we had to build out of necessity.
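To make the file locations from this exchange concrete (assuming a typical repo layout), the per-tool instruction files mentioned live at paths like:

```text
AGENTS.md                         # vendor-neutral agents file, repo root
.github/copilot-instructions.md   # GitHub Copilot
.cursor/rules/*.mdc               # Cursor rule files (MDC format)
CLAUDE.md                         # Claude Code
```

Keeping the shared requirements in one place and mirroring them into each tool's file is what makes the governance apply regardless of which agent a developer happens to use.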
Botond Seres 1:02:13
I'm really glad that someone else realized that markdown is really good for AI. Yes, definitely. I feel like this is some kind of secret knowledge. I do understand that people who are very much into AI probably find this incredibly basic knowledge. But to the rest of the world, it's not. It's really not.
Nir Valtman 1:02:37
And I agree with you. When we have conversations with prospects, even customers that have just started utilizing AI, there isn't much practice around those agents files. But the moment they realize what they can do with this, it starts to cascade quickly around the company, which is amazing. Companies now have central repos of those files that you can pull into your repo, and you decide what you want to add. There are best practices, there are language-specific skills that you can download. It's phenomenal.
Botond Seres 1:03:18
Yeah. In your opinion, what is the future of AI- or agent-assisted development?
Nir Valtman 1:03:27
I think that realistically, within a couple of years from now, what will happen is that you will have a workforce or fleet of agents that can understand your full context, based on different connectors, to build truly autonomous software. Meaning it will understand previous developer feedback and funnel that into how it builds software. It will look at the production logs, see what generates crashes, and add logic to avoid those crashes in the future, as well as build new functionality. All it will do is ask another agent to test it against the specs, and maybe the previous specs, to make sure that nothing else breaks, and then ship it to production. So in essence, the role of the developer in the future is essentially defined by the points in the software development life cycle that dictate when you need to be involved, because something is inconclusive, and only when you approve does it go to production. And as a matter of fact, maybe one more piece on that: even product management can be somewhat automated. Think about looking at your customer support and customer success tickets, looking at your market. You don't need an agent to talk with customers, but based on the conversations, and the summaries from those conversations, you can definitely build a backlog of tickets that can be addressed by this fleet of agents.
Dave Erickson 1:05:24
Well, Nir, maybe you can talk a little bit about Arnica. What do you guys do, and what kind of clients are you looking for?
Nir Valtman 1:05:35
Oh, yeah. So at Arnica, we developed an entire SDLC governance mechanism that is really ready for this AI era. Today we have the capability to scan every single piece of code that changes across the enterprise. We did this with webhooks that connect to everything. If we identify vulnerabilities authored by developers or agents, we then communicate with the developer that authored the code, or that prompted the agent to write the code, in a very private way: we do it over Slack or Teams. Remember, we spoke about IDE plugins and adoption; here there is no adoption problem, because everyone has Slack or Teams, and that enables issue resolution before the pull request. So 78% of all the issues we flag never get to the PR because of that, which gives a huge productivity boost at the code review stage. And all of this feedback can eventually be driven back to the AI scanners and to the agents files, which then generate secure code by default. So it's all about the workflows, all about how you control every piece of software that is written around the company. In our case, we just make sure that it's secure as of today, but obviously there are extended use cases for this.
Dave Erickson 1:07:11
Well, we'll make sure we put your website and your LinkedIn in the description, so people can check those out. Nir, thank you so much for being on our podcast and discussing AI security and preventing software chaos.
Botond Seres 1:07:26
Well, we are at the end of the episode today, but before we go, we want you to think about this important question. How would you use AI to make your digital products more secure?
Dave Erickson 1:07:38
For our listeners, please subscribe and click the notifications to join us for our next ScreamingBox Technology and Business Rundown Podcast. And until then, try using AI to add security to your software projects. Thank you very much for taking this journey with us. Join us for our next exciting exploration of technology and business in the first week of every month. Please help us by subscribing, liking and following us on whichever platform you're listening to or watching us on. We hope you enjoyed this podcast, and please let us know any subjects or topics you would like us to discuss in our next podcast by leaving a message in the comment section or sending us a Twitter DM. Till next month, please stay happy and healthy.