
Threat Modeling in the Age of OpenAI’s Chatbot


There’s been a flood of news about OpenAI’s new GPT-3 chatbot. For all the very real critiques, it does an astounding and interesting job of producing reasonable responses. What does it mean for threat modeling? There’s real promise that it will transform threat modeling as we know it.

For readability, I’ll just call it “chatbot.” The specific examples I use are from OpenAI’s implementation, but we can think of this as a new type of technical capability that others will start to offer, so I’ll go beyond what we see today.

Let’s start with what it can do, ask what can go wrong, see if we can manage those issues, and then evaluate.

What Chatbots Can Do

On the Open Web Application Security Project® (OWASP) Slack, @DS shared a screenshot where he asked it to “list all spoofing threats for a system which has back-end service to back-end service interaction in Kubernetes environment in a table format,” with columns for threats, description, and mitigations.

[Screenshot: chatbot description graphic]

The output is fascinating. It starts, “Here is a table listing some examples …” Note the switch from “all” to “some examples.” More to the point, the table isn’t bad. As @DS says, it provided him with a base, saving hours of manual analysis work. Others have used it to explain what code is doing or to find vulnerabilities in that code.

Chatbots (more specifically here, large language models, including GPT-3) don’t really know anything. Under the hood, they pick statistically likely next words in response to a prompt. That means they’ll parrot the threats that someone has written about in their training data. On top of that, they use symbol replacement to produce something that appears, to our anthropomorphizing brains, to be reasoning by analogy.
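To make “pick statistically likely next words” concrete, here is a toy sketch: a bigram model that counts which word follows which in a tiny training text, then always emits the most frequent follower. This is a deliberate simplification for illustration only, not how GPT-3 works; real large language models use a transformer network over subword tokens and sample from a learned probability distribution. The training text below is hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    words = corpus.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def most_likely_next(follows: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent follower of `word`, or None if it was never followed."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    # Ties are broken by first occurrence in the training text.
    return counts.most_common(1)[0][0]


def generate(follows: dict[str, Counter], start: str, length: int) -> list[str]:
    """Greedily extend `start` by up to `length` words, one likely word at a time."""
    out = [start.lower()]
    for _ in range(length):
        nxt = most_likely_next(follows, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


# Hypothetical "training data": the model can only replay pairs it has seen here.
corpus = "attackers spoof service identities attackers spoof user sessions attackers steal tokens"
model = train_bigrams(corpus)
print(" ".join(generate(model, "attackers", 4)))  # prints "attackers spoof service identities attackers"
```

Because the model only replays word pairs from its training text, it can't name a threat that nobody wrote down, which is the parroting behavior described above, in miniature.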

When I created the Microsoft SDL Threat Modeling Tool, we saw people open the tool and be unsure what to do, so we put in a simple diagram that they could edit. We described it as addressing “blank page syndrome.” Many people run into that problem as they’re learning threat modeling, and a chatbot’s list of threats can serve as that kind of starting point.

What Can Go Wrong?

While chatbots can produce lists of threats, they’re not really analyzing the system that you’re working on. They’re likely to miss unique threats, and they’re likely to miss nuance that a skilled and focused person might see.

Chatbots will get good enough, and that “mostly good enough” will be enough to lull people into relaxing and not paying close attention. That seems really bad.

To help us evaluate it, let’s step way back and think about why we threat model.

What Is Threat Modeling? What Is Security Engineering?

We threat model to help us anticipate and address problems, to deliver more secure systems. Engineers threat model to illuminate security issues as they make design tradeoffs. And in that context, having an infinite supply of inexpensive possibilities seems far more exciting than I expected when I started this essay.

I’ve described threat modeling as a form of reasoning by analogy, and pointed out that many flaws exist simply because no one knew to look for them. Once we look in the right place, with the right knowledge, the flaws can be pretty obvious. (That’s so important that making it easier is the key goal of my new book.)

Many of us aspire to do great threat modeling, the kind where we discover an exciting issue, something that’ll get us a nice paper or blog post. If you just nodded along there … it’s a trap.

Much of software development is boring management of a seemingly unending set of details, such as iterating over lists to put things into new lists, then sending them to the next stage in a pipeline. Threat modeling, like test development, can be useful because it gives us confidence in our engineering work.

When Do We Step Back? 

Software is hard because it’s so easy. The apparent malleability of code makes it easy to create, and it’s hard to know how often or how deeply to step back. A great deal of our energy in managing large software projects (including both bespoke and general-use software) goes to assessing what we’re doing and getting alignment on priorities. All these other tasks are done occasionally, slowly, and rarely, because they’re expensive.

It’s not what the chatbots do today, but I could see similar software being tuned to report how much any given input changes its models. Software that looks across commits, email and Slack discussions, and tickets, and helps us assess how similar that work is to other work, could profoundly change the energy needed to keep projects (big or small) on track. That, too, contributes to threat modeling.
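The essay doesn’t say how such software would judge similarity. As one assumption, here is a minimal sketch that ranks past work items against a new one using bag-of-words cosine similarity. The ticket IDs, texts, and function names are hypothetical, and a real tool would more likely use learned embeddings than raw word counts.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def rank_by_similarity(new_item: str, past_items: dict[str, str]) -> list[tuple[str, float]]:
    """Score every past item against the new one, most similar first."""
    scores = [(name, cosine_similarity(new_item, text)) for name, text in past_items.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical tickets; the new item is compared against both.
past = {
    "TICKET-1": "add mTLS between payment service and ledger service",
    "TICKET-2": "update CSS for login page",
}
ranking = rank_by_similarity("add service to service authentication for ledger service", past)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A high score against past security-relevant work (here, the mTLS ticket) could be a cue to revisit the threat model, while unrelated changes score low and need less attention.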

All of this frees up human cycles for more interesting work. 
