Startups Scramble to Build Immediate AI Security

COMMENTARY

At the start of 2023, nobody knew the industry would be handed an imminent deadline to secure artificial intelligence (AI). Then ChatGPT changed everything. It also elevated startups working on machine learning security operations (MLSecOps), AppSec remediation, and adding privacy to AI with fully homomorphic encryption.

The threat of AI is not overhyped. It would be difficult to overstate how insecure today’s AI is.

AI’s largest attack surface involves its foundational models, such as Meta’s Llama, or those produced by giants like Nvidia, OpenAI, and Microsoft. They’re trained on expansive data sets and then released as open source on sites like Hugging Face. The overwhelming majority of today’s machine learning (ML) development involves reusing these foundational models.

At least at the moment, building bespoke models from scratch has proven too expensive. Instead, engineers tune foundational models, train them on additional data, and blend these models into traditional software development.
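
To make that reuse pattern concrete, here is a minimal sketch, assuming PyTorch and the Hugging Face transformers library are installed; the small "distilgpt2" checkpoint and the toy support strings stand in for a real foundational model and real domain data.

```python
# Minimal sketch: reuse a pretrained foundational model and fine-tune it on new data.
# Assumptions: torch and transformers are installed; "distilgpt2" stands in for a much
# larger model such as Llama, and the two strings stand in for real domain data.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token                  # distilgpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("distilgpt2")  # reuse, don't build from scratch

texts = [
    "Acme support: reset your password through the account portal.",
    "Acme support: invoices are emailed on the first of each month.",
]
batch = tokenizer(texts, return_tensors="pt", padding=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
loss = model(**batch, labels=batch["input_ids"]).loss   # causal-LM loss on the new data
loss.backward()
optimizer.step()                                        # one tuning step; a real run loops over epochs
```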

Foundational models have all the existing vulnerabilities of the software supply chain, plus AI’s new mathematical threats. While new MITRE and OWASP frameworks provide a nice catalog of expected attacks, it’s still the Wild West.

Can you even figure out if you’ve already deployed a vulnerable model? There’s literally no widespread practice of enumerating model versions before release. The AI establishment has thus far focused on risks around accuracy, trust, and ethics. They’ve accomplished nothing on cybersecurity.

AI May Be Inherently Insecure

Traditional attacks like SQL injection involve altering a few characters in small, structured input strings, yet even that exploit took 20 years to stamp out. Now consider the difficulty of solving mathematical exploits against large, unstructured inputs. Changing even a single pixel in an image can induce a different model output. Some believe that, despite patching, there will always be ways to alter inputs to attack foundational models.
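
To show how fragile these inputs are, the sketch below applies a single-step, FGSM-style perturbation, assuming PyTorch and torchvision are installed; the random tensor stands in for a real image, and even this tiny per-pixel budget can be enough to change a model's prediction.

```python
# Minimal sketch of a one-step adversarial perturbation (FGSM-style).
# Assumptions: torch and torchvision are installed; the random tensor stands in for a real image.
import torch
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
image = torch.rand(1, 3, 224, 224, requires_grad=True)    # stand-in for a real input

logits = model(image)
original = logits.argmax(dim=1)                           # the model's current prediction
F.cross_entropy(logits, original).backward()              # gradient of the loss w.r.t. the pixels

epsilon = 2 / 255                                         # barely perceptible per-pixel budget
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1)
print("before:", original.item(), "after:", model(adversarial).argmax(dim=1).item())
```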

And it’s not easy to patch all the known vulnerabilities in a model. Retraining can fall into the ML pitfall of “overfitting,” which intrinsically degrades performance and quality.

Analyzing software composition needs rethinking too. How can one create an AI bill of materials if an application continually learns? Its ML models are actually different on any given day.
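
One pragmatic answer is to pin every deployment to hashed artifacts. The sketch below is a hypothetical AI bill-of-materials record, not any standard format; the file names and field layout are made up for illustration.

```python
# Minimal sketch of an AI bill-of-materials entry that pins a deployment to exact artifacts.
# Assumptions: model weights and training-data snapshots exist as files; the format is hypothetical.
import datetime
import hashlib
import json

def sha256_of(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def ai_bom_entry(base_model: str, model_path: str, data_paths: list[str]) -> str:
    return json.dumps({
        "base_model": base_model,                               # the foundational model that was reused
        "model_sha256": sha256_of(model_path),
        "training_data_sha256": [sha256_of(p) for p in data_paths],
        "recorded_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }, indent=2)

# A continually learning system would emit a fresh entry after every retraining run, e.g.:
# print(ai_bom_entry("llama-2-7b", "model.bin", ["batch_2024_01.csv"]))
```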

Will the Visionaries of MLSecOps Save Us?

A handful of startups within MLSecOps are engaging in a feisty debate about what part of the ML life cycle they should focus on.

Thousands of academic papers describe adversarial AI attacks on deployed production models, as does the MITRE Atlas framework. HiddenLayer won the 2023 RSA Conference Innovation Sandbox startup competition. It focuses on adversarial AI but also covers response and some of the early ML pipeline.

Adversarial AI wielded against models in production environments has caught the public’s attention. Yet many vendors of MLSecOps question how many black hats can afford its hefty compute costs. Also, consider that potential victims may throttle model queries so low that there aren’t enough interactions for the attacks in MITRE Atlas to even work.
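
A throttle of that kind can be as simple as a per-client sliding window; the sketch below is a hypothetical example with made-up limits, not a production control.

```python
# Minimal sketch of per-client query throttling for a model endpoint.
# The window and budget are made-up values; real limits depend on the application.
import time
from collections import defaultdict

WINDOW_SECONDS = 3600
MAX_QUERIES_PER_WINDOW = 100

_history: dict[str, list[float]] = defaultdict(list)

def allow_query(client_id: str) -> bool:
    now = time.time()
    recent = [t for t in _history[client_id] if now - t < WINDOW_SECONDS]
    _history[client_id] = recent
    if len(recent) >= MAX_QUERIES_PER_WINDOW:
        return False          # query-hungry extraction or evasion attacks get starved here
    recent.append(now)
    return True
```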

Protect AI has shifted left within the MLSecOps space. It secures bespoke model development and training data, and it analyzes foundational models for vulnerabilities. Its MLSecOps.com community details vulnerabilities ranging from leaked credentials and exposed training data to an enormous number of mathematical exploits.
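
One concrete example of such a check is scanning pickle-serialized model files for opcodes that can execute code on load. The sketch below uses Python's standard pickletools module and illustrates the technique only; it is not any vendor's actual scanner.

```python
# Minimal sketch: flag pickle opcodes that can run arbitrary code when a model file is loaded.
# This illustrates one supply-chain check; it is not any vendor's actual scanner.
import pickletools

SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ"}

def scan_pickle(path: str) -> list[str]:
    findings = []
    with open(path, "rb") as f:
        for opcode, arg, pos in pickletools.genops(f):
            if opcode.name in SUSPICIOUS:
                findings.append(f"offset {pos}: {opcode.name} {arg!r}")
    return findings

# Anything importing os, subprocess, or builtins during unpickling deserves a hard look:
# print(scan_pickle("downloaded_model.pkl"))
```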

A further debate is driven by Adversa AI and Calypso AI, which are both skeptical that foundational models can ever be secured. They’ve allocated their gunpowder to other approaches.

Adversa AI automates foundational model pen testing and validation, along with red-team services. Calypso AI focuses on scoring vulnerabilities at the point of model prompts and their responses, either logging or blocking them.
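
In spirit, that looks like a thin gateway around each model call. The sketch below is a hypothetical example with a crude keyword score standing in for a real risk classifier; it is not Calypso AI's product.

```python
# Minimal sketch of scoring prompts and responses at the model boundary, then logging or blocking.
# The keyword score is a crude stand-in for a real risk classifier; the threshold is made up.
import logging

BLOCK_THRESHOLD = 0.8
RISKY_TERMS = ("ignore previous instructions", "api key", "password")

def risk_score(text: str) -> float:
    hits = sum(term in text.lower() for term in RISKY_TERMS)
    return 0.0 if hits == 0 else min(1.0, 0.5 + hits / len(RISKY_TERMS))

def guarded_call(prompt: str, model_fn) -> str | None:
    if risk_score(prompt) >= BLOCK_THRESHOLD:
        logging.warning("blocked prompt: %r", prompt)
        return None
    response = model_fn(prompt)
    if risk_score(response) >= BLOCK_THRESHOLD:
        logging.warning("blocked response for prompt: %r", prompt)
        return None
    logging.info("allowed prompt/response pair")
    return response
```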

Startups Got Realistic About Fully Homomorphic Encryption (FHE)

FHE is quite different from the all-or-nothing encryption of old. It outputs a structured ciphertext with a rich schema, and that ciphertext, while still encrypted, can be productively used by many ML algorithms, neural networks, and even large language models (LLMs). FHE is also safe from brute-force attacks by quantum computing.

Magical mathematics allows gleaning business insight from the data without having to decrypt it and expose secrets. It opens secure collaboration between multiple parties. It leaves investors salivating over a technology that could secure true privacy between business users and the ChatGPTs of the world.
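
Here is a minimal sketch of that idea, assuming the open source TenSEAL library (which implements the CKKS scheme) is installed; the feature and weight values are made up, and exact API details can vary across versions.

```python
# Minimal sketch: score sensitive data under homomorphic encryption with TenSEAL (CKKS scheme).
# Assumptions: tenseal is installed; the features and weights are made-up illustration values.
import tenseal as ts

ctx = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
ctx.global_scale = 2 ** 40
ctx.generate_galois_keys()

features = [0.2, 1.5, 3.1]                  # sensitive business data
weights = [0.4, -0.7, 0.1]                  # a simple linear model held by the other party

encrypted = ts.ckks_vector(ctx, features)   # note: the ciphertext is far larger than the plaintext
encrypted_score = encrypted.dot(weights)    # computed without ever decrypting the inputs

print(encrypted_score.decrypt())            # only the key holder can read the result
```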

Unfortunately, FHE’s promise fizzles out when confronting the size of its structured ciphertext. After encryption, the ciphertext balloons to 10 to 20 times the size of the plaintext. The computing time and cost to encrypt are also prohibitive.

In April 2023, Innovation Sandbox finalist Zama admitted onstage that its FHE wasn’t ready to encrypt everything for most commercial applications. Many were disappointed, yet it was not an admission that Zama came up short. It was a vision for how this inherently nonperformant but immensely powerful encryption is meant for select high-value uses.

The encryption algorithms of old were one size fits all. FHE, on the other hand, outputs flexible ciphertext schemas and can be implemented with different algorithms for different business purposes.

Zama’s FHE focuses on blockchain encryption that can be used without investors exposing their smart contracts. Lorica Security is another upstart that focuses on keeping both queries into secure data stores and their responses private. Two smaller FHE startups also received strategic investments in 2023.

AI promises a world of benefits, yet it’s weighed down by spiraling compute costs. Only a small number of innovators at early growth startups have coherent visions of AI security. It would be wise to follow them closely.
