Indirect Instruction Injection in Multi-Modal LLMs

July 28, 2023, by Bruce Schneier, in machine learning, Security

Interesting research: “(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs”:

Abstract: We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker’s instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVa and PandaGPT.
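To make the mechanism in the abstract a little more concrete, here is a minimal sketch of a targeted adversarial perturbation under heavy simplifying assumptions. A hypothetical toy linear model (the weights `W` and the helpers `predict` and `targeted_pgd` are illustrative names, not from the paper) stands in for the multi-modal LLM, and the attacker's "prompt" is reduced to a single target output index. The perturbation is found with targeted projected gradient descent (PGD) under an L∞ bound `eps`, a standard adversarial-example technique. This is not the authors' code: their attack optimizes against the real LLaVa and PandaGPT models so that they emit a full attacker-chosen text. As in the abstract, the model itself is never modified; only the input is.

```python
import math

# Hypothetical toy stand-in for the (unmodified) model: logits = W @ x.
# Each row scores one output "token"; x is a flattened image with pixels in [0, 1].
W = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
]


def logits(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(x):
    z = logits(x)
    return z.index(max(z))


def grad_wrt_input(x, target):
    """Gradient of the cross-entropy loss -log p[target] with respect to the input x."""
    p = softmax(logits(x))
    err = [pk - (1.0 if k == target else 0.0) for k, pk in enumerate(p)]
    # dL/dx_j = sum_k W[k][j] * (p_k - onehot_k)
    return [sum(W[k][j] * err[k] for k in range(len(W))) for j in range(len(x))]


def sign(v):
    return (v > 0) - (v < 0)


def targeted_pgd(x, target, eps=0.3, alpha=0.05, steps=20):
    """Search for a perturbation with |delta_j| <= eps that steers the toy model to `target`."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        adv = [xi + di for xi, di in zip(x, delta)]
        g = grad_wrt_input(adv, target)
        # Targeted attack: step *down* the loss for the attacker's target.
        delta = [d - alpha * sign(gj) for d, gj in zip(delta, g)]
        # Project back into the L-infinity ball of radius eps.
        delta = [max(-eps, min(eps, d)) for d in delta]
        # Keep the perturbed input a valid image: pixels stay in [0, 1].
        delta = [max(0.0, min(1.0, xi + d)) - xi for xi, d in zip(x, delta)]
    return [xi + di for xi, di in zip(x, delta)]


image = [0.9, 0.2, 0.1, 0.1]  # hypothetical 4-pixel "image"
print(predict(image))  # 0
adv = targeted_pgd(image, target=2)
print(predict(adv))  # 2
```

Each pixel moves by at most `eps`, yet the toy model's output flips to the attacker's chosen index. A real attack applies the same kind of bounded change across many pixels (or audio samples) and optimizes for a whole target text rather than one index.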
