The Pentagon is working to address vulnerabilities in its AI systems that could be exploited by attackers using visual tricks or manipulated signals. Its research programme, Guaranteeing AI Robustness Against Deception (GARD), has been investigating these "adversarial attacks" since 2022.
Researchers have shown how seemingly harmless patterns can fool AI into misidentifying objects, potentially leading to disastrous consequences on the battlefield. For instance, an AI could mistake a bus with passengers for a tank if it were tagged with the right "visual noise."
These concerns come amidst public anxieties about the Pentagon's development of autonomous weapons. To address this, the Department of Defence recently updated its AI development rules, emphasising "responsible behaviour" and requiring approval for all deployed systems.
The modestly funded GARD programme has made progress in developing defences against such attacks. It has even provided some of its tools to the Defence Department's newly formed Chief Digital and AI Office (CDAO).
However, some advocacy groups remain concerned. They worry that AI-powered weapons could misread a situation and attack without cause, even when no one is deliberately manipulating signals, and argue that such weapons could lead to unintended escalation, especially in tense regions.
The Pentagon is actively modernising its arsenal with autonomous weapons, highlighting the urgency of addressing these vulnerabilities and ensuring the responsible development of this technology.
According to a statement from the Defense Advanced Research Projects Agency (DARPA), GARD researchers from Two Six Technologies, IBM, MITRE, the University of Chicago, and Google Research have produced the following virtual testbed, toolbox, benchmarking dataset, and training materials, which are now available to the broader research community:
- The Armory virtual platform, available on GitHub, serves as a "testbed" for researchers in need of repeatable, scalable, and robust evaluations of adversarial defenses.
- Adversarial Robustness Toolbox (ART) provides tools for developers and researchers to defend and evaluate their ML models and applications against a range of adversarial threats (a brief usage sketch follows this list).
- The Adversarial Patches Rearranged In COnText (APRICOT) dataset enables reproducible research on the real-world effectiveness of physical adversarial patch attacks on object detection systems (a loading sketch follows this list).
- The Google Research Self-Study repository contains "test dummies" that each represent a common idea or approach for building defenses (an illustrative example of such a baseline defense follows this list).
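
To give a sense of how ART is typically used, the sketch below wraps a small placeholder PyTorch classifier, crafts evasion examples with the Fast Gradient Method, and compares clean versus adversarial accuracy. The model, data, and hyperparameters are stand-ins for illustration only, not anything from the GARD programme.

```python
import numpy as np
import torch
import torch.nn as nn

from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod


# Placeholder CNN standing in for whatever model you actually want to evaluate.
class SmallCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(16 * 14 * 14, 10),
        )

    def forward(self, x):
        return self.net(x)


model = SmallCNN()

# Wrap the model so ART's attacks and defences can work with it.
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    optimizer=torch.optim.Adam(model.parameters(), lr=1e-3),
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

# Dummy MNIST-sized inputs; swap in real test data in practice.
x_test = np.random.rand(8, 1, 28, 28).astype(np.float32)
y_test = np.random.randint(0, 10, size=8)

# Craft adversarial examples with the Fast Gradient Method and compare accuracy.
attack = FastGradientMethod(classifier, eps=0.1)
x_adv = attack.generate(x=x_test)

clean_acc = (classifier.predict(x_test).argmax(axis=1) == y_test).mean()
adv_acc = (classifier.predict(x_adv).argmax(axis=1) == y_test).mean()
print(f"clean accuracy: {clean_acc:.2f}, adversarial accuracy: {adv_acc:.2f}")
```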
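
APRICOT's annotations are distributed in a COCO-style JSON layout, so a first look at the data can be as simple as the sketch below. The annotation file name and directory layout here are assumptions; adjust them to match whatever you actually download.

```python
import json
from collections import Counter

# Placeholder path; point it at the APRICOT annotation file you downloaded.
ANNOTATION_FILE = "apricot/annotations/apricot_test_annotations.json"

# COCO-style files carry three top-level lists: "images", "annotations", "categories".
with open(ANNOTATION_FILE) as f:
    coco = json.load(f)

print(f"images: {len(coco['images'])}, annotations: {len(coco['annotations'])}")

# Count how many boxes carry each category label (adversarial patches vs. ordinary objects).
id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
for name, n in counts.most_common(10):
    print(f"{name:20s} {n}")
```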
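
The snippet below is not taken from the Self-Study repository; it is only a generic illustration of the kind of "test dummy" such materials document: a deliberately simple defence (input quantisation, a well-known gradient-masking baseline) that researchers can practise evaluating and breaking.

```python
import numpy as np


def quantize_inputs(x: np.ndarray, levels: int = 8) -> np.ndarray:
    """Round pixel values in [0, 1] to a small number of discrete levels.

    Input quantisation is a classic gradient-masking style defence: it looks
    robust against naive gradient-based attacks but falls to attacks that
    approximate the gradient. "Test dummies" package ideas like this so that
    students of adversarial ML can practise breaking them.
    """
    return np.round(x * (levels - 1)) / (levels - 1)


# Toy check: quantisation removes most small perturbations, i.e. those that
# do not push a pixel across a quantisation bin boundary.
x = np.random.rand(4, 28, 28).astype(np.float32)
delta = np.random.uniform(-0.05, 0.05, size=x.shape).astype(np.float32)
x_adv = np.clip(x + delta, 0.0, 1.0)

changed = np.mean(quantize_inputs(x) != quantize_inputs(x_adv))
print(f"fraction of pixels whose quantised value changed: {changed:.2f}")
```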