book part or chapter

Adversarial Evasion on LLMs

Guerraoui, Rachid • Pinot, Rafael
January 1, 2024
Large Language Models in Cybersecurity: Threats, Exposure and Mitigation

While Machine Learning (ML) applications have shown impressive achievements in tasks such as computer vision, NLP, and control problems, these achievements were obtained, first and foremost, in best-case settings. Unfortunately, settings in which ML applications fail unexpectedly abound, and malicious users or data contributors can deliberately trigger such failures. This problem became known as adversarial example robustness. While the field is developing rapidly, some fundamental results have already been established, offering insight into how to make ML methods resilient to adversarial inputs and data poisoning. ML applications with this property are termed adversarially robust. The current generation of LLMs is not adversarially robust, but results obtained in other branches of ML can provide insight into how to make them so. Such insight would complement and augment ongoing empirical efforts in the same direction (red-teaming).
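For readers unfamiliar with the term, an adversarial example is a small, deliberately crafted perturbation of an input that changes a model's prediction. The snippet below is a minimal, hypothetical sketch in the spirit of the fast gradient sign method (FGSM) on a toy classifier; the model, data, and perturbation budget are illustrative assumptions and do not come from the chapter itself.

```python
# Minimal sketch of an evasion attack in the FGSM style (illustrative only;
# the toy model, data, and epsilon are hypothetical, not from the chapter).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy linear classifier standing in for any differentiable model.
model = nn.Linear(10, 2)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, 10)   # clean input
y = torch.tensor([0])    # its true label
epsilon = 0.1            # L-infinity perturbation budget

# Compute the gradient of the loss with respect to the input.
x_adv = x.clone().requires_grad_(True)
loss = loss_fn(model(x_adv), y)
loss.backward()

# FGSM step: move the input in the direction that increases the loss.
x_adv = x + epsilon * x_adv.grad.sign()

with torch.no_grad():
    print("clean prediction:", model(x).argmax(dim=1).item())
    print("perturbed prediction:", model(x_adv).argmax(dim=1).item())
```

The perturbation is bounded by epsilon in each coordinate, so the adversarial input stays close to the original while (for a sufficiently large budget or a sufficiently fragile model) the predicted label can change; adversarially robust training aims to keep the prediction stable under all such bounded perturbations.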
