Al Security Risks

A taxonomy of the most common Al safety and security threats.

AI Security Necessitates a New Approach

The adoption of AI applications introduces an entirely new set of safety and security risks which differ from traditional cybersecurity challenges. Conventional approaches focus on protecting systems and data from unauthorized access and known vulnerabilities, but do not effectively secure the attack surface of AI applications and their underlying models. New algorithmic attack methodologies only add to this complexity because their patterns are dynamic and scalable.

AI application security necessitates a fundamentally new approach which considers threats to every stage of the AI lifecycle, from data poisoning and model backdoors to prompt injection and sensitive data leakage. We developed this taxonomy to educate organizations on the top risks to AI security today, complete with descriptions, examples, and mitigation techniques.
Risks at different points of the AI lifecycle
Alert/Risk logo
Alert/Risk logo
Alert/Risk logo
Alert/Risk logo
Alert/Risk logo
Alert/Risk logo
Learn where risk is introduced at different points of the Al lifecycle.

Top AI Security and Safety Risks

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Production Risk
All Resources

Prompt Injection (Direct)

Direct prompt injections are adversarial attacks that attempts to alter or control the output of an LLM by providing instructions via prompt that override existing instructions. These outputs can include harmful content, misinformation, or extracted sensitive information such as PII or model instructions.

Impact:
Mitigation:

Input / output filtering

Standards:
MITRE ATLAS

AML.T0051.000 - LLM Prompt Injection: Direct

OWASP TOP 10 for LLM Applications

LLM01 - Prompt Injection

Production Risk
All Resources

Prompt Injection (Indirect)

Indirect prompt injection requires an adversary to control or manipulate a resource consumed by an LLM such as a document, website, or content retrieved from a database. This can direct the model to expose data or perform a malicious action like distribution of a phishing link.

Impact:
Mitigation:

Input / output filtering; sanitize data sources

Standards:
MITRE ATLAS

AML.T0051.001 - LLM Prompt Injection: Indirect

OWASP TOP 10 for LLM Applications

LLM01 - Prompt Injection

Production Risk
All Resources

Jailbreaks

A jailbreak refers to any prompt-based attack designed to bypass model safeguards to produce LLM outputs that are inappropriate, harmful, or unaligned with the intended purpose. With well-crafted prompts, adversaries can access restricted functionalities or data, and compromise the integrity of the model itself.

Impact:
Mitigation:

Input / output filtering; guardrails

Standards:
MITRE ATLAS

AML.T0054 - LLM Jailbreak

OWASP TOP 10 for LLM Applications

LLM01 - Prompt Injection

Production Risk
All Resources

Meta Prompt Extraction

Meta prompt extraction attacks aim to derive a system prompt which effectively guides the behavior of a LLM application. This information can be exploited by attackers and harm the intellectual property, competitive advantage, and reputation of a business.

Impact:
Mitigation:

Input / output filtering

Standards:
MITRE ATLAS

AML.T0051.000 - LLM Prompt Injection: Direct

Production Risk
All Resources

Sensitive Information Disclosure

Sensitive information disclosure refers to any instance where confidential or sensitive data such as PII or business records is exposed through vulnerabilities in an AI application. These privacy violations can lead to loss of trust and result in legal or regulatory consequences.

Impact:
Mitigation:

Input / output filtering; sanitize data sources

Standards:
OWASP TOP 10 for LLM Applications

LLM06 - Sensitive Information Disclosure

Production Risk
All Resources

Privacy Attack

A privacy attack refers broadly to any attack aimed at extracting sensitive information from an AI model or its data. This category includes model extraction, which recreates a functionally equivalent model by probing target model outputs, and membership inference attacks, which determine if specific data records were used for model training.

Impact:
Mitigation:

Data scrubbing; input / output filtering

Standards:
MITRE ATLAS

AML.T0024.000 - Infer Training Data Membership

AML.T0024.001 - Invert ML Model

AML.T0024.002 - Extract ML Model

OWASP TOP 10 for LLM Applications

LLM06 - Sensitive Information Disclosure

Development Risk
All Resources

Training Data Poisoning

Training data poisoning is the deliberate manipulation of training data in order to compromise the integrity of an AI model. This can lead to skewed or biased outputs, backdoor trigger insertions—malicious links, for example—and ultimately, a loss of user trust.

Impact:
Mitigation:

Sanitize training data

Standards:
MITRE ATLAS

AML.T0020 - Poison Training Data

OWASP TOP 10 for LLM Applications

LLM03 - Training Data Poisoning