June 21, 2024


AI Cyber Threat Intelligence Roundup: June 2024

Threat Intelligence

Author


Adam Swanda

Adam is an AI Security Researcher at Robust Intelligence.

At Robust Intelligence, AI threat research is fundamental to informing the ways we evaluate and protect models on our platform. In a space that is so dynamic and evolving so rapidly, these efforts help ensure that our customers remain protected against emerging vulnerabilities and adversarial techniques.

This monthly threat roundup consolidates some useful highlights and critical intel from our ongoing threat research efforts to share with the broader AI security community. As always, please remember this is not an exhaustive or all-inclusive list of AI cyber threats, but rather a curation that our team believes is particularly noteworthy.

Notable Threats and Developments: June 2024

Special Characters Attack for training data extraction

The Special Characters Attack (SCA) combines special characters (for example, parentheses, hyphens, underscores, and punctuation marks) with standard English letters to create attack sequences that can extract raw training data from LLMs.

This technique exploits the co-occurrence patterns of these characters in training data to induce the model to generate memorized content, including sensitive private information, code, prompt templates, and chat messages from previous conversations.
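On the defensive side, inputs of this shape can be screened before they reach a model. The following is a minimal sketch of such a pre-filter, not a method from the SCA paper: the set of "special" characters, the function names, and both thresholds are assumptions chosen for illustration, and a real deployment would need to tune them against legitimate traffic such as code snippets.

```python
import string

# Characters treated as "special" here; this set is an assumption, not the paper's definition.
SPECIAL_CHARS = set(string.punctuation)


def special_char_stats(prompt: str) -> dict:
    """Return the share of special characters and the longest run of one repeated special character."""
    if not prompt:
        return {"special_ratio": 0.0, "longest_run": 0}

    special_count = sum(1 for ch in prompt if ch in SPECIAL_CHARS)

    longest_run = 0
    current_run = 0
    previous = None
    for ch in prompt:
        if ch in SPECIAL_CHARS and ch == previous:
            current_run += 1          # same symbol repeated
        elif ch in SPECIAL_CHARS:
            current_run = 1           # a new symbol starts a new run
        else:
            current_run = 0           # letters, digits, and whitespace end the run
        longest_run = max(longest_run, current_run)
        previous = ch

    return {
        "special_ratio": special_count / len(prompt),
        "longest_run": longest_run,
    }


def looks_like_sca(prompt: str, max_ratio: float = 0.5, max_run: int = 20) -> bool:
    """Flag prompts dominated by special characters or containing a very long symbol run."""
    stats = special_char_stats(prompt)
    return stats["special_ratio"] > max_ratio or stats["longest_run"] > max_run


suspicious = looks_like_sca("{" * 50)
ordinary = looks_like_sca("What is the capital of France?")
```

A filter like this only catches the crudest sequences; it is meant to illustrate what the attack inputs look like rather than to serve as a complete control.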

The researchers behind the SCA technique determined that special characters are generally more effective memory triggers than English letters alone, and that repeating similar symbols is especially effective at inducing data leakage. The attack can be strengthened further by applying logit biases that raise the probability of generating control tokens, which significantly increases the likelihood of data leakage.
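To see why a logit bias has such a strong effect, recall that a bias is an additive offset applied to a token's logit before the softmax. The toy example below is a sketch with made-up numbers: the three-token vocabulary, the logit values, the bias of 5.0, and the placeholder token name "<ctrl>" are all assumptions for illustration and do not come from the paper.

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert logits to probabilities, subtracting the max for numerical stability."""
    peak = max(logits.values())
    exps = {tok: math.exp(value - peak) for tok, value in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def apply_logit_bias(logits: Dict[str, float], bias: Dict[str, float]) -> Dict[str, float]:
    """Add a per-token offset to the logits; tokens without a bias are unchanged."""
    return {tok: value + bias.get(tok, 0.0) for tok, value in logits.items()}


# Hypothetical next-token logits; "<ctrl>" stands in for a control token.
logits = {"the": 2.0, "(": 1.0, "<ctrl>": -1.0}
bias = {"<ctrl>": 5.0}

before = softmax(logits)
after = softmax(apply_logit_bias(logits, bias))

print(f"P(<ctrl>) without bias: {before['<ctrl>']:.3f}")
print(f"P(<ctrl>) with bias:    {after['<ctrl>']:.3f}")
```

Because the softmax exponentiates the logits, even a moderate additive bias moves a token from unlikely to dominant, which is consistent with the increase in leakage the researchers report.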

AI Lifecycle Stage: Production
Relevant Use Cases: AI Chatbots & AI Agents

MITRE ATLAS: AML.T0057 - LLM Data Leakage
Reference: https://arxiv.org/pdf/2405.05990

Chain of Attack jailbreak

Chain of Attack (CoA) is a contextual, multi-turn jailbreak technique that gradually leads an LLM to produce harmful responses through a series of interactions that are guided and continuously refined by a secondary helper LLM.

This technique operates through three main steps: generating seed attack chains, executing attack actions, and updating attack prompts based on model responses. The seed attack chain generator creates initial prompts that evolve with each turn, maintaining thematic consistency and gradually increasing the semantic relevance to the target task. The attack chain executor systematically inputs these prompts into the target model, evaluating the responses for their alignment with the attack objectives.
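The three steps can be sketched as a small red-team evaluation loop. This is an illustrative skeleton under stated assumptions, not the authors' implementation: the function names, the retry policy, and the word-overlap score (a crude stand-in for semantic relevance) are all hypothetical, and the demo uses a benign objective with stub callables in place of real models.

```python
from typing import Callable, Dict, List


def word_overlap(text: str, objective: str) -> float:
    """Toy relevance score: Jaccard overlap of lowercase words (not an embedding model)."""
    a, b = set(text.lower().split()), set(objective.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def run_chain(
    objective: str,
    generate_seed_chain: Callable[[str], List[str]],
    target_model: Callable[[List[str]], str],
    update_prompt: Callable[[str, str, str], str],
    max_retries: int = 2,
) -> List[Dict]:
    """Send each prompt in the chain, score the reply, and refine the prompt when relevance drops."""
    history: List[str] = []      # alternating prompts and replies shown to the target
    transcript: List[Dict] = []
    best_score = 0.0

    for prompt in generate_seed_chain(objective):            # step 1: seed chain
        reply = target_model(history + [prompt])             # step 2: execute the turn
        score = word_overlap(reply, objective)
        retries = 0
        while score < best_score and retries < max_retries:  # step 3: update on regression
            prompt = update_prompt(prompt, reply, objective)
            reply = target_model(history + [prompt])
            score = word_overlap(reply, objective)
            retries += 1
        history += [prompt, reply]
        best_score = max(best_score, score)
        transcript.append({"prompt": prompt, "reply": reply, "score": score})
    return transcript


# Stub components for a benign dry run; real use would wrap model API calls.
def demo_seed(objective: str) -> List[str]:
    return [f"Tell me about {word}" for word in objective.split()]


def demo_target(messages: List[str]) -> str:
    return "Echo: " + messages[-1]


def demo_update(prompt: str, reply: str, objective: str) -> str:
    return prompt + " " + objective


transcript = run_chain("water cycle stages", demo_seed, demo_target, demo_update)
```

Keeping the generator, target, and updater as plain callables makes it straightforward to swap in different models or scoring functions when testing a deployment's multi-turn defenses.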

Researchers demonstrated the effectiveness of the Chain of Attack against models such as Vicuna-13B, Llama-2-7B, and GPT-3.5 Turbo, with attack success rates detailed in the table below.

The iterative, branching nature of the CoA technique is reminiscent of similar techniques we’ve covered in the past, including our own Tree of Attacks with Pruning (TAP) algorithmic jailbreak.

AI Lifecycle Stage: Production
Relevant Use Cases: AI Chatbots & AI Agents

MITRE ATLAS: AML.T0054 - LLM Jailbreak
Reference: https://github.com/YancyKahn/CoA

Sleepy Pickle vulnerability

New research shared on the Trail of Bits blog introduces Sleepy Pickle, a technique that expands on previous pickle file exploits by enabling an adversary to directly and discreetly compromise a model itself.

Pickle is a commonly used Python serialization format in machine learning whose security risks are already well understood. Adversaries can insert malicious code into pickle files to deliver discrete payloads after distribution and deserialization. In their Hub documentation, Hugging Face discussed the risks of malicious pickle files, outlining potential mitigation strategies and detailing their own security scanning measures.
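The underlying risk is that a pickle stream can name any importable callable and have it invoked during loading. The sketch below demonstrates this with a deliberately harmless payload (a call to the built-in `len`) and then lists the globals a pickle references by walking its opcodes with the standard-library `pickletools` module. The class and function names are hypothetical, and the `STACK_GLOBAL` handling is a heuristic that assumes the module and attribute names are the two most recent string operands; it is not a description of Hugging Face's scanner.

```python
import pickle
import pickletools
from typing import List, Tuple


class BenignPayload:
    """Harmless stand-in for a malicious object: unpickling it calls len("hello")."""

    def __reduce__(self):
        # pickle stores (callable, args) and invokes callable(*args) at load time.
        return (len, ("hello",))


def referenced_globals(data: bytes) -> List[Tuple[str, str]]:
    """List (module, name) pairs that the pickle stream asks the unpickler to import."""
    found = []
    recent_strings = []  # string operands seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in ("GLOBAL", "INST"):
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            # Heuristic: assume module and name were the last two strings pushed.
            if len(recent_strings) >= 2:
                found.append((recent_strings[-2], recent_strings[-1]))
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found


payload = pickle.dumps(BenignPayload(), protocol=4)
imports_found = referenced_globals(payload)   # inspected without loading the pickle
loaded_value = pickle.loads(payload)          # loading runs len("hello")

plain_data = pickle.dumps({"weights": [0.1, 0.2]}, protocol=4)
plain_imports = referenced_globals(plain_data)
```

Inspecting opcodes this way never executes the payload, which is why static scanning is a common first check before a file is deserialized.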

Previous pickle file exploits relied on the distribution of directly malicious models. Instead, the Sleepy Pickle technique executes a custom function to compromise the model after deserialization. This delayed code modification makes the Sleepy Pickle technique dangerous, highly customizable, and far more difficult to detect. In the original blog post, author Boyan Milanov shares three examples of attacks that leverage the Sleepy Pickle technique: spreading disinformation, stealing sensitive user data, and mounting a broader phishing campaign.
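Because a payload of this kind has to resolve its function as a global while the file is being loaded, one documented hardening step is to restrict which globals the unpickler will resolve. The sketch below follows the `find_class` override pattern from the Python `pickle` documentation; the allowlist contents and the helper name `restricted_loads` are illustrative assumptions, and this is a partial mitigation rather than a guarantee that loading an untrusted pickle is safe.

```python
import builtins
import io
import pickle

# Illustrative allowlist; a real one would be tailored to the objects a model file needs.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global outside a small allowlist."""

    def find_class(self, module, name):
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Deserialize bytes with the restricted unpickler instead of pickle.loads."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain data and allowlisted types load normally.
ok_value = restricted_loads(pickle.dumps([1, 2, range(15)]))

# A stream that references any other callable is rejected before it can run.
try:
    restricted_loads(pickle.dumps(len))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

The design choice here is deny-by-default: anything not explicitly allowed fails at load time, so a hidden function reference surfaces as an error instead of silently modifying the model.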

AI Lifecycle Stage: Development
Relevant Use Cases: Foundation Models, RAG Applications, AI Chatbots & AI Agents

MITRE ATLAS: AML.T0011.000 - User Execution: Unsafe ML Artifacts; AML.T0018 - Persistence: Backdoor ML Model
Reference: Trail of Bits Blog Part 1 & Part 2

More Threats to Explore

The “L1B3RT45” jailbreak repository is a collection of jailbreak prompts that do not use a single technique, but rather rely on a combination of novel and previously identified methodologies. Examples include requesting leet speak responses or prompting the model to reply with “GODMODE” enabled.
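Leet speak matters for defenders because it lets text slip past filters that match on plain keywords. A minimal sketch of normalizing common substitutions before applying such a filter is shown below; the character map and function name are assumptions for illustration, and the approach is lossy because it also rewrites genuine digits (a year such as 2024 would be altered).

```python
# Common leet speak substitutions; this mapping is illustrative and far from exhaustive.
LEET_MAP = str.maketrans({
    "0": "o",
    "1": "i",
    "3": "e",
    "4": "a",
    "5": "s",
    "7": "t",
    "@": "a",
    "$": "s",
})


def normalize_leet(text: str) -> str:
    """Lowercase the text and map leet speak characters back to letters."""
    return text.lower().translate(LEET_MAP)


repo_name = normalize_leet("L1B3RT45")
mode_name = normalize_leet("G0DM0D3")
```

Running a keyword or classifier check on both the raw and the normalized text avoids losing information while still catching simple obfuscation.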

MITRE ATLAS: AML.T0054 - LLM Jailbreak
Reference: https://github.com/elder-plinius/L1B3RT45
