Your Cookie Preferences

We use different types of cookies to optimize your experience on our website. Click on the categories below to learn more about their purposes. You may choose which types of cookies to allow and can change your preferences at any time. Remember that disabling cookies may affect your experience on the website. You can learn more about how we use cookies by visiting our

Essential Cookies

Provider: .providername.com

Name

Purpose

Type

Expires In

__cf_bm

Cloudflare places the cookie on end-user devices that access customer sites protected by Bot Management or Bot Fight Mode.

server_cookie

30 minutes

Provider: .providername.com

Name

Purpose

Type

Expires In

_tibcpv

Used to record unique visitor views of the consent banner.

http_cookie

1 year

Analytics and Customization Cookies

Name

Purpose

Marketo Munchkin

Marketo's custom JavaScript tracking code, called Munchkin, tracks all individuals who visit your website so you can react to their visits with automated marketing campaigns.

Name

Purpose

Google Tag

The Google tag (gtag.js) is a single tag you can add to a website to use a variety of Google products and services (e.g., Google Ads, Google Analytics, Campaign Manager, Display & Video 360, Search Ads 360).

Advertising Cookies

Provider: .providername.com

Name

Purpose

Type

Expires In

__cf_bm

Cloudflare places the cookie on end-user devices that access customer sites protected by Bot Management or Bot Fight Mode.

server_cookie

30 minutes

Provider: .providername.com

Name

Purpose

Type

Expires In

_tibcpv

Used to record unique visitor views of the consent banner.

http_cookie

1 year

August 17, 2023

minute read

Observations from the Generative Red Team Challenge at DEF CON

Author

Authors

Alie Fordyce

Alie is a Product Policy Manager at Robust Intelligence.

Dhruv Kedia

Dhruv is a Sr. Product Manager at Robust Intelligence.

The highly anticipated Generative Red Team (GRT) challenge took place last week as part of AI Village at DEF CON. Robust Intelligence was proud to sponsor the event, contribute to a talk, and send two members of our Product team, Alie Fordyce and Dhruv Kedia, as volunteers. The event was wildly popular, increasing awareness of vulnerabilities in generative models and the need for third-party testing. Below, we recap Alie and Dhruv’s experiences and key takeaways from the GRT.

About AI Village & DEF CON

The first DEF CON was held in 1993 as a convention founded on hacking and cybersecurity. Since then, DEF CON has evolved to become a meeting of AI and security experts, policymakers, reporters, students, researchers, and over 30, 000 members of the general public. Most notably, AI Village became an official part of this convention when it was founded in 2018, amassing an additional set of priorities for the event and attracting a broader set of attendees.

Generative Red Team Overview

This year’s DEF CON 31 AI Village was unique because it hosted a first-of-its-kind generative AI red teaming event,

AI Village (AIV) is hosting the first public generative AI red team event at DEF CON 31 with our partners at Humane Intelligence, SeedAI, and the AI Vulnerability Database. We will be testing models kindly provided by Anthropic, Google, Hugging Face, NVIDIA, OpenAI, and Stability with participation from Microsoft, on an evaluation platform developed by Scale AI. (Source: AI Village)

The AI Village GRT challenge drew participants and captured media attention from around the world, with special attention and support from the White House. Experienced and novice hackers jumped at the opportunity to attack some of the industry’s premier large language models (LLMs). At all hours of the convention, there was a long queue of eager hackers waiting their turn to attempt one of 24 challenges created by the organizers. Individuals were limited to 55 minutes per session on provided laptops.

The event was structured as a LLM jailbreaking contest: participants competed to collect points for any and as many of the challenges they selected to participate in by successfully identifying 'failures' in the provided models. These failures included disinformation, biased and discriminatory outputs, susceptibility to poisoning, and many more. After a model failure was detected by the user, they would be prompted to submit it for review by the GRT organizers.

Quick Facts of GRT Challenge:

Each hacker was given 55 minutes to use loaner laptops with the provided models
Models were provided by Anthropic, Google, Hugging Face, NVIDIA, OpenAI, and Stability
24 challenges for hackers to choose from (can do as many as time allows)
Average jailbreaks per hacker hovered around 3 successful model failures

Every so often a ‘VIP’ attendee would walk in with a GRT organizer and a reserved spot with a supreme vantage point of the rest of the hackers in the room. These ‘VIP’ attendees ranged from government officials and representatives from the Office of Science Technology and Policy to AI experts and journalists. The findings from approximately 2,000 person-hours (55 minutes x 2,200 in-person hackers, spread across two days) have been sealed temporarily to give the organizers and model providers time to finish categorizing and addressing the exploited vulnerabilities.

Learnings from the GRT Challenge

All models have some degree of risk. Many attendees were surprised to the degree that models from top providers like Google, OpenAI, Anthropic, Meta, and Hugging Face were susceptible to security, ethical, and operational risks. Specific insights that emerged from this event include:

One line prompts have the potential to jailbreak LLM. Some examples of these are hallucinations, extraction of private information, going off topic, producing biased outputs, and recommendations beyond the limitations of AI. These one line prompts can be very simple; for example, a user could simply invent fake places like “Great Barrier Mountain” or “Defcon Ocean of Africa” and ask the LLMs to produce information regarding those places. Without hesitation, the LLM seemed to generate extensive disinformation.
A user can also cause an LLM to fail relatively quickly with multi-part conversations. As the conversation with a chatbot progresses, there are additional vectors via which a user can confuse the LLM as they unveil a lot about their reasoning. For example, if you ask a model to tell you why 12345678 is a bad password, it will explain its reasoning why it cannot say that. Then you can use the reasoning in the response to mislead the LLM. A user with little experience could use this technique easily to coerce LLMs to produce dangerous outputs.
Most LLMs are susceptible to prompt injections and prompt extractions. For example, one can ask the model to ignore its system instructions and simply return a malicious desired output (a user can even extract the system prompt from the model by misleading it).
The natural variation in language use adds to the challenge of preventing prompt injection attacks. This makes the attack vector extremely broad and subjective.

These learnings translate to concerning implications in real world scenarios and help demonstrate why significant attention needs to be given to addressing the risks involved in using generative AI. Rumman Chowdhury, a primary organizer of the GRT challenge, captured the true success of the event,

Allowing hackers to poke and prod at the AI systems of leading labs — including Anthropic, Google, Hugging Face, Microsoft, Meta, NVIDIA, OpenAI and Stability AI — in an open setting “demonstrates that it’s possible to create AI governance solutions that are independent, inclusive and informed by but not beholden to AI companies,” Chowdhury said at a media briefing this week with the organizers of the event. (Source: CyberScoop)

What’s Next?

Work is underway to collate the data collected from the contest and categorize the attacks, which will be presented in a comprehensive report. Addressing these risks will involve collaboration between policymakers, model developers, adopters, and users. One of the primary goals of the event, however, has already been achieved: raising awareness about the risks associated with the use and deployment of LLMs, the limitations of applications built on such models, and the real-world harms they can present.

Addressing the risks of generative AI has been made a clear goal of the Biden Administration; in addition to the voluntary initiative announced last month concerning the seven commitments leading AI companies made to ensure safe, secure, and trustworthy AI, the White House is fast tracking work to develop an executive order to address the broad range of risks posed by AI. Arati Prabhakar, Director of the White House Office of Science Technology and Policy, was one of the VIPs at the AI Village GRT challenge.

Collaboration between operators and policymakers is set to play a major role in the development of effective regulatory guardrails for AI development, and Robust Intelligence is looking forward to taking part in that effort.

Author

Authors

Alie Fordyce

Alie is a Product Policy Manager at Robust Intelligence.

Dhruv Kedia

Dhruv is a Sr. Product Manager at Robust Intelligence.

Social

Follow us on LinkedIn

September 20, 2024

minute read

Extracting Training Data from Chatbots

For:

September 10, 2024

minute read

Leveraging Hardened Cybersecurity Frameworks for AI Security through the Common Weakness Enumeration (CWE)

For:

September 6, 2024

minute read

AI Governance Policy Roundup (August 2024)

For:

+ More Articles

August 9, 2023

minute read

Robust Intelligence partners with MITRE to Tackle AI Supply Chain Risks in Open-Source Models

For:

June 21, 2023

minute read

Generative AI Risk Assessment: Dolly 2.0

For:

March 31, 2023

minute read

Prompt Injection Attack on GPT-4

For:

+ More Articles

August 17, 2023

minute read

Observations from the Generative Red Team Challenge at DEF CON

Author

Authors

Alie Fordyce

Alie is a Product Policy Manager at Robust Intelligence.

Dhruv Kedia

Dhruv is a Sr. Product Manager at Robust Intelligence.

About AI Village & DEF CON

Generative Red Team Overview

This year’s DEF CON 31 AI Village was unique because it hosted a first-of-its-kind generative AI red teaming event,

AI Village (AIV) is hosting the first public generative AI red team event at DEF CON 31 with our partners at Humane Intelligence, SeedAI, and the AI Vulnerability Database. We will be testing models kindly provided by Anthropic, Google, Hugging Face, NVIDIA, OpenAI, and Stability with participation from Microsoft, on an evaluation platform developed by Scale AI. (Source: AI Village)

Quick Facts of GRT Challenge:

Each hacker was given 55 minutes to use loaner laptops with the provided models
Models were provided by Anthropic, Google, Hugging Face, NVIDIA, OpenAI, and Stability
24 challenges for hackers to choose from (can do as many as time allows)
Average jailbreaks per hacker hovered around 3 successful model failures

Learnings from the GRT Challenge

One line prompts have the potential to jailbreak LLM. Some examples of these are hallucinations, extraction of private information, going off topic, producing biased outputs, and recommendations beyond the limitations of AI. These one line prompts can be very simple; for example, a user could simply invent fake places like “Great Barrier Mountain” or “Defcon Ocean of Africa” and ask the LLMs to produce information regarding those places. Without hesitation, the LLM seemed to generate extensive disinformation.
A user can also cause an LLM to fail relatively quickly with multi-part conversations. As the conversation with a chatbot progresses, there are additional vectors via which a user can confuse the LLM as they unveil a lot about their reasoning. For example, if you ask a model to tell you why 12345678 is a bad password, it will explain its reasoning why it cannot say that. Then you can use the reasoning in the response to mislead the LLM. A user with little experience could use this technique easily to coerce LLMs to produce dangerous outputs.
Most LLMs are susceptible to prompt injections and prompt extractions. For example, one can ask the model to ignore its system instructions and simply return a malicious desired output (a user can even extract the system prompt from the model by misleading it).
The natural variation in language use adds to the challenge of preventing prompt injection attacks. This makes the attack vector extremely broad and subjective.

Allowing hackers to poke and prod at the AI systems of leading labs — including Anthropic, Google, Hugging Face, Microsoft, Meta, NVIDIA, OpenAI and Stability AI — in an open setting “demonstrates that it’s possible to create AI governance solutions that are independent, inclusive and informed by but not beholden to AI companies,” Chowdhury said at a media briefing this week with the organizers of the event. (Source: CyberScoop)

What’s Next?

Author

Authors

Alie Fordyce

Alie is a Product Policy Manager at Robust Intelligence.

Dhruv Kedia

Dhruv is a Sr. Product Manager at Robust Intelligence.

Blog

September 13, 2021

minute read

How NTT DATA Uses RIME to Increase Model Performance by 70%

For:

April 13, 2023

minute read

Security Risks Of Generative Al Open Source Software

For:

June 14, 2022

minute read

ML Security Evasion Competition 2022

For:

August 9, 2023

minute read

Robust Intelligence partners with MITRE to Tackle AI Supply Chain Risks in Open-Source Models

For:

June 21, 2023

minute read

Generative AI Risk Assessment: Dolly 2.0

For:

March 31, 2023

minute read

Prompt Injection Attack on GPT-4

For:

+ More Articles

Your Cookie Preferences

Essential Cookies

Provider: .providername.com

Provider: .providername.com

Analytics and Customization Cookies

Performance and Functionality Cookies

Advertising Cookies

Provider: .providername.com

Provider: .providername.com

Observations from the Generative Red Team Challenge at DEF CON

About AI Village & DEF CON

Generative Red Team Overview

Quick Facts of GRT Challenge:

Learnings from the GRT Challenge

What’s Next?

Follow us on LinkedIn

Related articles

Extracting Training Data from Chatbots

Leveraging Hardened Cybersecurity Frameworks for AI Security through the Common Weakness Enumeration (CWE)

AI Governance Policy Roundup (August 2024)

Related articles

Robust Intelligence partners with MITRE to Tackle AI Supply Chain Risks in Open-Source Models

Generative AI Risk Assessment: Dolly 2.0

Prompt Injection Attack on GPT-4

Ready to learn more?

Observations from the Generative Red Team Challenge at DEF CON

About AI Village & DEF CON

Generative Red Team Overview

Quick Facts of GRT Challenge:

Learnings from the GRT Challenge

What’s Next?

Related articles

How NTT DATA Uses RIME to Increase Model Performance by 70%

Security Risks Of Generative Al Open Source Software

ML Security Evasion Competition 2022

Robust Intelligence partners with MITRE to Tackle AI Supply Chain Risks in Open-Source Models

Generative AI Risk Assessment: Dolly 2.0

Prompt Injection Attack on GPT-4

Achieve AI Integrity Today

Your Cookie Preferences

Essential Cookies

Provider: .providername.com

Provider: .providername.com

Analytics and Customization Cookies

Performance and Functionality Cookies

Advertising Cookies

Provider: .providername.com

Provider: .providername.com

About AI Village & DEF CON

Generative Red Team Overview

Quick Facts of GRT Challenge:

Learnings from the GRT Challenge

What’s Next?

Follow us on LinkedIn

Subscribe to our newsletter

Related articles

Extracting Training Data from Chatbots

Leveraging Hardened Cybersecurity Frameworks for AI Security through the Common Weakness Enumeration (CWE)

AI Governance Policy Roundup (August 2024)

Related articles

Robust Intelligence partners with MITRE to Tackle AI Supply Chain Risks in Open-Source Models

Generative AI Risk Assessment: Dolly 2.0

Prompt Injection Attack on GPT-4

Ready to learn more?

About AI Village & DEF CON

Generative Red Team Overview

Quick Facts of GRT Challenge:

Learnings from the GRT Challenge

What’s Next?

Subscribe to our newsletter

Related articles

How NTT DATA Uses RIME to Increase Model Performance by 70%

Security Risks Of Generative Al Open Source Software

ML Security Evasion Competition 2022

Robust Intelligence partners with MITRE to Tackle AI Supply Chain Risks in Open-Source Models

Generative AI Risk Assessment: Dolly 2.0

Prompt Injection Attack on GPT-4

Achieve AI Integrity Today