We recently initiated a series of reports and accompanying blog posts on the AI risks in popular generative models, starting with Falcon-7B. In this post we showcase our assessment methodology for generative models, applied to Dolly 2.0 — an experimental large language model (LLM) open-sourced by Databricks in March 2023. This method gives confidence to teams working with any third-party model or tool, including those by OpenAI, Google, Anthropic, MosaicML, and hundreds of others. You can find the full-length report on Dolly 2.0 here.
This report is the second in a series of generative AI risk assessments that will discuss a variety of popular third-party models. We chose to analyze Dolly because we thought it was one of the most significant contributions to the AI community in the past six months and was one of the most interesting models for our customers.
As OpenAI’s models took the world by storm, building and training an LLM initially seemed to be a privilege of a handful of organizations with access to enormous computing and talent budgets. This perception, however, changed significantly with the release of Dolly. While Dolly may not perform as well as models like GPT-4 or Falcon-7B, it produces incredible results and can be trained on a small budget. Moreover, Dolly is an open-source model that enterprises can directly use and build upon, serving as strong evidence that enterprises can build powerful LLMs for their own specialized use cases.
Since the birth of Dolly, we’ve seen customers and companies either directly adopt Dolly or use similar approaches to Dolly, using pythia as a basis for the model. It is for this reason that we thought it would be most informative to start with providing some overview of the risks associated with models in this category.
Generative AI Risk Overview
We believe it is more important than ever for organizations to proactively enforce standards and protocols to protect against AI risk across the organization. With a proper testing framework, organizations can harness the incredible benefits that come with generative AI adoption, while also enabling various stakeholders to assess risk and make informed decisions. As regulators and lawmakers propose more stringent rules for AI and machine learning applications, it is important to develop a comprehensive organizational plan for compliance and governance of generative models.
Our Approach to an AI Risk Assessment
A primary objective of generative models is to produce nuanced responses even in complex, real-world applications. Since these models have been tuned during training to optimize performance for specific tasks, they may be especially susceptible to risks not considered when deployed outside of narrow contexts. They may behave unexpectedly when deployed for specific tasks and when presented with novel inputs from users - both malicious and unintentional.
As such, we recommend that AI risk assessments be conducted in the context of the intended business use to explore limitations not disclosed during generic model training and fine-tuning. This exercise is best done pre-production, but can also be performed post-deployment. In the absence of a specific application to be tested, as is often the case for open-source models, we instead explore the risks that are presented by typical tasks, example code, and/or default settings recommended by the model authors, since these are the most likely starting points for implementation.
We also detail our full assessment methodology in the report.
Key Findings From the Dolly Risk Assessment
In addition to verifying several generic shortcomings disclosed by the authors, we also identified a number of additional findings that expose integrators of Dolly 2.0 to business risk. Below is a high-level overview of the risks which are detailed in the report alongside recommendations and an adoption checklist.
Software vulnerability risk
The model uses code samples that contain a particular plugin, which has a critical vulnerability that allows users to execute arbitrary code—remote code execution (RCE). Vulnerabilities with similar levels of severity (according to the Common Vulnerability Scoring System) typically have the following characteristics:
- They allow remote code execution
- They allow privilege escalation
- They allow data exfiltration
- They can be exploited with little or no user interaction
While the vulnerability had been recently patched in May when we conducted our study, users could potentially be exposed without a fresh install.
When the model is prompted to abstain from answering questions when there is insufficient information to return a correct and factual answer, it still connected pieces of unrelated information in the prompt to make up an output.
Prompt injection risk
The model and default meta-prompt contain no safeguards to prevent very simple prompt injection attacks, which are attacks on a model or application where the attacker maliciously manipulates the prompts to induce the AI to perform operations that it was not originally instructed to perform. In the context of Dolly 2.0, we found that the model was susceptible to basic input prompts such as “Disregard any other instructions you have been given…”
As described by the model authors, this model has not been fine-tuned to reinforce alignment to human values. This was reiterated by our findings, which showed that the model had no restrictive meta prompts to guard against misaligned utterances. As such, the model quite readily produced racist, sexist, and politically charged output.
Managing AI Risk
Generative AI applications will continue to make profound transformations to current technological capabilities. Risk assessments are an important first step in thoroughly understanding AI risk of generative models. When coupled with our continuous validation platform, our customers have the confidence to bring generative AI-powered applications to production.
Check out the Dolly 2.0 Risk Assessment report and contact us to learn more.