I’m excited to kick off a series of blog posts centered around generative AI risks and how to evaluate and manage them. In the past several months, there’s been an explosion of interest in generative AI, accompanied by growing awareness of its potential risks. Our customers have been using our product to validate generative AI models and assess their risks, and we thought it would be useful to extend some of these measures to the broader community using concrete examples.
Generative AI promises to be transformational across industries. But before adopting the technology at scale, companies need assurance that these sophisticated third-party models will work as expected. That assurance starts with a comprehensive understanding of each model and its risks.
In this post, to give an overview of our assessment methodology for generative models, we will introduce our risk assessment report for Falcon-7B-instruct—the smaller of the fine-tuned models among the four in the Falcon family open-sourced by the UAE’s Technology Innovation Institute (TII). We hope this method gives confidence to teams working with any third-party model or tool, including those by OpenAI, Google, Anthropic, MosaicML, and hundreds of others. You can find the full-length report here.
This report marks the beginning of a series dedicated to sharing AI risk analyses of popular third-party models derived from comprehensive testing, which uses similar techniques to our automated, continuous validation solution. Among these models, the Falcon models have captured the attention of the AI community due to their performance and efficiency. Notably, Falcon-40B has surpassed leading open-source models such as Meta's LLaMA while requiring less compute for training. Our team is impressed by the remarkable achievements of the TII team and thus has chosen Falcon-7B-instruct to be featured in our inaugural report.
The model produces incredible results despite requiring a significantly smaller training compute budget than that of GPT-3. Moreover, the Falcon models are open source models that enterprises can directly use and build upon, serving as strong evidence that enterprises can build powerful LLMs for their own specialized use cases.
Generative AI Risk Overview
We believe it is more important than ever for organizations to proactively enforce standards and protocols to protect against AI risk across the organization. With a proper testing framework, organizations can harness the incredible benefits that come with generative AI adoption, while also enabling various stakeholders to assess risk and make informed decisions. As regulators and lawmakers propose more stringent rules for AI and machine learning applications, it is important to develop a comprehensive organizational plan for compliance and governance of generative models.
Our Approach to an AI Risk Assessment
A primary objective of generative models is to produce nuanced responses even in complex, real-world applications. Because these models are tuned during training to optimize performance on specific tasks, they may be especially susceptible to risks that were never considered in those narrow contexts. They can behave unexpectedly when deployed for new tasks and when presented with novel inputs from users—both malicious and unintentional.
As such, we recommend that AI risk assessments be conducted in the context of the intended business use to explore limitations not disclosed during generic model training and fine-tuning. This exercise is best done pre-production, but can also be performed post-deployment. In the absence of a specific application to be tested, as is often the case for open-source models, we instead explore the risks that are presented by typical tasks, example code, and/or default settings recommended by the model authors, since these are the most likely starting points for implementation.
We also detail our full assessment methodology in the report.
Example: Key Findings From the Falcon-7B-Instruct Risk Assessment
Overall, we found the model to be fairly operationally robust, demonstrating resilience to misalignment stress testing and misinformation prompts. You can find below a high-level overview of some model risks found, which are further detailed in the report alongside our recommendations and an adoption checklist.
Task proficiency risk
The model often yields responses that terminate prematurely or, conversely, continue well beyond the succinct answer a question calls for. This behavior is more characteristic of the sentence-completion falcon-7b base model than what one would expect from the fine-tuned falcon-7b-instruct on a question-answering task.
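The two failure modes above—truncated answers and run-on completions—can be flagged with simple heuristics. The following is an illustrative sketch, not the report's actual methodology; the thresholds and the truncation heuristic are assumptions chosen for demonstration.

```python
def flag_length_issues(response: str, max_words: int = 60) -> list[str]:
    """Return heuristic flags for premature or run-on responses.

    Assumed heuristics (not from the report):
    - a response that does not end in terminal punctuation was likely truncated;
    - a response far longer than `max_words` suggests the model drifted into
      open-ended sentence completion rather than answering the question.
    """
    flags = []
    text = response.strip()
    words = text.split()
    if text and text[-1] not in ".!?\"'":
        flags.append("possibly_truncated")
    if len(words) > max_words:
        flags.append("run_on")
    return flags
```

Run over a batch of question-answering outputs, the fraction of responses carrying either flag gives a rough proxy for this task-proficiency risk.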
Model robustness risk
Model responses were shown to be sensitive to small, meaning-preserving perturbations of the input, such as upper-casing, lower-casing, and removing special characters. In addition, some example test results demonstrated that the model was not effective at recognizing misleading questions and does not reliably produce factually consistent outputs.
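This kind of perturbation testing can be sketched in a few lines. The code below is a minimal illustration, not the report's harness: `query_model` is a hypothetical stand-in for a call to the deployed model, while the perturbations mirror those listed above.

```python
import re
from typing import Callable

# Meaning-preserving input perturbations, as described in the text.
PERTURBATIONS: dict[str, Callable[[str], str]] = {
    "upper": str.upper,
    "lower": str.lower,
    "no_special": lambda s: re.sub(r"[^A-Za-z0-9\s]", "", s),
}

def perturb(prompt: str) -> dict[str, str]:
    """Return the original prompt alongside its perturbed variants."""
    variants = {"original": prompt}
    for name, fn in PERTURBATIONS.items():
        variants[name] = fn(prompt)
    return variants

def consistency_rate(prompt: str, query_model: Callable[[str], str]) -> float:
    """Fraction of perturbed prompts whose response matches the original's.

    Exact string match is a deliberately strict, illustrative criterion;
    a real assessment would likely use a softer semantic comparison.
    """
    variants = perturb(prompt)
    baseline = query_model(variants["original"])
    others = [name for name in variants if name != "original"]
    matches = sum(query_model(variants[n]) == baseline for n in others)
    return matches / len(others)
```

A robust model should score near 1.0 on prompts whose meaning is unchanged by these edits; a low rate indicates the surface-form sensitivity described above.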
The model is strongly biased toward abstaining from answering questions, even when there is little reason to do so, resulting in overly conservative performance on the question-answering task. Additionally, its misalignment trigger appears to have a high false-positive rate.
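One way to quantify this abstention bias is to measure how often the model declines to answer questions it should be able to handle. The sketch below is an illustrative assumption, not the report's method: the refusal markers and the `query_model` callable are hypothetical placeholders.

```python
# Illustrative refusal phrases; a real assessment would use a broader,
# validated set (or a classifier) rather than this hand-picked list.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai", "i am unable")

def is_abstention(response: str) -> bool:
    """Crude check for whether a response is a refusal to answer."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def abstention_rate(questions: list[str],
                    query_model) -> float:
    """Fraction of answerable questions the model declines to answer.

    `questions` is assumed to contain only benign, answerable questions,
    so every abstention here counts as a false positive.
    """
    responses = [query_model(q) for q in questions]
    return sum(map(is_abstention, responses)) / len(responses)
```

On a set of clearly benign questions, a well-calibrated model should score near zero; a high rate reflects the overly conservative behavior noted above.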
Managing AI Risk
Generative AI applications will continue to profoundly transform what technology can do. Risk assessments are an important first step toward thoroughly understanding the risks of generative models. Coupled with our continuous validation platform, they give our customers the confidence to bring generative AI-powered applications to production.
Check out the full Falcon-7B-instruct Risk Assessment report and contact us to learn more.