May 17, 2021

AI Failures — Eliminate Them, Now


AI is the future of every business. The availability of large-scale data and computing power is making AI technologies transformational for organizations. AI is steadily trickling into industries far removed from high tech, creating entirely new categories of products and possibilities.

The Pain: AI Failures

Everything comes with a cost, however, and AI is not an exception. While the benefits of AI are immense, it also introduces serious risks. Here are a few examples:

  • Broken data pipelines feed corrupted data into models, producing garbage outputs
  • Bugs are introduced in model serving, causing the whole system to crash
  • Models are misused by engineers outside your data science organization
  • Corner-case inputs you didn't account for during development break the model in production
  • Drift in data significantly degrades the performance of your models
  • Models make discriminatory decisions without your being aware of it
  • Bad actors try to "hack" the model's decisions by feeding in malicious inputs

The list goes on and on... These are all example symptoms of the underlying disease: AI failures.
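To make the first and fourth symptoms concrete, here is a minimal sketch of a row-level input check that could sit in front of a model. Everything in it is an illustrative assumption rather than part of any product: the `SCHEMA` dictionary, its feature names and ranges, and the `validate_row` helper are all hypothetical.

```python
import math

# Hypothetical schema: feature name -> (lowest allowed, highest allowed).
# The feature names and ranges are made up for illustration.
SCHEMA = {"age": (0, 120), "income": (0.0, 1e7)}


def validate_row(row, schema=SCHEMA):
    """Return a list of human-readable problems found in one input row."""
    problems = []
    for name, (low, high) in schema.items():
        value = row.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: not numeric")
        elif math.isnan(value):
            problems.append(f"{name}: NaN")
        elif not low <= value <= high:
            problems.append(f"{name}: {value} outside [{low}, {high}]")
    return problems


# A clean row yields no problems; a corrupted row is reported, not scored.
clean = validate_row({"age": 30, "income": 50000.0})
corrupted = validate_row({"age": -5, "income": float("nan")})
```

A check like this only catches the failures you thought to write down, which is part of why the list above keeps growing.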

Do any of these sound familiar? If you've been involved with data science or machine learning, regardless of industry or company, you've probably faced many of the problems above. I can say so with confidence: since the birth of the company, we've had countless conversations with AI practitioners in tech, finance, insurance, and government in which they named many of the issues listed above as the key challenges their AI teams face. We have also been victims of these AI failures ourselves. Many members of the Robust Intelligence team have experienced them firsthand at companies ranging from large tech (Google, Uber, Salesforce) to mid-size tech (Wish, Postmates, Quora) to startups and digital consulting firms. AI failures are prevalent and will only worsen as more companies adopt AI, build larger AI teams, and develop and deploy more models on more data.

Ignore AI failures at your own peril

Are AI failures that bad? If you're not convinced yet that it's a serious problem, let's consider some of the consequences of leaving them within your AI systems:

First, your data pipeline and model system will break. With issues like bugs and broken data pipelines, your AI system will crash, literally. Not only will it break, but it will also break all the time. If you've worked anywhere on the spectrum from data infrastructure engineering to model prototyping and productionization, you know how fragile these systems are. Data and ML pipelines are always under active development, and the characteristics of the data change all the time.

Consequently, you will have to firefight these issues in production, leaving no room for focused development work. You and your team will waste your precious time digging through error logs, identifying the root cause of the problem, all while your model is crashing. How wasteful and nerve-wracking is that!

Even when you've fixed all the visible errors in your pipeline, you have solved only one part of the bigger problem. A perhaps even more pernicious form of AI failure is the silent error. The tricky thing with AI models is that even when they are taking in garbage input or producing garbage output, they're not necessarily going to crash. For example, when the model is doing terribly on a specific subset of the data, or when the distribution of the input data is changing drastically and inducing wrong model predictions, you will not by default see any error logs or PagerDuty alerts in your system monitoring dashboard. These silent errors are tough to triage and have subtle but compounding effects on your downstream metrics. The model will continue to produce garbage predictions silently until, a month later, you realize your customer churn is higher than ever.

Silent errors: garbage-in, garbage-out behavior is not captured as a system failure
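One way to make a silent error loud is to compare the live distribution of a feature against a reference sample and raise an alert when the two diverge. The sketch below uses the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical CDFs) as one possible drift measure. The function names and the 0.2 threshold are assumptions chosen for illustration, and this is not a description of how any particular product detects drift.

```python
from bisect import bisect_right


def ks_statistic(reference, live):
    """Largest gap between the empirical CDFs of two numeric samples."""
    ref = sorted(reference)
    cur = sorted(live)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Both CDFs are step functions, so the largest gap occurs at a sample point.
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def drift_alert(reference, live, threshold=0.2):
    """True when the live sample has moved away from the reference sample.

    The threshold is an arbitrary illustrative value, not a recommendation.
    """
    return ks_statistic(reference, live) > threshold


training_ages = [21, 25, 30, 34, 41, 47, 52, 58]
same_population = drift_alert(training_ages, training_ages)
shifted_population = drift_alert(training_ages, [a + 40 for a in training_ages])
```

Nothing here crashes when the data shifts, which is exactly the point: the check has to be run deliberately, because the model itself will keep returning predictions either way.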

Why is it so hard to get rid of these risks?

Most of the time, the priorities of data science teams are elsewhere, e.g., in developing more performant models, generating a new set of features, or improving the latency of the model service. Data scientists and machine learning engineers will, at best, get a few hours a week to think about these risks. As a result, the risks are only partially tackled, in manual ways, and they continue to pile up.

Data scientists in particular tend to address these risks through ad-hoc efforts aimed at improving an individual model. As a result, data science teams never eliminate AI failures at the organizational level. If one data scientist asserts model behavior differently from another, it becomes hard to tell whether a model is production-ready or whether a deployed model is performing as anticipated.
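As an illustration of what asserting model behavior the same way could look like, here is a minimal sketch of a shared release check that every model passes through with the same thresholds. It also covers the case of a model doing terribly on one subset of the data, since it reports per-slice accuracy alongside the overall number. The function names, the slice labels, and the threshold values are hypothetical.

```python
from collections import defaultdict


def accuracy_by_slice(labels, predictions, slices):
    """Accuracy computed separately for each slice label."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for label, prediction, slice_name in zip(labels, predictions, slices):
        totals[slice_name] += 1
        hits[slice_name] += int(label == prediction)
    return {name: hits[name] / totals[name] for name in totals}


def release_check(labels, predictions, slices, min_overall=0.9, min_slice=0.8):
    """Return a list of failed checks; an empty list means the model passes.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    if not labels or not (len(labels) == len(predictions) == len(slices)):
        raise ValueError("inputs must be non-empty and of equal length")
    failures = []
    correct = sum(int(y == p) for y, p in zip(labels, predictions))
    overall = correct / len(labels)
    if overall < min_overall:
        failures.append(f"overall accuracy {overall:.2f} < {min_overall}")
    per_slice = accuracy_by_slice(labels, predictions, slices)
    for name, acc in sorted(per_slice.items()):
        if acc < min_slice:
            failures.append(f"slice {name!r} accuracy {acc:.2f} < {min_slice}")
    return failures


# The overall accuracy looks acceptable, but slice "b" is failing entirely.
labels = [1] * 10
predictions = [1] * 8 + [0] * 2
slices = ["a"] * 8 + ["b"] * 2
failures = release_check(labels, predictions, slices, min_overall=0.75, min_slice=0.5)
```

Because the thresholds live in one shared function instead of in each person's notebook, two data scientists evaluating the same model would reach the same verdict.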

Finally, eliminating AI failure is dang hard. While the field of software engineering has widely established practices for testing and documentation, ML engineering introduces its own complexities, hidden dependencies, and anti-patterns unique to data pipelines and AI models (Sculley et al.).

There needs to be a way to measure AI failures in models across your organization in a unified manner. However, this entails both AI and engineering challenges:

  • AI challenge: how would you measure and eliminate AI failure across your models exhaustively, effectively, and consistently?
  • Engineering challenge: how would you build an infrastructure that ensures that both developed and deployed models are constantly evaluated for AI failure?

These challenges are extremely tricky, and it is nearly impossible to overcome them while developing the actual AI models for your business needs.

Let's eliminate AI failures, together

The good news is that you're not tackling this problem alone, not anymore. At Robust Intelligence, we've translated years of research and industry experience into the Robust Intelligence Model Engine (RIME), built with the single goal of eliminating AI failures. The platform provides two complementary tools that work in conjunction: automated unit testing of pre-production models and automated quality assurance of in-production models, to ensure that your AI system is risk-free. I'll keep the product intro brief here, as the main purpose of this post is to introduce the concept of AI failure and convince you of its seriousness. In our upcoming posts, we'll discuss the underlying principle that drives our product and why it's so effective at eliminating AI failures. In the meantime, if you'd like to learn more, feel free to reach out to Kojin Oshiba at

