October 20, 2021

Become an Early Champion of ML Quality Assurance


The world is demanding a safe and secure pathway to an AI-enabled society. Many companies wish to respond to this call to action, but find that adequate risk management is time-consuming, expensive, or a distraction from more interesting priorities. Yet, as regulations appear on the horizon and more machine learning models are deployed in complex data environments, more and more companies are looking to implement quality assurance for their machine learning pipelines.

It turns out, thankfully, that prioritizing the security and robustness of your ML pipelines is a money-saving and smart move if done early.

Testing AI systems is a much more complex challenge than testing manually coded software, because ML-based behavior cannot easily be specified in advance of deployment in real-world situations.

This means that incoming data also needs rigorous testing, and models should be put through realistic trial runs before they reach production. Essentially, it’s important to treat your ML projects like an elite soccer team: it is advisable to play a couple of friendly games before the season starts and points seriously begin to matter.

And yet, some companies are scoring own-goals with models rushed into production:

Illegal gender discrimination: companies have failed to address the operational risks associated with credit decision-making algorithms. In some cases, women received credit limits 20x smaller than men's, despite generally having better credit scores. The same risks exist for algorithms involved with insurance policies and loan applications.

Empty shopping carts: recommendation systems for e-commerce stores have been known to break. When this happens, customers receive either bad recommendations or no recommendations at all, which leaves businesses with empty shopping carts and unengaged shoppers.

Minor bodily harm: at the Tokyo Paralympics, a self-driving vehicle hit a visually impaired athlete, resulting in minor injuries. Although this occurred at 1 mph, the slow-speed crash shows that autonomous driving systems are still far from being reliable and safe, especially at normal or fast traffic speeds.

The list of ML failures and risks goes on and keeps growing, but what they all have in common is a rush to deploy models without an equally enthusiastic commitment to eliminating their risks. Whether the models are inaccurate, buggy, or biased, the interests of your stakeholders and customers are hurt.

Thankfully, customer churn still haunts many meeting rooms, so it should be easy to convince your decision-makers to invest in products like RIME, especially if your ML outputs are driving business intelligence, strategy, and revenue.

Speaking of the bottom line, there is no bigger waste of labor hours than a long climb up the garbage heap produced by unmonitored and untested models. The effort and time spent fixing misbehaving models accrue as technical debt that pokes all stakeholders in the ribs, haunts budget meetings, and erodes trust in a project’s direction and efficacy.

So how do you avoid the trap of technical debt? Be a champion for stress testing and continuous monitoring of your data and models from day one.

Doing so will save time and money. For example, you’ll want to figure out how your models perform on unseen data, how sensitive they are to various input changes and drifts, and so on.
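To make the idea of input sensitivity concrete, here is a minimal sketch of a stress test in Python. Everything in it is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a real model, the data is synthetic, and scaling one feature is only one of many possible perturbations. It is not how RIME or any particular product works.

```python
import random


def toy_model(income, debt):
    """Hypothetical stand-in model: approve (1) when debt is under 40% of income."""
    return 1 if debt < 0.4 * income else 0


def accuracy(rows, labels, predict):
    """Fraction of rows where the prediction matches the label."""
    correct = sum(predict(*row) == label for row, label in zip(rows, labels))
    return correct / len(rows)


def scale_income(rows, scale):
    """Perturbation: multiply the income feature by `scale`, leave debt untouched."""
    return [(income * scale, debt) for income, debt in rows]


# Synthetic data (an assumption for this sketch); labels come from the same rule,
# so accuracy on the unperturbed data is 1.0 by construction.
rng = random.Random(0)
rows = [(rng.uniform(20_000, 120_000), rng.uniform(0, 60_000)) for _ in range(1_000)]
labels = [toy_model(income, debt) for income, debt in rows]

# Measure how accuracy degrades as the income feature drifts downward.
report = {
    scale: accuracy(scale_income(rows, scale), labels, toy_model)
    for scale in (1.0, 0.9, 0.5)
}

for scale, acc in report.items():
    print(f"income scaled by {scale}: accuracy = {acc:.3f}")
```

Running the same comparison against a real model and held-out data gives a rough picture of which features the model leans on most, and how quickly performance falls off when they shift.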

Of course, these considerations can be overwhelming, so it can be wise to ask for help.

At Robust Intelligence, we find that our platform unveils the unknown unknowns when it comes to your model behavior. Upstream data changes, for example, may not explicitly break anything or set off any engineering alerts, yet they can have downstream effects that are not visible to the model maintainers.
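As an illustration of how a silent upstream change can be caught, the sketch below computes a population stability index (PSI) between a reference sample and incoming data. This is one common drift statistic, chosen here as an assumption; it is not a description of how the Robust Intelligence platform detects such changes. The dollars-to-cents scenario, the synthetic data, and the 0.2 alert threshold (a widely used rule of thumb) are likewise illustrative.

```python
import math
import random


def psi(reference, current, bins=10):
    """Population stability index of `current` against `reference`.

    Bins are equal-width over the reference range; values outside that
    range are clipped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids taking log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = fractions(reference), fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


rng = random.Random(0)
reference = [rng.gauss(50, 10) for _ in range(5_000)]       # training-time values, in dollars
same_source = [rng.gauss(50, 10) for _ in range(5_000)]     # new data, nothing changed upstream
in_cents = [rng.gauss(50, 10) * 100 for _ in range(5_000)]  # upstream silently switched to cents

ALERT_THRESHOLD = 0.2  # rule-of-thumb cutoff, an assumption for this sketch

for name, sample in (("same source", same_source), ("switched to cents", in_cents)):
    score = psi(reference, sample)
    status = "ALERT" if score > ALERT_THRESHOLD else "ok"
    print(f"{name}: PSI = {score:.4f} [{status}]")
```

Nothing in the second case raises an exception or fails a schema check, since the values are still valid floats, which is why a distribution-level check like this can surface problems that ordinary engineering alerts miss.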

By investing in our automated quality assurance early, you will deploy models faster, eliminate redundancy, free up your budget for exciting new projects, and maybe (just maybe!) reward your team with an outing more adventurous than your local mini-golf.

