February 16, 2023 - 3 minute read

Fairness and Bias Testing with Robust Intelligence

Product Updates

Fairness in AI is important for protecting the fundamental civil rights of all groups and individuals. Throughout the ML lifecycle, AI risks exacerbating biases that exist in society by perpetuating detrimental decision-making outcomes, putting marginalized groups especially at risk. Managing risk in AI models is inherently challenging because models are often treated as black boxes, overlooking how their design and implementation can contribute to inequities. With a heavy dependence on data that is often embedded with systemic biases, models are susceptible to amplifying unjust and biased outcomes for far larger audiences than traditional modeling techniques. For these reasons, fairness and bias testing is a critical part of the ML lifecycle, so that any discriminatory harm these tools may cause can be accounted for and eliminated.

The past year has seen a surge in national and international attention on fairness in AI, as part of larger efforts around Responsible AI, Trustworthy AI, and Ethical AI, to name a few. Widespread frameworks and proposals such as the White House Blueprint for an AI Bill of Rights, NIST's AI Risk Management Framework, the EU AI Act, and even city-specific regulation like New York City Local Law 144 (which requires bias auditing of automated employment decision tools) have all contributed to an emphasis on holding enterprises accountable for potential discriminatory harm caused by AI.

For enterprises looking both to stay legally compliant and to uphold high standards for building fair, unbiased AI systems, Robust Intelligence provides a solution to audit and mitigate model biases that can be harmful to marginalized groups.

Robust Intelligence’s mission is to eliminate risk associated with deploying AI models. To this aim, we developed an end-to-end solution that continuously validates models and data at every stage of the ML lifecycle. From pre-deployment vulnerability detection and remediation to post-deployment monitoring and protection, Robust Intelligence gives organizations the confidence to scale models in production across a variety of use cases and modalities.

Examples of commonly used fairness tests

One key part of Robust Intelligence’s comprehensive, test-driven approach is our Fairness Risk test suite, which offers tools to detect and monitor discriminatory biases in machine learning systems.

1. Demographic Parity:

Demographic parity is one of the best-known and strictest measures of fairness. It is meant to be used in a setting where we assert that the base label rates between subgroups should be the same, even if empirically they are different. This test works by computing the Positive Prediction Rate (PPR) over all subsets of each protected feature (e.g. race) in the data. It raises an alert if any subset has a PPR below 80% of the subset with the highest rate. This test is in line with the four-fifths rule, a guideline from the EEOC’s Uniform Guidelines on Employee Selection Procedures, which requires the selection rate (or Positive Prediction Rate) for any race, sex, or ethnic group to be at least four-fifths (80%) of the rate of the highest-rate group; a lower rate is treated as evidence of adverse impact.
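To make the mechanics concrete, here is a minimal sketch of a demographic parity check in Python. The function names and data are illustrative, not Robust Intelligence's actual API: we compute the PPR per subgroup and flag any subgroup whose rate falls below 80% of the highest-rate subgroup.

```python
from collections import defaultdict

def positive_prediction_rates(groups, predictions):
    """Map each subgroup of a protected feature to its share of positive predictions."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for g, p in zip(groups, predictions):
        counts[g][0] += int(p == 1)
        counts[g][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}

def demographic_parity_alerts(groups, predictions, threshold=0.8):
    """Return subgroups whose PPR is below `threshold` of the maximum PPR."""
    ppr = positive_prediction_rates(groups, predictions)
    max_ppr = max(ppr.values())
    return {g: rate for g, rate in ppr.items() if rate < threshold * max_ppr}

race = ["A", "A", "B", "B", "B", "B"]
preds = [1, 1, 1, 0, 0, 0]  # group A: PPR 1.0, group B: PPR 0.25
alerts = demographic_parity_alerts(race, preds)
# Group B's PPR (0.25) is below 80% of group A's (1.0), so B is flagged.
```

A production system would also account for small-sample subgroups, where a raw rate comparison can be noisy.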

2. Intersectional Group Fairness:

Most existing work in the fairness literature takes a binary view of fairness: either a particular group performs worse or it does not. This binary categorization misses an important nuance of the field, namely that biases can be amplified in subgroups that combine membership across different protected groups, especially when such a subgroup has historically been underrepresented in opportunities. The intersectional group fairness test uses the positive prediction rate to ensure that all subsets in the intersection between any two protected groups perform similarly.
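The idea can be sketched in a few lines. This illustrative example (not the product's internal implementation) computes the positive prediction rate for every intersectional subgroup formed by two protected features, showing how an intersection can perform far worse than either marginal group alone:

```python
from collections import defaultdict

def intersectional_pprs(feature_a, feature_b, predictions):
    """Positive prediction rate for each (feature_a, feature_b) intersectional subgroup."""
    counts = defaultdict(lambda: [0, 0])  # (a, b) -> [positives, total]
    for a, b, p in zip(feature_a, feature_b, predictions):
        counts[(a, b)][0] += int(p == 1)
        counts[(a, b)][1] += 1
    return {grp: pos / total for grp, (pos, total) in counts.items()}

sex = ["F", "F", "M", "M"]
race = ["A", "B", "A", "B"]
preds = [1, 0, 1, 1]
pprs = intersectional_pprs(sex, race, preds)
# pprs == {("F","A"): 1.0, ("F","B"): 0.0, ("M","A"): 1.0, ("M","B"): 1.0}
# The ("F","B") intersection performs worse than either sex=F (0.5) or
# race=B (0.5) would suggest when measured in isolation.
```

Checking marginals alone would report a 0.5 rate for both sex=F and race=B, masking the zero rate at their intersection.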

3. Disparate Impact Ratio:

The New York City AI Hiring Law (Local Law 144), set to go into effect in April 2023, lists the disparate impact ratio as a metric required in bias audits. The impact ratio is usually defined as the selection rate for a group divided by the selection rate of the most-selected group. This test is run over all subsets of all protected features in the data provided, as well as all intersectional groups.
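A minimal sketch of the impact ratio computation follows. The function name and data are hypothetical; the definition matches the one above (each group's selection rate divided by the highest group's selection rate):

```python
from collections import defaultdict

def impact_ratios(groups, selected):
    """Impact ratio per group: selection rate / highest group's selection rate."""
    counts = defaultdict(lambda: [0, 0])  # group -> [selected, total]
    for g, s in zip(groups, selected):
        counts[g][0] += int(s)
        counts[g][1] += 1
    rates = {g: sel / total for g, (sel, total) in counts.items()}
    top = max(rates.values())
    return {g: rate / top for g, rate in rates.items()}

groups = ["X"] * 10 + ["Y"] * 10
selected = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 6
ratios = impact_ratios(groups, selected)
# X selection rate 0.8, Y selection rate 0.4 -> ratios {"X": 1.0, "Y": 0.5}
```

The most-selected group always has a ratio of 1.0, so any group's ratio below 0.8 would also trip the four-fifths threshold discussed earlier.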

4. Discrimination by Proxy:

Removing bias from a model is not as simple as dropping protected features from the training data so that the model cannot learn from them. These features may be correlated with other features that remain in the model, which can lead the model to learn protected attributes by proxy. This test checks whether any feature acts as a proxy for a protected feature, using mutual information as a measure of dependence on the protected feature. This provides insights for any bias audit regulations that may require information about proxies for protected features.
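The core measurement can be illustrated with a plain mutual information computation for discrete features (the product's internal implementation may differ; the data and function names here are hypothetical). A feature with high mutual information with the protected attribute is a candidate proxy:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p_joint / (p_x * p_y) == p_joint * n * n / (count_x * count_y)
        mi += p_joint * math.log(p_joint * n * n / (px[x] * py[y]))
    return mi

protected = ["A", "A", "B", "B"]
zip_code = ["10001", "10001", "94110", "94110"]  # perfectly aligned with group
income = ["hi", "lo", "hi", "lo"]                # independent of group
# zip_code carries full information about the protected feature (MI = ln 2),
# while income carries none (MI = 0), so zip_code is flagged as a proxy.
assert mutual_information(protected, zip_code) > mutual_information(protected, income)
```

In practice, continuous features would need binning or a continuous MI estimator, and the MI value is typically normalized before being compared against a threshold.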

Additional Robust Intelligence Fairness Capabilities

The tests above are a subset of our test suite and can be run not only pre-deployment but, more importantly, as a monitoring pipeline with built-in notification and alerting features. With our newly added Fairness Monitoring functionality, customers can easily and transparently track model biases over features, subsets, and intersections on every batch of new data as it arrives. Altogether, our automated detection, root cause analysis, and emphasis on customizability ensure that customers are able to evaluate models against a specific set of performance expectations.

Robust Intelligence offers the ability to autogenerate model cards as a way to simplify and streamline documentation of these testing results for bias auditing, self-imposed regulation, and external compliance reporting requirements. Our model cards functionality makes it easier for governance, risk, and compliance teams to work together with data science teams to align technical insights with regulatory requirements.

At Robust Intelligence, we strive to mitigate AI risk and instill model integrity. This is especially important in the context of making ethically charged decisions that directly impact marginalized groups. Our platform arms customers with a tool to understand model behavior, identify model biases, and incorporate insights to achieve fairness. Robust Intelligence continues to expand these capabilities by keeping our product up to date with important regulatory frameworks as they are released, ensuring our customers can maintain trust and value in their models.

To learn more, request a product demo here.

