Pursuing machine learning for your business is a no-brainer. It drives efficiency and effectiveness, helps you meet increasingly customer-centric demands, and reduces your vulnerability to disruptors. Implementing machine learning, however, requires much more thought. The ongoing AI revolution will be felt strongly across all industries, both in its potential and in the damage that algorithmic bias can do. If you understand what causes algorithmic bias and take steps to mitigate it at the outset of your projects, you can go a long way toward preventing harm to your customers and your reputation.


In a recent example of machine learning gone wrong, healthcare providers began using an insurance-sponsored algorithm to flag patients for high-risk care management. The algorithm was designed to determine which patients would benefit the most from preventive care, but it based its recommendations on past healthcare needs. Patients who saw doctors more often had better data and were favored over those who sought less medical attention. This meant that affluent patients, mostly white, received more recommendations for preventive care, while those of lower socioeconomic status, many of them Black, received fewer. The algorithm did what it was supposed to do: it observed patterns and learned from them. When the AI was corrected and applied to a different dataset in which the Black patient group had more active chronic conditions than the white population, which in effect means more medical spending, recommendations for preventive measures for Black patients jumped from ~17% to 46% of the total budget. Turning a blind eye to inequitable access can have serious consequences.
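
The mechanism in this example, ranking patients by a proxy (past use of care) rather than by the need itself, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the six patient records are invented toy data, the field names `past_cost` and `chronic_conditions` are hypothetical, and the function is not the algorithm from the case described above.

```python
def group_share_of_top_k(patients, key, k):
    """Rank patients by `key` (highest first) and return each group's
    share of the top-k slots."""
    ranked = sorted(patients, key=lambda p: p[key], reverse=True)
    counts = {}
    for patient in ranked[:k]:
        counts[patient["group"]] = counts.get(patient["group"], 0) + 1
    return {group: n / k for group, n in counts.items()}


# Invented toy records: group B has more chronic conditions but lower
# recorded spending, for instance because it had less access to care.
patients = [
    {"group": "A", "past_cost": 9000, "chronic_conditions": 2},
    {"group": "A", "past_cost": 8000, "chronic_conditions": 1},
    {"group": "A", "past_cost": 5000, "chronic_conditions": 3},
    {"group": "B", "past_cost": 6500, "chronic_conditions": 5},
    {"group": "B", "past_cost": 3000, "chronic_conditions": 4},
    {"group": "B", "past_cost": 2500, "chronic_conditions": 4},
]

by_cost = group_share_of_top_k(patients, "past_cost", k=3)
by_need = group_share_of_top_k(patients, "chronic_conditions", k=3)
print("ranked by past cost:", by_cost)
print("ranked by chronic conditions:", by_need)
```

In this toy data the same three program slots go mostly to group A when ranked by past cost and entirely to group B when ranked by chronic conditions. The model is unchanged; only the target it is asked to rank by differs.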


Bias can manifest itself in varied ways. First, it can affect the allocation of resources, information or opportunities. Second, it can establish and strengthen belief in an expected outcome, which then becomes difficult to change or update. Bias isn’t just an unfortunate side effect of machine learning. It’s the duty of the team working on such problems to introduce variety and reduce bias, even while forming the hypothesis for an expected outcome. But first you must understand what causes bias.

So, what are the sources of such bias? All forms of bias can be traced back to three issues.

  1. Biased data: This is the easiest problem to spot. If the training set has biases, then even a high-performing algorithm will capture and reproduce them. For example, if an organization has never made loans to customers from a certain ZIP code, then a loan-recommending algorithm will make biased recommendations that could discriminate against people from that area. Another problem can arise when we use too much data and use it indiscriminately. There are broadly two routes to making algorithms more accurate: The first is to change the structure of the models to make them better, and the second is to feed them more data. It’s harder to change the models, while data is easily accessible and abundant, so most companies opt to pursue more data. However, consider this route carefully. Timnit Gebru and her co-authors argue in their paper, “On the Dangers of Stochastic Parrots,” that leveraging large online data sets for training can lead to multiple problems, such as the under-representation of the languages and norms of countries and people with smaller online footprints. Also, vast swaths of internet data contain racist, sexist and otherwise abusive language which, when made part of the training data, can lead to harmful output.
  2. Faulty problem formulation: Errors in problem formulation can lead researchers and organizations to the wrong conclusions. If a machine learning algorithm is used to find evidence for a biased hypothesis, then the data isn’t to blame. When your team isn’t diverse enough or doesn’t understand the problem well enough, you risk erroneous problem formulation. Our thinking can become shortsighted or boxed in when silos prevent collaboration with affected communities and social scientists, and a team without varied backgrounds can fail to consider real-world implications. As we saw in our example of accidental racial sorting, that kind of mistake can be prevented when a diverse team draws on its experience to anticipate problems.
  3. Poor choice of algorithms: The kind of algorithm you use may also affect the fairness of recommendations. For example, many algorithms today are based on maximum likelihood estimates that look at global patterns to formulate one-size-fits-all recommendations. But we need to fine-tune our algorithms to perform just as well on local patterns, such as those within subgroups defined by race, ethnicity, gender, sexual orientation and other demographics.
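
One way to see whether a model tuned to global patterns is failing on local ones is to report accuracy per subgroup alongside the overall figure. The following is a minimal sketch under stated assumptions: the labels, predictions and group tags are invented toy values, and `accuracy_by_group` is a hypothetical helper written for this illustration, not part of any particular library.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return (overall_accuracy, {group: accuracy}) for parallel lists."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


# Invented toy data: group "B" is a small minority of the records.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0, 0, 1]
groups = ["A"] * 8 + ["B"] * 2

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print(overall)    # 0.8
print(per_group)  # {'A': 1.0, 'B': 0.0}
```

An overall accuracy of 0.8 looks acceptable here, yet every prediction for group B is wrong. The same breakdown can be applied to whatever metric matters for the problem, and the counts in `total` double as a quick check on how well each group is represented in the data.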

Given the long-term and wide-ranging impact of AI-driven decision making and the multiple sources of bias, there can’t be just one solution. But a good start involves a relentless organizational commitment to transparency, accountability, and analyzing and course-correcting for AI bias. Consider these steps:

  • Establish a team for ethics guidelines and regulation.
  • Introduce protocols that pressure test the behavior of the system when it is fed unexpected or even noisy data.
  • Choose the right data, checking that it represents the people the system will affect.
  • To promote transparency, make it a habit to think about and communicate the limitations of your models, your approach to the problem and the answers that the models can provide.
  • Build a collaborative workplace where technologists can work with ethicists.
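
The pressure-testing step above can be sketched as a simple noise test: perturb each input slightly and count how often the decision changes. This is a minimal sketch, not a complete protocol. `flip_rate` and `toy_predict` are hypothetical names introduced here, the toy model and input rows are invented, and Gaussian noise is just one assumed form of "noisy data".

```python
import random


def flip_rate(predict, rows, noise_scale, trials=200, seed=0):
    """Fraction of noisy copies of each row whose prediction differs
    from the prediction on the clean row."""
    rng = random.Random(seed)  # fixed seed so the test is repeatable
    flips = 0
    checks = 0
    for row in rows:
        baseline = predict(row)
        for _ in range(trials):
            noisy = [x + rng.gauss(0.0, noise_scale) for x in row]
            flips += int(predict(noisy) != baseline)
            checks += 1
    return flips / checks


def toy_predict(row):
    """Hypothetical stand-in for a trained model: approve when the
    feature sum exceeds a threshold."""
    return int(sum(row) > 1.0)


# Invented inputs; the last row sits very close to the decision boundary.
rows = [[0.2, 0.3], [0.9, 0.8], [0.5, 0.52]]

for scale in (0.0, 0.05, 0.1):
    print(scale, flip_rate(toy_predict, rows, noise_scale=scale))
```

A flip rate that climbs quickly at small noise levels suggests that some decisions hinge on tiny differences in the input, which is worth investigating before the system is trusted with real cases.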

In short, you should begin modularly. Start with a narrow problem, identify the key questions, identify your data sets and any issues that come with them, get the right team to establish the ethics guidelines and get started. 


How might this look in practice? Let’s say you’re building a medical diagnosis system for patients with coronary artery disease (CAD) to increase access to quality healthcare services at a reduced cost. After analyzing the system, you realize that women are receiving CAD diagnoses from the AI only when they report the same symptoms that men do. You know this is problematic because women’s symptoms can present differently. How would you address this issue? You can start by choosing the right data and by interviewing a set of physicians about how they would respond if a patient reported CAD symptoms other than chest pain, the symptom traditionally seen in males. You can also communicate the limitations by asking for a patient’s biological sex before they interact with your diagnosis system, so that you can inform them about known biases and how much they should trust the system. This doesn’t solve the problem entirely, but it does help avoid dangerous consequences for female patients while you work to remove the bias. Lastly, you can pressure test the system by varying the amount of evidence it needs to confidently predict the outcome. This way you can require the user to answer the right number of questions before providing a diagnosis.
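
The last idea, requiring enough evidence before the system commits to an answer, can be sketched as a small gate in front of the model. This is an illustrative sketch only, not a medical tool: `diagnose`, `toy_predict_proba`, the question IDs and both thresholds are hypothetical, and the toy scoring rule stands in for whatever model the system actually uses.

```python
def diagnose(answers, predict_proba, min_answers=3, min_confidence=0.8):
    """Return a prediction only when enough questions were answered and
    the model is confident; otherwise ask for more or refer onward."""
    answered = {q: a for q, a in answers.items() if a is not None}
    if len(answered) < min_answers:
        return {"status": "need_more_answers",
                "still_needed": min_answers - len(answered)}
    p = predict_proba(answered)
    confidence = max(p, 1.0 - p)  # confidence in either outcome
    if confidence < min_confidence:
        return {"status": "refer_to_physician", "probability": p}
    return {"status": "prediction", "probability": p}


def toy_predict_proba(answered):
    """Hypothetical stand-in model: the share of 'yes' answers."""
    yes = sum(1 for a in answered.values() if a)
    return yes / len(answered)


# Invented questionnaires; None means the question was not answered.
few = {"q1": True, "q2": None, "q3": None, "q4": None}
mixed = {"q1": True, "q2": False, "q3": True, "q4": None}
clear = {"q1": True, "q2": True, "q3": True, "q4": True}

for name, answers in (("few", few), ("mixed", mixed), ("clear", clear)):
    print(name, diagnose(answers, toy_predict_proba))
```

Sweeping `min_answers` and `min_confidence` over a set of held-out cases, split by patient group, is one way to find out how much evidence the system needs before its answers become reliable for everyone it serves.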


The responsibility that comes with algorithmic power can feel daunting, but when that power isn’t checked, it can do significant harm. Algorithms don’t worry about cultural insensitivities or the damage caused by misrepresentation. Fortunately, we do. The onus is therefore on us to commit to prevention.