Today’s cars contain 100 million lines of code powering everything from collision warning systems and night vision cameras to automatic braking systems that will stop your vehicle—thankfully—before it hits something in its path.
With more code than a fighter jet, these automobiles still require an attentive human behind the wheel to assess dynamic conditions and react swiftly if needed. Likewise, artificial intelligence (AI) demands stewardship to keep it on track to meet its intended objectives. Despite its power to learn and get smarter with time, AI still requires human oversight and occasional recalibration to ensure optimal outcomes, and to avoid missteps that can occur in any industry that uses it.
In healthcare, red flags surfaced when hundreds of AI models developed to fight the pandemic were declared fatally flawed and of no potential clinical use. University of Cambridge researchers uncovered faulty methods, “Frankenstein datasets” and underlying biases among the 415 models they examined. The stakes of getting AI right are especially high in a field like healthcare, where lives depend on the right decision. Even when no lives are immediately at risk, AI that is drifting away from its intended purpose can lead to unfortunate results, such as perpetuating systemic biases.
Cautionary tales also can be found outside healthcare. Consider Zillow, the online real estate marketplace that relied on AI to predict future home prices—until flaws were detected and its home-buying spree came to a sudden halt. As a result, Zillow shut down its Zillow Offers operation, laid off 25% of its workforce, took a $300 million write-down and watched its stock tumble.
In a shareholder letter, the CEO pointed to an AI-generated algorithm that was unable to predict home pricing accurately, resulting in Zillow “unintentionally purchasing homes at higher prices than our current estimates of future selling prices.”
To be fair, Zillow’s modeling exercise was a very difficult one: Predict home prices three to six months into the future amid pandemic-induced uncertainty while considering stochastic and non-stochastic factors. With such a complex problem to solve, it’s essential to scrutinize not only what AI generates but also why one modeling technique produces certain outcomes while another technique leads to different answers.
In the Zillow case, the company knew something was wrong and accelerated its home-buying spree anyway, The Wall Street Journal reported. That was not the technology’s fault; it was humans who decided to keep the momentum going in the face of growing competition from Opendoor, Redfin and others that were using algorithms to forecast home prices.
These lessons serve as a wake-up call that AI is not meant to run on autopilot. When an organization rolls out an AI innovation as a test, there is a high level of scrutiny. Everyone wants to add their feedback and take ownership. But when AI goes into production and scales up, important new considerations may go undetected because everyone’s focus has already turned to the next shiny new object.
That’s why deploying AI at scale demands a healthy respect for three things: the power of momentum, the responsibility that must be shared across an organization and the enigmatic nature of AI itself. These three unyielding forces influence the fate of every AI pursuit, no matter how big or small, no matter the industry:
- Momentum is great when it works to your advantage—and devastating when it doesn’t. When momentum keeps you moving in the wrong direction, it’s difficult to stop and redirect that action. It takes time for everyone to change course, build faith in the new direction and commit to good practices to keep it moving forward.
- Responsibility for efficient and ethical use of AI is a shared function across the enterprise. Executive leadership must understand AI’s limitations to make sound decisions, rather than blindly accept AI-generated conclusions and directives. Leaders should foster a culture that empowers everyone involved with AI to speak up if they spot something concerning. Everyone is responsible, not just management.
- Diversity in analytical thinking keeps AI models honest. AI is a black box whose inner workings are often misunderstood, difficult to explain and increasingly challenging to govern. Cross-checking results for a confluence of evidence helps ensure AI performs as intended.
There are many business platitudes about building momentum but little fanfare about halting momentum when something starts to go awry. Once a project is moving at a fast clip and nailing every target and deadline, no one wants to be the one to step up to say, “Let’s stop and re-examine what we’ve all accepted as fact.” People are even less willing to sound an alarm about their concerns when goals are being missed.
Dysfunctional momentum—when teams continue to press toward an established goal while ignoring warning signs—is hard to recognize and harder to arrest. To scale and succeed with AI, this must change. Everyone involved—from coders to the C-suite—should be empowered to raise a hand and be heard if they suspect dysfunctional momentum, without fear of repercussions.
One example of dysfunctional momentum is the space shuttle Challenger disaster. Though an engineer urged delaying the launch, citing safety concerns stemming from a design flaw in the O-ring seals, the shuttle went up as scheduled and broke apart about a minute after liftoff, killing the crew.
In her analysis of that fatal 1986 disaster, sociologist Diane Vaughan coined the term “normalization of deviance” to describe teams that become desensitized to unsound practices. The O-ring issue was not undocumented; engineers were aware of it. In a healthcare setting, this normalization of deviance from evidence-based practices is seen in lax adherence to infection controls like hand-washing, or in robotically silencing alarms in patient rooms without investigating why an alarm was triggered.
Consider these questions: Can the same drivers used to accelerate organizational change also be applied to slow down change that is undesirable? Can anyone who sees a problem arrest the momentum? The Andon cord concept from lean manufacturing is about halting a moving assembly line whenever anyone spots a problem. Access to this emergency brake is not a right; it is an obligation. Organizations should design their own Andon cord for AI at scale.
Can AI be trusted? With data modeling transparency, external validation and expert interpretation, yes. But only to a degree. While AI models learn and get smarter over time, they lack reasoning and the capacity to incorporate ethics, morality and human values.
That puts great responsibility on everyone involved with AI to understand what it can and cannot do. Decision-makers must recognize and weigh AI’s limitations when charting strategy and tactics. No single AI modeling technique is perfect. It’s important to encourage and reward more thoughtful and diversified analytical habits within organizations.
On another dimension, do not expect AI to do well something a human would never be able to do well. We can appraise the fair price of a home today, but we are likely to do a poor job predicting inflation two years out. By contrast, we can expect AI to power autonomous cars because billions of humans already drive well.
With AI, we need to watch for warning signals. And when a questionable outcome surfaces, investigate and act quickly to halt unhealthy momentum and pivot to a better place.
As AI data modeling techniques grow more sophisticated, it becomes increasingly difficult to understand the inner workings and explain exactly how AI solves a challenging problem or why it draws a particular conclusion. This enigmatic nature of AI makes it dangerously easy to accept algorithm-driven solutions at face value, without discretion.
For AI, it’s possible to resolve logic lapses and conflicting predictions by using simpler, more transparent analytical techniques. Teams should examine irregularities that merit deeper study before moving forward with an AI-driven course of action. Some teams use a committee-of-models approach in which each model’s result counts as a vote for the answer. If the answers from the models fall within a narrow range, there is little disagreement, signaling high confidence in the answer. If the disagreement is pronounced, that indicates lower confidence in the answer.
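As a minimal sketch of the committee-of-models idea, the snippet below takes one prediction per model and treats the spread of their votes as a confidence signal. The function name, the tolerance threshold and the home-price figures are all hypothetical illustrations, not taken from any real system.

```python
from statistics import median

def committee_confidence(predictions, tolerance=0.05):
    """Measure agreement among a committee of model predictions.

    Returns (consensus, confident): consensus is the median vote,
    and confident is True when the spread between the highest and
    lowest votes stays within `tolerance` (a fraction of consensus).
    """
    consensus = median(predictions)
    spread = max(predictions) - min(predictions)
    confident = spread <= tolerance * abs(consensus)
    return consensus, confident

# Three hypothetical pricing models vote on the same home.
print(committee_confidence([310_000, 305_000, 312_000]))  # narrow range: trust
print(committee_confidence([310_000, 250_000, 400_000]))  # wide range: flag for review
```

In practice the committee members would be models built with deliberately different techniques, so that agreement reflects a genuine confluence of evidence rather than shared blind spots.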
Developing a culture that is able to lead in this way requires building teams that are not only willing to speak up and take responsibility to understand AI’s limitations, but that represent a diversity of perspectives and approaches to analysis. Yet a look at data science job postings suggests a loyalty to, and perhaps an overreliance on, one specific AI forecasting and analysis tool. Seeking out candidates who are competent in just one AI technique—to the exclusion of others—puts organizations at a disadvantage because it discourages diversity of analytical thought. Without a variety of AI techniques, we only encourage AI to mimic our own human biases rather than generate better solutions. Organizations like diversity.ai offer inspiration about achieving inclusive, bias-free AI.
Unfortunate missteps in the use of AI may continue to surface from time to time, but these examples offer a teachable moment: an opportunity to look within our own organizations to find ways to improve our AI outcomes.
AI’s potential ignites the imaginations of creative problem-solvers who are passionate about hitting the accelerator to feel the power of innovation. While the fastest, slickest set of wheels on the road is irresistible, it’s important that we keep our eyes on the road ahead to make sure our ride gets us to our destination safely and on time, every time.