Sep 11, 2024

The ROI of Responsible AI Development in Healthcare

The use of AI in healthcare is receiving close attention from the White House, the FDA and care facilities across the world. It’s not hard to see why. According to the Association of American Medical Colleges, “the United States will face a physician shortage of up to 86,000 physicians by 2036.” Improved decision making in the healthcare space may help to save lives. However, the downside risks of poorly built or ineffectively deployed AI systems can also cause serious harm to patients, clinicians and caregivers.

The key question is: what separates a good healthcare AI system from a bad one?

We recently spoke with three experts in the field of healthcare AI to learn how responsible AI practices have helped their organizations improve products and generate positive outcomes for patients and other stakeholders. Mednition and Decoded Health, both SF Bay Area companies, have embedded responsible AI practices in their design processes from the start. The University Health Network is a healthcare and medical research organization in Toronto, Ontario that conducts research across a wide array of areas.

Christian Reilly, co-founder and CEO of Mednition, is the primary architect of KATE AI, the company’s flagship product. During our conversation, we spoke about how Mednition develops AI models, particularly for sepsis prediction.

Having followed the news about sepsis prediction models for the past several years, we found it notable that Mednition’s KATE Sepsis model had received “Breakthrough Device Designation” from the FDA. Given the negative attention that other sepsis models have drawn over the past few years, the headline stood out.

Key to the KATE Sepsis model’s success are the governance practices adopted by Mednition, which distribute responsibility for quality assurance and ethical practice across the team. Mednition employs a team of 10 clinicians to help its data science team decide what data and features to include in its models, which are updated quarterly. This philosophy of continuous improvement helps address dataset shift, a problem that has plagued other sepsis prediction models. Each clinician on the team has individual veto power if they believe a proposed change will not improve model performance. Reilly tells us that “every person on that team has the right to prevent a release from being used in a production environment.” This collaborative design approach is at the core of responsible AI development, and it’s a key component of Mednition’s practices. As Reilly notes, “Most data scientists are not licensed clinicians,” so communication between model developers and clinicians is critical to improved model performance.
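To make the dataset-shift point concrete, here is a minimal sketch of the kind of quarterly drift check a team might run. It illustrates the general technique only; it is not Mednition’s actual pipeline, and the feature and cohorts are hypothetical:

```python
# A minimal sketch of a quarterly dataset-shift check; illustrative only,
# not Mednition's actual pipeline. The feature and cohorts are hypothetical.
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_drift(train_values, recent_values, alpha=0.01):
    """Flag a feature whose distribution differs between the training
    cohort and recent production data (two-sample Kolmogorov-Smirnov)."""
    statistic, p_value = ks_2samp(train_values, recent_values)
    return {"ks_statistic": statistic, "p_value": p_value,
            "drifted": p_value < alpha}

# Hypothetical lactate measurements from two cohorts, with a small shift.
rng = np.random.default_rng(42)
train_lactate = rng.normal(loc=1.8, scale=0.6, size=5_000)
recent_lactate = rng.normal(loc=2.1, scale=0.7, size=1_200)
print(detect_feature_drift(train_lactate, recent_lactate))
```

When a check like this flags drift, a quarterly retraining cycle of the kind Reilly describes gives the team a natural point to recalibrate.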

Reilly’s description of Mednition’s model development process is reminiscent of the 2011 documentary “Jiro Dreams of Sushi.” Just as master sushi chef Jiro Ono meticulously sculpted each piece of sushi from fish selected with the utmost care, Reilly and his team have built each iteration of the KATE Sepsis model from carefully curated datasets and machine learning techniques. Through a painstaking process of data selection and quality assurance, with features and model weights determined by careful deliberation, the KATE Sepsis team has delivered a model that engenders trust among clinicians, and it is this care that generates the product’s value. The FDA’s Breakthrough Device Designation is a bit like earning another Michelin star. Thanks for bearing with the clunky extended analogy!

In addition to the collaborative design process between Mednition’s data science and clinical teams, the company performs rigorous QA on its datasets to ensure that every cell is machine readable and consistent, which further contributes to model accuracy. In datasets containing millions or billions of data points, this work is extremely time consuming.
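As a rough illustration of what cell-level QA involves, the sketch below checks that every value parses, falls in a plausible range, and uses a controlled vocabulary for units. The column names and thresholds are hypothetical examples, not Mednition’s actual schema:

```python
# Illustrative cell-level QA pass with pandas; the columns and rules
# below are hypothetical examples, not Mednition's actual schema.
import pandas as pd

def audit_vitals(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows that fail basic machine-readability and consistency checks."""
    problems = []
    # Every cell must parse as a number; coerce and flag failures.
    heart_rate = pd.to_numeric(df["heart_rate"], errors="coerce")
    problems.append(df[heart_rate.isna()])
    # Values must fall in a physiologically plausible range.
    problems.append(df[(heart_rate < 20) | (heart_rate > 300)])
    # Units must come from a controlled vocabulary.
    problems.append(df[~df["temp_unit"].isin({"C", "F"})])
    return pd.concat(problems).drop_duplicates()

records = pd.DataFrame({
    "heart_rate": ["72", "eighty", "450"],
    "temp_unit": ["C", "F", "celsius"],
})
print(audit_vitals(records))  # flags the unparseable and out-of-range rows
```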

The extra time and effort this team has put in is paying off. The additional curation, analysis, and deliberation that Reilly and his team invest in model development have earned recognition from the clinical community and the FDA. The positive impacts of these tools accrue to the caregivers who use them, the patients they serve, and to Mednition itself as it builds greater trust with all these stakeholders.

The trust that this work builds with Mednition’s customers is powerful. Unsurprisingly, “what matters more than anything to clinicians is model accuracy.” The KATE Sepsis model achieves a sensitivity measure that is 30%-40% greater than that of competing models, meaning it catches substantially more true sepsis cases and misses fewer patients who need treatment.
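To unpack the metric: sensitivity is the share of true sepsis cases a model catches, while the false positive rate drives the alert fatigue discussed below. A small worked example, with invented counts:

```python
# Computing sensitivity and false positive rate from a confusion matrix;
# the counts below are invented for illustration, not KATE's actual results.
def classification_rates(tp: int, fn: int, fp: int, tn: int) -> dict:
    sensitivity = tp / (tp + fn)          # share of true sepsis cases caught
    specificity = tn / (tn + fp)          # share of non-sepsis cases cleared
    false_positive_rate = fp / (fp + tn)  # the driver of alert fatigue
    return {"sensitivity": sensitivity,
            "specificity": specificity,
            "false_positive_rate": false_positive_rate}

# Hypothetical numbers: 90 of 100 sepsis cases caught, 50 false alarms
# among 900 non-septic patients.
print(classification_rates(tp=90, fn=10, fp=50, tn=850))
```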

Inaccurate predictions increase clinicians’ workload, contribute to alert fatigue, and lead to worse patient outcomes. Poorly performing predictive models deployed by other companies have undermined clinician trust in AI models.

It’s easy to see how poorly developed models lead to bad press and loss of user trust for other companies in the AI healthcare space. However, avoiding bad PR is merely the most obvious economic benefit for companies that are willing to invest in ethical design.

Mark Hanson, co-founder and CEO of Decoded Health, expressed strong alignment with this view. Decoded Health designs generative AI systems that support doctors and caregivers across a variety of functions. Billing its platform as a “force multiplier for physicians,” Decoded Health provides tools for the entire clinical team that assist with patient communication, medical intake, clinical documentation, and more.

When we interviewed Mark, he told us that one of the key benefits of responsible AI practices is that the “sales cycle is compressed because you’ve got a more refined, targeted articulation of value and differentiation in a language that your customer understands and believes.”

The products that Decoded Health deploys are designed with explainability as a core component. Mark tells us that “Explainability goes hand in hand with trust. Explainability is not just a recitation of all the information, it’s a cognitive thought process. That means that you have to spend a lot of time understanding the thought process of the user, and mapping your system to that thought process.”

Explainability, of course, is a key component of responsible AI practice and ethical design. By including this value as a core design principle, Decoded Health has been able to differentiate its product offering from similar tools built on top of other companies’ LLM foundation models. Mark continues, “the benefit is that if you design the system well, trust and explainability becomes a differentiable feature and capability in your system that improves the quality of its output.”
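As a generic illustration of how explanations can be surfaced per prediction, the sketch below ranks the features that drove an individual risk score using a simple linear model. It is a stand-in for the technique, since the article does not describe Decoded Health’s actual architecture, and the feature names are hypothetical:

```python
# A minimal sketch of per-prediction explanation via a logistic model's
# feature contributions; a generic stand-in, not Decoded Health's system.
import numpy as np
from sklearn.linear_model import LogisticRegression

feature_names = ["heart_rate", "resp_rate", "wbc_count"]  # hypothetical
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X @ np.array([1.5, 0.8, 1.1]) + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

def explain(patient: np.ndarray) -> list[tuple[str, float]]:
    """Rank features by their contribution to this patient's risk score."""
    contributions = model.coef_[0] * patient
    order = np.argsort(-np.abs(contributions))
    return [(feature_names[i], round(float(contributions[i]), 3)) for i in order]

print(explain(np.array([2.0, -0.5, 1.2])))
```

The design point Hanson makes carries over: the output is not a dump of all the model internals but a ranked answer to the question a clinician actually asks, namely what drove this alert.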

Whereas other companies might build healthcare support AI expecting that the foundation model will generate explainable results, Decoded Health has built its own models to ensure this. In addition to generating trust with users and customers, the company also avoids the type of technical debt that would require retraining models for greater explainability in the future. This kind of technical debt amounts to what AI researchers Casey Fiesler and Natalie Garrett have called “ethical debt.” They note that “When the bugs are societal harms, however, it isn’t seen as bad tech, but rather unethical tech.”

Christian Reilly from Mednition has similar things to say about the importance of customer and user trust in healthcare AI systems. “The [clinician community] is very collegiate,” says Reilly. “They talk and they share, and if we make a bad decision, push a bad model, everyone’s going to know about it.” This is why trust is so critical in medical AI development.

Reilly adds, “When you’re doing something in such a serious space as medicine, people will ask you to show them the evidence that this model is evaluated to do the right thing for the intended population. We started off by recording every [model we’ve ever built] and archiving it. So we have full traceability into every model we’ve ever built. I think that’s a natural part of the ethical development of AI for healthcare, which is part of our culture of truth, trust and transparency.”
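A lightweight version of the traceability Reilly describes might look like the following sketch, where every trained model is archived alongside a record tying it to its data, metrics, and clinical sign-off. The fields are illustrative, not Mednition’s actual record format:

```python
# Sketch of a model archive entry for full traceability; the fields are
# illustrative, not Mednition's actual record format.
import hashlib
import json
import pickle
from datetime import datetime, timezone
from pathlib import Path

def archive_model(model, dataset_path: str, metrics: dict,
                  approvers: list[str], archive_dir: str = "model_archive") -> str:
    """Persist the model plus an audit record tying it to its training
    data, evaluation metrics, and the clinicians who signed off."""
    Path(archive_dir).mkdir(exist_ok=True)
    data_hash = hashlib.sha256(Path(dataset_path).read_bytes()).hexdigest()
    version = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    with open(f"{archive_dir}/model_{version}.pkl", "wb") as f:
        pickle.dump(model, f)
    record = {"version": version, "dataset_sha256": data_hash,
              "metrics": metrics, "approved_by": approvers}
    with open(f"{archive_dir}/model_{version}.json", "w") as f:
        json.dump(record, f, indent=2)
    return version
```

Even a simple record like this lets a team answer, years later, which data and which sign-offs stood behind any model that ever reached production.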

In addition to the value of model accuracy, clinician input also supports the effective deployment of AI products in the clinical setting. It’s always important to remember that there is more to AI technology than the machine learning model itself. The product and the socio-technical context into which the product is deployed are also critical components of a system that leads to better patient outcomes.

As Madeleine Clare Elish and Elizabeth Anne Watkins said in their study of AI integration in the clinical context, “Technological systems don’t exist in a bubble. They require a complex interaction of humans, infrastructure, and organizational structure to work effectively.”

When AI tool developers work closely with their clinical teams to understand the context in which the tool will be deployed, caregivers have an easier time achieving positive outcomes.

Wanting to learn more about the role of clinician data scientists, we also interviewed Robert Grant of the University Health Network in Toronto, Ontario. Grant is a medical oncologist and clinician investigator in cancer research. He underscores a key point that Elish and Watkins identified in their research:

“In cancer care, what the model predicts and what the next action for the clinical team should be are not necessarily obvious,” says Grant. “I spend more time worrying about the sociotechnical context of the clinical deployment than anything else. The system needs to make the life of the clinician easier in order to be effective. That’s especially true in a context where other AI products have been deployed ineffectively, leading to frustration on the part of clinicians.”

Just as Hanson and Reilly have pointed out, the explainability of AI systems’ decision making is critical to engendering trust. These systems don’t make predictions or recommendations in a vacuum; they operate in a sociotechnical context that includes a massive number of variables. To be used effectively, they must be designed and deployed so that their outputs make sense to all the decision makers in the situation.

The cost of developing these technologies carefully among a community of clinicians is non-trivial. While no product can be developed well without user feedback, both Reilly and Grant commit significant additional time and resources to ensure that clinicians provide critical input throughout development.

Reilly is open about the additional costs Mednition accepts for this attention to detail: extra headcount, a lengthier quality assurance process, a longer time to market, and greater storage and compute costs are all part of the overhead Mednition carries to realize its commitment to ethical design.

Despite these additional diligence costs, or perhaps because of them, Reilly says Mednition’s growth is strong, with projections in line with high-performing SaaS companies over the next five years.

In recent years, several large consulting firms have documented the economic benefits of responsible AI practices. Bain & Company, for example, has shown that “companies with a comprehensive, responsible approach to AI earn twice as much profit from their AI efforts.”

When companies have earned the trust of their stakeholder community through careful, collaborative design, they reduce customer acquisition costs, increase lifetime value per customer, cut churn, and grow market share. The overheads associated with ethical design are part of Mednition’s organizational culture and are instantiated in the team’s practices. Although these costs are non-trivial, they pay dividends through the economic impact of user trust. Companies that adopt these practices may need additional runway as they grow, but the likelihood of long-term economic success increases when companies and investors are willing to adjust their expectations around the speed of return.
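To see how trust-driven retention compounds economically, consider the standard back-of-the-envelope lifetime-value arithmetic; all figures here are invented for illustration:

```python
# Back-of-the-envelope customer lifetime value; all figures are invented
# for illustration, not drawn from any company mentioned in this article.
def lifetime_value(annual_revenue: float, gross_margin: float,
                   annual_churn: float) -> float:
    """Approximate LTV as annual margin dollars divided by churn rate."""
    return annual_revenue * gross_margin / annual_churn

baseline = lifetime_value(100_000, 0.80, annual_churn=0.20)  # $400,000
trusted = lifetime_value(100_000, 0.80, annual_churn=0.10)   # $800,000
print(f"Halving churn doubles LTV: ${baseline:,.0f} -> ${trusted:,.0f}")
```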

Looking at the negative press that large EHR companies in this space receive when they push bad models and misrepresent their clinical accuracy, it’s not hard to predict what kinds of companies will be leading this industry in five years. This kind of careful development is critical in healthcare, where lives are on the line, but we don’t have to look far to see how careful design positively impacts any AI model. Developers who are willing to put in the extra time and effort to ensure that their models outperform their competitors’ will reap economic rewards. Responsible AI practices such as ethical co-design and stakeholder impact mapping should be at the center of any AI development organization.

Learn How You Can Get ROI from Responsible AI