ChilCast: Healthcare Tech Talks

Building Responsible AI for Healthcare with Dr. John Halamka, Suchi Saria, and Jody Ranck, DrPH

Chilmark Research

On this episode of the Chilcast, we tackle the main topic of 2023 – the responsible development of AI for healthcare use cases – with three of the leading voices in the industry: John Halamka, Suchi Saria, and moderator Jody Ranck.

Key takeaways from this episode:
- Our experts highlight the need for a balanced approach to regulation and innovation – we need to ensure safety and ethical development without stifling new advancements.
- It is critical that diverse perspectives and voices be included in both the regulatory and development processes.
- Collaboration and continuous learning are crucial for responsible AI implementation in healthcare.

Chapters:
0:00 Introductions
4:25  Mayo Clinic Platform
7:02  Overview of The Coalition for Health AI (CHAI)
8:31  Bayesian Health Overview
11:12  Current Regulatory Discourse
23:24  Managing Tradeoffs and Social Impact
30:19  Accelerating Evidence-Based Medicine
34:30  Who Gets a Seat at the Table?
40:41  Can "Hallucinations" be Valid Hypotheses?
42:25  Evaluation as Part of the Innovation Cycle
50:41  What is the Threshold for AI Accuracy to be Implemented in Care Decisions?
54:25  Regulation Impact on Innovation
56:39  Regulation of Applications vs Models



John III: [00:00:14] Hello everyone and welcome back to the ChilCast. My name is John Moore. I am the managing partner of Chilmark Research, which is a digital health focused industry analyst firm, and I'll be your host for today's broadcast. We're trying something new today, recording this episode with a live audience so we can incorporate questions from our viewers on this mission-critical topic of responsible AI development and deployment in health care. For those watching via LinkedIn, please drop your questions into the discussion and we will work them in when appropriate. I am honored to bring together our experts for today's discussion: Jody Ranck, Suchi Saria, and John Halamka. So, quick intros to each of these panelists. Jody Ranck, Doctor of Public Health, is a senior analyst here at Chilmark Research and our in-house expert for topics at the intersection of technology and public health. Dr. Ranck has nearly 30 years of experience working in the global health arena. He has been a frequent advisor to large health care companies and startups focused on providing more patient-centric care and transitioning to value-based care. He has been appointed as a member of an Institute of Medicine Committee on ICTs in Global Health and Violence Prevention, and helped launch a major global e-health initiative with the Rockefeller Foundation.

John III: [00:01:27] He is the author of two books on digital health and a recognized global thought leader on the topic. Jody has written on the growth of AI in health care for almost five years for Chilmark Research, and was the primary author of our recent book collecting some of these articles, Building Responsible AI in Health Care: A Journey into Automation, Ethics, and Trust. Given his deep knowledge and anthropological perspective on the impact of technology in society, he is going to be our moderator for today's discussion. Suchi is the CEO, Chief Scientific Officer, and co-founder of Bayesian Health, whose adaptive AI platform integrates multimodal, structured and unstructured data streams, forecasts declining trajectories, and sends actionable clinical signals via EMR-integrated copilot workflows, delivering many-fold higher accuracy and adoption than common tools. The platform was recently selected as a two-time Best Inventions honoree. In addition to leading Bayesian Health, Suchi is a chaired and tenured associate professor at Johns Hopkins University, directing the Machine Learning and Healthcare Lab, and is the founding research director of the Malone Center for Engineering in Healthcare. She has collaborated with the FDA, CDC, NIH, DARPA, and other leading federal agencies on advancing the fundamentals of AI to improve safety, quality, and trustworthiness.

John III: [00:02:41] For this work, she has received many awards, including being selected as both an emerging leader in health and medicine by the National Academy of Medicine and a Young Global Leader by the World Economic Forum. She serves on the National Academy of Medicine's AI Code of Conduct initiative, and along with our other guest today, John Halamka, Suchi is also one of the founding members of the Coalition for Health AI, currently serving on its steering committee. And finally, I'm pleased to introduce John Halamka, who is currently the president of Mayo Clinic Platform and a co-founder of the Coalition for Health AI, both efforts very much driven by the need for responsible development of these technologies within the health care vertical. Trained in emergency medicine and medical informatics, Dr. Halamka has been developing and implementing health care information strategy and policy for more than 25 years. Prior to his appointment at Mayo, he was Chief Information Officer at Beth Israel Deaconess Medical Center, where he served governments, academia, and industry worldwide as the International Healthcare Innovation Professor at Harvard Medical School. Dr. Halamka helped the George W. Bush administration, the Obama administration, and governments around the world plan their health care information strategies. Without further ado, I will let Jody take over.

Jody Ranck: [00:03:52] Well, thank you, John, and welcome everybody to our webinar today. Before we launch into some of the nitty gritty of responsible AI in health care, why don't we begin with each of you giving a little bit of an introduction to your respective institutions, and what are some of the main highlights of the work you're doing that have implications for responsible AI? So I thought maybe we could begin with John, and then Suchi can follow, and then we'll launch into our discussion.

John Halamka: [00:04:25] Sounds great. And Jody, so wonderful to see you again. As you know, when you talk to hospital CEOs around the world, they don't say, you know what, I need an algorithm. They say, I have three challenges. One challenge is margins are negative; supply chain costs and labor costs are high. Do you have anything that can help me with that? I'm having a hard time hiring and retaining specialty doctors and nurses. Do you have anything that can augment their efficiency? Oh, and you know, everyone's burned out. Do you have anything that can reduce burnout? And so what you start to see is this need for what our friends at the AMA would call augmented intelligence, not artificial intelligence, to help our clinicians practice at the top of their license. And all of it depends on data. So Mayo recognizes that every country has regulatory constraints and cultural expectations about the secondary use of data. So if you're going to create fair, appropriate, valid, equitable, and safe models and you're going to curate the world's data, you have to be extraordinarily sensitive to each society's constraints around that use of data.

John Halamka: [00:05:41] So Mayo Clinic Platform was founded with the notion that we could de-identify data, place it in secure containers, and bring collaborators together on a global basis. We started with the US, then went to Brazil, Canada, and Israel; in the last seven days, Denmark, the Netherlands, Belgium, Norway, and Sweden; and a few days ago, Singapore and South Korea – so far assembling about 42 million longitudinal, multimodal records for the generation of models and the testing of models. And if you look at the next year, it's 22 additional countries in Asia, the Middle East, and Africa. Each of these countries keeps data in their own locale, under their own control, and they simply use a common set of tooling to ensure it's normalized – it's in a standard schema and a standard vocabulary. So truly, the role that Mayo Clinic has played is just as a convener and facilitator to ensure that the world can be a collaborative ecosystem for the model development that is going to solve some of these basic business problems we hear about, which are certainly not specific to the United States.

Jody Ranck: [00:07:01] Great. And did you want to give us sort of a 20,000-foot view of CHAI, which you helped launch to address some of these issues around responsible AI – just where CHAI is right now, and what are some of the goals you see coming out of CHAI, the Coalition for Health AI?

John Halamka: [00:07:23] Yeah. So about two years ago, we recognized that if there was going to be this explosion of predictive models, how do we know how to judge them? How do we build credibility? Casey Ross at STAT wrote this wonderful article saying, you know, AI has a credibility problem in health care. If you ask clinicians, they say, wait, there's no transparency. I have no idea how this thing was created, and it certainly doesn't seem like it's reliable and consistent. So how do you get to the place we need to be, so these things are adopted? And that's where the Coalition for Health AI has brought together now about 1,200 organizations – public, private, government, academia, and industry – to come up with guidance and guardrails so that as a society we can move forward. And how about this: do no digital harm. And that's what we try to do every day.

Jody Ranck: [00:08:17] Great, great. Well, thank you. And Suchi, since you're pretty much at the forefront of implementing ethics and AI on the front lines, why don't you tell us a bit about Bayesian Health and the work you guys have been doing?

Suchi Saria: [00:08:31] Yeah, absolutely. So going back to the priorities John laid out in terms of where provider organizations, health systems, and key health care stakeholders are struggling – declining margins, burnout, staffing shortages, the need for ways to unburden frontline clinicians to make care more efficient and effective. Part of that, as John laid out, is our ability to leverage the data that exists, and the work that we're doing to lay out guardrails – the work John is doing through Mayo Clinic Platform to bring people together, to start thinking about how they put their data to use. And then at Bayesian, what we're doing is building and implementing solutions that leverage this data live and locally at institutions to put it to use. And how do we put it to use? It's the ability to leverage structured and unstructured data to identify, in a number of these high-value use cases, where you can augment frontline care teams with high-quality, validated, trustworthy care signals, along with workflows that make it easy for them to do the right thing. And some of this comes from having worked in this field over the last two decades – and when I say this field, I mean AI: building the technical foundations of AI and then putting AI to use in the context of care delivery.

Suchi Saria: [00:09:57] The data that we see within EMRs and clinical settings are far more complicated than the kinds of data AI has been put to use on elsewhere. And so in the last decade, a lot of our research has been around what AI done right looks like. How do you draw high-quality, reliable, trustworthy inferences from this kind of data? How do you deliver it in a way that enables the kind of human-machine collaborative inferencing that needs to happen, to make sure we're leveraging the strengths of what AI is good at versus what the clinical teams, who are the experts, are good at? And how does AI done right actually improve quality, efficiency, and efficacy? And so Bayesian very much partners in areas where we build tools that are rigorously validated. Some of these go through the regulatory framework – in some clinical areas there's an existing framework through the FDA for software as a medical device – and we go through that framework to build end-to-end, rigorous pipelines that health systems or user groups can leverage in a high-quality, reliable, trustworthy fashion.

Jody Ranck: [00:11:11] Great. I think mentioning the FDA makes this a good time to shift to talking a bit about the regulatory discourses that are out there. You know, that's been top of mind this year with the explosive growth of generative AI and all the doom and gloom around that. So I thought maybe we could have a more nuanced discussion, perhaps beginning with the Biden executive order and what your takes are on what the White House is thinking, and then any thoughts about where the FDA is with respect to AI?

John Halamka: [00:11:44] Sure. And as you know, I'm an apolitical person; I've served both Republican and Democratic administrations. I will tell you that the process by which that executive order was generated was extraordinarily inclusive. Suchi was there, I was there. There was a series of meetings that took place this fall, every other Friday, in the Eisenhower Executive Office Building or in the West Wing, hearing from multiple stakeholders as to what society should do to make sure that this doesn't get out of control, because you've seen many authors say, really, the future of AI, it's all up to us. And so what you see in the executive order: 111 pages, health care mentioned 33 times. And in there it suggests that in 180 days, HHS should set up some public-private constructs and should start to look at how we might create a network of assurance labs. How do we ensure that – you know, I know that probably your listeners won't look under their lamps for the Underwriters Laboratories seal, but the idea was, your lamp isn't going to blow up, because it has passed an Underwriters Laboratories test. And maybe 100 years ago that was important and people knew about it. AI is now at that stage where people are warning it's going to blow up, it's going to take over the world, it's going to cause harm.

John Halamka: [00:13:09] It needs a sticker. And in effect, the executive order starts to lay out not the how but the what: the fact that we're going to need some sort of oversight, a public-private partnership coming together for that. It very much aligns with CHAI, though it doesn't enumerate CHAI in any way. I mean, we wouldn't expect any government activity, whether executive, regulatory, or legislative, to name an organization, but it's the sort of function that CHAI performs that seems to be laid out in the executive order. And so we look forward to these next few months, hopefully serving as a convener. And potentially we'll see the naming of an authorizer, who can then go select testing labs. And then those labs produce an artifact, and we end up with a nationwide registry of all these AI algorithms used in health care, so that transparently we can understand if they're going to be good enough for the patient in front of me. And the executive order obviously does include things outside of health, like let's never use AI to create a biological weapon. And so some pretty straightforward recommendations that will ensure that AI serves man and does not harm man.

Suchi Saria: [00:14:30] Yeah, to me – very much agreeing with everything John said – a couple of additional pieces. It was really exciting to see. I mean, AI has been making very consistent, rapid progress over the last decade, and there's just so much good that can come from doing it well and doing it right. And it's had a bit of fits and starts before, at times when a few organizations have overhyped it, added too many marketing claims without the rigor and the guardrails behind them, and things have gone awry. So to me, it's, one, very exciting to see President Biden and the across-the-board agency participation recognizing the good AI can do, and therefore really thinking about how we make sure this time around we shepherd it so that we really can bring the good with safe, responsible implementation, so that we don't run into walls or false starts. And so it really is about a trade-off: putting enough guardrails in place so that we really can safely innovate. I think going back to health care – health care in particular is actually ahead of the game compared to other sectors. In health care, we've had the FDA for a long time, and as a result, we've had a rubric for thinking about safety, quality, efficacy – things that are core to validating.

Suchi Saria: [00:16:01] So as an AI researcher, when I look at AI tools implemented in other sectors versus health care, I see that in clinical care the framework is already very mature and quite rigorous. So there's an opportunity here. I mean, I think what the EO does is expand well beyond clinical to all areas of AI within health care – administrative tools, all sorts of patient-facing tools that traditionally wouldn't fall under the rubric of the FDA. They want to understand: do we have a game plan? Because there's just as much harm you can do from poor implementation of administrative tools. And part of it is, let's just approach every one of these in a thoughtful fashion. And that's really what the EO is about. And, you know, now the key part is that the devil is in the million details of how this is going to play out. CHAI, of course, is one organization – hopefully there are a lot of things that CHAI is doing well – but really it's hopefully going to be a very inclusive approach to laying out the how.

Jody Ranck: [00:17:05] You know, one of the things that's coming up for me recently, in attending some of these meetings and sitting in on the discussions around frameworks, is that we're beginning to see the proliferation of certifications. For example, IEEE, the Responsible AI Institute, CHAI, and the National Academy of Medicine are all looking at this. Are there any dangers in everyone having their own standards for certification for different use cases and so forth? And then you have the differences between the EU and the US. And there's always this tension around what's the appropriate level of regulation so that we don't cause harm and we eliminate many of these biases that still continue to plague us. I mean, just look at the recent issue with UnitedHealth and Optum's algorithm used to deny Medicare Advantage claims, or the kidney function and liver function algorithms – all these different algorithms, some of them AI, some not, that have racial bias or gender bias of different sorts built in. So let's maybe talk about your take on these certifications. But then, at the end of the day, even if we have frameworks and so forth, what are the other things we can do to make sure we don't have some of these mess-ups like we've had – you know, the Epic sepsis algorithm; we could go down a whole list of them – that have damaged trust in the eyes of many people?

John Halamka: [00:18:42] Well, Suchi, why don't you start this one? I mean, Suchi and I are like brother and sister. You know, we live this every day.

Suchi Saria: [00:18:51] I think, to me, what is again really, really exciting is the rapid pace at which key stakeholders are coming together to collaborate to make sure we have a framework ecosystem in place, to ensure that we're moving towards a culture of doing it right. What this means is, you know, it's almost better to have too much participation than no participation, right? So lots of different groups have stepped up. Lots of groups are bringing together experts. Sometimes it almost feels like maybe there are more certifiers than there are experts who actually know how to lay out what is even right or wrong. So in a way it does feel overwhelming, but on the flip side, it's really exciting to see so much energy. I think, very fundamentally, there are a couple of pieces that are very important for us to keep remembering. One: a learning-based mindset. It's very important for us to stay open. We don't want this to turn into: here are the seven things. The reality is the field is evolving, and the seven things ought to be evolving. So it's really important for us to keep an open mindset on what the metrics are, not to overengineer it in a way that we basically stifle innovation. In a learning mindset, you're curious, you're monitoring, you're watching, you're measuring, and you're understanding as you go, as opposed to checking off the box.

Suchi Saria: [00:20:13] And so that's the approach we take. And again, going back to my research roots: on the clinical side, there's a lot more that's right than not, and if you actually follow best practices, we have very nice, mature frameworks in place. So the question is more about standardizing the implementation of it – can we make sure all developers are following these as opposed to skirting these? And from a certification perspective, I think it remains to be seen. What keeps me awake at night is: do we end up in a scenario where we bring a group of people together, come up with 200 metrics that we're obligating every single developer to meet, and it becomes more of an exercise in checking the box than actually doing the right thing? I don't know if anyone has the answer to that, and I think we have to make sure in the next few years we are really weighing the trade-off between the cost of developing and implementing these rubrics and the burden it puts on developers, so that there's a good balance and it's practical. John, I'll let you respond. Go ahead.

John Halamka: [00:21:27] Yeah. So if we ask the question, what is the role of government – right, people always debate where government is going to help us without slowing innovation – government can be really good at helping us understand what to measure, and then turning it over to private industry to do the measurements. And we'll see whose measurements, whose reporting, whose services are going to provide the greatest impact on industry. I think back – and I know, John and Jody, this is not the best metaphor – but back in the electronic health record meaningful use era, there was a notion that there should be multiple labs that help you with the evaluation of your record, and you go to the lab that you think is going to help you the most. And that's okay, because the government has told you what the standards for measurement are without prescribing the how or the services. So what I think you're going to see – and this is just a guess, right? We know there's an executive order, but also, coming December 14th and 15th at the ONC annual meeting, presumably that proposed rule from last April will be finalized. I'm guessing, right? No one ever knows these things. And so with the formalization of these rules, we'll start to see the more granular what – what these labs, what these firms, what these certifiers should do. And I think it's going to be – hey, I'm guessing – Mayo will do one, Duke will do one, Stanford will do one, Bayesian could do one. There's going to be a whole ecosystem of folks, but we'll all follow the same rubric and just add different kinds of visualizations and services to it. So it will ultimately all be okay.

Jody Ranck: [00:23:25] You know, I'm a social scientist – we look at how things actually get implemented versus how people say they get implemented, and how institutions change and so forth. And sometimes when you talk to data scientists, you know, they're building these models. If we look at responsible AI, for example, you have the validation, then explainability, bias audits, down to looking at health equity and fairness and so forth. And a lot of the really interesting ethical decisions that may fall below grand frameworks and so forth will be the decisions about, okay, if we really do the extra work to get rid of a certain amount of bias, how does that impact the performance of the algorithm? So there are different thresholds and ranges of things that we need to tolerate and accept – or sometimes just not do it if it doesn't work out. You know, I think if we could get more insight into how people make these decisions, and who makes them, for whom, for what outcome, at the granular, everyday level of building models – beyond the large frameworks and so forth – that would be really helpful to people in the trenches developing the models and developing that implementation science. I don't know, Suchi, if you've had experiences like that, where it's a difficult decision to make around what threshold of whatever, whether it's around bias or fairness or performance of a model.

John Halamka: [00:25:09] Yeah.

Suchi Saria: [00:25:10] I think one way to think about this is: ultimately, what is or isn't ethical or biased is partly a cultural question. You want to go into an institution and think about what are the principles by which the key stakeholders want to practice, and what does or doesn't make sense. So let me give you a very simple example. If a solution works very well in a majority population but doesn't work so well in a very particular minority population, there are two ways to think about it. One way is: we're not going to use it at all, because it doesn't work well in a particular minority population. Another way is to say: we're going to use it in the populations where it works well, and put a constraint or a communication around it – it doesn't work well here, therefore we won't be using it here. Right? And those are two different value frameworks from which people are operating, and we have to figure out what makes sense in that particular context. You could imagine maybe it's more equitable if you don't use it at all for anybody. Or you could imagine it's more fair that where we have data to support that it works in particular subpopulations, we should use it in them, and then we should make sure wherever we're using it, we're using it correctly.

Suchi Saria: [00:26:31] And so those are the kinds of value-framing statements that are very, very important to identify upfront, and then you work backwards to design solutions that support them. And to me, the three key places where I often see things go wrong are: one, we've never understood exactly what the intended use was – we don't understand what we were hoping to achieve in the first place. Second, even if we understood what we were hoping to achieve, we're not designing the solution to achieve that. And third, we don't have any measurement in place – we don't measure things rigorously enough to know and understand whether we're getting there. A lot of the bias-related questions often go back to: our data is biased, health care data is biased. And part of the reason health care data contains all these biases is because our current practice is biased. But to me, one of the most exciting opportunities here – again, wearing my researcher hat – is that there are all these techniques for debiasing data and debiasing algorithms, which bring forward this ideal of how, done right, AI can actually augment care teams in the right way to debias current human practice. And that, I think, is where the real opportunities lie.

John Halamka: [00:27:45] Mhm. So Jody, let me ask you a controversial question. Okay. Suppose there is a biased algorithm – because every algorithm has biases. And we have a wonderful cardiology algorithm that works great for people with a body mass index less than 31, and really not so great for people with a body mass index above 35. Should you throw away the algorithm and say it should never be used? Or should you put a big warning label on the algorithm that says: only for use in patients with body mass index less than 31, where its AUC is 0.92? So I think what we're going to see – and this is early, early days – but I am hearing commercial companies saying, wow, we'll do validation, we'll do subgroup analysis on race, ethnicity, age, gender, geography, we'll identify the bias, and then we'll mitigate the bias. It may actually, to your point, Jody, be impossible to create something that works for every single person in every single circumstance. So it's sort of: okay, just use it for these people in this circumstance, and you'll get a good result.
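For readers who want a concrete picture of the subgroup analysis Halamka describes, here is a minimal sketch, assuming a validation table with a BMI column, model scores, and binary outcomes; the column names, thresholds, groupings, and file name are illustrative assumptions, not Mayo's or any vendor's actual pipeline:

```python
# Minimal sketch of subgroup validation: compute AUC per subgroup and flag
# groups that fall below a pre-declared performance floor. Column names,
# thresholds, and groupings are illustrative assumptions only.
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_auc(df, group_col, score_col="model_score",
                 label_col="outcome", floor=0.80):
    """Return AUC per subgroup and whether it clears the declared floor."""
    rows = []
    for group, sub in df.groupby(group_col, observed=True):
        if sub[label_col].nunique() < 2:   # AUC is undefined without both classes
            continue
        auc = roc_auc_score(sub[label_col], sub[score_col])
        rows.append({"group": group, "n": len(sub),
                     "auc": round(auc, 3), "meets_floor": auc >= floor})
    return pd.DataFrame(rows)

# Band BMI the way the hypothetical cardiology example was framed.
validation = pd.read_csv("validation_cohort.csv")   # hypothetical file
validation["bmi_band"] = pd.cut(validation["bmi"],
                                bins=[0, 31, 35, 200],
                                labels=["<31", "31-35", ">35"])
print(subgroup_auc(validation, "bmi_band"))
```

The specific numbers matter less than the practice: identify where performance drops, then either mitigate the gap or exclude that subgroup from the labeled intended use.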

Jody Ranck: [00:28:58] I think that sounds great. I just think the main thing we need to think about is: will there be populations that don't get algorithms at all – you know, the fairness overall? Will those other populations get the same level of care?

Suchi Saria: [00:29:14] I think we already have a framework in place, in a way, when you think about drug development, right – rare diseases, basically subpopulations. And we need to learn from the past, right? Rather than replicate things as is, we need to think about what has and hasn't worked. In a way, labels on drugs help you identify which populations a drug is intended for and which populations are excluded, and what John is describing fits right into that framework. I think one of the questions here is: how do we incentivize the adoption of AI across the board, as opposed to just for the majority groups? And part of that now comes back to reimbursement. I think today there are areas where there is incentive for the use of AI – like, you know, problems that are already burdening provider systems. But there are all sorts of areas where we really need to create a reimbursement-based rubric. A big opportunity today is to reimagine how reimbursement could help incentivize the building of AI for all, and good AI for all.

Jody Ranck: [00:30:19] While both of you were talking, it brought up in my mind this issue of evidence-based medicine, where we often talk about it as the solution. But the evidence itself often follows a bell curve, right? So different people could be on different tails of that bell curve. And I think with some of the AI tools we can provide better precision, so it's not just treating people at the mean but treating them where they are – another kind of evidence-based medicine, right?

John Halamka: [00:30:52] Yeah. So let me tell you a story which Suchi will find amusing. I have glaucoma. I have glaucoma because my father had glaucoma and his father had glaucoma, and you can't choose your parents. And so I went to Mayo Clinic because my vision in my left eye is declining a bit faster than you would like. And Mayo Clinic did full MRI imaging of my orbits and brain, and they came back and said, your brain is abnormal – which Suchi could have told you anyway, right?

Suchi Saria: [00:31:24] I see what you said, John – that I would find it amusing.

John Halamka: [00:31:27] I said to Mayo Clinic, do you have any idea what it means to have an abnormal brain? And they said, well, you know, our best radiologists have looked at a few thousand images in their lives, and compared to those few thousand images, yours is different. Well, okay, there are 8 billion people on the planet. What if we took every MRI in the history of mankind, put it into a giant cloud, and ran TPUs on it to create three-dimensional visualizations of ventricular size, sulci, you know, various kinds of measures, and then said: this is the mean. Then you would understand, if you were in a Gaussian distribution, whether you were at the edges, as you point out. Oddly enough, we did that. It took 20,000 TPUs three weeks, and now Mayo Clinic has the largest-in-history model of what a normal brain is. And it turns out my brain is completely normal. It just doesn't happen to sit right at the mean, but it certainly sits within one standard deviation of variance. And the problem is, as doctors, right, we see a few thousand patients in our careers. We don't have the benefit, as AI does, of looking at millions to billions and understanding what is acceptable variation. You also asked a question about the underserved, because everything I've just said presupposes there's data on the humans that you want to serve.
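As a toy illustration of the "within one standard deviation" check Halamka describes, here is a minimal sketch; the reference distribution, measure, and numbers are all made up for illustration and have nothing to do with Mayo Clinic's actual normative brain model:

```python
# Toy normative-model check: compare one patient's measurement against the
# mean and standard deviation of a large reference population.
# All numbers and the measure itself are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
# Pretend reference population: ventricular volume (mL) across many scans.
reference = rng.normal(loc=25.0, scale=4.0, size=1_000_000)

mu, sigma = reference.mean(), reference.std()
patient_value = 28.1                     # hypothetical single measurement
z = (patient_value - mu) / sigma

print(f"z-score = {z:.2f}")
print("within one standard deviation of the population mean"
      if abs(z) <= 1 else "outside one standard deviation")
```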

John Halamka: [00:33:01] What if you don't have an electronic health record? You know, it's interesting – I just had an exercise done by one of my staff on sub-Saharan Africa, where we actually looked at every country and asked, where is there no data? And what did we find? We originally looked across Southeast Asia, the Middle East, and Africa – 75 countries – and we actually only found 22 countries where there is a robust representation of patients' digital data. So there are huge gaps in populations. I'll leave you maybe with one story. Ten years ago, a large tech company asked to come to Boston to visit the Medicaid clinics and see how the underserved receive their care – these people often undocumented, not so good with English, not trusting of government. And these engineers walked up to an 85-year-old homeless man and said, what's your favorite wearable? And he said, socks. And so this is a person for whom you're not going to have an iPhone 17 and an Apple Watch Series 12 gathering telemetry. He will have probably little to no data from which an algorithm can be developed. So this notion of being careful about countries and people that are just outside of the system is so important.

Jody Ranck: [00:34:28] I believe they refer to that as data poverty. That, and the way we collect the data as well – sometimes the categories and framings of it are wrong, and we suck them up into our algorithms. You know, to build on what we passed through here a little while ago: a lot of these meetings around these frameworks and approaches to responsible AI, the ones I've seen, tend to have big institutions and big tech companies there. What about historically Black colleges being there? And the smaller universities that don't have the computing power to engage with AI? And then maybe we can shift the discussion, before we go into questions, to what generative AI means for this, because that's the computing power hog at the moment. So, who's not at the table that needs to be at the table in some of these discussions? And then, what do we do with generative AI in health care and medicine too?

Suchi Saria: [00:35:30] Two great questions. I can start with the first one. Honestly, as someone who's often in the room as sort of a minority voice – you know, I got into tech very, very young and very early, and was very often probably 20 years younger than most people around me, often the rare person of color, and female, because most of tech is led by men – it's been almost what feels like a lifelong experience of being the minority in the room and understanding what that experience is like. I think one of the big, big things we absolutely want to make sure we do here is bring the voices of people from very diverse communities to participate. I'll just give some examples. As part of CHAI, what we've been doing is creating somewhat of a hub-and-spoke model, right? Like John referred to, there are 1,200 or so organizations that are now part of it, and they each have a seat at the table. Now, part of it is: how do you go to a room and listen to 1,200 people? That's pretty hard to do, right? So some of it is in the governance model.

Suchi Saria: [00:36:44] We'll have to figure out how we give everybody an equal voice, how we make sure that perspectives are represented. Shifting gears, talking a little bit about Bayesian: the way we've thought about it is, when we're going and implementing, we're bringing this kind of model that can be implemented not just at the top leading AMCs, where they're resource-rich and have leading researchers sitting at the table critiquing everything, but really at community organizations, where they may not have experts in-house, and what they really need is the ability to solve their problems confidently and in a way they can trust that we're going down a path where we're going to succeed together. And some of that is, you know, we're bringing rigorous, validated tools and partnering with them to deliver them within their organization. What does that mean? That means partly a governance framework. That partly means we're not just leaving them with an algorithm. It's a solution where we get to monitor: when we first implement and integrate, we first measure in the background how the solution is doing in their care environment, and where the gaps are in their care environment. Then we can tune to their care environment to improve within their cultural context.

Suchi Saria: [00:37:53] What makes sense? We educate around it, to say what we're seeing in terms of what's working and where the limitations are, and then, in an ongoing way, we monitor to understand whether things are staying performant and, as things like drifts and shifts occur, how we're going to adapt over time. And so that adaptive framework is very much where I think the field is headed overall, and I'm excited to see health AI overall heading there – all of the new regulations, ideas, and recommendations that are coming out are pointing in that direction. And so some of this is: how do we go to a rural health system or a rural hospital and bring this kind of infrastructure to them at a cost that's affordable, so they're not sitting there saying, now do I need to stand up a big group and pay for all sorts of people to become experts in AI? From their point of view, they need solutions. They don't need to become experts in every piece of this puzzle.
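One simple pattern behind the ongoing monitoring Suchi describes – measure performance locally at go-live, then keep watching for drift – is a rolling-window check against that baseline. The sketch below is a rough illustration of the idea only; the metric, window size, threshold, and names are assumptions, not Bayesian Health's implementation:

```python
# Sketch of post-deployment drift monitoring: track a metric over a rolling
# window of recent, adjudicated predictions and alert when it falls too far
# below the baseline measured during the initial "silent" evaluation period.
from collections import deque
from sklearn.metrics import roc_auc_score

class DriftMonitor:
    def __init__(self, baseline_auc, window=500, tolerated_drop=0.05):
        self.baseline_auc = baseline_auc
        self.tolerated_drop = tolerated_drop
        self.scores = deque(maxlen=window)   # most recent model scores
        self.labels = deque(maxlen=window)   # their eventual true outcomes

    def record(self, score, label):
        """Add one scored case once its true outcome is known."""
        self.scores.append(score)
        self.labels.append(label)

    def check(self):
        if len(set(self.labels)) < 2:        # AUC needs both outcome classes
            return "insufficient data"
        current = roc_auc_score(list(self.labels), list(self.scores))
        if current < self.baseline_auc - self.tolerated_drop:
            return f"ALERT: rolling AUC {current:.3f} vs baseline {self.baseline_auc:.3f}"
        return f"OK: rolling AUC {current:.3f}"

# Usage: the baseline comes from the silent-mode evaluation at that site.
monitor = DriftMonitor(baseline_auc=0.87)
monitor.record(0.91, 1)
monitor.record(0.22, 0)
print(monitor.check())
```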

John Halamka: [00:38:51] So, to answer your question a couple of ways: CHAI invited the HBCUs – Morehouse is certainly an active participant in our work – so you really try to get all voices. But here's a challenge. There are some 5,000 community hospitals in this country, and how do you get them to adopt? We'll talk predictive AI, because generative AI, as you say, is kind of a separate question. What Mayo has done: we have 73 affiliated community hospitals, 200- to 400-bed non-academics. We bring them an AI starter pack, and we started with things like radiology override, early cancer diagnosis, and diabetic management, and you actually instantiate that in their workflow and ask, does this help you with some of those three business problems that I outlined at the beginning? And they say, you know, it really did. And oh, do you have anything for neurology? Do you have anything in behavioral health? Because once you start instrumenting the organizations and they see the value, then they get on board. We've in effect had to give them that starter pack. As we look to expand some of the Mayo Clinic Platform work in sub-Saharan Africa – the countries we're working in are Nigeria, Ghana, Senegal, Kenya, Rwanda, Uganda, and Ethiopia – these are folks for whom, if you say we need $500,000 for data curation, they don't have $500,000 available. So we have had to seek philanthropy, and philanthropists are funding data curation in Africa. So you've got to put this on your radar screen. You have to look at the haves and the have-nots.

John Halamka: [00:40:27] In some ways, the haves pay for the have-nots, and we have to do this globally. Now, generative AI – you've asked this big question, and the good news is we have three hours for this, so Suchi and I can really go on and on and on. You know, Suchi, were you in the meeting with Atul Butte when he described his recent cancer? Yes. Yeah. So Atul, who is one of our great friends and is the chief data officer of the University of California, has quite publicly stated that he has a very rare cancer. And he went to generative AI and asked, what should I do, given this is the sequence of my tumor? Now, generative AI has never seen this tumor sequence – it's so rare, there are only 2 or 3 humans that have it – but Atul would argue the hallucinations are hypotheses. That is, you should look at them and decide if they're good or bad, but they're ideas. So he actually went to his oncologist and said, wow, you know, these are good ideas, why don't we actually try some of that? And today Atul is actually cured. So, you know, generative AI is generally problematic because it's, say, 80% good, 20% bad. I think all of us at this stage are saying it has to be B to B to C, right? AI to a human, to a human – and not just say, here's a generative AI, caveat emptor, good luck. It's early.

Jody Ranck: [00:42:02] And then you see some of the smaller models trained on very specific biomedical datasets and so forth – I think there are some folks architecting things differently to cut down on those hallucinations as well. So we'll probably see a lot of innovation in the coming year or two on that front. But Suchi, you had something you wanted to add.

Suchi Saria: [00:42:25] Yeah, I think there's an element of going back to that culture of evaluation and rigorously measuring what is and isn't working. Ultimately, over the last decade, as we've seen advances in AI in new areas, with each one of them, one of the key things we've had to do is think about rubrics for evaluation. What's hard is people often coming up with a platform, a broad-use API, without a clear notion of how exactly it's going to be used. Once you know exactly how it's going to be used, then you can set up a benchmark or a rubric for evaluating whether it's working well there. So I think it's early days, but it'll be exciting to see people putting together broader and broader rubrics for evaluating it. I think in narrow use cases, when we use some of the underlying technology that helped build ChatGPT, you can reuse that same technology in productive ways that allow you to build powerful, advanced applications in a way that falls fully within the rubric of the FDA, and you can evaluate it, and it all works really, really well. So it's nothing specific to the technology. It's more that the most common form of it that you're seeing is ChatGPT and chatbot-like manifestations, where it's very broad and the intended use is not clear, so it's hard to test. I think narrow use cases will probably be the place where we see the most promising near-term outputs.

John Halamka: [00:43:59] But to be determined. So Suchi, you know, you and I have each talked with hundreds of people about this, and some say, oh, superintelligence – it's going to be a really wonderful cardiologist of the future. Others say, idiot savant. It'll be somewhere between the two.

Suchi Saria: [00:44:18] Yeah, either way, though – I think, John, you'd agree with this, right? To me, there are a million use cases where we can be productive today within the rubric of best practices and guidelines, where things work well. So the hard part is that it's not all or nothing; it's more that it's in the gray. You want to know: what is the use case? How are you validating it? How are you measuring performance? How are you measuring performance locally? How are you measuring performance in terms of use – who's using it? What are the risks and what are the harms, and what is your mitigation plan and what's your measurement plan? And if you can do all of that, then there are narrow use cases where the same platform technology really can be usable. But today, a lot of the focus is on out-of-the-box use of ChatGPT without defining intended use, without defining population, without defining harm, without defining risk. And that's what's, to me, very problematic. I think as we start to define use cases, we can also tailor the technology to really make it work well, and that's where I think some of the highest returns will come in terms of value created.

John Halamka: [00:45:38] Yeah. And maybe, you know, as our friend Peter Lee says, we'll have LLMs reading the output of LLMs and telling you whether they're good or bad. And then, of course, Suchi and I asked Peter Lee: and then what? What oversees the editor? Oh, an LLM. It's...

Suchi Saria: [00:45:54] Turtles all the way down.

Jody Ranck: [00:45:55] Yeah, yeah. And there's this question of innovation and the role of the state or government that I've been hearing a bit about – beyond just the regulatory piece, what about infrastructure? Given how expensive generative AI is, there are folks saying, well, should the government have some sort of public-goods or commons-based role to play that enables innovation to flourish, and not have it all come from OpenAI, Microsoft, and Google? What do you guys think about that for health care? Have you seen any interesting proposals for commons, cooperative, public-goods types of approaches to AI in health care? Or do you think this is just pie in the sky?

John Halamka: [00:46:52] It's an interesting question. I mean, there are a whole variety of companies entering the space. And so, John and Jody, you know, I have no conflicts of interest and nothing to disclose, nor do I endorse any company or service. But I've seen a company like Cerebras, which says, you know, we'll bring you a data center on a chip this big, and it's really a lot less expensive than a GPU. And maybe that is a way to start creating innovation for those who can't go buy these highly expensive, power-consumptive devices. So we're going to see a lot of innovation in this industry.

Suchi Saria: [00:47:31] I think one of the opportunities is, you know, kind of where we started the meeting: with data. I saw this beautiful talk – again, actually referencing Atul, a friend – where he talked about how early on, people advanced this idea that data is the new oil, but in reality we should think of data as the new soil. When we think of it as oil, there's a notion that either you have it or I have it, but only one of us can have it, and as a result there's a fight for who has the data or the resource. But with data, you can copy it – you can make multiple copies of it. So with data, it's more the opportunity to nurture: what good can we create with it? And who are the people who can come in and work with this data to grow productive vegetation and plants and forests and food that will feed people? And so to me, there's an opportunity for the federal government to play a role in liberating the data in a way that's secure, that's private, that respects individuals and their rights, but also enables good. Like – I lost my nephew to sepsis.

Suchi Saria: [00:48:45] Honestly, I would want my mom, my dad, my family, my friends to have the best health care possible. And for that to happen – I think it's very clear, the way our system is architected today, we're not giving our caregivers the best resources possible for them to be effective with the data that they have, because today's systems create countless experiences where it's hurting their performance more than helping them: alarms and alert fatigue and pain and bureaucracy and click and click and click. It's nonstop, endless. And what we really want is to design technology that helps them. Part of that is using the data to augment their practice in a way that they see as unburdening them and helping them do the right thing – you know, the reason most come into medicine in the first place. So I think the opportunity to figure out a way to make data more of a soil – how do we create laws, practices, best practices, and legislation that allow us to liberate this data and bring more productive people into the framework to leverage it to create good – is a huge area where hopefully we'll see more.

Jody Ranck: [00:50:01] In the UK, they're experimenting with data trusts, which are different types of data commons, and we know how to regulate those. I mean, Elinor Ostrom won the Nobel Prize for, you know, the rules of the commons and how to manage those types of resources. So I think you're right on there – that's going to be an essential part of democratizing innovation. Not democratizing AI, but democratizing innovation and making it more accessible to small startups, not just the big guys. And so, we only have a couple more minutes. John Moore, do we have any questions?

John III: [00:50:40] Yeah. So Ed Buxton, our good friend, actually came into the chat and was asking for our take on whether AI is better than current medical error rates, which I think both you and John Halamka have touched on in the past. And I'd love to hear from all three of you about that. At what point do we actually see that being the trade-off? You know, you had the podcast with Mark Coeckelbergh, Jody, where you guys talked about the philosophy of it: people are terrible drivers, AI is a better driver than most people already, but we still aren't comfortable with letting cars be driven by computers. So how do we apply that to health care? What's that inflection point?

John Halamka: [00:51:17] So, I have a funny story. I was sitting next to a guy on my flight back from Norway, and I said, well, what do you do? And he said, I like the outdoors. That would be a little bit like sitting next to Tom Hanks, and Tom says, I like movies. So this was the world's foremost Arctic and Antarctic explorer, who climbed Everest, then walked to the South Pole, then walked to the North Pole. And I talked to him about the use of AI in resource-poor settings. He said, you know, if I had an AI that was half right at the South Pole, it would be extraordinary. You know, I'm very willing to accept some things that are better than nothing. And I think you're starting to ask the question, well, what's the positive predictive value of a human? I think you could probably say it's around 50%. So some of these AIs with an AUC of 0.7 may actually be okay.
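As a footnote to the numbers Halamka is tossing around: PPV and AUC are not the same quantity, and PPV depends heavily on prevalence. A back-of-the-envelope calculation like the one below (all figures hypothetical) shows how that kind of comparison might be framed:

```python
# Back-of-the-envelope PPV from sensitivity, specificity, and prevalence
# (Bayes' rule). All numbers are hypothetical, chosen only to illustrate
# why a "50% PPV" human baseline and an AUC figure aren't directly comparable.
def ppv(sensitivity, specificity, prevalence):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# A model operating at, say, 70% sensitivity and 70% specificity
# (roughly one plausible operating point on an AUC-0.7 curve):
for prev in (0.05, 0.20, 0.50):
    print(f"prevalence {prev:.0%}: PPV = {ppv(0.70, 0.70, prev):.0%}")
# At 50% prevalence the PPV is 70%; at 5% prevalence it drops to about 11%,
# which is why intended use and population matter as much as the headline metric.
```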

Suchi Saria: [00:52:19] I think a lot of it is the loss of control, right? So we did a collection of studies that we released last year. Part of it was measuring how the system does against very good clinicians. These studies were done at Hopkins – I mean, Hopkins is, you know, a place with really good people, a great standard of care, and a real focus on safety and quality. And what we were able to see was dramatic improvements in performance against clinicians and physicians at Hopkins. But more interestingly, a piece that we focused on was: what will it take for clinicians and experts to adopt AI? And part of it was this notion of autonomy – do they feel it's going to lead to a loss of autonomy? And part of that is, how do you build it in a way that enables teaming between the human expert and the machine? Is there a way you can provide transparency so it feels more like you're augmenting them as opposed to overriding them? And ideally, if you can build the right kind of augmentative interfaces, you can do the best of both, as opposed to just one or the other.

Jody Ranck: [00:53:27] So the answer is: both. Can I just add one little thing to that? If we think more broadly about social determinants, I think we have to think about the algorithmic determinants of health beyond the health systems too. People get denied social welfare benefits or loans and things like that. Communities are structured in certain ways, increasingly through algorithms. So what recourse do people have when these things work against them? I think that's another piece of the equation that we don't talk about much that is going to be increasingly important.

John III: [00:54:09] Yeah. Grace Cordovano posted about that the other day, about how she was able to see that an AI helped interpret her mammogram, and she really liked, just as a patient, knowing that AI was involved in that decision making, so she could make a judgment call on her end about whether or not she's comfortable with that. Okay, we've got a question from J.D. Whitlock here. He says: we can agree that the EO will hopefully increase the safety of AI in health care. Of course, there are also concerns that the regulatory implementation will clamp down too much on innovation. What can you say to reassure innovators and VC folks that this will not happen?

John Halamka: [00:54:48] Again, I reflect on 40 years of experience with policy making. This process is totally inclusive – you've never quite seen so much public-private collaboration. So how about the answer to the question being: what if you wrote the regulations? Obviously they'll be written with your input. In effect, it's probably going to stipulate – this is just a guess – that the government will take all of our input and tell us the what, and then it'll be up to all of us working together to create the how. And I feel pretty good about that.

Jody Ranck: [00:55:28] Suchi, do you have any final thoughts? 

Suchi Saria: [00:55:30] I mostly agree with John, but going back to my scientific roots, I think the question remains open – we don't know that. I think thus far the process has been phenomenal, and over the next couple of years, how this gets rolled out and implemented will be crucial. The EO doesn't exactly spell out in detail how that will take place. So in some sense, the hope here is that it will continue to be a very inclusive process, a process where external experts are pulled in – experts with diverse perspectives. And most importantly, I think, providing a way by which we acknowledge that we don't have all the answers today, so we don't want to clamp down in a way that locks us in. It needs to be a learning framework, one where we can adapt rapidly over time as we learn more. And, you know, how do we set that up?

Jody Ranck: [00:56:23] Right. All right. Well, thank you, Suchi and John. I think we're at time now, so I just wanted to thank you and our audience for attending today, and hand over to John Moore to close it out.

John III: [00:56:35] So I actually want to just get one more in, which is kind of a follow-up to that last question, if we can. Morgan Jefferies from Geisinger posted: some people – Jeremy Howard comes to mind, but there are many others – think that regulating models risks consolidation of power, and that it's better to focus on applications. Between models and applications, where do you think regulators should focus, and where do you think the EO puts the focus?

John Halamka: [00:56:58] Wow, what a good question. So, I've just created the most amazing algorithm ever. It's got an AUC of 0.99, and I give it to clinicians and no one uses it. Or we give it to clinicians and the patients don't have better outcomes. That's not a success story. So to me it's one of those not-either-or things, right? We need to validate the models, and then we need to study the usability of those models, and ultimately the outcomes caused by those models.

Suchi Saria: [00:57:34] I think ultimately, one way to think about this is: where should gatekeeping start and stop, right? Let's just go back to drugs. The FDA gives approval to say a drug is safe and effective, but payers still get to decide whether they want to pay for it, and in order to do that, they have to collect additional data to show that it's effective and in which populations it's effective. So ultimately, for this framework done right, we need the models to work, we need them to get adopted, and we need to show results – and all three are vital. Now the question is how much of it is mandated by federal agencies versus how much of this is creating awareness and education among all users so that they can choose, in their own evaluation process, how they accrue the data necessary to make sure they're making a good decision.

John III: [00:58:28] Seems like we keep coming around to the fact that we don't exactly have the models for evaluation in place yet to track those outcomes, and that's really the first step to figuring all of this out. All right, everyone – well, like Jody said, thank you all three for joining us today for this very important conversation. I think we got some really solid nuggets there. I really appreciate you taking the time this afternoon to share your insights and your expertise in this area with the rest of the community. I will be sharing the podcast in the next week or so, and if everybody could just stay tuned to the ChilCast, you'll see it go live when it goes up.

John Halamka: [00:59:02] Well, thanks so much. This was fun.

Suchi Saria: [00:59:04] Thanks, everybody.


