podcast-1.json

{"podcast_details": {"podcast_title": "The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)", "episode_title": "Explainable AI for Biology and Medicine with Su-In Lee - #642", "episode_image": "https://megaphone.imgix.net/podcasts/35230150-ee98-11eb-ad1a-b38cbabcd053/image/TWIML_AI_Podcast_Official_Cover_Art_1400px.png?ixlib=rails-4.3.1&max-w=3000&max-h=3000&fit=crop&auto=format,compress", "episode_transcript": " All right, everyone. Welcome to another episode of the TWIML AI Podcast. I am your host, Sam Charrington, and today I'm joined by Su-In Lee. Su-In is a professor at the Paul G. Allen School of Computer Science and Engineering at the University of Washington. Before we get going, be sure to take a moment to hit that subscribe button wherever you're listening to today's show. Su-In, welcome to the podcast. Thank you for the introduction. I'm looking forward to digging into our talk. You are an invited speaker at the 2023 ICML Workshop on Computational Biology, and we'll be talking about your talk there, which is really centered around your research into explainable AI, an important topic. Before we jump into that, I'd love to have you share a little bit about your background and how you came to work in the field. Thank you so much. My lab is currently working on a broad spectrum of problems, for example, developing explainable AI techniques, that's core machine learning. And then we also work on identifying cause and treatment of challenging diseases such as cancer and Alzheimer's disease, so that's computational biology. And then also we develop clinical diagnosis or auditing frameworks for clinical AI. And then you asked about how I got into this field. So I was trained as a machine learning researcher. When I was a PhD student, I was working on the problem of dealing with high dimensional data. 
And then at that time, when I was a PhD student at Stanford, in the field of computational biology, there was something really exciting happened, something called a microarray data. So it's a gene expression data that measures expression levels of 20,000 genes. And I suddenly thought that if machine learning researchers develop a powerful and effective method to identify cause of diseases such as cancer and then therapeutic targets for those diseases, then as a machine learning researcher, it can contribute hugely to the science and also medicine. And I just fell in love with this field. So that's how I got into the research at the intersection of computational machine learning and computational biology. After I got a job at the University of Washington that has a very strong medical school, and then I had wonderful colleagues, amazing people who had medical data, electronic health records, and then introduced me to this field of EHR data analysis in various clinical departments, anesthesiology and dermatology, and then emergency medicine. And I just got really interested into the possibility, the potential that AI researchers or machine learning researchers myself and my students can contribute to medicine. That's how I got into this field of largely three fields. So one is machine learning and AI, and the second is computational biology and then clinical medicine. You probably thought that you had to deal with messy data when you were in clinical biology and computational biology, and so you saw some of that EHR data. That data can be very messy. It is. The goals of the fields are slightly different to each other, but in the future, I strongly believe that those two fields will merge, biology and medicine. So in a clinical side, researchers are already generating the biological, molecular biology data from patients. 
So for example, for cancer patients, you can think about measuring the gene expression levels or genetic data from those cancer patients, and then what you want is the treatment. You want the AI or machine learning models to tell you which treatment, which drug, anti-cancer drugs are going to work the best for that particular patient. For that, you definitely need the biological knowledge and then actual mechanistic understanding of cancer. And what says to you that the fields will merge as opposed to collaborate closely? Clearly they need to collaborate closely, but when I think of merge, and maybe I'm taking this too far, I'm thinking of single models that operate in both domains. Yeah, I know what you're saying. So I tell my students or other young people that to actually move the field forward, to advance this field of biology, medicine, or biomedical sciences, you really need to become a bilingual researcher, or even trilingual these days, computer science plus biology plus medicine. And you have one brain that really thinks like machine learning researchers and biologists and then clinical experts. Usually that really helps to come up with a creative approach and that can really move the field to benefit patients. And then at the end, the ultimate goal of a biology and molecular biology is to understand life better so that you can advance the health of humans. So I think collaborations definitely help, but at the end, we really need to think about how to produce these young researchers so that they really think like experts in this area. These things already happened earlier in computational biology than clinical medicine. And when I was doing the PhD, it was usually based on collaborations, people who were trained primarily as a machine learning researcher and people who were trained as molecular biologists who hold pipettes and they work in the wet labs and then they form a collaboration and then write papers. 
But then later, we see a lot of departments that's named computational biology or biomedical science departments. So it's a really healthy move for this kind of interdisciplinary fields. It makes total difference. Yeah. Your research and again, your presentation at the conference are focused on explainable AI, XAI. Tell us a little bit about some of the things that you think are most important about explainability as applied to these fields. I think we get that machine learning and models in general can be opaque and make important high stakes decisions. You need some degree of explainability. But what's unique about your take in applying applicability in your field? Right. Okay. Thank you. That's an excellent question. So the core part of explainable AI, at least this theoretical framework, it basically means the feature attributions. So imagine you have a black box model, you have a set of input, a vector X, and then you have an output Y. And then when you have a prediction, you want to find a way to attribute to features. You want to know which features contributed the most. And then there are mathematical frameworks. Our particular approach, that's called a SHAP framework, it is based on game theory. So you want to find a way to understand which features are important. So that's the core of the technical side of explainable AI. And then on the other hand, if you just apply this explainable AI technique, off the shelf, explainable AI algorithm to biology, mostly it's useless. It's not very useful. It's not useful in terms of biological insights. What you really want to understand is how these features collaborate with each other. Imagine that you have a set of genes as a feature. You have 20,000 genes, 20,000 expression levels are the input of the black box model. And then your prediction is which cancer drug is going to work the best for each patient. 
And then individual genes' contributions and then gene importance scores, by themselves, they are not going to be really useful. It will be only useful when some explainable AI model, explainable AI algorithm can tell you which pathway, how genes collaborate with each other and then how genetic factors play a role into that. And then also how that leads to the good prognosis of the cancer patient and also sensitivity, the good responsiveness to that drug. So there is something missing there. And then the uniqueness of my research is that we want to develop this explainable AI method for biology and then also clinical medicine such that it can make real meaningful contribution to these fields. Another example in the medicine side is that imagine that you have a deep model, deep neural network that's going to take you a dermatology image. So say that you find something unusual in your skin and then you take a picture. That's your dermatological image. And then let's say that you want to know that has features of melanoma or not. So the prediction results itself is not going to be really useful. And then even the current explainable AI methods that's going to tell you which pixels, which parts of the images led to the prediction of melanoma or not, those are not going to be very useful to understand how this black box model really works. When you try, for example, that you modify the image and then generate a counterfactual, small changes to the image such that it changes the prediction. Let's say that that changes the prediction from melanoma to normal. Only then you can understand how this model works, what the reasoning process of this black box machine learning model is like. So those examples, I'm going to show many examples like that. 
Basically the message there is going to be that the current state of the art explainable AI that tells you theoretically supported importance values for the features are not going to be enough to make meaningful contributions to both biological science and then also clinical medicine as well. It sounds like you're calling out a broad deficiency in the approach and kind of saying that as opposed to this feature level explainability, we need more system level or process level explainability that is more grounded in the use cases or the application than what we have available today. Exactly. Right? Yeah. The question is how to do that. For that, we need a new explainable AI method. In the first part of the talk, I'm going to show many examples of what explainable AI, almost as is, can do. Those are the papers that were republished a couple of years ago and then addressed so that it addresses new scientific questions. One explainable AI or feature attribution methods as is can be useful. I'm going to show many examples like that in both biology and medicine. In the second part of the talk, I'm going to show how explainable AI can even open new research directions specifically for biology and healthcare. Those examples I showed you, the systems level insights or this counterfactual image generation that can facilitate collaboration with humans, in this case, clinical experts. In the second part of the talk, I'm going to show how this explainable AI can open new research directions. Then part of the second part will be I'm going to have a deep dive into our recent paper to highlight how explainable AI can help cancer medicine design, cancer therapy design. Specifically how to choose two chemotherapy drugs that's going to have a synergy for a particular patient. That's the paper that was recently published in Nature Biomedical Engineering. 
Before we dig into that paper, the most recent paper, can you talk us through in a little bit more detail some of the examples of the foundational machine learning research and how they contribute to the problems you're trying to solve? Okay. Some of the foundational AI methods we developed I'm going to talk about, it can be summarized into three parts. One is principled understanding of current explainable AI methods, specifically feature attribution methods. For example, in one work, we showed that our feature attribution method, that's SHAP, it was published in NeurIPS in 2017, we showed that it unifies a large portion of the explainable AI literature and 25 methods following the exact same principle and all explaining by removing features. It turned out that 25 methods, feature attribution methods that are widely used in the field and machine learning applications, they all go by the same principle. You want to assess the importance of each feature by removing them or removing subsets of them. That helps us understand what goes on. For example, when they fail, you want to understand what goes on and also improve and then develop new explainable AI methods. I'm going to introduce a couple of unifying frameworks. This is about how to understand the principled understanding of feature attribution methods. On a computational side, we have explored many avenues to make this SHAP computation even feasible and faster. SHAP stands for Shapley Additive, I suddenly forgot, I can't forget this. Explanations. Yes, the Shapley Additive Explanations. It's kind of weird because they chose the third letter of the word. It's the first author, my student, Scott's choice. I love the name, by the way. Computing SHAP values is theoretically very well supported. But then computation-wise, it's not really easy to compute. It involves exponential computation. We need to develop approximation methods such that we can compute them in a feasible manner. 
We developed many fast statistical estimation approaches. Then you want to make sure that there is a convergence and all the desirable theoretical properties are already there. Then also, we developed approaches for specific model types. For example, ensemble tree models and then also deep neural networks. We have Deep SHAP and then Tree SHAP. More recently, we also have a vision transformer Shapley. It's a way to compute the Shapley values for vision transformers. Then there is another one that's called FastSHAP. The one way to make the SHAP computation more feasible is to focus on specific particular aspects of models. For example, tree ensembles or deep neural networks. They have some particular model types. There is a way to make this computation a little faster, basically make... So model specific versions of SHAP implementation. Yes, yes. Yeah. That's another line of research. More recently, we also started to understand the robustness of the SHAP values, adversarial attack. A few years ago, in the field of machine learning, researchers have tried to understand how robust the machine learning model itself, the prediction results are toward adversarial attacks. Then now we are looking into this issue in terms of the model explanations. How robust feature attributions are. In our most recent paper, we basically showed the removal-based approaches, including SHAP. Earlier, I said many of the feature attribution methods turned out to have the same principle, which is explaining by removal. So those methods are more robust to this kind of adversarial attack. And then multimodality, those other kinds of issues, we are actively doing this research in terms of foundational AI algorithms also. And SHAP, as you've mentioned, is broadly used, both the original algorithm as well as the related algorithms as you described. But it's also one of the first explainability approaches to be popularized. Where does it sit in terms of relevance? 
Are there different, kind of wholly different approaches that have overtaken it in popularity or applicability based on kind of today's models and applications? Or is SHAP still kind of a core approach to the way explainability is looked at in practice? So more on the latter side, we believe that this removal-based approach and in this cooperative game theory, we believe in that. And then also it has the desirable properties, first of all. And then we, in our many experiments, we still see that removal-based approaches are more robust, as I said, to adversarial attacks. And then also in terms of various evaluation criteria, we still think that those methods are more robust than the other class, which we characterized as a propagation-based approach or gradient-based approaches. So we would prefer just removal-based approaches. But on the other hand, those approaches are computationally very intensive. So the way SHAP works is basically that you try all subsets of features and then you add a feature of interest and then see the model, check the model output and you average across all subsets of features. So as you can imagine, it's computationally very intensive. So when we now think about foundational models or large language models, these really large models with a ton, a lot of parameters, and then deep neural networks, gradient computation is perhaps easier than trying all subsets of features. So practically, it's not as easy as the other class in terms of the computation. But we still want to make this computation more feasible. We want to develop various clever approaches to reduce the computation and then still maintain the desirable theoretical properties that this removal-based approach or SHAP in particular has. Got it. And so that is an example of the foundational research that your lab does that contributes not only to your work on the biological science side or computational biology side, but broadly to the field. 
And then your more recent paper is an example of the kind of contributions you're making on the medicine side. Can you talk a little bit about that cancer paper? Yeah, sure. It is about AML. So we chose AML as an example application. So it's acute myeloid leukemia. It's aggressive blood cancer, and it's relatively common for older people. So to give you a bit of a background in general, the cutting edge in the treatment of cancers such as AML has increasingly become combination therapy. So the rationale here is that by choosing drugs that target complementary biological pathways, we can achieve greater anti-cancer efficacy. So basically you choose two or three chemotherapy drugs and then use them together so that when there is a synergy, usually there is a very good anti-cancer efficacy. But the issue is that choosing optimal combinations of drugs is a really hard problem. So there are about hundreds of individual FDA-approved anti-cancer drugs, which means that there will be tens of thousands of possible combinations. But when you consider pairwise combination, and there could be even more if you consider non-FDA-approved experimental drugs in development or consider a combination of more than two drugs. And then the different patients, even patients who have the same type of cancer may respond differently to exact same drugs because of this individual, the particular genomic characteristics. So then formulate this problem as a machine learning problem. So you take this AML patient's gene expression levels. So you get the blood of the patient and then purify the cell so you have only cancer cells. And then say you measure expression levels of 20,000 genes. So mathematically, this is 20,000 dimensional vector. And then also, let's say you consider a pair of drugs, drugs A and B, and then you use various information about this drug. For example, structure of these drugs or their biological targets. There are many data sets that can tell you that information. 
And then you take those as a machine learning input, and then you want to predict the synergy between the drugs A and B. So in this kind of a problem, and then as I said, there will be tens of thousands of pairwise combinations of those drugs. And so in this kind of situation, not only the prediction, but also explanations will be extremely important. So say you want to be able to say that drug A and B is going to work well, are going to have a synergy together because this patient X has gene expression levels of A, B, and C high. And then, or you say expression levels of a certain biological pathway, those genes are highly expressed. So you need a set of explanations to do that. And then more importantly, if you think about all pairs of drugs, if there is an underlying principle in terms of when two drugs are likely to have a synergy, then it's going to be even more useful. So what we did in this paper was that we got the explanations. We computed SHAP values for many combinations of drugs from the machine learning model, and then we analyzed that, and then we identified the unifying principle in terms of when, in what case, any pair of drugs A and B have a synergy, and then we identified a pathway. It is called stemness pathway. So it is also called, trying to find in that part of the slide, this hematopoietic stem cell-like signature. Cancers are sometimes more differentiated or less differentiated. If you had a family member with cancer, you probably understand this term. So usually, less differentiated cancers have worse prognosis than more differentiated cancers. So we identified this pathway that's really relevant to this stemness mechanism, and then found the underlying principle, which basically says that it's good to have two drugs, one drug targeting less differentiated. The other one targeting more differentiated cancer are likely to work the best. 
So in this project, not only our algorithm can tell oncologists or biological scientists which genes are important, which feature attributions, which features are important for drug synergy, but also by analyzing many model explanations from many patients, we can have an understanding of these underlying principles in terms of what makes a successful drug combination therapy. Cancer therapy design, I would say this is an example where we can see how explainable AI can be effective in cancer therapy design. Is AML unique in having a well-understood pathway, or is that a bottleneck for the application of this technique to the broader set of cancers? Oh, so AML is just one example. This kind of a principle can be applied to many data sets. Computational biologists often need to work on the problem where the data are available. So as you can imagine, blood cancers, those tissues are relatively easy to obtain, blood tissues compared to other kinds of tissues. So there are many available data sets, and then also the measurement of the drug synergy from many samples. So we happen to choose this cancer type because of the data availability. But this approach can be broadly applicable to other types of cancer. So this is one of the- I'm maybe asking- Yeah, go ahead. I'm going to get a broader question, which is the explainability method is explaining over a set of known features and pathways and processes and things like that. And my sense is that for many of the potential applications, the pathways are still a subject of research themselves. Meaning, maybe there's some aspect of pathway that's known, but there are others, or some diseases for which there aren't pathways. And I guess I'm wondering the way you think about applying techniques like this in a- A, is that actually the case, or am I all wrong there? But otherwise, how will you apply techniques like this in rapidly evolving fields that are very complex? Meaning- That's an excellent question. 
You're giving an explanation, and the explanation is based on the pathway as you understand it, but there are so many other things going on in the system that you really have not accounted for. Yeah, exactly. Right. So first of all, pathway is not unique to disease. So when we say pathway databases, it basically tells you the members of the genes in each pathway. That's it. I mean, it's like many sets of genes. We also sometimes call it gene sets. It doesn't depend on the disease, and then the way we view is that it's not like all genes need to be activated for the pathway needs to be activated. It will be only a subset of genes. We would expect only a subset of genes to be highly expressed to say that pathway is activated. And then it's really extremely important for a computational biologist when we develop methods like this to get biological insights from large scale data sets. When we develop such a method, we need to make sure that it does not fully depend on any sort of prior knowledge. And then the algorithm needs to be flexible. So that's of key importance. So in this particular example, we didn't use a pathway actually from the beginning. When the model training happens, we used genes as individual features, and then we analyzed the feature attributions and then did the statistical test to see which pathways seem to be more activated. You made a really good point. In all computational biology methods, it's really important not to make it too rigid for the existing knowledge. It needs to be flexible. And so how do you evaluate your results in this particular paper? Oh, so say that you have a feature attributions for all genes for a certain patient and then for a certain combination of drugs. And then say you will have a lot of feature attributions then, right? Combining all patients and all pairs of drugs you considered. And then we perform the statistical test. 
So for example, it's a simple, you know, Fisher's exact test kind of statistical test where you see whether there is significantly large value of attribution values for certain set of genes defined by certain pathway. And then you do multiple hypothesis testing and then see whether that significance is indeed relevant. So the pathway-based analysis was done in a post-hoc manner after model training and then obtaining all, you know, model explanations. So another challenge we ran into in that project that was really not addressed properly by this foundational AI field was feature correlation. So in many biomedical data sets, you will see lots of features that are correlated with each other. Many genes are correlated. It's really modular gene expression, you know, levels are very modular. So you easily see, you know, subset of genes that are very highly correlated with each other. So in that kind of case, SHAP values are not going to be extremely accurate because, you know, imagine that there are two genes that are perfectly correlated with each other, then there will be infinite ways to attribute to these two genes, right? So in that paper, in that Nature Biomedical Engineering paper, we addressed it by considering ensemble model. So we ran many ensemble of model explanations. So we ran the model. In this case, it was not your deep neural network. It was tree ensembles. And then we averaged. We averaged the feature attributions that are from many models. And then we showed that it gives you more robust feature attributions when the features are correlated with each other. Awesome. So talk a little bit about where you see the future of your research going. That's a really important question. So in all three ways. So first of all, in the foundational AI method, as I briefly mentioned, this robustness issues and then also multi-modal data. Let's say that you have a set of features and each feature belongs to different category. They are in different modality. 
And then how to attribute to these features that are in different modalities. So that's an open problem. So it was actually motivated by biomedical problem, but it's widely applicable to other applications. And then also these emerging models of LLMs or other foundational models. And in this kind of really large models, how to actually compute the feature attributions properly. And then also we are really interested in sample-based importance to say that you transpose. The matrix transpose of your feature matrix. So I've been talking about this feature attributions a lot, but you can also apply Shapley values to gain insights into which samples are important for your model training. So that can help us understand how foundational models in various fields or large language models rely on which training samples. So that can be really important for model auditing perspective, first of all, and then to gain insight in terms of which samples were important for this large models to behave a certain way. So sample-based explanation is also one of the things that we are mainly working on. In the biomedical side, there are many projects. So, single cell data science is one of the big themes in my lab now. So you obtain gene expression levels or other kinds of molecular level information at a single cell level. So the advantage is that you will have a ton of samples. So one experiment is going to give you many samples, which is really appropriate for large scale models these days based on the neural networks. So for example, the researchers started looking into foundational model for single cell data sets. So in this kind of data set that have still high dimensional, and then researchers are now obtaining multi-omic data, so not only gene expressions, you can also obtain other kinds of genomic information. So that's going to increase the dimensionality also. And then large sample sizes, how to learn the biologically interpretable representation space. 
So that's one of the big questions in my lab, in the research in my lab. So all feature attribution methods at the end in the downstream prediction test, you attribute to features. And then the assumption is that each feature is an interpretable unit. In biology, as I mentioned earlier, it's not the case in biology. So the functional unit in biology is much more interpretable than in any individual genes. So how to learn the features that have more broadly representation, feature representation space that's biologically more interpretable. And then also how to make foundational models learned based on single cell data sets. So researchers started publishing those papers that are about applying this foundational model approach to single cell data sets. And then how to make it biologically interpretable so that you can gain scientific insights from the model results, and then also audit those models to make sure that users can actually safely use them for scientific discoveries. So attribution methods for this kind of modern machine learning models so that you can gain biological insights. So that's another theme. In a clinical side, we are really interested in this model auditing. In our most recent paper that's in review, we are focusing on dermatology examples. So the dermatological image is inputted into deep neural network, and then you want to know whether the prediction result is melanoma or not. There are many algorithms out there, some published in very high profile medical journals, and then also some available through the cell phone apps. So there are many algorithms. And then we recently tested them with separate the held out test samples, and then got the result that's a little concerning in terms of usage. So our analysis showed that explainable AI was extremely helpful. So for example, in the skin image, which part of the image led to that kind of a prediction? Or as I said, using this counterfactual image generation. 
So you make small changes to the input dermatology image such that it changes, it crosses the decision boundary of the classifier, and then see what features were changed. So that way you can see the reasoning process of this classifier, the clinical AI model. So for that, there needs to be some technological development there because the feature attributions themselves are not going to be enough. It shows only very small part of the inner workings of the machine learning model. So developing methods for auditing clinical AI models, that's the research we are currently performing in the clinical area. So all three areas we are doing exciting research. Well, it sounds like you've got a lot of work ahead of you. Yes. Yeah. Very busy. I bet. Thanks so much for joining us. Thank you. Thank you for inviting me. Thank you. All right, everyone. That's our show for today. To learn more about today's guest or the topics mentioned in this interview, visit twimlai.com. Of course, if you like what you hear on the podcast, please subscribe, rate, and review the show on your favorite podcatcher. Thanks so much for listening and catch you next time."}, "podcast_summary": "In this podcast episode, Sam Charrington interviews Su-In Lee, a professor at the University of Washington. Su-In discusses her work in explainable AI (XAI) and its applications in computational biology and clinical medicine. She explains that while current XAI methods focus on feature attributions, they are not very useful in biology and medicine unless they can explain how features collaborate with each other. Su-In also talks about her research on cancer therapy design, where she applies XAI to identify the underlying principle for drug synergies in acute myeloid leukemia (AML). She emphasizes the importance of developing flexible and robust AI methods, as well as studying single-cell data science and auditing clinical AI models. 
Overall, Su-In believes that XAI has the potential to make meaningful contributions to biology and healthcare, but further research is needed to improve current methods and address emerging challenges.", "podcast_guest": "Su-In Lee, University of Washington", "podcast_highlights": "In this podcast episode, host Sam Charrington interviews Su-In Lee, a professor at the Paul G. Allen School of Computer Science and Engineering at the University of Washington. Su-In Lee discusses her work in the field of explainable AI (XAI), specifically in the context of computational biology and clinical medicine.\n\nLee explains that XAI aims to understand the feature attributions of black box models, such as which factors contribute the most to a prediction. She emphasizes that in the fields of biology and medicine, the current state of the art XAI methods are not sufficient. Lee's research focuses on developing XAI methods that can provide meaningful contributions to these fields.\n\nLee discusses her recent paper on cancer therapy design, specifically for acute myeloid leukemia (AML). The goal is to choose combinations of drugs that have synergy in treating the disease. Lee's team used XAI techniques to identify gene expression levels and pathways that are important for drug synergy in AML patients.\n\nLooking ahead, Lee outlines the future of her research, including ongoing work on robustness, multi-modal data, and sample-based importance in foundational AI methods. She also highlights the importance of XAI in single-cell data analysis and model auditing in clinical settings.\n\nOverall, this podcast episode provides insights into the current challenges and advancements of XAI in the fields of computational biology and clinical medicine."}