In this project, we try to answer a very practical question that is how much our data infrastructure is ready for AI models and machine learning models in the future. As you know, AI models and machine learning are changing the landscape of cancer screening, risk stratification, and therapy decision-making. In myeloma, M-spike or monoclonal protein is the main lab variable that measures disease volume and we assess treatment response based on that...
In this project, we try to answer a very practical question that is how much our data infrastructure is ready for AI models and machine learning models in the future. As you know, AI models and machine learning are changing the landscape of cancer screening, risk stratification, and therapy decision-making. In myeloma, M-spike or monoclonal protein is the main lab variable that measures disease volume and we assess treatment response based on that. Therefore, assessing monoclonal protein or M-spike across the board in all electronic medical records is very important to train very robust models.
We asked this question that how ready are we to use this M-spike across the U.S. Institutes. We looked at 384 institutes across the U.S. with more than 1,700 patient charts reviewed through the Health Tree Foundation. And we found out that only half of the patients really have capturable M-spikes in all electronic medical records. Worse than that, if you look at the time around diagnosis, only one-fifth of patients have discernible lab values that can be captured for training AI models. So this is a challenge. This will be a challenge that limits the AI model’s robustness. Worse than that, we looked at the socioeconomic factors of these patients, and we found out that if you are from a high socioeconomic status, the probability of having discrete lab values that can be used in AI is much higher. It means that if we at the current moment train AI models, we miss the low socioeconomic patients and therefore AI models will be suboptimal for them. So this is the bad news. However, the good news is it’s a very fixable situation. Easily, for a pathology department, they can move the value from text-embedded to a discrete lab value, and that will solve the problem. I think we have to raise awareness, and that’s the purpose of our project, that we raise awareness to make sure the future AI models are trained for all of the patients.
This transcript is AI-generated. While we strive for accuracy, please verify this copy with the video.