EBMT 2026 | Synthetic data generation with AI in the context of transplant and cellular therapy in hematology

So during the recent EBMT meeting in Madrid, we discussed the application of synthetic data generation to accelerate evidence generation and clinical research in hematology. We know that generating new evidence in hematology requires data. Yet despite this urgent need, the great majority of healthcare data still remains unused, for several reasons: privacy limitations, a lack of harmonization between data from different sources, and the dispersion of data across different institutions. I believe that artificial intelligence can provide a reliable solution to this problem. Synthetic data, in particular, are artificial data generated by an algorithm that is trained to learn all the essential clinical and statistical characteristics of a real data set. Importantly, since they are not real data, synthetic patients are not subject to privacy limitations, so they can be easily accessed and shared. The potential applications of synthetic data generation are to increase data sharing, to address class imbalance and missing information in support of data harmonization, to provide data augmentation, to validate already developed algorithms, and, even more importantly, to generate new evidence.

This is our experience generating synthetic data in a particularly demanding context: myelodysplastic syndromes (MDS), a rare disease with large biological and clinical heterogeneity. As you can see here, starting from a publicly available data set of 2,500 patients with MDS and more than 250 features, we generated an equal number of synthetic patients in order to compare the clinical and biological characteristics of synthetic versus real patients. The great majority of the statistical, clinical, and genomic properties of the real data are efficiently captured by the synthetic data. So with the available technology, most of the clinical value of real patients is captured by the AI-generated synthetic patients, with a very high level of privacy preservation. That means the technology is robust with respect to privacy regulations, and it is very efficient at capturing specific information at the individual patient level.

What is the most relevant potential application of this technology? In my personal view, it is to accelerate the conduct of randomized clinical trials in specific clinical scenarios in which a standard clinical study may face limitations. These scenarios include rare and ultra-rare diseases, patients with major unmet medical needs for whom the best available therapy is not effective, and diseases arising in elderly people. Importantly, most hematological diseases match at least one of these scenarios. The opportunity offered by synthetic data generation is to replace the need to enroll patients in the so-called control arm, that is, patients receiving standard-of-care therapy, with synthetic patients generated from information that is already available through real-world data or from existing clinical trials.

To define a possible proof of concept for this perspective, we again referred to myelodysplastic syndromes as a prototype of a rare disease with unmet clinical needs and large clinical heterogeneity. We applied synthetic data generation to an academic cohort of real patients with low-risk MDS treated with luspatercept to improve anemia. As you can see here, the synthetic data recapitulated all the clinical endpoints of the real data set and reproduced the longitudinal course of the patients in terms of the most relevant clinical outcomes.

So to conclude, since most unmet clinical needs in medicine, and specifically in hematology, are associated with rare diseases, I think artificial intelligence represents a real opportunity to accelerate clinical innovation and to offer patients more effective and reliable treatments.

This transcript is AI-generated. While we strive for accuracy, please verify this copy with the video.

More from Matteo Della Porta
