OBJECTIVE
This study aims to investigate the efficacy of incorporating blastocoel area into AI model training to enhance the accuracy of predicting trophectoderm (TE) grades.
MATERIALS AND METHODS
A dataset of 2,163 single static images of day-5 fresh embryos was collected from six in vitro fertilization (IVF) clinics between June 2011 and May 2022. The study focused on embryos at stages 4 and 5, excluding any at the cleavage or hatched stages. Five experienced embryologists manually annotated the inner cell mass (ICM), the ring-shaped parts of the TE (rTE), and the blastocoel boundaries. We evaluated model performance using 3-fold cross-validation, comparing three configurations: rTE regions only, rTE combined with blastocoel and ICM, and rTE combined with blastocoel but excluding ICM. Additionally, Grad-CAM analysis was used to identify the model's focal points during training.
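The three input configurations can be illustrated with a minimal sketch. The abstract does not state how the annotated regions were fed to the model, so the approach below (zeroing every pixel outside the selected regions) is an assumption, and the names `CONFIGURATIONS` and `apply_region_mask` as well as the toy 4x4 image are hypothetical.

```python
# Hypothetical region keys; the study's actual annotation encoding is not specified.
CONFIGURATIONS = {
    "rte_only": ("rte",),
    "rte_blastocoel_icm": ("rte", "blastocoel", "icm"),
    "rte_blastocoel": ("rte", "blastocoel"),
}


def apply_region_mask(image, masks, regions, fill=0):
    """Keep pixels inside any selected region; set all other pixels to `fill`.

    Assumption: masking is done by zeroing pixels, which may differ from
    the preprocessing actually used in the study.
    """
    masked = []
    for r, row in enumerate(image):
        new_row = []
        for c, value in enumerate(row):
            keep = any(masks[name][r][c] for name in regions)
            new_row.append(value if keep else fill)
        masked.append(new_row)
    return masked


# Toy 4x4 "image": the border stands in for the rTE ring,
# the inner 2x2 block holds the ICM (one pixel) and the blastocoel (three pixels).
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
masks = {
    "rte": [[1, 1, 1, 1], [1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1]],
    "icm": [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
    "blastocoel": [[0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
}

# One masked input per configuration.
inputs = {
    name: apply_region_mask(image, masks, regions)
    for name, regions in CONFIGURATIONS.items()
}
```

In this toy case, `inputs["rte_blastocoel"]` retains the ring and the blastocoel pixels while the single ICM pixel is set to zero, mirroring the configuration that excludes the ICM.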
RESULTS
Comparative analysis of predictive accuracy among the three model configurations revealed that the model integrating both rTE and blastocoel regions, while excluding the ICM area, demonstrated superior and consistent performance, yielding a mean AUROC of 0.885 with a mean standard deviation of 0.012. Including the ICM slightly reduced performance. The Grad-CAM analysis showed that the model occasionally focused on the ICM area when trained with both ICM and TE information.
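As a rough illustration of how a per-fold AUROC and its spread across folds might be summarized, the sketch below computes a binary AUROC from pairwise score comparisons and then reports the mean and standard deviation over folds. Several points are assumptions: the abstract does not say whether TE grades were treated as binary or multi-class, nor which standard deviation estimator was used (the sample standard deviation is used here). The function names and all numeric values are hypothetical toy data, not results from the study.

```python
from statistics import mean, stdev


def binary_auroc(labels, scores):
    """AUROC as the probability that a positive outscores a negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def summarize_folds(fold_aurocs):
    """Return (mean, sample standard deviation) of per-fold AUROC values."""
    return mean(fold_aurocs), stdev(fold_aurocs)


# Toy labels and scores for a single fold (illustrative only).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = binary_auroc(toy_labels, toy_scores)

# Toy per-fold AUROC values for a 3-fold run (illustrative only).
toy_mean, toy_std = summarize_folds([0.8, 0.9, 1.0])
```

The pairwise formulation is quadratic in the number of samples, which is acceptable for a sketch; a rank-based implementation would be preferable for larger validation folds.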
CONCLUSIONS
TE grading is an essential part of blastocyst assessment. While many researchers focus on the cross section of the TE for evaluation, and AI approaches predominantly analyze it through segmentation, our findings emphasize the importance of including blastocoel information alongside TE regions to improve TE grade predictions. Excluding the ICM from training keeps the model focused on TE-relevant regions, which helps mitigate bias and improves predictive performance. This approach may improve the accuracy of TE grading in embryo assessment.
IMPACT STATEMENT
This study advocates for the integration of blastocoel area information into AI model training to optimize TE grade prediction accuracy. Furthermore, the deliberate exclusion of the ICM region offers a strategy to ameliorate errors stemming from inadvertent learning of irrelevant features, thereby enhancing model reliability and efficacy in clinical settings.