Hye Jun Lee, M.D., Sung Han Woo, B.S., Jong Hyuk Park, Ph.D., Ji-Ye Jung, M.D., Sungwook Choi, M.D., Hyung Min Kim, M.S., Taehoon Ko, Ph.D.
OBJECTIVE: To assess deep learning algorithms for predicting implantation after in vitro
fertilization, and to improve the performance of CNN models trained on embryo images with high variance.
MATERIALS AND METHODS: We retrospectively collected single static images of 1,741 day 5
blastocysts from 1,068 patients who underwent embryo transfer at a single in vitro fertilization
(IVF) clinic between January 2015 and March 2021. The images were collected from standard
optical light microscopes and matched with pregnancy data such as gestational sac (G-sac) and
fetal heartbeat (FHB). We built two convolutional neural network (CNN) models, one for each
pregnancy outcome (G-sac and FHB), and compared their accuracy and area under the
receiver operating characteristic curve (AUROC). We also observed high variance in visual properties such as
color, brightness, and contrast because the embryo images were taken under varying conditions. We
therefore applied the MixUp data augmentation method, which is known to maintain the performance of
models trained on datasets with high variance. The dataset was split into a training set and a test set
at a ratio of 8:2.
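The MixUp step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not report the Beta parameter, the image representation, or the framework used, so `alpha=0.2`, the flattened list-of-floats images, and the helper names `mixup_pair` and `mixup_batch` are all assumptions made for illustration.

```python
import random

def mixup_pair(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two (image, label) pairs with a weight drawn from Beta(alpha, alpha).

    Images are flattened lists of pixel intensities; labels are 0.0 or 1.0.
    alpha=0.2 is an assumed value, not one reported in the abstract.
    """
    lam = rng.betavariate(alpha, alpha)
    # Convex combination of the pixels and of the labels with the same weight.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = lam * y_a + (1.0 - lam) * y_b
    return x_mix, y_mix, lam

def mixup_batch(images, labels, alpha=0.2, seed=None):
    """Mix every sample in a batch with a randomly chosen partner from the same batch."""
    rng = random.Random(seed)
    partner = list(range(len(images)))
    rng.shuffle(partner)
    mixed_images, mixed_labels = [], []
    for i, j in enumerate(partner):
        x_mix, y_mix, _ = mixup_pair(images[i], labels[i], images[j], labels[j], alpha, rng)
        mixed_images.append(x_mix)
        mixed_labels.append(y_mix)
    return mixed_images, mixed_labels
```

Because each training sample becomes a weighted blend of two images, differences in color, brightness, and contrast between individual images are averaged into the inputs, and the soft labels discourage the model from fitting any single imaging condition too closely. In this sketch the mixing would be applied only to the training portion of the 8:2 split, with the test set left unmodified.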
RESULTS: The AUROCs of the CNN models predicting G-sac and FHB were 0.78 and 0.72,
respectively. After MixUp augmentation, the AUROCs improved to 0.80 and 0.79, respectively. The
accuracies of the CNN models predicting G-sac and FHB were 0.75 and 0.63, respectively. After applying
MixUp, the accuracy of the G-sac model remained 0.75 and that of the FHB model rose to 0.68.
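The two metrics reported above can be computed as in the following sketch. It is illustrative only: the abstract does not state the decision threshold used for accuracy, so `threshold=0.5` is an assumption, and the function names are hypothetical. AUROC is computed here as the fraction of positive-negative pairs ranked correctly, with ties counted as one half.

```python
def auroc(y_true, scores):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(y_true, scores, threshold=0.5):
    """Fraction of samples classified correctly at a fixed threshold (assumed 0.5)."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(y_true, scores))
    return correct / len(y_true)
```

AUROC depends only on how the model ranks embryos, whereas accuracy depends on a single cutoff. This is one way an AUROC can change while accuracy at a fixed threshold does not, as with the G-sac model before and after MixUp.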
CONCLUSIONS: The CNN models built on day 5 embryo images predicted G-sac and FHB with
AUROCs of up to 0.80 and 0.79, respectively. Overall, the performance of the G-sac prediction
model was better than that of the FHB model. Non-embryo factors such as uterine and
immunological conditions are expected to become more important as pregnancy advances. Further
studies that include non-embryo factors in the CNN model may improve prediction. We also acknowledge that
high variance in images can reduce the performance of CNN models. In this study, we
demonstrated that data augmentation methods such as MixUp can improve the performance of
models trained on images with high variance. A limitation of this study is that it was a retrospective
study performed on embryo image data from a single IVF clinic. External validation, using training
data from one clinic and test data from another clinic, is warranted.
IMPACT STATEMENT: In this study, we demonstrated that the CNN model successfully predicted
clinical pregnancy with high accuracy. However, the further pregnancy advances, the more factors
influence the maintenance of pregnancy. To improve the prediction models, further studies are
required to identify additional non-embryo factors to include in the models.