Taehoon Ko, Ph.D., Jiye Jung, M.D., Jong Hyuk Park, Ph.D., Hyung Min Kim, M.S., Sunghan Woo, B.S., Jaseong Koo, M.D., Sung-Hun Min, Ph.D., Miran Kim, M.D., Ph.D., HyeJin Chang, M.D., Ph.D., Mi Kyung Chung, Ph.D., Moon Kyoung Cho, M.D., Ph.D., Jisun Lee, M.D., Ph.D., Hyejun Lee, M.D.
AI research projects that automatically grade embryos and predict pregnancy have become a popular topic. However, it is challenging to collect standardized embryo images and clinical data from multiple medical institutions that use different electronic health record (EHR) systems. In addition, clinics use slightly different criteria to evaluate embryos. In South Korea, the cost of infertility treatment is covered by the National Health Insurance, and fertility data have been accumulating. We introduce a government-funded project to collect and label fertility data at the national level.
MATERIALS AND METHODS:
A total of 20 hospitals are participating in this project to minimize the bias in data generated by different types of microscopes and incubators. We plan to collect clinical information on parents undergoing fertility treatment, on fertility treatment cycles, and on embryos. Physicians, embryologists, and biomedical informaticians collaborated to create the standardized data structure. Embryo images were collected from microscopes and time-lapse incubators and matched with gestational sac (G-sac) data.
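The image-to-outcome matching described above can be sketched as a lookup on a shared embryo identifier. This is only an illustration under stated assumptions: the `EmbryoImage` class, its field names, the `embryo_id` key, and the `match_images_to_gsac` helper are hypothetical and are not taken from the project's protocol.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class EmbryoImage:
    """One collected image or video (hypothetical structure)."""
    embryo_id: str  # assumed shared key between images and outcomes
    source: str     # "microscope" or "time_lapse"
    path: str       # location of the image or video file


def match_images_to_gsac(
    images: List[EmbryoImage],
    gsac_by_embryo: Dict[str, bool],
) -> List[Tuple[EmbryoImage, Optional[bool]]]:
    """Pair each image with its G-sac outcome.

    An image whose embryo has no recorded outcome is paired with None,
    so missing outcomes stay visible instead of being silently dropped.
    """
    return [(img, gsac_by_embryo.get(img.embryo_id)) for img in images]
```

Keeping unmatched images in the result, rather than filtering them out, makes it easier to audit how many images still lack an outcome.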
We developed a standardized protocol to collect fertility treatment data. Clinical information consists of three data tables. The first data table contains patient information at the embryo level, including date of birth, maternal height and weight, and the obstetric and medical history of the parents. The second data table contains information about embryos, such as date of oocyte collection, date of embryo transfer, cryopreservation, preimplantation genetic testing (PGT), and intracytoplasmic sperm injection (ICSI). The last data table contains fertility treatment cycle information, such as the number of oocytes and of mature and fertilized eggs. Embryo images will be consensus-labeled by embryologists with more than 10 years of experience. In total, 13,000 day 3 and 7,000 day 5 microscopic images and 1,500 time-lapse videos will be collected. In addition, outcomes such as G-sac and fetal heartbeat (FHB) will be recorded. To check the completeness of the data collection protocol, a deep learning model was trained on a sample batch of the dataset; it achieved an area under the receiver operating characteristic curve (AUROC) of 0.86 for automatic embryo grading.
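The three-table structure described above can be sketched as linked record types. This is a minimal illustration, not the project's actual schema: all class names, field names, identifier keys, and units (cm, kg) are assumptions, and `unlinked_embryos` is a hypothetical completeness check.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PatientRecord:
    patient_id: str              # assumed linking key
    date_of_birth: date
    maternal_height_cm: float    # unit is an assumption
    maternal_weight_kg: float    # unit is an assumption
    obstetric_history: str = ""
    medical_history: str = ""


@dataclass
class EmbryoRecord:
    embryo_id: str
    patient_id: str              # links to PatientRecord
    cycle_id: str                # links to CycleRecord
    oocyte_collection_date: date
    transfer_date: Optional[date] = None
    cryopreserved: bool = False
    pgt_performed: bool = False
    icsi_performed: bool = False


@dataclass
class CycleRecord:
    cycle_id: str
    patient_id: str
    oocytes_retrieved: int
    mature_oocytes: int
    fertilized_oocytes: int


def unlinked_embryos(
    embryos: List[EmbryoRecord],
    patients: List[PatientRecord],
    cycles: List[CycleRecord],
) -> List[str]:
    """Return IDs of embryos whose patient or cycle record is missing."""
    patient_ids = {p.patient_id for p in patients}
    cycle_ids = {c.cycle_id for c in cycles}
    return [
        e.embryo_id
        for e in embryos
        if e.patient_id not in patient_ids or e.cycle_id not in cycle_ids
    ]
```

A check of this kind is one way to confirm that records submitted by different clinics can be joined across the three tables before model training.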
After multiple discussions with fertility experts, we developed a standardized protocol to build datasets for fertility treatment. The pilot AI model developed with the first batch of sample data showed performance similar to that reported in the previous literature. Using our standardized protocol, we are currently collecting data from 20 fertility clinics in South Korea.
We developed a standardized data collection protocol to generate big data for fertility treatment across multiple fertility clinics. This large-scale dataset will be a foundation to discover clinical evidence and develop robust artificial intelligence models to support fertility treatment.