Impact of Transfer Learning Using Local Data on Performance of a Deep Learning Model for Screening Mammography.

James J J Condon, Vincent Trinh, Kelly A Hall, Michelle Reintals, Andrew S Holmes, Lauren Oakden-Rayner, Lyle J Palmer

Radiology. Artificial intelligence. 2024 May 9

"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content.

Purpose: To investigate the generalizability and replication of deep learning (DL) models by assessing the performance of a screening mammography DL system developed at New York University (NYU) on a local Australian dataset.

Materials and Methods: In this retrospective study, all individuals with biopsy and surgical pathology-proven lesions and age-matched controls were identified from a South Australian public mammography screening program (January 2010 to December 2016). The primary outcome was DL system performance, measured with the area under the receiver operating characteristic curve (AUC), in distinguishing invasive breast cancer or ductal carcinoma in situ (n = 425) from no malignancy (n = 490) or benign lesions (n = 44) in age-matched controls. The NYU system, including models without (NYU1) and with (NYU2) heatmaps, was tested in its original form, after training from scratch without transfer learning (TL), and after retraining with TL.

Results: The local test set comprised 959 individuals (mean age, 62.5 years [SD, 8.5]; all female). The originally reported AUCs for the NYU1 and NYU2 models were 0.83 (95% CI: 0.82-0.84) and 0.89 (95% CI: 0.88-0.89), respectively. When applied in their original form to the local test set, the AUCs were 0.76 (95% CI: 0.73-0.79) and 0.84 (95% CI: 0.82-0.87), respectively. After local training without TL, the AUCs were 0.66 (95% CI: 0.62-0.69) and 0.86 (95% CI: 0.84-0.88), respectively. After retraining with TL, the AUCs were 0.82 (95% CI: 0.80-0.85) and 0.86 (95% CI: 0.84-0.88), respectively.

Conclusion: A deep learning system developed using a U.S. dataset showed reduced performance when applied "out of the box" to an Australian dataset. Local retraining with transfer learning using the available model weights improved model performance. ©RSNA, 2024.
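The primary outcome above is an AUC reported with a 95% CI. The abstract does not say how those intervals were computed (DeLong's method and bootstrapping are both common), so the following is only a minimal sketch under stated assumptions, not the authors' method. It computes the empirical AUC through the rank-sum (Mann-Whitney U) identity and a percentile-bootstrap 95% CI that resamples individuals. Labels are assumed to be 1 for cancer (invasive or DCIS) and 0 for no malignancy or a benign lesion. The function names `auc` and `bootstrap_auc_ci` are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Empirical ROC AUC via the rank-sum (Mann-Whitney U) identity.

    Assumption: labels are 1 = cancer (positive), 0 = no cancer (negative).
    Tied scores receive their average rank, so ties count as half a win.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks: List[float] = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run order[i..j] of cases sharing the same score.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U statistic for positives, normalised by the number of (pos, neg) pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(
    labels: Sequence[int],
    scores: Sequence[float],
    n_boot: int = 2000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point AUC plus a percentile-bootstrap 95% CI (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        # Resample individuals with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples missing one of the two classes
            stats.append(auc(y, [scores[i] for i in idx]))
    # 39 cut points at 2.5% steps; first and last are the 2.5th and 97.5th percentiles.
    cuts = statistics.quantiles(stats, n=40, method="inclusive")
    return auc(labels, scores), cuts[0], cuts[-1]
```

For the local test set described above, `labels` would hold 425 ones and 534 zeros (490 + 44), and `scores` would be the outputs of whichever model variant (original, trained from scratch, or retrained with TL) is being evaluated. The rank-based formula keeps each AUC evaluation at O(n log n), which matters when it is repeated thousands of times inside the bootstrap.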
