Automated data extraction and ensemble methods for predictive modeling of breast cancer outcomes after radiation therapy.

Medical Physics 2019 February
PURPOSE: The purpose of this study was to compare the effectiveness of ensemble methods (e.g., random forests) and single-model methods (e.g., logistic regression and decision trees) in predictive modeling of post-radiotherapy (post-RT) treatment failure and adverse events (AEs) for breast cancer patients using automatically extracted electronic medical record (EMR) data.

METHODS: Data from 1967 consecutive breast radiotherapy (RT) courses at one institution between 2008 and 2015 were automatically extracted from EMRs and oncology information systems using extraction software. Over 230 variables were extracted spanning the following variable segments: patient demographics, medical/surgical history, tumor characteristics, RT treatment history, and AEs tracked using CTCAE v4.0. Treatment failure was extracted algorithmically by searching posttreatment encounters for evidence of local, nodal, or distant failure. Individual models were trained using decision trees, logistic regression, random forests, and boosted decision trees to predict treatment failures and AEs. Models were fit on 75% of the data and evaluated for probability calibration and area under the ROC curve (AUC) on the remaining 25%, held out as a test set. The impact of each variable segment was assessed by retraining without the segment and measuring the change in AUC (ΔAUC).
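
The evaluation described in the methods (a 75/25 split, AUC on the held-out set, and ΔAUC after dropping a variable segment) can be illustrated with a minimal sketch. This is not the authors' code: the function names `roc_auc`, `split_rows`, and `delta_auc` are hypothetical, the scores are made-up toy values rather than study data, and ΔAUC is assumed to be computed as the ablated model's AUC minus the full model's AUC, so a negative value means the removed segment was helping.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def split_rows(rows, train_fraction=0.75, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def delta_auc(labels, full_scores, ablated_scores):
    """Change in test AUC when a model is retrained without one segment."""
    return roc_auc(labels, ablated_scores) - roc_auc(labels, full_scores)


# Toy held-out labels and predicted probabilities (illustrative only).
test_labels = [0, 0, 1, 1]
full_model_scores = [0.10, 0.40, 0.35, 0.80]     # all variable segments
ablated_model_scores = [0.10, 0.40, 0.30, 0.35]  # one segment removed

print(roc_auc(test_labels, full_model_scores))
print(delta_auc(test_labels, full_model_scores, ablated_model_scores))
```

The pairwise form of AUC is quadratic in the number of test cases, which is fine for a sketch; a rank-based formulation or a library routine would be the usual choice at the scale of the study.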

RESULTS: All AUC values were statistically significant (P < 0.05). Ensemble methods outperformed single-model methods across all outcomes. The best ensemble method outperformed decision trees and logistic regression by an average AUC of 0.053 and 0.034, respectively. Model probabilities were well calibrated as evidenced by calibration curves. Excluding the patient medical history variable segment led to the largest AUC reduction in all models (Average ΔAUC = -0.025), followed by RT treatment history (-0.021) and tumor information (-0.015).

CONCLUSION: In this largest such study in breast cancer performed to date, automatically extracted EMR data provided a basis for reliable outcome predictions across multiple statistical methods. Ensemble methods provided substantial advantages over single-model methods. Patient medical history contributed the most to prediction quality.

