Development and external validation of automated ICD-10 coding from discharge summaries using deep learning approaches

Author(s): Wanchana Ponthongmak; Ratchainant Thammasudjarit; Gareth J McKay; John Attia; Nawanan Theera-Ampornpunt; Ammarin Thakkinstian
Source:
Informatics in Medicine Unlocked, Vol 38, Article 101227 (2023)
Subject Terms:
Deep learning; Natural language processing; International classification of diseases; Patient discharge summaries; PubMedBERT; Computer applications to medicine. Medical informatics; R858-859.7
Document Type:
article
Language:
English
Online Access:
https://doaj.org/article/40e3abbbec02490ea16e783f300d80aa

Additional Information
- Publication Information:
  Elsevier, 2023.
- Publication Date:
  2023
- Collection:
  LCC:Computer applications to medicine. Medical informatics
- Abstract:
  Objectives: To develop an automated International Classification of Diseases (ICD) coding tool using natural language processing (NLP) and discharge summary texts from Thailand. Materials and methods: The development phase included 15,329 discharge summaries from Ramathibodi Hospital from January 2015 to December 2020. The external validation phase used data from the Medical Information Mart for Intensive Care III (MIMIC-III). Three algorithms were developed: naïve Bayes with term frequency-inverse document frequency (NB-TF-IDF), a convolutional neural network with neural word embedding (CNN-NWE), and a CNN with PubMedBERT (CNN-PubMedBERT). Two state-of-the-art models were also considered: convolutional attention for multi-label classification (CAML) and pretrained language models for automatic ICD coding (PLM-ICD). Results: The CNN-PubMedBERT model achieved average micro- and macro-area under the precision-recall curve (AUPRC) of 0.6605 and 0.5538. It outperformed NB-TF-IDF (0.4441 and 0.3562) and CAML (0.6257 and 0.4964) on both metrics, with differences of (0.2164 and 0.1976) and (0.0348 and 0.0574), respectively. Relative to CNN-NWE (0.6528 and 0.5564), it was higher on micro-AUPRC but marginally lower on macro-AUPRC (differences of 0.0077 and −0.0026). However, CNN-PubMedBERT underperformed PLM-ICD, which achieved AUPRCs of 0.7202 and 0.5865. The CNN-PubMedBERT model was externally validated using two subsets of MIMIC-III, the MIMIC-ICD-10 and MIMIC-ICD-9 datasets, which contained 40,923 and 31,196 discharge summaries, respectively. The average micro-AUPRCs were 0.3745, 0.6878, and 0.6699 for direct prediction on MIMIC-ICD-10, fine-tuning on MIMIC-ICD-10, and fine-tuning on MIMIC-ICD-9, respectively; the corresponding average macro-AUPRCs were 0.2819, 0.4219, and 0.5377.
  Discussion: CNN-PubMedBERT performed second only to PLM-ICD. Average micro- and macro-AUPRC differed considerably, especially in external validation, indicating good overall prediction but limited predictive value for codes with small sample sizes. External validation in a US cohort (MIMIC-III) demonstrated a higher level of model prediction performance. Conclusion: Both PLM-ICD and CNN-PubMedBERT models may provide useful tools for automated ICD-10 coding. Nevertheless, further evaluation and validation within Thai and Asian healthcare systems may prove more informative for clinical application.
- File Description:
  electronic resource
- ISSN:
  2352-9148
- Relation:
  http://www.sciencedirect.com/science/article/pii/S2352914823000692; https://doaj.org/toc/2352-9148
- Accession Number:
  10.1016/j.imu.2023.101227
- Accession Number:
  edsdoj.40e3abbbec02490ea16e783f300d80aa
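The abstract attributes the gap between average micro- and macro-AUPRC to codes with small sample sizes. The following is a minimal sketch, not the authors' evaluation code. It assumes that AUPRC is estimated as step-wise average precision (the abstract does not name the estimator) and that codes with no positive documents are skipped in the macro average. It shows how one rare, poorly ranked code pulls the macro average down much more than the micro average. The function names and the toy label/score matrices are hypothetical.

```python
from typing import List, Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each positive.

    Items are ranked by descending score; ties are broken by input order
    (a simplification that matches scikit-learn only when scores are distinct).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i]:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def micro_auprc(Y: List[List[int]], S: List[List[float]]) -> float:
    """Pool every (document, code) pair into one ranking, so frequent codes dominate."""
    flat_y = [y for row in Y for y in row]
    flat_s = [s for row in S for s in row]
    return average_precision(flat_y, flat_s)


def macro_auprc(Y: List[List[int]], S: List[List[float]]) -> float:
    """Average per-code AUPRC, so each code counts equally regardless of frequency."""
    per_code = []
    for j in range(len(Y[0])):
        col_y = [row[j] for row in Y]
        if sum(col_y) == 0:
            continue  # assumption: codes with no positives are skipped
        col_s = [row[j] for row in S]
        per_code.append(average_precision(col_y, col_s))
    return sum(per_code) / len(per_code)


# Hypothetical toy data: 4 discharge summaries x 2 codes.
# Code 0 is common and well ranked; code 1 is rare and its only positive is ranked last.
Y = [[1, 0], [1, 0], [1, 0], [0, 1]]
S = [[0.9, 0.2], [0.8, 0.3], [0.7, 0.4], [0.1, 0.05]]

print(micro_auprc(Y, S), macro_auprc(Y, S))  # prints: 0.875 0.625
```

Here code 0 scores an AUPRC of 1.0 and code 1 scores 0.25, so the macro average (0.625) falls well below the micro average (0.875). With hundreds of ICD-10 codes, many of them rare, the same effect can widen the gap reported above.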
