Benchmarking ensemble machine learning algorithms for multi-class, multi-omics data integration in clinical outcome prediction.

  • Additional Information
    • Source:
      Publisher: Oxford University Press Country of Publication: England NLM ID: 100912837 Publication Model: Print Cited Medium: Internet ISSN: 1477-4054 (Electronic) Linking ISSN: 1467-5463 NLM ISO Abbreviation: Brief Bioinform Subsets: MEDLINE
    • Publication Information:
      Publication: Oxford : Oxford University Press
      Original Publication: London ; Birmingham, AL : H. Stewart Publications, [2000-
    • Subject Terms:
    • Abstract:
      The complementary information found in different modalities of patient data can aid in more accurate modelling of a patient's disease state and a better understanding of the underlying biological processes of a disease. However, the analysis of multi-modal, multi-omics data presents many challenges. In this work, we compare the performance of a variety of ensemble machine learning (ML) algorithms that are capable of late integration of multi-class data from different modalities. The ensemble methods and their variations tested were (i) a voting ensemble with hard and soft vote, (ii) a meta learner, (iii) a multi-modal AdaBoost model using hard vote, soft vote, or a meta learner to integrate the modalities on each boosting round, (iv) the PB-MVBoost model, and (v) a novel application of a mixture-of-experts model. These were compared to simple concatenation. We examine these methods using data from an in-house study on hepatocellular carcinoma, plus validation datasets from studies of breast cancer and irritable bowel disease. We develop models that achieve an area under the receiver operating characteristic curve of up to 0.85 and find that two boosted methods, PB-MVBoost and AdaBoost with soft vote, were the best-performing models. We also examine the stability of the features selected and the size of the clinical signature. Our work shows that integrating complementary omics and data modalities with effective ensemble ML models enhances accuracy in multi-class clinical outcome predictions and produces more stable predictive features than individual modalities or simple concatenation. We provide recommendations for the integration of multi-modal, multi-class data.
      (© The Author(s) 2025. Published by Oxford University Press.)
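
The hard-vote and soft-vote late-integration schemes named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tie-breaking rule, and the probability values are hypothetical, and each inner list stands for the class probabilities that a separately trained per-modality classifier might output for one patient.

```python
from collections import Counter


def soft_vote(modality_probs):
    """Average the per-class probabilities across modalities; return the top class index."""
    n_classes = len(modality_probs[0])
    n_modalities = len(modality_probs)
    mean = [sum(p[c] for p in modality_probs) / n_modalities for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


def hard_vote(modality_probs):
    """Each modality votes for its own top class; the most common vote wins.

    Ties are broken by taking the lowest class index (an assumption of this sketch).
    """
    votes = [max(range(len(p)), key=lambda c: p[c]) for p in modality_probs]
    counts = Counter(votes)
    best = max(counts.values())
    return min(c for c, n in counts.items() if n == best)


# Hypothetical class probabilities for one patient, three outcome classes,
# from three modality-specific models (values invented for illustration).
probs = [
    [0.40, 0.35, 0.25],  # modality A: weakly prefers class 0
    [0.45, 0.40, 0.15],  # modality B: weakly prefers class 0
    [0.05, 0.90, 0.05],  # modality C: strongly prefers class 1
]

print("hard vote:", hard_vote(probs))
print("soft vote:", soft_vote(probs))
```

In this example the two schemes disagree: hard vote counts two weak votes for class 0, whereas soft vote lets the one confident modality pull the averaged probabilities towards class 1.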
    • Grant Information:
      2008996 Medical Research Future Fund
    • Contributed Indexing:
      Keywords: cancer; clinical outcome prediction; hepatocellular carcinoma; late integration; machine learning; multi-class; multi-modal; multi-omics
    • Publication Date:
      Date Created: 20250321 Date Completed: 20250514 Latest Revision: 20250514
    • Publication Date:
      20260130
    • Accession Number:
      PMCID: PMC11926982; DOI: 10.1093/bib/bbaf116; PMID: 40116658