Benchmarking datasets for assembly-based variant calling using high-fidelity long reads.

  • Additional Information
    • Source:
      Publisher: BioMed Central; Country of Publication: England; NLM ID: 100965258; Publication Model: Electronic; Cited Medium: Internet; ISSN: 1471-2164 (Electronic); Linking ISSN: 14712164; NLM ISO Abbreviation: BMC Genomics; Subsets: MEDLINE
    • Publication Information:
      Original Publication: London : BioMed Central, [2000-
    • Abstract:
      Background: Recent advances in long-read sequencing technologies have enabled accurate identification of all genetic variants in individuals or cells; this procedure is known as variant calling. However, benchmarking studies on variant calling using different long-read sequencing technologies are still lacking.
      Results: We used two Caenorhabditis elegans strains to measure several variant calling metrics. These two strains shared true-positive genetic variants that were introduced during strain generation. In addition, both strains contained common and distinguishable variants induced by DNA damage, possibly leading to false-positive estimation. We obtained accurate and noisy long reads from both strains using high-fidelity (HiFi) and continuous long-read (CLR) sequencing platforms, respectively, and compared the variant calling performance of the two platforms. On average, HiFi identified 1.65-fold more true-positive variants and 60% fewer false-positive variants than CLR. We also compared read-based and assembly-based variant calling methods across subsampled sequencing depths and demonstrated that variant calling after genome assembly was particularly effective for detecting large insertions, even at 10× sequencing depth of accurate long-read data.
      Conclusions: By directly comparing the two long-read sequencing technologies, we demonstrated that variant calling after genome assembly with 10× or greater depth of accurate long-read sequencing data allowed reliable detection of true-positive variants. Considering the high cost of HiFi sequencing, we propose a cost-effective approach to high-quality variant calling: assembly-based variant calling at 10× depth. The results of the present study may facilitate the development of methods for identifying all genetic variants at the population level.
      (© 2023. The Author(s).)
    • Grant Information:
      SSTF-BA1501-52 Samsung Science and Technology Foundation; 2019R1A6A1A10073437 National Research Foundation of Korea
    • Contributed Indexing:
      Keywords: Benchmark; Genetic variant; High-fidelity long reads; Long-read sequencing; Variant calling
    • Publication Date:
      Date Created: 20230327 Date Completed: 20230329 Latest Revision: 20240915
    • Publication Date:
      20260130
    • Accession Number:
      PMCID: PMC10045170; DOI: 10.1186/s12864-023-09255-y; PMID: 36973656
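
The abstract compares platforms by counts of true-positive and false-positive variants and evaluates calls at subsampled sequencing depths. The following is a minimal sketch of how such metrics could be computed, not the authors' pipeline. It assumes variants are matched exactly as (chromosome, position, ref, alt) tuples, which is a simplification (real benchmarking usually normalizes variant representation first). The names `benchmark_calls` and `subsample_fraction`, and the genome size used in the usage note, are hypothetical choices for illustration.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


class Metrics(NamedTuple):
    true_positives: int
    false_positives: int
    false_negatives: int
    precision: float
    recall: float
    f1: float


def benchmark_calls(called: Set[Variant], truth: Set[Variant]) -> Metrics:
    """Compare a call set against a truth set using exact matching (an assumption)."""
    tp = len(called & truth)   # called and present in the truth set
    fp = len(called - truth)   # called but absent from the truth set
    fn = len(truth - called)   # in the truth set but missed by the caller
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(tp, fp, fn, precision, recall, f1)


def subsample_fraction(target_depth: float, total_bases: int, genome_size: int) -> float:
    """Fraction of reads to keep so that mean depth is roughly target_depth.

    Mean depth is approximated as total sequenced bases divided by genome size.
    """
    if total_bases <= 0 or genome_size <= 0:
        raise ValueError("total_bases and genome_size must be positive")
    current_depth = total_bases / genome_size
    # Never ask for more reads than exist.
    return min(1.0, target_depth / current_depth)
```

As a usage example under assumed numbers: with 4 Gb of reads and an illustrative genome size of 100 Mb (about 40× depth), `subsample_fraction(10, 4_000_000_000, 100_000_000)` gives the fraction of reads to keep for roughly 10× depth, and `benchmark_calls` can then be run on the calls made from each subsample.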