Large-scale cryptic proteome mining revealed potential phage-mediated host-pathogen genetic exchange in Mycobacterium tuberculosis.

  • Additional Information
    • Source:
      Publisher: Public Library of Science Country of Publication: United States NLM ID: 101285081 Publication Model: eCollection Cited Medium: Internet ISSN: 1932-6203 (Electronic) Linking ISSN: 19326203 NLM ISO Abbreviation: PLoS One Subsets: MEDLINE
    • Publication Information:
      Original Publication: San Francisco, CA : Public Library of Science
    • Subject Terms:
    • Abstract:
      Background: Due to inevitable evolution, clinical strains of Mycobacterium tuberculosis (Mtb) exhibit distinct phenotypes and differ significantly from laboratory strains. This divergence is driven by the acquisition of diverse mutations, intragenomic recombination, and potentially phage-mediated genetic exchange. Further investigation is therefore required to better understand these differences, especially regarding the emergence of novel open reading frames (ORFs).
      Methodology: A large-scale whole-genome sequencing (WGS) dataset from tuberculosis (TB)-endemic countries was assembled into contigs. These contigs were then used to mine the cryptic proteome through in silico predictions, using an emerging state-of-the-art deep learning protein language model, ProtBERT. Structures of emerging ORF proteins were predicted using colabfold2. In addition, Ramachandran plot analysis and molecular dynamics simulation (MDS) were performed to assess structural validity and stability.
      Results: Most small cryptic proteins were derived from PE, PPE, and PE-PGRS family genes. Notably, a protein cluster consisting of sequences of 101-300 amino acids showed no similarity to proteins from the reference Mtb H37Rv strain but contained phage-derived domains and sequences homologous to host (primate) DNA. Sub-stratification of this cluster revealed domains such as reverse transcriptases and other phage-associated proteins. BLASTp hits for these proteins showed similarity to the host proteome. Structural validation showed that the Phi/Psi angles of the modelled proteins were within accepted ranges. Moreover, MDS of the sub-cluster medoids displayed stable root mean square deviation (RMSD) and radius of gyration (RGYR) profiles, supporting the structural plausibility of these proteins. Importantly, one sub-cluster was more prevalent in drug-resistant Mtb strains. The co-occurrence of phage-related domains and host DNA strongly suggests illegitimate phage-mediated lateral transfer of host nucleic acids into the Mtb genome.
      Conclusion: The majority of the small ORFs are found to be nested within annotated genes. Particular emphasis has been given to the incidental finding of phage-host chimeric ORF signatures within the Mtb genome. This study provides computational evidence supporting the structural stability of these proteins. Thus, it can be speculated that such proteins may contribute to pathogenicity, survival within the host, or molecular mimicry mechanisms.
      (Copyright: © 2026 Bhalla, Angrish. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.)
    • Abstract:
      The authors declare no conflict of interest.
    • Accession Number:
      0 (Proteome)
      0 (Bacterial Proteins)
    • Publication Date:
      Date Created: 20260518 Date Completed: 20260518 Latest Revision: 20260520
    • Publication Date:
      20260520
    • Accession Number:
      PMC13183242
    • Accession Number:
      10.1371/journal.pone.0348602
    • Accession Number:
      42149897
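
The abstract reports RMSD and radius of gyration (RGYR) profiles from molecular dynamics simulation as stability indicators. As a reading aid only, the two quantities can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names `rmsd` and `radius_of_gyration` are hypothetical, the coordinates are toy values, the frames are assumed to be already superimposed on the reference (no fitting is performed), and equal atomic masses are assumed unless masses are supplied.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two structures.

    Assumption: both inputs list the same atoms in the same order and are
    already superimposed; no rotational or translational fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration about the centre of mass.

    Assumption: if no masses are given, every atom gets unit mass.
    """
    if not coords:
        raise ValueError("coordinate list must be non-empty")
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    # Centre of mass, one component per axis.
    centre = [
        sum(m * c[axis] for m, c in zip(masses, coords)) / total_mass
        for axis in range(3)
    ]
    # Mass-weighted mean squared distance from the centre of mass.
    weighted_sq = sum(
        m * sum((c[axis] - centre[axis]) ** 2 for axis in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


if __name__ == "__main__":
    # Toy reference structure and a two-frame "trajectory" (hypothetical data).
    reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    trajectory = [
        reference,
        [(x + 1.0, y, z) for x, y, z in reference],  # rigid shift along x
    ]
    for index, frame in enumerate(trajectory):
        print(index, rmsd(reference, frame), radius_of_gyration(frame))
```

In this reading, a "stable" profile means that both values stay roughly flat across frames after equilibration. The rigid shift in the toy trajectory changes the RMSD but leaves the radius of gyration unchanged, which is why production analyses superimpose each frame on the reference before computing RMSD.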