Item request has been placed! ×
Item request cannot be made. ×
loading  Processing Request

Data access optimisation at CERN and in the Worldwide LHC Computing Grid (WLCG) ; Optimisation de l'accès aux données au CERN et dans la Grille de calcul mondiale pour le LHC (WLCG)

Item request has been placed! ×
Item request cannot be made. ×
loading   Processing Request
  • Additional Information
    • Contributors:
      Network Engineering and Operations (NEO); Inria Sophia Antipolis - Méditerranée (CRISAM); Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria); Université Côte d'Azur (UniCA); Université Côte d'Azur; Giovanni Neglia
    • Publication Information:
      CCSD
    • Publication Date:
      2024
    • Collection:
      HAL Université Côte d'Azur
    • Abstract:
      The Worldwide LHC Computing Grid (WLCG) offers an extensive distributed computing infrastructure dedicated to the scientific community involved with CERN's Large Hadron Collider (LHC). With storage that totals roughly an exabyte, the WLCG addresses the data processing and storage requirements of thousands of international scientists. As the High-Luminosity LHC phase approaches, the volume of data to be analysed will increase steeply, outpacing the expected gain through the advancement of storage technology. Therefore, new approaches to effective data access and management, such as caches, become essential. This thesis delves into a comprehensive exploration of storage access within the WLCG, aiming to enhance the aggregate science throughput while limiting the cost. Central to this research is the analysis of real file access logs sourced from the WLCG monitoring system, highlighting genuine usage patterns.In a scientific setting, caching has profound implications. Unlike more commercial applications such as video streaming, scientific data caches deal with varying file sizes—from a mere few bytes to multiple terabytes. Moreover, the inherent logical associations between files considerably influence user access patterns. Traditional caching research has predominantly revolved around uniform file sizes and independent reference models. Contrarily, scientific workloads encounter variances in file sizes, and logical file interconnections significantly influence user access patterns.My investigations show how LHC's hierarchical data organization, particularly its compartmentalization into datasets, impacts request patterns. Recognizing the opportunity, I introduce innovative caching policies that emphasize dataset-specific knowledge, and compare their effectiveness with traditional file-centric strategies. Furthermore, my findings underscore the "delayed hits" phenomenon triggered by limited connectivity between computing and storage locales, shedding light on its potential repercussions for caching ...
    • Relation:
      NNT: 2024COAZ4005
    • Online Access:
      https://theses.hal.science/tel-04620247
      https://theses.hal.science/tel-04620247v1/document
      https://theses.hal.science/tel-04620247v1/file/2024COAZ4005.pdf
    • Rights:
      info:eu-repo/semantics/OpenAccess
    • Accession Number:
      edsbas.923DB237