
CaAdam: Improving Adam optimizer using connection aware methods

  • Additional Information
    • Contributors:
      Dauphine Recherches en Management (DRM), Université Paris Dauphine-PSL, Université Paris Sciences et Lettres (PSL), Centre National de la Recherche Scientifique (CNRS); Centre de Recherche en Économie et Statistique (CREST), École Nationale de la Statistique et de l'Analyse de l'Information Bruz (ENSAI), École polytechnique (X), Institut Polytechnique de Paris (IP Paris), École Nationale de la Statistique et de l'Administration Économique (ENSAE Paris), Centre National de la Recherche Scientifique (CNRS)
    • Publication Information:
      CCSD
    • Publication Date:
      2025
    • Collection:
      GENES (Groupe des Écoles Nationales d'Économie et Statistique): HAL
    • Abstract:
      International audience; We introduce a new method inspired by Adam that enhances convergence speed and achieves better loss function minima. Traditional optimizers, including Adam, apply uniform or globally adjusted learning rates across neural networks without considering their architectural specifics. This architecture-agnostic approach is deeply embedded in most deep learning frameworks, where optimizers are implemented as standalone modules without direct access to the network's structural information. For instance, in popular frameworks like Keras or PyTorch, optimizers operate solely on gradients and parameters, without knowledge of layer connectivity or network topology. Our algorithm, CaAdam, explores this overlooked area by introducing connection-aware optimization through carefully designed proxies of architectural information. We propose multiple scaling methodologies that dynamically adjust learning rates based on easily accessible structural properties such as layer depth, connection counts, and gradient distributions. This approach enables more granular optimization while working within the constraints of current deep learning frameworks. Empirical evaluations on standard datasets (e.g., CIFAR-10, Fashion MNIST) show that our method consistently achieves faster convergence and higher accuracy compared to the standard Adam optimizer, demonstrating the potential benefits of incorporating architectural awareness in optimization strategies. (An illustrative sketch of the depth-based scaling idea follows this record.)
    • Relation:
      info:eu-repo/semantics/altIdentifier/arxiv/2410.24216; ARXIV: 2410.24216
    • Online Access:
      https://hal.science/hal-04923974
      https://hal.science/hal-04923974v1/document
      https://hal.science/hal-04923974v1/file/2410.24216v1.pdf
    • Rights:
      info:eu-repo/semantics/OpenAccess
    • Accession Number:
      edsbas.78ECAFE4
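
The following is a minimal, hypothetical sketch, not the authors' released code, of the depth-based learning-rate scaling idea mentioned in the abstract: per-layer learning rates are scaled on top of a standard Adam optimizer via PyTorch parameter groups. The function name depth_scaled_adam, the shallow_scale and deep_scale parameters, and the linear interpolation rule are assumptions made purely for illustration.

import torch
import torch.nn as nn


def depth_scaled_adam(model, base_lr=1e-3, shallow_scale=1.5, deep_scale=0.5):
    """Adam with per-layer learning rates that shrink with layer depth.

    A rough stand-in for the connection-aware scaling described in the
    abstract: layer depth acts as the architectural proxy, and each layer
    gets its own parameter group so the unmodified Adam optimizer can apply
    a distinct learning rate.
    """
    # Keep only child modules that actually own trainable parameters.
    layers = [m for m in model.children()
              if any(p.requires_grad for p in m.parameters())]
    n = max(len(layers) - 1, 1)
    param_groups = []
    for depth, layer in enumerate(layers):
        # Linearly interpolate the scale factor from shallow to deep layers
        # (an assumed scaling rule, chosen only to illustrate the mechanism).
        scale = shallow_scale + (deep_scale - shallow_scale) * depth / n
        param_groups.append({"params": list(layer.parameters()),
                             "lr": base_lr * scale})
    return torch.optim.Adam(param_groups)


# Usage on a small fully connected network (e.g., for Fashion MNIST inputs).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                      nn.Linear(256, 64), nn.ReLU(),
                      nn.Linear(64, 10))
optimizer = depth_scaled_adam(model)

Because the scaling is applied through ordinary parameter groups, the sketch stays within the framework constraint the abstract highlights: the optimizer itself never inspects the network topology, and the depth proxy is computed outside it.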