
Towards model-agnostic active learning in regression via identification of problem-intrinsic properties

  • Additional Information
    • Contributors:
      Müller, Klaus-Robert; Technische Universität Berlin; Blanchard, Gilles; Sugiyama, Masashi
    • Publication Date:
      2023
    • Collection:
      TU Berlin: Deposit Once
    • Abstract:
      Recent advances of machine learning methods in scientific domains such as chemistry and physics have brought up data-intensive inference problems where the acquisition of labeled training data is expensive, as these labels typically stem from computationally involved numeric simulations or even laboratory experiments. For example, solving the Schrödinger equation for a studied molecular system at a high level of accuracy to obtain a single label requires hours to days of computation time. The data from such real-world applications often exhibits inhomogeneities that are not addressed by standard machine learning models and naive training data selection techniques. While such models will eventually perform reasonably accurately at large enough training sizes, the same level of performance can be achieved at a considerably smaller training size by adjusting to the true structural properties of the learning problem in both regards: the design of the model and the construction of the training dataset. For some complex learning problems, these sample savings are essential to render the application of machine learning possible in the first place. The process of guiding the construction of the training dataset is known as active learning. Whenever we are confronted with a new learning problem where domain knowledge is scarce, active learning must be conducted in a robust way: on the one hand, the data selection criterion should work under mild regularity assumptions on the problem, since we otherwise risk obtaining training data of poorer quality than a naive construction would yield when the assumptions of the active learning approach are violated. On the other hand, the acquired training dataset should remain meaningful under a change of model in hindsight, since the state of the art for new learning problems is evolving rapidly.
The existing literature on robust active learning approaches for regression is centered around uninformed selection criteria, which means that they ignore label information even ...
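      As an illustration of the uninformed selection criteria mentioned above (a hedged sketch, not an implementation from the thesis itself): a criterion that ignores label information can select training points purely by input-space coverage, e.g. greedy farthest-point sampling from an unlabeled pool. Because no labels enter the selection, the resulting training set stays meaningful if the model is swapped out later.

```python
import numpy as np

def farthest_point_selection(pool, n_select, seed=0):
    """Greedily pick n_select indices from `pool` (shape [N, d]) for labeling.

    Uninformed criterion: each step adds the pool point farthest from all
    points selected so far, so labels are never consulted.
    """
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(pool)))]  # random seed point
    # distance of every pool point to its nearest selected point
    dists = np.linalg.norm(pool - pool[selected[0]], axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(dists))            # farthest from current set
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(pool - pool[nxt], axis=1))
    return selected

# Toy usage: 500 candidate inputs in the unit square, budget of 20 labels.
pool = np.random.default_rng(1).uniform(-1.0, 1.0, size=(500, 2))
idx = farthest_point_selection(pool, 20)
```

      The selected indices spread roughly uniformly over the input region; the function names and the toy pool are illustrative assumptions, not part of the cited work.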
    • File Description:
      application/pdf
    • DOI:
      10.14279/depositonce-17738
    • Online Access:
      https://depositonce.tu-berlin.de/handle/11303/18938
      https://doi.org/10.14279/depositonce-17738
    • Rights:
      https://creativecommons.org/licenses/by-nc-sa/4.0/
    • Accession Number:
      edsbas.2D53A231