Overcoming Set Imbalance in Data‐Driven Parameterization: A Case Study of Gravity Wave Momentum Transport.
- Author(s): Yang, L. Minah (AUTHOR); Gerber, Edwin P. (AUTHOR)
- Source:
Journal of Advances in Modeling Earth Systems. Feb 2026, Vol. 18 Issue 2, p1-23. 23p.
- Abstract:
Machine learning for the parameterization of subgrid‐scale processes in climate models has been widely researched and adopted in a few models. A key challenge in developing data‐driven parameterization schemes is how to properly represent rare, but important events that occur in geoscience data sets. We investigate and develop strategies to reduce errors caused by insufficient sampling in the rare data regime, under constraints of no new data and no further expansion of model complexity. Resampling and importance weighting strategies are constructed with user defined parameters that systematically vary the sampling/weighting rates in a linear fashion and curb too much oversampling. Applying this new method to a case study of gravity wave momentum transport reveals that the resampling strategy can successfully improve errors in the rare regime at little to no loss in accuracy overall in the data set. The success of the strategy, however, depends on the complexity of the model. More complex models can overfit the tails of the distribution when using non‐optimal parameters of the resampling strategy.

Plain Language Summary: Subgrid‐scale parameterizations are a part of climate models that represent effects of processes that cannot be directly modeled. In recent years, there have been many efforts to improve upon these parameterizations by applying machine learning (ML) techniques. Since these methods rely heavily on the data set they are learning from, it is important to consider the frequency at which important events occur within the data set because they are adept at learning frequent events at high accuracy but are prone to learning rare but important events at low accuracy. To remedy this data imbalance problem, we developed a resampling methodology that can be easily adjusted by tuning just two parameters. We find that a right combination of those parameters can improve the accuracy of an ML model at the rare event regime while keeping the accuracy high in the frequent regime. However, a "wrong" combination can actually increase the errors at the rare event regime by overfitting to that regime.

Key Points:
  - Unresolved geophysical processes often exhibit long‐tail distributions, which leads to imbalanced data sets for data‐driven parameterizations
  - Two strategies to overcome data imbalance are presented, where either the sampling or loss function is modified to better capture the tails
  - Proof of concept is demonstrated by using a wind range metric to improve a machine learning emulator of a physics‐based gravity wave parameterization

[ABSTRACT FROM AUTHOR]
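
The abstract describes resampling and importance weighting controlled by two user-tunable parameters, with rates that vary linearly and a limit on oversampling, but it does not give the formulas. The following is only a minimal sketch of one way such a scheme could look, not the authors' method. It assumes samples are binned by a scalar metric (the abstract mentions a wind range metric), that a hypothetical parameter `alpha` blends linearly between the original distribution and a bin-balanced one, and that a hypothetical parameter `max_ratio` caps the up-weighting of any bin. All function and parameter names are illustrative.

```python
import random
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Map a scalar metric to one of n_bins equal-width bins on [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    frac = (value - lo) / (hi - lo)
    # Values outside [lo, hi] are clamped into the first or last bin.
    return min(n_bins - 1, max(0, int(frac * n_bins)))


def bin_weights(bins, alpha, max_ratio):
    """Return a per-sample weight for each occupied bin.

    alpha = 0 keeps the original distribution (weight 1 everywhere).
    alpha = 1 targets equal total weight in every occupied bin.
    max_ratio caps the weight so rare bins are not oversampled too much.
    Both parameters are assumptions made for this sketch.
    """
    counts = Counter(bins)
    mean_count = len(bins) / len(counts)
    weights = {}
    for b, c in counts.items():
        balanced = mean_count / c               # weight that equalizes bins
        w = (1.0 - alpha) + alpha * balanced    # linear blend in alpha
        weights[b] = min(w, max_ratio)
    return weights


def resample_indices(bins, alpha, max_ratio, n_draws, seed=0):
    """Strategy 1 (resampling): draw indices with replacement."""
    weights = bin_weights(bins, alpha, max_ratio)
    per_sample = [weights[b] for b in bins]
    rng = random.Random(seed)
    return rng.choices(range(len(bins)), weights=per_sample, k=n_draws)


def weighted_mse(pred, target, sample_weights):
    """Strategy 2 (importance weighting): weight each squared error."""
    num = sum(w * (p - t) ** 2 for p, t, w in zip(pred, target, sample_weights))
    return num / sum(sample_weights)
```

In this sketch the same per-bin weights serve both strategies: `resample_indices` changes which samples the model sees, while `weighted_mse` leaves the data untouched and rescales each sample's contribution to the loss. Sweeping `alpha` and `max_ratio` on held-out data would be one way to look for the overfitting of the tails that the abstract warns about.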
- Abstract:
Copyright of Journal of Advances in Modeling Earth Systems is the property of Wiley-Blackwell and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)