Physics-Informed Data Augmentation for Sediment Concentration Prediction in Data-Scarce Himalayan Rivers
DOI:
https://doi.org/10.3126/injet-indev.v2i2.95706
Keywords:
Sediment Transport Prediction, Data Augmentation, Physics Informed Modeling, Himalayan Hydrology, Hydropower Engineering, Rating Curves
Abstract
Accurate prediction of sediment concentration is of prime importance for sustainable hydropower development in the Himalayan region, where large sediment loads endanger turbines and shorten reservoir life. This study addresses the twin challenges of data scarcity and physical complexity through a systematic comparison of conventional statistical and advanced physics-based data augmentation techniques that can be reused for ML-driven sediment prediction in data-scarce Himalayan rivers. It pursued two interrelated objectives: first, a systematic comparison of conventional and advanced data augmentation schemes; and second, the development of new physics-informed schemes for sediment transport, yielding a reproducible analytics framework suitable for data-scarce Himalayan watersheds. Ten data augmentation techniques were evaluated on the same number of observed monthly sediment data points: five classical statistical methods (forward-backward fill, linear interpolation, seasonal mean, simple rating curve models, and ensemble averaging) and five advanced physics-based methods (seasonal stochastic rating curve models, k-nearest neighbor discharge analogs, STL decomposition models, physics-based constraint models, and a weighted ensemble). Conservative pre-processing, based on the consensus of three outlier detection methods, preserved about 99.8% of the data points. Under rigorous 5-fold cross-validation, the advanced physics-based models substantially outperformed the classical statistical models for sediment concentration augmentation: the performance metric value improved by 5.5%, the Root Mean Squared Error (RMSE) by 24.3%, and the Mean Absolute Error by 47.6%. This study makes several contributions.
First, it contributes to hydrology by addressing the data scarcity challenges of the discipline. Additionally, it lays the groundwork for further machine learning analysis by ensuring that missing sediment data can be imputed effectively with minimal imputation error.
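As an illustration of the simplest of the methods named above, the following is a minimal sketch of gap-filling with a simple rating curve. The abstract does not give the functional form used in the study; the power law C = a * Q^b fitted in log-log space is assumed here because it is a commonly used form for sediment rating curves. The function names and the demo values are hypothetical and synthetic, not data from the study.

```python
import numpy as np

def fit_rating_curve(discharge, concentration):
    """Fit C = a * Q**b by least squares in log-log space; return (a, b).

    Assumed form: the study's actual rating-curve model is not specified.
    """
    q = np.asarray(discharge, dtype=float)
    c = np.asarray(concentration, dtype=float)
    # Use only months where both values are observed and positive.
    mask = np.isfinite(q) & np.isfinite(c) & (q > 0) & (c > 0)
    b, log_a = np.polyfit(np.log(q[mask]), np.log(c[mask]), 1)
    return float(np.exp(log_a)), float(b)

def fill_missing(discharge, concentration):
    """Replace NaN concentrations with rating-curve estimates."""
    q = np.asarray(discharge, dtype=float)
    c = np.asarray(concentration, dtype=float).copy()
    a, b = fit_rating_curve(q, c)
    # Fill only gaps that have a usable discharge value.
    gaps = np.isnan(c) & np.isfinite(q) & (q > 0)
    c[gaps] = a * q[gaps] ** b
    return c, a, b

# Synthetic demo: concentrations generated from C = 2 * Q**1.5, one gap.
q_demo = np.array([10.0, 20.0, 40.0, 80.0, 160.0])
c_demo = 2.0 * q_demo ** 1.5
c_demo[2] = np.nan
filled, a_hat, b_hat = fill_missing(q_demo, c_demo)
```

On real monthly records the fit would be noisy, and a log-space fit is known to bias back-transformed estimates, which is one reason seasonal or stochastic variants such as those listed in the abstract might be preferred.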
License
Copyright (c) 2026 International Journal on Engineering Technology and Infrastructure Development

This work is licensed under a Creative Commons Attribution 4.0 International License.
This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.