19–22 May 2026
Europe/Paris timezone

Data-Driven Prediction of Oil Removal Efficiency in Surfactant-Enhanced Remediation

20 May 2026, 10:05
1h 30m
Poster Presentation (MS02) Environmental Porous Media: Water, Agriculture, and Remediation Poster

Speaker

Ehsan Hajibolouri (Department of Mechanics, Al-Farabi Kazakh National University, 050040, Almaty, Kazakhstan)

Description

Surfactant-enhanced remediation (SER) is an effective method for removing petroleum hydrocarbons from contaminated soils by increasing solubilization and desorption. However, SER efficiency is governed by complex, nonlinear interactions among soil properties, contaminants, and surfactants that conventional empirical or mechanistic models do not fully capture. Advanced modeling approaches are therefore needed to improve remediation outcomes and reduce reliance on expensive trial-and-error experiments. This study evaluated three regression algorithms, namely light gradient boosting machine (LGBM), extra-trees regression (ETR), and k-nearest neighbors (KNN), for predicting oil removal efficiency from operational and environmental parameters.
The study used a database initially containing 2394 experimental records collected from approximately 50 SER studies. A preprocessing stage improved data quality by removing 503 outliers (21% of the raw data), leaving a cleaned dataset of 1891 records. Preprocessing also included screening for multicollinearity with a Spearman correlation heatmap, scaling the inputs, and excluding features that were redundant or had negligible predictive value, such as asphaltene fraction and sand content. The final feature set included variables such as surfactant concentration, hydrophilic-lipophilic balance (HLB), molecular weight, critical micelle concentration (CMC), silt and clay content, cation exchange capacity (CEC), soil pH, organic matter, agitation speed, washing time, temperature, and liquid-to-soil ratio. The cleaned database was split into 80% for training and 20% for testing, and GridSearchCV was used for hyperparameter tuning.
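The split-and-tune step described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic stand-ins, the four features are hypothetical, and the hyperparameter grid, the 3-fold cross-validation, and the use of StandardScaler for input scaling are all assumed, since the abstract does not report them.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 records, 4 hypothetical features,
# with a nonlinear target loosely playing the role of removal efficiency.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 4))
y = 100.0 * X[:, 0] * (1.0 - X[:, 1]) + rng.normal(0.0, 2.0, size=200)

# 80% training / 20% testing, as stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scaling lives inside the pipeline so it is fitted on training folds only.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", ExtraTreesRegressor(random_state=0)),
])

# Hypothetical grid: the abstract does not list the tuned hyperparameters.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [None, 8],
}

# Exhaustive grid search with cross-validation on the training set.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)

# Score the refitted best estimator on the held-out 20%.
test_r2 = search.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

Swapping ExtraTreesRegressor for another regressor (for example KNeighborsRegressor) only requires changing the "model" step and its grid entries.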
All three algorithms demonstrated strong predictive capability, though the ensemble methods were more stable. KNN predictions showed more scatter in cross plots, whereas ETR and LGBM predictions aligned closely with the 1:1 line. The ETR model performed best, outperforming both LGBM and KNN. On the entire dataset, it achieved R² = 0.984, RMSE = 2.658, and MAE = 1.257.
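For reference, the three reported metrics follow their standard definitions, sketched below in plain Python. The function name `regression_metrics` and the four example values are hypothetical and are not taken from the study's data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")

    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical removal efficiencies (%), for illustration only.
observed = [80.0, 60.0, 90.0, 70.0]
predicted = [78.0, 63.0, 90.0, 69.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
# prints: R2=0.972 RMSE=1.871 MAE=1.500
```

RMSE and MAE carry the units of the target variable, so they are only comparable across studies that express removal efficiency on the same scale.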
These results highlight the practical value of data-driven modeling for optimizing surfactant-enhanced remediation. By accurately predicting removal efficiency, such models can help identify suitable surfactant types and operational parameters, supporting cost-effective and sustainable remediation strategies. Machine learning tools of this kind can substantially reduce the need for extensive trial-and-error experiments and make the cleanup of contaminated soil sites more efficient.

Country Kazakhstan

Authors

Ehsan Hajibolouri (Department of Mechanics, Al-Farabi Kazakh National University, 050040, Almaty, Kazakhstan)
Bakbergen Bekbau (Department of Mechanics, Al-Farabi Kazakh National University, 050040, Almaty, Kazakhstan)
Sagyn Omirbekov (National Laboratory Astana, Nazarbayev University, 010000, Astana, Kazakhstan)
Dinara Turalina (Department of Mechanics, Al-Farabi Kazakh National University, 050040, Almaty, Kazakhstan)
Masoud Riazi (Department of Petroleum Engineering, Nazarbayev University, 010000, Astana, Kazakhstan)
