Description
Surfactant-enhanced remediation (SER) is an effective method for removing petroleum hydrocarbons from contaminated soils by increasing their solubilization and desorption. However, SER efficiency is governed by complex, nonlinear interactions among soil properties, contaminants, and surfactants that conventional empirical or mechanistic models do not fully capture. Advanced modeling approaches are therefore needed to improve remediation outcomes and reduce reliance on expensive trial-and-error experiments. This study evaluated three regression algorithms, namely light gradient boosting machine (LGBM), extra-trees regression (ETR), and k-nearest neighbors (KNN), for predicting oil removal efficiency from operational and environmental parameters.
The study used a comprehensive database of 2394 experimental records collected from approximately 50 SER studies. To improve data quality, preprocessing removed 503 outliers (21% of the raw data), leaving a cleaned dataset of 1891 records. Preprocessing also included screening for multicollinearity with a Spearman correlation heatmap, scaling the inputs, and excluding features that were redundant or had negligible predictive value, such as asphaltene fraction and sand content. The final feature set included variables such as surfactant concentration, hydrophilic-lipophilic balance (HLB), molecular weight, critical micelle concentration (CMC), silt and clay content, cation exchange capacity (CEC), soil pH, organic matter, agitation speed, washing time, temperature, and liquid-to-soil ratio. The cleaned database was split into 80% for training and 20% for testing, and hyperparameters were tuned with GridSearchCV.
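The split-and-tune workflow described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the study's actual code: the data are synthetic stand-ins, and the choice of `StandardScaler`, the parameter grid, the 3-fold cross-validation, and the random seeds are all assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned database (hypothetical values only).
rng = np.random.default_rng(0)
n_records, n_features = 200, 5
X = rng.uniform(0.0, 1.0, size=(n_records, n_features))
# Nonlinear target loosely mimicking a removal efficiency in percent.
y = 50.0 + 30.0 * X[:, 0] - 20.0 * X[:, 1] ** 2 + rng.normal(0.0, 1.0, n_records)

# 80% training / 20% testing split, as stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling sits inside the pipeline so that it is fitted on training folds only.
pipeline = Pipeline(
    [
        ("scale", StandardScaler()),  # assumed scaler; the abstract names none
        ("model", ExtraTreesRegressor(random_state=42)),
    ]
)

# Illustrative grid; the hyperparameters actually searched are not reported.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [None, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="r2")
search.fit(X_train, y_train)

# R² of the refitted best pipeline on the held-out 20%.
test_r2 = search.score(X_test, y_test)
print(search.best_params_, round(test_r2, 3))
```

Swapping `ExtraTreesRegressor` for `KNeighborsRegressor` or LightGBM's `LGBMRegressor`, with a matching grid, would give the analogous setup for the other two algorithms.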
All three algorithms demonstrated strong predictive capability, though the ensemble methods were more stable. In cross plots, KNN predictions showed greater scatter, whereas ETR and LGBM predictions aligned closely with the 1:1 line. ETR was the best-performing algorithm, outperforming both LGBM and KNN. Over the entire dataset, the ETR model achieved R² = 0.984, RMSE = 2.658, and MAE = 1.257.
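For reference, the three reported metrics follow their standard definitions, which can be sketched in plain Python as below. The function name `regression_metrics` and the sample values are hypothetical and chosen only for illustration; they are not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R², RMSE, MAE) for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical removal efficiencies, for illustration only.
observed = [62.0, 75.5, 81.0, 90.0]
predicted = [60.0, 77.5, 80.0, 91.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(r2, rmse, mae)
```

RMSE and MAE are expressed in the same units as the predicted quantity, and RMSE penalizes large errors more heavily than MAE, so RMSE is never smaller than MAE.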
These results highlight the practical value of data-driven modeling for optimizing surfactant-enhanced remediation. By accurately predicting removal efficiency, such models can help identify suitable surfactant types and operating parameters, supporting cost-effective and sustainable remediation strategies. Machine learning tools of this kind can reduce the need for extensive trial-and-error experiments and make the cleanup of contaminated soil sites more efficient.
| Country | Kazakhstan |
|---|---|








