Scholarly Article

Predictive Modeling of Iron Concentration in Groundwater Using Machine Learning Techniques: A Case Study in Part of Yenagoa, Bayelsa State

Akajiaku, Charles U., Agbabiaka, Comfort Oyindamola, Imoni, Okes, Chukwuemeka, Prince, Eteh, Desmond R., Amos, Meremu Dogiye

2025-09-09 · Journal of Computational Systems and Applications · Cultech Publishing Sdn. Bhd.


Abstract

This study aimed to model and predict iron concentrations in groundwater within Yenagoa, Bayelsa State, Nigeria, using machine learning (ML) techniques. It focused on evaluating spatial variability and identifying the most influential predictors to support groundwater quality management. A total of 50 groundwater samples were collected from spatially distributed boreholes across multiple towns in Yenagoa, and the geolocation and iron concentration of each sample were recorded. Two supervised ML models, Multiple Linear Regression (MLR) and Random Forest Regression (RFR), were implemented. One-hot encoding was applied to the categorical town data, and the models were evaluated using the coefficient of determination (R²), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). Feature importance was assessed to identify key predictors, and a geospatial heatmap was developed using Inverse Distance Weighting (IDW) to visualize spatial trends. The MLR model slightly outperformed the RFR model, achieving an R² of 0.92, an MAE of 0.13 mg/L, and an RMSE of 0.15 mg/L. Longitude and specific towns (notably Beta and Opolo) emerged as the dominant predictors, confirming the spatial clustering of high iron concentrations in the eastern part of the study area. Cross-validation confirmed the robustness of the models. The findings support the use of ML techniques for cost-effective water quality prediction and spatial monitoring. This study introduces a hybrid geo-categorical modeling approach that integrates both spatial coordinates and administrative town identifiers into ML frameworks. It also demonstrates the feasibility of lightweight, interpretable models such as MLR for real-time deployment in low-resource settings, offering a replicable solution for groundwater quality assessment in data-scarce regions. Future research should expand the dataset and explore additional hydrogeological variables to enhance model robustness.
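Three of the processing steps named in the abstract are one-hot encoding of town labels, the R², MAE, and RMSE evaluation metrics, and IDW interpolation. They can be illustrated with the minimal, dependency-free Python sketch below. It is not the authors' code. The function names `one_hot`, `regression_metrics`, and `idw` are hypothetical. The IDW power of 2 is an assumed, commonly used default rather than a value reported in the study. All numbers in the usage lines are made-up toy values, not measurements from Yenagoa.

```python
import math


def one_hot(towns):
    """Encode town labels as 0/1 indicator vectors, one column per distinct town."""
    categories = sorted(set(towns))
    vectors = [[1 if town == c else 0 for c in categories] for town in towns]
    return categories, vectors


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE) for paired observed and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(ss_res / n)
    return r2, mae, rmse


def idw(samples, x, y, power=2.0):
    """Inverse Distance Weighting estimate at (x, y).

    samples: iterable of (x_i, y_i, value_i). Euclidean distance is used, so
    coordinates are assumed to be planar (e.g. projected), not raw degrees.
    power=2.0 is an assumed default, not a value taken from the study.
    """
    num = 0.0
    den = 0.0
    for xi, yi, vi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return vi  # the query point coincides with a sample point
        w = 1.0 / d ** power  # nearer samples receive larger weights
        num += w * vi
        den += w
    return num / den


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    print(one_hot(["Opolo", "Beta", "Opolo"]))
    print(regression_metrics([0.5, 1.0, 1.5, 2.0], [0.6, 0.9, 1.6, 1.9]))
    print(idw([(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)], 1.0, 0.0))
```

When one-hot town indicators enter an MLR model that also has an intercept, one indicator column is usually dropped to avoid perfect collinearity (the "dummy-variable trap"). The abstract does not state how the authors handled this.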

Keywords

Iron concentration, Groundwater, Machine learning, Spatial analysis, Regression

Citation Details

Journal of Computational Systems and Applications, pp. 1-16