Improving Air Quality Prediction Models for Banting: A Performance Evaluation of Lasso, mRMR, and ReliefF

Bibliographic Details
Published in: INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS
Main Authors: Arafin, Siti Khadijah; Mazumdar, Suvodeep; Ibrahim, Nurain
Format: Article
Language: English
Published: SCIENCE & INFORMATION SAI ORGANIZATION LTD 2025
Online Access: https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001441763100001
Other Bibliographic Details
Summary: This study explores the effectiveness of various feature selection methods in forecasting next-day PM2.5 levels in Banting, Malaysia. Accurate prediction of PM2.5 concentrations is crucial for public health because it enables authorities to act in time to reduce exposure to harmful pollutants. The study compares three feature selection methods (Lasso, mRMR, and ReliefF) using a dataset of 43,824 data points collected from Banting air quality monitoring stations (CA22B). The dataset comprises ten variables: pollutant concentrations (O3, CO, NO2, SO2, PM10, and PM2.5) and meteorological parameters (temperature, humidity, wind direction, and wind speed). Lasso outperformed both mRMR and ReliefF on every reported performance metric: accuracy, sensitivity, precision, F1 score, and AUROC. Lasso also handled multicollinearity better, and it improved the interpretability of the model by retaining only the most important variables. This suggests that the effectiveness of a feature selection method depends strongly on the characteristics of the dataset, such as correlations among features. The top eight features selected by the Lasso method for predicting PM2.5 levels in Banting are relative humidity, PM2.5, wind direction, ambient temperature, PM10, NO2, wind speed, and O3. The findings contribute to the growing body of knowledge on air quality prediction models and highlight the importance of choosing an appropriate feature selection method to achieve the best model performance. Future research should apply the Lasso method in other geographical regions, including urban, suburban, and rural areas, to assess the generalizability of the results.
ISSN:2158-107X
2156-5570
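
Illustrative note: the summary credits Lasso with retaining only the most important variables. The sketch below shows how that kind of selection can arise, under stated assumptions. It is not the authors' pipeline: the data are synthetic, the feature names (`pm25_today`, `relative_humidity`, `noise_a`, `noise_b`) and the penalty value `lam` are hypothetical, and a plain linear-regression Lasso solved by coordinate descent stands in for whatever model and tuning the study used.

```python
import random

def soft_threshold(rho, lam):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def standardize(columns):
    """Scale each feature column to zero mean and unit (population) variance."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        scaled.append([(v - mean) / std for v in col])
    return scaled

def lasso_coordinate_descent(columns, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.

    Assumes standardized feature columns and a centred target.
    """
    n, p = len(y), len(columns)
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = columns[j]
            # For a standardized column, (x_j . x_j) / n == 1, so the
            # partial-residual correlation simplifies to the line below.
            rho = sum(c * r for c, r in zip(col, residual)) / n + w[j]
            new_w = soft_threshold(rho, lam)
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                w[j] = new_w
    return w

# Synthetic data (hypothetical): only the first two features drive the target.
random.seed(0)
n_samples = 200
names = ["pm25_today", "relative_humidity", "noise_a", "noise_b"]
raw = [[random.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in names]
target = [
    0.8 * raw[0][i] - 0.5 * raw[1][i] + random.gauss(0.0, 0.1)
    for i in range(n_samples)
]
mean_target = sum(target) / n_samples
y = [t - mean_target for t in target]

X = standardize(raw)
w = lasso_coordinate_descent(X, y, lam=0.1)

# Features whose coefficient was shrunk exactly to zero are dropped.
selected = [name for name, coef in zip(names, w) if coef != 0.0]
print(selected)
```

In this toy setting the L1 penalty drives the coefficients of the two uninformative columns exactly to zero, which is the mechanism behind Lasso's use as a feature selector. A real application would choose the penalty by cross-validation rather than fixing it by hand.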