Improving transformer failure classification on imbalanced DGA data using data-level techniques and machine learning
This study addresses the challenge of imbalanced dissolved gas analysis (DGA) data in transformer failure classification by assessing the impact of data-level balancing techniques on machine learning performance. Five data-level strategies – Random Under-Sampling (RUS), Edited Nearest Neighbors (ENN...
Published in: | Energy Reports |
---|---|
Main Author: | Azmi P.A.R.; Yusoff M.; Sallehud-din M.T.M. |
Format: | Article |
Language: | English |
Published: | Elsevier Ltd, 2025 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85211464729&doi=10.1016%2fj.egyr.2024.12.006&partnerID=40&md5=a2ce560b66c09b702e817d88325bd90e |
id | 2-s2.0-85211464729 |
---|---|
author | Azmi P.A.R.; Yusoff M.; Sallehud-din M.T.M. |
title | Improving transformer failure classification on imbalanced DGA data using data-level techniques and machine learning |
publishDate | 2025 |
container_title | Energy Reports |
container_volume | 13 |
doi_str_mv | 10.1016/j.egyr.2024.12.006 |
url | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85211464729&doi=10.1016%2fj.egyr.2024.12.006&partnerID=40&md5=a2ce560b66c09b702e817d88325bd90e |
description | This study addresses the challenge of imbalanced dissolved gas analysis (DGA) data in transformer failure classification by assessing the impact of data-level balancing techniques on machine learning performance. Five data-level strategies – Random Under-Sampling (RUS), Edited Nearest Neighbors (ENN), NearMiss (NM), Random Over-Sampling (ROS), and ADASYN – were applied to balance the dataset and improve classification outcomes. The dataset includes key gas concentrations (H2, CH4, C2H6, C2H4, and C2H2) and a target defect variable (act). Three machine learning algorithms – Support Vector Machine, Decision Tree, and Random Forest – were tested, with results showing that ENN combined with SVM achieved the highest classification performance: 88% accuracy, 89.89% precision, 88.00% recall, 86.64% F1-score, and a runtime of 0.21 s. This approach demonstrates the effectiveness of data-level techniques in improving transformer fault diagnosis, offering a robust path forward for enhancing electrical power system reliability. Future research should refine these techniques and explore their integration with optimized models to enhance the accuracy of the proposed technique. © 2024 The Author(s) |
publisher | Elsevier Ltd |
issn | 23524847 |
language | English |
format | Article |
record_format | scopus |
collection | Scopus |
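The record names Edited Nearest Neighbors (ENN) as the balancing step that performed best with SVM, but it gives no configuration details. As a rough, non-authoritative sketch, the plain-Python code below implements Wilson-style ENN editing: a sample from any class other than the smallest one is dropped unless a strict majority of its k nearest neighbours share its label. The function name `edited_nearest_neighbours`, the choice k = 3, Euclidean distance on unscaled features, and the toy points are all assumptions made for illustration. They are not the authors' settings or data.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def edited_nearest_neighbours(X, y, k=3):
    """Hypothetical helper: return (X_kept, y_kept) after ENN editing.

    A sample is kept only if a strict majority of its k nearest
    neighbours (excluding itself) share its label. Samples of the
    single smallest class are never removed, so editing only thins
    the larger classes. Assumed settings, not the paper's.
    """
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi == minority:
            keep.append(i)  # protect the smallest class from editing
            continue
        # Distances to every other sample, nearest first (ties broken by index).
        neighbours = sorted(
            (euclidean(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        agree = sum(1 for _, j in neighbours if y[j] == yi)
        if 2 * agree > k:  # strict majority of neighbours agree
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: two synthetic features, "N" (majority) and "F" (minority).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [10.5, 10.5]]
y = ["N", "N", "N", "N", "F", "F", "F", "N"]
X_bal, y_bal = edited_nearest_neighbours(X, y, k=3)
print(Counter(y), "->", Counter(y_bal))
```

On the toy data, the one "N" point at (10.5, 10.5), whose three nearest neighbours are all labelled "F", is removed, leaving 4 "N" and 3 "F" samples. In a typical pipeline the editing would be applied to the training split only, and a classifier such as an SVM would then be fitted on the edited data; that step is not shown. Libraries such as imbalanced-learn provide an `EditedNearestNeighbours` class whose default neighbour-selection rule may differ from the majority vote used here.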