Improving transformer failure classification on imbalanced DGA data using data-level techniques and machine learning

Bibliographic Details
Published in: Energy Reports
Volume: 13
Authors: Azmi P.A.R.; Yusoff M.; Sallehud-din M.T.M.
Format: Article
Language: English
Published: Elsevier Ltd, 2025
ISSN: 2352-4847
DOI: 10.1016/j.egyr.2024.12.006
Scopus EID: 2-s2.0-85211464729
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85211464729&doi=10.1016%2fj.egyr.2024.12.006&partnerID=40&md5=a2ce560b66c09b702e817d88325bd90e

Abstract
This study addresses the challenge of imbalanced dissolved gas analysis (DGA) data in transformer failure classification by assessing the impact of data-level balancing techniques on machine learning performance. Five data-level strategies – Random Under-Sampling (RUS), Edited Nearest Neighbors (ENN), NearMiss (NM), Random Over-Sampling (ROS), and ADASYN – were applied to balance the dataset and improve classification outcomes. The dataset includes key gas concentrations (H2, CH4, C2H6, C2H4, and C2H2) and a target defect variable (act). Three machine learning algorithms – Support Vector Machine, Decision Tree, and Random Forest – were tested, with results showing that ENN combined with SVM achieved the highest classification performance: 88% accuracy, 89.89% precision, 88.00% recall, 86.64% F1-score, and a runtime of 0.21 s. This approach demonstrates the effectiveness of data-level techniques in improving transformer fault diagnosis, offering a robust path forward for enhancing electrical power system reliability. Future research should refine these techniques and explore their integration with optimized models to enhance the accuracy of the proposed technique. © 2024 The Author(s)
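Random Under-Sampling (RUS) and Random Over-Sampling (ROS), two of the five strategies named in the abstract, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each row is a list of the five gas concentrations (H2, CH4, C2H6, C2H4, C2H2) and that labels come from the `act` column. The function names, the `seed` parameter, and the toy rows and class names (`normal`, `thermal`, `arc`) are hypothetical.

```python
import random
from collections import Counter, defaultdict


def group_by_class(X, y):
    """Group feature rows by their class label, preserving input order."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return groups


def random_under_sample(X, y, seed=0):
    """RUS (sketch): keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        kept = rng.sample(rows, n_min)  # sampling without replacement
        X_out.extend(kept)
        y_out.extend([label] * n_min)
    return X_out, y_out


def random_over_sample(X, y, seed=0):
    """ROS (sketch): duplicate random rows until each class matches the largest class."""
    rng = random.Random(seed)
    groups = group_by_class(X, y)
    n_max = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]  # with replacement
        X_out.extend(rows + extra)
        y_out.extend([label] * n_max)
    return X_out, y_out


if __name__ == "__main__":
    # Hypothetical toy data: 11 rows of [H2, CH4, C2H6, C2H4, C2H2], not from the study.
    X = [[float(i)] * 5 for i in range(11)]
    y = ["normal"] * 6 + ["thermal"] * 3 + ["arc"] * 2
    _, y_rus = random_under_sample(X, y)
    _, y_ros = random_over_sample(X, y)
    print(sorted(Counter(y_rus).items()))  # [('arc', 2), ('normal', 2), ('thermal', 2)]
    print(sorted(Counter(y_ros).items()))  # [('arc', 6), ('normal', 6), ('thermal', 6)]
```

RUS discards majority-class rows, while ROS repeats minority rows verbatim. In either case, resampling is normally applied to the training split only, so the test set keeps the original class distribution; the abstract does not describe how the study split its data.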
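According to the abstract, Edited Nearest Neighbors (ENN) combined with SVM gave the best results. The sketch below implements Wilson's editing rule: a row is removed unless a strict majority of its k nearest neighbours share its label. The abstract does not give the study's ENN settings. The choices here are assumptions: k = 3, Euclidean distance, leaving the smallest class unedited so the method acts as under-sampling, and all names and toy data. Because ENN is distance-based, it is sensitive to feature scale. DGA gas concentrations can differ by orders of magnitude, so scaling the features (for example, a log transform or standardisation) before editing is usually advisable; the abstract does not state what preprocessing was used.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature rows."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def edited_nearest_neighbours(X, y, k=3, protect_smallest_class=True):
    """ENN (sketch): drop a row unless a strict majority of its k nearest
    neighbours share its label. With protect_smallest_class=True the smallest
    class is never edited (an assumed setting, not taken from the study)."""
    counts = Counter(y)
    smallest = min(counts, key=counts.get)
    kept = []
    for i, (row, label) in enumerate(zip(X, y)):
        if protect_smallest_class and label == smallest:
            kept.append(i)
            continue
        # Brute-force k nearest neighbours, excluding the row itself.
        others = [j for j in range(len(X)) if j != i]
        neighbours = sorted(others, key=lambda j: euclidean(row, X[j]))[:k]
        agreeing = sum(1 for j in neighbours if y[j] == label)
        if 2 * agreeing > len(neighbours):  # strict majority agrees, so keep it
            kept.append(i)
    return [X[i] for i in kept], [y[i] for i in kept]


if __name__ == "__main__":
    # Hypothetical toy rows; the "normal" row at (10, 10) sits inside the "arc" cluster.
    X = [[0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [1, 1, 0, 0, 0],
         [10, 10, 0, 0, 0],
         [10, 11, 0, 0, 0], [11, 10, 0, 0, 0], [11, 11, 0, 0, 0]]
    y = ["normal"] * 5 + ["arc"] * 3
    X_enn, y_enn = edited_nearest_neighbours(X, y)
    print(len(X_enn), [10, 10, 0, 0, 0] in X_enn)  # 7 False
```

Unlike RUS, ENN does not force equal class sizes. It removes only rows whose neighbourhood disagrees with their label, which cleans class boundaries rather than fully balancing the data.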
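NearMiss (NM) exists in several published variants, and the abstract does not say which one the study used. As an illustration only, the sketch below implements NearMiss-1. In every larger class, it keeps the rows whose mean distance to their k nearest minority-class rows is smallest, until that class matches the minority size. The value k = 3, Euclidean distance, the function names, and the toy data are assumptions.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature rows."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def mean_distance_to_nearest(row, targets, k):
    """Average distance from row to its k nearest rows in targets."""
    nearest = sorted(euclidean(row, t) for t in targets)[:k]
    return sum(nearest) / len(nearest)


def near_miss_1(X, y, k=3):
    """NearMiss-1 (sketch): in each non-minority class keep the rows closest,
    on average, to their k nearest minority rows, down to the minority size."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    n_min = counts[minority]
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    X_out, y_out = [], []
    for label in counts:
        rows = [row for row, lab in zip(X, y) if lab == label]
        if label != minority:
            rows = sorted(rows, key=lambda r: mean_distance_to_nearest(r, minority_rows, k))[:n_min]
        X_out.extend(rows)
        y_out.extend([label] * len(rows))
    return X_out, y_out


if __name__ == "__main__":
    # Hypothetical toy rows (only the first feature varies).
    X = [[v, 0, 0, 0, 0] for v in (0, 1, 8, 9, 20, 10, 11)]
    y = ["normal"] * 5 + ["arc"] * 2
    X_nm, y_nm = near_miss_1(X, y)
    print([row[0] for row, label in zip(X_nm, y_nm) if label == "normal"])  # [9, 8]
```

NearMiss-1 deliberately keeps majority rows that lie near the minority class. This can sharpen the decision boundary but also makes the method sensitive to noisy rows in that region.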
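The abstract reports 88% accuracy alongside 88.00% recall, and an F1-score (86.64%) below both precision and recall. It does not state how per-class scores were averaged. However, both patterns are consistent with support-weighted averaging. Under that scheme, recall always equals accuracy: weighting each class recall TP_c / n_c by that class's share n_c / N sums to (sum of TP_c) / N. Also, a weighted mean of per-class F1 scores need not equal the harmonic mean of the weighted precision and recall. The sketch below computes these metrics from scratch for illustration. The function name, the convention of assigning precision 0.0 to a class that is never predicted, and the toy labels are assumptions.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 (sketch).

    A class that is never predicted gets precision 0.0 (assumed convention).
    """
    n = len(y_true)
    support = Counter(y_true)
    predicted = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per class
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = correct[label]
        p_c = tp / predicted[label] if predicted[label] else 0.0
        r_c = tp / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = count / n  # share of true labels belonging to this class
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": sum(correct.values()) / n,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    y_true = ["A", "A", "B", "B"]
    y_pred = ["A", "B", "B", "B"]
    print({name: round(value, 4) for name, value in weighted_scores(y_true, y_pred).items()})
    # {'accuracy': 0.75, 'precision': 0.8333, 'recall': 0.75, 'f1': 0.7333}
```

In this toy case, recall equals accuracy, and the weighted F1 (0.7333) falls below both the weighted precision and the weighted recall. This mirrors the shape of the reported figures without implying anything about the study's own data.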
