A class skew-insensitive ACO-based decision tree algorithm for imbalanced data sets

Bibliographic Details
Published in: Indonesian Journal of Electrical Engineering and Computer Science
Main Authors: Mohd Razali M.H.B.; Saian R.B.; Wah Y.B.; Ku-Mahamud K.R.
Format: Article
Language: English
Published: Institute of Advanced Engineering and Science 2021
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85092611049&doi=10.11591%2fijeecs.v21.i1.pp412-419&partnerID=40&md5=dcbaef833699b93e12d9d5b29673a763
id 2-s2.0-85092611049
author Mohd Razali M.H.B.; Saian R.B.; Wah Y.B.; Ku-Mahamud K.R.
title A class skew-insensitive ACO-based decision tree algorithm for imbalanced data sets
publishDate 2021
container_title Indonesian Journal of Electrical Engineering and Computer Science
container_volume 21
container_issue 1
doi_str_mv 10.11591/ijeecs.v21.i1.pp412-419
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85092611049&doi=10.11591%2fijeecs.v21.i1.pp412-419&partnerID=40&md5=dcbaef833699b93e12d9d5b29673a763
description Ant-tree-miner (ATM) has an advantage over conventional decision tree algorithms in terms of feature selection. However, real-world applications commonly involve the imbalanced class problem, where the classes have different importance. This condition impedes the entropy-based heuristic of the existing ATM algorithm from developing effective decision boundaries because of its bias towards the dominant class. Consequently, the induced decision trees are dominated by the majority class and lack predictive ability on the rare class. This study proposes an enhanced algorithm called Hellinger-Ant-tree-miner (HATM), which is inspired by the ant colony optimization (ACO) metaheuristic, for imbalanced learning using a decision tree classification algorithm. The proposed algorithm was compared to the existing algorithm, ATM, on nine (9) publicly available imbalanced data sets. A simulation study reveals the superiority of HATM when the sample size increases with skewed classes (Imbalanced Ratio < 50%). Experimental results demonstrate that the performance of the existing algorithm, measured by BACC, is improved by the class skew-insensitiveness of the Hellinger distance. The statistical significance test shows that HATM has a higher mean BACC score than ATM. © 2021 Institute of Advanced Engineering and Science. All rights reserved.
publisher Institute of Advanced Engineering and Science
issn 25024752
language English
format Article
accesstype All Open Access; Gold Open Access; Green Open Access
record_format scopus
collection Scopus
_version_ 1809677598277500928
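
The description above credits the improvement to the class skew-insensitiveness of the Hellinger distance and reports results in BACC. The record does not give the HATM heuristic itself, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a binary class problem, uses the commonly cited Hellinger split score (the distance between the positive-class and negative-class distributions over the branches of a candidate split), and assumes that BACC stands for balanced accuracy, the mean of the per-class recalls. The function names hellinger_split_score and balanced_accuracy are hypothetical.

```python
import math
from collections import Counter


def hellinger_split_score(values, labels, positive):
    """Hellinger distance between the positive-class and negative-class
    distributions over the branches of a candidate split (binary case).

    Assumption: this is the generic Hellinger split criterion, not
    necessarily the exact heuristic used inside HATM.
    """
    pos_total = sum(1 for y in labels if y == positive)
    neg_total = len(labels) - pos_total
    if pos_total == 0 or neg_total == 0:
        return 0.0  # only one class present, nothing to separate

    pos_counts = Counter(v for v, y in zip(values, labels) if y == positive)
    neg_counts = Counter(v for v, y in zip(values, labels) if y != positive)

    total = 0.0
    for branch in set(values):
        # Each count is normalised by its OWN class size, so the overall
        # class ratio never enters the score.
        p = pos_counts[branch] / pos_total
        q = neg_counts[branch] / neg_total
        total += (math.sqrt(p) - math.sqrt(q)) ** 2
    return math.sqrt(total)


def balanced_accuracy(y_true, y_pred, positive):
    """Mean of recall on the positive class and recall on the negative class.

    Assumption: "BACC" in the description refers to this measure.
    """
    pos_total = sum(1 for t in y_true if t == positive)
    neg_total = len(y_true) - pos_total
    if pos_total == 0 or neg_total == 0:
        raise ValueError("both classes must be present in y_true")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return 0.5 * (tp / pos_total + tn / neg_total)
```

Because each branch count is divided by the size of its own class, replicating the majority-class examples leaves the split score unchanged, which is the skew-insensitivity the description refers to. An entropy-based score, by contrast, depends on the class proportions inside each branch and so shifts as the majority class grows.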