Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification

Class imbalance is one of the most significant difficulties in modern machine learning. This is because of the inherent bias of standard classifiers toward favoring majority instances while often ignoring minority instances. Interpolation-based oversampling techniques are among the most popular solu...
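As a rough illustration of the interpolation-based oversampling the abstract refers to, the sketch below generates synthetic minority points on line segments between a minority sample and one of its nearest minority neighbours (generic SMOTE-style interpolation). It is not the CAPAIO algorithm: it does no clustering, no weighting, and no category-specific handling. The function name `interpolate_minority` and its parameters are hypothetical and chosen only for this example.

```python
import math
import random


def interpolate_minority(minority, n_new, k=3, seed=0):
    """Generic SMOTE-style sketch (an assumption, not the paper's method).

    Each synthetic point is base + lam * (neighbour - base), lam in [0, 1).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        lam = rng.random()
        # Linear interpolation between the base point and its neighbour.
        synthetic.append([b + lam * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
X_min = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = interpolate_minority(X_min, n_new=6)
print(len(synthetic))  # 6
```

Because every synthetic point lies between two existing minority points, it can still land in a region occupied by majority samples, which is the overlap risk the abstract describes.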


Bibliographic Details
Published in: JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
Main Authors: Wang, Yujiang; Rosli, Marshima Mohd; Musa, Norzilah; Wang, Lei
Format: Article
Language: English
Published: SPRINGERNATURE 2024
Subjects: Computer Science
Online Access: https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001410486900001
author Wang, Yujiang; Rosli, Marshima Mohd; Musa, Norzilah; Wang, Lei
spellingShingle Wang, Yujiang; Rosli, Marshima Mohd; Musa, Norzilah; Wang, Lei
Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification
Computer Science
author_facet Wang, Yujiang; Rosli, Marshima Mohd; Musa, Norzilah; Wang, Lei
author_sort Wang
spelling Wang, Yujiang; Rosli, Marshima Mohd; Musa, Norzilah; Wang, Lei
Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification
JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
English
Article
Class imbalance is one of the most significant difficulties in modern machine learning. This is because of the inherent bias of standard classifiers toward favoring majority instances while often ignoring minority instances. Interpolation-based oversampling techniques are among the most popular solutions for generating synthetic minority samples to correct imbalanced class distributions. However, synthetic minority samples have a risk of overlapping with the majority-class samples. Inappropriate interpolation of minority samples during oversampling can also result in overgeneralization. To overcome these drawbacks, we propose a Clustering-based and Adaptive Position-aware Interpolation Oversampling algorithm (CAPAIO) for imbalanced binary dataset classification. CAPAIO initially employs an improved density-based clustering algorithm to group minority instances into inland, borderline, and trapped samples. It then adaptively determines the size of each subcluster and allocates weights to minority samples, guiding the synthesis of minority samples based on these weights. Finally, distinct interpolation oversampling algorithms are individually performed on these three categories of minority samples. The experimental results demonstrate the effectiveness of the proposed CAPAIO in most datasets compared with eleven other oversampling algorithms.
SPRINGERNATURE
1319-1578
2213-1248
2024
36
10
10.1016/j.jksuci.2024.102253
Computer Science
gold
WOS:001410486900001
https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001410486900001
title Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification
title_short Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification
title_full Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification
title_fullStr Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification
title_full_unstemmed Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification
title_sort Improving clustering-based and adaptive position-aware interpolation oversampling for imbalanced data classification
container_title JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES
language English
format Article
description Class imbalance is one of the most significant difficulties in modern machine learning. This is because of the inherent bias of standard classifiers toward favoring majority instances while often ignoring minority instances. Interpolation-based oversampling techniques are among the most popular solutions for generating synthetic minority samples to correct imbalanced class distributions. However, synthetic minority samples have a risk of overlapping with the majority-class samples. Inappropriate interpolation of minority samples during oversampling can also result in overgeneralization. To overcome these drawbacks, we propose a Clustering-based and Adaptive Position-aware Interpolation Oversampling algorithm (CAPAIO) for imbalanced binary dataset classification. CAPAIO initially employs an improved density-based clustering algorithm to group minority instances into inland, borderline, and trapped samples. It then adaptively determines the size of each subcluster and allocates weights to minority samples, guiding the synthesis of minority samples based on these weights. Finally, distinct interpolation oversampling algorithms are individually performed on these three categories of minority samples. The experimental results demonstrate the effectiveness of the proposed CAPAIO in most datasets compared with eleven other oversampling algorithms.
publisher SPRINGERNATURE
issn 1319-1578; 2213-1248
publishDate 2024
container_volume 36
container_issue 10
doi_str_mv 10.1016/j.jksuci.2024.102253
topic Computer Science
topic_facet Computer Science
accesstype gold
id WOS:001410486900001
url https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001410486900001
record_format wos
collection Web of Science (WoS)
_version_ 1825722598675185664