Cost-sensitive feature selection for multi-label classification: multi-criteria decision-making approach

Mohanrasu, S.S.; Phan, Le Thi; Rajan, Rakkiyappan; Manavalan, Balachandran

doi:10.1108/ACI-09-2024-0353

Purpose

In multi-label classification, selecting the most relevant features is crucial for enhancing predictive performance and reducing computational complexity. Real-world scenarios often involve significant costs in data acquisition, including time, financial and computational resources. However, most existing feature selection methods overlook the associated costs.

Design/methodology/approach

Multicriteria decision-making (MCDM) has emerged as a powerful tool for addressing complex problems involving multiple, often conflicting criteria. This study proposes a novel cost-sensitive multi-label feature selection method that fuses feature importance with feature cost within an MCDM framework. The proposed method transforms a cost-sensitive multi-label feature selection problem into an MCDM problem by leveraging mutual information. Furthermore, the data were converted into Fermatean fuzzy sets, and the Fermatean fuzzy simple weighted sum product (WISP) method was employed to rank features based on their relevance to labels and associated costs.

Findings

Extensive experiments on ten benchmark datasets with five evaluation metrics showed that the proposed method selects relevant features while keeping costs low and outperforms existing methods on most datasets.

Originality/value

Unlike existing methods that integrate costs through penalties and select features via a greedy search, the proposed approach adopts an MCDM-based strategy for feature ranking. This method aims to achieve globally optimal outcomes by balancing the trade-offs between conflicting objectives, marking a significant advancement over existing techniques.

1. Introduction

In the big data and artificial intelligence (AI) era, feature selection has become an essential step toward reducing computational complexity, enhancing model performance, and improving interpretability. Although traditional machine-learning algorithms typically assume a single label for each instance, real-world scenarios are often more complex, with a single instance potentially associated with multiple labels. Consequently, multi-label feature selection has become a prominent research topic, and numerous methods have been developed to address these challenges [1]. Feature-selection methods are generally classified into three categories. Filter methods assess feature quality based on predefined criteria or statistical measures. In contrast, wrapper methods make use of the learning algorithm. Embedded methods, on the other hand, integrate feature selection into the learning process. Filter-based methods are particularly preferred for high-dimensional datasets, owing to their simplicity and computational efficiency [1]. In particular, information-theory-based measures have been extensively utilized in feature selection because of their ability to elucidate the relationships between features and labels.

Many existing feature selection methods assume that the data are readily available and free of charge. In practice, however, obtaining data often incurs significant costs in time, money, and computational resources [2]. For instance, a predictive model for disease risk may rely on features such as blood test results, imaging scans, and genetic information. Although genetic tests offer valuable insights, they are typically expensive and not routinely performed. By integrating cost considerations into feature selection, healthcare providers can prioritize affordable and frequently available tests, such as blood tests, while balancing predictive accuracy with budget constraints. The goal is to optimize model performance while minimizing data acquisition costs [3]. Although high-cost tests may be irreplaceable in high-risk applications, low-cost tests can serve as valuable preliminary screenings in resource-constrained environments. This approach ensures that the models are practical and effective, thereby enhancing patient care while managing healthcare expenses. In general, two main costs are associated with learning from data: test costs and misclassification costs. Test costs refer to the expenses incurred in obtaining data, whereas misclassification costs represent the costs of incorrect predictions made by the model. This study focuses on the feature cost, a subset of test costs that accounts for the expense of obtaining specific features of the data. The primary challenge in cost-sensitive feature selection is achieving an effective balance between feature cost and predictive accuracy. Selecting low-cost features can reduce the overall cost but may compromise the predictive performance of the model. In a multi-label scenario, this is further complicated by the need to simultaneously maximize feature relevance for multiple labels while minimizing feature costs. This creates two conflicting objectives: optimizing for one often detracts from the other.

Multicriteria decision-making (MCDM) has become a leading approach for tackling, in a structured way, complex problems that involve multiple, often conflicting criteria. These methods empower decision makers to consider a range of factors and derive comprehensive solutions. Classical MCDM methods like elimination and choice expressing reality (ELECTRE), technique for order of preference by similarity to ideal solution (TOPSIS), complex proportional assessment (COPRAS), Viekriterijumsko Kompromisno Rangiranje (VIKOR), weighted aggregated sum product assessment (WASPAS), and multiplicative multi-objective optimization by ratio analysis with full multiplicative form (MULTIMOORA) have been extensively used to address these challenges. More recently, to overcome the limitations of these traditional approaches, researchers have introduced new methods, such as evaluation based on distance from the average solution (EDAS), combinative distance-based assessment (CODAS), combined compromise solution (CoCoSo), and WISP [4]. In addition, to better manage the inherent uncertainty in decision-making processes, most of these methods have been adapted to operate within fuzzy environments. This adaptation enables the incorporation of imprecise or ambiguous information, resulting in more robust and realistic solutions in various fields.

Therefore, the application of an MCDM approach to address the cost-sensitive multi-label feature-selection problem is logical and promising. Unlike single-label classification, multi-label classification requires consideration that each feature may have a different level of contribution to the prediction of each label. A feature that is inexpensive and valuable for one label may be irrelevant to another, adding to the complexity of feature selection. Although considerable work has been conducted on cost-sensitive feature selection for single-label classification, multi-label scenarios remain relatively unexplored. To address this gap, we propose a novel methodology that fuses feature importance with feature cost within an MCDM framework. This fusion aims to optimize feature selection for multi-label classification tasks by balancing predictive performance with cost efficiency. The key contributions of this study are as follows:

(1) We introduce a novel approach that frames cost-sensitive multi-label feature selection as an MCDM problem by leveraging mutual information. Our method fuses feature importance across multiple labels with cost factors, thereby enabling a more comprehensive evaluation of features in multi-label scenarios.
(2) We convert the data to Fermatean fuzzy sets, which are powerful tools for handling uncertainty and imprecision in decision-making processes. We then employ the Fermatean fuzzy WISP method to rank features, considering both their relevance to the labels and their associated costs.
(3) Unlike existing methods that integrate cost through penalties and select features via a greedy search [5, 6], the proposed approach adopts an MCDM-based strategy for feature ranking. This method aims to achieve globally optimal outcomes by balancing the trade-offs between conflicting objectives.
(4) We conducted extensive experiments on ten benchmark datasets to evaluate the performance of the proposed method. The results demonstrate the effectiveness of our approach in selecting relevant features while minimizing costs, outperforming existing methods in terms of predictive accuracy and cost efficiency.

The structure of the paper is as follows: Section 2 reviews related work. Section 3 introduces the key notation and fundamental concepts. Section 4 details the proposed methodology for cost-sensitive multi-label feature selection. Section 5 outlines the experimental setup and reports the results obtained from experiments on ten benchmark datasets. Lastly, Section 6 concludes the paper and proposes directions for future research.

2. Related works

In this section, we review some of the key studies related to multi-label feature selection, cost-sensitive feature selection, and MCDM.

Multi-label feature selection has evolved significantly over the years, with various approaches addressing the complexities of real-world scenarios. Early approaches predominantly utilized problem-transformation techniques [1], which converted multi-label problems into multiple single-label problems to facilitate feature selection. However, these approaches often fail to identify intricate nonlinear relationships between features and labels. The field has subsequently witnessed a surge in information-theory-based feature selection methods. Early work on single-label feature selection [7, 8] was later extended and adapted to address the more complex challenges of multi-label feature selection [9]. One approach utilized both mutual and interaction information to guide feature selection [10], while a subsequent method introduced a scalable criterion to handle large label sets more efficiently by leveraging simple dependency functions [11]. Building on these developments, a multi-label feature selection algorithm incorporating two conditional mutual information terms [12] was proposed. Additionally, a framework for information-theoretic methods was developed, proposing a novel approach based on three lower-order information-theoretic terms [13]. However, despite these advancements, existing methods do not account for the costs associated with data in multi-label feature selection.

Several researchers have incorporated feature-associated costs into single-label feature selection. One framework introduced cost constraints into existing single-label feature selection algorithms [3], while another method employed simulated annealing to account for test costs during the feature selection process [14]. While significant progress has been made in cost-sensitive feature selection for single-label scenarios, comparatively less attention has been given to multi-label data. One approach transformed logical labels into label distributions and ranked features using neighborhood mutual information with cost considerations [5]. More recently, another method [6] utilized normalized conditional mutual information with a cost penalty to evaluate feature relevance, followed by a greedy sequential search to derive a cost-constrained feature subset. However, existing multi-label cost-sensitive feature selection methods often overlook the overall feature space structure, leading to suboptimal solutions. This gap in the literature highlights the need for more comprehensive and globally aware cost-sensitive feature selection methods for multi-label data.

Since the introduction of fuzzy sets by Zadeh [15], fuzzy-based MCDM methods have found widespread application in various fields, including supply chain, health and safety, energy, and waste management. This field has experienced significant advancements with the introduction of more sophisticated fuzzy set theories. Atanassov [16] generalized fuzzy sets to intuitionistic fuzzy sets, which were followed by Pythagorean [17] and Fermatean fuzzy sets [18], each offering increased flexibility in representing complex information. Fuzzy-based MCDM tools are particularly adept at handling conflicting criteria, which makes them potentially valuable for feature selection tasks. The recently developed WISP method [4] integrates the weighted sum and weighted product approaches. In a comparative study [19], WISP and other MCDM methods were evaluated on real-world decision-making problems. Since then, several extensions of WISP have been proposed. For instance, the WISP method has been adapted to fuzzy sets [20] and to grey sets for sustainable supplier selection [21]. It was later extended to rough sets [22] and intuitionistic fuzzy sets [23], and further expanded to interval-valued Pythagorean fuzzy sets [24]. Another extension applied it to interval-valued Fermatean fuzzy sets for telescopic forklift selection [25]. Despite these adaptations, the application of WISP to feature selection remains largely unexplored. Early attempts to bring MCDM into feature selection include one study [26] that used MCDM methods to evaluate feature selection techniques for text classification, and another that proposed an MCDM-based approach for feature selection in text classification, utilizing ridge regression to develop a decision matrix and the COPRAS method to rank the features [27].

3. Preliminaries

This section presents the essential notation and core concepts used throughout the rest of the study. The basic definitions are presented in the Supplementary File.

3.1 Multi-label learning

Consider $D$ as a multi-label dataset consisting of $N$ instances, $M$ features, and $L$ labels. Each instance $I_{i}$ is characterized by a feature vector and a label vector, expressed as $I_{i} = (X_{i}, L_{i})$ for $i = 1, \dots, N$. Here, the feature vector $X_{i}$ is $(x_{i 1}, \dots, x_{i M})$ and the label vector $L_{i}$ is $(l_{i 1}, \dots, l_{i L})$. The objective of multi-label classification is to learn a function that can predict the label vector $L_{k}$ for any given feature vector $X_{k}$.

3.2 Mutual information

Mutual information is a key concept in information theory that describes the amount of information shared by two random variables. It is rooted in entropy, a measure introduced by Shannon in his seminal work on information theory [28] to quantify the uncertainty of a random variable.
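As a minimal illustration, the sketch below estimates mutual information for two discrete variables through the standard identity I(X; Y) = H(X) + H(Y) - H(X, Y). It assumes discrete (or already discretized) values and a plug-in frequency estimate; the function names `entropy` and `mutual_information` are illustrative and are not taken from the paper, whose formal definitions are in the Supplementary File.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two equal-length sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


feature = [0, 0, 1, 1]
label_identical = [0, 0, 1, 1]    # fully determined by the feature
label_independent = [0, 1, 0, 1]  # carries no information about the feature

mi_identical = mutual_information(feature, label_identical)
mi_independent = mutual_information(feature, label_independent)
print(mi_identical, mi_independent)
```

A feature that fully determines a binary label shares one bit with it, whereas an independent feature shares none.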

3.3 Fermatean fuzzy WISP

Recently, an innovative MCDM approach, WISP, was introduced that combines the weighted sum and weighted product methods [4]. This method builds upon the foundations of established techniques such as MULTIMOORA, WASPAS, and CoCoSo, and offers a more comprehensive solution for complex decision-making scenarios. In this study, we extend WISP to Fermatean fuzzy sets (FFSs) and apply it to the cost-sensitive multi-label feature selection problem. The detailed steps of this method are provided in the Supplementary File.

4. Proposed multi-label feature selection method

This section describes the design of the proposed cost-sensitive multi-label feature selection method, which consists of three distinct stages.

(1) Feature Relevance Assessment: We first calculate the mutual information between each feature and each label. This calculation measures feature relevance and provides a foundation for subsequent decision-making.
(2) Decision Matrix Construction and Fuzzy Transformation: We construct a decision matrix by integrating the cost information with the previously determined feature relevance. In this matrix, features serve as alternatives, whereas the labels and the feature cost serve as criteria. To better manage the uncertainty and imprecision inherent in the decision-making process, we transform this decision matrix into Fermatean fuzzy sets. This transformation enhances the flexibility of the proposed approach.
(3) Feature Ranking: We employ the Fermatean fuzzy WISP method to rank features based on their relevance to the labels and their associated costs. This final stage synthesizes information from the previous steps to produce a ranked list of features. The Fermatean fuzzy model was chosen over other generalized fuzzy models due to its enhanced capacity to capture a wider range of uncertainties and offer a more refined representation of both membership and non-membership degrees.

The subsequent steps will elucidate each stage of this procedure in detail:

Step 1. Let $X_{: i}$, $i = 1, \dots, M$, denote the column-wise feature vectors and $L_{: k}$, $k = 1, \dots, L$, the column-wise label vectors. Let $C_{i}$ denote the cost of the $i$-th feature. We consider the features as alternatives and the labels, along with the cost, as criteria. The decision matrix is obtained by calculating the mutual information as follows:

D = (\begin{matrix} I (X_{: 1}, L_{: 1}) & \dots & I (X_{: 1}, L_{: L}) & C_{1} \\ I (X_{: 2}, L_{: 1}) & \dots & I (X_{: 2}, L_{: L}) & C_{2} \\ ⋮ & ⋱ & ⋮ & ⋮ \\ I (X_{: M}, L_{: 1}) & \dots & I (X_{: M}, L_{: L}) & C_{M} \end{matrix})

(1)
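The construction of the decision matrix in Eq. (1) can be sketched as follows, assuming discrete features, binary labels, and a plug-in mutual information estimate. The helper names (`build_decision_matrix`, `mutual_information`) and the toy data are hypothetical and serve only to show the layout: one row per feature, $L$ relevance columns, and a final cost column.

```python
from collections import Counter
from math import log2


def _entropy(values):
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    return _entropy(x) + _entropy(y) - _entropy(list(zip(x, y)))


def build_decision_matrix(X, Y, costs):
    """X: N rows of M discrete features; Y: N rows of L labels; costs: M values.

    Returns an M x (L + 1) matrix: row i holds I(X_{:i}, L_{:k}) for each
    label k, followed by the cost C_i of feature i.
    """
    feature_cols = [list(col) for col in zip(*X)]
    label_cols = [list(col) for col in zip(*Y)]
    return [
        [mutual_information(feat, lab) for lab in label_cols] + [costs[i]]
        for i, feat in enumerate(feature_cols)
    ]


# Toy data: 4 instances, 2 features, 2 labels (illustrative only).
X = [[0, 1], [0, 0], [1, 1], [1, 0]]
Y = [[0, 1], [0, 1], [1, 0], [1, 1]]
costs = [0.3, 0.8]
D = build_decision_matrix(X, Y, costs)
for row in D:
    print(row)
```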

Step 2. Next, this decision matrix is transformed into FFSs as follows [29]:

D_{FFS} = (\begin{matrix} {\tilde{S}}_{FFS} (μ_{11}, ν_{11}) & \dots & {\tilde{S}}_{FFS} (μ_{1 L}, ν_{1 L}) & {\tilde{S}}_{FFS} (μ_{1 (L + 1)}, ν_{1 (L + 1)}) \\ {\tilde{S}}_{FFS} (μ_{21}, ν_{21}) & \dots & {\tilde{S}}_{FFS} (μ_{2 L}, ν_{2 L}) & {\tilde{S}}_{FFS} (μ_{2 (L + 1)}, ν_{2 (L + 1)}) \\ ⋮ & ⋱ & ⋮ & ⋮ \\ {\tilde{S}}_{FFS} (μ_{M 1}, ν_{M 1}) & \dots & {\tilde{S}}_{FFS} (μ_{M L}, ν_{M L}) & {\tilde{S}}_{FFS} (μ_{M (L + 1)}, ν_{M (L + 1)}) \end{matrix})

(2)

where

μ_{i j} = 1 - \frac{1 - {\overset{´}{μ}}_{i j}}{1 + (e^{λ} - 1) {\overset{´}{μ}}_{i j}}, ν_{i j} = \frac{1 - μ_{i j}}{1 + (e^{λ} - 1) μ_{i j}}, λ > 0 .

(3)

with

{\overset{´}{μ}}_{i j} = \frac{D_{i j} - \min_{j} D_{i j}}{\max_{j} D_{i j} - \min_{j} D_{i j}} .

(4)

Furthermore, the hesitation degree $π_{i j}$ is computed as $π_{i j} = \sqrt[3]{1 - μ_{i j}^{3} - ν_{i j}^{3}}$ for all $i = 1, \dots, M$ and $j = 1, \dots, L, L + 1$. Thus, the problem of cost-sensitive multi-label feature selection is transformed into an MCDM problem. Here, the criteria with respect to the labels are the maximization criteria, whereas the cost is the minimization criterion. The objective is to maximize the relevance of the feature with respect to the labels and minimize the cost associated with the feature.
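The transformation of Eqs. (3) and (4) and the hesitation degree can be sketched as below. Two points are assumptions of this sketch rather than statements of the paper: the min-max normalization is applied to each criterion (column) across all features, which is one reading of the index in Eq. (4), and a constant column is mapped to zero to avoid division by zero. The value `lam=1.0` is an arbitrary choice satisfying $λ > 0$.

```python
from math import exp


def min_max_columns(D):
    """Min-max normalize each criterion column (assumed reading of Eq. (4))."""
    cols = list(zip(*D))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
         for j, v in enumerate(row)]
        for row in D
    ]


def fermatean(mu_raw, lam=1.0):
    """Map a normalized value in [0, 1] to (mu, nu, pi) following Eq. (3)."""
    k = exp(lam) - 1.0
    mu = 1.0 - (1.0 - mu_raw) / (1.0 + k * mu_raw)
    nu = (1.0 - mu) / (1.0 + k * mu)
    # Hesitation degree; max() guards against tiny negative rounding errors.
    pi = max(0.0, 1.0 - mu ** 3 - nu ** 3) ** (1.0 / 3.0)
    return mu, nu, pi


# Toy decision matrix: 3 features, 2 label columns, 1 cost column.
D = [[0.9, 0.1, 0.2], [0.4, 0.6, 0.7], [0.1, 0.3, 0.5]]
fuzzy = [[fermatean(v) for v in row] for row in min_max_columns(D)]
for row in fuzzy:
    print([tuple(round(t, 4) for t in cell) for cell in row])
```

Because $e^{λ} - 1 > 0$, the non-membership degree never exceeds $1 - μ_{i j}$, so every generated pair satisfies the Fermatean constraint $μ_{i j}^{3} + ν_{i j}^{3} \le 1$.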

Step 3. The sum and product of each alternative for the maximization and minimization criteria were calculated as follows:

S_{i}^{\max} = 〈 \sqrt[3]{1 - \prod_{j \in Ω_{\max}} {(1 - μ_{i j}^{3})}^{w_{j}}}, \prod_{j \in Ω_{\max}} {(ν_{i j})}^{w_{j}} 〉

(5)

S_{i}^{\min} = 〈 \sqrt[3]{1 - \prod_{j \in Ω_{\min}} {(1 - μ_{i j}^{3})}^{w_{j}}}, \prod_{j \in Ω_{\min}} {(ν_{i j})}^{w_{j}} 〉

(6)

P_{i}^{\max} = 〈 \prod_{j \in Ω_{\max}} {(μ_{i j})}^{w_{j}}, \sqrt[3]{1 - \prod_{j \in Ω_{\max}} {(1 - ν_{i j}^{3})}^{w_{j}}} 〉

(7)

P_{i}^{\min} = 〈 \prod_{j \in Ω_{\min}} {(μ_{i j})}^{w_{j}}, \sqrt[3]{1 - \prod_{j \in Ω_{\min}} {(1 - ν_{i j}^{3})}^{w_{j}}} 〉

(8)

where $w_{j}$ is the weight corresponding to the criterion $j$, $w_{j} \in [0,1]$, and $\sum_{j = 1}^{L} w_{j} = 1$. For the criteria with respect to the labels, the information entropy values were considered as the weights, whereas the importance of the cost was considered as the weight for the cost criterion.
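The aggregation operators of Eqs. (5)-(8) can be sketched for a single feature as follows. The function names are illustrative; each function takes the $(μ, ν)$ pairs of one criteria group (the label criteria $Ω_{\max}$ or the cost criterion $Ω_{\min}$) together with the matching weights, which are chosen arbitrarily here.

```python
from math import prod


def weighted_sum(pairs, weights):
    """Fermatean fuzzy weighted sum of (mu, nu) pairs, as in Eqs. (5)-(6)."""
    inner = prod((1.0 - m ** 3) ** w for (m, _), w in zip(pairs, weights))
    mu = max(0.0, 1.0 - inner) ** (1.0 / 3.0)
    nu = prod(n ** w for (_, n), w in zip(pairs, weights))
    return mu, nu


def weighted_product(pairs, weights):
    """Fermatean fuzzy weighted product of (mu, nu) pairs, as in Eqs. (7)-(8)."""
    mu = prod(m ** w for (m, _), w in zip(pairs, weights))
    inner = prod((1.0 - n ** 3) ** w for (_, n), w in zip(pairs, weights))
    nu = max(0.0, 1.0 - inner) ** (1.0 / 3.0)
    return mu, nu


# One feature evaluated against two label criteria (illustrative values).
label_pairs = [(0.8, 0.3), (0.6, 0.5)]
label_weights = [0.5, 0.5]
s_max = weighted_sum(label_pairs, label_weights)
p_max = weighted_product(label_pairs, label_weights)
print(s_max, p_max)
```

The sum operator is the more optimistic of the two: for the same inputs it yields a membership degree at least as large, and a non-membership degree at most as large, as the product operator.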

Step 4. The Fermatean fuzzy values were then converted to crisp values using the score function as follows:

{\tilde{S}}_{i}^{\max} = s (S_{i}^{\max})

(9)

{\tilde{S}}_{i}^{\min} = s (S_{i}^{\min})

(10)

{\tilde{P}}_{i}^{\max} = s (P_{i}^{\max})

(11)

{\tilde{P}}_{i}^{\min} = s (P_{i}^{\min}) .

(12)

Step 5. The utility measures are calculated in the following manner:

u_{i}^{s d} = {\tilde{S}}_{i}^{\max} - {\tilde{S}}_{i}^{\min}

(13)

u_{i}^{p d} = {\tilde{P}}_{i}^{\max} - {\tilde{P}}_{i}^{\min}

(14)

u_{i}^{s r} = \frac{{\tilde{S}}_{i}^{\max}}{{\tilde{S}}_{i}^{\min}}

(15)

u_{i}^{p r} = \frac{{\tilde{P}}_{i}^{\max}}{{\tilde{P}}_{i}^{\min}} .

(16)

Step 6. The utility measures are recalculated as follows:

{\tilde{u}}_{i}^{s d} = \frac{1 + u_{i}^{s d}}{1 + \max_{i} u_{i}^{s d}}

(17)

{\tilde{u}}_{i}^{p d} = \frac{1 + u_{i}^{p d}}{1 + \max_{i} u_{i}^{p d}}

(18)

{\tilde{u}}_{i}^{s r} = \frac{1 + u_{i}^{s r}}{1 + \max_{i} u_{i}^{s r}}

(19)

{\tilde{u}}_{i}^{p r} = \frac{1 + u_{i}^{p r}}{1 + \max_{i} u_{i}^{p r}} .

(20)

Step 7. The final utility score $u_{i}$ is obtained as follows:

u_{i} = \frac{{\tilde{u}}_{i}^{s d} + {\tilde{u}}_{i}^{p d} + {\tilde{u}}_{i}^{s r} + {\tilde{u}}_{i}^{p r}}{4}

(21)

Step 8. The overall ranking of the features was determined by sorting the alternatives based on their utility scores. The feature with the highest utility score was considered the best.

The pseudo code of this proposed method is provided in the Supplementary File.
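Steps 4 to 8 can be sketched as below. The score function is not defined in this section, so the sketch assumes the positive Fermatean score $s = (1 + μ^{3} - ν^{3}) / 2$, which keeps the ratios in Eqs. (15) and (16) non-negative; this choice, the small `eps` guard against division by zero, and all names are assumptions of the example and may differ from the authors' implementation in the Supplementary File.

```python
def score(pair):
    """Assumed positive score in [0, 1]: (1 + mu^3 - nu^3) / 2."""
    mu, nu = pair
    return (1.0 + mu ** 3 - nu ** 3) / 2.0


def wisp_rank(s_max, s_min, p_max, p_min, eps=1e-12):
    """Each argument is a list of (mu, nu) pairs, one per feature."""
    S_max = [score(p) for p in s_max]          # Eq. (9)
    S_min = [score(p) for p in s_min]          # Eq. (10)
    P_max = [score(p) for p in p_max]          # Eq. (11)
    P_min = [score(p) for p in p_min]          # Eq. (12)

    u_sd = [a - b for a, b in zip(S_max, S_min)]            # Eq. (13)
    u_pd = [a - b for a, b in zip(P_max, P_min)]            # Eq. (14)
    u_sr = [a / max(b, eps) for a, b in zip(S_max, S_min)]  # Eq. (15)
    u_pr = [a / max(b, eps) for a, b in zip(P_max, P_min)]  # Eq. (16)

    def rescale(u):                                         # Eqs. (17)-(20)
        top = max(u)
        return [(1.0 + v) / (1.0 + top) for v in u]

    parts = [rescale(u) for u in (u_sd, u_pd, u_sr, u_pr)]
    utility = [sum(vals) / 4.0 for vals in zip(*parts)]     # Eq. (21)
    order = sorted(range(len(utility)), key=lambda i: utility[i], reverse=True)
    return utility, order


# Feature 0: relevant and cheap; feature 1: weakly relevant and expensive.
relevance = [(0.9, 0.1), (0.3, 0.6)]
cost = [(0.2, 0.7), (0.8, 0.1)]
utility, order = wisp_rank(relevance, cost, relevance, cost)
print(utility, order)
```

In this toy case, the relevant and inexpensive feature attains the maximum of all four utility measures and is therefore ranked first.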

5. Experimental results

In this section, we demonstrate the effectiveness of the proposed method through a comprehensive set of experiments that utilize various evaluation metrics across multiple datasets.

5.1 Datasets

In the experiment, we utilize ten benchmark multi-label datasets collected from the MULAN [¹] and UCO [²] libraries. The characteristics of each dataset are listed in Table 1. Here, “label cardinality” denotes the mean number of labels assigned to each instance, while “label density” represents the normalized label cardinality [30]. Because of the absence of inherent cost information in the datasets, we adopted a common approach from the existing literature [3] to simulate the cost factors. Specifically, we assigned each feature a random value between 0 and 1 to represent its associated cost.
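The dataset statistics and the cost simulation described above can be sketched as follows, assuming a binary label matrix and uniform sampling on [0, 1). The fixed seed and the function names are illustrative choices, not details given in the paper.

```python
import random


def label_cardinality(Y):
    """Mean number of labels assigned to each instance."""
    return sum(sum(row) for row in Y) / len(Y)


def label_density(Y):
    """Label cardinality normalized by the number of labels."""
    return label_cardinality(Y) / len(Y[0])


def simulate_costs(n_features, seed=0):
    """Assign each feature a random cost drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_features)]


# Toy label matrix: 4 instances, 3 labels (illustrative only).
Y = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
card = label_cardinality(Y)
dens = label_density(Y)
costs = simulate_costs(5)
print(card, dens, costs)
```

The same relation can be checked against Table 1: for the Arts dataset, 1.6360 / 26 ≈ 0.0629.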

Table 1

Summary of the ten benchmark datasets

Datasets	Instances	Features	Labels	Label cardinality	Label density
Arts	5,000	462	26	1.6360	0.0629
Education	5,000	550	33	1.4606	0.0443
Entertainment	5,000	640	21	1.4204	0.0676
Genbase	662	1,186	27	1.2520	0.0460
GpositivePseAAC	519	440	4	1.0080	0.2520
Health	5,000	612	32	1.6622	0.0519
Recreation	5,000	606	22	1.4232	0.0647
Reference	5,000	793	33	1.1690	0.0350
Social	5,000	1,047	39	1.2834	0.0329
Yeast	2,417	103	14	4.2370	0.3030

Source(s): Authors’ own creation

5.2 Experiment settings

To evaluate the performance of the algorithms, we employed three example-based metrics (ranking loss, coverage, and average precision) and two label-based metrics (Macro-F1 and Micro-F1) [1]. Lower scores of ranking loss and coverage indicate better performance (denoted by $↓$), while greater scores of average precision, Macro-F1, and Micro-F1 signify superior performance (denoted by $↑$). The experiments were conducted using an ML-kNN classifier [31] with k = 10 and a smoothing parameter of 1. To ensure the reliability of the results, we employed a fivefold cross-validation approach.

We compared the proposed approach against two existing cost-sensitive multi-label feature selection techniques: cost-sensitive feature selection on multi-label data (CFSM) [5] and cost-constrained feature selection with a mutual information approach (CFSMIA) [6]. Furthermore, it was compared to seven conventional multi-label feature selection algorithms: D2F [10], IGMF [32], LRDG [33], ML-COPRAS [27], MLACO [34], SCLS [11], and STFS [13]. These methods were chosen based on their relevance to the proposed approach, despite the existence of numerous alternatives.

5.3 Results and discussion

We compared the proposed method with two existing cost-sensitive multi-label feature selection methods and seven conventional multi-label feature selection techniques. Each algorithm was evaluated under various cost constraints. First, each algorithm ranked the features; each metric was then computed under progressively varied limits on the total feature cost. The metric values across different cost constraints are presented in Figure 1, where the $x$-axis represents the cost values and the $y$-axis indicates the corresponding metric values at each cost level. The averages of these values are compiled in Table 2 to provide a clear quantitative summary; the best values are highlighted in bold. We also conducted a win-tie-loss analysis to compare the proposed method against each benchmark method. The last row of the table includes the average metric values for each comparison method and their average rankings. The results for the other four metrics are provided in a separate Supplementary File.
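One plausible reading of evaluation under a cost constraint is sketched below: features are taken in ranked order until adding the next one would exceed the budget. Stopping at the first feature that does not fit (rather than skipping it and continuing) is an assumption of this sketch, as are the function name and the toy values; the paper does not spell out this detail here.

```python
def select_within_budget(ranking, costs, budget):
    """Take features in ranked order while the cumulative cost fits the budget."""
    selected, total = [], 0.0
    for i in ranking:
        if total + costs[i] > budget:
            break  # assumed policy: stop at the first feature that does not fit
        selected.append(i)
        total += costs[i]
    return selected, total


ranking = [2, 0, 1, 3]          # feature indices, best first
costs = [0.5, 0.2, 0.4, 0.9]    # cost of each feature, indexed by feature
selected, total = select_within_budget(ranking, costs, budget=1.0)
print(selected, total)
```

Repeating this selection for a sequence of increasing budgets and scoring a classifier on each selected subset gives one metric value per cost level, which is the kind of curve plotted in Figure 1.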

Figure 1

Performance comparison of algorithms based on average precision $(↑)$

Table 2

Average performance comparison of algorithms based on average precision $(↑)$

Algorithms	D2F	IGMF	LRDG	ML-COPRAS	MLACO	SCLS	STFS	CFSM	CFSMIA	PROPOSED
Arts	0.4988 (4)	0.4627 (8)	0.4429 (10)	0.4699 (7)	0.4941 (5)	0.5049 (2)	0.5042 (3)	0.4462 (9)	0.4788 (6)	0.5100 (1)
Education	0.5397 (4)	0.5015 (7)	0.4793 (10)	0.5115 (5)	0.5068 (6)	0.5431 (3)	0.5462 (1)	0.4926 (9)	0.4951 (8)	0.5450 (2)
Entertainment	0.5484 (6)	0.5236 (8)	0.4972 (9)	0.5576 (3)	0.5516 (4)	0.5659 (2)	0.5431 (7)	0.4911 (10)	0.5492 (5)	0.5683 (1)
Genbase	0.9274 (4)	0.9158 (6)	0.4996 (10)	0.9426 (1.5)	0.5336 (8.5)	0.9415 (3)	0.9178 (5)	0.5336 (8.5)	0.6406 (7)	0.9426 (1.5)
GpositivePseAAC	0.8139 (3)	0.7229 (8)	0.7226 (9)	0.6937 (10)	0.7677 (7)	0.8107 (4)	0.8148 (2)	0.8002 (6)	0.8023 (5)	0.8232 (1)
Health	0.6945 (3)	0.6316 (7.5)	0.6167 (10)	0.6743 (5)	0.6677 (6)	0.6924 (4)	0.6976 (2)	0.6297 (9)	0.6316 (7.5)	0.7055 (1)
Recreation	0.5049 (4)	0.3958 (10)	0.4015 (8)	0.4706 (6)	0.4794 (5)	0.5135 (3)	0.5146 (2)	0.4008 (9)	0.4319 (7)	0.5162 (1)
Reference	0.6074 (5)	0.5833 (7)	0.5725 (10)	0.5888 (6)	0.6082 (4)	0.6112 (3)	0.6143 (2)	0.5806 (8)	0.5773 (9)	0.6216 (1)
Social	0.7086 (3)	0.6543 (8)	0.6041 (10)	0.6834 (6)	0.6352 (9)	0.7043 (5)	0.7114 (2)	0.7045 (4)	0.6561 (7)	0.7182 (1)
Yeast	0.7482 (3)	0.7283 (9)	0.7366 (7)	0.7358 (8)	0.7451 (6)	0.7455 (5)	0.7471 (4)	0.7188 (10)	0.7518 (2)	0.7526 (1)
Win/Tie/Loss	10/0/0	10/0/0	10/0/0	9/1/0	10/0/0	10/0/0	9/0/1	10/0/0	10/0/0
Average	0.6592 (3.9)	0.6120 (7.85)	0.5573 (9.3)	0.6328 (5.75)	0.5989 (6.05)	0.6633 (3.4)	0.6611 (3.0)	0.5798 (8.25)	0.6015 (6.35)	0.6703 (1.15)

The best values (first rank) are highlighted in bold

Source(s): Authors’ own creation

With respect to average precision, as illustrated in Figure 1, the proposed method outperformed the comparison methods across most cost levels. While some algorithms show non-monotonic trends, with average precision fluctuating as the cost increases, the proposed method exhibits a more stable performance that generally improves with the cost. This trend reflects the ability of the algorithm to rank features by considering both their relevance to the labels and their associated costs. The proposed method also achieved superior performance for the other four metrics evaluated. Notably, it performs well across all cost constraints on the Social dataset. In contrast, most algorithms encountered challenges on the Genbase and Yeast datasets, likely because of the large relative difference between the number of features and the number of instances in these datasets. Despite this, the proposed method continued to outperform the other algorithms, demonstrating its robustness across diverse datasets.

Based on the average values presented in Table 2, the proposed method achieves the best average precision on every dataset except Education, where it ranks second behind STFS, and it shares the top position with ML-COPRAS on the Genbase dataset. In terms of ranking loss and coverage, the proposed method delivered the best results on nine datasets, the exception being the Recreation dataset. For Macro-F1, it attained the highest scores on all datasets, whereas for Micro-F1 it did so on seven datasets. Overall, these results highlight the effectiveness and adaptability of the proposed method for cost-sensitive multi-label feature selection.

5.4 Statistical analysis

In this section, we employ two nonparametric tests, the Friedman test [35] and the Bonferroni-Dunn test [36], to assess the statistical significance of the performance differences between the proposed method and the comparison methods. Table 3 lists the Friedman test results for the various evaluation metrics, along with the critical value. The null hypothesis of equal performance is rejected for every metric because each $F_{F}$ value exceeds the critical value. Subsequently, we utilized the Bonferroni-Dunn post hoc test to analyze the relative performance of the comparison methods. The performance of two algorithms differs significantly if their average ranks differ by at least one critical difference (CD). Figure 2 presents the CD diagrams for the various evaluation metrics. Although the performance of the proposed method is statistically comparable to that of some methods, it is significantly better than that of the remaining algorithms.
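As a check on how the $F_{F}$ statistic relates to the average ranks, the sketch below computes the Friedman chi-square statistic and its F-distributed variant from the average ranks reported in the last row of Table 2 (ten algorithms, ten datasets). The function name is illustrative; the formulas are the standard ones for the Friedman test, and their use here is an assumption about how the values in Table 3 were obtained.

```python
def friedman_ff(avg_ranks, n_datasets):
    """Return (chi-square, F_F) for the Friedman test from average ranks."""
    k = len(avg_ranks)
    chi2 = (12.0 * n_datasets / (k * (k + 1))) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4.0
    )
    ff = (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)
    return chi2, ff


# Average ranks for average precision, copied from the last row of Table 2
# (D2F, IGMF, LRDG, ML-COPRAS, MLACO, SCLS, STFS, CFSM, CFSMIA, PROPOSED).
avg_ranks = [3.9, 7.85, 9.3, 5.75, 6.05, 3.4, 3.0, 8.25, 6.35, 1.15]
chi2, ff = friedman_ff(avg_ranks, n_datasets=10)
print(round(chi2, 4), round(ff, 4))
```

The resulting $F_{F}$ is approximately 25.15, in agreement with the average precision entry of Table 3, and well above the listed critical value of 1.9976.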

Table 3

Friedman test results across different evaluation metrics

Evaluation metric	$F_{F}$	Critical value $(α = 0.05)$
Ranking loss	30.0071	1.9976
Coverage	32.0108
Average precision	25.1458
Macro-F1	32.1243
Micro-F1	21.7772

Source(s): Authors’ own creation

Figure 2

Bonferroni-Dunn test results across different evaluation metrics

5.5 Stability analysis

To intuitively evaluate the stability of the proposed method across different datasets, we conducted a stability analysis using radar charts. Performance scores were normalized to the range $[0.1, 0.5]$. A more regular polygon formed by the connecting lines indicates greater stability of the corresponding algorithm across different datasets. Figure 3 illustrates the stability analysis results for the ten benchmark datasets and five evaluation metrics. The results indicate that the proposed method is more stable than the other methods.

Figure 3

Stability analysis across different evaluation metrics

6. Conclusion

In this study, a novel approach to cost-sensitive multi-label feature selection was proposed that effectively integrates feature importance with feature costs within an MCDM framework. The method begins by constructing a decision matrix utilizing mutual information along with cost values, which are then transformed into Fermatean fuzzy numbers. Subsequently, the Fermatean fuzzy WISP method is employed to rank the features. Unlike existing approaches that rely on the penalty-based integration of costs and employ greedy search strategies, the proposed MCDM-based technique offers a more comprehensive and flexible solution for balancing feature importance and costs. This strategic advantage was demonstrated through extensive experiments conducted on ten benchmark datasets, where the proposed method consistently outperformed state-of-the-art feature-selection algorithms across five evaluation metrics. The ability to optimize feature selection while accounting for costs represents a significant advancement, particularly in real-world applications, where data acquisition can incur substantial time, financial, and computational expenses. Future research will focus on extending the proposed method to handle missing values in datasets, a challenge frequently encountered in many practical applications. In addition, this approach can be adapted for parallel processing, potentially enhancing its efficiency and scalability. By addressing these limitations, the versatility and applicability of a cost-sensitive multi-label feature selection approach can be significantly enhanced, making it more robust and practical for a wide range of real-world scenarios.

This work was supported by the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT (2021R1A2C1014338 and RS-2024-00344752). This research was also supported by the Department of Integrative Biotechnology, Sungkyunkwan University (SKKU) and the BK21 FOUR Project.

Notes

1. https://mulan.sourceforge.net/datasets-mlc.html

2. https://www.uco.es/kdis/mllresources/

References

1. Kashef S, Nezamabadi-pour H, Nikpour B. Multilabel feature selection: a comprehensive review and guiding experiments. Wiley Interdisciplinary Reviews: Data Mining Knowledge Discov. 2018; 8(2): e1240. doi: https://doi.org/10.1002/widm.1240.

2. Turney PD. Types of cost in inductive concept learning. arXiv preprint cs/0212034. 2002.

3. Bolón-Canedo V, Porto-Díaz I, Sánchez-Maroño N, Alonso-Betanzos A. A framework for cost-based feature selection. Pattern Recogn. 2014; 47(7): 2481-9. doi: https://doi.org/10.1016/j.patcog.2014.01.008.

4. Stanujkic D, Popovic G, Karabasevic D, Meidute-Kavaliauskiene I, Ulutaş A. An integrated simple weighted sum product method-WISP. IEEE Trans Eng Manag. 2021; 70(5): 1933-44. doi: https://doi.org/10.1109/tem.2021.3075783.

5. Long X, Qian W, Wang Y, Shu W. Cost-sensitive feature selection on multi-label data via neighborhood granularity and label enhancement. Appl Intell. 2021; 51(4): 2210-32. doi: https://doi.org/10.1007/s10489-020-01993-w.

6. Klonecki T, Teisseyre P, Lee J. Cost-constrained feature selection in multilabel classification using an information-theoretic approach. Pattern Recogn. 2023; 141: 109605. doi: https://doi.org/10.1016/j.patcog.2023.109605.

7. Battiti R. Using mutual information for selecting features in supervised neural net learning. IEEE Trans Neural Networks. 1994; 5(4): 537-50. doi: https://doi.org/10.1109/72.298224.

8. Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell. 2005; 27(8): 1226-38. doi: https://doi.org/10.1109/tpami.2005.159.

9. Lin Y, Hu Q, Liu J, Duan J. Multi-label feature selection based on max-dependency and min-redundancy. Neurocomputing. 2015; 168: 92-103. doi: https://doi.org/10.1016/j.neucom.2015.06.010.

10. Lee J, Kim DW. Mutual information-based multi-label feature selection using interaction information. Expert Syst Appl. 2015; 42(4): 2013-25. doi: https://doi.org/10.1016/j.eswa.2014.09.063.

11. Lee J, Kim DW. SCLS: multi-label feature selection based on scalable criterion for large label set. Pattern Recogn. 2017; 66: 342-52. doi: https://doi.org/10.1016/j.patcog.2017.01.014.

12. Zhang P, Gao W. Feature relevance term variation for multi-label feature selection. Appl Intell. 2021; 51(7): 5095-110. doi: https://doi.org/10.1007/s10489-020-02129-w.

13. Gao W, Hao P, Wu Y, Zhang P. A unified low-order information-theoretic feature selection framework for multi-label learning. Pattern Recogn. 2023; 134: 109111. doi: https://doi.org/10.1016/j.patcog.2022.109111.

14. Niu J, Zhao H, Zhu W. Feature selection with test cost constraint through a simulated annealing algorithm. J Internet Technol. 2016; 17(6): 1133-40. doi: https://doi.org/10.6138/JIT.2016.17.6.20141119.

15. Zadeh LA. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Syst. 1978; 1(1): 3-28. doi: https://doi.org/10.1016/0165-0114(78)90029-5.

16. Atanassov KT. Intuitionistic Fuzzy Sets. Heidelberg: Physica; 1999.

17. Yager RR. Pythagorean fuzzy subsets. In: 2013 joint IFSA world congress and NAFIPS annual meeting (IFSA/NAFIPS). IEEE; 2013. p. 57-61.

18. Senapati T, Yager RR. Fermatean fuzzy sets. J Ambient Intell Hum Comput. 2020; 11(2): 663-74. doi: https://doi.org/10.1007/s12652-019-01377-0.

19. Stanujkić D, Karabašević D, Popović G, Zavadskas EK, Saračević M, Stanimirović PS, Ulutaş A, Katsikis VN, Meidute-Kavaliauskiene I. Comparative analysis of the simple WISP and some prominent MCDM methods: A Python approach. Axioms. 2021; 10(4): 347. doi: https://doi.org/10.3390/axioms10040347.

20. Karabašević D, Ulutaş A, Stanujkić D, Saračević M, Popović G. A new fuzzy extension of the simple WISP method. Axioms. 2022; 11(7): 332. doi: https://doi.org/10.3390/axioms11070332.

21. Ulutaş A, Topal A, Pamučar D, Stević Ž, Karabašević D, Popović G. A new integrated multi-criteria decision-making model for sustainable supplier selection based on a novel grey WISP and grey BWM methods. Sustainability. 2022; 14(24): 16921. doi: https://doi.org/10.3390/su142416921.

22. Cao B, Jin Y, Ulutaş A, Topal A, Stević Ž, Karabasevic D, Sava C. A new integrated rough multi-criteria decision-making model for enterprise resource planning software selection. PeerJ Comput Sci. 2024; 10: e2096. doi: https://doi.org/10.7717/peerj-cs.2096.

23. Zavadskas EK, Stanujkic D, Turskis Z, Karabasevic D. An intuitionistic extension of the simple WISP method. Entropy. 2022; 24(2): 218. doi: https://doi.org/10.3390/e24020218.

24. Rani P, Pamucar D, Mishra AR, Hezam IM, Ali J, Ahammad SH. An integrated interval-valued Pythagorean fuzzy WISP approach for industry 4.0 technology assessment and digital transformation. Ann Oper Res. 2023; 342(2): 1-40. doi: https://doi.org/10.1007/s10479-023-05355-w.

25. Görçün ÖF, Ulutaş A, Topal A, Ecer F. Telescopic forklift selection through a novel interval-valued fermatean fuzzy PIPRECIA–WISP approach. Expert Syst Appl. 2024; 255: 124674. doi: https://doi.org/10.1016/j.eswa.2024.124674.

26. Kou G, Yang P, Peng Y, Xiao F, Chen Y, Alsaadi FE. Evaluation of feature selection methods for text classification with small datasets using multiple criteria decision-making methods. Appl Soft Comput. 2020; 86: 105836. doi: https://doi.org/10.1016/j.asoc.2019.105836.

27. Mohanrasu SS, Janani K, Rakkiyappan R. A COPRAS-based approach to multi-label feature selection for text classification. Math Comput Simul. 2024; 222: 3-23. doi: https://doi.org/10.1016/j.matcom.2023.07.022.

28. Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948; 27(3): 379-423. doi: https://doi.org/10.1002/j.1538-7305.1948.tb01338.x.

29. Dounis A, Avramopoulos AN, Kallergi M. Advanced fuzzy sets and genetic algorithm optimizer for mammographic image enhancement. Electronics. 2023; 12(15): 3269. doi: https://doi.org/10.3390/electronics12153269.

30. Tsoumakas G, Katakis I. Multi-label classification: an overview. Int J Data Warehous Min. 2007; 3(3): 1-13. doi: https://doi.org/10.4018/jdwm.2007070101.

31. Zhang ML, Zhou ZH. ML-KNN: A lazy learning approach to multi-label learning. Pattern Recogn. 2007; 40(7): 2038-48. doi: https://doi.org/10.1016/j.patcog.2006.12.019.

32. Li L, Liu H, Ma Z, Mo Y, Duan Z, Zhou J, Zhao J. Multi-label feature selection via information gain. In: Advanced Data Mining and Applications: 10th International Conference, ADMA 2014, Guilin, China, December 19-21, 2014. Proceedings 10. Springer; 2014. p. 345-55. doi: https://doi.org/10.1007/978-3-319-14717-8_27.

33. Zhang Y, Huo W, Tang J. Multi-label feature selection via latent representation learning and dynamic graph constraints. Pattern Recogn. 2024; 151: 110411. doi: https://doi.org/10.1016/j.patcog.2024.110411.

34. Paniri M, Dowlatshahi MB, Nezamabadi-Pour H. MLACO: a multi-label feature selection algorithm based on ant colony optimization. Knowledge-Based Syst. 2020; 192: 105285. doi: https://doi.org/10.1016/j.knosys.2019.105285.

35. Friedman M. A comparison of alternative tests of significance for the problem of m rankings. Ann Math Stat. 1940; 11(1): 86-92. doi: https://doi.org/10.1214/aoms/1177731944.

36. Dunn OJ. Multiple comparisons among means. J Am Stat Assoc. 1961; 56(293): 52-64. doi: https://doi.org/10.2307/2282330.

Supplementary material

The supplementary material for this article can be found online.

© 2025 S.S. Mohanrasu, Le Thi Phan, Rakkiyappan Rajan and Balachandran Manavalan

Published in Applied Computing and Informatics. Published by Emerald Publishing Limited. This article is published under the Creative Commons Attribution (CC BY 4.0) licence. Anyone may reproduce, distribute, translate and create derivative works of this article (for both commercial and non-commercial purposes), subject to full attribution to the original publication and authors. The full terms of this licence may be seen at http://creativecommons.org/licences/by/4.0/legalcode
