Relevance, redundancy, and complementarity trade-off (RRCT): A principled, generic, robust feature-selection tool

Summary: We present a new heuristic feature-selection (FS) algorithm that integrates, in a principled algorithmic framework, the three key FS components: relevance, redundancy, and complementarity. Thus, we call it relevance, redundancy, and complementarity trade-off (RRCT). The association strength between each feature and the response, and between feature pairs, is quantified via an information-theoretic transformation of rank correlation coefficients, and feature complementarity is quantified using partial correlation coefficients. We empirically benchmark the performance of RRCT against 19 FS algorithms across four synthetic and eight real-world datasets in indicative challenging settings, evaluating the following: (1) matching the true feature set and (2) out-of-sample performance in binary and multi-class classification problems when presenting selected features to a random forest. RRCT is very competitive in both tasks, and we tentatively make suggestions on the generalizability and application of the best-performing FS algorithms across settings where they may operate effectively.
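To make the three-way trade-off concrete, below is a minimal Python sketch of an RRCT-style greedy forward search. This is an illustration under assumptions, not the reference implementation: the function names (`rho_to_mi`, `partial_corr`, `rrct_sketch`) and the exact way the three terms are combined into one score are hypothetical. It follows the description in the summary above: relevance and redundancy are quantified by applying the Gaussian mutual-information identity I = -0.5 ln(1 - rho^2) to Spearman rank correlations, and complementarity by a partial correlation of the candidate feature with the response, controlling for the already-selected features.

```python
# Illustrative RRCT-style feature selection (not the published reference code).
# Assumptions: relevance - redundancy + complementarity is used as the greedy
# score; the published algorithm's exact weighting/normalization may differ.
import numpy as np
from scipy.stats import spearmanr, rankdata

def rho_to_mi(rho, eps=1e-12):
    """Map a (rank) correlation to a mutual-information-like quantity
    via the bivariate-Gaussian identity I = -0.5 * ln(1 - rho^2)."""
    return -0.5 * np.log(max(1.0 - rho**2, eps))

def partial_corr(x, y, Z):
    """Partial rank correlation of x and y controlling for the columns of Z:
    rank-transform everything, regress Z out of both variables, then
    correlate the residuals."""
    xr, yr = rankdata(x), rankdata(y)
    Zr = np.column_stack([rankdata(Z[:, j]) for j in range(Z.shape[1])])
    Zr = np.column_stack([np.ones(len(xr)), Zr])  # intercept term
    rx = xr - Zr @ np.linalg.lstsq(Zr, xr, rcond=None)[0]
    ry = yr - Zr @ np.linalg.lstsq(Zr, yr, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

def rrct_sketch(X, y, k):
    """Greedily select k feature indices, trading off relevance,
    redundancy, and complementarity (illustrative sketch only)."""
    n_features = X.shape[1]
    relevance = np.array([rho_to_mi(spearmanr(X[:, j], y)[0])
                          for j in range(n_features)])
    selected = [int(np.argmax(relevance))]  # seed with most relevant feature
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Redundancy: mean transformed correlation with selected features.
            red = np.mean([rho_to_mi(spearmanr(X[:, j], X[:, s])[0])
                           for s in selected])
            # Complementarity: conditional relevance given the selected set.
            comp = rho_to_mi(partial_corr(X[:, j], y, X[:, selected]))
            score = relevance[j] - red + comp
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Example usage on synthetic data where features 0 and 3 drive the response:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 0] + 0.5 * X[:, 3] + 0.1 * rng.normal(size=200)
print(rrct_sketch(X, y, k=3))
```

Starting from the single most relevant feature and adding one candidate at a time keeps the search to on the order of k passes over the feature set, each requiring only correlation evaluations, rather than an exhaustive enumeration of feature subsets.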

The bigger picture: High-dimensional datasets are now increasingly met across a wide span of data-science applications, where the large number of variables may obstruct the extraction of useful patterns in the data and often proves detrimental to the subsequent supervised-learning process (regression or classification). The problem is known as the curse of dimensionality and is often tackled using feature-selection algorithms. This study proposes an accurate, robust, computationally efficient feature-selection algorithm that is applicable in mixed-type variable settings across both regression and classification while inherently accounting for all key properties in determining a parsimonious feature subset: relevance, redundancy, and conditional relevance (or complementarity). Selecting a robust feature subset can save on data-collection resources, reduce computational cost, improve statistical-model portability, enhance interpretability, and often increase model generalization performance.
