A discretization method based on maximizing the area under receiver operating characteristic curve

Original by M. Kurtcephe, H. A. Güvenir, Bilkent University, 2013, 25 pages

This summary note was Posted on

• Many machine learning algorithms require all features to be categorical
• Use discretization to convert the range of continuous feature into intervals
• Dougherty and al (1995) showed that discretization improve the predictive performance and running time of machine learning algorithms
• Need to find the proper cut-points for each interval
• Propose the Maximum Area under the ROC curve-based Discretization (MAD)
• Good alternative to other discretization methods
• MAD is a supervised, merging , global and statistic discretization method

Discretization types

• Supervised vs unsupervised: Equal width or equal frequency binning do not use class labels of instances and are called unsupervised.
• Splitting vs merging:
• Splitting divides continuous space into a set of smaller intervals by finding proper cut points

-> Optimised using entropy

• Merging use each distinct point on the continuous space as an individual candidate and merges similar neighbors to form larger intervals

-> Optimised using chisquare

• Global vs local: some methods use subsets of the instance space rather than the whole space and are called local methods
• Dynamic vs static: static discretization discretizes attributes one by one and assume no interrelations between attributes

• AUC is equal to the probability that the classifier will rank a randomly chosen positive instance higher that a randomly chosen negative instance
• AUC is the same method used by the Wilcoxon test of ranks (Hanley and Mc Neil – 1982)
• Discretization measure is thus the minimum number of cut-points that rank positive labeled instances higher than the negative labeled instance
• Calculate ROC using every cut points
• MAD method continues to merge candidate cut-points to form larger intervals until the maximum AUC is obtained. Compares slopes between every two consecutive ROC points and finds the junction points that cause concavities and eliminates them thus eliminating points which are not on the convex hull
• Other method for calculating the convex Hull is the Quick Hull method, proposed by  Preperata and Shamos (1985)
• The maximum AUC is defined by the convex hull formed by the ROC points
• Optimization method of the convex Hull different for two-class vs multi-class datasets
• Use Provost and Domingos (2001) for multi class
• The MAD method is not designed to minimize the number of intervals but to maximize the AUC (Entropy MDLP is known  for producing less intervals for example)

Performance comparison

• Compared with ChiMerge, Entropy MDLP, FFD, PD and without discretization
• On average MAD outperforms FFD and PD methods in terms on number of intervals
• Outperforms Entropy MDLP which is one of the most well-known discretization methods in terms of predictive performance
• Outperforms ChiMerge, FFD and PD method in terms of predictive performance
• Runs faster than other discretization methods for two class datasets
• Main drawback in the computation time for multiclass datasets (will investigate faster algo)