 Many machine learning algorithms require all features to be categorical
 Use discretization to convert the range of a continuous feature into intervals
 Dougherty et al. (1995) showed that discretization improves the predictive performance and running time of machine learning algorithms
 Need to find the proper cut points for each interval
 Propose the Maximum Area under the ROC Curve-based Discretization (MAD) method
 Good alternative to other discretization methods
 MAD is a supervised, merging, global and static discretization method
Discretization types
 Supervised vs unsupervised: equal-width or equal-frequency binning does not use the class labels of instances; such methods are called unsupervised
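As a minimal sketch, the two unsupervised schemes mentioned above can be written as follows (the data, bin count and function names are invented for illustration):

```python
def equal_width_bins(values, k):
    """Split the value range into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]  # k-1 interior cut points

def equal_frequency_bins(values, k):
    """Choose cut points so each interval holds ~len(values)/k instances."""
    s = sorted(values)
    step = len(s) / k
    return [s[int(i * step)] for i in range(1, k)]

data = [1.0, 2.0, 2.5, 3.0, 7.0, 8.0, 9.0, 10.0]
print(equal_width_bins(data, 3))      # [4.0, 7.0]
print(equal_frequency_bins(data, 3))  # [2.5, 8.0]
```

Note that equal-width binning depends only on the min and max (so it is sensitive to outliers), while equal-frequency binning adapts to the data distribution; neither looks at class labels, which is exactly why they are unsupervised.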
 Splitting vs merging:
 Splitting divides the continuous space into a set of smaller intervals by finding proper cut points
> Optimised using entropy

 Merging uses each distinct point on the continuous space as an individual candidate and merges similar neighbors to form larger intervals
> Optimised using chi-square
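The entropy score that a splitting method would minimize over candidate cut points can be sketched as follows (function names and data are illustrative, not from the paper; a merging method would instead score adjacent intervals with a chi-square statistic):

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return sum(-p * math.log2(p) for p in probs)

def split_entropy(points, cut):
    """Weighted class entropy after splitting at `cut`.
    `points` is a list of (value, label) pairs; a splitting method
    picks the cut that minimizes this score."""
    left = [lab for v, lab in points if v <= cut]
    right = [lab for v, lab in points if v > cut]
    n = len(points)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)

pts = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1)]
print(split_entropy(pts, 3.0))  # 0.0: this cut separates the classes perfectly
print(split_entropy(pts, 2.0))  # ≈0.551: a worse cut leaves mixed labels
```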
 Global vs local: some methods use subsets of the instance space rather than the whole space; these are called local methods
 Dynamic vs static: static discretization discretizes attributes one by one and assumes no interrelations between attributes
MAD method
 AUC is equal to the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance
 AUC is equivalent to the statistic used by the Wilcoxon rank-sum test (Hanley and McNeil, 1982)
 The discretization objective is thus to find the minimum number of cut points that rank positive-labeled instances higher than negative-labeled instances
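The rank-based view of AUC above can be computed directly by pairwise comparison, as in this hedged sketch (scores and function name are invented):

```python
def auc_rank(pos_scores, neg_scores):
    """AUC as the probability that a random positive scores higher
    than a random negative; ties count as half a win (this matches
    the Wilcoxon rank-sum view of AUC)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

print(auc_rank([0.9, 0.8, 0.4], [0.7, 0.3]))  # 5/6 ≈ 0.833
```

The double loop is O(n²); in practice the same value is obtained in O(n log n) by ranking all scores once, which is why the rank-sum formulation matters.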
 Calculate the ROC curve using every candidate cut point
 The MAD method continues to merge candidate cut points into larger intervals until the maximum AUC is obtained. It compares the slopes between every two consecutive ROC points, finds the junction points that cause concavities, and eliminates them, thereby removing points that are not on the convex hull
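The concavity-elimination step can be sketched as an upper-hull scan over ROC points sorted by false-positive rate (function name and points are illustrative, not the paper's implementation):

```python
def roc_upper_hull(points):
    """Keep only ROC points on the upper convex hull.
    `points` is a list of (fpr, tpr) tuples sorted by fpr."""
    hull = []
    for pt in points:
        # Pop the previous point while it creates a concavity, i.e.
        # the turn hull[-2] -> hull[-1] -> pt is not a right turn.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            x3, y3 = pt
            # Cross product >= 0 means hull[-1] lies on or below the
            # segment from hull[-2] to pt, so it is not on the hull.
            if (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(pt)
    return hull

pts = [(0.0, 0.0), (0.1, 0.5), (0.2, 0.55), (0.3, 0.9), (1.0, 1.0)]
print(roc_upper_hull(pts))  # the concave point (0.2, 0.55) is dropped
```

Each popped point corresponds to a candidate cut point whose removal (merging its two neighboring intervals) can only increase the AUC, which is the intuition behind the merging step.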
 Another method for computing the convex hull is the Quickhull method, proposed by Preparata and Shamos (1985)
 The maximum AUC is defined by the convex hull formed by the ROC points
 The convex hull optimization differs for two-class vs multi-class datasets
 For multi-class datasets, use the AUC formulation of Provost and Domingos (2001)
 The MAD method is not designed to minimize the number of intervals but to maximize the AUC (Entropy MDLP, for example, is known for producing fewer intervals)
Performance comparison
 Compared with ChiMerge, Entropy MDLP, FFD, PD and without discretization
 On average, MAD outperforms the FFD and PD methods in terms of number of intervals
 Outperforms Entropy MDLP, one of the best-known discretization methods, in terms of predictive performance
 Outperforms the ChiMerge, FFD and PD methods in terms of predictive performance
 Runs faster than the other discretization methods for two-class datasets
 Main drawback is the computation time for multi-class datasets (the authors will investigate faster algorithms)