A survey of discretization techniques: taxonomy and empirical analysis in supervised learning

Original by S. Garcia, J. Luengo, J. A. Sáez, V. López, F. Herrera, IEEE Computer Society, 2013, 16 pages

This summary note was Posted on

Main characteristics of discretizers

• Statics vs dynamic: static is independent of the learner and acts prior to the learning task. Most are static. Dynamic are ID3 discretizers, ITFP
• Univariate vs multivariate: multivariate consider all attributes to define the initial set of cut points (or final ones). Univariate inly work with one attribute at the time
• Supervised vs unsupervised: Supervised consider heuristic measures to determine the best cut points( entropy, interdependence etc..). Most are supervised. Unsupervised discretizers are equal width and equal frequency.
• Splitting vs merging: Splitting methods establish a cut point among all possible boundaries and divide the domain into two intervals. Merging starts with a predefined partition and remove a candidate cut point to mix both intervals
• Global vs local:  To make a decision a discretizer can either require all available data in the attribute or use only partial information
• Direct vs incremental: direct discretizers divide the range into k intervals simultaneously. By contrast incremental pass through an improvement process (also called hierarchical discretizers)
• Evaluation measure: metrics used by the discretizer to compare to candidate schemes: (information measure, statistical measure, rough sets, wrappers, binning)
• Parametric vs nonparametric: parametric (ChiMerge, CADD) requires a  maximum number of intervals fixed by the user. Non parametric (MDLP, CAIM) computes the minimum number of intervals considering a tradeoff with loss of information
• Top down vs bottom up: Top down starts with an empty discretization and adds a new cut point every time (MDLP). Bottom up starts with all possible cutpoints and merge two intervals (ChiMerge)
• Stopping condition: Must be specified for non parametric approaches (minimum description length measure, confidence thresholds, inconsistency ratios)
• Disjoint vs non disjoint: In a disjoint discretization intervals can not overlap
• Ordinal vs nominal: ordinal discretization transforms quantitative data into ordered qualitative data (non common)

Comparison criteria

• Number of intervals
• Inconsistency
• Accuracy: Use Cohen’ s kappa which compensates for random hits. Original purpose was to measure the degree of dis/agreement between two people observing the came phenomenon. Less expressive as than ROC curves when applied to binary classification but effective for multiclass problems
• Predictive classification rate
• Time

Results

Can not recommend one best performing method, it depends upon the problem tackled

• FUSINTER, ChiMerge, CAIM and Modified CHi2 offer excellent performances over all
• PKID, FFD are suitable methods for lazy d learning and CACC, Distance and MODL are good choices in the rule induction learning
• FUSINTER, Distance, Chi2, MDLP and UCPD obtain satisfactory tradeoff between the number of intervals and accuracy
• CAIM is one of the simplest discretizer and is pretty effective