- More discussion here
- The complete discretization techniques

### Main characteristics of discretizers

- **Static vs dynamic:** Static discretizers are independent of the learner and act before the learning task; most discretizers are static. Dynamic discretizers include the ID3 discretizer and ITFP.
- **Univariate vs multivariate:** Multivariate discretizers consider all attributes to define the initial (or final) set of cut points. Univariate discretizers work with only one attribute at a time.
- **Supervised vs unsupervised:** Supervised discretizers use heuristic measures (entropy, interdependence, etc.) to determine the best cut points; most discretizers are supervised. Equal width and equal frequency are unsupervised.
- **Splitting vs merging:** Splitting methods choose a cut point among all possible boundaries and divide the domain into two intervals. Merging methods start with a predefined partition and remove a candidate cut point, merging the two adjacent intervals.
- **Global vs local:** To make a decision, a discretizer can either require all available data for the attribute (global) or use only partial information (local).
- **Direct vs incremental:** Direct discretizers divide the range into k intervals simultaneously. Incremental discretizers (also called hierarchical discretizers) instead pass through an improvement process.
- **Evaluation measure:** The metric the discretizer uses to compare two candidate schemes: information measures, statistical measures, rough sets, wrappers, or binning.
- **Parametric vs nonparametric:** Parametric discretizers (ChiMerge, CADD) require a maximum number of intervals fixed by the user. Nonparametric discretizers (MDLP, CAIM) compute the minimum number of intervals themselves, considering a trade-off with the loss of information.
- **Top down vs bottom up:** Top-down methods start with an empty discretization and add a new cut point at each step (MDLP). Bottom-up methods start with all possible cut points and repeatedly merge two intervals (ChiMerge).
- **Stopping condition:** Must be specified for nonparametric approaches, e.g. a minimum description length measure, confidence thresholds, or inconsistency ratios.
- **Disjoint vs non-disjoint:** In a disjoint discretization, intervals cannot overlap.
- **Ordinal vs nominal:** Ordinal discretization transforms quantitative data into ordered qualitative data; nominal discretization discards the order. Ordinal discretizers are uncommon.
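To make the unsupervised vs supervised and splitting distinctions concrete, here is a minimal Python sketch. The function names are hypothetical. `best_entropy_cut` is only the single split step that a top-down method such as MDLP repeats recursively; it omits MDLP's minimum description length stopping test and simply scans midpoints between distinct values.

```python
import math
from collections import Counter


def equal_width_cuts(values, k):
    """Unsupervised: k intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_cuts(values, k):
    """Unsupervised: k intervals holding roughly the same number of values."""
    s = sorted(values)
    n = len(s)
    # Cut midway between the last value of one bin and the first of the next.
    return [(s[i * n // k - 1] + s[i * n // k]) / 2 for i in range(1, k)]


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Supervised splitting step: the cut minimizing weighted class entropy."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        e = (i * entropy(left) + (n - i) * entropy(right)) / n
        if best is None or e < best[0]:
            best = (e, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return None if best is None else best[1]
```

On `[1, 2, 3, 4, 10, 11, 12, 20]` with `k = 2`, equal width cuts at 10.5 while equal frequency cuts at 7.0, showing how the two unsupervised schemes react differently to skewed data. Neither looks at the class labels, whereas `best_entropy_cut` places its cut where the classes separate best.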

### Comparison criteria

- Number of intervals
- Inconsistency
- Accuracy, measured with:
  - Predictive classification rate
  - **Cohen's kappa**, which compensates for random hits. Its original purpose was to measure the degree of agreement or disagreement between two people observing the same phenomenon. It is less expressive than ROC curves when applied to binary classification, but effective for multiclass problems.
- Time
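The two accuracy measures can be sketched in a few lines of Python. This is a minimal illustration with hypothetical function names, using the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the classification rate) and p_e is the agreement expected by chance from the marginal class frequencies.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Predictive classification rate: fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e).

    Undefined (division by zero) when p_e == 1, i.e. both label
    sequences consist of one and the same single class.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Agreement expected if predictions were independent of the true labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

A classifier that always predicts the majority class on `["a", "a", "a", "b"]` gets a classification rate of 0.75 but a kappa of 0.0, which is the compensation for random hits described above.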

### Results

No single best-performing method can be recommended; the choice depends on the problem tackled.

- FUSINTER, ChiMerge, CAIM and Modified Chi2 offer excellent overall performance
- PKID and FFD are suitable methods for lazy learning, and CACC, Distance and MODL are good choices for rule induction learning
- FUSINTER, Distance, Chi2, MDLP and UCPD obtain a satisfactory trade-off between the number of intervals and accuracy
- CAIM is one of the simplest discretizers and is quite effective