1. Classification: Basic Concepts
Classification is a supervised learning technique where a model predicts categorical class labels based on input features.
Key Terms
- Classifier: Algorithm that maps input data to a category.
- Training Data: Labeled dataset used to train the model.
- Testing Data: Unseen data used to evaluate model performance.
- Features (Attributes): Input variables used for prediction.
- Class Label: Output category to be predicted.
Evaluation Metrics
- Accuracy: (TP + TN) / (TP + TN + FP + FN)
- Precision: TP / (TP + FP) (How many selected items are relevant?)
- Recall (Sensitivity): TP / (TP + FN) (How many relevant items are selected?)
- F1-Score: Harmonic mean of Precision & Recall
- Confusion Matrix: Tabulates TP, TN, FP, FN (counted explicitly in the sketch below).
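To make these definitions concrete, here is a minimal Python sketch; the two label arrays are invented purely for illustration.

```python
# Count the confusion-matrix cells by hand, then derive the four metrics.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # toy ground-truth labels (positive class = 1)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # toy classifier outputs

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy  = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
print(f"acc={accuracy:.2f} prec={precision:.2f} rec={recall:.2f} f1={f1:.2f}")
```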
2. Decision Tree Induction
A decision tree is a flowchart-like structure where:
- Internal nodes = Feature tests
- Branches = Outcomes of tests
- Leaf nodes = Class labels
Tree Construction (ID3, C4.5, CART)
- Select the best attribute to split the data (using a splitting criterion).
- Partition the data into subsets based on the attribute's values.
- Repeat recursively (as sketched below) until:
  - All samples belong to one class, or
  - No features remain, or
  - The tree reaches its maximum depth.
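One concrete way to run this loop is scikit-learn's CART implementation; the sketch below assumes scikit-learn is installed, and the toy dataset and the max_depth=3 cap are arbitrary illustration choices.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy dataset: two numeric features, binary class label.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 1], [2, 0]]
y = [0, 0, 0, 1, 1, 1]

# CART-style tree: Gini impurity as the splitting criterion,
# recursion capped by max_depth (one of the stopping rules above).
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

# The printed rules mirror the structure above: internal nodes are
# feature tests, branches are test outcomes, leaves are class labels.
print(export_text(clf, feature_names=["f0", "f1"]))
print(clf.predict([[1, 1]]))  # -> [1] on this toy data
```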
Splitting Criteria
- Information Gain (ID3)
  \( IG(D, A) = H(D) - H(D \mid A) \), where \( H(D) = -\sum_i p_i \log_2 p_i \) is the entropy of the class distribution.
  (Maximize gain; biased towards high-cardinality features.)
- Gain Ratio (C4.5)
  \( GR(D, A) = \frac{IG(D, A)}{SplitInfo(A)} \), where \( SplitInfo(A) = -\sum_j \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|} \) over the partitions \( D_j \) induced by \( A \).
  (Normalizes IG to reduce the bias towards many-valued attributes.)
- Gini Index (CART)
  \( Gini(D) = 1 - \sum_i p_i^2 \)
  (Measures impurity; prefers larger partitions. All three criteria are evaluated on a toy split in the sketch below.)
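The formulas are easy to evaluate by hand; the pure-Python sketch below uses invented toy labels and a single two-way candidate split.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(D) = -sum p_i * log2(p_i) over the class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini(D) = 1 - sum p_i^2."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def info_gain(parent, partitions):
    """IG = H(D) minus the size-weighted entropy of the partitions."""
    n = len(parent)
    return entropy(parent) - sum(len(p) / n * entropy(p) for p in partitions)

def gain_ratio(parent, partitions):
    """GR = IG / SplitInfo; SplitInfo penalizes many-valued splits."""
    n = len(parent)
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in partitions)
    return info_gain(parent, partitions) / split_info

# Toy example: attribute A splits D into two equal-sized subsets.
D = ["yes", "yes", "no", "no", "yes", "no"]
parts = [["yes", "yes", "no"], ["no", "yes", "no"]]
print(info_gain(D, parts), gain_ratio(D, parts), gini(D))
```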
Advantages & Disadvantages
| Pros | Cons |
|---|---|
| Easy to interpret | Prone to overfitting |
| Handles non-linear data | Unstable (small changes → new tree) |
| No need for feature scaling | Biased if classes are imbalanced |
3. Bayesian Classification
Bayesian methods use probability to predict class membership.
Naïve Bayes Classifier
- Assumes features are independent given the class (naïve assumption).
- Uses Bayes’ Theorem (worked through in the sketch below):
  \( P(Y \mid X) = \frac{P(X \mid Y) \cdot P(Y)}{P(X)} \)
  - \( P(Y \mid X) \): Posterior probability (class given features).
  - \( P(X \mid Y) \): Likelihood (feature distribution per class).
  - \( P(Y) \): Prior probability of the class.
  - \( P(X) \): Evidence; constant across classes, so it can be dropped when picking the most probable class.
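As a worked illustration (the toy weather-style records are invented for the example), the sketch below estimates the prior and the per-feature likelihoods from counts, then picks the class with the largest unnormalized posterior, dropping the constant \( P(X) \).

```python
# Naive Bayes by hand on an invented toy table.
# Features are treated as conditionally independent given the class.
data = [
    ({"outlook": "sunny",    "windy": False}, "no"),
    ({"outlook": "sunny",    "windy": True},  "no"),
    ({"outlook": "rainy",    "windy": False}, "yes"),
    ({"outlook": "rainy",    "windy": True},  "no"),
    ({"outlook": "overcast", "windy": False}, "yes"),
    ({"outlook": "overcast", "windy": True},  "yes"),
]

def score(x, label):
    """Unnormalized posterior: P(Y) * prod_i P(x_i | Y)."""
    rows = [f for f, y in data if y == label]
    prior = len(rows) / len(data)
    lik = 1.0
    for feat, val in x.items():
        lik *= sum(1 for r in rows if r[feat] == val) / len(rows)
    return prior * lik

x = {"outlook": "sunny", "windy": True}
print(max(["yes", "no"], key=lambda c: score(x, c)))  # -> "no"
```

In practice a smoothing term (e.g., Laplace/add-one) is added to the counts so that a feature value never seen with a class does not zero out the entire product.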
Types of Naïve Bayes
- Gaussian NB: Assumes continuous features follow a normal distribution.
- Multinomial NB: For discrete counts (e.g., text classification).
- Bernoulli NB: Binary features (e.g., word presence/absence); all three variants are shown with scikit-learn below.
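Assuming scikit-learn is available, each variant maps onto a ready-made class; the random toy arrays below exist only to show the API shape.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=20)          # toy binary class labels

# Gaussian NB: continuous features, assumed normal per class.
Xc = rng.normal(size=(20, 3))
print(GaussianNB().fit(Xc, y).predict(Xc[:2]))

# Multinomial NB: non-negative counts (e.g., word frequencies).
Xm = rng.integers(0, 5, size=(20, 3))
print(MultinomialNB().fit(Xm, y).predict(Xm[:2]))

# Bernoulli NB: binary presence/absence features.
Xb = (Xm > 0).astype(int)
print(BernoulliNB().fit(Xb, y).predict(Xb[:2]))
```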
Pros & Cons
| Pros | Cons |
|---|---|
| Fast & scalable | Naïve assumption (feature independence) |
| Works well with high dimensions | Struggles if dependencies exist |
4. Bayesian Belief Networks (BBNs)
- Also called Bayesian Networks; they are a kind of Probabilistic Graphical Model.
- Represents dependencies between variables via a Directed Acyclic Graph (DAG).
Key Components
- Nodes: Random variables (features or class).
- Edges: Conditional dependencies.
- Conditional Probability Tables (CPTs): Quantify relationships.
Example
- Medical Diagnosis:
  - Nodes: Smoking, Cancer, Cough
  - Edges: Smoking → Cancer → Cough
  - CPTs: \( P(Cancer \mid Smoking) \) and \( P(Cough \mid Cancer) \), plus the prior \( P(Smoking) \) for the root node.
Inference in BBNs
- Compute posterior probabilities given evidence, e.g., \( P(Cancer \mid Cough = \text{True}) \), computed by enumeration in the sketch below.
- Algorithms: Variable Elimination, Markov Chain Monte Carlo (MCMC).
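For the small chain above, exact inference can be spelled out by brute-force enumeration (the naive method that Variable Elimination optimizes). All CPT numbers below are illustrative placeholders, not clinical figures.

```python
# Exact inference by enumeration over the chain Smoking -> Cancer -> Cough.
P_smoking = {True: 0.3, False: 0.7}                 # prior P(Smoking) (made up)
P_cancer_given_smoking = {True: 0.05, False: 0.01}  # P(Cancer=T | Smoking) (made up)
P_cough_given_cancer = {True: 0.9, False: 0.2}      # P(Cough=T | Cancer) (made up)

def joint(s: bool, c: bool, cough: bool) -> float:
    """P(Smoking=s, Cancer=c, Cough=cough) via the DAG's chain rule."""
    p_s = P_smoking[s]
    p_c = P_cancer_given_smoking[s] if c else 1 - P_cancer_given_smoking[s]
    p_k = P_cough_given_cancer[c] if cough else 1 - P_cough_given_cancer[c]
    return p_s * p_c * p_k

# P(Cancer=T | Cough=T): sum out Smoking in the numerator,
# sum out Smoking and Cancer in the normalizing denominator.
num = sum(joint(s, True, True) for s in (True, False))
den = sum(joint(s, c, True) for s in (True, False) for c in (True, False))
print(f"P(Cancer | Cough=True) = {num / den:.4f}")  # ~0.092 with these numbers
```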
Advantages & Disadvantages
| Pros | Cons |
|---|---|
| Handles dependencies | Complex to construct |
| Interpretable structure | Computationally expensive for large networks |
| Incorporates prior knowledge | Requires estimating many probabilities (CPTs) |
Summary Table
| Method | Key Idea | When to Use |
|---|---|---|
| Decision Trees | Split data via feature tests | Need interpretability, non-linear data |
| Naïve Bayes | Probabilistic, independence assumption | Text classification, high dimensions |
| Bayesian Networks | Models dependencies via DAG | Domain knowledge available, dependencies matter |