Python and NumPy implementation of the ID3 algorithm for decision trees. This is a vectorized implementation of the decision tree tutorial code by Google Developers.
Reference: https://github.com/random-forests/tutorials/blob/master/decision_tree.ipynb
- The algorithm expects the first N-1 columns of the input to be features and the last column to hold the labels (see the usage sketch after this list).
- The code was written for a subset of the Wine Quality dataset.
- Depth: Maximum depth of the decision tree (pre-pruning); splitting stops once this depth is reached.
- min_sample_split: Minimum number of samples required at a node for further splits to occur (both pre-pruning checks are illustrated in the usage sketch after this list).
- For numerical attributes, the algorithm iterates over all values in a column to find the best split value (binary split); see the split-search sketch after this list.
- For categorical attributes, each candidate split tests whether a sample belongs to a particular category or not (binary split), as illustrated in the same sketch.
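
The split search described in the last two points can be sketched as follows. This is a minimal illustration, not the actual implementation: it assumes Gini impurity as the split criterion (as in the Google Developers tutorial) and a `>=` comparison for numerical thresholds; the helper names `gini`, `best_numerical_split`, and `categorical_split_mask` are made up for the example.

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity of a 1-D array of class labels."""
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    proportions = counts / labels.size
    return 1.0 - float(np.sum(proportions ** 2))


def best_numerical_split(column: np.ndarray, labels: np.ndarray):
    """Try every distinct value in `column` as a binary split point
    (sample >= value vs. sample < value) and return the value giving the
    lowest weighted Gini impurity over the two children."""
    best_value, best_impurity = None, np.inf
    for value in np.unique(column):
        mask = column >= value                 # vectorized comparison over the column
        left, right = labels[mask], labels[~mask]
        if left.size == 0 or right.size == 0:  # split would leave one child empty
            continue
        weighted = (left.size * gini(left) + right.size * gini(right)) / labels.size
        if weighted < best_impurity:
            best_value, best_impurity = value, weighted
    return best_value, best_impurity


def categorical_split_mask(column: np.ndarray, category) -> np.ndarray:
    """Binary split for a categorical column: belongs to `category` or not."""
    return column == category


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature = rng.normal(size=50)
    labels = (feature > 0.2).astype(int)       # toy labels correlated with the feature
    print(best_numerical_split(feature, labels))
```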
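
The usage sketch below shows the expected data layout and the two pre-pruning parameters described above. The array contents are toy values, and the `should_stop` helper is an illustrative stand-in for the stopping checks, not the code's actual API.

```python
import numpy as np

# Last column holds the labels; every preceding column is a feature.
data = np.array(
    [
        [7.4, 0.70, 0.00, 5],   # toy rows: three features, then the label
        [7.8, 0.88, 0.00, 5],
        [11.2, 0.28, 0.56, 6],
        [7.4, 0.66, 0.00, 5],
    ]
)
features, labels = data[:, :-1], data[:, -1]


def should_stop(node_labels: np.ndarray, current_depth: int,
                depth: int, min_sample_split: int) -> bool:
    """Pre-pruning check: stop splitting once the maximum depth is reached
    or the node holds fewer than min_sample_split samples."""
    return current_depth >= depth or node_labels.size < min_sample_split


print(should_stop(labels, current_depth=0, depth=5, min_sample_split=2))
```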