
Decision Trees and k-NN on Synthetic Data

Entropy-based feature selection and learning curve analysis with custom 2D data distributions.

Summary



  • This project begins with a manual implementation of a decision tree classifier, using entropy as the uncertainty measure for feature selection and constructing a shallow binary tree (see the split-selection sketch after this list).

  • The second half focuses on generating structured synthetic data from a probabilistic distribution and evaluating the k-Nearest Neighbors classifier.

  • It explores the effect of training-set size and the hyperparameter k on prediction accuracy (see the k-NN sketch below), providing a hands-on study of interpretable and non-parametric models in small-scale learning tasks.
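
The post summary does not include the code itself, so the following is a minimal sketch of entropy-based split selection under my own assumptions: numeric features, binary splits at a threshold, and the helper names `entropy`, `information_gain`, and `best_split` are illustrative, not the project's actual implementation.

```python
import numpy as np

def entropy(y):
    """Shannon entropy (in bits) of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, y_left, y_right):
    """Reduction in entropy achieved by splitting y into y_left / y_right."""
    n = len(y)
    weighted = (len(y_left) / n) * entropy(y_left) + (len(y_right) / n) * entropy(y_right)
    return entropy(y) - weighted

def best_split(X, y):
    """Return (feature index, threshold, gain) of the split with the highest information gain."""
    best = (None, None, -np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            mask = X[:, j] <= t
            if mask.all() or not mask.any():
                continue  # skip splits that leave one side empty
            gain = information_gain(y, y[mask], y[~mask])
            if gain > best[2]:
                best = (j, t, gain)
    return best

# Example usage on a tiny toy dataset:
# X = np.array([[0.1, 1.2], [0.4, 0.8], [0.9, 0.3], [1.1, 0.2]])
# y = np.array([0, 0, 1, 1])
# feature, threshold, gain = best_split(X, y)
```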
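The k-NN sketch below illustrates the second half of the project under assumed details: the summary does not specify the custom 2D distribution, so two Gaussian clusters stand in for it, and scikit-learn's `KNeighborsClassifier` is used to vary the training-set size and k and report held-out accuracy. The sample sizes and the grid of values are placeholders, not the project's actual settings.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_data(n):
    """Two 2D Gaussian clusters as a stand-in for the post's custom distribution."""
    X0 = rng.normal(loc=[-1.0, 0.0], scale=0.8, size=(n // 2, 2))
    X1 = rng.normal(loc=[+1.0, 0.0], scale=0.8, size=(n - n // 2, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * (n // 2) + [1] * (n - n // 2))
    return X, y

X, y = make_data(2000)
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Sweep training-set size (learning curve) and the hyperparameter k.
for n_train in (20, 100, 500, 1000):
    for k in (1, 5, 15):
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X_train_full[:n_train], y_train_full[:n_train])
        acc = accuracy_score(y_test, clf.predict(X_test))
        print(f"n_train={n_train:5d}  k={k:2d}  accuracy={acc:.3f}")
```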

This post is licensed under CC BY 4.0 by the author.