
Decision Trees and k-NN on Synthetic Data

Entropy-based feature selection and learning curve analysis with custom 2D data distributions.

Summary



  • This project begins with a manual implementation of a decision tree classifier, using entropy as the uncertainty measure for feature selection and constructing a shallow binary tree (see the split-selection sketch after this list).

  • The second half focuses on generating structured synthetic data from a probabilistic distribution and evaluating the k-Nearest Neighbors classifier.

  • It explores the effect of training-set size and the hyperparameter k on prediction accuracy (see the k-NN sketch below), providing a hands-on study of interpretable and non-parametric models in small-scale learning tasks.
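
The post summary does not include the code itself, so the following is a minimal sketch of entropy-based split selection under my own assumptions: numeric features, binary splits at a threshold, and the helper names `entropy`, `information_gain`, and `best_split` are illustrative, not the project's actual implementation.

```python
import numpy as np

def entropy(y):
    """Shannon entropy (in bits) of a label vector."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, y_left, y_right):
    """Reduction in entropy achieved by splitting y into y_left / y_right."""
    n = len(y)
    weighted = (len(y_left) / n) * entropy(y_left) + (len(y_right) / n) * entropy(y_right)
    return entropy(y) - weighted

def best_split(X, y):
    """Return (feature index, threshold, gain) of the split with the highest information gain."""
    best = (None, None, -np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            mask = X[:, j] <= t
            if mask.all() or not mask.any():
                continue  # skip splits that leave one side empty
            gain = information_gain(y, y[mask], y[~mask])
            if gain > best[2]:
                best = (j, t, gain)
    return best

# Example usage on a tiny toy dataset:
# X = np.array([[0.1, 1.2], [0.4, 0.8], [0.9, 0.3], [1.1, 0.2]])
# y = np.array([0, 0, 1, 1])
# feature, threshold, gain = best_split(X, y)
```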
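The k-NN sketch below illustrates the second half of the project under assumed details: the summary does not specify the custom 2D distribution, so two Gaussian clusters stand in for it, and scikit-learn's `KNeighborsClassifier` is used to vary the training-set size and k and report held-out accuracy. The sample sizes and the grid of values are placeholders, not the project's actual settings.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_data(n):
    """Two 2D Gaussian clusters as a stand-in for the post's custom distribution."""
    X0 = rng.normal(loc=[-1.0, 0.0], scale=0.8, size=(n // 2, 2))
    X1 = rng.normal(loc=[+1.0, 0.0], scale=0.8, size=(n - n // 2, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * (n // 2) + [1] * (n - n // 2))
    return X, y

X, y = make_data(2000)
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Sweep training-set size (learning curve) and the hyperparameter k.
for n_train in (20, 100, 500, 1000):
    for k in (1, 5, 15):
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X_train_full[:n_train], y_train_full[:n_train])
        acc = accuracy_score(y_test, clf.predict(X_test))
        print(f"n_train={n_train:5d}  k={k:2d}  accuracy={acc:.3f}")
```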

This post is licensed under CC BY 4.0 by the author.