Neural Networks and Language Models
I've started a series on this blog covering the basics of neural networks and language models. Each post in the series is accompanied by source code on GitHub.
Neural Networks - Autograd
- A Scalar-Valued Autograd Engine - We build the core of an automatic differentiation engine with support for addition and multiplication. To get there, we review some calculus basics, compute some derivatives, and fall in love with the chain rule.
- More Autograd: Arithmetic Operations, Nonlinearities, and a Simple Neural Network Library - We extend the autograd engine to support additional arithmetic operations and nonlinear functions. We then build a simple neural network library on top of the engine and use it to train a "real" machine learning model.
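The core idea from the first post can be sketched in a few dozen lines: a scalar value that records which operation produced it, so gradients flow backward through the graph via the chain rule. A minimal sketch, assuming addition and multiplication only (the class name `Value` is illustrative, not necessarily what the post uses):

```python
class Value:
    """A scalar that records the operations producing it, for backprop."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._children = _children
        self._backward = lambda: None  # set by the op that created this node

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1,
            # so each input just receives the upstream gradient.
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # Chain rule: local derivative times the upstream gradient.
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically order the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def build(v):
            if v not in seen:
                seen.add(v)
                for child in v._children:
                    build(child)
                order.append(v)

        build(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()


a, b = Value(2.0), Value(3.0)
c = a * b + a  # dc/da = b + 1 = 4, dc/db = a = 2
c.backward()
print(a.grad, b.grad)  # 4.0 2.0
```

Note that `a` appears twice in the expression, which is why the backward functions accumulate with `+=` rather than assign: both paths through the graph contribute to `a.grad`.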
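The second post's extensions can be illustrated the same way: add a nonlinearity to the engine and build a neuron on top of it. A hedged, self-contained sketch with only `tanh` shown; the names (`Value`, `Neuron`) and the weight initialization are illustrative, not necessarily the post's own library:

```python
import math
import random

class Value:
    """Scalar autograd value supporting +, *, and tanh."""

    def __init__(self, data, _children=()):
        self.data, self.grad = data, 0.0
        self._children = _children
        self._backward = lambda: None

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,))
        def _backward():
            self.grad += (1 - t * t) * out.grad  # d/dx tanh(x) = 1 - tanh(x)^2
        out._backward = _backward
        return out

    def backward(self):
        order, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for child in v._children:
                    build(child)
                order.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

class Neuron:
    """tanh(w . x + b), with randomly initialized weights."""

    def __init__(self, nin):
        self.w = [Value(random.uniform(-1, 1)) for _ in range(nin)]
        self.b = Value(0.0)

    def __call__(self, xs):
        s = self.b
        for wi, xi in zip(self.w, xs):
            s = s + wi * xi
        return s.tanh()

n = Neuron(2)
n.w = [Value(0.5), Value(-0.5)]  # fix the weights so the example is reproducible
out = n([1.0, 2.0])              # tanh(0.5 * 1.0 - 0.5 * 2.0) = tanh(-0.5)
out.backward()                   # gradients now flow back into n.w and n.b
```

A layer is then just a list of neurons, and training is the usual loop: forward pass, loss, `backward()`, then nudge each weight against its gradient.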
Language Models
- A Statistical Character Bigram Language Model - We build the simplest possible language model with statistical "learning" and character bigrams. We also develop an intuition for loss functions and implement one to evaluate the quality of our model.
- N-Grams and Other Experiments - We generalize the statistical bigram language model from the previous post to support arbitrary n-grams. We also implement a simple version of hyperparameter tuning for our statistical models.
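The bigram model's "learning" really is just counting, which a short sketch makes concrete. This assumes a toy corpus and `"."` as the start/end-of-word marker (both illustrative choices, not necessarily the post's):

```python
import math
from collections import defaultdict

words = ["hello", "help", "hill"]

# "Training": count how often each character follows each character.
counts = defaultdict(lambda: defaultdict(int))
for w in words:
    chars = ["."] + list(w) + ["."]
    for a, b in zip(chars, chars[1:]):
        counts[a][b] += 1

def prob(a, b):
    """P(next char = b | current char = a), from the raw counts."""
    return counts[a][b] / sum(counts[a].values())

# Loss: average negative log-likelihood of the corpus under the model.
# Lower is better; 0 would mean every bigram is predicted with certainty.
nll, n = 0.0, 0
for w in words:
    chars = ["."] + list(w) + ["."]
    for a, b in zip(chars, chars[1:]):
        nll += -math.log(prob(a, b))
        n += 1
avg_nll = nll / n
print(f"average NLL: {avg_nll:.4f}")
```

Sampling from the model is the same table read in reverse: start at `"."`, repeatedly draw the next character from `counts[current]` in proportion to its count, and stop when `"."` is drawn again.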
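Generalizing from bigrams to n-grams amounts to replacing the single-character context with a length-(n-1) tuple. A hedged sketch of that idea; the function names and the add-alpha smoothing are illustrative choices, with `alpha` standing in for the kind of knob a hyperparameter-tuning loop would search over:

```python
from collections import defaultdict

def train_ngram(words, n):
    """Count, for each (n-1)-character context, which character follows it."""
    counts = defaultdict(lambda: defaultdict(int))
    for w in words:
        # Pad with n-1 boundary markers so every character has a full context.
        chars = ["."] * (n - 1) + list(w) + ["."]
        for i in range(len(chars) - n + 1):
            ctx = tuple(chars[i : i + n - 1])
            counts[ctx][chars[i + n - 1]] += 1
    return counts

def prob(counts, ctx, ch, alpha=1.0, vocab_size=27):
    """P(ch | ctx) with add-alpha smoothing, so unseen n-grams keep a small
    nonzero probability instead of producing infinite loss."""
    row = counts[ctx]
    return (row[ch] + alpha) / (sum(row.values()) + alpha * vocab_size)

counts = train_ngram(["hello", "help", "hill"], n=3)
p = prob(counts, (".", "h"), "e")  # chance of "e" after a word-initial "h"
```

With `n=2` this reduces exactly to the bigram model (each context is a one-character tuple); larger `n` trades sharper predictions for sparser counts, which is what makes smoothing and tuning matter.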