AI Foundations

Stopping Before It Overfits

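Early stopping is a training technique in which you stop training once the model's performance on held-out validation data starts to get worse. Validation performance degrades because the model begins to memorize the training data instead of learning general patterns. For example, if the validation loss goes from 0.5 to 0.3 to 0.25 and then rises to 0.27, you stop and keep the weights from the epoch where the loss was 0.25, since the rise suggests the model is starting to overfit and will perform worse on unseen data. Because a single small increase can be noise, many setups wait a few epochs without improvement (a “patience” window) before actually stopping.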
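As a minimal sketch, the stopping rule can be written as a scan over per-epoch validation losses. The function name `early_stopping_epoch` and its `patience` parameter are illustrative assumptions rather than any particular library's API, and a real training loop would save a checkpoint at the best epoch instead of just recording its index.

```python
def early_stopping_epoch(val_losses, patience=1):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses.

    Hypothetical helper: training stops once the loss has failed to improve on
    the best value seen so far for `patience` consecutive epochs, and the
    weights from `best_epoch` are the ones to keep.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # Ran out of epochs without triggering the stop condition.
    return best_epoch, len(val_losses) - 1


# The validation losses from the example above, one per epoch.
losses = [0.5, 0.3, 0.25, 0.27]
best, stopped = early_stopping_epoch(losses, patience=1)
print(f"stopped after epoch {stopped}; keep weights from epoch {best} (loss {losses[best]})")
```

With `patience=1` the run stops at the first increase (epoch 3) and keeps epoch 2, where the loss was 0.25; a larger patience would let training continue in case the loss recovers.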

Smarter Weight Updates With Adam

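Adam is an optimization algorithm that improves how model weights are updated during training. It keeps running averages of past gradients and of their squares, and uses them to adapt the step size for each parameter separately, which often makes learning faster and more stable than plain gradient descent. Instead of taking the same size step every time, Adam takes relatively larger steps for parameters whose gradients are small and consistent, and smaller steps for parameters whose gradients are large or noisy, where a big step would risk overshooting.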
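The update rule can be sketched for a single parameter in plain Python. This is a minimal illustration using the commonly cited default hyperparameters (β1 = 0.9, β2 = 0.999, ε = 1e-8); the function name `adam_minimize`, the learning rate 0.1, the step count, and the toy objective f(x) = (x - 3)² are all assumptions chosen for the example.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a 1-D function with Adam, given a function that returns its gradient."""
    x = x0
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: m and v start at zero, so early averages are too small.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Per-parameter step: a large v_hat (big or noisy gradients) shrinks the step.
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(f"x is approximately {x_min:.4f}")
```

In a real network the same arithmetic runs independently for every weight, which is what gives each parameter its own effective step size.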

Forward And Backward Passes

The forward pass is when the model makes a prediction and calculates how far off it is using a loss function. The backward pass then computes how much each weight contributed to that error (the gradient of the loss with respect to each weight), and an optimizer uses those gradients to update the weights. For example, if the model predicts 8 but the correct answer is 10, the update nudges the weights so that the next prediction moves closer to 10.
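Here is a sketch of one forward pass, backward pass, and update for the 8-versus-10 example. It assumes a hypothetical one-weight model `prediction = w * x` with input x = 2 and weight w = 4 (so the prediction is 8), a squared-error loss, and a learning rate of 0.1; none of these specifics come from the text above.

```python
def train_step(w, x, target, lr=0.1):
    # Forward pass: make a prediction and measure the error with squared loss.
    prediction = w * x
    loss = (prediction - target) ** 2
    # Backward pass: gradient of the loss with respect to w, via the chain rule.
    grad_w = 2 * (prediction - target) * x
    # Update step: move w a small amount against the gradient.
    new_w = w - lr * grad_w
    return prediction, loss, grad_w, new_w


# One input x = 2 with weight w = 4 gives the prediction 8; the correct answer is 10.
prediction, loss, grad_w, new_w = train_step(w=4.0, x=2.0, target=10.0)
print("prediction:", prediction, "loss:", loss, "gradient:", grad_w)
print("updated weight:", new_w, "new prediction:", new_w * 2.0)
```

The gradient is negative (-8), so subtracting it raises the weight from 4 to 4.8, and the next prediction becomes 9.6, closer to 10.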

Bias, Variance, And Model Complexity

Bias error occurs when a model is too simple and fails to capture important patterns in the data, while variance error occurs when a model is too complex and starts to memorize noise instead of learning general trends. For instance, fitting a straight line to clearly curved data results in high bias, while fitting a very wavy line to simple data results in high variance.

Binary Classification And Logistic Regression

In binary classification, each example belongs to one of two classes, such as “car” or “no car,” and logistic regression is a common model for this setting. It computes a raw score as a weighted sum of the input features plus a bias, then applies the sigmoid function to convert that score into a probability between 0 and 1, representing how likely it is that the input belongs to the positive class.
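A minimal sketch of logistic regression's prediction step follows. The two input features, their weights, the bias, and the 0.5 decision threshold are hypothetical values picked for illustration; in practice the weights and bias are learned from labeled data.

```python
import math


def predict_proba(features, weights, bias):
    """Logistic regression: sigmoid of a weighted sum of the features plus a bias."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical weights for two made-up image features; not learned from real data.
weights = [2.0, -1.0]
bias = -0.5
p_car = predict_proba([1.0, 0.5], weights, bias)
# A 0.5 threshold is a common (assumed) choice for turning the probability into a label.
label = "car" if p_car >= 0.5 else "no car"
print(f"P(car) = {p_car:.3f} -> {label}")
```

Here the raw score is 2(1.0) - 1(0.5) - 0.5 = 1.0, and the sigmoid turns it into a probability of about 0.73.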

Perceptrons As Building Blocks

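A perceptron is the simplest unit in a neural network: it takes inputs, multiplies each by a weight, adds a bias, and passes the result through an activation function to produce an output (in the classic perceptron, a step function that outputs 1 when the sum exceeds a threshold and 0 otherwise). It acts as a basic building block for deeper networks, where many such units are stacked in layers and connected to learn complex patterns.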
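The sketch below shows a single classic perceptron with a step activation. The weights and bias are hand-picked (not learned) so that the unit computes logical AND, and placing the step threshold at 0 is an assumption of this example.

```python
def perceptron(inputs, weights, bias):
    """Classic perceptron: weighted sum plus bias, then a step function at 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


# Hand-picked weights that make the perceptron compute logical AND.
and_weights = [1.0, 1.0]
and_bias = -1.5
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([a, b], and_weights, and_bias))
```

Only the input (1, 1) pushes the weighted sum above 0 (1 + 1 - 1.5 = 0.5), so it is the only case that outputs 1.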

The Sigmoid Activation Function

The sigmoid function is used to map any real-valued number into a range between 0 and 1, which makes it especially useful for representing probabilities in binary classification problems. Large positive inputs are squashed to values close to 1, and large negative inputs are squashed to values close to 0.
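The sigmoid is σ(x) = 1 / (1 + e^(-x)). A short Python sketch follows; splitting on the sign of x is one standard way (chosen here, not taken from the text) to keep `math.exp` from overflowing on large negative inputs.

```python
import math


def sigmoid(x):
    """Map any real number into the interval (0, 1): 1 / (1 + e^(-x))."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


for x in (-10, -2, 0, 2, 10):
    print(x, round(sigmoid(x), 4))
```

An input of 0 maps to exactly 0.5, and the outputs approach 0 and 1 as the input moves far below or above zero.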

Softmax For Many Classes

Softmax regression is used when there are more than two classes and the model needs to output a probability for each possible class: the softmax function turns the model's raw score for each class into a probability, and the probabilities across all classes sum to 1. For example, given an image, it might output probabilities like cat = 0.7, dog = 0.2, and bird = 0.1, and classify the image as a cat because that class has the highest probability.
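A minimal softmax sketch, assuming the model has already produced one raw score per class. The scores below are hypothetical, chosen as the logarithms of 0.7, 0.2, and 0.1 so that the output reproduces the cat/dog/bird example; any real-valued scores would work.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that are positive and sum to 1."""
    # Subtracting the max score avoids overflow and does not change the result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores chosen so the output matches the example above.
classes = ["cat", "dog", "bird"]
scores = [math.log(0.7), math.log(0.2), math.log(0.1)]
probs = softmax(scores)
prediction = classes[probs.index(max(probs))]
print({c: round(p, 3) for c, p in zip(classes, probs)}, "->", prediction)
```

The predicted class is simply the one with the largest probability, which is the same as the one with the largest raw score.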

Reading A Confusion Matrix

A confusion matrix evaluates a classification model by showing how often its predictions are correct or incorrect, broken down by class. For a binary classifier, it sorts results into four categories (true positives, true negatives, false positives, and false negatives), so you can see not just how often the model is wrong, but what kinds of mistakes it makes.
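For a binary classifier, the four cells can be counted directly from paired labels. This is a small sketch with a hypothetical helper `confusion_counts` and made-up labels (1 = “car”, 0 = “no car”); libraries such as scikit-learn provide a ready-made `confusion_matrix` function for real use.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives/negatives for a binary classifier."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Made-up labels: 1 = "car", 0 = "no car".
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(actual, predicted)
print("           predicted 1  predicted 0")
print(f"actual 1   {counts['TP']:>11}  {counts['FN']:>11}")
print(f"actual 0   {counts['FP']:>11}  {counts['TN']:>11}")
```

Laid out this way, the diagonal holds the correct predictions and the off-diagonal cells show the two kinds of mistakes.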

Accuracy, Precision, And Recall

Accuracy measures the overall performance of a model as the proportion of correct predictions out of all predictions made, and it is often the first metric used to evaluate binary classification models. On its own it can be misleading, especially when one class is much rarer than the other. Precision measures how many of the predicted positive cases are actually positive, TP / (TP + FP), while recall measures how many of the actual positive cases the model correctly identifies, TP / (TP + FN); together they show performance beyond overall accuracy.