Least Squares

Author: Ernad Mujakic
Date: 2025-07-21


The least squares method is an optimization technique that finds the linear function minimizing the sum of squared differences between observed and predicted values. It is widely used in Regression analysis for finding the best-fit function relating a matrix of features (the independent variables) to one or more dependent variables.

[Figure: linearRegression.png]


Method

In a Simple Linear Regression, the model is defined as:

$y = mx + b$

Where:

  • $y$ is the dependent variable.
  • $m$ is the slope of the line.
  • $x$ is the independent variable.
  • $b$ is the y-intercept.

Objective

The objective is to minimize the sum of squared residuals (the vertical distances between observed and predicted values):

$S(m, b) = \sum_{i=1}^{n} \left( y_i - (mx_i + b) \right)^2$

Where:

  • $n$ is the number of data points.
  • $(x_i, y_i)$ are the observed data points.

This is equivalent to solving $\min_{m,\,b} \sum_{i=1}^{n} \left( y_i - mx_i - b \right)^2$.
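For reference, setting the partial derivatives of $S(m, b)$ to zero yields the normal equations; solving them gives the closed-form formulas used in the steps below.

```latex
% Normal equations: partial derivatives of S(m, b) set to zero
\frac{\partial S}{\partial m} = -2\sum_{i=1}^{n} x_i \left( y_i - m x_i - b \right) = 0, \qquad
\frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} \left( y_i - m x_i - b \right) = 0

% Solving the two equations simultaneously for m and b:
m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}, \qquad
b = \frac{\sum y_i - m \sum x_i}{n} = \bar{y} - m \bar{x}
```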

Steps

  • Calculate Slope: Use the following formula to calculate the slope, $m$:
    $m = \dfrac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$
  • Calculate Y-Intercept: Use the slope calculated in the previous step and the following formula to calculate the y-intercept, $b$:
    $b = \dfrac{\sum y_i - m\sum x_i}{n} = \bar{y} - m\bar{x}$
  • Formulate the Function: Construct the line of best fit in the form $y = mx + b$ (a short code sketch of these steps follows this list).
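As an illustration, here is a minimal Python sketch of these three steps using NumPy; the function name fit_simple_ols and the sample data are assumptions made for the example, not part of the method itself.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Fit y = m*x + b by ordinary least squares using the closed-form formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # Slope: m = (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²)
    m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)

    # Y-intercept: b = ȳ - m*x̄
    b = y.mean() - m * x.mean()
    return m, b

# Example usage with made-up data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
m, b = fit_simple_ols(x, y)
print(f"y = {m:.3f}x + {b:.3f}")
```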

Limitations

  • Sensitive to Outliers: The least squares method is sensitive to outliers, which can disproportionately skew the resulting regression line and lead to inaccurate predictions.
  • Assumes Linearity: The least squares method assumes a linear relationship between the dependent and independent variables. If the underlying relationship is non-linear, this can introduce Bias in the predictions.
  • Assumes Homoscedasticity: The least squares method assumes that the variance of errors is a constant (homoscedastic), rather than a function of the independent variable (Heteroscedastic).
  • Multicollinearity: If the independent variables are highly correlated, the least squares method has difficulty determining the effect of each attribute on the dependent variable, making the model unstable.

Types of Least Squares

Ordinary Least Squares (OLS)

The most common form of the least squares method, commonly used in Linear Regression models. It simply minimizes the sum of squared residuals without any weights.

Generalized Least Squares (GLS)

Extends ordinary least squares to handle heteroscedasticity and correlated errors by assigning weights to data points based on the Covariance Matrix of the residuals.

Weighted Least Squares (WLS)

A specific type of generalized least squares, used when the dataset exhibits heteroscedasticity but the residuals are uncorrelated. WLS is more robust than OLS in this setting because it weighs the influence of each data point according to its variance (typically, the inverse of the variance).
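As an illustration, the sketch below solves a weighted least squares problem directly with NumPy; the weights are assumed to be the inverse of each observation's error variance, and the function name fit_wls and the data are hypothetical.

```python
import numpy as np

def fit_wls(x, y, variances):
    """Weighted least squares: solve (Xᵀ W X) β = Xᵀ W y with W = diag(1/variance)."""
    X = np.column_stack([np.ones(len(x)), x])   # add an intercept column
    W = np.diag(1.0 / np.asarray(variances))    # weight each point by its inverse variance
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta                                 # [intercept, slope]

# Hypothetical data whose noise variance grows with x (heteroscedasticity)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.4, 7.5, 11.0])
variances = x**2                                # assumed error variances
print(fit_wls(x, y, variances))
```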


Additional Techniques

Ridge Regression

Also known as L2 regularization, ridge regression is a type of linear regression that introduces a penalty term into the OLS cost function. Ridge regression adds the sum of squared coefficients as a penalty, thereby punishing large coefficient values. This helps prevent overfitting and addresses Multicollinearity among the independent variables.
The cost function for ridge regression is defined as:

$J(\beta) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$

Where:

  • $y_i$ is the actual output.
  • $\hat{y}_i$ is the predicted output.
  • $\beta_j$ are the coefficients.
  • $\lambda$ is the regularization parameter, which determines the magnitude of the penalty term.

By adjusting the ridge parameter, $\lambda$, ridge regression allows for control over the bias-variance tradeoff. As $\lambda$ increases, the bias of the model increases and the variance decreases. Optimizing this parameter can achieve the ideal balance between an overfitting and an underfitting model.
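A minimal NumPy sketch of ridge regression using its closed-form solution is shown below; it assumes the features and target are centered so that no intercept term is needed, and the data and function name fit_ridge are made up for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Ridge regression closed form: β = (XᵀX + λI)⁻¹ Xᵀy (no intercept; data assumed centered)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Made-up data with two nearly collinear features
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.01 * rng.normal(size=50)      # almost identical to x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=50)

print(fit_ridge(X, y, lam=0.0))   # λ = 0 reduces to OLS; near-collinear features make this unstable
print(fit_ridge(X, y, lam=1.0))   # a small penalty shrinks and stabilizes the coefficients
```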

Lasso Regression

Also known as L1 regularization, lasso regression is a type of linear regression which, like ridge regression, introduces a penalty term into the OLS cost function. Lasso regression adds the sum of absolute coefficient values to the cost function, which allows the coefficients of irrelevant variables to be shrunk all the way to 0, thereby simplifying and regularizing the model.
The cost function for lasso regression is defined as:

$J(\beta) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \left| \beta_j \right|$

Where:

  • $y_i$ is the actual output.
  • $\hat{y}_i$ is the predicted output.
  • $\beta_j$ are the coefficients.
  • $\lambda$ is the regularization parameter, which determines the magnitude of the penalty term.

Unlike ridge regression, which shrinks coefficients but typically keeps all predictors in the model, lasso regression can set the coefficients of irrelevant variables exactly to zero. This makes it an effective technique for Feature Selection, that is, identifying and retaining only the most important variables.
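As a brief illustration of that sparsity, the sketch below uses scikit-learn's Lasso on made-up data in which only the first feature actually influences the target; the specific alpha value and the data are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Made-up data: only the first of five features drives the target
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 4.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1)          # alpha plays the role of λ
lasso.fit(X, y)
print(lasso.coef_)                # coefficients of the irrelevant features are driven to zero
```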


References