Least Squares

Author: Ernad Mujakic
Date: 2025-07-21


The least squares method is an optimization technique that finds the linear function minimizing the sum of squared differences between observed and predicted values. It is widely used in Regression analysis for finding the best-fit function relating a matrix of features (the independent variables) to one or more dependent variables.

[Figure: linearRegression.png]


Method

In a Simple Linear Regression, the model is defined as:

$y = mx + b$

Where:

  • $y$ is the dependent variable.
  • $m$ is the slope of the line.
  • $x$ is the independent variable.
  • $b$ is the y-intercept.

Objective

The objective is to minimize the sum of squared residuals (the vertical distances between observed and predicted values):

$S(m, b) = \sum_{i=1}^{n} \left( y_i - (mx_i + b) \right)^2$

Where:

  • $n$ is the number of data points.
  • $(x_i, y_i)$ are the observed data points.

This is equivalent to solving $\min_{m,\,b} \sum_{i=1}^{n} \left( y_i - mx_i - b \right)^2$.
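For reference, setting the partial derivatives of $S(m, b)$ to zero yields the normal equations; solving them gives the closed-form formulas used in the steps below.

```latex
% Normal equations: partial derivatives of S(m, b) set to zero
\frac{\partial S}{\partial m} = -2\sum_{i=1}^{n} x_i \left( y_i - m x_i - b \right) = 0, \qquad
\frac{\partial S}{\partial b} = -2\sum_{i=1}^{n} \left( y_i - m x_i - b \right) = 0

% Solving the two equations simultaneously for m and b:
m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}, \qquad
b = \frac{\sum y_i - m \sum x_i}{n} = \bar{y} - m \bar{x}
```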

Steps

  • Calculate Slope: Use the following formula to calculate the slope, $m$:
    $m = \dfrac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$
  • Calculate Y-Intercept: Use the slope calculated in the previous step and the following formula to calculate the y-intercept, $b$:
    $b = \dfrac{\sum y_i - m\sum x_i}{n} = \bar{y} - m\bar{x}$
  • Formulate the Function: Construct the line of best fit in the form $y = mx + b$ (a short code sketch of these steps follows this list).
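As an illustration, here is a minimal Python sketch of these three steps using NumPy; the function name fit_simple_ols and the sample data are assumptions made for the example, not part of the method itself.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Fit y = m*x + b by ordinary least squares using the closed-form formulas."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    # Slope: m = (n*Σxy - Σx*Σy) / (n*Σx² - (Σx)²)
    m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)

    # Y-intercept: b = ȳ - m*x̄
    b = y.mean() - m * x.mean()
    return m, b

# Example usage with made-up data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
m, b = fit_simple_ols(x, y)
print(f"y = {m:.3f}x + {b:.3f}")
```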

Limitations

  • Sensitive to Outliers: The least squares method is sensitive to outliers, which can disproportionately skew the resulting regression line and lead to inaccurate predictions.
  • Assumes Linearity: The least squares method assumes a linear relationship between the dependent and independent variables. If the underlying relationship is non-linear, this can introduce Bias in the predictions.
  • Assumes Homoscedasticity: The least squares method assumes that the variance of errors is a constant (homoscedastic), rather than a function of the independent variable (Heteroscedastic).
  • Multicollinearity: If the independent variables are highly correlated, the least squares method has difficulty determining the effect of each attribute on the dependent variable, making the model unstable.

Types of Least Squares

Ordinary Least Squares (OLS)

The most common form of the least squares method, commonly used in Linear Regression models. It simply minimizes the sum of squared residuals without any weights.

Generalized Least Squares (GLS)

Extends ordinary least squares to handle heteroscedasticity and correlated errors by assigning weights to data points based on the Covariance Matrix of the residuals.

Weighted Least Squares (WLS)

A specific type of generalized least squares, used when the dataset exhibits heteroscedasticity but the residuals are uncorrelated. WLS is more robust than OLS in this setting because it weighs the influence of each data point according to its variance (typically, the inverse of the variance).
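As an illustration, the sketch below solves a weighted least squares problem directly with NumPy; the weights are assumed to be the inverse of each observation's error variance, and the function name fit_wls and the data are hypothetical.

```python
import numpy as np

def fit_wls(x, y, variances):
    """Weighted least squares: solve (Xᵀ W X) β = Xᵀ W y with W = diag(1/variance)."""
    X = np.column_stack([np.ones(len(x)), x])   # add an intercept column
    W = np.diag(1.0 / np.asarray(variances))    # weight each point by its inverse variance
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta                                 # [intercept, slope]

# Hypothetical data whose noise variance grows with x (heteroscedasticity)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.4, 7.5, 11.0])
variances = x**2                                # assumed error variances
print(fit_wls(x, y, variances))
```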


Additional Techniques

Ridge Regression

Also known as L2 regularization, ridge regression is a type of linear regression that introduces a penalty term into the OLS cost function. Ridge regression adds the sum of squared coefficients as a penalty, thereby punishing large coefficient values. This helps prevent overfitting and addresses Multicollinearity among the independent variables.
The cost function for ridge regression is defined as:

$J(\beta) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$

Where:

  • $y_i$ is the actual output.
  • $\hat{y}_i$ is the predicted output.
  • $\beta_j$ are the coefficients.
  • $\lambda$ is the regularization parameter, which determines the magnitude of the penalty term.

By adjusting the ridge parameter, $\lambda$, ridge regression allows for control over the bias-variance tradeoff. As $\lambda$ increases, the bias of the model increases and the variance decreases. Optimizing this parameter can achieve the ideal balance between an overfitting and an underfitting model.
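A minimal NumPy sketch of ridge regression using its closed-form solution is shown below; it assumes the features and target are centered so that no intercept term is needed, and the data and function name fit_ridge are made up for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Ridge regression closed form: β = (XᵀX + λI)⁻¹ Xᵀy (no intercept; data assumed centered)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Made-up data with two nearly collinear features
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + 0.01 * rng.normal(size=50)      # almost identical to x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=50)

print(fit_ridge(X, y, lam=0.0))   # λ = 0 reduces to OLS; near-collinear features make this unstable
print(fit_ridge(X, y, lam=1.0))   # a small penalty shrinks and stabilizes the coefficients
```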

Lasso Regression

Also known as L1 regularization, lasso regression is a type of linear regression which, like ridge regression, introduces a penalty term into the OLS cost function. Lasso regression adds the sum of absolute coefficient values to the cost function, which allows the coefficients of irrelevant variables to be shrunk all the way to 0, thereby simplifying and regularizing the model.
The cost function for lasso regression is defined as:

$J(\beta) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \left| \beta_j \right|$

Where:

  • $y_i$ is the actual output.
  • $\hat{y}_i$ is the predicted output.
  • $\beta_j$ are the coefficients.
  • $\lambda$ is the regularization parameter, which determines the magnitude of the penalty term.

Unlike ridge regression, which shrinks coefficients but typically keeps all predictors in the model, lasso regression can set the coefficients of irrelevant variables exactly to zero. This makes it an effective technique for Feature Selection, that is, identifying and retaining only the most important variables.
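As a brief illustration of that sparsity, the sketch below uses scikit-learn's Lasso on made-up data in which only the first feature actually influences the target; the specific alpha value and the data are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Made-up data: only the first of five features drives the target
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 4.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1)          # alpha plays the role of λ
lasso.fit(X, y)
print(lasso.coef_)                # coefficients of the irrelevant features are driven to zero
```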


References