Least Squares
Author: Ernad Mujakic
Date: 2025-07-21
The least squares method is an optimization technique that finds a linear function minimizing the sum of squared differences between observed and predicted values. It is widely used in Regression analysis to find the best-fit function relating a matrix of features (the independent variables) to one or more dependent variables.
Method
In Simple Linear Regression, the model is defined as:

$$y = mx + b$$

Where:

- $y$ is the dependent variable.
- $m$ is the slope of the line.
- $x$ is the independent variable.
- $b$ is the y-intercept.
Objective
The objective is to minimize the sum of squared residuals, the vertical distances between the observed and predicted values:

$$S(m, b) = \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right)^2$$

Where:

- $n$ is the number of data points.
- $(x_i, y_i)$ are the observed data points.

This is equivalent to solving $\min_{m,\, b} \sum_{i=1}^{n} \left( y_i - m x_i - b \right)^2$.
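As a minimal sketch of this objective (Python with NumPy, using made-up data points), the sum of squared residuals can be written as a function of a candidate slope and intercept; least squares chooses the pair that makes this value smallest:

```python
import numpy as np

# Hypothetical observed data points (x_i, y_i).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.4, 10.1])

def sum_squared_residuals(m, b, x, y):
    """Objective S(m, b) = sum_i (y_i - (m*x_i + b))^2."""
    residuals = y - (m * x + b)
    return np.sum(residuals ** 2)

# The least squares estimate is the (m, b) pair that minimizes this value.
print(sum_squared_residuals(2.0, 0.1, x, y))
```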
Steps
- Calculate Slope: Use the following formula to calculate the slope, $m$:
$$m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
- Calculate Y-Intercept: Use the slope $m$ calculated in the previous step and the following formula to calculate the y-intercept, $b$:
$$b = \bar{y} - m\bar{x}$$
- Formulate the Function: Construct the line of best fit in the form $y = mx + b$ (see the sketch after this list).
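The steps above can be applied directly in code. This sketch uses the same made-up data as before and cross-checks the closed-form result against NumPy's own degree-1 least squares fit:

```python
import numpy as np

# Hypothetical data (same made-up points as above).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.4, 10.1])

# Step 1: slope m = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
x_bar, y_bar = x.mean(), y.mean()
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# Step 2: y-intercept b = y_bar - m * x_bar
b = y_bar - m * x_bar

# Step 3: the line of best fit is y = m*x + b
print(f"y = {m:.3f}x + {b:.3f}")

# Cross-check against NumPy's built-in least squares polynomial fit.
assert np.allclose([m, b], np.polyfit(x, y, deg=1))
```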
Limitations
- Sensitive to Outliers: The least squares method is sensitive to outliers, which can unfairly skew the resulting regression line and lead to inaccurate predictions (see the sketch after this list).
- Assumes Linearity: The least squares method assumes a linear relationship between the dependent and independent variables. If the underlying relationship is non-linear, this can introduce Bias in the predictions.
- Assumes Homoscedasticity: The least squares method assumes that the variance of errors is a constant (homoscedastic), rather than a function of the independent variable (Heteroscedastic).
- Multicollinearity: If the independent variables are highly correlated, the least squares method has difficulty determining the effect of each attribute on the dependent variable, making the model unstable.
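The outlier sensitivity noted above is easy to see numerically. This sketch (hypothetical data) fits the same points twice, once after replacing a single observation with an extreme value:

```python
import numpy as np

# Hypothetical data lying close to y = 2x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 10.0, 12.1])

m_clean, b_clean = np.polyfit(x, y, deg=1)

# Corrupt a single observation with a large outlier.
y_outlier = y.copy()
y_outlier[-1] = 40.0

m_out, b_out = np.polyfit(x, y_outlier, deg=1)

# The squared-error criterion lets one extreme point pull the slope noticeably.
print(f"clean fit:    y = {m_clean:.2f}x + {b_clean:.2f}")
print(f"with outlier: y = {m_out:.2f}x + {b_out:.2f}")
```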
Types of Least Squares
Ordinary Least Squares (OLS)
The most common form of the least squares method, used in standard Linear Regression models. It simply minimizes the sum of squared residuals without any weights.
Generalized Least Squares (GLS)
Extends ordinary least squares to handle heteroscedasticity by assigning weights to observations based on the Covariance Matrix of the residuals.
Weighted Least Squares (WLS)
A specific case of generalized least squares, used when the dataset exhibits heteroscedasticity. WLS is more robust than OLS in this setting because it weights the influence of each data point according to its variance.
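A minimal sketch of weighted least squares with NumPy, using synthetic heteroscedastic data where the noise grows with $x$; the weights are assumed to be the inverse of each point's variance:

```python
import numpy as np

# Synthetic heteroscedastic data: noise standard deviation grows with x.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 50)
sigma = 0.5 * x                      # per-point noise standard deviation
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma)

# Design matrix [x, 1] and weights proportional to 1 / variance.
X = np.column_stack([x, np.ones_like(x)])
w = 1.0 / sigma ** 2

# Weighted least squares via the weighted normal equations:
# beta = (X^T W X)^{-1} X^T W y, with W = diag(w).
XtW = X.T * w
beta_wls = np.linalg.solve(XtW @ X, XtW @ y)

# Ordinary least squares for comparison (all weights equal).
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print("WLS slope, intercept:", beta_wls)
print("OLS slope, intercept:", beta_ols)
```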
Additional Techniques
Ridge Regression
Also known as L2 regularization, ridge regression is a type of linear regression that introduces a penalty term to the OLS cost function. Ridge regression adds the sum of the squared coefficients as a penalty, thereby punishing large coefficient values. This helps prevent overfitting and addresses Multicollinearity among the independent variables.
The cost function for ridge regression is defined as:

$$J = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

Where:

- $y_i$ is the actual output.
- $\hat{y}_i$ is the predicted output.
- $\beta_j$ are the coefficients.
- $\lambda$ is the regularization parameter, which determines the magnitude of the penalty term.
By adjusting the ridge parameter, $\lambda$, ridge regression allows for control over the bias-variance tradeoff: as $\lambda$ increases, the coefficients are shrunk more strongly toward zero, increasing bias but reducing variance; as $\lambda$ approaches zero, the solution approaches the ordinary least squares fit.
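A short sketch of this shrinkage effect (assuming scikit-learn is available) on a synthetic dataset with two nearly collinear features; the `alpha` parameter plays the role of $\lambda$ above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data with two highly correlated features (multicollinearity).
rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly identical to x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

# OLS coefficients can become large and unstable when features are collinear;
# the ridge penalty lambda * sum(beta_j^2) shrinks them toward zero.
ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)        # alpha plays the role of lambda

print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```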
Lasso Regression
Also known as L1 regularization, lasso regression is a type of linear regression which, like ridge regression, introduces a penalty term to the OLS cost function. Lasso regression adds the absolute values of the coefficients to the cost function, which allows the coefficients of irrelevant variables to be shrunk to exactly 0, thereby simplifying and regularizing the model.
The cost function for lasso regression is defined as:

$$J = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

Where:

- $y_i$ is the actual output.
- $\hat{y}_i$ is the predicted output.
- $\beta_j$ are the coefficients.
- $\lambda$ is the regularization parameter, which determines the magnitude of the penalty term.
Unlike ridge regression, which shrinks coefficients but typically keeps all predictors in the model, lasso regression can set the coefficients of irrelevant variables exactly to zero. This makes it an effective technique for Feature Selection, that is, identifying and retaining only the most important variables.
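The feature-selection behavior can be seen in a small sketch (assuming scikit-learn is available) on synthetic data where only two of five features actually influence the output; `alpha` again plays the role of $\lambda$:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: five features, but only the first two matter.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))
y = 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# The L1 penalty lambda * sum(|beta_j|) drives irrelevant coefficients to zero.
lasso = Lasso(alpha=0.1).fit(X, y)         # alpha plays the role of lambda

print("Lasso coefficients:", np.round(lasso.coef_, 3))
# Expected pattern: clearly nonzero weights for the first two features,
# coefficients at (or very near) zero for the remaining three.
```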
References
- Prabhu Raghav, “Linear Regression Simplified - Ordinary Least Square vs Gradient Descent,” Medium, May 15, 2018. https://medium.com/data-science/linear-regression-simplified-ordinary-least-square-vs-gradient-descent-48145de2cf76 (accessed Jul. 21, 2025).
- GeeksforGeeks, “Least Square Method | Definition Graph and Formula,” GeeksforGeeks, Jul. 06, 2023. https://www.geeksforgeeks.org/maths/least-square-method/
- “Least squares,” Wikipedia, Dec. 19, 2019. https://en.wikipedia.org/wiki/Least_squares
- A. Menon, “Linear Regression Using Least Squares - TDS Archive - Medium,” Medium, Sep. 08, 2018. https://medium.com/data-science/linear-regression-using-least-squares-a4c3456e8570 (accessed Jul. 21, 2025).
- The Organic Chemistry Tutor, “Linear Regression Using Least Squares Method - Line of Best Fit Equation,” YouTube, Jul. 13, 2020. [Online video]. Available: https://www.youtube.com/watch?v=P8hT5nDai6A
- GeeksforGeeks, “Ridge Regression,” GeeksforGeeks, Jun. 11, 2024. https://www.geeksforgeeks.org/machine-learning/what-is-ridge-regression/ (accessed Jul. 23, 2025).
- GeeksforGeeks, “What is Lasso Regression?,” GeeksforGeeks, May 15, 2024. https://www.geeksforgeeks.org/machine-learning/what-is-lasso-regression/