Euclidean Distance
Author: Ernad Mujakic
Date: 2025-07-17
The Euclidean distance is the most popular Distance Measure for quantifying the dissimilarity between Numerical Data. It represents the straight-line distance between two points in Euclidean space and is calculated using the Pythagorean Theorem.
Formula
Let $p = (p_1, p_2, \dots, p_n)$ and $q = (q_1, q_2, \dots, q_n)$ be two points in $n$-dimensional space. The Euclidean distance between them is:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$

This formula computes the square root of the sum of the squared differences of each corresponding attribute, providing a measure of the straight-line distance in the multidimensional space.
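The formula translates directly into code; a minimal sketch using only the standard library:

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two equal-length points."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# A 3-4-5 right triangle: the distance between (0, 0) and (3, 4) is 5.
print(euclidean_distance((0, 0), (3, 4)))  # → 5.0
```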
Properties
- Positive: Like any other distance metric, the range of the Euclidean distance is $[0, \infty)$, where a distance of 0 indicates that the two points are at the same location.
- Symmetric: The Euclidean distance is symmetric, meaning $d(p, q) = d(q, p)$.
- Triangle Inequality: The Euclidean distance obeys the triangle inequality, which states that the distance from $p$ to $r$ is always less than or equal to the distance from $p$ to $q$ plus the distance from $q$ to $r$: $d(p, r) \le d(p, q) + d(q, r)$. Taking a detour through a third point cannot result in a shorter distance than the direct path from $p$ to $r$.
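All three properties can be checked numerically; a small sketch with three arbitrary points (the helper mirrors the formula above):

```python
import math

def d(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

p, q, r = (0.0, 0.0), (3.0, 4.0), (6.0, 1.0)

assert d(p, p) == 0.0                 # zero distance at the same location
assert d(p, q) >= 0.0                 # non-negative
assert d(p, q) == d(q, p)             # symmetric
assert d(p, r) <= d(p, q) + d(q, r)   # triangle inequality holds
```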
Squared Euclidean Distance
The squared Euclidean distance is computed as only the sum of squared differences:

$$d^2(p, q) = \sum_{i=1}^{n} (p_i - q_i)^2$$
The squared Euclidean distance amplifies greater distances more than the standard Euclidean distance, and is faster and easier to compute since it skips the square root. This makes it more desirable in problems where large distances should be penalized more harshly, such as in Clustering or Outlier Detection. However, the squared Euclidean distance does not obey the triangle inequality.
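A short sketch illustrating both points: the squared distance needs no square root, and three collinear points show the triangle inequality failing for it:

```python
def sq_euclidean(p, q):
    # No square root: cheaper, and it amplifies large differences.
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Collinear points: the "detour" through q looks shorter than the direct path,
# so the triangle inequality does not hold for the squared distance.
p, q, r = (0.0,), (1.0,), (2.0,)
print(sq_euclidean(p, r))                       # → 4.0
print(sq_euclidean(p, q) + sq_euclidean(q, r))  # → 2.0
```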
Squared Euclidean distance is a Convex Function, which makes it more desirable in optimization theory since it permits the use of Convex Analysis.
Least squares is an optimization technique that attempts to find the function which minimizes the sum of the square Euclidean distances between the observed and predicted values. This method is widely used in Machine Learning, particularly in Regression analysis.
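For the one-dimensional linear case, minimizing the sum of squared residuals has a closed-form solution; a minimal sketch:

```python
def least_squares_line(xs, ys):
    """Fit y = m*x + b by minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - m * mean_x
    return m, b

# Points lying exactly on y = 2x + 1 are recovered perfectly.
m, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)  # → 2.0 1.0
```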
Divergence is a kind of distance measure that applies to probability distributions. The squared Euclidean distance is the simplest divergence measure.
Other common divergence measures include Kullback–Leibler Divergence and Jensen-Shannon Divergence.
Applications
Machine Learning
- Clustering: Euclidean distance is used in clustering algorithms such as K-Means Clustering to measure distance between data points.
- Classification: Classification algorithms like K-Nearest-Neighbors may utilize Euclidean distance to classify data points based on the label of their nearest neighbors.
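As a sketch of the classification use case, a 1-nearest-neighbour rule labels a query point with the label of its closest training example under the Euclidean distance (the training data here is illustrative):

```python
import math

def nearest_label(point, data):
    """1-nearest-neighbour: label a point by its closest training example."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return min(data, key=lambda item: dist(point, item[0]))[1]

train = [((0.0, 0.0), "A"), ((0.5, 0.5), "A"), ((5.0, 5.0), "B")]
print(nearest_label((0.2, 0.1), train))  # → A
```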
Statistics
- Outlier Detection: The Euclidean or squared Euclidean distance can be used to identify data points which deviate significantly from the rest of the dataset.
- Multivariate Analysis: Euclidean distance can be used in techniques like Principal Component Analysis or Multidimensional Scaling to measure the distance between points with multiple dimensions.
- Least Squares Method: The Euclidean distance is a key step for the method of least squares, which is commonly used to optimize regression problems.
- Divergence: Squared Euclidean distance is a simple measure of divergence, allowing you to compare probability distributions.
- Optimization: Squared Euclidean distance is preferred in optimization theory due to its smoothness and convexity, permitting the use of convex analysis.
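A simple form of the outlier-detection idea above: flag points whose squared distance from the dataset's centroid exceeds a threshold (the data and cutoff are illustrative):

```python
def outliers(points, threshold):
    """Flag points whose squared distance from the centroid exceeds threshold."""
    n = len(points)
    dim = len(points[0])
    centroid = [sum(p[i] for p in points) / n for i in range(dim)]
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, centroid))
    return [p for p in points if sq_dist(p) > threshold]

data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (10.0, 10.0)]
print(outliers(data, 50.0))  # → [(10.0, 10.0)]
```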
Robotics
- Path Planning: Euclidean distance can be used for calculating the shortest path between points. It is also an Admissible and Consistent Heuristic in search algorithms such as A* Search.
- Localization: In Simultaneous Localization and Mapping problems, the Euclidean distance can be used to determine a robot's position relative to known landmarks.
- Image Recognition: Euclidean distance is used to compare feature vectors in images, aiding in object recognition.
- Image Segmentation: Euclidean distance is used to measure similarities between pixel values in clustering-based image segmentation.
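A minimal sketch of the localization idea: given an estimated position and a map of known landmarks (both hypothetical here), pick the landmark closest in Euclidean distance:

```python
import math

def nearest_landmark(position, landmarks):
    """Pick the landmark closest to an estimated robot position."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    return min(landmarks, key=lambda name: dist(position, landmarks[name]))

landmarks = {"door": (0.0, 0.0), "charger": (4.0, 3.0), "window": (9.0, 1.0)}
print(nearest_landmark((3.5, 2.5), landmarks))  # → charger
```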
Other Distance Measures
Other common distance measures include:
- Chebyshev Distance: The maximum absolute difference between 2 vectors across all dimensions.
- Minkowski Distance: A generalized distance measure defined by an order parameter $p$; setting $p = 1$ gives the Manhattan distance, $p = 2$ the Euclidean distance, and $p \to \infty$ the Chebyshev distance.
- Manhattan Distance: The shortest distance between 2 vectors using only 90° movements.
- Jaccard Index: Used to compare sets and is defined as the size of the intersection of 2 sets, over the size of their union.
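The Minkowski distance makes the family relationship concrete; a small sketch (the order parameter is written `r` here to avoid clashing with the point `p`):

```python
def minkowski(p, q, r):
    """Minkowski distance of order r; r=1 is Manhattan, r=2 is Euclidean."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)

p, q = (0.0, 0.0), (3.0, 4.0)
print(minkowski(p, q, 1))  # → 7.0  (Manhattan)
print(minkowski(p, q, 2))  # → 5.0  (Euclidean)
print(max(abs(a - b) for a, b in zip(p, q)))  # → 4.0  (Chebyshev, the r→∞ limit)
```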
References
- J. Han and M. Kamber, Data Mining: Concepts and Techniques, 3rd ed. Amsterdam; Boston: Elsevier/Morgan Kaufmann, 2012.
- GeeksforGeeks, “Euclidean Distance,” GeeksforGeeks, Mar. 13, 2024. https://www.geeksforgeeks.org/maths/euclidean-distance/
- Wikipedia Contributors, “Euclidean distance,” Wikipedia, Apr. 01, 2019. https://en.wikipedia.org/wiki/Euclidean_distance
- “Least squares,” Wikipedia, Dec. 19, 2019. https://en.wikipedia.org/wiki/Least_squares
- Maarten Grootendorst, “9 Distance Measures in Data Science | TDS Archive,” Medium, Feb. 2021. https://medium.com/data-science/9-distance-measures-in-data-science-918109d069fa