Median
Author: Ernad Mujakic
Date: 2025-07-07
The Median is a Measure of Central Tendency that represents the middle value of an ordered dataset. It is particularly useful in scenarios where the data is skewed or contains outliers, as it provides a more accurate representation of the center of the dataset compared to other measures like the Mean. The median can only be applied to Numerical Data and not Categorical Data.
Definition
The median is defined as the 50th Percentile or the second Quartile (Q2), which divides the dataset into two equal halves. This means that half of the data points are below the median and half are above it.
Properties
- Uniqueness: In a finite dataset, the median is unique, meaning there is only one median.
- Robustness: The median is robust to outliers, making it a reliable measure in skewed or noisy datasets.
- Non-Parametric: The median does not assume a particular distribution of the underlying data.
- Invariance: The median remains unchanged under linear transformations of the dataset.
Formula
If
While if
Multivariate Median
The multivariate median extends the concept of the median to multiple dimensions. One of the most common multivariate median is the Geometric Median which focuses on minimizing the Euclidean distance of a set of points in a Euclidean space. The geometric median is defined as:
Where:
denotes the Euclidean distance. represents the -th data point in -dimensional space.
Applications
- Machine Learning: Multivariate medians are commonly employed in machine learning, particularly in clustering algorithms when dealing with centroid initialization, such as in DBSCAN.
- Computer Vision: The geometric median can be used to find a central point among pixel locations, helping in tasks such as object tracking.
- Robust Statistics: Multivariate medians are utilized as a measure of central tendency over other measures such as the mean, due to median's robustness to outliers.
Other common Measures of Central Tendency include the Mean, which represents the average value of a population, the Mode, which identifies the most frequently occurring value in a set, and the Midrange, calculated as the average of the maximum and minimum values.
References
- J. Han, M. Kamber, and J. Pei, Data Mining : Concepts and Techniques. Burlington, Ma: Elsevier, 2012.
- Wikipedia, “Median,” Wikipedia, Apr. 17, 2020. https://en.wikipedia.org/wiki/Median