Standard Deviation
Author: Ernad Mujakic
Date: 2025-07-10
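Standard deviation is a Measure of Dispersion that summarizes the amount of variation in a dataset. Informally, it represents the typical distance between each data point and the Mean. More precisely, it is the square root of the average squared deviation from the mean, rather than a simple average of distances.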
Interpretation
Properties
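- Non-Negative: The standard deviation can never be negative, since it is the square root of an average of squared deviations, none of which can be negative. It equals zero only when every value in the dataset is identical.
- Sensitive to Outliers: Because deviations are squared before they are averaged, extreme outliers have a disproportionately large impact on the standard deviation.
- Same Units: The standard deviation is expressed in the same units as the underlying dataset, unlike the variance, which is expressed in squared units.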
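As a small illustration of these properties, the Python sketch below applies the standard library's `statistics.pstdev` to a made-up set of heights in centimetres (the data are hypothetical, chosen only for this example). A constant dataset yields a standard deviation of zero, adding a single extreme value inflates the result sharply, and the output stays in centimetres, the same unit as the data.

```python
import statistics

heights_cm = [170.0, 172.0, 168.0, 171.0, 169.0]  # hypothetical measurements in cm
with_outlier = heights_cm + [230.0]               # the same data plus one extreme value

sd_clean = statistics.pstdev(heights_cm)
sd_outlier = statistics.pstdev(with_outlier)

# Non-negative: never below zero, and exactly zero when all values are identical.
assert sd_clean >= 0
assert statistics.pstdev([5.0, 5.0, 5.0]) == 0.0

# Sensitive to outliers: one extreme value dominates the squared deviations.
# Same units: both results are in centimetres, like the input data.
print(f"without outlier: {sd_clean:.2f} cm")
print(f"with outlier:    {sd_outlier:.2f} cm")
```

With these particular numbers, the standard deviation rises from about 1.41 cm to about 22.40 cm once the single outlier is added.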
Calculation
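The standard deviation of a numeric attribute is calculated slightly differently depending on whether the data represent an entire population or a sample drawn from that population.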
Population Standard Deviation
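For a population, the standard deviation is calculated using:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$
Where:
- $x_i$ represents the $i$-th observation.
- $\mu$ represents the mean of the population.
- $N$ represents the number of observations.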
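The population formula translates directly into code. The following is a minimal Python sketch, assuming the data fit in memory as a sequence of numbers; the function name `population_std` is hypothetical. In practice, the standard library's `statistics.pstdev` computes the same quantity and is the more robust choice.

```python
import math
from typing import Sequence


def population_std(data: Sequence[float]) -> float:
    """Population standard deviation: square root of the mean squared deviation from mu."""
    n = len(data)
    if n == 0:
        raise ValueError("population standard deviation requires at least one value")
    mu = sum(data) / n                              # population mean
    squared_deviations = [(x - mu) ** 2 for x in data]
    return math.sqrt(sum(squared_deviations) / n)   # divide by N, the population size


if __name__ == "__main__":
    values = [2, 4, 4, 4, 5, 5, 7, 9]
    print(population_std(values))  # 2.0
```

For these values the mean is 5, the squared deviations sum to 32, and 32 / 8 = 4, so the population standard deviation is exactly 2.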
Sample Standard Deviation
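For a sample, the standard deviation is calculated using:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$
Where:
- $x_i$ represents the $i$-th observation.
- $\bar{x}$ represents the mean of the sample.
- $n$ represents the number of observations in the sample.

Dividing by $n - 1$ instead of $n$ (Bessel's correction) compensates for measuring deviations from the sample mean rather than the unknown population mean, which would otherwise make the sample variance systematically too small.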
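The sample version differs from the population version only in its denominator. This Python sketch (the name `sample_std` is hypothetical) divides by `n - 1` and prints the standard library's `statistics.stdev` alongside it, since that function uses the same convention.

```python
import math
import statistics
from typing import Sequence


def sample_std(data: Sequence[float]) -> float:
    """Sample standard deviation with Bessel's correction (divide by n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("sample standard deviation requires at least two values")
    x_bar = sum(data) / n                        # sample mean
    ss = sum((x - x_bar) ** 2 for x in data)     # sum of squared deviations
    return math.sqrt(ss / (n - 1))


if __name__ == "__main__":
    values = [2, 4, 4, 4, 5, 5, 7, 9]
    print(sample_std(values))        # sqrt(32 / 7), approximately 2.138
    print(statistics.stdev(values))  # standard-library equivalent
```

For the eight example values, the squared deviations from the mean of 5 sum to 32, so the sample standard deviation is sqrt(32 / 7), roughly 2.138, whereas dividing by n = 8 would give exactly 2.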
Relation to Variance
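The standard deviation is equal to the square root of the Variance of the same dataset. Variance measures the average squared difference from the mean (dividing by the number of observations for a population, or by one less than that for a sample). Because the variance is expressed in squared units, taking its square root returns the measure of spread to the original units of the data.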
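This relationship can be checked numerically with Python's `statistics` module, which provides `pvariance`/`pstdev` for populations and `variance`/`stdev` for samples. The dataset below is arbitrary and used only for illustration.

```python
import math
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # arbitrary example values

pop_var = statistics.pvariance(data)   # population variance, in squared units
samp_var = statistics.variance(data)   # sample variance (n - 1 denominator)

# The square root of each variance is the matching standard deviation.
assert math.isclose(math.sqrt(pop_var), statistics.pstdev(data))
assert math.isclose(math.sqrt(samp_var), statistics.stdev(data))

print(f"population: variance={pop_var:.3f}, std={math.sqrt(pop_var):.3f}")
print(f"sample:     variance={samp_var:.3f}, std={math.sqrt(samp_var):.3f}")
```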
Other Common Measures of Dispersion
Other common measures of dispersion include:
- Interquartile Range (IQR): the distance spanned by the middle 50% of the dataset, i.e., the difference between the third and first quartiles.
- Range: the difference between the maximum and minimum values in a dataset.
- Quartiles: the three values that divide a sorted dataset into four parts, each containing roughly 25% of the observations.
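For comparison, the sketch below computes these measures alongside the sample standard deviation using Python's `statistics` module on an arbitrary example dataset. Quartile definitions vary between textbooks and software; passing `method="inclusive"` to `statistics.quantiles` is an assumption here (its default is `"exclusive"`), and other conventions can give slightly different first and third quartiles.

```python
import statistics

data = [1.0, 3.0, 4.0, 6.0, 7.0, 8.0, 10.0, 12.0]  # arbitrary example values

# Range: spread between the two extreme values.
data_range = max(data) - min(data)

# Quartiles: three cut points splitting the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# IQR: width of the middle 50% of the data.
iqr = q3 - q1

# Sample standard deviation, for comparison with the measures above.
sd = statistics.stdev(data)

print(f"range={data_range}, Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}, SD={sd:.3f}")
```

Unlike the standard deviation, the quartiles and IQR depend only on the ordering of the middle values, so they are far less affected by a single extreme observation.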
References
- J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. Burlington, MA: Elsevier, 2012.
- GeeksforGeeks, “Standard Deviation Formula, Examples & How to Calculate,” GeeksforGeeks, Jul. 06, 2022. https://www.geeksforgeeks.org/maths/standard-deviation-formula/
- J. Frost, “Standard Deviation: Interpretations and Calculations,” Statistics By Jim, 2024. https://statisticsbyjim.com/basics/standard-deviation/