Standard Deviation

Author: Ernad Mujakic
Date: 2025-07-10


Standard deviation is a Measure of Dispersion that summarizes the amount of variation in a dataset. It represents the typical distance between each data point and the Mean; more precisely, it is the square root of the average squared difference from the mean.

Interpretation

  • Low Standard Deviation: Indicates that the data points tend to be very close to the mean, suggesting consistency and reliability in the dataset.

  • High Standard Deviation: Implies that data points are more spread out across the range, indicating greater variability and less predictability.

Figure: high vs. low standard deviation (HighVsLowSTD.png)
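The contrast between low and high standard deviation can be sketched with two made-up datasets that share the same mean. The values below are illustrative assumptions, not data from this note; `statistics.pstdev` from the Python standard library computes the population standard deviation.

```python
import statistics

# Two hypothetical datasets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

tight_mean = statistics.mean(tight)
wide_mean = statistics.mean(wide)

# Population standard deviation of each dataset.
tight_sd = statistics.pstdev(tight)  # about 1.41
wide_sd = statistics.pstdev(wide)    # about 28.28

print(f"tight: mean={tight_mean}, sd={tight_sd:.2f}")
print(f"wide:  mean={wide_mean}, sd={wide_sd:.2f}")
```

Both datasets are centered on 50, so the mean alone cannot tell them apart; the standard deviation is what captures the difference in spread.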

Properties

  • Non-Negative: The standard deviation can never be negative, because it is the square root of an average of squared distances. It equals zero only when every value in the dataset is identical.
  • Sensitive to Outliers: Because differences from the mean are squared before averaging, extreme outliers have a disproportionate impact on the standard deviation.
  • Same Units: The standard deviation is expressed in the same units as the underlying dataset.
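As a rough illustration of the outlier sensitivity described above, the sketch below adds a single extreme value to a small made-up dataset and compares the population standard deviation before and after. The numbers are assumptions chosen for the example.

```python
import statistics

# Hypothetical measurements, all close together.
values = [10, 12, 11, 13, 12]

# The same measurements plus one extreme outlier.
with_outlier = values + [100]

sd_before = statistics.pstdev(values)        # about 1.02
sd_after = statistics.pstdev(with_outlier)   # about 32.96

print(f"without outlier: {sd_before:.2f}")
print(f"with outlier:    {sd_after:.2f}")
```

One added point increases the standard deviation by more than a factor of thirty here, while both results remain non-negative and in the same units as the inputs.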

Calculation

The standard deviation of a numeric attribute $X$, denoted with $\sigma$, is defined as:

Population Standard Deviation

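For a population, the standard deviation is calculated using:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

Where:

  • $x_i$ represents the $i$-th observation.
  • $\mu$ represents the mean of the population.
  • $N$ represents the number of observations.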
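The population calculation can be sketched step by step in Python. The function name `population_std` and the sample values are hypothetical, chosen only for illustration; the result is cross-checked against the standard library's `statistics.pstdev`.

```python
import math
import statistics


def population_std(values):
    """Population standard deviation: divide the squared differences by N."""
    n = len(values)
    if n == 0:
        raise ValueError("population_std requires at least one value")
    mu = sum(values) / n                             # population mean
    squared_diffs = [(x - mu) ** 2 for x in values]  # squared distance of each point from the mean
    return math.sqrt(sum(squared_diffs) / n)         # square root of the average squared distance


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example values
sigma = population_std(data)
print(sigma)                     # 2.0
print(statistics.pstdev(data))   # library result for comparison
```

For this data the mean is 5 and the squared differences sum to 32, so the average squared difference is 4 and its square root is 2.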

Sample Standard Deviation

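For a sample, the standard deviation is calculated using:

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Where:

  • $x_i$ represents the $i$-th observation.
  • $\bar{x}$ represents the mean of the sample.
  • $n$ represents the number of observations in the sample.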
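A minimal sketch of the sample calculation follows, assuming the usual convention of dividing by n - 1 (Bessel's correction) rather than n. The function name `sample_std` and the data are hypothetical; `statistics.stdev` is the standard library's sample standard deviation and is used only as a cross-check.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation: divide the squared differences by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample_std requires at least two values")
    x_bar = sum(values) / n                             # sample mean
    squared_diffs = [(x - x_bar) ** 2 for x in values]  # squared distance of each point from the mean
    return math.sqrt(sum(squared_diffs) / (n - 1))      # n - 1 in the denominator


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example values
s = sample_std(data)
print(f"{s:.4f}")
print(f"{statistics.stdev(data):.4f}")  # library result for comparison
```

Because the denominator is smaller, the sample standard deviation is always slightly larger than the population formula applied to the same values, and the gap shrinks as the sample grows.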

Relation to Variance

The standard deviation is equal to the square root of the Variance of the same dataset. Variance measures the average of the squared differences from the mean, providing insight into the spread of the data.
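In symbols, writing $\sigma^2$ for the population variance (this notation is assumed here, since the note's own formula is not preserved):

$$\sigma = \sqrt{\sigma^2}, \qquad \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

As a small worked example with made-up values 2, 4, 4, 4, 5, 5, 7, 9: the mean is 5 and the squared differences sum to 32, so the variance is $32 / 8 = 4$ and the standard deviation is $\sqrt{4} = 2$. The variance is in squared units, while the standard deviation is back in the original units.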


Other Common Measures of Dispersion

Other common measures of dispersion include:

  • Interquartile Range (IQR): the distance covered by the middle 50% of the dataset, i.e. the difference between the third and first quartiles.
  • Range: the difference between the maximum and minimum values in a dataset.
  • Quartiles: the three values that divide a sorted dataset into four equal parts.
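These measures can be computed with the Python standard library, as in the sketch below. The data are made up, and quartile values depend on the interpolation convention; this example relies on the default `method="exclusive"` of `statistics.quantiles`, so other tools may report slightly different quartiles for the same data.

```python
import statistics

# Hypothetical dataset of 11 values.
data = [3, 5, 7, 8, 9, 11, 13, 15, 18, 21, 24]

# Range: maximum minus minimum.
data_range = max(data) - min(data)

# Quartiles: the three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)

# Interquartile range: spread of the middle 50% of the data.
iqr = q3 - q1

print(f"range={data_range}, Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
```

Unlike the standard deviation, the IQR ignores the most extreme values on either side, which makes it less sensitive to outliers.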

References