Quartile

Author: Ernad Mujakic
Date: 2025-07-07


Quartiles are a type of Quantile that divide an ordered dataset into four equal parts. The quartiles of a dataset are three values, each of which corresponds to a specific Percentile of the data. Quartiles describe the spread of a population: the distance between the first and third quartiles, known as the interquartile range, is a Measure of Dispersion, meaning it assesses how spread out the data is.

  • First Quartile (Q1): The 25th percentile, meaning that 25% of the data falls below the first quartile. It can be thought of as the Median of the lower half of the data.
  • Second Quartile (Q2): The 50th percentile, or the median of the dataset. 50% of the data falls below the second quartile.
  • Third Quartile (Q3): The 75th percentile, meaning that 75% of the data falls below the third quartile. It can be thought of as the median of the upper half of the dataset.

The five-number summary is a set of five values that describe the distribution of a dataset or population and consists of the following:

  • Minimum: The smallest value in the dataset.
  • First Quartile (Q1): The 25th percentile.
  • Median (Q2): The 50th percentile.
  • Third Quartile (Q3): The 75th percentile.
  • Maximum: The largest value in the dataset.
The five-number summary can be visualized using a Boxplot and is useful for identifying potential Outliers in a population.
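As a rough illustration, the five-number summary can be computed in Python with the standard-library `statistics.quantiles` function (Python 3.8+). The function name `five_number_summary` and the use of the default `"exclusive"` quartile method are assumptions of this sketch; other quartile conventions give slightly different Q1 and Q3 values for small samples.

```python
import statistics


def five_number_summary(data, method="exclusive"):
    """Return (minimum, Q1, median, Q3, maximum) for a numeric sample.

    Q1, Q2 and Q3 come from statistics.quantiles(..., n=4); `method` is passed
    through unchanged ("exclusive" is the standard-library default).
    """
    values = sorted(data)
    if len(values) < 2:
        # Assumption: require two points so every Python version behaves the same.
        raise ValueError("need at least two data points")
    q1, q2, q3 = statistics.quantiles(values, n=4, method=method)
    return values[0], q1, q2, q3, values[-1]


sample = [7, 1, 9, 3, 5, 2, 8, 4, 6]
print(five_number_summary(sample))  # (1, 2.5, 5.0, 7.5, 9)
```

Passing `method="inclusive"` for the same sample gives `(1, 3.0, 5.0, 7.0, 9)` instead, which shows how much the chosen convention can matter for small datasets.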

Calculation

To calculate the quartiles of a dataset, follow these steps:

  1. Sort the data, typically in ascending order.
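  2. Find the median (Q2), which divides the dataset into two halves. If there is an odd number of data points, exclude the median from both halves; if there is an even number, split the data into two equal halves. (Some conventions instead include the median in both halves for an odd count, which can shift Q1 and Q3 slightly.)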
  3. Find the first quartile, which is the median of the lower half of the dataset.
  4. Find the third quartile, which is the median of the upper half of the dataset.
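The four steps above can be sketched in Python as follows. This is a minimal illustration assuming the median-of-halves convention in which the median is excluded from both halves for an odd count; the function name `quartiles_median_of_halves` is hypothetical.

```python
from statistics import median


def quartiles_median_of_halves(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumed convention: for an odd number of points the median is left out
    of both halves; for an even number the sorted data splits evenly.
    """
    values = sorted(data)  # Step 1: sort in ascending order
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    q2 = median(values)  # Step 2: the median divides the data in half
    half = n // 2
    lower = values[:half]          # values before the middle
    upper = values[half + n % 2:]  # skips the middle value when n is odd
    q1 = median(lower)  # Step 3: median of the lower half
    q3 = median(upper)  # Step 4: median of the upper half
    return q1, q2, q3


print(quartiles_median_of_halves([7, 1, 9, 3, 5, 2, 8, 4, 6]))  # (2.5, 5, 7.5)
print(quartiles_median_of_halves(range(1, 9)))                  # (2.5, 4.5, 6.5)
```

Statistical software uses several different quartile conventions, so library results may differ slightly from these hand-computed values.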

The Interquartile Range (IQR) is a measure of statistical dispersion that represents the range within which the central 50% of the data points lie. It is defined as:

$$\text{IQR} = Q_3 - Q_1$$

The IQR describes the spread of the data around the median and is commonly used to identify outliers: values below $Q_1 - 1.5 \times \text{IQR}$ or above $Q_3 + 1.5 \times \text{IQR}$ are considered outliers.
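A minimal sketch of the IQR outlier rule in Python, assuming quartiles from `statistics.quantiles` with its default method. The function name `iqr_outliers` and the multiplier parameter `k` (set to the conventional 1.5) are illustrative choices, not a standard API.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Return (iqr, lower_fence, upper_fence, outliers) using the k * IQR rule.

    Quartiles come from statistics.quantiles with its default "exclusive"
    method; k = 1.5 is the conventional multiplier.
    """
    values = sorted(data)
    if len(values) < 2:
        raise ValueError("need at least two data points")
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1               # IQR = Q3 - Q1
    lower_fence = q1 - k * iqr  # values below this are outliers
    upper_fence = q3 + k * iqr  # values above this are outliers
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return iqr, lower_fence, upper_fence, outliers


print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 40]))  # (5.5, -5.5, 16.5, [40])
```

For this sample the sketch finds Q1 = 2.75 and Q3 = 8.25, so the IQR is 5.5, the fences are -5.5 and 16.5, and only 40 is flagged.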


Boxplot and Outlier Detection

A boxplot, sometimes referred to as a box-and-whisker plot, provides a visual summary of the distribution of a dataset. The plot depicts the values of the Five-Number Summary: the minimum, first quartile, median, third quartile, and maximum.

Components

The boxplot consists of:

  • The Box: Represents the Interquartile Range of the distribution, spanning from the first quartile to the third quartile. It can be thought of as the middle 50% of the data.
  • Median Line: The line inside the box, which marks the second quartile, or the median of the dataset.
  • Whiskers: The lines extending from the box toward the minimum and maximum values of the distribution. Commonly, the whiskers instead extend only to the smallest and largest data points within 1.5 times the IQR of the box (that is, no lower than $Q_1 - 1.5 \times \text{IQR}$ and no higher than $Q_3 + 1.5 \times \text{IQR}$), rather than to the true minimum and maximum values of the population. Points beyond the whiskers are plotted individually as potential outliers.
[Figure: boxplot diagram (source: https://www.machinelearningplus.com/plots/python-boxplot/)]
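The components listed above can be computed directly from the data. The sketch below makes several assumptions: quartiles come from `statistics.quantiles` with `method="inclusive"`, whiskers follow the 1.5 × IQR convention, and the function name `boxplot_components` and its `whis` parameter are hypothetical.

```python
import statistics


def boxplot_components(data, whis=1.5):
    """Compute what a boxplot draws: box edges, median line, whisker ends, outliers."""
    values = sorted(data)
    if len(values) < 2:
        raise ValueError("need at least two data points")
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit, high_limit = q1 - whis * iqr, q3 + whis * iqr
    # Whiskers stop at the most extreme data points that lie within the limits.
    inside = [x for x in values if low_limit <= x <= high_limit]
    return {
        "box": (q1, q3),        # the box spans Q1 to Q3 (the IQR)
        "median": med,          # the line drawn inside the box
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in values if x < low_limit or x > high_limit],
    }


print(boxplot_components([1, 2, 3, 4, 5, 6, 7, 8, 9, 40]))
```

For this sample the box spans 3.25 to 7.75, the median line sits at 5.5, the whiskers end at 1 and 9, and 40 is drawn as an outlier. Matplotlib's `pyplot.boxplot` places whiskers by the same 1.5 × IQR rule by default (its `whis` parameter), although it computes quartiles with its own percentile routine.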

References