Mean

Author: Ernad Mujakic
Date: 2025-07-04


The mean is a Measure of Central Tendency which attempts to summarize an entire dataset with a single number. It provides an illustration of the average value within a collection of values, making it essential for data analysis tasks. The mean can only be applied to Numerical Data and not Categorical Data.

There are several types of means, each suited for different applications and fields. The most commonly used type is the arithmetic mean, which is calculated by summing all values and dividing by the number of observations. Other variations include the geometric mean, often used in financial contexts, the harmonic mean, which is beneficial in situations involving rates, and the root mean square (RMS), often used to measure the effective voltage of an AC source.

Other common Measures of Central Tendency include the Median, which represents the middle value when the data is ordered, the Mode, which identifies the most frequently occurring value in a set, and the Midrange, calculated as the average of the maximum and minimum values.
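As a rough illustration of how these measures can differ on the same data, the sketch below uses Python's standard `statistics` module on a small made-up dataset. The `midrange` helper is a hypothetical name introduced here, since the module does not provide one.

```python
import statistics


def midrange(values):
    """Average of the smallest and largest value (hypothetical helper)."""
    return (min(values) + max(values)) / 2


data = [2, 3, 3, 5, 12]  # made-up values for illustration

mean_value = statistics.mean(data)      # 5
median_value = statistics.median(data)  # 3
mode_value = statistics.mode(data)      # 3
midrange_value = midrange(data)         # 7.0
```

The single large value (12) pulls the mean and the midrange upward, while the median and mode stay at 3.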


Types of Means

Arithmetic Mean

The arithmetic mean, commonly referred to as the average, is the sum of all values in a dataset divided by the number of values. It is the most widely used Measure of Central Tendency.

There are two types of arithmetic means:

  1. Sample Mean ($\bar{x}$): This represents the average value of a subset drawn from a larger population.
  2. Group Mean: This denotes the average of values within a specific category or attribute of a dataset.

Formula

The formula for calculating the sample mean is:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

Where:

  • $\bar{x}$ is the sample mean.
  • $n$ is the number of values in the dataset.
  • $x_i$ is the value of an individual object at index $i$.

Application

The arithmetic mean is commonly used in statistics to summarize datasets and provide a simple Measure of Central Tendency. However, the arithmetic mean is susceptible to outliers, which makes it a less representative metric for skewed datasets.
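A minimal sketch of the arithmetic mean and its sensitivity to outliers is shown below. The function name `arithmetic_mean` and the salary figures are made up for illustration.

```python
def arithmetic_mean(values):
    """Sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)


# Hypothetical salaries in thousands; the last entry is an outlier.
typical = [40, 42, 45, 47, 51]
with_outlier = typical + [400]

mean_typical = arithmetic_mean(typical)            # 45.0
mean_with_outlier = arithmetic_mean(with_outlier)  # roughly 104.17
```

Adding one extreme value more than doubles the mean, even though five of the six values are unchanged.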

Trimmed Mean / Interquartile Mean

The trimmed mean is an arithmetic mean which discards a specified number of values from both ends of the ordered data. This is done to minimize the effect of outliers on the mean, which can skew the results. The trimmed mean generally gives a more accurate representation of the center, making it particularly useful in datasets prone to outliers or extreme variations.

The Interquartile Mean is a specific type of trimmed mean that excludes the first and last Quartile of ordered data. This results in the average of the middle 50% of values, offering a robust measure of central tendency that is less influenced by extreme values.

Formula

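The formula for the trimmed mean, with the $n$ values sorted in ascending order as $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ and $k$ values discarded from each end, is:

$$\bar{x}_{\text{trim}} = \frac{1}{n - 2k} \sum_{i=k+1}^{n-k} x_{(i)}$$

The formula for the Interquartile mean, assuming $n$ is divisible by 4, is:

$$\bar{x}_{\text{IQM}} = \frac{2}{n} \sum_{i=\frac{n}{4}+1}^{\frac{3n}{4}} x_{(i)}$$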

Application

The trimmed mean is useful for analyzing skewed datasets, or datasets that have a large number of outliers. It offers a more stable measure that is less affected by extreme values, making it valuable across various disciplines.
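The trimming procedure described above can be sketched as follows. The function names and the parameter `k` (the number of values dropped from each end) are assumptions made for this example, and the interquartile version only handles lengths divisible by 4 to keep the sketch short.

```python
def trimmed_mean(values, k):
    """Arithmetic mean after dropping the k smallest and k largest values."""
    n = len(values)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < n / 2")
    ordered = sorted(values)
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


def interquartile_mean(values):
    """Trimmed mean that drops the lowest and highest quarter of the data.

    Simplifying assumption: len(values) is divisible by 4.
    """
    n = len(values)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch assumes a length divisible by 4")
    return trimmed_mean(values, n // 4)


data = [5, 8, 4, 38, 8, 6, 9, 7, 7, 3, 1, 6]  # made-up values, 38 is an outlier

plain = sum(data) / len(data)    # 8.5
trimmed = trimmed_mean(data, 1)  # 6.3
iqm = interquartile_mean(data)   # 6.5
```

Dropping only the single smallest and largest value already moves the result from 8.5 to 6.3, close to the interquartile mean of 6.5.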

Weighted Mean

In a weighted mean, instead of each value contributing equally as in an arithmetic mean, each value is assigned a weight (or coefficient) based on its significance. This allows for a more nuanced average that reflects the importance of different contributions. The arithmetic mean is a weighted mean where all weights are equal.

Formula

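The formula for calculating the weighted mean, where $w_i$ is the weight assigned to the value $x_i$, is:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$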

Application

The weighted mean is used when there is a need to model the relative importance of various attributes. This is commonly seen in artificial intelligence techniques such as Ensemble Machine Learning algorithms.
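A small sketch of the weighted mean follows. The function name `weighted_mean` and the scores and weights are invented for illustration.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


scores = [80, 90, 70]  # hypothetical assessment scores
weights = [2, 3, 5]    # hypothetical relative importance of each score

course_grade = weighted_mean(scores, weights)    # 78.0
equal_weights = weighted_mean(scores, [1, 1, 1])  # 80.0, the arithmetic mean
```

With equal weights the result matches the arithmetic mean, which is the special case mentioned above.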

Harmonic Mean

The harmonic mean calculates the average of a set of numbers that are defined in relation to some unit, such as rates or ratios. It is calculated by taking the reciprocal of the arithmetic mean of the reciprocals of each value in the dataset.

Formula

The formula for the harmonic mean is:

$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Application

In Machine Learning, the harmonic mean is commonly used to calculate the F1 Score of a model, which is the harmonic mean of precision and recall. It is also commonly used when analyzing speeds and rates, such as finding the average speed over multiple segments of a journey that cover equal distances.
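The sketch below applies the harmonic mean to both uses mentioned above. The function name `harmonic_mean` and all numbers are made up for illustration, and the speed example assumes both legs of the trip cover the same distance.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("this sketch expects a non-empty list of positive values")
    return len(values) / sum(1 / v for v in values)


# Hypothetical round trip: 60 km/h out, 40 km/h back over the same distance.
average_speed = harmonic_mean([60, 40])  # approximately 48 km/h, not 50

# Hypothetical classifier with precision 0.5 and recall 1.0.
f1_score = harmonic_mean([0.5, 1.0])     # approximately 0.667
```

The slower leg takes more time, so it counts for more, which is why the average speed is below the arithmetic mean of 50 km/h.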

Geometric Mean

The geometric mean calculates the average of a set of values by using the product of the values rather than their sum. It involves multiplying all the values in the dataset and then taking the $n$th root of that product, where $n$ is the total number of values in the set. The geometric mean is less susceptible to outliers than the arithmetic mean since each value is part of a product rather than a sum.

It is called the geometric mean because it's commonly used to find the side length of a square that has the same area as a rectangle with given side lengths. For example, if you have a rectangle with dimensions $a \times b$, the side length of a square with equal area is the geometric mean of $a$ and $b$, which is $\sqrt{ab}$.

Formula

The formula for the geometric mean is:

$$G = \left( \prod_{i=1}^{n} x_i \right)^{\frac{1}{n}} = \sqrt[n]{x_1 x_2 \cdots x_n}$$

Application

The geometric mean is commonly used to calculate average rates of return on investments over time. The geometric mean can also be employed in Data Normalization to normalize features, particularly when the values span multiple orders of magnitude.
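As a sketch of the investment use case, the example below averages yearly growth factors. The function name `geometric_mean` and the return figures are invented. Averaging logarithms instead of multiplying directly is an implementation choice made here to avoid overflow on long inputs, and it gives the same result mathematically.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("this sketch expects a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical yearly returns of +10%, -20%, and +30% as growth factors.
growth_factors = [1.10, 0.80, 1.30]
average_factor = geometric_mean(growth_factors)  # roughly 1.046 per year

# Side length of a square with the same area as a 2 x 8 rectangle.
square_side = geometric_mean([2, 8])             # approximately 4.0
```

Compounding `average_factor` over three years reproduces the total growth of 1.10 * 0.80 * 1.30 = 1.144, which the arithmetic mean of the factors (about 1.067) would overstate.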

Root Mean Square

The root mean square (RMS), often denoted as $x_{\text{RMS}}$ and also known as the quadratic mean, is an average of the magnitude of a set of values. It is particularly useful for sets containing both positive and negative values, where the arithmetic mean can cancel toward zero. RMS is calculated by taking the square root of the arithmetic mean of the squared values.

For a continuous function $f(x)$ defined over the interval $[a, b]$, the RMS is determined by squaring the function, averaging the squared values over the interval, and then taking the square root of that average.

Formula

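The formula for discrete RMS is:

$$x_{\text{RMS}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}$$

The formula for continuous RMS of a function $f(x)$ over the interval $[a, b]$ is:

$$f_{\text{RMS}} = \sqrt{\frac{1}{b - a} \int_{a}^{b} [f(x)]^2 \, dx}$$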

Application

Root mean square is widely used in signal processing to measure the effective value of AC currents and voltages. In regression analysis, the RMS error is a common metric for evaluating the performance of a model. RMS is also used to evaluate the performance of control systems regarding their stability and responsiveness to input signals. RMS can also be used in optimization algorithms for neural networks, such as RMSprop.
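The AC use case can be sketched by sampling one full period of a sine wave. The function name `rms`, the amplitude, and the sample count are assumptions chosen for illustration.

```python
import math


def rms(values):
    """Square root of the arithmetic mean of the squared values."""
    if not values:
        raise ValueError("the RMS of an empty dataset is undefined")
    return math.sqrt(sum(v * v for v in values) / len(values))


amplitude = 170.0   # hypothetical peak voltage
sample_count = 1000  # evenly spaced samples over one full period
samples = [
    amplitude * math.sin(2 * math.pi * i / sample_count)
    for i in range(sample_count)
]

plain_mean = sum(samples) / sample_count  # approximately 0
rms_value = rms(samples)                  # approximately amplitude / sqrt(2)
```

The positive and negative halves cancel in the arithmetic mean, while the RMS comes out near 120, the amplitude divided by the square root of 2.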

Expected Value / Mean of a Probability Distribution

The mean of a Probability Distribution represents the average outcome of some Random Variable. It is a specific type of weighted mean where each outcome of some random variable is weighted by the Probability of that outcome occurring.

The Expected Value of a random variable $X$, denoted as $E[X]$, is the weighted average of its possible outcomes. This concept is crucial in various fields, including artificial intelligence, especially in the context of stochastic task environments, where uncertainty plays a significant role.

Formula

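For discrete probability distributions, where the outcome $x_i$ occurs with probability $P(X = x_i)$, the expected value is defined as:

$$E[X] = \sum_{i} x_i \, P(X = x_i)$$

For continuous probability distributions with probability density function $f(x)$, the expected value is defined as:

$$E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$$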

Application

The expected value is a fundamental metric in decision theory for AI systems, guiding the decision-making process by allowing agents to evaluate actions based on their expected rewards. Expected value is also important in game theory for evaluating strategies based on their expected payoffs. In Machine Learning, the expected value underlies supervised techniques, where training typically aims to minimize the expected value of a loss function.
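For the discrete case, a minimal sketch is shown below. The function name `expected_value` and the dictionary representation of a distribution (outcome mapped to probability) are assumptions made for this example. `Fraction` is used so the fair-die result is exact.

```python
from fractions import Fraction


def expected_value(distribution):
    """Probability-weighted average of the outcomes.

    `distribution` maps each numeric outcome to its probability.
    """
    total = sum(distribution.values())
    if abs(total - 1) > 1e-9:
        raise ValueError("the probabilities must sum to 1")
    return sum(outcome * p for outcome, p in distribution.items())


# Fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
die_expectation = expected_value(die)  # Fraction(7, 2), i.e. 3.5

# Hypothetical action with two possible rewards.
action = {10: 0.25, -2: 0.75}
action_expectation = expected_value(action)  # 1.0
```

An agent comparing several such actions could pick the one with the highest expected reward, which is the decision-theoretic use described above.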

Mean of a Function

The mean of a continuous function over a specific interval is defined as the integral of the function over that interval divided by the length of the interval. The continuous root mean square builds on this definition: it is the square root of the mean of the squared function.

Formula

The formula for the mean of a continuous function $f(x)$ defined over the interval $[a, b]$ is:

$$\bar{f} = \frac{1}{b - a} \int_{a}^{b} f(x) \, dx$$

Application

The mean of a function has many applications, such as in statistics, where it provides the measure of central tendency for continuous random variables. It is also commonly applied in machine learning when performing feature engineering, such as when normalizing data.
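When the integral has no convenient closed form, the mean of a function can be approximated numerically. The sketch below uses the midpoint rule. The function name `mean_of_function` and the default step count are assumptions for this example, and a dedicated numerical integration library would normally be preferable.

```python
import math


def mean_of_function(f, a, b, steps=10_000):
    """Approximate (1 / (b - a)) * integral of f over [a, b] with the midpoint rule."""
    if b <= a:
        raise ValueError("the interval must satisfy a < b")
    width = (b - a) / steps
    integral = sum(f(a + (i + 0.5) * width) for i in range(steps)) * width
    return integral / (b - a)


mean_sin = mean_of_function(math.sin, 0.0, math.pi)          # approximately 2 / pi
mean_square = mean_of_function(lambda x: x * x, 0.0, 3.0)    # approximately 3.0
```

Both results agree with the exact values from the integral definition: the integral of sin over [0, pi] is 2, and the integral of x squared over [0, 3] is 9.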

