Mean
Author: Ernad Mujakic
Date: 2025-07-04
The mean is a Measure of Central Tendency that summarizes an entire dataset with a single number. It describes the average value within a collection of values, which makes it a staple of data analysis. The mean applies only to Numerical Data, not Categorical Data.
There are several types of means, each suited to different applications and fields. The most commonly used type is the arithmetic mean, which is calculated by summing all values and dividing by the number of observations. Other variations include the geometric mean, often used in financial contexts; the harmonic mean, which is useful in situations involving rates; and the Root-Mean Square, often used to measure the effective voltage of an AC source.
Other common Measures of Central Tendency include the Median, which represents the middle value when the data is ordered, the Mode, which identifies the most frequently occurring value in a set, and the Midrange, calculated as the average of the maximum and minimum values.
Types of Means
Arithmetic Mean
The arithmetic mean, commonly referred to as the average, is the sum of all values in a dataset divided by the number of values. It is the most widely used Measure of Central Tendency.
There are two types of arithmetic means:
- Sample Mean ($\bar{x}$): This represents the average value of a subset drawn from a larger population.
- Group Mean: This denotes the average of values within a specific category or attribute of a dataset.
Formula
The formula for calculating the sample mean is:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Where:
- $\bar{x}$ is the sample mean.
- $n$ is the number of values in the dataset.
- $x_i$ is the value of an individual object at index $i$.
Application
The arithmetic mean is commonly used in statistics to summarize datasets and provide a simple Measure of Central Tendency. However, the arithmetic mean is sensitive to outliers, which makes it a less representative metric for skewed datasets.
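As a minimal sketch of the sum-and-divide definition above, the Python snippet below computes the mean by hand and shows how a single extreme value shifts it. The function name `sample_mean` and the data values are illustrative assumptions, not part of the original text.

```python
import statistics

def sample_mean(values):
    """Sum the values and divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(values) / len(values)

data = [2, 4, 4, 5, 10]        # hypothetical sample
with_outlier = data + [1000]   # same sample plus one extreme value

print(sample_mean(data))          # 5.0
print(sample_mean(with_outlier))  # much larger, pulled up by the outlier

# The standard library offers the same calculation.
print(statistics.mean(data) == sample_mean(data))  # True
```

The second result illustrates the outlier sensitivity mentioned above: one value out of six moves the mean far away from where most of the data sits.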
Trimmed Mean / Interquartile Mean
The trimmed mean is an arithmetic mean that discards a specified number of values from both ends of the ordered data. This minimizes the effect of outliers, which can skew the result. On datasets prone to outliers or extreme variations, the trimmed mean generally gives a more representative picture of the center.
The Interquartile Mean is a specific type of trimmed mean that excludes the first and last Quartile of ordered data. This results in the average of the middle 50% of values, offering a robust measure of central tendency that is less influenced by extreme values.
Formula
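The formula for the trimmed mean, which discards the $k$ smallest and $k$ largest of the $n$ ordered values $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$, is:
$$\bar{x}_{\text{trim}} = \frac{1}{n - 2k}\sum_{i=k+1}^{n-k} x_{(i)}$$
The formula for the Interquartile mean, assuming $n$ is divisible by 4, is:
$$\bar{x}_{\text{IQM}} = \frac{2}{n}\sum_{i=\frac{n}{4}+1}^{\frac{3n}{4}} x_{(i)}$$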
Application
The trimmed mean is useful for analyzing skewed datasets or datasets that have a large number of outliers. Because it is less affected by extreme values, it offers a more stable measure than the plain arithmetic mean, which makes it valuable across various disciplines.
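The trimming idea can be sketched in a few lines of Python. The helper names `trimmed_mean` and `interquartile_mean`, the parameter `k` (the number of values dropped from each end), and the sample data are assumptions made for illustration. The interquartile version here only handles datasets whose size is a multiple of 4; other sizes need fractional weighting of the boundary values, which this sketch leaves out.

```python
def trimmed_mean(values, k):
    """Arithmetic mean after dropping the k smallest and k largest values."""
    n = len(values)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < n / 2")
    ordered = sorted(values)
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)

def interquartile_mean(values):
    """Mean of the middle 50% of values (size must be a multiple of 4)."""
    n = len(values)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch assumes a size that is a multiple of 4")
    return trimmed_mean(values, n // 4)

data = [1, 3, 4, 5, 6, 7, 9, 100]  # hypothetical sample with one outlier

print(sum(data) / len(data))     # 16.875, dominated by the outlier
print(trimmed_mean(data, 1))     # mean of 3, 4, 5, 6, 7, 9
print(interquartile_mean(data))  # 5.5, the mean of 4, 5, 6, 7
```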
Weighted Mean
In a weighted mean, instead of each value contributing equally like in an arithmetic mean, each value is assigned a weight (or coefficient) based on its significance. This allows for a more nuanced average that reflects the importance of different contributions. The arithmetic mean is a weighted mean where all weights are equal.
Formula
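The formula for calculating the weighted mean is:
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
Where $w_i$ is the weight assigned to the value $x_i$.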
Application
The weighted mean is used when there is a need to model the relative importance of various attributes. This is commonly seen in artificial intelligence techniques such as Ensemble Machine Learning algorithms.
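A small Python sketch of the weighted mean follows. The function name `weighted_mean`, the scores, and the weights are hypothetical values chosen for illustration. Integer weights are used so the arithmetic stays exact.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

scores = [80, 90, 70]   # hypothetical values
weights = [5, 3, 2]     # hypothetical relative importance of each value

print(weighted_mean(scores, weights))    # 81.0
print(weighted_mean(scores, [1, 1, 1]))  # 80.0, same as the arithmetic mean
```

The second call shows the special case noted above: with equal weights, the weighted mean reduces to the arithmetic mean.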
Harmonic Mean
The harmonic mean calculates the average of a set of numbers that are defined in relation to some unit, such as rates or ratios. It is calculated by taking the reciprocal of the arithmetic mean of the reciprocals of the values in the dataset.
Formula
The formula for the harmonic mean is:
$$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
Application
In Machine Learning, the harmonic mean is commonly used to calculate the F1 Score of a model, which is the harmonic mean of precision and recall. It is also commonly used when analyzing speeds and rates, such as finding the average speed over multiple equal-distance segments of a journey.
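The two uses mentioned above can be sketched in Python. The function name `harmonic_mean`, the speeds, and the precision and recall figures are hypothetical. The speed example assumes both segments cover the same distance, which is the case where the harmonic mean gives the true average speed.

```python
import statistics

def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals (positive values only)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("this sketch requires a non-empty list of positive values")
    return len(values) / sum(1 / v for v in values)

# Average speed over two equal-distance segments (hypothetical speeds in km/h).
speeds = [60, 40]
print(harmonic_mean(speeds))             # close to 48, not the arithmetic mean of 50
print(statistics.harmonic_mean(speeds))  # standard-library equivalent

# F1 score as the harmonic mean of precision and recall (hypothetical values).
precision, recall = 0.5, 1.0
print(harmonic_mean([precision, recall]))  # close to 0.667
```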
Geometric Mean
The geometric mean calculates the average of a set of values by using the product of the values rather than their sum. It involves multiplying all the values in the dataset and then taking the $n$th root, where $n$ is the number of values.
It is called the geometric mean because it can be used to find the side length of a square that has the same area as a rectangle with given side lengths. For example, if you have a rectangle with dimensions $a \times b$, the geometric mean $\sqrt{ab}$ is the side length of the square with the same area.
Formula
The formula for the geometric mean is:
$$G = \left(\prod_{i=1}^{n} x_i\right)^{\frac{1}{n}} = \sqrt[n]{x_1 x_2 \cdots x_n}$$
Application
The geometric mean is commonly used to calculate average rates of return on investments over time. The geometric mean can also be employed in Data Normalization to normalize features, particularly when the values span multiple orders of magnitude.
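The product-and-root definition can be sketched in Python as follows. The function name `geometric_mean`, the rectangle dimensions, and the growth factors are hypothetical. The sketch multiplies the values directly, which is fine for short lists; for long lists, averaging logarithms is a common alternative that avoids overflow.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("this sketch requires a non-empty list of positive values")
    return math.prod(values) ** (1 / len(values))

# Side of the square with the same area as a 4 x 9 rectangle.
print(geometric_mean([4, 9]))  # 6.0

# Average growth factor for an investment that gains 50% and then loses 50%.
growth_factors = [1.5, 0.5]
print(geometric_mean(growth_factors))            # below 1, so the investment shrank
print(sum(growth_factors) / len(growth_factors)) # 1.0, which wrongly suggests break-even
```

The last two lines hint at why the geometric mean suits rates of return: growth compounds multiplicatively, so the arithmetic mean of the factors overstates the outcome.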
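Root-Mean Square
The root mean square (RMS), often denoted as $x_{\text{RMS}}$, is the square root of the arithmetic mean of the squares of a set of values.
For a continuous function $f(x)$ defined over an interval $[a, b]$, the RMS is the square root of the mean of the squared function over that interval.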
Formula
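The formula for discrete RMS is:
$$x_{\text{RMS}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2}$$
The formula for continuous RMS of a function $f(x)$ over the interval $[a, b]$ is:
$$f_{\text{RMS}} = \sqrt{\frac{1}{b - a}\int_{a}^{b} [f(x)]^2 \, dx}$$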
Application
Root-mean square is widely used in signal processing to measure the effective value of AC currents and voltages. In regression analysis, the RMS error is a common metric for evaluating the performance of a model. RMS is also used to evaluate control systems with respect to their stability and responsiveness to input signals. It also appears in optimization algorithms for neural networks, such as RMSprop.
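As a rough sketch of the discrete case, the Python snippet below computes the RMS of a sampled sine wave. The function name `rms`, the amplitude, and the sample count are illustrative assumptions. For a pure sine wave sampled evenly over one full period, the RMS should come out close to the amplitude divided by $\sqrt{2}$, while the plain mean is close to zero.

```python
import math

def rms(values):
    """Square root of the arithmetic mean of the squared values."""
    if not values:
        raise ValueError("the RMS of an empty dataset is undefined")
    return math.sqrt(sum(v * v for v in values) / len(values))

amplitude = 10.0  # hypothetical peak value of the signal
n = 1000          # hypothetical number of samples over one full period
samples = [amplitude * math.sin(2 * math.pi * i / n) for i in range(n)]

print(rms(samples))             # close to amplitude / sqrt(2)
print(amplitude / math.sqrt(2)) # reference value
print(sum(samples) / n)         # plain mean, close to 0
```

This is why RMS rather than the arithmetic mean is used for alternating signals: the positive and negative halves cancel in the plain mean but not after squaring.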
Expected Value / Mean of a Probability Distribution
The mean of a Probability Distribution represents the average outcome of some Random Variable. It is a specific type of weighted mean where each outcome of some random variable $X$ is weighted by its probability.
The Expected Value of a random variable, denoted as $E[X]$, is the mean of that variable's probability distribution.
Formula
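For discrete probability distributions, the expected value is defined as:
$$E[X] = \sum_{i} x_i \, P(X = x_i)$$
Where $x_i$ is a possible outcome of $X$ and $P(X = x_i)$ is the probability of that outcome.
For continuous probability distributions, the expected value is defined as:
$$E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$$
Where $f(x)$ is the probability density function of $X$.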
Application
The expected value is a fundamental metric in decision theory for AI systems, guiding the decision-making process by allowing agents to evaluate actions based on their expected rewards. Expected value is also important in game theory for evaluating strategies based on their expected payoffs. In Machine Learning, the expected value appears in supervised techniques, particularly in loss functions, which are often defined as an expected error over the data.
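The discrete case can be sketched in Python with a fair six-sided die. The function name `expected_value` and the dictionary layout (outcome mapped to probability) are assumptions made for this example. `fractions.Fraction` is used so that the probabilities sum to exactly 1 and the result is exact.

```python
from fractions import Fraction

def expected_value(distribution):
    """Probability-weighted sum of outcomes; distribution maps outcome -> probability."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(outcome * p for outcome, p in distribution.items())

# Fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_value(die))         # 7/2
print(float(expected_value(die)))  # 3.5
```

Note that 3.5 is not an outcome the die can produce; the expected value describes the average over many rolls, not any single roll.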
Mean of a Function
The mean of a continuous function over a specific interval is defined as the integral of the function over that interval divided by the length of the interval. The continuous Root-Mean Square is closely related: it is the square root of the mean of the squared function.
Formula
The formula for the mean of a continuous function $f(x)$ over the interval $[a, b]$ is:
$$\bar{f} = \frac{1}{b - a}\int_{a}^{b} f(x) \, dx$$
Application
The mean of a function has many applications, such as in statistics, where it provides the measure of central tendency for continuous random variables. It is also commonly applied in machine learning when performing feature engineering, such as when normalizing data.
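The integral-over-length definition can be approximated numerically. The sketch below uses the midpoint rule, which is one simple choice among many integration methods. The function name `mean_of_function`, the `steps` parameter, and the example functions are assumptions made for illustration.

```python
import math

def mean_of_function(f, a, b, steps=10_000):
    """Approximate (1 / (b - a)) * integral of f over [a, b] with the midpoint rule."""
    if b <= a or steps <= 0:
        raise ValueError("requires a < b and a positive number of steps")
    width = (b - a) / steps
    # Evaluate f at the midpoint of each sub-interval and add up the strip areas.
    integral = sum(f(a + (i + 0.5) * width) for i in range(steps)) * width
    return integral / (b - a)

# Mean of x^2 over [0, 3]: the exact value is 9 / 3 = 3.
print(mean_of_function(lambda x: x * x, 0.0, 3.0))

# Mean of sin(x) over [0, pi]: the exact value is 2 / pi.
print(mean_of_function(math.sin, 0.0, math.pi))
print(2 / math.pi)
```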
References
- S. Glen, “Mean, Median, Mode: What They Are, How to Find Them,” Statistics How To, 2022. https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/mean-median-mode/
- Wikipedia contributors, “Mean,” Wikipedia, Apr. 25, 2025. https://en.wikipedia.org/wiki/Mean
- J. Frost, “What is the Mean in Statistics?,” Statistics By Jim, Aug. 21, 2021. https://statisticsbyjim.com/basics/mean_average/
- J. Han and M. Kamber, Data Mining : Concepts and Techniques, 3rd ed. Amsterdam ; Boston: Elsevier/Morgan Kaufmann, 2012.
- “Expectation | Mean | Average,” www.probabilitycourse.com. https://www.probabilitycourse.com/chapter3/3_2_2_expectation.php