Probability
Author: Ernad Mujakic
Date: 2025-07-02
Probability is a branch of mathematics that deals with the analysis of events and the likelihood of their occurring. The probability of an event, written $P(A)$, is a number between 0 (the event is impossible) and 1 (the event is certain).
Basic Definitions
Random Experiment - any process or action that yields uncertain outcomes.
Sample Space - commonly denoted as $\Omega$, the set of all possible outcomes of a random experiment.
Event - a subset of the sample space, representing specific outcomes.
Power Set - the set of all possible subsets of a sample space, including the empty set $\emptyset$ and the sample space itself.
Assuming all outcomes are equally likely, the probability of an event is calculated by dividing the number of favorable outcomes by the total size of the sample space: $P(A) = \frac{|A|}{|\Omega|}$
Mathematics
Joint Probability
Joint probability is the probability of two events happening simultaneously. If two events $A$ and $B$ are independent, then their joint probability is: $P(A \cap B) = P(A)\,P(B)$
If two events are mutually exclusive, then the probability of one or the other occurring is equal to the sum of their probabilities: $P(A \cup B) = P(A) + P(B)$
If two events aren't necessarily mutually exclusive, subtract the probability of their intersection to avoid counting the outcomes in the intersection twice: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
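The intersection and union rules above can be checked by brute-force enumeration over a small sample space. The sketch below assumes a fair six-sided die and two example events (evens, and rolls greater than 3); exact fractions avoid floating-point noise.

```python
# Verifying P(A ∪ B) = P(A) + P(B) - P(A ∩ B) by enumeration
# over the sample space of a fair six-sided die (assumed example).

from fractions import Fraction

omega = set(range(1, 7))  # sample space Ω of a fair die
A = {2, 4, 6}             # event: roll is even
B = {4, 5, 6}             # event: roll is greater than 3

def prob(event):
    """P(E) = |E| / |Ω| for equally likely outcomes."""
    return Fraction(len(event), len(omega))

p_union = prob(A | B)
p_incl_excl = prob(A) + prob(B) - prob(A & B)
assert p_union == p_incl_excl  # both equal 2/3 here
```

Because the intersection `{4, 6}` is counted once in `prob(A)` and once in `prob(B)`, subtracting it once restores the correct total.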
Conditional Probabilities
The probability of one event occurring, given that another has already occurred, is denoted as $P(A \mid B)$ and defined as: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.
Bayes' Theorem states that: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
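A common way to see Bayes' theorem at work is a diagnostic-test calculation. The sketch below uses a hypothetical scenario; the prevalence, sensitivity, and false-positive rate are assumed numbers chosen for illustration, not real test statistics.

```python
# Hedged sketch of Bayes' theorem with a hypothetical diagnostic test.
# All numeric values are assumptions made for this example.

p_d = 0.01                # prior: P(disease)
p_pos_given_d = 0.99      # sensitivity: P(positive | disease)
p_pos_given_not_d = 0.05  # false-positive rate: P(positive | no disease)

# Total probability of a positive result over both partitions.
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: P(disease | positive test).
p_d_given_pos = p_pos_given_d * p_d / p_pos
```

Even with a highly accurate test, the low prior pulls the posterior down to roughly one in six, which is the classic illustration of why the prior matters.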
Axioms of Probability
Kolmogorov's Axioms
- Non-Negativity: The probability of an event can never be negative: $P(A) \geq 0$
- Normalization: The probability of the entire sample space is always 1: $P(\Omega) = 1$
- Countable Additivity: The probability of any countable sequence of disjoint (mutually exclusive) events, $A_1, A_2, \ldots$, is equal to the sum of the probabilities of the individual events: $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$
Properties of Probability
- Complement Rule: The complement of $A$ is denoted as $A^c$ or $\bar{A}$, and its probability is: $P(A^c) = 1 - P(A)$
- Mutual Exclusivity: If events $A$ and $B$ are mutually exclusive, that is, they can never occur simultaneously, then: $P(A \cap B) = 0$
- Empty Set: The probability of the empty set is always 0: $P(\emptyset) = 0$
- Law of Total Probability: If events $B_1, B_2, \ldots, B_n$ are mutually exclusive and form a partition of the sample space, and $A$ is any event, the law of total probability states: $P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$
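As a small numeric sketch of the law of total probability, consider an assumed factory scenario: three machines produce 50%, 30%, and 20% of output, with defect rates of 1%, 2%, and 3% respectively (all figures are made up for illustration).

```python
# Law of total probability: P(defect) = Σ P(defect | machine_i) P(machine_i).
# The machine shares and defect rates are assumed example values.

p_machine = [0.5, 0.3, 0.2]                # partition of the sample space
p_defect_given_machine = [0.01, 0.02, 0.03]

p_defect = sum(pd * pm for pd, pm in zip(p_defect_given_machine, p_machine))
```

Summing the conditional defect probabilities weighted by each machine's share gives the overall defect probability, 0.017.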
- Central Limit Theorem: States that if we take random samples from any population, the distribution of the sample means approaches a normal distribution as the sample size becomes sufficiently large, regardless of the population's own distribution.
- Law of Large Numbers: States that the mean of the outcomes obtained from a large number of independent samples will converge to the expected value of the underlying probability distribution.
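The law of large numbers can be observed directly in a simulation. The sketch below assumes a fair six-sided die, whose expected value is 3.5, and uses a fixed random seed so the run is reproducible.

```python
# Seeded simulation sketch of the law of large numbers: the sample mean
# of many fair-die rolls approaches the expected value E[X] = 3.5.

import random

random.seed(42)  # fixed seed for reproducibility
n = 100_000
mean = sum(random.randint(1, 6) for _ in range(n)) / n
# mean is close to 3.5, and gets closer as n grows
```

With 100,000 rolls the sample mean typically lands within a few hundredths of 3.5; increasing `n` tightens the convergence.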
The factorial of a non-negative integer $n$, written $n!$, is the product of all positive integers less than or equal to $n$: $n! = n \times (n - 1) \times \cdots \times 2 \times 1$, with $0! = 1$ by convention.
A permutation of a set is a possible arrangement of its elements, where the order of the elements matters. The number of possible permutations of $r$ elements drawn from a set of $n$ is: $P(n, r) = \frac{n!}{(n - r)!}$
A combination of a set is a selection of its elements where the order does not matter. The number of combinations of $r$ elements drawn from a set of $n$ is: $C(n, r) = \binom{n}{r} = \frac{n!}{r!\,(n - r)!}$
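Python's standard library exposes these counting functions directly, so the formulas above can be checked without writing them by hand:

```python
import math

# math.perm(n, r) = n! / (n - r)!   (ordered arrangements)
# math.comb(n, r) = n! / (r! (n - r)!)   (unordered selections)
assert math.factorial(4) == 24
assert math.perm(5, 2) == 20  # 5 * 4 ordered pairs
assert math.comb(5, 2) == 10  # each pair counted once regardless of order
```

Note that `math.perm` and `math.comb` require Python 3.8 or later.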
A random variable is a formalization that assigns a numerical value to the outcome of a random event or experiment. Random variables can be classified into two main types:
- Discrete random variables take on a countable number of distinct values.
- Continuous random variables take on an uncountably infinite number of values within a range.
Random variables are typically denoted by capital letters. For example, the roll of a die can be modeled as a random variable $X$ taking values in the set $\{1, 2, 3, 4, 5, 6\}$.
A probability distribution is a mathematical function that describes the likelihood of different outcomes for the domain of a random variable. The probability of each outcome is between 0 and 1 (inclusive), and the probabilities of all outcomes must sum to 1.
Discrete probability distributions describe the probability of each possible value in the domain of a discrete random variable.
The Probability Mass Function (PMF) gives the probability of each possible value of a random variable $X$: $p(x) = P(X = x)$
The probabilities of all possible values must sum to 1: $\sum_{x} p(x) = 1$
Common Discrete Distributions:
- Binomial Distribution - Models the number of successes in a fixed number of independent Bernoulli trials.
- Bernoulli Distribution - Models the distribution of a random variable with two possible outcomes.
- Poisson Distribution - Models the probability of a number of events occurring in a fixed interval, given a constant mean rate $\lambda$.
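Two of the distributions above have simple closed-form PMFs that can be written with the standard library alone. The sketch below implements the binomial and Poisson PMFs from their textbook formulas; the parameter values in the checks are arbitrary examples.

```python
# Closed-form PMFs for two common discrete distributions.

from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k): k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k): k events in an interval with mean rate lam."""
    return lam**k * exp(-lam) / factorial(k)

# Each PMF sums to 1 over its support (Poisson truncated far into the tail).
assert abs(sum(binomial_pmf(k, 10, 0.3) for k in range(11)) - 1.0) < 1e-12
assert abs(sum(poisson_pmf(k, 3.0) for k in range(100)) - 1.0) < 1e-9
```

For instance, `binomial_pmf(2, 4, 0.5)` evaluates to 0.375: six of the sixteen equally likely four-flip sequences contain exactly two heads.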
Continuous probability distributions describe the probability of a continuous random variable. These distributions are characterized by a Probability Density Function (PDF).
The probability density function $f(x)$ describes the likelihood of a random variable taking on a value in a specific range: $P(a \leq X \leq b) = \int_{a}^{b} f(x)\,dx$ where $f(x) \geq 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
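The integral defining $P(a \leq X \leq b)$ can be approximated numerically. The sketch below assumes a Uniform(0, 1) random variable, whose density is constantly 1 on the unit interval, and uses a midpoint Riemann sum as a deliberately simple integrator.

```python
# Numeric sketch: P(a ≤ X ≤ b) = ∫_a^b f(x) dx, approximated with a
# midpoint Riemann sum for an assumed Uniform(0, 1) random variable.

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the definite integral of f on [a, b]."""
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx

uniform_pdf = lambda x: 1.0  # f(x) = 1 on [0, 1]
p = integrate(uniform_pdf, 0.2, 0.5)  # equals the interval length, 0.3
```

For the uniform density the probability of an interval is just its length, which makes the approximation easy to verify by hand.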
Common Continuous Distributions:
- Normal Distribution - Also called the Gaussian distribution, represents a bell curve that is symmetric around the mean.
- Exponential Distribution - Skewed to the right, often used to model the time or space between events in a Poisson Process.
- Uniform Distribution - Rectangle shape, indicating that all outcomes are equally likely.
The expected value, or mean, of a random variable $X$, denoted $E[X]$, is the weighted average of its possible values.
Discrete Distributions
For discrete probability distributions, the expected value is defined as: $E[X] = \sum_{x} x\,p(x)$
Where $p(x)$ is the probability mass function of $X$.
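The discrete expected value is a direct sum over the PMF. The sketch below assumes a fair six-sided die, for which each value has probability 1/6:

```python
# E[X] = Σ x p(x) for an assumed fair six-sided die.

pmf = {x: 1 / 6 for x in range(1, 7)}         # p(x) = 1/6 for x in 1..6
expected = sum(x * p for x, p in pmf.items())  # (1+2+3+4+5+6)/6 = 3.5
```

The result, 3.5, is never an actual outcome of a single roll; the expected value is a long-run average, not a typical observation.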
Continuous Distributions
For continuous probability distributions, the expected value is defined as: $E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx$
Where $f(x)$ is the probability density function of $X$.
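The continuous case replaces the sum with an integral, which can again be approximated numerically. The sketch below assumes an Exponential distribution with rate $\lambda = 2$, whose theoretical mean is $1/\lambda = 0.5$; the integral is truncated at $x = 50$, where the remaining tail mass is negligible.

```python
# E[X] = ∫ x f(x) dx for an assumed Exponential(λ = 2) random variable,
# approximated with a midpoint Riemann sum truncated at x = 50.

from math import exp

def expected_value(pdf, a, b, steps=200_000):
    """Midpoint-rule approximation of ∫_a^b x pdf(x) dx."""
    dx = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * dx
        total += x * pdf(x)
    return total * dx

exp_pdf = lambda x, lam=2.0: lam * exp(-lam * x)  # f(x) = λ e^{-λx}, x ≥ 0
mean = expected_value(exp_pdf, 0.0, 50.0)          # ≈ 1/λ = 0.5
```

The numeric result agrees with the closed-form mean $1/\lambda$ to several decimal places, which is a useful sanity check on both the density and the integrator.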
Applications
Probability is the foundation of many Machine Learning algorithms such as:
- Naïve Bayes Classifier - a classification model based on Bayes' theorem with strong assumptions about feature independence.
- Hidden Markov Model - A statistical model that represents a system which is assumed to be a Markov Process.
Decision theory utilizes probability to make rational decisions under uncertainty. Relevant subjects include:
- Utility Function - Assigning subjective desirability to outcomes.
- Markov Decision Process - A framework for modeling sequential decision-making with stochastic outcomes.
A Bayesian Network is a probabilistic model implemented as a directed acyclic graph that represents a set of random variables and their conditional dependencies, as well as a set of conditional probability distribution tables.
Game theory is a mathematical framework for modelling strategic interactions in a multi-agent interdependent environment. Probability is crucial for game theory, where agents face uncertainty over the actions of other players as well as the state of the game.
Advanced Topics
A Markov process is a mathematical model of a stochastic system that satisfies the Markov property: the future state depends only on the present state, not on past states. Markov processes have transition probabilities between states, quantifying the chance of moving from one state to another.
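A Markov process can be sketched as a transition matrix plus a sampling step. The two-state weather model below is an assumed toy example; each row gives the probabilities of the next state conditioned only on the current one, which is exactly the Markov property.

```python
# Minimal Markov-process sketch: a hypothetical two-state weather model.
# Each row of `transition` is P(next state | current state) and sums to 1.

import random

transition = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def step(state, rng):
    """Sample the next state using only the current state's row."""
    row = transition[state]
    return rng.choices(list(row), weights=row.values())[0]

rng = random.Random(0)  # fixed seed for reproducibility
state = "sunny"
for _ in range(10):
    state = step(state, rng)  # history beyond `state` is never consulted
```

Because `step` only reads the current state's row, the simulation cannot depend on the path taken to reach that state, matching the Markov property by construction.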
Bayesian statistics is a subfield of statistics that uses Bayes' theorem to update the probability of a hypothesis as new evidence becomes available. Bayesian methods are widely used in classification, regression, and decision-making algorithms.
References
- S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Global Edition. 2021.
- Wikipedia contributors, “Probability theory,” Wikipedia, Apr. 23, 2025. https://en.wikipedia.org/wiki/Probability_theory
- Wikipedia Contributors, “Probability axioms,” Wikipedia, Dec. 05, 2019. https://en.wikipedia.org/wiki/Probability_axioms
- J. Soch, “Kolmogorov axioms of probability,” The Book of Statistical Proofs, Jul. 30, 2021. https://statproofbook.github.io/D/prob-ax.html (accessed Mar. 16, 2025)
- Wikipedia Contributors, “Factorial,” Wikipedia, Oct. 18, 2019. https://en.wikipedia.org/wiki/Factorial
- Wikipedia Contributors, “Permutation,” Wikipedia, Sep. 22, 2019. https://en.wikipedia.org/wiki/Permutation
- Wikipedia Contributors, “Bayesian statistics,” Wikipedia, May 29, 2020. https://en.wikipedia.org/wiki/Bayesian_statistics