Probability
Author: Ernad Mujakic
Date: 2025-07-02
Probability is a branch of mathematics that deals with the analysis of events and the likelihood of their occurring. The probability of an event, written $P(A)$, is a number between 0 (the event is impossible) and 1 (the event is certain).
Basic Definitions
Random Experiment - any process or action that yields uncertain outcomes.
Sample Space - commonly denoted as $\Omega$, the set of all possible outcomes of a random experiment.
Event - a subset of the sample space, representing specific outcomes.
Power Set - the set of all possible subsets of a sample space, including the empty set $\emptyset$ and the sample space itself.
Assuming all outcomes are equally likely, the probability of an event is calculated by dividing the number of favorable outcomes by the total size of the sample space: $P(A) = \frac{|A|}{|\Omega|}$
Mathematics
Joint Probability
Joint probability is the probability of two events happening simultaneously. If two events $A$ and $B$ are independent, then their joint probability is: $P(A \cap B) = P(A)\,P(B)$
If two events are mutually exclusive, then the probability of one or the other occurring is equal to the sum of their probabilities: $P(A \cup B) = P(A) + P(B)$
If two events aren't necessarily mutually exclusive, subtract the probability of their intersection to avoid counting the outcomes in the intersection twice: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
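The intersection and union rules above can be checked by brute-force enumeration over a small sample space. The sketch below assumes a fair six-sided die and two example events (evens, and rolls greater than 3); exact fractions avoid floating-point noise.

```python
# Verifying P(A ∪ B) = P(A) + P(B) - P(A ∩ B) by enumeration
# over the sample space of a fair six-sided die (assumed example).

from fractions import Fraction

omega = set(range(1, 7))  # sample space Ω of a fair die
A = {2, 4, 6}             # event: roll is even
B = {4, 5, 6}             # event: roll is greater than 3

def prob(event):
    """P(E) = |E| / |Ω| for equally likely outcomes."""
    return Fraction(len(event), len(omega))

p_union = prob(A | B)
p_incl_excl = prob(A) + prob(B) - prob(A & B)
assert p_union == p_incl_excl  # both equal 2/3 here
```

Because the intersection `{4, 6}` is counted once in `prob(A)` and once in `prob(B)`, subtracting it once restores the correct total.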
Conditional Probabilities
The probability of one event occurring, given that another has already occurred, is denoted as $P(A \mid B)$ and defined as: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.
Bayes' Theorem states that: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
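A common way to see Bayes' theorem at work is a diagnostic-test calculation. The sketch below uses a hypothetical scenario; the prevalence, sensitivity, and false-positive rate are assumed numbers chosen for illustration, not real test statistics.

```python
# Hedged sketch of Bayes' theorem with a hypothetical diagnostic test.
# All numeric values are assumptions made for this example.

p_d = 0.01                # prior: P(disease)
p_pos_given_d = 0.99      # sensitivity: P(positive | disease)
p_pos_given_not_d = 0.05  # false-positive rate: P(positive | no disease)

# Total probability of a positive result over both partitions.
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: P(disease | positive test).
p_d_given_pos = p_pos_given_d * p_d / p_pos
```

Even with a highly accurate test, the low prior pulls the posterior down to roughly one in six, which is the classic illustration of why the prior matters.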
Axioms of Probability
Kolmogorov's Axioms
- Non-Negativity: The probability of an event can never be negative: $P(A) \geq 0$
- Normalization: The probability of the entire sample space is always 1: $P(\Omega) = 1$
- Countable Additivity: The probability of any countable sequence of disjoint (mutually exclusive) events, $A_1, A_2, \ldots$, is equal to the sum of the probabilities of the individual events: $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$
Properties of Probability
- Complement Rule: The complement of $A$ is denoted as $A^c$ or $\bar{A}$, and its probability is: $P(A^c) = 1 - P(A)$
- Mutual Exclusivity: If events $A$ and $B$ are mutually exclusive, that is, they can never occur simultaneously, then: $P(A \cap B) = 0$
- Empty Set: The probability of the empty set is always 0: $P(\emptyset) = 0$
- Law of Total Probability: If events $B_1, B_2, \ldots, B_n$ are mutually exclusive and form a partition of the sample space, and $A$ is any event, the law of total probability states: $P(A) = \sum_{i=1}^{n} P(A \mid B_i)\,P(B_i)$
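As a small numeric sketch of the law of total probability, consider an assumed factory scenario: three machines produce 50%, 30%, and 20% of output, with defect rates of 1%, 2%, and 3% respectively (all figures are made up for illustration).

```python
# Law of total probability: P(defect) = Σ P(defect | machine_i) P(machine_i).
# The machine shares and defect rates are assumed example values.

p_machine = [0.5, 0.3, 0.2]                # partition of the sample space
p_defect_given_machine = [0.01, 0.02, 0.03]

p_defect = sum(pd * pm for pd, pm in zip(p_defect_given_machine, p_machine))
```

Summing the conditional defect probabilities weighted by each machine's share gives the overall defect probability, 0.017.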
- Central Limit Theorem: States that if we take random samples from any population, the distribution of the sample means approaches a normal distribution as the sample size becomes sufficiently large, regardless of the population's own distribution.
- Law of Large Numbers: States that the mean of the outcomes obtained from a large number of independent samples will converge to the expected value of the underlying probability distribution.
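The law of large numbers can be observed directly in a simulation. The sketch below assumes a fair six-sided die, whose expected value is 3.5, and uses a fixed random seed so the run is reproducible.

```python
# Seeded simulation sketch of the law of large numbers: the sample mean
# of many fair-die rolls approaches the expected value E[X] = 3.5.

import random

random.seed(42)  # fixed seed for reproducibility
n = 100_000
mean = sum(random.randint(1, 6) for _ in range(n)) / n
# mean is close to 3.5, and gets closer as n grows
```

With 100,000 rolls the sample mean typically lands within a few hundredths of 3.5; increasing `n` tightens the convergence.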
The factorial of a non-negative integer $n$, written $n!$, is the product of all positive integers less than or equal to $n$: $n! = n \times (n - 1) \times \cdots \times 2 \times 1$, with $0! = 1$ by convention.
A permutation of a set is a possible arrangement of its elements, where the order of the elements matters. The number of possible permutations of $r$ elements drawn from a set of $n$ is: $P(n, r) = \frac{n!}{(n - r)!}$
A combination of a set is a selection of its elements where the order does not matter. The number of combinations of $r$ elements drawn from a set of $n$ is: $C(n, r) = \binom{n}{r} = \frac{n!}{r!\,(n - r)!}$
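Python's standard library exposes these counting functions directly, so the formulas above can be checked without writing them by hand:

```python
import math

# math.perm(n, r) = n! / (n - r)!   (ordered arrangements)
# math.comb(n, r) = n! / (r! (n - r)!)   (unordered selections)
assert math.factorial(4) == 24
assert math.perm(5, 2) == 20  # 5 * 4 ordered pairs
assert math.comb(5, 2) == 10  # each pair counted once regardless of order
```

Note that `math.perm` and `math.comb` require Python 3.8 or later.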
A random variable is a formalization that assigns a numerical value to the outcome of a random event or experiment. Random variables can be classified into two main types:
- Discrete random variables take on a countable number of distinct values.
- Continuous random variables take on an uncountably infinite number of values within a range.
Random variables are typically denoted by capital letters. For example, the roll of a die can be modeled as a random variable $X$ taking values in the set $\{1, 2, 3, 4, 5, 6\}$.
A probability distribution is a mathematical function that describes the likelihood of different outcomes for the domain of a random variable. The probability of each outcome is between 0 and 1 (inclusive), and the probabilities of all outcomes must sum to 1.
Discrete probability distributions describe the probability of each possible value in the domain of a discrete random variable.
The Probability Mass Function (PMF) gives the probability of each possible value of a random variable $X$: $p(x) = P(X = x)$
The probabilities of all possible values must sum to 1: $\sum_{x} p(x) = 1$
Common Discrete Distributions:
- Binomial Distribution - Models the number of successes in a fixed number of independent Bernoulli trials.
- Bernoulli Distribution - Models the distribution of a random variable with two possible outcomes.
- Poisson Distribution - Models the probability of a number of events occurring in a fixed interval, given a constant mean rate $\lambda$.
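Two of the distributions above have simple closed-form PMFs that can be written with the standard library alone. The sketch below implements the binomial and Poisson PMFs from their textbook formulas; the parameter values in the checks are arbitrary examples.

```python
# Closed-form PMFs for two common discrete distributions.

from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k): k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k): k events in an interval with mean rate lam."""
    return lam**k * exp(-lam) / factorial(k)

# Each PMF sums to 1 over its support (Poisson truncated far into the tail).
assert abs(sum(binomial_pmf(k, 10, 0.3) for k in range(11)) - 1.0) < 1e-12
assert abs(sum(poisson_pmf(k, 3.0) for k in range(100)) - 1.0) < 1e-9
```

For instance, `binomial_pmf(2, 4, 0.5)` evaluates to 0.375: six of the sixteen equally likely four-flip sequences contain exactly two heads.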
Continuous probability distributions describe the probability of a continuous random variable. These distributions are characterized by a Probability Density Function (PDF).
The probability density function $f(x)$ describes the likelihood of a random variable taking on a value in a specific range: $P(a \leq X \leq b) = \int_{a}^{b} f(x)\,dx$ where $f(x) \geq 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
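The integral defining $P(a \leq X \leq b)$ can be approximated numerically. The sketch below assumes a Uniform(0, 1) random variable, whose density is constantly 1 on the unit interval, and uses a midpoint Riemann sum as a deliberately simple integrator.

```python
# Numeric sketch: P(a ≤ X ≤ b) = ∫_a^b f(x) dx, approximated with a
# midpoint Riemann sum for an assumed Uniform(0, 1) random variable.

def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the definite integral of f on [a, b]."""
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx

uniform_pdf = lambda x: 1.0  # f(x) = 1 on [0, 1]
p = integrate(uniform_pdf, 0.2, 0.5)  # equals the interval length, 0.3
```

For the uniform density the probability of an interval is just its length, which makes the approximation easy to verify by hand.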
Common Continuous Distributions:
- Normal Distribution - Also called the Gaussian distribution, represents a bell curve that is symmetric around the mean.
- Exponential Distribution - Skewed to the right, often used to model the time or space between events in a Poisson Process.
- Uniform Distribution - Rectangle shape, indicating that all outcomes are equally likely.
The expected value, or mean, of a random variable $X$, denoted $E[X]$, is the weighted average of its possible values.
Discrete Distributions
For discrete probability distributions, the expected value is defined as: $E[X] = \sum_{x} x\,p(x)$
Where $p(x)$ is the probability mass function of $X$.
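The discrete expected value is a direct sum over the PMF. The sketch below assumes a fair six-sided die, for which each value has probability 1/6:

```python
# E[X] = Σ x p(x) for an assumed fair six-sided die.

pmf = {x: 1 / 6 for x in range(1, 7)}         # p(x) = 1/6 for x in 1..6
expected = sum(x * p for x, p in pmf.items())  # (1+2+3+4+5+6)/6 = 3.5
```

The result, 3.5, is never an actual outcome of a single roll; the expected value is a long-run average, not a typical observation.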
Continuous Distributions
For continuous probability distributions, the expected value is defined as: $E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx$
Where $f(x)$ is the probability density function of $X$.
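The continuous case replaces the sum with an integral, which can again be approximated numerically. The sketch below assumes an Exponential distribution with rate $\lambda = 2$, whose theoretical mean is $1/\lambda = 0.5$; the integral is truncated at $x = 50$, where the remaining tail mass is negligible.

```python
# E[X] = ∫ x f(x) dx for an assumed Exponential(λ = 2) random variable,
# approximated with a midpoint Riemann sum truncated at x = 50.

from math import exp

def expected_value(pdf, a, b, steps=200_000):
    """Midpoint-rule approximation of ∫_a^b x pdf(x) dx."""
    dx = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * dx
        total += x * pdf(x)
    return total * dx

exp_pdf = lambda x, lam=2.0: lam * exp(-lam * x)  # f(x) = λ e^{-λx}, x ≥ 0
mean = expected_value(exp_pdf, 0.0, 50.0)          # ≈ 1/λ = 0.5
```

The numeric result agrees with the closed-form mean $1/\lambda$ to several decimal places, which is a useful sanity check on both the density and the integrator.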
Applications
Probability is the foundation of many Machine Learning algorithms such as:
- Naïve Bayes Classifier - a classification model based on Bayes' theorem with strong assumptions about feature independence.
- Hidden Markov Model - A statistical model that represents a system which is assumed to be a Markov Process.
Decision theory utilizes probability to make rational decisions under uncertainty. Relevant subjects include:
- Utility Function - Assigning subjective desirability to outcomes.
- Markov Decision Process - A framework for modeling sequential decision-making with stochastic outcomes.
A Bayesian Network is a probabilistic model implemented as a directed acyclic graph that represents a set of random variables and their conditional dependencies, as well as a set of conditional probability distribution tables.
Game theory is a mathematical framework for modelling strategic interactions in a multi-agent interdependent environment. Probability is crucial for game theory, where agents face uncertainty over the actions of other players as well as the state of the game.
Advanced Topics
A Markov process is a mathematical model of a stochastic system that satisfies the Markov property: the future state depends only on the present state, not on past states. Markov processes have transition probabilities between states, quantifying the chance of moving from one state to another.
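A Markov process can be sketched as a transition matrix plus a sampling step. The two-state weather model below is an assumed toy example; each row gives the probabilities of the next state conditioned only on the current one, which is exactly the Markov property.

```python
# Minimal Markov-process sketch: a hypothetical two-state weather model.
# Each row of `transition` is P(next state | current state) and sums to 1.

import random

transition = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def step(state, rng):
    """Sample the next state using only the current state's row."""
    row = transition[state]
    return rng.choices(list(row), weights=row.values())[0]

rng = random.Random(0)  # fixed seed for reproducibility
state = "sunny"
for _ in range(10):
    state = step(state, rng)  # history beyond `state` is never consulted
```

Because `step` only reads the current state's row, the simulation cannot depend on the path taken to reach that state, matching the Markov property by construction.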
Bayesian statistics is a subfield of statistics that uses Bayes' theorem to update the probability of a hypothesis as new evidence becomes available. Bayesian methods are widely used in classification, regression, and decision-making algorithms.
References
- S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Global Edition. 2021.
- Wikipedia contributors, “Probability theory,” Wikipedia, Apr. 23, 2025. https://en.wikipedia.org/wiki/Probability_theory
- Wikipedia Contributors, “Probability axioms,” Wikipedia, Dec. 05, 2019. https://en.wikipedia.org/wiki/Probability_axioms
- J. Soch, “Kolmogorov axioms of probability,” The Book of Statistical Proofs, Jul. 30, 2021. https://statproofbook.github.io/D/prob-ax.html (accessed Mar. 16, 2025)
- Wikipedia Contributors, “Factorial,” Wikipedia, Oct. 18, 2019. https://en.wikipedia.org/wiki/Factorial
- Wikipedia Contributors, “Permutation,” Wikipedia, Sep. 22, 2019. https://en.wikipedia.org/wiki/Permutation
- Wikipedia Contributors, “Bayesian statistics,” Wikipedia, May 29, 2020. https://en.wikipedia.org/wiki/Bayesian_statistics