Probability Distributions!

“In the end, we only regret chances we didn’t take.” – Unknown

This post is part of the IIT-M CSE “Concept-Sunday” series.

For the past couple of weeks we have been focusing on data structures, specifically trees. This week we break from that tradition to discuss some common probability distributions, which are perhaps the bread and butter of any data scientist. They also form the basis of machine learning and deep learning, which are gaining a lot of traction these days. This post serves as a short refresher on the binomial, uniform, Poisson and normal distributions, and assumes that you are well versed in the basics of probability.

Probability Distribution:

Probability distributions and frequency distributions are fundamental to statistics; they are two sides of the same coin, so it is worthwhile to describe the former in comparison with the latter. While a frequency distribution records how many times an event has actually occurred, a probability distribution says how many times an event should occur. In other words, a probability distribution describes the chance of a particular event occurring in a given setup.
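
A small Python sketch makes the contrast concrete. It assumes a fair coin and 1000 simulated flips with a fixed seed (all illustrative choices, not from the original post): the probability distribution gives the expected counts, while the frequency distribution is what the simulation actually produced.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the run is reproducible
n_flips = 1000

# Probability distribution: what *should* happen for a fair coin
probability = {"H": 0.5, "T": 0.5}
expected = {face: p * n_flips for face, p in probability.items()}

# Frequency distribution: what *did* happen in this simulated run
observed = Counter(random.choice("HT") for _ in range(n_flips))

for face in "HT":
    print(face, "expected:", expected[face], "observed:", observed[face])
```

The observed counts will be close to, but usually not exactly, the expected 500 each; that gap is the difference between a frequency distribution and a probability distribution.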

There are many different distributions, each with its own uses; here we focus on the binomial, uniform, Poisson and normal distributions.

Binomial Distribution (n, p):

The binomial distribution describes the distribution of outcomes from a series of trials where each trial meets certain conditions.

  • The experiment consists of n repeated trials.
  • The outcome of each trial may be classified as either a success or a failure (hence the name binomial).
  • Each trial is independent of the other trials.
  • The outcomes are mutually exclusive, meaning there can never be a mix of outcomes: only one outcome can occur at a time. For example, when we flip a coin we obtain either a head or a tail (assuming the coin never lands on its edge :P).
  • The probability of success remains constant across all trials. For example, if the probability of obtaining a head on one toss is 0.5, it is still 0.5 on the next toss.

The number of successes x in n trials of a binomial experiment is called the binomial random variable. Given the probability of success in a single trial, we want the probability of obtaining exactly x successes in n trials.
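
As a rough illustration, the conditions above can be simulated directly in Python: one binomial experiment is n independent trials with a constant success probability p, and the binomial random variable is the count of successes. The values n = 20, p = 0.5, the number of repetitions and the helper name `binomial_trial_count` are assumptions made for this sketch.

```python
import random

def binomial_trial_count(n, p, rng):
    """Run one binomial experiment: n independent trials, each a success with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(0)  # seeded generator for reproducibility
n, p, n_experiments = 20, 0.5, 20_000

# Repeat the whole experiment many times and record the number of successes each time
samples = [binomial_trial_count(n, p, rng) for _ in range(n_experiments)]

# The average number of successes should be close to n * p = 10
mean_successes = sum(samples) / n_experiments
print(round(mean_successes, 2))
```

Averaging many runs gives a value near n·p, the well-known mean of a binomial random variable.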

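The binomial probability distribution is given by

$$P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \qquad \binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$

where,

  • n = the number of trials.
  • p = the probability of a success.
  • x = the number of successes, which may range over 0, 1, 2, …, n.
  • (1 – p) = the probability of a failure (q).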
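
The formula translates almost line for line into Python. This is a minimal sketch using the standard library’s `math.comb` (Python 3.8+); the function name `binomial_pmf` is our own.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Worked example: exactly 2 heads in 4 fair coin flips -> 6 * 0.5^4 = 0.375
print(binomial_pmf(2, 4, 0.5))

# The probabilities over x = 0..n must add up to 1
print(sum(binomial_pmf(x, 4, 0.5) for x in range(5)))
```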

Consider, for example, the binomial distribution of the number of heads x in 20 trials, where the probability of obtaining a head is p; x can take any integer value in [0, 20]. With p = 0.5, getting 10 heads is the most likely outcome, whereas with p = 0.75, getting 15 heads is the most likely.
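
Since the original plot is not reproduced here, the sketch below recomputes the claim: it evaluates the binomial formula for n = 20 and picks the x with the highest probability for each p. It repeats a small hypothetical `binomial_pmf` helper so that it runs on its own.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n = 20
modes = {}
for p in (0.5, 0.75):
    pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
    # The most likely number of heads is the x with the largest probability
    modes[p] = max(range(n + 1), key=pmf.__getitem__)
    print(f"p = {p}: most likely x = {modes[p]}, P = {pmf[modes[p]]:.3f}")
```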

Uniform Distribution (a, b):

A uniform distribution, also called a rectangular distribution, is a probability distribution in which every outcome in its range is equally likely, i.e. the probability (or probability density) is constant over that range.

The discrete uniform distribution is a symmetric probability distribution whereby a finite number of values are equally likely to be observed; every one of n values has equal probability 1/n. Another way of saying “discrete uniform distribution” would be “a known, finite number of outcomes equally likely to happen”.

Rolling a single die is one example of a discrete uniform distribution: a die roll has six possible outcomes, 1, 2, 3, 4, 5 or 6, and each number has a 1/6 probability of being rolled.

Let X be a random variable taking on the possible values {0, 1, …, 9}, each with equal probability. This is again a discrete uniform distribution, and the probability of each of the 10 possible values is P(X = x) = 1/n = 1/10 = 0.1.

If we use a uniform random number generator (for example MATLAB’s randi function) to generate 20000 integers in [1, 10], the number of occurrences of each value from 1 to 10 is almost the same, approximately 20000/10 = 2000.
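
A Python equivalent of this experiment might look like the following sketch; `random.randint(1, 10)` returns integers with both endpoints included, and the seed is an arbitrary assumption for reproducibility.

```python
import random
from collections import Counter

random.seed(1)  # reproducible run
n_samples = 20_000

# Draw integers uniformly from 1..10 (randint includes both endpoints)
counts = Counter(random.randint(1, 10) for _ in range(n_samples))

for value in range(1, 11):
    # Each count should be close to n_samples / 10 = 2000
    print(value, counts[value])
```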


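Along similar lines, the continuous uniform distribution on the interval [a, b] is defined, and its pdf is given by

$$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{for } a \le x \le b, \\ 0 & \text{otherwise.} \end{cases}$$
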
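As a sketch, the pdf above and the probability of landing in a sub-interval [c, d] can be written as follows. The helper names are hypothetical; the interval probability is simply the length of the overlap with [a, b] divided by (b - a).

```python
def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_prob(c, d, a, b):
    """P(c <= X <= d): length of the overlap of [c, d] with [a, b], divided by (b - a)."""
    overlap = max(0.0, min(d, b) - max(c, a))
    return overlap / (b - a)

print(uniform_pdf(3, 0, 10))      # inside the support: 1 / (10 - 0)
print(uniform_pdf(12, 0, 10))     # outside the support: 0
print(uniform_prob(2, 5, 0, 10))  # (5 - 2) / (10 - 0)
```
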
Poisson Distribution (λ):

The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space, if these events occur with a known average rate and independently of the time since the last event. In short, it is used to model the number of events occurring within a given interval.

Suppose we track the number of emails we receive each day and find that we receive 10 emails per day on average. If the arrival of one email does not affect the arrival of any other, i.e. the emails arrive independently of each other, then it is reasonable to assume that the number of emails received per day follows a Poisson distribution. Other examples that may follow a Poisson distribution are the number of phone calls received by a call center per hour and the number of decay events per second from a radioactive source.

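The formula for the Poisson probability mass function is

$$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad x = 0, 1, 2, \dots$$

λ (called the shape parameter in some references, and often the rate parameter) indicates the average number of events in the given time interval.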

For example, if the mean number of calls to a fire station on a weekday is 8, what is the probability that there will be 11 calls on a given weekday? Applying x = 11 and λ = 8 in the above formula, we get a probability of about 0.072.
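
The arithmetic can be checked with a few lines of Python using only the standard `math` module; `poisson_pmf` is a hypothetical helper implementing the formula above.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam): e^(-lam) * lam^x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Fire-station example: on average 8 calls per weekday, probability of exactly 11 calls
p_11 = poisson_pmf(11, 8)
print(round(p_11, 3))  # 0.072
```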


Figure: Poisson distribution for λ = 5, 10 and 15 (the curves are right-skewed).
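
The right skew in the caption can be checked numerically. This sketch builds the Poisson probabilities for λ = 5, 10 and 15, truncating the infinite support at k = 100 (an assumption that discards only a negligible tail), and computes their mean, variance and skewness. It relies on standard results not stated in the post: a Poisson variable has mean and variance λ and skewness 1/√λ.

```python
import math

def poisson_pmf_table(lam, k_max):
    """Poisson(lam) probabilities for k = 0..k_max, built with p_k = p_{k-1} * lam / k."""
    probs = [math.exp(-lam)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * lam / k)
    return probs

def mean_var_skew(probs):
    """Mean, variance and skewness of a distribution given as a list indexed by k."""
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    third = sum((k - mean) ** 3 * p for k, p in enumerate(probs))
    return mean, var, third / var ** 1.5

results = {}
for lam in (5, 10, 15):
    results[lam] = mean_var_skew(poisson_pmf_table(lam, 100))
    mean, var, skew = results[lam]
    # Positive skewness means a longer right tail; it shrinks as lam grows
    print(f"lam={lam}: mean={mean:.3f} var={var:.3f} skew={skew:.3f} 1/sqrt(lam)={1 / math.sqrt(lam):.3f}")
```

Positive skewness confirms the longer right tail, and the curves become more symmetric as λ grows.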

Normal Distribution (µ, σ):

The normal distribution is the most important probability distribution in statistics and probability theory. It is also known as the Gaussian distribution (named in honor of the great German mathematician Carl Friedrich Gauss) and the bell curve (as the normal curve is bell-shaped).

The basis for using the normal distribution to approximate many other distributions in applications is the central limit theorem, which states that the sum of a large number of independent, identically distributed random variables, each with finite mean and variance, is approximately normally distributed.
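
A quick way to see the central limit theorem at work, assuming uniform inputs for illustration: add up 12 independent Uniform(0, 1) draws and subtract 6. Each draw has mean 1/2 and variance 1/12, so the sum has mean 6 and variance 1, and by the theorem the shifted sum should look roughly like a standard normal variable. The sample size and seed are arbitrary choices for this sketch.

```python
import random
import statistics

rng = random.Random(7)  # seeded for reproducibility
n_samples = 50_000

# Each sample: sum of 12 independent Uniform(0, 1) draws, shifted by its mean 6.
# The variance of the sum is 12 * (1/12) = 1, so the result is approximately N(0, 1).
samples = [sum(rng.random() for _ in range(12)) - 6 for _ in range(n_samples)]

mean = statistics.fmean(samples)
std = statistics.pstdev(samples)
within_one_sd = sum(abs(s) < 1 for s in samples) / n_samples

print(round(mean, 3), round(std, 3), round(within_one_sd, 3))
```

For a standard normal distribution about 68.3% of values fall within one standard deviation of the mean, so the last number should come out close to 0.683.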

Normal probability density function:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^{2}}{2\sigma^{2}}\right)$$

The parameter µ equals the mean (average) value of the population, and σ (the standard deviation) tells us how spread out the numerical values of the population are: the larger σ is, the more spread out the values. From a quality control point of view, a primary objective of a manufacturing process is to keep the σ of the parts’ dimensions as small as possible.
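
To make the role of σ concrete, here is a minimal Python sketch (helper names are our own) that evaluates the density above and the probability of falling within k standard deviations of the mean, using the standard identity P(|X - µ| < kσ) = erf(k/√2). The resulting 68-95-99.7 pattern is a well-known property of every normal distribution, whatever µ and σ are.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def prob_within(k):
    """P(|X - mu| < k * sigma) for any normal X; equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

# A larger sigma lowers and widens the curve: the peak height is 1 / (sigma * sqrt(2*pi))
for sigma in (0.5, 1.0, 2.0):
    print(sigma, round(normal_pdf(0, 0, sigma), 3))

# Fraction of the population within 1, 2 and 3 standard deviations of the mean
for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```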

Figure: normal distribution for different values of the mean and variance.

Definitions are directly taken from .
