Introduction
Bayesian inference is considered the second main stream of statistical analysis, alongside the frequentist approach, which comprises maximum likelihood estimation (MLE) and least-squares methods.
Bayesian inference is concerned with generating the conditional distribution of the model parameters given the sample data, which is the posterior density of the statistical model.
Bayesian inference methods combine information from the prior density and the likelihood function; the resulting posterior density summarizes the plausibility of the model parameters given the observed data.
Bayesian inference amounts to applying Bayes' theorem to the experimental datasets obtained; this mainly comprises three entities.
Prior: the investigator's belief about the distribution of the parameters of the data obtained; it is a subjective quantity that is not affected by the occurrence of other events.
Likelihood: a function of how likely the observed event is in the light of the other event and under the experimental conditions; it may follow, for example, a binomial, normal, or exponential distribution.
Posterior: the probability distribution of the statistical model parameters. It is called the posterior because it results from the joint conditional probabilities of all parameters of interest in the statistical model. In its unscaled form, Bayes' theorem reads:

Posterior ∝ Prior × Likelihood
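As a minimal sketch of this relation (not from the source; the coin-flip data and flat prior are hypothetical, illustrative choices), the unscaled posterior for a coin's head probability can be computed on a grid and then normalized:

    import numpy as np

    # Hypothetical data: 7 heads in 10 coin flips (illustrative values only).
    n, k = 10, 7

    # Grid of candidate values for the head probability theta.
    theta = np.linspace(0.001, 0.999, 999)

    # Prior: the investigator's belief, here a flat prior over the grid.
    prior = np.ones_like(theta) / theta.size

    # Likelihood: binomial probability of the observed data at each theta.
    likelihood = theta**k * (1 - theta)**(n - k)

    # Unscaled posterior, as in the relation above, then normalized to sum to one.
    posterior = prior * likelihood
    posterior = posterior / posterior.sum()

    print("Posterior mode:", theta[np.argmax(posterior)])  # close to k/n = 0.7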
Differences
between Bayesian inference methods and frequentist inference methods are
summarized in the following points.
1. Presence of a prior density: this is the investigator's belief about the parameters of interest of the data distribution. Bayesian inference methods update this prior belief about the parameter distributions in the light of new information gained from the sample data, which enters through the likelihood function.
2. The parameters of the distribution enter the model as random variables, which then form a joint distribution with the sample data; in the frequentist approach, by contrast, the parameters are unknown but fixed quantities, and no probability density functions are associated with them.
Bayes' Theorem
It states that, for any two events A and B, the conditional probability P(A|B) represents the probability of event A given that B has occurred.
Because event B must occur first, the sample space becomes B, and event A can then occur only through the intersection of both events. In the light of this, we can state Bayes' theorem as follows:

P(A|B) = P(B|A) P(A) / P(B)
P(A|B): the posterior, the probability of A updated given that B has occurred.
P(A): the prior, which is not affected by the occurrence of B; this is the investigator's belief about the probability of A.
P(B|A): the likelihood, which gives how likely event B is to occur given that event A has occurred first.
P(B): the normalizing factor, which ensures that the posterior probability is normalized to unity.
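As a small worked sketch of the theorem (all numbers are illustrative, not from the source), suppose A is "patient has a condition" and B is "test is positive":

    # Hypothetical two-event example; all probabilities are illustrative.
    p_A = 0.01             # prior P(A)
    p_B_given_A = 0.95     # likelihood P(B|A)
    p_B_given_notA = 0.05  # P(B|not A)

    # Normalizing factor P(B) via the law of total probability.
    p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

    # Bayes' theorem: posterior P(A|B).
    p_A_given_B = p_B_given_A * p_A / p_B
    print(round(p_A_given_B, 3))  # about 0.161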
For discrete probability distributions, if we assume that event A can be expressed as a discrete partition A1, ..., Ak of the sample space S, then the normalizing factor becomes a sum of likelihood times prior over each member of the partition. This is Bayes' theorem with partitioning of the sample space.
It follows, by convention, that Bayes' theorem becomes

P(Ai|B) = P(B|Ai) P(Ai) / Σj P(B|Aj) P(Aj)

where the normalizing factor is a sum over all joint likelihoods and priors of the partition of the sample space S.
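A minimal sketch of this partitioned form, assuming three hypothetical partition members A1, A2, A3 with made-up probabilities:

    import numpy as np

    # Hypothetical partition of S into three events; values are illustrative.
    prior = np.array([0.5, 0.3, 0.2])          # P(Ai), sums to one
    likelihood = np.array([0.10, 0.40, 0.70])  # P(B|Ai) for the observed B

    # Normalizing factor: sum of likelihood * prior over the partition.
    p_B = np.sum(likelihood * prior)

    # Posterior over the partition; it sums to one by construction.
    posterior = likelihood * prior / p_B
    print(posterior, posterior.sum())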
For continuous distributions, the normalizing factor becomes an integral over the whole sample space instead of a sum of joint priors and likelihoods:

f(θ|x) = L(x|θ) f(θ) / ∫ L(x|θ) f(θ) dθ

f(θ|x): the posterior density.
f(θ): the prior or marginal density.
L(x|θ): the likelihood, which is a function of θ.
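As an illustrative sketch (assuming a made-up model: one observation x from a Normal(θ, 1) likelihood with a Normal(0, 4) prior on θ), the integral in the denominator can be approximated numerically on a grid:

    import numpy as np

    # Hypothetical continuous example; model and numbers are illustrative.
    x = 1.5
    theta = np.linspace(-10.0, 10.0, 2001)
    step = theta[1] - theta[0]

    # Normal(0, 2^2) prior density and Normal(theta, 1) likelihood of x.
    prior = np.exp(-theta**2 / (2 * 2.0**2)) / np.sqrt(2 * np.pi * 2.0**2)
    likelihood = np.exp(-(x - theta)**2 / 2) / np.sqrt(2 * np.pi)

    # Approximate the normalizing integral with a Riemann sum over the grid.
    norm = np.sum(likelihood * prior) * step

    posterior = likelihood * prior / norm
    print(np.sum(posterior) * step)  # approximately 1.0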
Types of Priors
a) Conjugate Prior
One solution to the problem of intractable Bayesian posterior densities is to find a prior distribution that is a member of a family of functions for which the resulting posterior density is also a member of that family. In this case the posterior can be obtained from the prior by changing its parameters to some function of the data; this is known as parameter updating.
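A minimal sketch of such parameter updating, assuming the standard Beta-binomial conjugate pair and hypothetical data (7 successes in 10 trials):

    # Conjugate updating: a Beta(a, b) prior with a binomial likelihood
    # yields a Beta posterior; only the parameters change with the data.
    a, b = 2.0, 2.0   # illustrative prior parameters
    n, k = 10, 7      # hypothetical trials and successes

    # Parameter update rule: Beta(a, b) -> Beta(a + k, b + n - k).
    a_post, b_post = a + k, b + (n - k)

    print("Posterior: Beta(%g, %g)" % (a_post, b_post))
    print("Posterior mean:", a_post / (a_post + b_post))  # (a + k) / (a + b + n)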
b) Vague or non-informative prior
If no useful information exists about the parameters, we can use a vague or non-informative prior.
With this type of prior, we use a uniform distribution across the range of values of the parameter, or a normal distribution with a large variance.
If no established prior is available for such an analysis, we can use a uniform distribution over the parameters of interest.
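A brief sketch of both vague choices (flat, and normal with a large variance), using a hypothetical single observation x = 1.5 from a unit-variance normal model; with either prior the posterior is driven almost entirely by the likelihood:

    import numpy as np

    # Hypothetical setup: estimate a normal mean from one observation.
    x = 1.5
    theta = np.linspace(-20.0, 20.0, 4001)
    step = theta[1] - theta[0]

    likelihood = np.exp(-(x - theta)**2 / 2)

    # Two common vague priors: flat, or normal with a large variance (100^2).
    flat_prior = np.ones_like(theta)
    wide_prior = np.exp(-theta**2 / (2 * 100.0**2))

    for prior in (flat_prior, wide_prior):
        posterior = likelihood * prior
        posterior = posterior / (posterior.sum() * step)
        print("posterior mean ~", np.sum(theta * posterior) * step)  # close to x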