Tuesday, August 6, 2013

Bayesian Inference - Bayes Theorem

Introduction

Bayesian inference is considered the second main stream of statistical analysis, alongside the frequentist approach, which includes MLE (maximum likelihood estimation) and least-squares methods.
Bayesian inference is concerned with deriving the conditional distribution of the model parameters given the sample data, which is the posterior density of the statistical model.
Bayesian inference methods combine information from the prior density and the likelihood function. The resulting posterior density summarizes the plausibility of the parameters of a model given the observed data.
Bayesian inference is all about applying Bayes' theorem to the experimental datasets obtained. This mainly comprises three entities.
Prior:
This is the investigator's belief about the distribution of the model parameters before the data are taken into account. It is a subjective quantity that is not affected by the occurrence of other events.
Likelihood:
This is a function of how likely the observed event is in the light of the other event and under the experimental conditions. It could follow, for example, a binomial, normal, or exponential distribution.
Posterior:
This is the probability distribution of the statistical model parameters given the observed data. It is called the posterior because it is obtained after combining the prior with the likelihood, jointly for all parameters of interest in the statistical model.
Unscaled form of Bayes' theorem:
Posterior ∝ Prior * Likelihood
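As a minimal sketch of the unscaled form, the snippet below multiplies a prior by a likelihood for a handful of candidate parameter values. The exponential model, the waiting times, the candidate rates, and the equal prior weights are all illustrative assumptions, not values from this post.

```python
import math

# Hypothetical waiting times (illustrative data only).
data = [0.8, 1.3, 0.4, 2.1, 0.9]

# Candidate values for the rate parameter, with equal prior weights (assumed).
candidate_rates = [0.5, 1.0, 2.0]
prior = {rate: 1.0 / len(candidate_rates) for rate in candidate_rates}


def exponential_likelihood(rate, observations):
    """Likelihood of a rate given independent exponential observations."""
    return math.prod(rate * math.exp(-rate * x) for x in observations)


# Unscaled posterior: prior times likelihood, not yet normalized.
unscaled_posterior = {
    rate: prior[rate] * exponential_likelihood(rate, data)
    for rate in candidate_rates
}

# The ranking of candidates is already visible before any normalization.
most_plausible_rate = max(unscaled_posterior, key=unscaled_posterior.get)

for rate, value in unscaled_posterior.items():
    print(f"rate={rate}: unscaled posterior={value:.6f}")
print("most plausible candidate rate:", most_plausible_rate)
```

The unscaled values do not sum to one, but they already show which candidate is most plausible. Dividing each value by their total would give the scaled posterior.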

The differences between Bayesian and frequentist inference methods are summarized in the following points.
1.     Presence of a prior density: this is the investigator's belief about the distribution of the parameters of interest. Bayesian inference methods update this prior belief in the light of new information from the sample data, which enters through the likelihood function.

2.     The parameters of the distribution enter the model as random variables, which then form a joint distribution with the sample data. In the frequentist approach, by contrast, the parameters are unknown but fixed quantities, and no probability density functions are associated with them.

Bayes Theorem 

It states that for any two events A and B, with P(B) > 0, the quantity

P(A|B)

represents the probability of event A given that B has occurred. This is the conditional probability of A given the occurrence of event B.
Because event B is taken to have occurred, the sample space reduces to B, and A can then occur only through the intersection of the two events. The conditional probability is therefore P(A|B) = P(A ∩ B) / P(B), and since P(A ∩ B) = P(B|A) P(A), Bayes' theorem can be stated as follows.

P(A|B) = P(B|A) P(A) / P(B)

P(A|B) → the posterior, which is the probability of event A given that event B has occurred.
P(A) → the prior, which is not affected by the occurrence of B. This is the investigator's belief about the probability of A before B is observed.
P(B|A) → the likelihood function, which gives how likely event B is given that event A has occurred.
P(B) → the normalizing factor, which makes sure that the posterior probabilities sum to unity.
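The four quantities above can be tried out on a small numerical case. The sketch below assumes a hypothetical screening test, where A is "has the condition" and B is "test is positive". All three input probabilities are made-up illustration values, and `bayes_posterior` is a helper name introduced here.

```python
def bayes_posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return P(A|B) for an event A and its complement."""
    # P(B), the normalizing factor, from the two ways B can occur.
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1.0 - prior_a))
    return likelihood_b_given_a * prior_a / evidence


# Assumed inputs (illustrative only).
p_a = 0.01              # prior P(A)
p_b_given_a = 0.95      # likelihood P(B|A)
p_b_given_not_a = 0.05  # P(B|not A), the false-positive rate

posterior = bayes_posterior(p_a, p_b_given_a, p_b_given_not_a)
print(f"P(A|B) = {posterior:.4f}")
```

With these assumed numbers the posterior is about 0.16, far above the prior of 0.01 but still well below the likelihood of 0.95, because the prior carries a lot of weight when A is rare.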

For discrete probability distributions, assume that the sample space S can be partitioned into n mutually exclusive and exhaustive events, like the following.

A_1, A_2, ..., A_n, where A_i ∩ A_j = ∅ for i ≠ j and A_1 ∪ A_2 ∪ ... ∪ A_n = S

Then the normalizing factor becomes a sum, over every event in the partition, of the likelihood times the prior. This is Bayes' theorem with partitioning of the sample space.
It follows that Bayes' theorem becomes

P(A_i|B) = P(B|A_i) P(A_i) / [ P(B|A_1) P(A_1) + P(B|A_2) P(A_2) + ... + P(B|A_n) P(A_n) ]

The denominator is the normalizing factor P(B), written as the sum of the joint probabilities P(B|A_j) P(A_j) over the whole sample space S.
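A short sketch of the partitioned form follows. The scenario is hypothetical: three machines produce different shares of the output (the priors), each with its own defect rate (the likelihoods), and B is "the item is defective". The function name and all numbers are assumptions made for illustration.

```python
def posterior_over_partition(priors, likelihoods):
    """Apply Bayes' theorem to events A_1..A_n that partition S."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors over a partition must sum to 1")
    # Joint probability P(B|A_i) * P(A_i) for each event in the partition.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Normalizing factor P(B): the sum of all joint probabilities.
    evidence = sum(joint)
    return [j / evidence for j in joint]


# Assumed shares of production and defect rates (illustrative only).
priors = [0.5, 0.3, 0.2]
likelihoods = [0.01, 0.02, 0.03]

posteriors = posterior_over_partition(priors, likelihoods)
for i, value in enumerate(posteriors, start=1):
    print(f"P(A_{i}|B) = {value:.4f}")
```

Because every term is divided by the same sum, the posterior probabilities add up to one across the partition.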

For continuous distributions, the normalizing factor becomes an integral over the whole parameter space instead of a sum of priors times likelihoods, as follows.

F(Theta|X) = F(Theta) L(X|Theta) / ∫ F(Theta) L(X|Theta) dTheta

F(Theta|X) → the posterior density.
F(Theta) → the prior (marginal) density of Theta.
L(X|Theta) → the likelihood, which is a function of Theta.
∫ F(Theta) L(X|Theta) dTheta → the normalizing factor, an integral over all possible values of Theta.
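When the normalizing integral has no convenient closed form, it can be approximated numerically. The sketch below assumes a uniform prior for Theta on [0, 1] and a binomial likelihood with 7 successes in 10 trials (made-up data), and uses a simple trapezoidal rule. For this particular choice the exact integral is known to be 1/(n + 1), which gives a way to check the approximation.

```python
import math


def trapezoid(values, step):
    """Trapezoidal rule for equally spaced samples."""
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


# Assumed data: k successes in n trials (illustrative only).
n, k = 10, 7

# Equally spaced grid over the parameter space [0, 1].
num_points = 2001
step = 1.0 / (num_points - 1)
theta_grid = [i * step for i in range(num_points)]


def prior_density(theta):
    """F(Theta): uniform density on [0, 1] (assumed prior)."""
    return 1.0


def likelihood(theta):
    """L(X|Theta): binomial probability of k successes in n trials."""
    return math.comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


# Numerator of Bayes' theorem on the grid.
unscaled = [prior_density(t) * likelihood(t) for t in theta_grid]

# Denominator: the integral of prior times likelihood over Theta.
normalizing_factor = trapezoid(unscaled, step)

# Posterior density F(Theta|X) on the grid.
posterior_density = [u / normalizing_factor for u in unscaled]
posterior_area = trapezoid(posterior_density, step)

print("normalizing factor:", normalizing_factor)
print("area under posterior:", posterior_area)
```

The grid size is a tuning choice. A finer grid improves the approximation at the cost of more likelihood evaluations.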


Types of Priors

a)    Conjugate Prior
    One solution to the problem of intractable Bayesian posterior densities is to choose a prior distribution from a family of functions such that the resulting posterior density is also a member of that family. In this case the posterior can be obtained from the prior by changing its parameters to some function of the data. This is known as parameter updating.
b)    Vague or non-informative prior
If no useful information exists about the parameters, we can use a vague or non-informative prior.
For this type of prior, we use a uniform distribution across the range of values of the parameter, or a normal distribution with a large variance. In particular, when no prior is available for an analysis, a uniform distribution over the parameters of interest can be used.
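Both kinds of prior can be illustrated with one standard conjugate pair, the Beta prior with a binomial likelihood, where parameter updating amounts to adding the observed counts to the prior parameters. The sketch below is an assumption-laden example, not a method from this post: the data (7 successes, 3 failures), the informative Beta(20, 20) prior, and the helper names are all made up. Beta(1, 1) is the uniform distribution on [0, 1], so it plays the role of the vague prior.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update: a Beta prior with binomial data gives a Beta posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Assumed data (illustrative only).
successes, failures = 7, 3

# Vague prior: Beta(1, 1), i.e. uniform over the parameter's range.
vague_posterior = update_beta(1.0, 1.0, successes, failures)

# Informative prior (assumed): Beta(20, 20), concentrated around 0.5.
informative_posterior = update_beta(20.0, 20.0, successes, failures)

print("vague prior       ->", vague_posterior, beta_mean(*vague_posterior))
print("informative prior ->", informative_posterior, beta_mean(*informative_posterior))
```

Under the uniform prior the posterior mean (about 0.67) stays close to the observed proportion of 0.7, while the informative prior pulls it toward 0.5 (posterior mean 0.54). No integration is needed in either case, which is the practical appeal of a conjugate prior.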

