Tuesday, August 6, 2013

Bayesian Inference - Bayes Theorem

Introduction

Bayesian inference is considered the second main stream of statistical analysis, alongside the frequentist approach, which includes MLE (Maximum Likelihood Estimation) and least-squares methods.
Bayesian inference is concerned with generating the conditional distribution of the model parameters given the sample data, which is the posterior density of the statistical model.
Bayesian inference methods combine information from the prior density and the likelihood function; the posterior density summarizes the plausibility of the parameters of a model given the observed data.
Bayesian inference is all about applying Bayes' theorem to the experimental data obtained, and it mainly comprises three entities.
Prior:
This is the investigator's belief about the distribution of the parameters before the data are observed; it is a subjective quantity that is not affected by the occurrence of other events.
Likelihood:
This is a function of how likely the observed data are in the light of the parameters and the experimental conditions; it could follow a binomial, normal, or exponential distribution.

Posterior:
This is the probability distribution of the statistical model parameters after the data are observed; it is called the posterior because it results from combining the prior with the likelihood for all the parameters of interest in the statistical model. In its unscaled form, Bayes' theorem reads

Posterior ∝ Prior * Likelihood

The differences between Bayesian inference methods and frequentist inference methods are summarized in the following points.
1.     Presence of a prior density: this is the investigator's belief about the distribution parameters of interest. Bayesian inference methods update this prior belief about the parameter distributions in the light of new information gained from the sample data through the likelihood function.

2.     The parameters of the distribution enter the model as random variables, which then form a joint distribution with the sample data, whereas in the frequentist approach the parameters are unknown, fixed quantities; there are no probability density functions associated with them.

Bayes Theorem 

It states that, for any two events A and B,

P(A|B) = P(A ∩ B) / P(B)

P(A|B) represents the probability of event A given that B has occurred; this is the conditional probability of A given the occurrence of event B.
Because event B is taken to have occurred, the sample space is restricted to B, and A can then occur only through the intersection of the two events. In the light of the above, Bayes' theorem can be stated as follows:

P(A|B) = P(B|A) * P(A) / P(B)
P(A|B) → the posterior: the probability of A updated by the occurrence of B.
P(A) → the prior, which is not affected by the occurrence of B; this is the investigator's belief about A before B is observed.
P(B|A) → the likelihood function, which gives how likely event B is to occur given that event A has occurred first.
P(B) → the normalizing factor, which makes sure that the posterior probability is normalized to unity.

For discrete probability distributions, assume that the sample space S can be expressed as a discrete partition of events A1, A2, ..., An, like the following:

S = A1 ∪ A2 ∪ ... ∪ An, with the Ai mutually exclusive

Then the normalizing factor becomes a sum of likelihood times prior over each member of the partition; this is Bayes' theorem with a partitioning of the sample space.
It follows, by convention, that Bayes' theorem becomes

P(Ai|B) = P(B|Ai) * P(Ai) / Σj P(B|Aj) * P(Aj)

The normalizing factor is a sum over the likelihood-prior products across the whole partition of the sample space S.
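
This partition form is easy to verify numerically. A minimal R sketch, with hypothetical prior and likelihood values:

> prior <- c(0.5, 0.3, 0.2)    # P(Ai) for a hypothetical partition A1, A2, A3
> likelihood <- c(0.10, 0.40, 0.80)    # P(B | Ai)
> posterior <- prior * likelihood / sum(prior * likelihood)    # P(Ai | B)
> sum(posterior)    # normalized to unity
[1] 1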

For continuous distributions, the normalizing factor becomes an integral over the whole parameter space instead of a sum of prior-likelihood products, as follows:

f(θ|X) = L(X|θ) * f(θ) / ∫ L(X|θ) f(θ) dθ

f(θ|X) → the posterior density.
f(θ) → the prior, or marginal, density.
L(X|θ) → the likelihood, which is a function of θ.
∫ L(X|θ) f(θ) dθ → the normalizing factor, an integral over the whole parameter space.
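
When this normalizing integral is intractable analytically, it can be approximated numerically. A minimal R sketch using a discretized grid, assuming a normal likelihood with sd fixed at 1, a N(0, 2) prior on the mean theta, and a small made-up sample:

> x <- c(1.2, 0.8, 1.5)    # a small hypothetical sample
> theta <- seq(-5, 5, length.out = 1001)    # grid over the parameter
> prior <- dnorm(theta, mean = 0, sd = 2)
> likelihood <- sapply(theta, function(t) prod(dnorm(x, mean = t, sd = 1)))
> step <- theta[2] - theta[1]    # grid spacing
> posterior <- prior * likelihood / sum(prior * likelihood * step)    # integrates to 1
> theta[which.max(posterior)]    # posterior mode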


Types of Priors

a)    Conjugate prior
One solution to the problem of intractable Bayesian posterior densities is to find a prior distribution that is a member of a family of functions such that the resulting posterior density is also a member of that family. In this case the posterior can be obtained from the prior by changing its parameters to some function of the data; this is called parameter updating (see the sketch after this list).
b)    Vague or non-informative prior
If no useful information exists about the parameters, we can use a vague or non-informative prior: a uniform distribution across the range of values of the parameter, or a normal distribution with a large variance.
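
As an illustration of parameter updating with a conjugate pair, here is a minimal R sketch of a beta prior combined with a binomial likelihood; the prior parameters and data below are hypothetical:

> a <- 2; b <- 2    # hypothetical Beta prior parameters
> y <- 7; n <- 10    # observed successes out of n trials
> a.post <- a + y    # parameter updating: the posterior is
> b.post <- b + n - y    # Beta(a + y, b + n - y)
> curve(dbeta(x, a.post, b.post), from = 0, to = 1,
+       xlab = "w", ylab = "posterior density")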


Monday, July 29, 2013

Linear Regression With Biological Data

Linearization of the Michaelis-Menten equation


The Michaelis-Menten equation is a mathematical model with which we can investigate the reaction kinetics of an enzyme when it binds to its substrate; it gives a full understanding of the underlying reaction rates and of the enzyme's catalytic effect on the substrate.

The importance of this mathematical model lies in revealing the catalytic mechanism of the enzyme, its role in metabolism, how its activity is controlled, and how a drug or an agonist might inhibit the enzyme.
The general mathematical model of Michaelis-Menten is as follows:

v = Vmax * [S] / (Km + [S])

where v is the reaction rate, [S] is the substrate concentration, Vmax is the maximum reaction rate, and Km is the Michaelis constant, the substrate concentration at which the rate is half of Vmax.
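As an illustration, the model is easy to evaluate and plot in R; the Vmax and Km values below are hypothetical:

> mm.rate <- function(S, Vmax, Km) Vmax * S / (Km + S)
> curve(mm.rate(x, Vmax = 200, Km = 50), from = 0, to = 300,
+       xlab = "[S]", ylab = "v")    # hyperbolic curve saturating at Vmax
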
The plot of v versus [S] is not linear, and this nonlinearity can make it difficult to estimate Km and Vmax accurately. Therefore, several researchers developed linearizations of the above model, such as the Lineweaver-Burk plot, the Eadie-Hofstee diagram, and the Hanes-Woolf plot. In this assignment, I am going to explain the Lineweaver-Burk plot in more detail.

Lineweaver-Burk Plot

The Lineweaver-Burk plot, or double-reciprocal plot, is a common way of illustrating kinetic data. It is produced by taking the reciprocal of both sides of the Michaelis-Menten equation, as follows:

1/v = (Km / Vmax) * (1/[S]) + 1/Vmax

This double-reciprocal transformation yields an equation that is linear in 1/[S] and can be fitted to the observed data.
According to the general formula for a straight line,

Y = M * X + b

M → the slope of the regression line, which in the Lineweaver-Burk plot represents Km / Vmax.
b → the Y-axis intercept, which represents the reciprocal of Vmax, i.e. 1 / Vmax.
There is also an intercept with the X-axis, which represents the negative reciprocal of the Michaelis constant, -1 / Km.

The following plot explains this in more detail:
(Figure: Lineweaver-Burk plot with slope Km / Vmax, Y-intercept 1 / Vmax, and X-intercept -1 / Km.)
Example of Michaelis-Menten Linearization

In R, I followed these steps to produce the Lineweaver-Burk plot:

> substrate <- c(20, 40, 80, 120, 160, 220, 260)
> reaction <- c(41.0, 77.0, 113, 144, 162, 185, 197)
> dataset <- data.frame(substrate, reaction)
> attach(dataset)
> plot(reaction ~ substrate, main = "Michaelis Menten Kinetics Plot")



Producing a Lineweaver-Burk plot means taking the reciprocal of both sides of the Michaelis-Menten equation, so we take the reciprocal of both the substrate and reaction variables as follows:

> recipSubstrate <- 1 / substrate
> recipReaction <- 1 / reaction
> plot(recipReaction ~ recipSubstrate, main = "Reciprocal of Michaelis Menten variables")



Now, to generate a linear model from these data, we issue the lm command in R.

> linearmodel <- lm(recipReaction ~ recipSubstrate)
> linearmodel

Call:
lm(formula = recipReaction ~ recipSubstrate)

Coefficients:
   (Intercept)  recipSubstrate 
      0.003468        0.412298 

> summary(linearmodel)

Call:
lm(formula = recipReaction ~ recipSubstrate)

Residuals:
         1          2          3          4          5          6          7
 3.072e-04 -7.886e-04  2.277e-04  4.048e-05  1.278e-04  6.317e-05  2.223e-05

Coefficients:
                Estimate Std. Error t value Pr(>|t|)   
(Intercept)    0.0034682  0.0002145   16.17 1.65e-05 ***
recipSubstrate 0.4122980  0.0096963   42.52 1.36e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0003976 on 5 degrees of freedom
Multiple R-squared:  0.9972,   Adjusted R-squared:  0.9967
F-statistic:  1808 on 1 and 5 DF,  p-value: 1.357e-07


To draw the regression line through these data:

> abline(linearmodel)



Interpretation



From the above output, we can estimate both Vmax and Km as follows.
According to the Lineweaver-Burk plot, the Y-axis intercept is 1 / Vmax:
b = 1 / Vmax = 0.0034682
Vmax = 288.334
The slope of the fitted line is Km / Vmax, so Km can be recovered from the slope and Vmax:
M = Km / Vmax = 0.4122980
Km = M * Vmax = 0.4122980 * 288.334 = 118.88
(Equivalently, the X-axis intercept of the fitted line sits at -1 / Km.)
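
The same values can be recovered directly from the fitted model object in R, continuing from the linearmodel fitted above:

> coefs <- coef(linearmodel)
> Vmax <- 1 / coefs[[1]]    # intercept = 1 / Vmax
> Km <- coefs[[2]] * Vmax    # slope = Km / Vmax
> Vmax; Km    # approximately 288.3 and 118.9
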
The coefficient of determination, R-squared, is 0.9972, which means about 99.7% of the variation in the reciprocal reaction rate is explained by the regression line, leaving about 0.3% unexplained.
Overall, the regression line is a good fit for describing the relationship between the two variables.



Regression

Definition

Regression is a type of statistical analysis used for estimating the relationships between variables. It explores the possible relationships between two or more variables: one variable is called the dependent variable or response, and the other(s) are independent variables.
Regression helps us predict the value of the response variable with respect to changes in the independent variable; it estimates the conditional expectation of the dependent variable given the independent variable.
Regression models
Regression has the following models:
  • Linear regression
  • Nonlinear regression
  • Multiple regression, a more complex model which encompasses several independent variables
  • Logistic regression, where the dependent variable is binary

    Linear Regression


Simple linear regression explores possible relationships of a linear nature, where the relationship can be modeled as a straight line.
In general, a straight line is modeled by the equation below:

Y = M * X + b

where M is the slope of the regression line and b is the intercept, the point where the line crosses the Y-axis.

Conditions for linear regression to be valid:
A.      Normality: regression is a parametric test and assumes that the data are normally distributed.
B.      Linearity: a linear relationship exists between the dependent and the independent variables, so it is sensible to model the relationship as a straight line.
C.      Independence: the data should have been drawn independently.
D.      Unexplained variance: the portion of the variance of the dependent variable not explained by the independent variable is due to random error and follows a normal distribution.
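
These assumptions can be checked in R with the diagnostics built into base R for a fitted lm() object; a minimal sketch, applied to the linearmodel fitted earlier:

> par(mfrow = c(2, 2))
> plot(linearmodel)    # residuals vs fitted, normal Q-Q, scale-location, leverage
> shapiro.test(residuals(linearmodel))    # formal test of residual normality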



Maximum Likelihood Estimation for Binomial Distribution Example

The pdf for the binomial distribution is

f(y | n, w) = C(n, y) * w^y * (1 - w)^(n - y)

where y is the number of successes out of n trials, w is the probability of success, and C(n, y) is the binomial coefficient.

Suppose we have 10 Bernoulli trials of tossing a coin.

The pdf in this case is

f(y | w) = C(10, y) * w^y * (1 - w)^(10 - y)

The generated plot of this pmf is as follows.

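A minimal R sketch to generate this plot, assuming a fair coin (w = 0.5):

> y <- 0:10
> plot(y, dbinom(y, size = 10, prob = 0.5), type = "h",
+      xlab = "number of successes y", ylab = "probability")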

Let us define a likelihood function for this simple binomial distribution:
L(w | y) = f(y | w). Suppose that the number of successes in the experiment is 7.

The likelihood function in that case is

L(w | y = 7) = C(10, 7) * w^7 * (1 - w)^3

The likelihood function is a function of the parameter, whereas the pdf is a function of the observed data. In our example the two functions are therefore represented on different scales: the likelihood function lives on the parameter scale, whereas the pdf lives on the data scale. The likelihood is a smooth curve over w, as follows.
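
A minimal R sketch of this likelihood curve, with y fixed at 7 and w varying over (0, 1):

> w <- seq(0.001, 0.999, by = 0.001)
> plot(w, dbinom(7, size = 10, prob = w), type = "l",
+      xlab = "w", ylab = "L(w | y = 7)")
> abline(v = 0.7, lty = 2)    # the curve peaks at w = 0.7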



The steps to obtain a maximum likelihood estimate are as follows:

1)      Obtain the likelihood function, as above for this specific binomial pdf.
2)      Take the logarithm of the likelihood; the likelihood and log-likelihood are monotonically related, so they attain their maximum at the same point.
3)      Differentiate the log-likelihood function and set the derivative equal to zero.

4)      To make sure that the solution is a maximum and not a minimum, calculate the second derivative and check that it is negative at the solution, since the maximum likelihood estimate should sit at a peak, where the log-likelihood is concave.

So, in the above example, taking the log of the likelihood function gives the following equation:

log L(w) = log C(10, 7) + 7 log(w) + 3 log(1 - w)

Next, we calculate the first derivative of the log-likelihood function, which yields

d/dw log L(w) = 7/w - 3/(1 - w) = (7 - 10w) / (w(1 - w))

Now that the derivative of the log-likelihood for the binomial distribution has been obtained, we set it to zero and solve for w, the parameter of the binomial distribution:
(7 - 10w) / (w(1 - w)) = 0
Solving this equation gives w = 0.7, which is the maximum likelihood estimate (MLE).

To make sure that this is actually a maximum of the likelihood and not a minimum, the second derivative of the log-likelihood should be calculated and evaluated at w(MLE) to confirm that it is negative:

d²/dw² log L(w) = -7/w² - 3/(1 - w)²

At w = 0.7 this equals -7/0.49 - 3/0.09 ≈ -47.6, which is negative, confirming a maximum.
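The same answer can be checked numerically in R by maximizing the log-likelihood directly:

> loglik <- function(w) dbinom(7, size = 10, prob = w, log = TRUE)
> optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)$maximum    # ~ 0.7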







Maximum Likelihood Estimation

Definition

MLE, or maximum likelihood estimation, treats the likelihood as a function of the parameters of the chosen statistical model under investigation; it tells us which parameter values most likely produced the observed data.
Example
I have two examples of maximum likelihood estimation: one for the binomial distribution and another for the normal distribution.
Here, I will give an example of maximum likelihood estimation for the normal distribution.
Suppose that we have a random variable X which follows a normal distribution with the following specification:

X ~ N(Mu, SD)

Let the mean Mu = Theta, which is the parameter we need to estimate; for convenience we let the standard deviation equal 1.
So, the probability density function for the normal distribution is given below:

f(x) = (1 / sqrt(2 * pi * SD^2)) * exp(-(x - Mu)^2 / (2 * SD^2))

Substituting the mean by our parameter symbol Theta, whose value we need to estimate, and setting the standard deviation to 1, we obtain the following equation:

f(x | Theta) = (1 / sqrt(2 * pi)) * exp(-(x - Theta)^2 / 2)
Now we need to derive the likelihood function for a general normal distribution. Assume that we have a sample x1, ..., xn of i.i.d. random variables, that is, independent and identically distributed.
The general likelihood function is the product of the densities:

L(Theta | x1, ..., xn) = Π f(xi | Theta)

which in our case accordingly becomes

L(Theta | x1, ..., xn) = (2 * pi)^(-n/2) * exp(-(1/2) * Σ (xi - Theta)^2)

Differentiating the log-likelihood with respect to Theta and setting the result to zero gives

d/dTheta log L(Theta) = Σ (xi - Theta) = 0
Finally, we have

Theta(MLE) = (1/n) * Σ xi

This answer is logically sound: the maximum likelihood estimate of Theta is indeed the sample mean.
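
A quick numerical check in R, using a simulated sample (true mean 2, sd fixed at 1):

> set.seed(1)    # for reproducibility
> x <- rnorm(100, mean = 2, sd = 1)
> negloglik <- function(theta) -sum(dnorm(x, mean = theta, sd = 1, log = TRUE))
> optimize(negloglik, interval = c(-10, 10))$minimum    # matches mean(x)
> mean(x)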

Statistical Inference

Introduction

Statistical inference is the second broad category of statistics, which is divided into two categories: descriptive statistics and statistical inference (also called inferential statistics).
Statistical inference is the act of drawing conclusions or forming hypotheses from collected observational data.
Statistical inference also describes a set of procedures that can be used to make predictions about future outcomes of systems affected by random variation. There are initial requirements for a system to be considered for statistical inference studies: the system should produce reasonable answers when applied to well-defined situations, and these answers should generalize to other situations.
We use statistical inference to understand the behavior of samples in order to learn about the behavior of the population, which is often too large to observe completely or inaccessible to the researcher.
Statistical inference needs to use and/or derive a statistical model for the data in hand, from which future predictions are derived; the success of any statistical inference task truly depends on the chosen statistical model, which is at the heart of the task.
Statistical inference can be divided into two parts.
1)      What is needed for a statistical inference test?
Statistical inference needs a statistical model of the random process that is supposed to have generated the data, which is known when randomization has been used.
2)      What is the output of a statistical inference test?
The output of a statistical inference test is a conclusion or a statistical proposition. These propositions or conclusions can be enumerated as follows:
·         An estimate: a particular value that best approximates some parameter of interest.
·         A confidence interval: an interval constructed using a dataset drawn from a given population of interest, so that, under repeated sampling of such datasets, the intervals would contain the true parameter value with probability at the stated confidence level.
·         Acceptance or rejection of a given hypothesis.
Descriptive statistics, by contrast, is a preliminary step of any data analysis. It only presents facts about the observed data, such as the correctness of the random sampling, how the data are distributed, and how the data deviate from their central average (the mean); in a general sense, it gives an overall summary of the data.
Any statistical analysis begins with describing the observed data, which gives a general intuition about the source of the data and how they are distributed in terms of the researcher's interests.

Also, in this preliminary step, the data analyst can curate the data, preprocess them, and prepare them for a specific statistical inference test.


Hypothesis Testing
Definition
Hypothesis testing, or significance testing, is a method by which we test a claim or a hypothesis about a parameter in a population, using data measured in a sample. In this method, we test a hypothesis by determining the likelihood that a sample statistic could have been selected if the hypothesis regarding the population parameter were true.
The method of hypothesis testing can be summarized in four steps, as follows.
1)      First, we identify a hypothesis or claim that we feel should be tested.
For example, we might want to test the claim that the mean number of hours of TV that children in the United States watch is 3 hours per week.
2)      We select a criterion upon which we decide whether the claim being tested is true or not.
For example, the claim is that children watch 3 hours of TV per week.
3)      We select a random sample from the population and measure the sample mean.
4)      We compare what we observe in the sample to what we expect to observe if the claim we are testing is true. We expect the sample mean to be around 3 hours. If the discrepancy between the sample mean and the population mean is small, then we will likely decide that the claim is indeed true. If the discrepancy is too large, then we will likely reject the claim as not true.
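
As a sketch of these four steps in R, using a small hypothetical sample of weekly TV hours and a one-sample t test:

> hours <- c(2.5, 3.1, 4.0, 2.8, 3.6, 2.9, 3.3, 2.2, 3.8, 3.0, 2.7, 3.4)    # hypothetical
> t.test(hours, mu = 3)    # tests H0: population mean = 3 hours per week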

Decision theory: I tried to learn about decision theory, but I could not understand it. It is not covered in the course, and I had a hard time figuring out what exactly it is; it is related to game theory, about which I don't have any clue.

Parameter Estimation
In mathematical modeling, hypotheses about the structure and inner workings of the behavioral process of interest are stated in terms of parametric families of probability distributions called models. The goal of modeling is to deduce the form of the underlying process by testing the viability of such models.
Once a model is specified with its parameters for the research question of interest, we can begin to evaluate its goodness of fit, that is, how well it fits the observed data. Goodness of fit is assessed by finding parameter values of the model that best fit the data; this procedure is called parameter estimation.
Types of parameter estimation
1.      Least-squares estimation (LSE): in this type we minimize the discrepancy between the observed values and the values predicted by the model. It is highly associated with regression analysis, where we minimize the residual sum of squares, defined by
RSS = Σ (yi - ŷi)^2
where ŷi is the model's predicted value for observation i. LSE is useful for obtaining a descriptive measure that summarizes the observed data, but on its own it gives no basis for testing hypotheses or constructing confidence intervals.

2.       Maximum likelihood estimation (MLE)
This is a widely used method of parameter estimation in statistics; it has many optimal properties, as follows.
A.      Sufficiency: complete information about the parameter of interest is contained in its MLE estimator.
B.      Consistency: the true parameter value that generated the data is recovered asymptotically, i.e., with data from sufficiently large samples.
C.      Efficiency: the lowest possible variance of parameter estimates is achieved asymptotically.
D.      Parameterization invariance: the same MLE solution is obtained independent of the parameterization used.
E.      MLE is a prerequisite for the chi-square test, the G-square test, Bayesian methods, inference with missing data, and the modeling of random effects.


Example
Binomial Distribution
Let us consider one observation and one parameter of the binomial distribution. Suppose that y represents the number of successes in a sequence of 10 Bernoulli trials (tossing a coin 10 times), and that the probability of a success on any one trial, represented by the parameter w, is 0.2. The pdf in this case is

f(y | w = 0.2) = C(10, y) * (0.2)^y * (0.8)^(10 - y)
Likelihood function
Given a dataset and the model of interest, the likelihood function finds, among all the probability densities that the model prescribes, the one most likely to have produced the data.
The likelihood function here is
L(w | y) = f(y | w), which represents the likelihood of the parameter w given the observed data y; it is truly a function of w.
We can use the log-likelihood function instead of the likelihood itself because the two are monotonically related to each other.
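
A short R sketch showing that the likelihood and the log-likelihood peak at the same w; here the observation y = 2 (out of 10 trials) is hypothetical:

> w <- seq(0.01, 0.99, by = 0.01)
> lik <- dbinom(2, size = 10, prob = w)    # likelihood of w for y = 2 successes
> w[which.max(lik)]    # maximizer of the likelihood
[1] 0.2
> w[which.max(log(lik))]    # the log-likelihood peaks at the same w
[1] 0.2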


Tuesday, April 30, 2013

Initial Introduction To Hacking The Genome

Gentle Introduction

This blog will offer a series of articles on how to develop bioinformatics algorithms for genomic sequence data, in order to mine the knowledge lurking inside genome sequences and gain deeper insight into the molecular functions underlying disease.

Then, we will extend our coverage to include the following set of topics:

  • NGS data analysis
  • Biological network analysis
  • How to analyze biological data in R (a statistical programming language)
  • Metabolomics: using enzyme kinetics and ordinary differential equations to describe any metabolic network

In this series of articles, we are going to introduce a scripting language that is new to this field and used by few scientists: Groovy. Groovy is a dynamic, loosely typed scripting language that runs on top of the Java Virtual Machine (JVM). It combines the power of scripting languages like Perl and Python with the extremely flexible programming framework of Java: you can use any code written in Java from Groovy and vice versa. We will give a gentle introduction to the building blocks of this language as we move on.

We will also utilize the latest technologies for analyzing genomic data, introducing Hadoop and grid computing into computational biology/bioinformatics.

I hope you will enjoy this series as much as we will enjoy writing it.

Sincerely Yours;
Mohamed Ibrahim
Computational Biologist