clc;
clear;
format compact
echo on;
%Tutorial 5

%In this tutorial we'll study the error propagation in estimating
%the equilibrium constant for a gas phase reaction. Suppose that an
%undergraduate researcher is studying the following reaction:

% 2NO(g) + 2H2(g) <--> N2(g)+2H2O(g)

%The student measured the concentrations once the system reached
%equilibrium. The average measurements are the following:

H2O=1.38
N2=0.19
NO=0.62
H2=0.12

%The equilibrium constant can be calculated as:

%K = [N2]*[H2O]^2 / [NO]^2*[H2]^2

%We write a little function 'get_constant.m' to use these values to
%calculate the equilibrium constant.  It will be convenient to write this
%function to take in all these values as an array 'concentrations':
concentrations =[H2O,N2,NO,H2];

%Lets calculate the equilibrium constant
the_constant =get_constant(concentrations)

%We want to calculate the error in the equilibrium constant.  Our
%researcher estimated the uncertainty in each of these concentration
%measurements, either by doing multiple measurements of each, or by other
%means.  We put these standard deviations in the vector sigma.
%
%sigma contains the magnitude of the std deviation for the water, Nitrogen,
%nitrous oxide and Hydrogen concentrations, respectively.

sigma = [0.005,0.003,0.004,0.006];

%For the error calculation we will do a numerical derivative to get the 
% gradient of the equilibrium constant with respect to each of the 
%concentrations. We'll use a forward difference scheme.

%Forward difference scheme:
% dy/dx ~= (y2-y1)/(x2-x1);

%Where y2 is given by y2=f(x1+dx), dx is a small number.  Because the
%magnitude of all the x(i)'s are different, we choose a dxi which scales
%with the sigma of each variable.  Thus:

ep=10^-4; %Small value

gradf=zeros(size(concentrations)); %We initialize the array.
m=length(concentrations);
for i=1:m
    xt=concentrations; %Note that we have to put this -inside- the loop!
    dxi=ep*sigma(i);
    xt(i)=xt(i)+dxi; %We increase the ith measurement by a factor dx
    new_constant=get_constant(xt);
    gradf(i)=(new_constant-the_constant)/dxi;
    echo off
end
echo on

%The calculated error is just that obtained from the error propagation
%formula.  The measurements are considered independent, so:

varconst=gradf*diag(sigma.^2)*gradf';

%and we get the 1 sigma error bars:

error=varconst^.5

%The error is pretty large, so we want to check to see if it is still
%normally distributed (e.g., that our error propagation formula - that
%ignores higher order terms - is still valid).  We can do this using a
%simple simulation.

%We initialize the array of simulated concentration measurements:
n=4000;
concentration_matrix=zeros(n,length(concentrations));

for i=1:m %Loop over each measurement
    concentration_matrix(:,i)=concentrations(i)+randn(n,1)*sigma(i);
    echo off
end
echo on

%We have written the function so that it can take in all the concentrations
%for all the simulated experiments at once!  Thus we get:

eqmconstants=sort(get_constant(concentration_matrix));

%Where we sort the calculated values from smallest to largest.  We can use
%this to plot up the cumulative probability and compare it to a gaussian
%distribution:
cumprob=[.5:n-.5]/n;

%We have the calculated normal distribution from the error propagation
%formula too:
cumprobest=(0.5 + erf((eqmconstants-the_constant)/error/sqrt(2))/2);
figure(1)
plot(eqmconstants,cumprob,'b',eqmconstants,cumprobest,'r')
xlabel('equilibrium constant')
ylabel('cumulative probability')
legend('simulation','error calculation','Location','SouthEast')
grid on

%Comparison of the two results shows that the error formula is a bit off.
%We can look at the 95% (2 sigma) confidence intervals.  That calculated
%from the error formula is:

conf_interval=[the_constant-2*error,the_constant+2*error]

%While that from the simulation is:
sim_interval=[eqmconstants(round(n/40)),eqmconstants(round(n*39/40))]

%Thus, the interval is shifted slightly to higher equilibrium constants.
%This deviation would vanish for smaller values of measurement error
%(divide sigma by 10 and re-run the code to see this).  In general, the
%error calculation is even more uncertain than the mean (error is a random
%variable too!) so this deviation may not be significant for these
%parameters.
%

%As a final note, we have assumed that the measurement errors are all
%independent.  This is probably incorrect, as all would likely be measured
%by the same GC.  Thus, they may well be positively covarying, and the
%matrix of covariance of the concentration measurements won't be diagonal.
%This would change the calculated result - either increasing or decreasing
%the uncertainty.  Modeling the matrix of covariance of the measurements is
%often quite tricky...