clear
echo on
%This program illustrates the calculation of error
%in linear regression.  In my laboratory we developed
%a separation process in which the transport of solute 
%molecules is selectively enhanced by oscillating fluid 
%along the length of a pore or a fiber in a membrane. 
%These experiments are described in AIChE J 42(4), 940?952
%1996.
%
%We obtained the following data for enhancement
%in the transport of 1-butanol and t-butanol as a
%function of tidal displacement (the amplitude of
%oscillation):
tidal=[.0544;.102;.131;.276;.132;.131;.161;.163;.182;.181];
enhancetbut=[.287;1.15;1.45;8.17;1.44;1.58;2.29;2.19;2.88;2.04]*10^4;
enhance1but=[.407;1.72;2.18;10.06;1.93;2.18;3.17;3.08;4.04;2.66]*10^4;
%Let's display this in matrix form:
format short e
[tidal,enhance1but,enhancetbut]
format short
pause
%Let's plot this up:
figure(1)
set(gca,'FontSize',18) 
loglog(tidal,[enhance1but,enhancetbut],'o')
xlabel('tidal displacement')
ylabel('enhancement in transport')
legend('t-butanol','1-butanol','Location','NorthWest')
grid on

%As you can see from this plot, we have a power law
%dependence of the enhancement on the tidal displacement.
%Also, the enhancement for 1-butanol is greater than
%that of t-butanol, forming the basis for separation.
pause
%We want to analyze the performance of this separation method.
%We are going to ask the following questions:
%
%1. Theory predicts that the enhancement should vary as the
%square of the tidal displacement.  Is this consistent with
%the data to within random error?
%
%2. The idea is that 1-butanol should experience a greater
%enhancement in mass transfer than t-butanol.  Is the observed
%ratio statistically significant?
%
%3. Does the tidal volume have a statistically significant
%effect on the ratio of the tidal displacements?
%
%To answer these questions we will need to do linear regression
%and error propagation!
pause
%We expect a power law dependence on tidal displacement,
%thus we must first convert the model from this non-linear
%form to a linear form:
%
%   enhancement = c1 * tidal ^ c2
%
%becomes:
%
%   log(enhancement) = log(c1) + c2 * log(tidal)
%
%The modeling parameters are thus the log of the amplitude
%and the exponent.
pause
%We can use this to set up the matrix A (it will be the
%same for both 1 and t butanol):
n=length(tidal);
a=[ones(n,1),log(tidal)]

%and the two right hand sides:
b1=log(enhance1but);
bt=log(enhancetbut);
%or for display we can put these together:
bcomb=[b1,bt]

%Using the normal equations we can solve for the best
%fit parameters.  We shall solve the problem via the
%relation x = inv(A'*A)*A' * b as described in class.
%Thus:
k=inv(a'*a)*a';
%Let's print out the transpose of this matrix:
k'
pause
%We can actually solve for both of the right hand side
%vectors at the same time:
x=k*bcomb;
x1=x(:,1)
xt=x(:,2)
pause
%As can be seen, the exponent for both sets of data is
%similar, and close to the expected value of 2.0.  We can
%compare this to the data:
figure(1)
hold on
plot(tidal,exp(a*x))
hold off
%which fits the data pretty well.
pause
%Now we calculate the error in the measurements and the error
%in the slope.  First we get the error in the measurements.
%This is just the square of the 2-norm of the residual,
%divided by n-2 because we have two fitting parameters:
r1=a*x1-b1
%and
rt=a*xt-bt
pause
%Note that this got the residual for both data sets at the
%same time!.  The variance is:
var1=norm(r1)^2/(n-2);
vart=norm(rt)^2/(n-2);
std1=var1^.5
stdt=vart^.5
%thus the error in both 1 and t butanol measurements are
%pretty similar.  Note that the error given here is actually
%the error in the logarithm of the enhancement, since that
%is what we are fitting the model to.
pause
%Now we calculate the error this causes in the calculated
%exponent.  We get the matrix of covariance:

varfit1=k*var1*k';
varfitt=k*vart*k';
stdexp1=varfit1(2,2)^.5
stdexpt=varfitt(2,2)^.5
%We can use these to calculate the z-statistic for each:
z1=(x1(2)-2)/stdexp1
zt=(xt(2)-2)/stdexpt
%Which are both less than 2, so there is no statistically
%significant difference from the expected exponent of 2 at
%the 95% confidence level.
pause
%Next we want to look at the ratio of the transport rates
%since this is what determines the selectivity.  Because the
%exponents are slightly different for the two enhancements, it
%is reasonable to compare the enhancements in the middle of the
%range over which measurements are made.  Thus, we compare the
%enhancement at a tidal displacement of 0.15.  We now calculate
%both the enhancement and the error in the enhancement for each
%species investigated at that tidal displacement.  Thus:
enhance1=exp([1,log(.15)]*x1)
enhancet=exp([1,log(.15)]*xt)
ratio=enhance1/enhancet
%so there is some selectivity provided by the oscillation scheme.
pause
%Now for the error in each enhancement at a tidal displacement
%of 0.15.  We can get at this by using the vector formula described
%in class:
avec=[1,log(0.15)];
predvar1=avec*varfit1*avec'
predvart=avec*varfitt*avec'
pause
%This is are the variance in the predicted values - the logarithm
%of the predicted enhancements for 1 butanol and t butanol.  If
%we want the error in the actual enhancement itself, we have to
%use the formula for error propagation for a function of a 
%random variable.  In this case the enhancement is just the
%exponential of the point predicted by the fitted curve, so the
%error calculation is extremely simple.  The error in the -logarithm-
%of a random variable is approximately equal (for small variations)
%to the relative error of the variable itself!  Thus:

relpredstd1=predvar1^.5
relpredstdt=predvart^.5
%where these quantities represent the fractional standard deviation
%in the enhancement (the standard deviation by the enhancement).
%The deviation in each of the predicted enhancements is rather
%small, only a bit more than 5% of the predicted enhancement.
pause
%If we wanted to find out the error in the ratio, we could use the
%standard formula for the error in a ratio assuming that the
%variability in the two enhancements was independent.  This would
%give for the error:

ratio
relratiostd=(predvar1+predvart)^.5
%which is pretty large considering that the deviation of the ratio
%in enhancement from unity (which provides the selectivity of the
%process) is only 0.38.  This error, however,is actually incorrect!
%This is because there is a significant degree of covariance between
%the scatter in the observed enhancements of 1 and t butanol.  
pause
%To see this, let's look at the residual of the data:
figure(2)
plot(rt,r1,'o')
xlabel('t-butanol residual')
ylabel('1-butanol residual')
grid on
%As you can see, the residuals of 1-butanol and t-butanol are almost
%perfectly covarying.  Because the selectivity is the ratio of the
%enhancements, the covariance will greatly reduce the random error
%of the enhancement, and our error estimate is much too high.  OK, 
%so how do we fix this?
pause
%The simplest way of doing this is to go back to the original data
%and fit a power law model directly to the ratio, rather than the
%individual enhancements.  In this case, the expected power law
%exponent should be zero (the dependence on tidal displacement cancels
%out) and the modeling function, evaluated at a tidal displacement
%of 0.15 should yield both the correct value and the correct error.
%Let's do this:
pause
%First we set up the new right hand side for the regression problem:
bratio=b1-bt;
%Note that subtraction of logarithms is equivalent to division in
%the original enhancement.

%The regression matrix remains unchanged, so the solution to the
%regression problem is just:
xratio=k*bratio
pause
%Now we want to determine if the ratio is statistically significant.
%To do this, we first need to calculate the uncertainty in the two
%fitting parameters, and then evaluate the error in the ratio at an
%intermediate tidal volume.
%
%We have the residual:
ratiovar=norm(bratio-a*xratio)^2/(n-2);
%and thus:
varxratio=k*ratiovar*k'
%evaluating this at an intermediate tidal volume yields:
ratiopred=avec*xratio
%and the standard deviation of:
ratiopredstd=(avec*varxratio*avec')^.5
%Which is much smaller!
pause
%Let's plot this up:
figure(3)
set(gca,'FontSize',18)
loglog(tidal,exp(bratio),'o')
grid on
xlabel('tidal displacement')
ylabel('selectivity')
hold on
loglog(tidal,exp(a*xratio))
axis([.05 .3 1 2])
hold off
%So from the plot we see that the selectivity is a slightly decreasing
%function of the tidal displacement, in disagreement with the theory.
%Let's see if this discrepancy is statistically significant.
pause
%To do this, we test if the slope of the line is significantly different
%from zero.  We get a z-statistic for the null hypothesis of:
zratio=xratio(2)/varxratio(2,2)^.5
%Which is larger than 2 in magnitude.  To get the probability of this
%occuring, we need the t-statistic for 8 degrees of freedom:
prob=tcdf(zratio,n-2)
%which is 3%.  Thus, there is a 97% chance that the decrease in
%selectivity with tidal volume really is statistically significant!
pause
%The conclusion from all of this is that you have to be very careful
%in analyzing your data.  If we had just looked at the original slopes
%of the enhancement vs. tidal displacement plots, with error, we would
%conclude that both of the enhancements were well within statistical
%uncertainty of the theoretically predicted slope of two, and even
%more so close to each other.  This last part would have led us to
%the conclusion that selectivity is not affected by the tidal
%displacement.  A more precise statistical analysis shows just the
%opposite - that there is a small but statistically significant
%decrease in selectivity with stroke volume!
pause
%In conclusion, statistics is a rather slippery subject, and you
%must understand -exactly- what you are trying to calculate to do it
%right.  In particular, you must understand how all of the calculated
%variables relate to each other, and whether there is any covariance.
%Finally, remember that all of this deals with random error only - 
%if the number of data points is large, the overall error will almost
%certainly be dominated by the systematic error in the experiment, 
%about which these statistical studies can provide no information.
echo off