clear reset(0) echo on %This program illustrates the calculation of error %in linear regression. In my laboratory we %developed a separation process in which the %transport of solute molecules is selectively %enhanced by oscillating fluid along the length of %a pore or a fiber in a membrane. The paper describing %these experiments appeared in AIChE Journal. pause %The goal of the technique was to 1) greatly increase %mass transfer rates, and 2) enhance the transport of %one species more than the other, so that a net %separation would occur. The application was for %enantiomeric separations, but we tested it using %1-butanol and t-butanol as their concentrations %could be easily resolved using a GC. pause %In our data analysis we wish to answer the following %questions: %1) Does the amplitude of the enhancement scale as %the square of the tidal volume, within error? %2) What is the relative enhancement (selectivity) at %an intermediate tidal volume, with error? %3) Does the relative enhancement depend on the tidal %volume, within error? pause %We obtained the following data for enhancement %in the transport of 1-butanol and t-butanol as a %function of tidal displacement (the amplitude of %oscillation): tidal=[.0544;.102;.131;.276;.132;.131;.161;.163;.182;.181]; enhancetbut=[.287;1.15;1.45;8.17;1.44;1.58;2.29;2.19;2.88;2.04]*10^4; enhance1but=[.407;1.72;2.18;10.06;1.93;2.18;3.17;3.08;4.04;2.66]*10^4; pause %Let's display this in matrix form: format short e [tidal,enhance1but,enhancetbut] format short pause %Let's plot this up: figure(1) loglog(tidal,[enhance1but,enhancetbut],'o') set(gca,'FontSize',18) xlabel('tidal displacement') ylabel('enhancement in transport') legend('1-butanol','t-butanol') %As you can see from this plot, we have a power law %dependence of the enhancement on the tidal displacement. %Also, the enhancement for 1-butanol is greater than %that of t-butanol, forming the basis for separation. pause %From theory we expect the enhancement to go as the %square of the tidal displacement. We wish to determine %if this is satisfied by the data to within error, and %to determine the ratio of the enhancements, again with %error limits. Let's do this using linear regression. pause %We expect a power law dependence on tidal displacement, %thus we must first convert the model from this non-linear %form to a linear form: % % enhancement = c1 * tidal ^ c2 % %becomes: % % log(enhancement) = log(c1) + c2 * log(tidal) % %The modeling parameters are thus the log of the amplitude %and the exponent. pause %We can use this to set up the matrix A (it will be the %same for both 1 and t butanol): n=length(tidal); a=[ones(n,1),log(tidal)] pause %and the two right hand sides: b1=log(enhance1but); bt=log(enhancetbut); %or for display we can put these together: bcomb=[b1,bt] pause %Using the normal equations we can solve for the best %fit parameters. We shall solve the problem via the %relation x = inv(A'*A)*A' * b as described in class. %Thus: k=inv(a'*a)*a'; %Let's print out the transpose of this matrix: k' pause %We can actually solve for both of the right hand side %vectors at the same time: x=k*bcomb; x1=x(:,1) xt=x(:,2) pause %As can be seen, the exponent for both sets of data is %similar, and close to the expected value of 2.0. We can %compare this to the data: hold on plot(tidal,exp(a*x)) hold off %which fits the data pretty well. pause %Now we calculate the error in the measurements and the error %in the slope. First we get the error in the measurements. %This is just the square of the 2-norm of the residual, %divided by n-2 because we have two fitting parameters: r=a*x-bcomb pause %Note that this got the residual for both data sets at the %same time!. The variance is: var=sum(r.*r)/(n-2); std1=var(1)^.5 stdt=var(2)^.5 %thus the error in both 1 and t butanol measurements are %pretty similar. Note that the error given here is actually %the error in the logarithm of the enhancement, since that %is what we are fitting the model to. pause %Now we calculate the error this causes in the calculated %exponent. We get the exponent using the second column of %the matrix K, thus: varexp=k(2,:)*k(2,:)'*var; stdexp1=varexp(1)^.5 stdexpt=varexp(2)^.5 %Thus both lie within about one standard deviation of the %expected value of 2.0. pause %Now let's look at the ratio of the transport rates %since this is what determines the selectivity. Because the %exponents are slightly different for the two enhancements, it %is reasonable to compare the enhancements in the middle of the %range over which measurements are made. Thus, we compare the %enhancement at a tidal displacement of 0.15. We now calculate %both the enhancement and the error in the enhancement for each %species investigated at that tidal displacement. Thus: pause enhance=exp([1,log(.15)]*x); enhance1=enhance(1) enhancet=enhance(2) ratio=enhance1/enhancet %so there is some selectivity provided by the oscillation scheme. pause %Now for the error in each enhancement at a tidal displacement %of 0.15. We can get at this by using the vector formula described %in class: avec=[1;log(0.15)]; prod=(k'*avec)'*(k'*avec); predvar=prod*var pause %This is the variance in the predicted values - the logarithm %of the predicted enhancements for 1 butanol and t butanol. If %we want the error in the actual enhancement itself, we have to %use the formula for error propagation for a function of a %random variable. In this case the enhancement is just the %exponential of the point predicted by the fitted curve, so the %error calculation is extremely simple. The error in the -logarithm- %of a random variable is approximately equal (for small variations) %to the relative error of the variable itself! Thus: pause relpredstd1=predvar(1)^.5 relpredstdt=predvar(2)^.5 %where these quantities represent the fractional standard deviation %in the enhancement (the standard deviation by the enhancement). %The deviation in each of the predicted enhancements is rather %small, only a bit more than 5% of the predicted enhancement. pause %If we wanted to find out the error in the ratio, we could use the %standard formula for the error in a ratio assuming that the %variability in the two enhancements was independent. This would %give for the error: pause ratio relratiostd=sum(predvar)^.5 %which is pretty large considering that the deviation of the ratio %in enhancement from unity (which provides the selectivity of the %process) is only 0.38. This error, however,is actually incorrect! %This is because there is a significant degree of covariance between %the scatter in the observed enhancements of 1 and t butanol. You %can see this from the original plot of the data, noticing that while %the enhancement about each of the fitted lines has a fair amount of %scatter, the separation between the 1 and t butanol data is pretty %consistent, indicating a constant ratio. How do we deal with this %covariance? pause %The simplest way of doing this is to go back to the original data %and fit a power law model directly to the ratio, rather than the %individual enhancements. In this case, the expected power law %exponent should be zero (the dependence on tidal displacement cancels %out) and the modeling function, evaluated at a tidal displacement %of 0.15 should yield both the correct value and the correct error. %Let's do this: pause %First we set up the new right hand side for the regression problem: bratio=b1-bt %Note that subtraction of logarithms is equivalent to division in %the original enhancement. pause %The regression matrix remains unchanged, so the solution to the %regression problem is just: xratio=k*bratio pause %Let's plot this up: plot(tidal,exp(bratio),'o') set(gca,'FontSize',18) xlabel('tidal displacement') ylabel('selectivity') %We want to compare the data to the power-law model. We clean things %up a bit using the "sort" command: fit=exp(a*xratio); [junk,i]=sort(tidal); %We're after the index i of sorted tidal volumes. hold on plot(tidal(i),fit(i)) hold off %So from the plot we see that the selectivity is a slightly decreasing %function of the tidal displacement, in disagreement with the theory. %Let's see if this discrepancy is statistically significant. pause %To do this we calculate the variability in the ratio (actually the %log of the ratio, remember): ratiovar=(bratio-a*xratio)'*(bratio-a*xratio)/(n-2); %And thus we get the error in the two coefficients: xratiovar=sum(k'.^2)'*ratiovar; xratio(2) expstd=xratiovar(2)^.5 pause %So the decrease in the selectivity with tidal displacement is %significant at the 97.5% confidence level. This is because the %probability of the true exponent being more than two standard %deviations -above- the observed value is about 2.5%. Note that %there is also a 2.5% probability that the exponent is more than %2 sigma -below- the observed mean as well, so the 2 sigma confidence %-interval- is the 95% confidence interval. pause %OK, now let's get the selectivity ratio at the intermediate tidal %displacement: exactratio=exp(avec'*xratio) prod=(k'*avec)'*(k'*avec); exactvar=prod*ratiovar; exactratiorelstd=exactvar^.5 %So the true error in the selectivity ratio is five times smaller %than what we calculated by ignoring the covariance!! pause %The conclusion from all of this is that you have to be very careful %in analyzing your data. If we had just looked at the original slopes %of the enhancement vs. tidal displacement plots, with error, we would %conclude that both of the enhancements were well within statistical %uncertainty of the theoretically predicted slope of two, and even %more so close to each other. This last part would have led us to %the conclusion that selectivity is not affected by the tidal %displacement. A more precise statistical analysis shows just the %opposite - that there is a small but statistically significant %decrease in selectivity with stroke volume! pause %In conclusion, statistics is a rather slippery subject, and you %must understand -exactly- what you are trying to calculate to do it %right. In particular, you must understand how all of the calculated %variables relate to each other, and whether there is any covariance. %Finally, remember that all of this deals with random error only - %if the number of data points is large, the overall error will almost %certainly be dominated by the systematic error in the experiment, %about which these statistical studies can provide no information. pause echo off