clear
echo on
format compact
%This example demonstrates the consequences of
%systematic error in parameter estimation and
%error calculations.  Suppose we are trying to
%determine the position of a ball at some time
%tp.  It is proposed that we do this by making
%a series of measurements at different times t
%near this value, fitting a line to the data, and
%then evaluating this line at time tp.  We shall
%use standard error propagation formulae to determine
%the error in our prediction.
pause

%First we generate the data set artificially:
t=[0:.1:5]';
c=[.1 1 1]';
signoise=.15;
b=[t.^2,t,ones(size(t))]*c+signoise*randn(size(t));
%and we plot this up:
figure(1)
plot(t,b,'o')
set(gca,'FontSize',18)
xlabel('time')
ylabel('position')
pause

%Next we fit a line to the data:
a=[t,ones(size(t))];
k=inv(a'*a)*a';        %least-squares operator (pseudoinverse of a)
x=k*b;                 %fitted parameters: [slope; intercept]
figure(1)
hold on
plot(t,a*x,'g')
hold off
%which fits the data pretty well.
pause

%We also calculate the error, making the usual assumptions
%that 1) the errors are random and uncorrelated, and 2) the
%model itself is perfect.
r=b-a*x;               %residuals
n=length(t);
varb=r'*r/(n-2);       %estimated data variance (n-2: two fitted parameters)
varx=k*k'*varb;        %covariance matrix of the fitted parameters
sigx=diag(varx).^.5;   %one-SD errors in the parameters
%This yields the slope and intercept with one SD error:
[x,sigx]
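%(Aside: since k*k' = inv(a'*a)*(a'*a)*inv(a'*a) = inv(a'*a), the
%parameter covariance can equivalently be computed without forming k:
varxalt=varb*inv(a'*a);   %identical to varx above
%This form holds for any least-squares design matrix a.)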
pause

%The predicted value at tp = 2.5, with error, is:
tp=2.5;
ap=[tp,1];
bp=ap*x;               %predicted position at tp
bpvar=ap*varx*ap';     %propagated variance of the prediction
[bp,bpvar.^.5]
%vs. the actual value:
actual=[tp^2,tp,1]*c
%which is off, measured in predicted standard deviations, by:
abs(bp-actual)/bpvar^.5
%which is very large!
pause

%The explanation for this is that we have dramatically
%underestimated our standard deviation by assuming that
%the error is simply random - we have ignored our systematic
%error!  We can see that the systematic error is significant
%by plotting the residuals:
figure(2)
plot(t,r,'og')
set(gca,'FontSize',18)
xlabel('time')
ylabel('residual')
pause

%We can quantify the degree of non-randomness by looking at the
%normalized covariance (lag-1 autocorrelation) of adjacent residuals:
covariance=r(1:n-1)'*r(2:n)/varb/(n-1)
%which is close to unity.  This value would be much smaller if the 
%data were, in fact, random.
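%As a quick check of that claim, an artificial residual of purely
%random noise (same length, same nominal variance) typically gives
%a normalized covariance much closer to zero:
rtest=signoise*randn(n,1);
rtest(1:n-1)'*rtest(2:n)/(rtest'*rtest/(n-2))/(n-1)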
pause

%Judging from the plot of residuals, our error cannot be calculated
%correctly using our standard statistical formulae.  The deviation
%between the model and the data is not random error; it arises
%because we have left something out of the -model-!  In this case
%we can get a better estimate (and error estimate) if we include
%the quadratic term in our model:
a=[t.^2,t,ones(size(t))];
ap=[tp^2,tp,1];
k=inv(a'*a)*a';        %least-squares operator for the quadratic model
x=k*b;
figure(1)
hold on
plot(t,a*x,'r')
hold off
%which is pretty similar to the linear fit.
pause

%Now for the error calculation and residual:
r=b-a*x;
varb=r'*r/(n-3);       %n-3: three fitted parameters now
varx=k*k'*varb;
bp=ap*x;
bpvar=ap*varx*ap';
%We have the new predicted value:
[bp,bpvar.^.5]
%vs. the actual value:
actual=[tp^2,tp,1]*c
%which is off, measured in predicted standard deviations, by:
abs(bp-actual)/bpvar^.5
%which is now order one!
pause

%We can also plot up the new residual:
figure(2)
hold on
plot(t,r,'or')
hold off
%which is now random, as was assumed in our
%error calculation.  We can also look at the 
%normalized covariance:
covariance=r(1:n-1)'*r(2:n)/varb/(n-1)
%In conclusion, *always* plot your residuals when 
%doing curve fitting and error calculations!
pause
%In general, if you ignore systematic error in the 
%residual, you will -overestimate- the error in the data points
%themselves, but you will (often dramatically) -underestimate-
%the error in the modeling parameters.  That can lead to
%completely incorrect conclusions from your experiments!
%
%           DON'T MAKE THIS ERROR!!!
echo off