clear
reset(0)
echo on
format compact
%This example demonstrates the consequences of
%systematic error in parameter estimation and
%error calculations. Suppose we are trying to
%determine the position of a ball at some time
%tp. It is proposed that we do this by making
%a series of measurements at different times t
%near this value, fitting a line to the data, and
%then evaluating this line at time tp. We use
%the standard error-propagation formulae to determine
%the error in our prediction.
pause
%First we generate the data set artificially:
t=[0:.1:5]';
c=[.1 1 1]';
signoise=.1;
b=[t.^2,t,ones(size(t))]*c+signoise*randn(size(t));
%and plot it:
figure(1)
plot(t,b,'o')
set(gca,'FontSize',18)
xlabel('time')
ylabel('position')
pause
%Next we fit a line to the data:
a=[t,ones(size(t))];
k=(a'*a)\a';        %the least-squares generalized inverse
x=k*b;
figure(1)
hold on
plot(t,a*x,'g')
hold off
%which fits the data pretty well.
pause
%We also calculate the error, making the usual
%assumptions that 1) the error is random, and 2)
%the model itself is perfect.
r=b-a*x;
n=length(t);
varb=r'*r/(n-2);    %residual variance (two fitted parameters)
varx=k*k'*varb;     %covariance matrix of the parameters
sigx=sqrt(diag(varx));
%This yields the slope and intercept, each with its one-SD error:
[x,sigx]
pause
%The predicted value at tp = 2.5, with its error, is:
tp=2.5;
ap=[tp,1];
bp=ap*x;
bpvar=ap*varx*ap';
[bp,sqrt(bpvar)]
%vs. the actual value:
actual=[tp^2,tp,1]*c
%which is off by:
abs(bp-actual)/sqrt(bpvar)
%standard deviations - which is very large!
pause
%The explanation is that we have dramatically
%underestimated our standard deviation by assuming that
%the error is purely random - we have ignored the systematic
%error! We can see that the systematic error is significant
%by plotting the residuals:
figure(2)
plot(t,r,'og')
set(gca,'FontSize',18)
xlabel('time')
ylabel('residual')
pause
%We can quantify the degree of non-randomness with the
%normalized covariance of adjacent residuals (their lag-1
%autocorrelation):
covariance=r(1:n-1)'*r(2:n)/varb/(n-1)
%which is close to unity. This value would be much smaller if the
%residuals were, in fact, random.
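%As a cross-check (an addition, not part of the original
%demonstration), MATLAB's built-in polyfit/polyval pair can
%reproduce the straight-line fit and supply an error estimate.
%Note that delta below estimates the error of a *new
%observation*, so it includes the measurement noise and is
%somewhat larger than the parameter-only prediction error:
[p,S]=polyfit(t,b,1);          %same straight-line fit
[bp2,delta]=polyval(p,tp,S);   %prediction and its standard error
[bp2,delta]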
pause
%Judging from the plot of residuals, our error cannot be
%calculated correctly using the standard statistical formulae.
%The deviation between the model and the data is not random
%error: we have left something out of the -model-!
%In this case we get a better estimate (and error estimate) if
%we include the quadratic term in our model:
a=[t.^2,t,ones(size(t))];
ap=[tp^2,tp,1];
k=(a'*a)\a';
x=k*b;
figure(1)
hold on
plot(t,a*x,'r')
hold off
%which is pretty similar to the linear fit.
pause
%Now for the error calculation and residual:
r=b-a*x;
varb=r'*r/(n-3);    %now three fitted parameters
varx=k*k'*varb;
bp=ap*x;
bpvar=ap*varx*ap';
%We have the new predicted value:
[bp,sqrt(bpvar)]
%vs. the actual value:
actual=[tp^2,tp,1]*c
%which is off by:
abs(bp-actual)/sqrt(bpvar)
%standard deviations - which is now order one!
pause
%We can also plot the new residual:
figure(2)
hold on
plot(t,r,'or')
hold off
%which is now random, as was assumed in our
%error calculation. We can also look at the
%normalized covariance:
covariance=r(1:n-1)'*r(2:n)/varb/(n-1)
%In conclusion, *always* plot your residuals when
%doing curve fitting and error calculations!
pause
%In general, if you ignore systematic error in the
%residual, you will -overestimate- the error in the data points
%themselves, but you will (often dramatically) -underestimate-
%the error in the model parameters. That can lead to
%completely incorrect conclusions from your experiments!
%
% DON'T MAKE THIS ERROR!!!
echo off