clear
format compact
echo on
%%The Bootstrap
%In this example, we demonstrate the bootstrap technique for estimating
%the covariance matrix. It is similar to the jackknife, except this time
%we analyze the data by taking random samples drawn, with replacement,
%from the data set itself - equivalently, from an infinitely replicated
%copy of it. The assumption is that our data are drawn from an infinite
%population of possible observations, and that the actual data set is a
%good representation of that population.
%We use the "ball in air" problem again:
t=[0:.02:1]';
%and we generate some "data":
xexact=[0,2,-2]'; %The exact values
a=[ones(size(t)),t,t.^2];
noise=0.05; %The amplitude of the noise
b=a*xexact+noise*randn(size(t)); %Our artificial data set
pause
%%Basic Regression
%We solve this using the usual regression formula:
k=inv(a'*a)*a';
x=k*b
%We calculate the error in the usual way:
r=a*x-b;
varb=r'*r/(length(r)-3); %Three degrees of freedom lost
sigb=varb^.5 %Which is (usually) close to the value of noise we put in
varx=k*varb*k'
%and in particular we have the 2-sigma confidence intervals of the x
%values:
xinterval=[x-2*diag(varx).^.5,x+2*diag(varx).^.5]
%which usually contain the exact values:
xexact
%Here the probabilities are governed by the t-distribution:
prob=tcdf(2,length(b)-3)-tcdf(-2,length(b)-3)
%So the 2-sigma probability is around 95% (the exact value depends on n-m).
%Now let's plot this up. We want some smooth plotting:
tp=[0:.01:1]';
ap=[ones(size(tp)),tp,tp.^2];
bp=ap*x;
sigbp=diag(ap*varx*ap').^.5;
figure(1)
plot(t,b,'o',tp,ap*xexact,'k',tp,bp,tp,bp+sigbp,':r',tp,bp-sigbp,':r')
set(gca,'FontSize',14)
xlabel('time')
ylabel('position')
legend('data','exact','model','1sig confidence interval','Location','South')
pause
%%The Bootstrap
%Now we look at how to estimate the covariance matrix using the bootstrap
%approach. We keep our matrix a, but we "sample" it nbs times. For this
%to work, nbs has to be a fairly large number.
[n m]=size(a);
xbssqall=zeros(m,m); %We initialize the array.
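%As a quick aside (not needed for the bootstrap itself), it helps to see
%what one resample looks like: ceil(rand(n,1)*n) draws n integers
%uniformly from 1..n *with* replacement, so some data points appear
%several times and others not at all. (In recent MATLAB versions,
%randi(n,n,1) is an equivalent, more direct idiom.)
ikeep_demo=ceil(rand(n,1)*n); %One example resample
ndistinct=length(unique(ikeep_demo)) %On average about (1-1/e)*n, i.e. ~63% of the points are distinct
pause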
xbsall=zeros(m,1);
nbs=100;
for i=1:nbs
 ikeep=ceil(rand(n,1)*n); %The indices we keep this time around
 a_bs=a(ikeep,:);
 b_bs=b(ikeep);
 xbs=a_bs\b_bs;
 xbsall=xbsall+xbs;
 xbssqall=xbssqall+xbs*xbs';
 echo off
end
echo on
%Now we calculate the mean and covariance matrix of these values:
xbs=xbsall/nbs
%And the covariance:
varxbs=(xbssqall-nbs*xbs*xbs')/(nbs-1)
%And that generates the matrix without having to assume anything about
%the covariance matrix of b - other than assuming that the data points
%are independent, of course.
pause
%Let's compare this to the values we obtained using the error propagation
%formula:
var_ratio=varxbs./varx
%Which is (usually) close to one - it will change every time you run it.
%If you have a large number of data points it will converge to one, as
%expected, but it takes about a thousand or so.
pause
%We can also add this to our plot:
bpbs=ap*xbs;
sigbpbs=diag(ap*varxbs*ap').^.5;
figure(2)
plot(t,b,'o',tp,ap*xexact,'k',tp,bp,tp,bp+sigbp,':r',tp,bpbs+sigbpbs,'--g',tp,bp-sigbp,':r',tp,bpbs-sigbpbs,'--g')
set(gca,'FontSize',14)
xlabel('time')
ylabel('position')
legend('data','exact','model','normal error','bootstrap error','Location','South')
pause
%The bootstrap is an interesting approach, but it suffers from the
%problem that you have to solve the problem a fairly large number of
%times to get a reasonable estimate of the variance. Even more than the
%jackknife, it runs into trouble with small data sets: there is a finite
%probability that all n data points it picks will be the same one,
%leading to a degenerate matrix and lots of warning messages! The
%variance calculated this way is usually higher than the "correct"
%variance because of this effect. If both n and nbs are large, however,
%the variances will be the same.
echo off
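echo on
%%Appendix: more bootstrap samples
%As a quick extra experiment (not part of the comparison above), we can
%repeat the bootstrap with nbs=1000. A larger nbs reduces the Monte Carlo
%noise in varxbs, so the ratio to the error-propagation result usually
%sits closer to one:
nbs=1000;
xbsall=zeros(m,1);
xbssqall=zeros(m,m);
for i=1:nbs
 ikeep=ceil(rand(n,1)*n);
 xbs=a(ikeep,:)\b(ikeep);
 xbsall=xbsall+xbs;
 xbssqall=xbssqall+xbs*xbs';
 echo off
end
echo on
xbs=xbsall/nbs;
varxbs=(xbssqall-nbs*xbs*xbs')/(nbs-1);
var_ratio=varxbs./varx %Usually (not always) closer to one than with nbs=100
echo off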