clear
format compact
echo on
%%Monte Carlo Simulation
%In this example, we look at the approach of Monte Carlo simulation for
%error estimation. This method requires knowledge of the residual (error)
%in the data set. If we know this (via calculation or via other
%experiments), we can create "artificial" data sets with data that has an
%added error characterized by this residual. We determine the fitting
%parameters for each of these sets, average them together, and compute
%the statistics.

%We use the "ball in air" problem again:
t=[0:.1:1]';
%and we generate some "data":
xexact=[0,2,-2]'; %The exact values
a=[ones(size(t)),t,t.^2];
noise=0.05; %The amplitude of the noise
b=a*xexact+noise*randn(size(t)); %Our artificial data set.
pause

%%Basic Regression:
%We solve this using the usual regression formula (we form k explicitly
%because we reuse it in the error propagation and Monte Carlo loop below):
k=inv(a'*a)*a';
x=k*b
%We calculate the error in the usual way:
r=a*x-b;
varb=r'*r/(length(r)-3); %Three degrees of freedom lost
sigb=varb^.5 %Which is (usually) close to the value of noise we put in
varx=k*varb*k' %Error propagation: the covariance matrix of x
%and in particular we have the 2-sigma confidence intervals of the x
%values:
xinterval=[x-2*diag(varx).^.5,x+2*diag(varx).^.5]
%which usually contains the exact values:
xexact
%where probabilities are governed by the t-distribution:
prob=tcdf(2,length(b)-3)-tcdf(-2,length(b)-3)
%So the 2-sigma probability is only about 92% here (the exact value
%depends on n-m).

%Now let's plot this. We use a finer grid so the curves are smooth:
tp=[0:.01:1]';
ap=[ones(size(tp)),tp,tp.^2];
bp=ap*x;
sigbp=diag(ap*varx*ap').^.5;
figure(1)
plot(t,b,'o',tp,ap*xexact,'k',tp,bp,tp,bp+sigbp,':r',tp,bp-sigbp,':r')
set(gca,'FontSize',14)
xlabel('time')
ylabel('position')
legend('data','exact','model','1sig confidence interval','Location','South')
pause

%%Monte Carlo
%Now we estimate our fitting parameters from Monte Carlo simulation. We
%want to use a fair number of samples to get reasonable statistics.
[n,m]=size(a);
xmcsqall=zeros(m,m); %We initialize the accumulator arrays.
xmcall=zeros(m,1);
nmc=100
for i=1:nmc
 bmc=b+randn(n,1)*sigb; %We add in noise based on our residuals
 xmc=k*bmc; %We use all the same times, so k doesn't change.
 xmcall=xmcall+xmc;
 xmcsqall=xmcsqall+xmc*xmc';
echo off
end
echo on
%Now we calculate the mean and covariance matrix of these values:
xmc=xmcall/nmc
%And the covariance:
varxmc=(xmcsqall-nmc*xmc*xmc')/(nmc-1)
%which is the standard one-pass formula for the sample covariance.
pause

%Let's compare this to the values we obtained using the error propagation
%formula:
var_ratio=varxmc./varx
%which is (usually) close to one; it will change every time you run it.
%You don't need a large number of data points, but you do need a large
%number of Monte Carlo simulation runs!
pause

%We can also add this to our plot:
bpmc=ap*xmc;
sigbpmc=diag(ap*varxmc*ap').^.5;
figure(2)
plot(t,b,'o',tp,ap*xexact,'k',tp,bp,tp,bp+sigbp,':r',tp,bpmc+sigbpmc,'--g',tp,bp-sigbp,':r',tp,bpmc-sigbpmc,'--g')
set(gca,'FontSize',14)
xlabel('time')
ylabel('position')
legend('data','exact','model','normal error','Monte Carlo error','Location','South')
pause
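%The loop above kept only running sums. A nice feature of Monte Carlo is
%that, if we instead store every sample, we get the full distribution of
%each fitting parameter, not just its covariance. Here is a minimal
%sketch of this idea (the array xall, figure(3), and the bin count of 20
%are just convenient choices):
xall=zeros(m,nmc);
for i=1:nmc
 xall(:,i)=k*(b+randn(n,1)*sigb); %Same recipe, but every sample is kept
echo off
end
echo on
figure(3)
hist(xall(2,:),20) %Histogram of the second fitting parameter
set(gca,'FontSize',14)
xlabel('x_2')
ylabel('count')
pause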
%Monte Carlo simulation is an easy technique for estimating the statistics
%of the fitting parameters. Unlike other resampling techniques, however,
%it does require knowledge of the residual: exactly the same information
%required by the normal error propagation formulas. The computational cost
%is much higher than that of other methods (at least 100 simulations for
%decent statistics), and it yields exactly the same covariance matrix (if
%the number of simulations is high enough). There are two advantages: it
%does not require taking any gradients (this may be significant in
%non-linear regression, but is irrelevant in linear regression; a sketch
%of the non-linear case follows below), and it can yield the distribution
%of the fitting parameters (more than just the covariance), as the
%histogram in figure(3) illustrates. Of the resampling approaches, it is
%the only one that is suitable for small data sets.
echo off
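echo on
%As a sketch of the gradient-free advantage, here is one way the same
%Monte Carlo recipe might be applied to a non-linear model fit with
%fminsearch. The model y=p1*exp(p2*t), the starting guess, and the
%variable names (tn, bn, pall, etc.) are illustrative choices, not part
%of the problem above:
tn=[0:.1:1]';
pexact=[2,-1]'; %Assumed "exact" parameters
model=@(p,t) p(1)*exp(p(2)*t);
bn=model(pexact,tn)+0.05*randn(size(tn)); %Artificial non-linear data
pfit=fminsearch(@(p) sum((model(p,tn)-bn).^2),[1,0]'); %No gradients needed
rn=model(pfit,tn)-bn;
sigbn=(rn'*rn/(length(rn)-2))^.5; %Two degrees of freedom lost
pall=zeros(2,nmc);
for i=1:nmc
 bnmc=bn+sigbn*randn(size(bn)); %Noise the size of the residual
 pall(:,i)=fminsearch(@(p) sum((model(p,tn)-bnmc).^2),pfit);
echo off
end
echo on
pmc=mean(pall,2) %The Monte Carlo mean of the fitting parameters
varpmc=cov(pall') %and their covariance matrix
echo off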