%% Demonstration of Linear Regression / Error Analysis
% In senior lab students study the velocity due to natural convection
% from a heated wire.  There are many aspects to this experiments, but one
% result is the magnitude of the centerline velocity as a function of
% height above the wire at constant power.  Theory predicts that this
% should obey a power law, going as height ^ (1/5).  The question you can
% use regression to answer is whether the data is consistent with this
% theory.
%% Data
% Let z be the height above the wire and u be the observed velocities, all
% in cgs units.  We have from their measurements:

z = [1.49 2.76 4.03 5.30]'; % Make it a column vector!
u = [0.486 0.554 0.545 0.547]';

% and we plot it up:

figure(1)
loglog(z,u,'*')
grid on
xlabel('z (cm)')
ylabel('u (cm/s)')

%% Regression
% In linear regression, the model must be linear in the modeling parameters
% (not necessarily a line!), so for a power law we transform the model by
% taking the natural log.  Our matrix A is the matrix of modeling function
% (evaluated at different z) and our "right hand side" vector b is the data
% observed at each z.  Our vector x is the fitting parameters associated
% with the modeling functions.  So:

A = [log(z),ones(size(z))];
b = log(u);

x = A\b % Solved in a least squares sense!

figure(1)
hold on
plot(z,exp(A*x),'k')
hold off
legend('data','power law fit','Location','SouthEast')

%% Error Analysis
% It is always more complicated to calculate the uncertainty in a quantity
% than to get it the first time!  Here we use the standard method for
% getting the uncertainty in the slope (e.g., x(1) from our model) and
% calculate the 95% confidence interval based on the assumptions that:
%
% 1) The model is perfect: it really is a power law
%
% 2) The error in the log of the velocities is random and independent
%
% 3) The data points all have the same variance
%
% Note that all of these assumptions are questionable, which is why
% statistics is so slippery!
%
% We do the error calculation by applying the normal equations (which would
% yield the same values of x!) but which are more convenient for the error
% calculation:

k = inv(A'*A)*A';
x = k*b % Same values as before!

[n m] = size(A)

% The residual:
r = A*x - b;

% The data variance:
varb = r'*r/(n-m);

% The matrix of covariance of x:
varx = k*varb*k'

% The standard deviation of the slope x(1):
sigslope = varx(1,1)^.5

%% The Confidence Interval
% For a large number of data points, the 95% confidence interval is just
% +/- 2 sigma.  Because the number of degrees of freedom is so small,
% however, the confidence interval is governed by the t-distribution rather
% than the normal distribution.  You can look this up in tables, but matlab
% has the built in function tinv (inverse of the t distribution).  So:

confint = x(1) + sigslope*tinv([0.025 0.975],n-m)

%% Conclusion
% If you had just taken the +/- 2 sigma bounds, the theoretical slope of
% 0.2 would lie outside the bounds by a little bit.  Because of the very
% small number of data points, however, and the significant scatter, the
% 95% confidence interval contains the theoretical prediction.  This is
% true even at the 87% confidence level.  Because of the broad tails of the
% t-distribution for small numbers of degrees of freedom, you really need
% to have a larger number of data points to get a high percentage
% confidence interval.

conflevel = 1-2*tcdf((x(1)-0.2)/sigslope,n-m)