clear
echo on
format compact
%In this example we shall do a statistical analysis
%of the consumption of energy in the United States
%broken down into the source of the energy.  We
%shall analyze the data for 1993 from the US
%Department of Commerce's Economic Bulletin Board
%(you can download this stuff off of the net).  The
%data is given below:
pause

%1993 consumption by month, 10^15 BTU
%                  coal         nat gas        oil           nuclear
% January....      1.660         2.357         2.697          .631   
% February...      1.540         2.235         2.611          .548   
% March......      1.609         2.205         2.931          .498   
% April......      1.442         1.731         2.708          .461   
% May........      1.448         1.338         2.753          .538   
% June.......      1.618         1.328         2.759          .562   
% July.......      1.840         1.388         2.894          .603   
% August.....      1.823         1.405         2.890          .600   
% September..      1.580         1.315         2.848          .534   
% October....      1.566         1.533         2.889          .474   
% November...      1.584         1.819         2.869          .500   
% December...      1.720         2.192         2.994          .567   
pause
%and now for the other half:
%
%                   hydro      geothermal       other
% January....       .278          .014          .006
% February...       .229          .013          .001
% March......       .267          .014          .005
% April......       .278          .014          .004
% May........       .315          .012          .004
% June.......       .287          .012          .004
% July.......       .275          .013          .001
% August.....       .245          .014          .004
% September..       .212          .013          .001
% October....       .208          .013          .003
% November...       .213          .013          .002
% December...       .248          .013          .004
pause
%In order to do anything with this, we first put
%the data into a format that can be read by matlab.
%The data is saved under the name energy.dat as a
%12x7 array (you will have to clip off the bit at
%the bottom and save it yourself).  We thus read it
%all in:
pause
load energy.dat
month = [1:12]-.5;
figure(1)
semilogy(month,energy)
set(gca,'FontSize',18)
xlabel('month')
ylabel('energy use, 10^1^5 BTU')
legend('coal','nat gas','oil','nuclear','hydro','geothermal','other',-1)
title('US Monthly Energy Usage, 1993')
axis([0,12,0,10])
grid on
pause
%From the plot, we can see that oil usage is nearly
%flat, natural gas peaks during the winter, coal
%peaks during the summer, and nuclear peaks during
%both summer and winter.  Intriguingly, hydroelectric
%peaks during the late spring - this corresponds to
%the spring snow melt in the mountains.  The other
%sources are insignificant.
pause
%We want to calculate the average fuel usage per
%month from all sources, and look at the covariance
%as well.  Let's do this:
[n m]=size(energy);
mean=sum(energy)/n;
echo off
disp('     coal     nat gas     oil     nuclear   hydro   geothermal  other')
disp(mean)
echo on
pause
for i=1:m
 for j=1:m
  var(i,j)=(energy(:,i)'*energy(:,j)-mean(i)*mean(j)*n)/(n-1);
  echo off
 end
end
echo on
%This produces the covariance matrix:
echo off
disp('     coal     nat gas     oil     nuclear   hydro   geothermal  other')
disp(var)
echo on
pause
%The matrix actually makes more sense if we normalize
%the elements with the values standard deviation of
%the ith and jth variables.  The diagonal elements
%should thus just become unity, and the off-diagonal
%elements compare the magnitude of the covariance to
%the variability of each variable.  Perfect covariance
%in this formulation just yields unity.  Thus:
std=diag(var).^.5;
relvar=var./(std*std');
echo off
disp('     coal     nat gas     oil     nuclear   hydro   geothermal  other')
disp(relvar)
echo on
pause
%Looking at this matrix we conclude that the strongest
%positive correlations are between coal and nuclear
%energy sources (the first and fourth columns).  This
%makes sense, because both are used in the production 
%of electricity.
pause
%We can also examine the variability in each type of
%energy source.  In this case, it makes sense to normalize
%the variance with the mean usage.  The standard deviation
%of each is thus given by:
normsig=diag(var./(mean'*mean))'.^0.5;
echo off
disp('     coal     nat gas     oil     nuclear   hydro   geothermal  other')
disp(normsig)
echo on
%Examination of this array shows that the largest 
%variability in energy usage is in natural gas, 
%closely followed by hydro electric power.  Both of
%these sources are highly seasonal.  The variability
%in "other" isn't really significant as the source
%is so small.
pause
%OK, now let's calculate something.  Let's look at the
%ratio of the average monthly use of hydroelectric
%power to the consumption of fossil fuels.  This
%would be the ratio of the fifth column to the sum of
%the first three. If all we kept was the covariance
%matrix and the means we could still calculate this:
ratio=mean(5)/(sum(mean(1:3)))
%which isn't very big!
pause
%To calculate the variance in the ratio we need the
%derivatives of the ratio with respect to the variables.
%We shall do this numerically:
eps = 1e-8;
for i=1:m
 tmean=mean;
 tmean(i)=mean(i)+eps;
 tratio=tmean(5)/(sum(tmean(1:3)));
 deriv(i)=(tratio-ratio)/eps;
 echo off
end
echo on
pause
%This produces the derivative of the ratio:
deriv
%and we can use this to calculate the variance:
ratiovar=deriv*var*deriv'
%and the relative standard deviation:
relstd=ratiovar^.5/ratio
%which is pretty small.
pause
%Since the data itself is in the computer, we can
%calculate this more exactly:
ratiovec=energy(:,5)./sum(energy(:,1:3)')';
%where I have played around with the transpose command
%and the definition of 'sum' to compress the code a bit.
%We thus get the average value:
exactratio=sum(ratiovec)/n
%and the variance:
exactvar=(ratiovec'*ratiovec-exactratio^2*n)/(n-1)
%and the relative standard deviation:
exactrelstd=exactvar^.5/exactratio
%which is close to that calculated by the formula above.
pause
%The deviations are:
ratio/exactratio-1
relstd/exactrelstd-1
%which are only a few percent off.  The discrepancy
%arises because the variability in each variable feeding
%into the ratio is not vanishingly small with respect
%to each mean.
echo off