clear reset(0) echo on %In this example we shall do a statistical analysis %of the consumption of energy in the United States %broken down into the source of the energy. We %shall analyze the data for 1993 from the US %Department of Commerce's Economic Bulletin Board %(you can download this stuff off of the net). The %data is given below: pause %1993 consumption by month, 10^15 BTU % coal nat gas oil nuclear % January.... 1.660 2.357 2.697 .631 % February... 1.540 2.235 2.611 .548 % March...... 1.609 2.205 2.931 .498 % April...... 1.442 1.731 2.708 .461 % May........ 1.448 1.338 2.753 .538 % June....... 1.618 1.328 2.759 .562 % July....... 1.840 1.388 2.894 .603 % August..... 1.823 1.405 2.890 .600 % September.. 1.580 1.315 2.848 .534 % October.... 1.566 1.533 2.889 .474 % November... 1.584 1.819 2.869 .500 % December... 1.720 2.192 2.994 .567 pause %and now for the other half: % % hydro geothermal other % January.... .278 .014 .006 % February... .229 .013 .001 % March...... .267 .014 .005 % April...... .278 .014 .004 % May........ .315 .012 .004 % June....... .287 .012 .004 % July....... .275 .013 .001 % August..... .245 .014 .004 % September.. .212 .013 .001 % October.... .208 .013 .003 % November... .213 .013 .002 % December... .248 .013 .004 pause %In order to do anything with this, we first put %the data into a format that can be read by matlab. %The data is saved under the name energy.dat as a %12x7 array (you will have to clip off the bit at %the bottom and save it yourself). We thus read it %all in: pause load energy.dat month = [1:12]; figure(1) plot(month,energy) set(gca,'FontSize',18) xlabel('month') ylabel('energy use, 10^{15} BTU') legend('coal','nat gas','oil','nuclear','hydro','geothermal','other',-1) pause %From the plot, we can see that oil usage is nearly %flat, natural gas peaks during the winter, coal %peaks during the summer, and nuclear peaks during %both summer and winter. Intriguingly, hydroelectric %peaks during the late spring - this corresponds to %the spring snow melt in the mountains. The other %sources are insignificant. pause %We want to calculate the average fuel usage per %month from all sources, and look at the covariance %as well. Let's do this: [n m]=size(energy); mean=sum(energy)/n; echo off disp(' coal nat gas oil nuclear hydro geothermal other') disp(mean) echo on pause for i=1:m for j=1:m var(i,j)=(energy(:,i)'*energy(:,j)-mean(i)*mean(j)*n)/(n-1); echo off end end echo on %This produces the covariance matrix: echo off disp(' coal nat gas oil nuclear hydro geothermal other') disp(var) echo on pause %The matrix actually makes more sense if we normalize %the elements with the values standard deviation of %the ith and jth variables. The diagonal elements %should thus just become unity, and the off-diagonal %elements compare the magnitude of the covariance to %the variability of each variable. Perfect covariance %in this formulation just yields unity. Thus: std=diag(var).^.5; relvar=var./(std*std'); echo off disp(' coal nat gas oil nuclear hydro geothermal other') disp(relvar) echo on pause %Looking at this matrix we conclude that the strongest %positive correlations are between coal and nuclear %energy sources (the first and fourth columns). This %makes sense, because both are used in the production %of electricity. pause %We can also examine the variability in each type of %energy source. In this case, it makes sense to normalize %the variance with the mean usage. The standard deviation %of each is thus given by: normsig=diag(var./(mean'*mean))'.^0.5; echo off disp(' coal nat gas oil nuclear hydro geothermal other') disp(normsig) echo on %Examination of this array shows that the largest %variability in energy usage is in natural gas, %closely followed by hydro electric power. Both of %these sources are highly seasonal. The variability %in "other" isn't really significant as the source %is so small. pause %OK, now let's calculate something. Let's look at the %ratio of the average monthly use of hydroelectric %power to the consumption of fossil fuels. This %would be the ratio of the fifth column to the sum of %the first three. If all we kept was the covariance %matrix and the means we could still calculate this: ratio=mean(5)/(sum(mean(1:3))) %which isn't very big! pause %To calculate the variance in the ratio we need the %derivatives of the ratio with respect to the variables. %We shall do this numerically: eps = 1e-8; for i=1:m tmean=mean; tmean(i)=mean(i)+eps; tratio=tmean(5)/(sum(tmean(1:3))); deriv(i)=(tratio-ratio)/eps; echo off end echo on pause %This produces the derivative of the ratio: deriv %and we can use this to calculate the variance: ratiovar=deriv*var*deriv' %and the relative standard deviation: relstd=ratiovar^.5/ratio %which is pretty small. pause %Since the data itself is in the computer, we can %calculate this more exactly: ratiovec=energy(:,5)./sum(energy(:,1:3)')'; %where I have played around with the transpose command %and the definition of 'sum' to compress the code a bit. %We thus get the average value: exactratio=sum(ratiovec)/n %and the variance: exactvar=(ratiovec'*ratiovec-exactratio^2*n)/(n-1) %and the relative standard deviation: exactrelstd=exactvar^.5/exactratio %which is close to that calculated by the formula above. pause %The deviations are: ratio/exactratio-1 relstd/exactrelstd-1 %which are only a few percent off. The discrepancy %arises because the variability in each variable feeding %into the ratio is not vanishingly small with respect %to each mean. echo off