clear echo on format compact %In this example we shall do a statistical analysis of corn production in %the five state area of Illinois, Indiana, Iowa, Missouri, and Nebraska. %The data is accessible from the USDA website at: % % http://quickstats.nass.usda.gov/ % %Putting in the appropriate request yields an exel file which can be %trimmed to yield: % Year State Data Item Value % 2014 ILLINOIS CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,350,000,000 % 2014 INDIANA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,084,760,000 % 2014 IOWA CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,367,400,000 % 2014 MINNESOTA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,177,800,000 % 2014 NEBRASKA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,602,050,000 % 2013 ILLINOIS CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,100,400,000 % 2013 INDIANA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,031,910,000 % 2013 IOWA CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,140,200,000 % 2013 MINNESOTA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,294,260,000 % 2013 NEBRASKA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,613,950,000 % 2012 ILLINOIS CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,286,250,000 % 2012 INDIANA CORN, GRAIN - PRODUCTION, MEASURED IN BU 596,970,000 % 2012 IOWA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,876,900,000 % 2012 MINNESOTA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,374,450,000 % 2012 NEBRASKA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,292,200,000 % 2011 ILLINOIS CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,938,950,000 % 2011 INDIANA CORN, GRAIN - PRODUCTION, MEASURED IN BU 839,500,000 % 2011 IOWA CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,356,400,000 % 2011 MINNESOTA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,193,500,000 % 2011 NEBRASKA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,536,000,000 % 2010 ILLINOIS CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,946,800,000 % 2010 INDIANA CORN, GRAIN - PRODUCTION, MEASURED IN BU 898,040,000 % 2010 IOWA CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,153,250,000 % 2010 MINNESOTA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,292,100,000 % 2010 NEBRASKA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,469,100,000 % 2009 ILLINOIS CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,053,200,000 % 2009 INDIANA CORN, GRAIN - PRODUCTION, MEASURED IN BU 933,660,000 % 2009 IOWA CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,407,300,000 % 2009 MINNESOTA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,244,100,000 % 2009 NEBRASKA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,584,150,000 % 2008 ILLINOIS CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,130,100,000 % 2008 INDIANA CORN, GRAIN - PRODUCTION, MEASURED IN BU 873,600,000 % 2008 IOWA CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,188,800,000 % 2008 MINNESOTA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,166,400,000 % 2008 NEBRASKA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,393,650,000 % 2007 ILLINOIS CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,283,750,000 % 2007 INDIANA CORN, GRAIN - PRODUCTION, MEASURED IN BU 980,980,000 % 2007 IOWA CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,376,900,000 % 2007 MINNESOTA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,146,100,000 % 2007 NEBRASKA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,472,000,000 % 2006 ILLINOIS CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,817,450,000 % 2006 INDIANA CORN, GRAIN - PRODUCTION, MEASURED IN BU 844,660,000 % 2006 IOWA CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,050,100,000 % 2006 MINNESOTA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,102,850,000 % 2006 NEBRASKA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,178,000,000 % 2005 ILLINOIS CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,708,850,000 % 2005 INDIANA CORN, GRAIN - PRODUCTION, MEASURED IN BU 888,580,000 % 2005 IOWA CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,162,500,000 % 2005 MINNESOTA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,191,900,000 % 2005 NEBRASKA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,270,500,000 % 2004 ILLINOIS CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,088,000,000 % 2004 INDIANA CORN, GRAIN - PRODUCTION, MEASURED IN BU 929,040,000 % 2004 IOWA CORN, GRAIN - PRODUCTION, MEASURED IN BU 2,244,400,000 % 2004 MINNESOTA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,120,950,000 % 2004 NEBRASKA CORN, GRAIN - PRODUCTION, MEASURED IN BU 1,319,700,000 pause %In order to do anything with this, we first put the data into a format %that can be read by matlab. In this case, we can just define the data in %the script. Alternatively, you can read it from an excel file, a data %file, or any other method you like! Just make sure you get rid of the %commas! echo off data=[2350000000 1084760000 2367400000 1177800000 1602050000 2100400000 1031910000 2140200000 1294260000 1613950000 1286250000 596970000 1876900000 1374450000 1292200000 1938950000 839500000 2356400000 1193500000 1536000000 1946800000 898040000 2153250000 1292100000 1469100000 2053200000 933660000 2407300000 1244100000 1584150000 2130100000 873600000 2188800000 1166400000 1393650000 2283750000 980980000 2376900000 1146100000 1472000000 1817450000 844660000 2050100000 1102850000 1178000000 1708850000 888580000 2162500000 1191900000 1270500000 2088000000 929040000 2244400000 1120950000 1319700000]; echo on year=[2014:-1:2004]; corn=reshape(data,5,11)'; %The corn production in bushels. %We now plot this up: figure(1) plot(year,corn/1e9) set(gca,'FontSize',18) xlabel('year') ylabel('10^9 bushels') legend('Illinois','Indiana','Iowa','Minnesota','Nebraska') title('Annual Corn Production by State') axis([2004 2014 0 3]) grid on pause %From the plot, we can see that Iowa is the biggest producer of corn, %closely followed by Illinois. Also, 2012 was a really low production %year. % %We want to determine the matrix of covariance of these states. %First, we calculate the mean: [n m]=size(corn); mf=sum(corn)/n; %We could have gotten this by the command mf=mean(corn) too. %Now for the covariance: for i=1:m for j=1:m var(i,j)=(corn(:,i)'*corn(:,j)-mf(i)*mf(j)*n)/(n-1); echo off end end echo on %This produces the covariance matrix: echo off disp(' Illinois Indiana Iowa Minnesota Nebraska') disp(var) echo on pause %We could also do this without looping, if we use matrix multiplication. %For large matrices this is a much faster operation. First we subtract off %the mean: dev=corn-mean(corn); %subtracting the average along the columns varoneline=dev'*dev/(n-1); %we take the dot product between columns %and we look at the ratio: var./varoneline %so they match! pause %The matrix of variances actually makes more sense if we normalize the %elements with the values standard deviation of the ith and jth variables. %The diagonal elements should thus just become unity, and the off-diagonal %elements compare the magnitude of the covariance to the variability of %each variable. Perfect covariance in this formulation just yields unity. std=diag(var).^.5; relvar=var./(std*std'); echo off disp(' Illinois Indiana Iowa Minnesota Nebraska') disp(relvar) echo on pause %Looking at this matrix we conclude that Illinois production is almost %perfectly correlated with that of Indiana, with Iowa a close second. %Minnesota is negatively correlated with all states except Nebraska. The %variability is due to two effects: the acreage planted and the yield/acre. %While we could tease apart these two effects by looking at other data %fields, of course, the positive correlation between neighboring states %such as Illinois and Indiana and negative correlation between Illinois and %Minnesota suggests that yield has a larger effect here. 2012 was the year %of the summer drought that had a big effect in our area, and less so in %the more northerly Minnesota. pause %It is useful to examine the variability in each state's production. The %standard deviation of each is thus given by: normsig=diag(var./(mf'*mf))'.^0.5; echo off disp(' Illinois Indiana Iowa Minnesota Nebraska') disp(normsig) echo on %Here we see that the relative variability is highest in Illinois and %Indiana, in part due to the big drought. Minnesota, which was not greatly %affected by the drought, had the smallest variability. pause %OK, now let's calculate something. Let's look at the ratio of the %production of corn in Indiana to the production of the five top states %combined: ratio=mf(2)/(sum(mf(1:5))) %which shows that Indiana production is a pretty small part of the total. pause %To calculate the variance in the ratio we need the %derivatives of the ratio with respect to the variables. %We shall do this numerically: dratio=zeros(size(mf)); for i=1:m tmf=mf; %In taking derivatives numerically, it is important to scale the step size %appropriately: too small leads to round off errors (particularly when the %function isn't determined all that accurately) and too large yields %algorithm errors. For this class of problems (statistics calculations) %the best choice is to scale the step size with the standard deviation of %the variable in question: sigs=diag(var).^.5; dmf=max(0.01*sigs(i),1e-8); %We put a lower bound on it too! tmf(i)=mf(i)+dmf; dratio(i)=(tmf(2)/(sum(tmf(1:5)))-ratio)/dmf; echo off end echo on pause %This produces the derivative of the ratio: dratio %and we can use this to calculate the variance: ratiovar=dratio*var*dratio' %and the relative standard deviation: relstd=ratiovar^.5/ratio %which is pretty small. Note that this is much smaller than the relative %standard deviation of Indiana alone, because of the positive covariance of %both the numerator and denominator! pause %Since the data itself is in the computer, we can %calculate this more exactly: ratiovec=corn(:,2)./sum(corn(:,1:5)')'; %where I have played around with the transpose command %and the definition of 'sum' to compress the code a bit. %We thus get the average value: exactratio=sum(ratiovec)/n %and the variance: exactvar=(ratiovec'*ratiovec-exactratio^2*n)/(n-1) %and the relative standard deviation: exactrelstd=exactvar^.5/exactratio %which is a bit higher than the formula given above. pause %The deviations are: ratio/exactratio-1 relstd/exactrelstd-1 %the ratio is pretty close (within 1% of the values calculated from the %individual data points), but the standard deviation is off by 8%. This %is because the variability in the production is actually fairly large with %respect to the mean values, not small as the formula requires. Still, %even though the relative standard deviation is around 14%, we do a %reasonable job calculating the standard deviation of the ratio with the %formula. echo off