echo on %% Tutorial 4. Multivariate Error Calculations. % In this tutorial, we study the variability in the production of corn (in % bushels) between Illinois and Iowa from 2004-2014. During this period, the % production in these two states was very well correlated due to weather % and pricing, which are factors that have a significant effect on how much % corn is planted during harvest. The original data set can be found in the % following URL: http://quickstats.nass.usda.gov/ % % Covariance is a concept used in mathematics to understand the variation % between two random variables. The goal of the demonstration will be to % illustrate how to calculate a covariance matrix for both quantities (i.e. % corn produced in Illinois and Iowa) and creating a contour map for % displaying bivariate error calculations using a statistical measure. %% %% 1. Generating the data set and calculating the covariance matrix. % In this part of the script, we will start by constructing an array from % the given data set and then calculating the covariance matrix. clear clc % Create an array containing two columns, one for corn production quantities % in Illinois and another for Iowa. Data = [2350000000 2367400000; ... 2100400000 2140200000; ... 1286250000 1876900000; ... 1938950000 2356400000; ... 1946800000 2153250000; ... 2053200000 2407300000; ... 2130100000 2188800000; ... 2283750000 2376900000; ... 1817450000 2050100000; ... 1708850000 2162500000; ... 2088000000 2244400000]; % We will start by redefining the first and second columns of our original % array as the x and y variables that we are analyzing. x = Data(:,1); y = Data(:,2); % We also have the years corresponding to the data: year = [2014:-1:2004]; % We need to calculate the means for x and y that we will substitute into the % covariance formula. xmean = mean(x); ymean = mean(y); % Now, we compute the matrix of covariance. n = length(Data); covmat = [mean((x - xmean).^2),mean((x - xmean) .* (y - ymean)); ... mean((x - xmean) .* (y - ymean)),mean((y - ymean).^2)] * n / (n-1); %% 2. Visualizing the covariance matrix. % In order to display the data, we will create a contour map displaying % an ellipse representing 2-sigma error for our initial data set. % We need to select the appropriate x, y and z ranges to generate the ellipse. xrange = linspace(1.2e9,2.6e9,250); yrange = linspace(1.8e9,2.6e9,250); z_sq=zeros(length(xrange),length(yrange)); % Compute the inverse of the covariance matrix. varinv=inv(covmat); % In this step, we compute z^2 values for an ellipse that will be displayed % in the contour map. for j=1:length(xrange) for k=1:length(yrange) z_sq(j,k) = [xrange(j)-xmean,yrange(k)-ymean] * varinv * [xrange(j)-xmean;yrange(k)-ymean]; echo off end end echo on % The last step is to construct the plot, which will require us to take all % x, y and z points and locate the 2-sigma points, which need to be equal % to 4 since the ellipse equation is giving us values for z^2 and not z. We % will plot the 2-sigma ellipse versus the original data set. figure(1) contour(xrange,yrange,z_sq',[4 4]) l = legend('2\sigma error ellipse','location','best'); set(l,'FontSize',14) hold on plot(x,y,'.','MarkerSize',20) title('Variability in yearly corn production between Illinois and Iowa from 2004-2014') xlabel('Corn Production in Illinois (Bushels)') ylabel('Corn Production in Iowa (Bushels)') %% 3. Analyzing the covariance graph % The data clearly indicates a positive covariance, likely due to the % similar climates of Illinois and Iowa. You can see from this plot that % one data point falls outside the 2-sigma ellipse. We can identify this % point by calculating the z-values of all the data points in our data set: z_sqdata=zeros(size(x)); for i=1:length(x) z_sqdata(i) = [x(i)-xmean,y(i)-ymean] * varinv * [x(i)-xmean;y(i)-ymean]; echo off end echo on % Now we find the indices of any data point with a z^2 > 4: i = find(z_sqdata>4); % And we plot that one up and label the year: figure(1) hold on plot(x(i),y(i),'or','MarkerSize',20) hold off text(x(i)*1.05,y(i),['Production in year ',num2str(year(i))]) % If you recall, that was the year of the historic midwestern drought. It % certainly showed up in the production! echo off