Introduction

This analysis explores the relationships between agricultural commodity losses and their damage causes at the county level, from 1989 to 2015, for the 26-county Palouse region of Washington, Idaho, and Oregon. We examine the full range of commodities and damage causes. From these we identify the commodities with the largest revenue losses and their most important damage causes, as recorded in the USDA's agricultural commodity loss insurance archive.

Phase 3

In Phase 3, we perform a mixed-model analysis of APPLES losses for a selected set of damage causes, using a two-step hurdle technique. This analysis builds on Phases 1 and 2.

Hurdle Mixed Models

Hurdle models let us handle zero-inflated datasets in two steps. We first fit a logistic regression model to estimate the probability that a zero occurs. We then fit a separate mixed model to the non-zero values only. In this second step, county is included as a random effect.
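The two-part structure can be sketched in plain Python; the analysis itself appears to be done in R. This is a minimal sketch under stated assumptions. The records, county names and loss values are hypothetical. The recombination step also assumes that log(loss) for the non-zero cases is normally distributed; the text above does not state this. Under that assumption, the expected loss is P(loss > 0) times the lognormal mean.

```python
import math

# Hypothetical county-year records: (county, year, loss). Zeros are real "no claim" years.
records = [
    ("Benton", 2013, 0.0),
    ("Benton", 2014, 52000.0),
    ("Grant", 2013, 18000.0),
    ("Grant", 2014, 0.0),
    ("Franklin", 2013, 7500.0),
]

# Step 1 response: a 0/1 indicator for "any loss", modeled with a logistic (binomial) GLM.
non_zero = [1 if loss > 0 else 0 for _, _, loss in records]

# Step 2 response: log(loss) for the non-zero rows only, modeled with a linear mixed model.
log_loss = [math.log(loss) for _, _, loss in records if loss > 0]


def hurdle_expected_loss(p_nonzero: float, mu_log: float, sigma2: float) -> float:
    """Recombine the two parts: E[loss] = P(loss > 0) * E[loss | loss > 0].

    Assumes log(loss) given loss > 0 is normal with mean mu_log and variance sigma2,
    so E[loss | loss > 0] = exp(mu_log + sigma2 / 2), the lognormal mean.
    """
    return p_nonzero * math.exp(mu_log + sigma2 / 2.0)


print(non_zero)       # [0, 1, 1, 0, 1]
print(len(log_loss))  # 3
```

The design keeps the zeros in the data but gives them their own model. As a result, the second step only ever sees strictly positive losses, which can safely be logged.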

In our two-part hurdle model, we identify the zero values: county-year combinations with zero loss for a particular apple damage cause. In an earlier step, we removed counties that, based on known crop yield data, do not grow apples. The remaining counties are those where we KNOW apples are grown, but where no loss claims were filed in some years.

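These are therefore not missing data but true zero values, which we do not want to exclude from the model. At the same time, we want to model losses with an approximately normal distribution, and a positively skewed, zero-inflated response does not follow one. Splitting the problem into a zero/non-zero step and a non-zero step addresses both needs.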
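The distinction between true zeros and missing data shows up when the modeling table is built. Every county known to grow apples gets a row for every year and damage cause, and combinations without a claim are filled with 0 rather than dropped. The sketch below assumes hypothetical inputs: the county, year and cause lists and the `claims` lookup are illustrations, not the archive's actual fields.

```python
from itertools import product

# Hypothetical inputs: apple-growing counties (non-growing counties already removed),
# study years, damage causes, and the loss totals actually present in the claims archive.
counties = ["Benton", "Grant"]
years = [2013, 2014]
causes = ["Frost", "Heat"]
claims = {("Benton", 2013, "Frost"): 41000.0, ("Grant", 2014, "Heat"): 9000.0}

# Every county x year x cause combination gets a row; no claim means a true zero loss.
grid = [
    {"county": c, "year": y, "damagecause": d, "loss": claims.get((c, y, d), 0.0)}
    for c, y, d in product(counties, years, causes)
]

# 0/1 response for the first hurdle step.
for row in grid:
    row["non_zero"] = int(row["loss"] > 0)

print(len(grid), sum(row["non_zero"] for row in grid))  # 8 2
```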


Hurdle Model - APPLES

Here we run our hurdle technique for APPLES, using a generalized linear model with a binomial family to distinguish zero from non-zero values. Given this model, are our data normally distributed? What outliers, if any, exist? Are the residuals well distributed, indicating normality?
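The first hurdle step is an ordinary logistic regression; in R this is usually fit with glm(..., family = binomial). To make the mechanics visible, the sketch below fits a single-predictor logistic model by Newton-Raphson in plain Python. The data and the single numeric predictor are hypothetical. The model reported below uses year, damagecause and county as categorical predictors.

```python
import math


def fit_logistic(x, y, iters=25):
    """Fit logit P(y = 1) = b0 + b1 * x by Newton-Raphson (equivalently IRLS)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Gradient (score) of the binomial log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for yi, pi, xi in zip(y, p, x))
        # Information matrix X'WX (2x2), inverted analytically.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (-h01 * g0 + h00 * g1) / det
    return b0, b1


# Hypothetical data: x is a numeric predictor, y marks a non-zero loss.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print(round(b0, 3), round(b1, 3))
```

Because the model has an intercept, the fitted probabilities at convergence sum to the observed number of non-zero cases. This is a quick sanity check on any binomial fit.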

Apples zero/non-zero model goodness of fit: pseudo R-squared and Hosmer-Lemeshow (hoslem) test

##          llh      llhNull           G2     McFadden         r2ML 
## -306.5238554 -423.9973502  234.9469896    0.2770619    0.3112878 
##         r2CU 
##    0.4208145
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  alllevs2_apples$non_zero, fitted(m1)
## X-squared = 11.704, df = 8, p-value = 0.1649
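The statistics above can be reproduced from their definitions. G2 is the likelihood-ratio statistic against the intercept-only model, and McFadden's R-squared is 1 - llh/llhNull. r2ML (Cox-Snell) and r2CU (Nagelkerke) additionally need the sample size n, which is not printed here. The Hosmer-Lemeshow statistic compares observed and expected events across groups of sorted fitted probabilities, with df = groups - 2. This sketch splits the sorted values into equal-count groups. ResourceSelection's hoslem.test cuts at quantiles of the fitted values instead, so the two can differ slightly. For even df, the chi-square upper tail has a closed form, used here instead of a statistics library.

```python
import math


def pseudo_r2(llh, llh_null, n=None):
    """G2 and McFadden from log-likelihoods; Cox-Snell (r2ML) and Nagelkerke (r2CU) if n is given."""
    stats = {"G2": -2.0 * (llh_null - llh), "McFadden": 1.0 - llh / llh_null}
    if n is not None:
        stats["r2ML"] = 1.0 - math.exp(-stats["G2"] / n)
        stats["r2CU"] = stats["r2ML"] / (1.0 - math.exp(2.0 * llh_null / n))
    return stats


def hosmer_lemeshow(y, p, groups=10):
    """HL statistic over equal-count groups of sorted fitted probabilities; df = groups - 2."""
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)
        expected = sum(pi for pi, _ in chunk)
        # Both outcome cells per group: events and non-events.
        stat += (observed - expected) ** 2 / expected
        stat += ((n_g - observed) - (n_g - expected)) ** 2 / (n_g - expected)
    return stat, groups - 2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


fit = pseudo_r2(-306.5238554, -423.9973502)
print(round(fit["G2"], 2), round(fit["McFadden"], 4))  # 234.95 0.2771
print(round(chi2_sf_even_df(11.704, 8), 4))            # 0.1649
```

With p > 0.05, the Hosmer-Lemeshow test gives no evidence of lack of fit. It does not by itself establish that the model fits well.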


Apples zero/non-zero binomial model: outliers and zero vs. non-zero values
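The figure is not reproduced in this text version. One common way to flag outlying observations in a binomial model is through Pearson residuals, (y - p) / sqrt(p(1 - p)). The sketch below uses hypothetical outcomes and fitted probabilities, and its |residual| > 2 cut-off is an assumed rule of thumb, not a threshold taken from this analysis.

```python
import math


def pearson_residuals(y, p):
    """Pearson residuals for a binary GLM: (y - p) / sqrt(p * (1 - p))."""
    return [(yi - pi) / math.sqrt(pi * (1.0 - pi)) for yi, pi in zip(y, p)]


# Hypothetical observed 0/1 outcomes and fitted probabilities.
y = [1, 0, 1, 0, 1]
p = [0.9, 0.2, 0.05, 0.6, 0.7]

res = pearson_residuals(y, p)
# Flag observations whose residual exceeds the assumed cut-off of 2 in absolute value.
flagged = [i for i, r in enumerate(res) if abs(r) > 2]
print(flagged)  # [2]
```

Observation 2 is flagged because a loss occurred where the model gave it only a 5% chance.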


Apples multicollinearity test for the binomial model

##              GVIF Df GVIF^(1/(2*Df))
## year        1.113 14           1.004
## damagecause 1.144  5           1.014
## county      1.068  6           1.005
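For terms with more than one degree of freedom, GVIF^(1/(2*Df)) puts factors with different numbers of levels on a comparable scale. For a 1-df term it equals the square root of the ordinary VIF. The sketch recomputes the last column from the GVIF and Df values in the table. The threshold comparison in the comment (squaring and comparing with VIF cut-offs such as 5 or 10) is a commonly used rule of thumb, assumed here rather than taken from the analysis.

```python
# GVIF and Df values as printed in the table above.
gvif = {"year": (1.113, 14), "damagecause": (1.144, 5), "county": (1.068, 6)}


def adjusted_gvif(value, df):
    """GVIF^(1/(2*Df)): a VIF-like measure that is comparable across terms with different Df."""
    return value ** (1.0 / (2.0 * df))


for term, (value, df) in gvif.items():
    adj = adjusted_gvif(value, df)
    # Assumed rule of thumb: adj**2 well below 5 (or 10) suggests no problematic collinearity.
    print(f"{term:12s} {adj:.3f}  squared = {adj ** 2:.3f}")
```

All three adjusted values are close to 1, consistent with the predictors being nearly uncorrelated.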


Apples binomial model summary: statistically significant terms only

##                              pvalue
## year2012               3.213108e-03
## year2014               3.260583e-02
## year2015               1.870911e-02
## damagecauseCold Winter 1.648559e-07
## damagecauseFreeze      5.506979e-04
## damagecauseFrost       7.655528e-06
## damagecauseHeat        1.473945e-05
## countyBenton           7.004177e-03
## countyFranklin         7.004177e-03
## countyGrant            1.163166e-02
## countyWalla Walla      3.370301e-02
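The reduced table keeps only coefficients below a significance threshold. For a binomial glm in R, this would typically be a subset of summary(m1)$coefficients on its Pr(>|z|) column. The sketch below does the same filtering in Python. It assumes a 0.05 threshold, since the cut-off is not stated above. The entry `example_nonsig_term` is hypothetical and is added only so the filter has something to drop.

```python
ALPHA = 0.05  # assumed significance threshold

# p-values copied from the table above, plus one hypothetical non-significant entry.
pvalues = {
    "year2012": 3.213108e-03,
    "year2014": 3.260583e-02,
    "year2015": 1.870911e-02,
    "damagecauseCold Winter": 1.648559e-07,
    "damagecauseFreeze": 5.506979e-04,
    "damagecauseFrost": 7.655528e-06,
    "damagecauseHeat": 1.473945e-05,
    "countyBenton": 7.004177e-03,
    "countyFranklin": 7.004177e-03,
    "countyGrant": 1.163166e-02,
    "countyWalla Walla": 3.370301e-02,
    "example_nonsig_term": 0.42,  # hypothetical, not from the model
}

significant = {term: p for term, p in pvalues.items() if p < ALPHA}

# Print from strongest to weakest evidence.
for term, p in sorted(significant.items(), key=lambda kv: kv[1]):
    print(f"{term:24s} {p:.3e}")
```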


Apples mixed model coefficient estimates for damage cause and year

Next, we subset to the APPLES observations with a loss greater than zero (the non-zero values) and fit a linear mixed model with county as a random intercept. Because of large outliers, we model log(loss) rather than loss. We then check that the residuals and other diagnostics are consistent with normality.
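The reason for the log transform can be checked numerically: a single very large loss dominates the raw scale, but much less so on the log scale. The sketch uses hypothetical loss values, including zero years and one outlier. It keeps the positive values, as in the second hurdle step, and compares the sample skewness (Fisher-Pearson g1) before and after taking logs.

```python
import math
import statistics


def skewness(values):
    """Sample skewness g1: third central moment divided by the second moment to the 1.5 power."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical apple losses in dollars, including zero years and one large outlier.
losses = [0.0, 1000.0, 2000.0, 0.0, 3000.0, 5000.0, 10000.0, 500000.0]

# Second hurdle step: keep only the non-zero losses, then log-transform them.
positive = [v for v in losses if v > 0]
log_positive = [math.log(v) for v in positive]

print(round(skewness(positive), 2), round(skewness(log_positive), 2))
```

The log scale also turns the fixed effects into multiplicative effects on loss.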

## Linear mixed model fit by REML ['lmerMod']
## Formula: log(loss) ~ year + damagecause + (1 | county)
##    Data: subset(alllevs2_apples, non_zero == 1)
## 
## REML criterion at convergence: 880.7
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.6105 -0.5834  0.1296  0.6039  3.2595 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  county   (Intercept) 0.4085   0.6391  
##  Residual             1.9353   1.3912  
## Number of obs: 252, groups:  county, 7
## 
## Fixed effects:
##                        Estimate Std. Error t value
## (Intercept)             10.0015     0.4420  22.627
## year2002                -0.5154     0.4851  -1.062
## year2003                 0.7557     0.4627   1.633
## year2004                -0.8494     0.5155  -1.648
## year2005                 0.4481     0.4732   0.947
## year2006                 0.6558     0.4989   1.315
## year2007                 1.3272     0.4906   2.705
## year2008                 0.3012     0.5431   0.555
## year2009                 1.4475     0.4465   3.242
## year2010                 0.6472     0.4902   1.320
## year2011                 0.5794     0.4467   1.297
## year2012                 0.8257     0.6462   1.278
## year2013                 1.9262     0.4366   4.412
## year2014                 1.1395     0.5637   2.021
## year2015                 1.5109     0.4388   3.444
## damagecauseCold Winter  -0.9218     0.5054  -1.824
## damagecauseFreeze        0.1111     0.2829   0.393
## damagecauseFrost         0.1800     0.2760   0.652
## damagecauseHail          0.5523     0.3164   1.746
## damagecauseHeat         -0.8392     0.4382  -1.915
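Three quantities help read this output. The variance components give an intraclass correlation: the share of unexplained log-loss variance that lies between counties. Because the response is log(loss), exp(estimate) is a multiplicative effect on typical (median) loss relative to the reference year and damage cause absorbed into the intercept. lme4's summary for an lmerMod fit does not print p-values. The sketch therefore uses a rough two-sided normal approximation to the t values. That approximation is an assumption here and tends to be anti-conservative; Satterthwaite tests (lmerTest) or likelihood-ratio tests are the usual alternatives.

```python
import math

# Variance components and a few fixed effects copied from the lmer output above.
var_county, var_resid = 0.4085, 1.9353
fixed = {  # term: (estimate, t value)
    "year2013": (1.9262, 4.412),
    "year2015": (1.5109, 3.444),
    "damagecauseCold Winter": (-0.9218, -1.824),
    "damagecauseHeat": (-0.8392, -1.915),
}

# Intraclass correlation for the county random intercept.
icc = var_county / (var_county + var_resid)
print(f"ICC = {icc:.3f}")  # ICC = 0.174

results = {}
for term, (est, t) in fixed.items():
    ratio = math.exp(est)  # multiplicative change in median loss vs. the reference level
    p_approx = math.erfc(abs(t) / math.sqrt(2.0))  # two-sided normal approximation
    results[term] = (ratio, p_approx)
    print(f"{term:24s} x{ratio:.2f}  p~{p_approx:.4f}")
```

For example, a year2013 estimate of 1.9262 corresponds to a typical loss roughly 6.9 times that of the reference year, holding damage cause fixed.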


Residuals vs fitted values