This analysis explores agricultural commodity loss at the county level, from 1989 to 2015, for the 26-county Palouse region of Washington, Idaho, and Oregon. We examine the entire range of commodities and damage causes, identifying the commodities with the highest revenue loss and their most pertinent damage causes, as indicated by the USDA's agricultural commodity loss insurance archive.

In Phase 3, we perform a mixed modeling analysis for APPLES using a two-step hurdle technique, for a selected set of damage causes. This analysis builds on Phases 1 and 2.

Hurdle model techniques allow us to address zero-inflated datasets by first fitting a logistic regression model to estimate the probability that an observation is zero rather than non-zero. We then model the non-zero values in a separate mixed model. In this instance, we use county as a random effect.

In our two-part hurdle model, we identify zero values: counties and years that have zero loss for a particular damage cause for apples. We previously removed counties where, based on known crop yield data, no apples are grown. The counties that remain are those where we know apples are grown but where, in some years, no loss claims were filed.

As such, these are not missing data but actual zero values that we do not want to exclude from our model. However, we also want the modeled loss values to follow an approximately normal distribution that is not positively skewed or zero-inflated.
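The split described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used for the analysis (which appears to have been run in R); the records are invented toy values, and the field names simply mirror the terms that appear in the model output.

```python
import math

# HYPOTHETICAL toy records, invented for illustration only.
# They are NOT taken from the USDA loss archive.
records = [
    {"county": "Benton", "year": 2013, "damagecause": "Hail", "loss": 12000.0},
    {"county": "Benton", "year": 2013, "damagecause": "Frost", "loss": 0.0},
    {"county": "Grant", "year": 2014, "damagecause": "Heat", "loss": 0.0},
    {"county": "Grant", "year": 2015, "damagecause": "Freeze", "loss": 48000.0},
]

# Step 1 input: a binary indicator on every record (1 = some loss was claimed).
# True zeros are kept here; they are the outcome of the binomial model.
for rec in records:
    rec["non_zero"] = 1 if rec["loss"] > 0 else 0

# Step 2 input: only the positive losses, moved to the log scale.
positive = [
    dict(rec, log_loss=math.log(rec["loss"]))
    for rec in records
    if rec["non_zero"] == 1
]

share_non_zero = sum(rec["non_zero"] for rec in records) / len(records)
print(f"share of non-zero records: {share_non_zero:.2f}")
print(f"records passed to the second-stage model: {len(positive)}")
```

Keeping the zeros in the first stage and dropping them only in the second is what lets the second-stage response be log-transformed, since the logarithm of zero is undefined.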

Here we run our hurdle technique for APPLES, using a generalized linear model with a binomial family to distinguish between zero and non-zero values. Given this model, is our data normally distributed? What outliers, if any, exist? Are the residuals well distributed, indicating normality?

```
##          llh      llhNull           G2     McFadden         r2ML
## -306.5238554 -423.9973502  234.9469896    0.2770619    0.3112878
##         r2CU
##    0.4208145
```
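As a cross-check, the pseudo-R² values can be recomputed from the two log-likelihoods alone. The sketch below uses the standard definitions of the likelihood-ratio statistic (G2), McFadden's R², the maximum-likelihood (Cox and Snell) R², and the Cragg-Uhler (Nagelkerke) R². The sample size is an assumption: it is not printed in the output, and n = 630 is inferred from the 7 counties, 15 years, and 6 damage causes implied by the factor levels reported later in this section.

```python
import math

# Values copied from the printed output above.
llh = -306.5238554       # log-likelihood of the fitted binomial model
llh_null = -423.9973502  # log-likelihood of the intercept-only model

# ASSUMPTION: a balanced design of 7 counties x 15 years x 6 damage causes.
# The output does not state n directly.
n = 7 * 15 * 6

g2 = -2.0 * (llh_null - llh)                          # likelihood-ratio statistic
mcfadden = 1.0 - llh / llh_null                       # McFadden pseudo R^2
r2_ml = 1.0 - math.exp(-g2 / n)                       # Cox-Snell (ML) pseudo R^2
r2_cu = r2_ml / (1.0 - math.exp(2.0 * llh_null / n))  # Cragg-Uhler pseudo R^2

print(f"G2={g2:.4f} McFadden={mcfadden:.4f} r2ML={r2_ml:.4f} r2CU={r2_cu:.4f}")
```

With the assumed n, the recomputed r2ML and r2CU can be compared against the printed 0.3113 and 0.4208, which also serves as a check on the assumed sample size.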

```
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: alllevs2_apples$non_zero, fitted(m1)
## X-squared = 11.704, df = 8, p-value = 0.1649
```
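The reported p-value can be checked by hand from the test statistic and its degrees of freedom. For an even number of degrees of freedom, the chi-square upper-tail probability has a closed form, so the sketch below needs only the Python standard library. The helper name `chi2_sf_even_df` is hypothetical and is not part of any package.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    series = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * series

# Statistic and degrees of freedom copied from the output above.
p_value = chi2_sf_even_df(11.704, 8)
print(f"Hosmer-Lemeshow p-value: {p_value:.4f}")
```

A p-value well above 0.05, as here, gives no evidence of lack of fit for the binomial model; it does not by itself show that the model fits well.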

```
##              GVIF Df GVIF^(1/(2*Df))
## year        1.113 14           1.004
## damagecause 1.144  5           1.014
## county      1.068  6           1.005
```
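The last column of this table is a direct transformation of the first two, which makes multi-level factors comparable regardless of how many degrees of freedom they use. A small sketch that recomputes it from the printed (rounded) GVIF and Df values:

```python
# (GVIF, Df) pairs copied from the printed table; the GVIF values are rounded.
gvif_table = {
    "year": (1.113, 14),
    "damagecause": (1.144, 5),
    "county": (1.068, 6),
}

# GVIF^(1/(2*Df)) puts each term on a per-coefficient scale.
adjusted = {
    term: gvif ** (1.0 / (2 * df))
    for term, (gvif, df) in gvif_table.items()
}

for term, value in adjusted.items():
    print(f"{term:12s} {value:.3f}")
```

All three adjusted values sit very close to 1, which indicates little collinearity among year, damage cause, and county in the binomial model.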

```
##                              pvalue
## year2012               3.213108e-03
## year2014               3.260583e-02
## year2015               1.870911e-02
## damagecauseCold Winter 1.648559e-07
## damagecauseFreeze      5.506979e-04
## damagecauseFrost       7.655528e-06
## damagecauseHeat        1.473945e-05
## countyBenton           7.004177e-03
## countyFranklin         7.004177e-03
## countyGrant            1.163166e-02
## countyWalla Walla      3.370301e-02
```

Now we subset to just those APPLES observations with a loss greater than zero (all non-zeros) and fit a linear mixed model with county as a random effect, to check that the residuals and other diagnostics suggest normality. We model log loss rather than raw loss because of outliers.

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: log(loss) ~ year + damagecause + (1 | county)
## Data: subset(alllevs2_apples, non_zero == 1)
##
## REML criterion at convergence: 880.7
##
## Scaled residuals:
##     Min      1Q  Median      3Q     Max
## -2.6105 -0.5834  0.1296  0.6039  3.2595
##
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  county   (Intercept) 0.4085   0.6391
##  Residual             1.9353   1.3912
## Number of obs: 252, groups: county, 7
##
## Fixed effects:
##                        Estimate Std. Error t value
## (Intercept)             10.0015     0.4420  22.627
## year2002                -0.5154     0.4851  -1.062
## year2003                 0.7557     0.4627   1.633
## year2004                -0.8494     0.5155  -1.648
## year2005                 0.4481     0.4732   0.947
## year2006                 0.6558     0.4989   1.315
## year2007                 1.3272     0.4906   2.705
## year2008                 0.3012     0.5431   0.555
## year2009                 1.4475     0.4465   3.242
## year2010                 0.6472     0.4902   1.320
## year2011                 0.5794     0.4467   1.297
## year2012                 0.8257     0.6462   1.278
## year2013                 1.9262     0.4366   4.412
## year2014                 1.1395     0.5637   2.021
## year2015                 1.5109     0.4388   3.444
## damagecauseCold Winter  -0.9218     0.5054  -1.824
## damagecauseFreeze        0.1111     0.2829   0.393
## damagecauseFrost         0.1800     0.2760   0.652
## damagecauseHail          0.5523     0.3164   1.746
## damagecauseHeat         -0.8392     0.4382  -1.915
```
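Two quantities help when reading this summary: the share of variance attributable to county (the intraclass correlation), and the back-transformed size of a log-scale coefficient. The sketch below computes both from the printed estimates. It is a reading aid under stated assumptions, not part of the original analysis. In particular, the interpretation of exp(coefficient) as a multiplicative change in typical (geometric mean) loss relative to the reference year assumes the usual treatment coding, with the omitted first year as the baseline.

```python
import math

# Variance components copied from the "Random effects" table above.
var_county = 0.4085
var_residual = 1.9353

# Intraclass correlation: share of the total variance that lies between counties.
icc = var_county / (var_county + var_residual)

# Fixed-effect estimate for year2013, on the log(loss) scale.
coef_year2013 = 1.9262

# ASSUMPTION: treatment coding, so the coefficient is a difference in mean
# log loss from the reference year, and exp() turns it into a ratio.
ratio_year2013 = math.exp(coef_year2013)

print(f"county ICC: {icc:.3f}")
print(f"year2013 multiplicative effect: {ratio_year2013:.2f}")
```

An ICC of roughly 0.17 means that county differences account for a modest share of the variation in log loss once year and damage cause are in the model.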