DMINE Methodology – Step 5: Mixed Modeling

In Steps 1 through 4, we explored agricultural commodity loss relationships at the county level, from 1989 to 2015, for the 26-county Palouse region of Washington, Idaho, and Oregon. In those steps we examined the entire range of commodities and damage causes, identifying the top revenue-loss commodities and their most pertinent damage causes, as indicated by the USDA's agricultural commodity loss insurance archive.

In Step 5, we perform a mixed-modeling analysis using a two-step hurdle technique for a selected set of damage causes, fitting each commodity individually. For efficiency, we have limited the analysis to the top five commodities for the Palouse region:

  • wheat
  • barley
  • dry peas
  • apples, and
  • cherries
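Reducing the data to the top five commodities amounts to summing loss per commodity, ranking, and keeping the largest. The sketch below assumes records arrive as (commodity, loss) pairs; the function name `top_commodities` and the placeholder crop names and amounts are hypothetical, not DMINE data.

```python
from collections import Counter


def top_commodities(records, n=5):
    """Rank commodities by total insured loss and return the n largest (hypothetical helper)."""
    totals = Counter()
    for commodity, loss in records:
        totals[commodity] += loss  # accumulate loss per commodity
    return [commodity for commodity, _ in totals.most_common(n)]


# Hypothetical (commodity, loss) records; names and amounts are placeholders
records = [
    ("crop_a", 500.0), ("crop_b", 120.0), ("crop_a", 300.0), ("crop_c", 90.0),
    ("crop_d", 150.0), ("crop_e", 60.0), ("crop_f", 20.0), ("crop_c", 40.0),
]
top = top_commodities(records)
subset = [r for r in records if r[0] in top]  # keep only the top-n commodities
print(top)
```

In practice the ranking would be run on the insurance archive totals from the earlier steps; `Counter.most_common` simply orders the summed losses from largest to smallest.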

Hurdle Mixed Models

Hurdle models address zero-inflated datasets in two parts. First, a logistic regression model estimates the probability that a value is zero (here, that no loss was recorded). Second, only the non-zero values are modeled, in a separate mixed model. In this instance, we use county as a random effect.
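A minimal sketch of the two parts, in pure Python, under stated assumptions: part 1 is a logistic regression on a single hypothetical covariate `x`, fit by Newton-Raphson; part 2 is a random-intercept model for county on log(loss) of the non-zero records. To stay dependency-free, the variance components use the one-way ANOVA (method-of-moments) estimator rather than the REML fitting that dedicated mixed-model software typically uses, and the county effects are the corresponding shrinkage (BLUP) estimates. All function names and the toy records are hypothetical.

```python
import math
from collections import defaultdict


def fit_logistic(x, y, iters=25):
    """Part 1: P(loss > 0) = 1 / (1 + exp(-(b0 + b1 * x))), fit by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p  # score vector X'(y - p)
            g1 += (yi - p) * xi
            h00 += w  # information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (X'WX)^-1 X'(y - p)
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def fit_random_intercept(groups, values):
    """Part 2: y_ij = mu + u_j + e_ij, with county intercepts u_j ~ N(0, s2_u).
    Variance components use the one-way ANOVA (method-of-moments) estimator."""
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    n_total, k = len(values), len(by_group)
    mu = sum(values) / n_total
    means = {g: sum(vs) / len(vs) for g, vs in by_group.items()}
    ssb = sum(len(vs) * (means[g] - mu) ** 2 for g, vs in by_group.items())
    ssw = sum((v - means[g]) ** 2 for g, vs in by_group.items() for v in vs)
    msb, msw = ssb / (k - 1), ssw / (n_total - k)
    n0 = (n_total - sum(len(vs) ** 2 for vs in by_group.values()) / n_total) / (k - 1)
    s2_e = msw
    s2_u = max(0.0, (msb - msw) / n0)
    # Shrink each county mean toward mu by s2_u / (s2_u + s2_e / n_j)
    effects = {g: s2_u / (s2_u + s2_e / len(vs)) * (means[g] - mu)
               for g, vs in by_group.items()}
    return mu, s2_u, s2_e, effects


def fit_hurdle(county, x, loss):
    """Logistic model for zero vs. non-zero, then the county model on log(loss) > 0."""
    nonzero = [1 if l > 0 else 0 for l in loss]
    part1 = fit_logistic(x, nonzero)
    pos = [(c, math.log(l)) for c, l in zip(county, loss) if l > 0]
    part2 = fit_random_intercept([c for c, _ in pos], [v for _, v in pos])
    return part1, part2


# Hypothetical toy records (not DMINE data): county, covariate x, loss in dollars
county = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
x = [0.1, 0.5, 1.2, 1.8, 0.3, 0.9, 1.5, 2.0, 0.2, 0.7, 1.1, 1.6]
loss = [0, 5000, 0, 9000, 0, 1500, 2500, 4000, 200, 0, 400, 700]

(b0, b1), (mu, s2_u, s2_e, effects) = fit_hurdle(county, x, loss)
print(f"P(loss > 0) = logistic({b0:.2f} + {b1:.2f} * x)")
print(f"log-loss mean {mu:.2f}, county variance {s2_u:.2f}, residual variance {s2_e:.2f}")
```

Keeping the two parts separate is the point of the hurdle design: the zeros inform only the probability of a loss, so they never drag down or skew the distribution fitted to the non-zero losses.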

In our two-part hurdle model, we identify zero values, that is, county-years with zero loss for a particular damage cause for one of the five commodities listed above. We previously removed counties where, based on known crop yield data, a given commodity is not grown at all. The counties that remain are those where we KNOW the commodity is grown, but where, in some years, no loss claims were filed.
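The zero-filling step can be sketched as building a complete county-by-year panel for the counties known to grow the commodity, and recording 0.0 wherever no claim was filed for the chosen damage cause. This assumes claims are (county, year, damage cause, loss) tuples and that the set of growing counties is already known from the yield data; the function name `zero_filled_panel` and the county names are hypothetical.

```python
from collections import defaultdict


def zero_filled_panel(claims, grown_counties, years, cause):
    """Return (county, year, loss) rows for every county known to grow the
    commodity and every year, with 0.0 where no claim was filed for `cause`."""
    totals = defaultdict(float)
    for county, year, claim_cause, loss in claims:
        # Counties without the commodity are excluded, not zero-filled
        if county in grown_counties and claim_cause == cause:
            totals[(county, year)] += loss
    return [(c, y, totals.get((c, y), 0.0))
            for c in sorted(grown_counties) for y in years]


# Hypothetical claims (not DMINE data)
claims = [
    ("County A", 2001, "Drought", 1200.0),
    ("County A", 2003, "Drought", 800.0),
    ("County B", 2001, "Drought", 300.0),
    ("County B", 2002, "Hail", 50.0),      # different cause: not counted
    ("County C", 2001, "Drought", 999.0),  # commodity not grown here: dropped
]
panel = zero_filled_panel(claims, {"County A", "County B"}, range(2001, 2004), "Drought")
for row in panel:
    print(row)
```

The zero rows produced here are genuine observations (the crop was grown, but no claim was filed), which is what distinguishes them from the excluded counties.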

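As such, these are not missing data but true zero values that we do not want to exclude from our model. At the same time, we want to model the losses with an approximately normal distribution rather than one that is positively skewed and zero-inflated. The hurdle approach serves both goals: the zeros are retained in the logistic part, while only the non-zero losses enter the mixed model.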
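One common way to bring the non-zero losses closer to a normal shape is a log transform; the sketch below assumes that transform (the text does not name one) and compares sample skewness before and after on hypothetical loss values. The helper name `skewness` and the data are illustrative only.

```python
import math


def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5 (population moments, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical county-year losses (not DMINE data); zeros are true zero-claim years
losses = [0, 0, 200, 350, 500, 800, 1200, 2500, 4000, 9000, 15000, 40000]

# Zeros go to the logistic (hurdle) part, so only positives are transformed;
# log(0) is undefined, another reason to keep the two parts separate.
positives = [v for v in losses if v > 0]
logged = [math.log(v) for v in positives]

print(f"skewness of raw non-zero losses: {skewness(positives):.2f}")
print(f"skewness of log non-zero losses: {skewness(logged):.2f}")
```

On long-tailed loss data like this, the raw values are strongly right-skewed while their logs are much closer to symmetric, which is the property the mixed-model part relies on.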



Commodity Specific Mixed Model Results


Dry Peas