Agricultural Data Mining Methodology

DMINE is a system of services and code that collects, transforms, and models data on climate impacts. We do this using R, Python, and high-performance computing (HPC) resources.

Study area for analysis: The three-state region of Idaho, Washington, and Oregon is the basis for our overall analysis. We then narrow to three agricultural regions for specific model development.

Our approach uses data extraction and transformation techniques in R and Python to organize and filter data for use in machine learning predictive models.

Our models are presented through data dashboards as well as application programming interfaces (APIs). The dashboards allow a user to review and predict outcomes for a particular area. Our initial efforts focus on agricultural systems, specifically insurance commodity loss related to climate.

DMINE Methodology

To describe how DMINE works, we have developed a seven-step methodology that walks through an examination of agricultural insurance loss in relation to climate:

  • Provide an overview of the datasets to be examined, (Step 1)
  • Assemble our data and perform any necessary data preparation and organization, (Step 2)
  • Perform an initial exploratory data analysis, (Step 3)
  • Perform initial feature extraction and transformation, (Step 4)
  • Examine commodity variability by year and damage cause using generalized linear mixed modeling, (Step 5)
  • Construct a climate association algorithm to connect climate variables to insurance loss using time-lagged correlations, (Step 6) and
  • Model commodity-based results of time-lagged climate data against insurance loss using decision tree techniques (Step 7).

These methodology steps use agricultural systems, drought, and commodity insurance claims as an example topic area.
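
As an illustration of the time-lagged correlation idea in Step 6, the following is a minimal sketch, not DMINE's actual algorithm. It assumes two equal-length series sampled at the same interval; the function names (`pearson`, `lagged_correlations`, `best_lag`), the choice of Pearson correlation, and the sample values are all hypothetical and made up for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlations(climate, loss, max_lag):
    """Correlate loss[t] with climate[t - lag] for lag = 0..max_lag."""
    results = {}
    for lag in range(max_lag + 1):
        # Drop the last `lag` climate values and the first `lag` loss values
        # so that each loss value is paired with the climate value that
        # occurred `lag` periods earlier.
        earlier_climate = climate[: len(climate) - lag]
        later_loss = loss[lag:]
        results[lag] = pearson(earlier_climate, later_loss)
    return results


def best_lag(correlations):
    """Return the lag whose correlation has the largest absolute value."""
    return max(correlations, key=lambda lag: abs(correlations[lag]))


# Synthetic monthly values (assumption, not DMINE data): the loss series
# repeats the climate index two months later, scaled by 10.
climate_index = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 0.5, 0.95, 0.15, 0.55]
insurance_loss = [0.0, 0.0] + [10 * v for v in climate_index[:-2]]

correlations = lagged_correlations(climate_index, insurance_loss, max_lag=4)
for lag, r in correlations.items():
    print(f"lag {lag}: r = {r:+.3f}")
print("strongest lag:", best_lag(correlations))
```

In practice the lag window and the correlation measure would depend on the climate variable and commodity being studied; the sketch only shows how shifting one series against the other exposes a delayed association.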

Case Example Area: Climate Impacts & Agricultural Systems

Summary: Agricultural insurance crop loss is closely related to climate outcomes. On this premise, we are developing a case example that uses data mining and machine learning to explore agricultural commodity loss and its relationship to drought and water scarcity. Here we focus on five key commodities for the Pacific Northwest:

  • wheat,
  • apples,
  • cherries,
  • dry peas, and
  • barley.
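
To show how the study area and commodity list above might drive the filtering step, here is a small sketch in plain Python. The record layout (`state`, `commodity`, `damage_cause`, `loss` keys) and the sample rows are assumptions for illustration only; they do not reflect the actual insurance claim data format.

```python
# Study area and commodities named in this document.
STUDY_STATES = {"Idaho", "Washington", "Oregon"}
KEY_COMMODITIES = {"wheat", "apples", "cherries", "dry peas", "barley"}


def filter_claims(records):
    """Keep claims from the study area that involve a key commodity."""
    return [
        record
        for record in records
        if record["state"] in STUDY_STATES
        # Normalize case and stray whitespace before matching commodity names.
        and record["commodity"].strip().lower() in KEY_COMMODITIES
    ]


# Made-up sample records (hypothetical field names and values).
sample_claims = [
    {"state": "Washington", "commodity": "Wheat", "damage_cause": "Drought", "loss": 1200.0},
    {"state": "Idaho", "commodity": "Barley ", "damage_cause": "Heat", "loss": 450.0},
    {"state": "Montana", "commodity": "wheat", "damage_cause": "Drought", "loss": 900.0},
    {"state": "Oregon", "commodity": "corn", "damage_cause": "Frost", "loss": 300.0},
]

filtered_claims = filter_claims(sample_claims)
for claim in filtered_claims:
    print(claim["state"], claim["commodity"].strip().lower(), claim["loss"])
```

Only the Washington wheat and Idaho barley rows pass both conditions; the Montana row falls outside the study area and the corn row is not one of the five commodities.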