About dMINE labs

Dmine.io is a reproducible science framework for agricultural and climatic data mining. It outlines the conceptual components that should be part of an effective, dynamic reproducible science effort.

DMINE is developing a framework for data mining transformations and machine learning models that predict the impact of climate change on agricultural systems.
DMINE.io is our research development server.

Agricultural systems and their products are essential components of our society. In 2014, the U.S. agricultural sector generated more than $835 billion in gross output and employed approximately 750,000 people. The U.S. has roughly 2 million farms, averaging about 435 acres each, and total grain production alone was $436 million (USDA Economic Research Service, 2014).

Our DMINE team assembles agricultural data to develop statistical models that can tell us more about the interactions between climate and agricultural commodity systems. We are using 2.8 million records of commodity losses in the Pacific Northwest from 1989 to 2015, drawn from the USDA’s insurance program, in combination with related climatic variables.


Data Acquisition for Agriculture

Palouse region spring wheat – August 2017

Several key datasets have been initially identified, including:

  • The USDA’s agricultural crop loss data archive (1989-2016). The USDA’s Risk Management Agency maintains insurance claim records associated with commodity crop loss from 1989 to 2016. Specifically, we use the cause of loss archive datasets, which are CSV files that summarize insurance claims by month and by county. These data are available for the entire United States; for this analysis we have focused on the three-state region of Idaho, Oregon, and Washington.
  • NASS crop commodity results. The USDA’s National Agricultural Statistics Service (NASS) provides extensive county-based information on commodity outputs nationwide, including variables such as annual area harvested, production, sales, and water applied. We also use NASS Cropland Data Layer (CDL) acreage for cropland commodities by year and county; the CDL commodity codes are listed in the NASS Cropland Data Layer commodity codes file (CSV).
  • Associated climate and geophysical variables. Across all dashboards and functional data areas, we combine these data with climate and geophysical variables to explore patterns. For agriculture, we aggregate climate data at the county level and then compare it with 27 years of monthly agricultural commodity loss claims across the Pacific Northwest. See our Methodology and our Agriculture Data Portal for more information.
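The county-level aggregation and comparison described in the list above can be sketched roughly as follows. This is a minimal illustration only, assuming simplified, hypothetical records shaped as (county, year, month, value) tuples; the actual RMA cause of loss CSV columns and our climate data sources are not reproduced here, and the function names are hypothetical. Climate observations (for example, several grid cells that fall within one county) are averaged to one value per county-month, loss claims are summed per county-month, and the two tables are paired only where both have a record.

```python
from collections import defaultdict
from statistics import mean

# ASSUMPTION: hypothetical record layout (county, year, month, value).
# The real RMA cause of loss CSV files use different, richer columns.


def aggregate_climate(observations):
    """Average climate observations into one value per (county, year, month)."""
    groups = defaultdict(list)
    for county, year, month, value in observations:
        groups[(county, year, month)].append(value)
    return {key: mean(values) for key, values in groups.items()}


def total_losses(claims):
    """Sum claim amounts (e.g. indemnity) per (county, year, month)."""
    totals = defaultdict(float)
    for county, year, month, amount in claims:
        totals[(county, year, month)] += amount
    return dict(totals)


def join_county_months(climate, losses):
    """Pair climate and loss values for county-months present in both tables."""
    shared = climate.keys() & losses.keys()
    return {key: (climate[key], losses[key]) for key in shared}


if __name__ == "__main__":
    # Made-up toy values, for illustration only.
    obs = [("Whitman", 2015, 7, 1.0), ("Whitman", 2015, 7, 3.0), ("Latah", 2015, 7, 5.0)]
    claims = [("Whitman", 2015, 7, 100.0), ("Whitman", 2015, 7, 50.0), ("Adams", 2015, 7, 10.0)]
    paired = join_county_months(aggregate_climate(obs), total_losses(claims))
    print(paired)  # {('Whitman', 2015, 7): (2.0, 150.0)}
```

An inner join on county-month keeps the comparison honest: counties with climate data but no claims (or the reverse) are dropped rather than filled with guessed values. A real analysis would need to decide deliberately how to treat months with no recorded loss.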


EXAMPLE: Below is an animation of wheat commodity loss claims due to drought from 1989 to 2015. Our Agricultural Dashboards provide more information on agricultural commodity loss and on commodity outputs in the future.


Example WHEAT Commodity Loss Animation, 1989-2015


This work is developed as part of the Pacific Northwest Climate Impacts Research Consortium (CIRC), a climate-science-to-climate-action team funded by the National Oceanic and Atmospheric Administration (NOAA) and the National Integrated Drought Information System (NIDIS). Made up of scientists from disciplines as varied as atmospheric and social science, CIRC is a proud member of NOAA’s Regional Integrated Sciences and Assessments (RISA) program, a national leader in climate science and adaptation.