Skip to content

Use python to get data and import to GAMS 25.0

License

Notifications You must be signed in to change notification settings

diligentwang1998/PythonGAMS

 
 

Repository files navigation

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Import NASS using Python and Pass to GAMS Directly\n",
    "\n",
    "The is a quick guide to download data from **USDA NASS** dataset and import to GAMS directly using the newly added GAMS 25 Embedded Python feature. For detailed introduction on the feature see the\n",
    "__[recent newsletter from Dr. McCarl](https://www.gams.com/fileadmin/community/mccarlarchive/news42.pdf)__.\n",
    "\n",
    "Last update: Mark 3/8/2018\n",
    "\n",
    "Before you start, make sure you are using the new **GAMS 25.0.2** version."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install Python Module calling NASS API \n",
    "\n",
    "USDA have a API to download their dataset. More information about it and to request a **FREE** API for yourself __[here](http://quickstats.nass.usda.gov/api)__.\n",
    "\n",
    "To use the API, first you need to install the Python module NASS. Here are the steps:\n",
    "\n",
    "- open windows command prompt\n",
    "- navigate to folder where GAMSPython and PIP is istalled using <span style=\"color:red\">\"cd C:\\GAMS\\win64\\25.0\\GMSPython\\Scripts\"</span>  (make sure you see something like 'pip3.exe' so that pip is installed here)\n",
    "- use <span style=\"color:red\">'pip install nass'</span> 'pip install nass' to install nass module in GAMSPython\n",
    "\n",
    "You can use other moduels the same way. For the data cleanups, I also used two other moduels \"pandas\" and \"numpy\".\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Importing NASS in GAMS\n",
    "\n",
    "- import the module used \n",
    "- navigate to the working directory \n",
    "- set up the API (**Below is the one I requested. Please use link provided above to request you own API for free**)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "ename": "FileNotFoundError",
     "evalue": "[WinError 2] The system cannot find the file specified: 'YourWorkingPath'",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mFileNotFoundError\u001b[0m                         Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-1-16329738047c>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m      4\u001b[0m \u001b[1;32mimport\u001b[0m \u001b[0mos\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      5\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 6\u001b[1;33m \u001b[0mos\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mchdir\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"YourWorkingPath\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m      7\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      8\u001b[0m \u001b[0mapi\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mnass\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mNassApi\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'4A3B90E6-199F-344A-B466-3236FE813C2B'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mFileNotFoundError\u001b[0m: [WinError 2] The system cannot find the file specified: 'YourWorkingPath'"
     ]
    }
   ],
   "source": [
    "import nass\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import os\n",
    "\n",
    "os.chdir(\"YourWorkingPath\")\n",
    "\n",
    "api = nass.NassApi('4A3B90E6-199F-344A-B466-3236FE813C2B')\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## Do the query\n",
    "\n",
    "The basic logic is that, <span style=\"color:red\">\"q = api.query()\"</span> starts querying process and then you can use the query subjects to filter the database. For example, <span style=\"color:red\">q.filter('source_desc','SURVEY')</span> filtered only the data from SURVEY; q.filter('state_name','TEXAS') filters only information for TEXAS . <span style=\"color:red\">\"q.count()\"</span> will give you the size of current database. \n",
    "\n",
    "In this example, all the filters reduced the database from 3,4659,577 to 49,477.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "ename": "NameError",
     "evalue": "name 'api' is not defined",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mNameError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-2-52aa791d4d9c>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mq\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mapi\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mquery\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m      2\u001b[0m \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"The total number of NASS dataset is \"\u001b[0m \u001b[1;33m+\u001b[0m \u001b[0mstr\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mq\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mcount\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      3\u001b[0m \u001b[0mq\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfilter\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'source_desc'\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;34m'SURVEY'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      4\u001b[0m \u001b[0mq\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfilter\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'sector_desc'\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;34m'ANIMALS & PRODUCTS'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      5\u001b[0m \u001b[0mq\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mfilter\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'group_desc'\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;34m'LIVESTOCK'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mNameError\u001b[0m: name 'api' is not defined"
     ]
    }
   ],
   "source": [
    "q = api.query()\n",
    "print(\"The total number of NASS dataset is \" + str(q.count()))\n",
    "q.filter('source_desc','SURVEY')\n",
    "q.filter('sector_desc','ANIMALS & PRODUCTS')\n",
    "q.filter('group_desc','LIVESTOCK')\n",
    "q.filter('agg_level_desc','COUNTY')\n",
    "q.filter('state_name','TEXAS')\n",
    "print(\"After filtering we have \" + str(q.count()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   year      freq_desc state_fips_code county_code asd_code commodity_desc  \\\n",
      "0  2007  POINT IN TIME              48         998       11         CATTLE   \n",
      "1  2006  POINT IN TIME              48         998       11         CATTLE   \n",
      "2  2005  POINT IN TIME              48         998       11         CATTLE   \n",
      "3  2004  POINT IN TIME              48         998       11         CATTLE   \n",
      "4  2003  POINT IN TIME              48         998       11         CATTLE   \n",
      "\n",
      "  prodn_practice_desc                   short_desc unit_desc      Value  \n",
      "0             ON FEED  CATTLE, ON FEED - INVENTORY      HEAD  2,396,000  \n",
      "1             ON FEED  CATTLE, ON FEED - INVENTORY      HEAD  2,415,000  \n",
      "2             ON FEED  CATTLE, ON FEED - INVENTORY      HEAD  2,252,000  \n",
      "3             ON FEED  CATTLE, ON FEED - INVENTORY      HEAD  2,325,000  \n",
      "4             ON FEED  CATTLE, ON FEED - INVENTORY      HEAD  2,193,000  \n"
     ]
    }
   ],
   "source": [
    "# Generate the output DataFrame\n",
    "columns_to_keep = ['year','freq_desc','state_fips_code','county_code','asd_code',\n",
    "'commodity_desc','prodn_practice_desc','short_desc','unit_desc','Value']\n",
    "livestock_db_all = pd.DataFrame(q.execute())\n",
    "livestock_db = livestock_db_all[columns_to_keep]\n",
    "print(livestock_db.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A few things in mind.\n",
    "- first, NASS only allows you to download less than 50,000 data at one time. So if you want to do download more than that, you have to do it in loops. See the attached example of looping in \"part4_cropmix.py\".\n",
    "- second, you can see the name of filters and the options from the html source code at: view-source:https://quickstats.nass.usda.gov/. For example, in line 1114 you can see the NAME for source is <span style=\"color:red\">\"source_desc\"</span> and the two available options are <span style=\"color:red\"> CENSUS </span> and <span style=\"color:red\"> SURVEY </span>. Similar for others. You can also explore the options for a particular filter with the following python command: api.param_values() see the example below.\n",
    "\n",
    "\n",
    "\n",
    "\t\t   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['CENSUS', 'SURVEY']\n"
     ]
    }
   ],
   "source": [
    "## options for a filter\n",
    "print(api.param_values('source_desc'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data clean ups \n",
    "You need to do some further query and cleanups before get it to GAMS. The following python codes to the following:\n",
    "- further query to only get data with units as \"HEAD\"\n",
    "- rename to short colunm names\n",
    "- remove missing informations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Dataframe ready to export is 'livestock_export'\n",
      "**************************************************\n",
      "     fips asd_code    livestock1  year  Value\n",
      "94  48011       11  cattleonfeed  1980  3,000\n",
      "95  48011       11  cattleonfeed  1979  4,000\n",
      "96  48011       11  cattleonfeed  1978  4,000\n",
      "97  48011       11  cattleonfeed  1977  4,000\n",
      "98  48011       11  cattleonfeed  1976  5,000\n"
     ]
    }
   ],
   "source": [
    "pd.options.mode.chained_assignment = None\n",
    "## further query to only get data with units as \"HEAD\"\n",
    "\n",
    "set(livestock_db['unit_desc'])\n",
    "livestock_heads = livestock_db.loc[livestock_db['unit_desc']=='HEAD',]\n",
    "set(livestock_heads['commodity_desc'])\n",
    "\n",
    "## rename to short column names\n",
    "short_desc=list(set(livestock_heads['short_desc']))\n",
    "\n",
    "conditions = [\n",
    " livestock_heads['short_desc']=='CATTLE, COWS, MILK - INVENTORY',\n",
    " livestock_heads['short_desc']=='WOOL - SHORN, MEASURED IN HEAD',\n",
    " livestock_heads['short_desc']=='CATTLE, ON FEED - PLACEMENTS, MEASURED IN HEAD',\n",
    "\n",
    " livestock_heads['short_desc']=='SHEEP, INCL LAMBS - INVENTORY',\n",
    " livestock_heads['short_desc']=='GOATS, ANGORA - INVENTORY',\n",
    " livestock_heads['short_desc']=='CATTLE, INCL CALVES - INVENTORY',\n",
    "\n",
    " livestock_heads['short_desc']=='GOATS, MILK - INVENTORY',\n",
    " livestock_heads['short_desc']=='GOATS, MEAT & OTHER - INVENTORY',\n",
    " livestock_heads['short_desc']=='SHEEP, EWES, BREEDING, GE 1 YEAR - INVENTORY',\n",
    "\n",
    " livestock_heads['short_desc']=='GOATS - INVENTORY',\n",
    " livestock_heads['short_desc']=='CATTLE, ON FEED - SALES FOR SLAUGHTER, MEASURED IN HEAD',\n",
    " livestock_heads['short_desc']=='HOGS - INVENTORY',\n",
    "\n",
    " livestock_heads['short_desc']=='MOHAIR, ANGORA - CLIPPED, MEASURED IN HEAD',\n",
    " livestock_heads['short_desc']=='CATTLE, ON FEED - INVENTORY',\n",
    " livestock_heads['short_desc']=='CATTLE, COWS, BEEF - INVENTORY'\n",
    "    ]\n",
    "\n",
    "choices=['cattlecowmilk','sheepwool','cattleonfeed_placement',\n",
    "        'sheeplambs','goatsangora','cattlecalves',\n",
    "        'milkgoats','meatgoats','sheepewes',\n",
    "        'goats','cattleonfeed4slaughter','hogs',\n",
    "        'angora_mohair','cattleonfeed','beefcow']\n",
    "\n",
    "livestock_heads['livestock1']=np.select(conditions,choices,default=\"other\")\n",
    "## the above chunk of code just rename to shorter names for livestocks, if you don't want any to be included, just change the corresponding\n",
    "## choices to \"other\" and it would be dropped by the next line of code\n",
    "\n",
    "## remove missing informations\n",
    "livestock_heads = livestock_heads.loc[livestock_heads['livestock1'] != 'other']\n",
    "livestock_heads = livestock_heads.loc[livestock_heads['Value'] != '                 (D)',]\n",
    "livestock_heads = livestock_heads.loc[livestock_heads['asd_code'] != '99',]\n",
    "livestock_heads = livestock_heads.loc[livestock_heads['county_code'] != '998',]\n",
    "livestock_heads['fips']=livestock_heads['state_fips_code']+livestock_heads['county_code']\n",
    "livestock_export=livestock_heads[['fips','asd_code','livestock1','year','Value']]\n",
    "\n",
    "livestockmixnew=livestock_export.values.tolist()\n",
    "for i in range(len(livestockmixnew)):\n",
    "    try:\n",
    "        temp=livestockmixnew[i][4].replace(',','')\n",
    "        int(temp)\n",
    "    except :\n",
    "        print(livestockmixnew[i])\n",
    "        \n",
    "print(\"Dataframe ready to export is 'livestock_export'\")\n",
    "print('*' * 50)\n",
    "print(livestock_export.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pass to GAMS\n",
    "Now in the last step we want to pass the generated dataframe **livestock_export** into GAMS. Note it is defined on four domains __fips__, __asd_code__, __livestock1__ and __Value__, we actually need to import four sets and 1 parameter here.  \n",
    "\n",
    "The GAMS only takes specific forms of python data. Specifically:\n",
    "- for sets: it only takes <span style=\"color:red\">list</span>\n",
    "- for parameters: it only takes <span style=\"color:red\">tuples</span> with all domains enter as \"**str**\" and the value as \"**float**\" or \"**int**\".\n",
    "    \n",
    "Such conversion is showed as in the code below. Note for the parameter, we need to convert the fifth element (namely x[4]) from **str** to **int** it is the value of the parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
    "livestockmixnew=livestock_export.values.tolist()\n",
    "\n",
    "## create list for sets\n",
    "countyfips=list(set(livestock_export['fips']))\n",
    "usda_dist=list(set(livestock_export['asd_code']))\n",
    "livestockall=list(set(livestock_export['livestock1']))\n",
    "mixesa=list(set(livestock_export['year']))\n",
    "\n",
    "## create tuples for parameters\n",
    "livestockmixnew=[( x[0],x[1],x[2],x[3],int(x[4].replace(',','')) ) for x in livestockmixnew]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run Python in GAMS\n",
    "now all the above python lines can now be called directly in GAMS with the EmbeddedCode feature. Specifically, you just need to put them between the following two lines.\n",
    "\n",
    "$onEmbeddedCode Python:\n",
    "\n",
    "<span style=\"color:red\">ALL the Python Code used so far !!! </span>\n",
    "\n",
    "$offEmbeddedCode \n",
    "\n",
    "\n",
    "## Setups in GAMS\n",
    "Finally, few steps are needed to complete the whole procedure in the GAMS file below.\n",
    "- the first line checks if GAMSPython is ready to be used\n",
    "- 4 empty sets and 1 empty parameters are defined.\n",
    "- The Embedded Python code we illustrated before\n",
    "    - With five lines starts with **gams.set** which pass the python data into GAMS\n",
    "- after the end of the **offEmbeededCode** put the names of imported set and parameters    \n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#############################################################################\n",
    "### This is a GAMS file, Not Python file. So it is NOT runnable here     ####\n",
    "### Run the Example1_single_import.gms in GAMS instead                   ####\n",
    "#############################################################################\n",
    "\n",
    "\n",
    "$if not setenv GMSPYTHONHOME $abort.noError Embedded code Python not ready to be used\n",
    "\n",
    "set countyfips   /system.empty/\n",
    "    usda_dist    /system.empty/\n",
    "    livestockall /system.empty/\n",
    "    mixesa       /system.empty/\n",
    "   ;\n",
    "parameter livestockmixnew(*,usda_dist,livestockall,mixesa) original livestock mix data from usda nass;\n",
    "\n",
    "$onEmbeddedCode Python:\n",
    "\n",
    "!!!! ALL the Python Code used so far !!!!!!!\n",
    "\n",
    "** five additional lines to pass data to GAMS\n",
    "gams.set(\"countyfips\",countyfips)\n",
    "gams.set(\"usda_dist\",usda_dist)\n",
    "gams.set(\"livestockall\",livestockall)\n",
    "gams.set(\"mixesa\",mixesa)\n",
    "gams.set(\"livestockmixnew\",livestockmixnew)\n",
    "\n",
    "$offEmbeddedCode  countyfips usda_dist  livestockall mixesa livestockmixnew\n",
    "display countyfips, usda_dist, livestockall, mixsa, livestockmixnew;\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}

About

Use python to get data and import to GAMS 25.0

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • GAMS 100.0%