Commit 7c1cb2d

Add files via upload
1 parent 8ce6dd3 commit 7c1cb2d

1 file changed: 321 additions, 0 deletions

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3be345ad-b008-45c6-b481-46888d2d2ebc",
   "metadata": {},
   "source": [
    "# Working with Data\n",
    "\n",
    "This tutorial familiarizes you with browsing for data that will be used, together with an application, to generate new data by submitting a job. Job submission is covered in the next tutorial. Run each cell in order (Shift-Enter). The notes indicate when you need to edit code to customize things (e.g., to specify a data collection) versus when running a cell will prompt you for input (e.g., your username and password)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "93e43227-c229-4ed4-a514-9293eda6c4f6",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "import getpass\n",
    "import json\n",
    "from IPython.display import JSON"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b860ceff-8f05-4615-a78b-add89ae59567",
   "metadata": {},
   "source": [
    "First, we need some pre-defined environment variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "484e55a1-2933-49e7-a684-c38ab269d1e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "# This portion of the code is environment specific (Dev, Test, Ops, etc.).\n",
    "# Define the environment as our test venue.\n",
    "env = {\n",
    "    \"clientId\": \"71894molftjtie4dvvkbjeard0\",\n",
    "    \"url\": \"https://58nbcawrvb.execute-api.us-west-2.amazonaws.com/test/\"\n",
    "    #\"url\": \"https://dxebrgu0bc9w7.cloudfront.net/\"\n",
    "}\n",
    "\n",
    "# auth_json is a template for authorizing with AWS Cognito to obtain a token that can be used for calls to the data service.\n",
    "# For now this is just an empty data structure. You will be prompted for your username and password in a few steps.\n",
    "auth_json = '''{\n",
    "    \"AuthParameters\": {\n",
    "        \"USERNAME\": \"\",\n",
    "        \"PASSWORD\": \"\"\n",
    "    },\n",
    "    \"AuthFlow\": \"USER_PASSWORD_AUTH\",\n",
    "    \"ClientId\": \"\"\n",
    "}'''"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "386c712e-307f-48e9-a1ba-f2bb3d378c4e",
   "metadata": {},
   "source": [
    "### Authentication Code\n",
    "\n",
    "The method below is a helper function for getting an access token for Unity SDS services. You must pass this token along with any API request in order to access the various Unity SDS services."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b9bbd23-1e4b-4286-959a-a00908a8ff7e",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Take a username, password, and client ID and fetch a Cognito token.\n",
    "def get_token(username, password, clientID):\n",
    "    aj = json.loads(auth_json)\n",
    "    aj['AuthParameters']['USERNAME'] = username\n",
    "    aj['AuthParameters']['PASSWORD'] = password\n",
    "    aj['ClientId'] = clientID\n",
    "    token = None\n",
    "    try:\n",
    "        response = requests.post(\n",
    "            'https://cognito-idp.us-west-2.amazonaws.com',\n",
    "            headers={\"Content-Type\": \"application/x-amz-json-1.1\",\n",
    "                     \"X-Amz-Target\": \"AWSCognitoIdentityProviderService.InitiateAuth\"},\n",
    "            json=aj)\n",
    "        token = response.json()['AuthenticationResult']['AccessToken']\n",
    "    except (requests.RequestException, KeyError):\n",
    "        print(\"Error, check username and password and try again.\")\n",
    "    return token"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5ea61770-5592-4e79-b6c3-8dadcb9301cb",
   "metadata": {
    "tags": []
   },
   "source": [
    "### Prompt for your Unity username and password\n",
    "\n",
    "These are required to get the token (described above) to connect to the data services."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1f586ee1-003f-40eb-9ae5-3770b4579ead",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "print(\"Please enter your username...\")\n",
    "user_name = input()\n",
    "\n",
    "print(\"Please enter your password...\")\n",
    "password = getpass.getpass()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "be1b7bcd-eb10-44aa-9940-d92b7bec9817",
   "metadata": {},
   "outputs": [],
   "source": [
    "token = get_token(user_name, password, env['clientId'])\n",
    "\n",
    "if token:\n",
    "    print(\"Token received.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a128652-a73b-4b9c-93ce-5f4147d73c32",
   "metadata": {},
   "source": [
    "## List Available Data Collections in the Unity System\n",
    "\n",
    "Data is organized into Collections. Any particular data file will be in at least one Collection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a8e1ffbd-ee53-497b-8c89-ce9929263e4d",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The DAPA endpoint to retrieve collections is the base URL plus the following:\n",
    "url = env['url'] + \"am-uds-dapa/collections\"\n",
    "\n",
    "# Make a GET request to the URL you have constructed, using your access token\n",
    "response = requests.get(url, headers={\"Authorization\": \"Bearer \" + token})\n",
    "print(response)\n",
    "\n",
    "print(\"Data Collections at \" + url)\n",
    "# To see the raw JSON of the API response, uncomment this line:\n",
    "#print(json.dumps(response.json()))\n",
    "features = response.json()['features']\n",
    "for data_set in features:\n",
    "    print(data_set['id'])\n",
    "\n",
    "print(\"\\nFull JSON response object:\")\n",
    "JSON(response.json())\n"
   ]
  },
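  {
   "cell_type": "markdown",
   "id": "pagination-sketch-md",
   "metadata": {},
   "source": [
    "If a Collection holds more files than a single response returns, the results can be paged. The cell below is a minimal illustrative sketch only, not part of the original workflow: it assumes the endpoint honors `limit` and `offset` query parameters (both appear in comments elsewhere in this tutorial but are not exercised)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "pagination-sketch-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical pagination sketch: fetch a Collection's items one page at a\n",
    "# time, assuming the endpoint supports limit/offset query parameters.\n",
    "page_size = 10\n",
    "offset = 0\n",
    "while True:\n",
    "    page = requests.get(url, headers={\"Authorization\": \"Bearer \" + token},\n",
    "                        params=[(\"limit\", page_size), (\"offset\", offset)])\n",
    "    page_features = page.json().get('features', [])\n",
    "    if not page_features:\n",
    "        break\n",
    "    for data_file in page_features:\n",
    "        print(data_file['id'])\n",
    "    offset += page_size"
   ]
  },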
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1bb34cc9-eaae-4efd-82c0-6f33043b6207",
   "metadata": {},
   "outputs": [],
   "source": [
    "data_set = \"urn:nasa:unity:ssips:TEST1:CHRP_16_DAY_REBIN___1\"\n",
    "url = env['url'] + \"am-uds-dapa/collections/\" + data_set + \"/items\"\n",
    "\n",
    "params = []\n",
    "#params.append((\"limit\", 20))\n",
    "\n",
    "response = requests.get(url, headers={\"Authorization\": \"Bearer \" + token}, params=params)\n",
    "\n",
    "print(\"Endpoint: \" + url)\n",
    "#print(f\"Total number of files: {response.json()['numberMatched']}\")\n",
    "print(\"File IDs, titles, and hrefs in Collection \" + data_set + \"\\n\")\n",
    "\n",
    "features = response.json()['features']\n",
    "\n",
    "for data_file in features:\n",
    "    print(\"For \" + data_file['id'])\n",
    "    print(\"File:\\t\\t\" + data_file['assets']['data']['href'])\n",
    "    print(\"Metadata:\\t\" + data_file['assets']['metadata__data']['href'])\n",
    "    print(\"\")\n",
    "\n",
    "print(\"Full JSON response object:\")\n",
    "JSON(response.json())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "42bc4d07-d8de-4298-9d33-6299ae41a024",
   "metadata": {},
   "source": [
    "## Filter the results above by time\n",
    "\n",
    "The standards-based API used by the Unity SDS Data Store, DAPA, has a variety of filtering options. Currently we have implemented a time-based filter. See more about the Data Access and Processing API at: https://docs.ogc.org/per/20-025r1.html#_dapa_overview\n",
    "\n",
    "This cell will filter the full list of files in the Collection with ID = data_set by a start and end time defined by the datetime parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "70a4ab9e-8693-4948-bc2a-cce1e09d1cab",
   "metadata": {},
   "outputs": [],
   "source": [
    "url = env['url'] + \"am-uds-dapa/collections/\" + data_set + \"/items\"\n",
    "# The datetime, limit, and offset parameters are included due to a current bug in the API Gateway setting these values to 'none'.\n",
    "# Example date/time params\n",
    "\n",
    "params = []\n",
    "# Add a datetime range to your request.\n",
    "# Note: this will not currently work because the CHIRP Application Package does not specify the appropriate start or stop dates for the data.\n",
    "params.append((\"datetime\", \"2023-11-02T00:00:00Z/2023-11-09T02:31:12Z\"))\n",
    "\n",
    "# limit - how many results to return in a single request\n",
    "#params.append((\"limit\", 10))\n",
    "\n",
    "response = requests.get(url, headers={\"Authorization\": \"Bearer \" + token}, params=params)\n",
    "\n",
    "print(f\"Total number of files: {response.json()['numberMatched']}\")\n",
    "print(\"File IDs, datetimes, and hrefs in Collection \" + data_set + \"\\n\")\n",
    "\n",
    "features = response.json()['features']\n",
    "\n",
    "for data_file in features:\n",
    "    print(data_file['id'])\n",
    "    print(data_file['properties']['created'])\n",
    "    print(data_file['assets']['metadata__data']['href'])\n",
    "    print(data_file['assets']['data']['href'])\n",
    "    print(\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d79e544a",
   "metadata": {},
   "source": [
    "## Credential-less data download\n",
    "\n",
    "When accessing data stores within the same venue, you'll be able to download data from S3 without credentials.\n",
    "\n",
    "**Note:** the following libraries are needed for this; the command below can be run in a Jupyter terminal to install them:\n",
    "\n",
    "```\n",
    "conda install xarray netcdf4 hdf5 boto3 matplotlib\n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "620a4e4e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "!{sys.executable} -m pip install boto3\n",
    "import boto3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c6a2837-fb5a-4467-94e5-a85f8659506e",
   "metadata": {},
   "outputs": [],
   "source": [
    "s3 = boto3.client('s3')\n",
    "#s3://ssips-test-ds-storage-reproc/urn:nasa:unity:ssips:TEST1:CHRP_16_DAY_REBIN___1/urn:nasa:unity:ssips:TEST1:CHRP_16_DAY_REBIN___1:SNDR_tile_2016_s320_N16p50_E120p00_L1_AQ_v1_D_2311021698943223.nc/SNDR_tile_2016_s320_N16p50_E120p00_L1_AQ_v1_D_2311021698943223.nc\n",
    "s3.download_file('ssips-test-ds-storage-reproc', 'urn:nasa:unity:ssips:TEST1:CHRP_16_DAY_REBIN___1/urn:nasa:unity:ssips:TEST1:CHRP_16_DAY_REBIN___1:SNDR_tile_2016_s320_N16p50_E120p00_L1_AQ_v1_D_2311021698943223.nc/SNDR_tile_2016_s320_N16p50_E120p00_L1_AQ_v1_D_2311021698943223.nc', 'test_file11.nc')"
   ]
  },
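  {
   "cell_type": "markdown",
   "id": "xarray-open-md",
   "metadata": {},
   "source": [
    "The `conda install` command above pulls in `xarray` and `netcdf4`, which the tutorial does not otherwise use. As a sketch, assuming the previous cell downloaded a valid NetCDF file to `test_file11.nc`, you can open it with xarray and inspect its contents:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "xarray-open-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import xarray as xr\n",
    "\n",
    "# Open the downloaded NetCDF file (assumes test_file11.nc exists locally)\n",
    "# and print a summary of its dimensions, coordinates, and variables.\n",
    "ds = xr.open_dataset('test_file11.nc')\n",
    "print(ds)"
   ]
  },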
  {
   "cell_type": "markdown",
   "id": "e81ee1e1-c485-45c2-9074-29c5c3d935fc",
   "metadata": {},
   "source": [
    "The file should now appear in the directory tree to the left in Jupyter."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
