# Mexico-Gas-Prices

In Mexico, the price of gasoline changes constantly, since it depends heavily on crude oil prices, transportation costs, and competitors. Keeping track of prices can therefore save money when looking for the best place to fill your tank.

## The task

The objective of this project is to process fuel price data obtained from the CRE (Comisión Reguladora de Energía, Mexico's energy regulatory commission) in the cloud, storing it on a daily basis in Cloud Storage and in a BigQuery data warehouse.

Tools that were used to accomplish this task:

- Python
  - Pandas
  - Requests
  - reverse_geocoder
  - unidecode
  - pyarrow
  - google-cloud-bigquery
  - google-cloud-storage
  - gcsfs
- Google Cloud Storage
- BigQuery
- Cloud Functions

## BigQuery Data

The schema of the data is as follows:

```json
[
  {
    "mode": "REQUIRED",
    "name": "Permiso",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Marca",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Nombre",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Direccion",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Producto",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Precio",
    "type": "FLOAT64"
  },
  {
    "mode": "REQUIRED",
    "name": "Estado",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Municipio",
    "type": "STRING"
  },
  {
    "mode": "REQUIRED",
    "name": "Fecha",
    "type": "DATE"
  },
  {
    "mode": "REQUIRED",
    "name": "y",
    "type": "FLOAT64"
  },
  {
    "mode": "REQUIRED",
    "name": "x",
    "type": "FLOAT64"
  }
]
```
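
For reference, the same schema can be expressed with the `google-cloud-bigquery` client as a list of `SchemaField` objects. This is only a sketch: the project, dataset, and table names below are placeholders, not this project's actual configuration.

```python
from google.cloud import bigquery

# Placeholder table reference; adjust project, dataset, and table names to your environment.
TABLE_ID = "my-project.gas_prices.precios"

SCHEMA = [
    bigquery.SchemaField("Permiso", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("Marca", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("Nombre", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("Direccion", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("Producto", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("Precio", "FLOAT64", mode="REQUIRED"),
    bigquery.SchemaField("Estado", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("Municipio", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("Fecha", "DATE", mode="REQUIRED"),
    bigquery.SchemaField("y", "FLOAT64", mode="REQUIRED"),
    bigquery.SchemaField("x", "FLOAT64", mode="REQUIRED"),
]

client = bigquery.Client()
table = bigquery.Table(TABLE_ID, schema=SCHEMA)
table = client.create_table(table, exists_ok=True)  # create the table only if it does not exist yet
```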

## Process

The procedure used to process the information is as follows (a minimal sketch appears after the list):

- Get the XML element tree from the CRE web page
- Structure the data as JSON objects, then convert them to a pandas DataFrame
- Transform the data with pandas and other libraries to insert geo coordinates and add new columns
- Store the transformed data in Cloud Storage
- Send the final DataFrame to BigQuery
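
A minimal sketch of these steps, under stated assumptions, could look like the code below. The CRE URL, bucket path, table ID, and XML element/attribute names are placeholders for illustration, not the values used by the actual pipeline.

```python
import xml.etree.ElementTree as ET

import pandas as pd
import requests
import reverse_geocoder as rg
from unidecode import unidecode
from google.cloud import bigquery

CRE_PRICES_URL = "https://example.com/cre/prices.xml"    # placeholder URL
BUCKET_PATH = "gs://my-bucket/gas_prices/prices.parquet"  # placeholder bucket path
TABLE_ID = "my-project.gas_prices.precios"                # placeholder table

# 1. Get the XML element tree from the CRE web page
response = requests.get(CRE_PRICES_URL, timeout=60)
root = ET.fromstring(response.content)

# 2. Structure the data as dicts, then build a pandas DataFrame
rows = [place.attrib for place in root.iter("place")]  # element and attribute names are illustrative
df = pd.DataFrame(rows)

# 3. Transform: reverse-geocode the coordinates into Estado / Municipio, normalize text
coords = list(zip(df["y"].astype(float), df["x"].astype(float)))  # (lat, lon) pairs
geo = rg.search(coords)
df["Estado"] = [unidecode(g["admin1"]) for g in geo]
df["Municipio"] = [unidecode(g["admin2"]) for g in geo]

# 4. Store the transformed data in Cloud Storage (pyarrow + gcsfs handle gs:// paths)
df.to_parquet(BUCKET_PATH, index=False)

# 5. Send the final DataFrame to BigQuery
client = bigquery.Client()
client.load_table_from_dataframe(df, TABLE_ID).result()
```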

There are scripts to run everything locally to test the data transformations. In the cloud, the process runs at 7:30 am, triggered by Cloud Scheduler through Pub/Sub.
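
A Pub/Sub-triggered Cloud Functions entry point for that schedule could look roughly like this sketch; the function name and the `extract_transform_load` wrapper are hypothetical, the real logic lives in the repository's scripts.

```python
import base64


def extract_transform_load():
    # Hypothetical wrapper around the extract/transform/load steps sketched above.
    ...


def run_pipeline(event, context):
    """Background Cloud Function entry point, triggered by the Pub/Sub
    message that Cloud Scheduler publishes at 7:30 am."""
    if "data" in event:
        message = base64.b64decode(event["data"]).decode("utf-8")
        print(f"Triggered by scheduler message: {message}")
    extract_transform_load()
```

The 7:30 am trigger itself is a Cloud Scheduler job that publishes to the Pub/Sub topic the function subscribes to; the function does not need the message payload, the message only starts the daily run.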

## Architecture

Here is the GCP architecture; it shows the resources that are used.