easonlai/Samples_for_Azure_Databricks_Orientation

Samples for Azure Databricks Orientation

This is a sample code repository (Python) for Azure Databricks Orientation. It covers a range of useful usage scenarios, from beginner to intermediate level.

Section 1

Section 2

  • Mount Azure Blob Storage
  • Exploring sample data (JSON) in Azure Blob Storage with the json module and Pandas
  • Flattening the first level of nested column data
  • Flattening the second level of nested column data
  • Plotting column relationships with Seaborn
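
As a rough illustration of the two flattening steps listed above, here is a minimal sketch using `pandas.json_normalize`. The records and field names (`programID`, `works`, `soloists`, and so on) are hypothetical placeholders chosen for this example, not the actual contents of `data/raw_nyc_phil.json`, and the notebook in this repository may take a different approach.

```python
import pandas as pd

# Hypothetical, simplified records with two levels of nesting:
# program -> works -> soloists. All values are placeholders.
programs = [
    {
        "programID": "P1",
        "season": "S1",
        "works": [
            {"workTitle": "Work A", "composerName": "Composer X", "soloists": []},
            {
                "workTitle": "Work B",
                "composerName": "Composer Y",
                "soloists": [
                    {"soloistName": "Soloist 1", "soloistInstrument": "Soprano"}
                ],
            },
        ],
    },
    {
        "programID": "P2",
        "season": "S1",
        "works": [
            {
                "workTitle": "Work C",
                "composerName": "Composer X",
                "soloists": [
                    {"soloistName": "Soloist 2", "soloistInstrument": "Piano"},
                    {"soloistName": "Soloist 3", "soloistInstrument": "Violin"},
                ],
            }
        ],
    },
]

# First level: one row per work, carrying program-level fields along as metadata.
works_df = pd.json_normalize(programs, record_path="works", meta=["programID", "season"])

# Second level: one row per soloist, carrying program- and work-level fields along.
# A nested meta path such as ["works", "workTitle"] becomes the column "works.workTitle".
soloists_df = pd.json_normalize(
    programs,
    record_path=["works", "soloists"],
    meta=["programID", ["works", "workTitle"]],
)

print(works_df[["programID", "workTitle", "composerName"]])
print(soloists_df)
```

Works with an empty `soloists` list contribute no rows to the second-level frame, so it can be joined back to `works_df` if those works need to be kept.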

Section 3

Section 4

  • Exploring sample data (csv) in ADLS with Pandas
  • Data cleaning with Pandas
  • Saving cleaned data back to ADLS
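
The explore, clean, and save flow above might look roughly like the sketch below. It reads from and writes to in-memory buffers so it runs anywhere. In a Databricks workspace the source and destination would be ADLS paths, which are not shown here. The column names and rows are hypothetical stand-ins, not the real contents of `data/BL-Flickr-Images-Book.csv`, and the cleaning steps are examples rather than the exact ones used in the notebook.

```python
import io

import pandas as pd

# Hypothetical raw data standing in for a CSV file stored in ADLS.
raw_csv = io.StringIO(
    "Identifier,Title,Date of Publication,Edition\n"
    "206,Book A,1879 [1878],\n"
    "216,Book B,1868,\n"
    "218,Book C,[1869?],\n"
    "218,Book C,[1869?],\n"
    "472,Book D,,\n"
)
df = pd.read_csv(raw_csv)

# Drop a column that carries no information in this sample.
df = df.drop(columns=["Edition"])

# Remove exact duplicate rows.
df = df.drop_duplicates()

# Keep the first four-digit year found in the free-text date column;
# rows without a year become NaN.
year = df["Date of Publication"].str.extract(r"(\d{4})", expand=False)
df["Date of Publication"] = pd.to_numeric(year)

# Use the record identifier as the index.
df = df.set_index("Identifier")

# Write the cleaned data out; a buffer stands in for the ADLS destination here.
out = io.StringIO()
df.to_csv(out)
print(out.getvalue())
```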

Section 5

  • Data cleaning and preparation with PySpark

List of Files

  • data/ > sample source data directory
  • data/pima-indians-diabetes-data.csv > Pima Indians Diabetes Database in CSV
  • data/pima-indians-diabetes-data-2.csv > Pima Indians Diabetes Database in CSV with column headers
  • data/raw_nyc_phil.json > New York Philharmonic Performance History in JSON
  • data/BL-Flickr-Images-Book.csv > Sample CSV data for data cleaning
  • Samples_for_Orientation_MASKED.ipynb > Notebook exported from Azure Databricks (for Sections 1 to 3)
  • Samples_for_Orientation_MASKED.html > HTML export (with results and visuals) from Azure Databricks (for Sections 1 to 3)
  • Samples_for_Orientation_2_MASKED.ipynb > Notebook exported from Azure Databricks (for Section 4)
  • Samples_for_Orientation_2_MASKED.html > HTML export (with results and visuals) from Azure Databricks (for Section 4)
  • Data_Cleansing_and_Preparation_with_PySpark_MASKED.ipynb > Notebook exported from Azure Databricks (for Section 5)
  • Data_Cleansing_and_Preparation_with_PySpark_MASKED.html > HTML export (with results and visuals) from Azure Databricks (for Section 5)