nmolivo/dataquest_eng

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

76 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Zero to Hero: DATAQUEST's Become a Data Engineer

Here's how to get the content of DataQuest's Data Engineering Track missions working on your localhost. Using data from my Valenbisi ARIMA modeling project, I will walk through the steps of using PostgreSQL, Postico, and the command line to get the DataQuest exercises running out of a Jupyter Notebook.

This will not be a complete repetition of the many resources I used, so be sure to follow any links I include wherever it looks like I've skipped a few steps.

Important note: In DataQuest, each exercise re-creates the psycopg2 connection and cursor class when interacting with the Postgres DB, with no deliberate closing of the connection. When we productionize our scripts, it will be more efficient and correct to manage these resources with with statements. One caveat: in psycopg2, using the connection itself as a context manager only wraps a transaction (it commits or rolls back on exit) and does not close the connection, so an explicit close is still needed when the work is done. For the sake of the exercises, I will follow DataQuest's format, and switch to the managed pattern as we approach production.
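The production pattern described above can be sketched as follows. This is a minimal sketch using the stdlib sqlite3 driver so it runs without a Postgres server; psycopg2 exposes the same DB-API shape, so you would swap psycopg2.connect(...) for sqlite3.connect(...). The run_query helper and the query are invented for illustration.

```python
from contextlib import closing
import sqlite3


def run_query(connect, sql, params=()):
    """Open a connection, run one query, and guarantee everything closes.

    `connect` is any zero-argument callable returning a DB-API connection
    (e.g. lambda: psycopg2.connect(...) against a real Postgres DB).
    """
    with closing(connect()) as conn:            # connection closed on exit
        with conn:                              # transaction committed or rolled back
            with closing(conn.cursor()) as cur: # cursor closed on exit
                cur.execute(sql, params)
                return cur.fetchall()


# Demonstrate with an in-memory SQLite database.
rows = run_query(lambda: sqlite3.connect(":memory:"), "SELECT 1 + 1")
print(rows)  # [(2,)]
```

Note the explicit closing(...) around the connection: with psycopg2, relying on "with conn:" alone would leave the connection open.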

There will be three directories in this repository, each aligning with a step of DataQuest's Data Engineer Track. Each directory will contain a README.md with more details on the content it covers.

  • Production Databases
  • Handling Large Data Sets In Python
    • Processing Large Datasets in Pandas
      • Optimizing Dataframe Memory Footprint
      • Processing Dataframes in Chunks
      • Guided Project: Practice Optimizing Dataframes and Processing in Chunks
      • Augmenting Pandas with SQLite
      • Guided Project: Analyzing Startup Fundraising Deals from Crunchbase
    • Optimizing Code Performance on Large Datasets
      • CPU Bound Programs
      • I/O Bound Programs
      • Overcoming the Limitations of Threads
      • Quickly Analyzing Data with Parallel Processing
      • Guided Project: Analyzing Wikipedia Pages
    • Algorithms and Data Structures
      • Processing Tasks with Stacks and Queues
      • Effectively Using Arrays and Lists
      • Sorting Arrays and Lists
      • Searching Arrays and Lists
      • Hash Tables
      • Guided Project: Analyzing Stock Prices
    • Recursion and Trees
      • Overview of Recursion
      • Introduction to Binary Trees
      • Implementing a Binary Heap
      • Working with Binary Search Trees
      • Performance Boosts of Using a B-Tree
      • Performance Boosts of Using a B-Tree II
      • Guided Project: Implementing a Key-Value Database
  • Data Pipelines
    • Building a Data Pipeline
      • Functional Programming
      • Pipeline Tasks
      • Building a Pipeline Class
      • Multiple Dependency Pipeline
      • Guided Project: Hackernews Pipeline
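As a taste of the "Processing Dataframes in Chunks" mission listed above, here is a minimal sketch of the chunking idea: stream a CSV through pandas a few rows at a time and fold the partial aggregates together, so the full file never has to fit in memory. The station/rides columns and the data are invented for illustration.

```python
import io

import pandas as pd

# Stand-in for a large CSV file on disk (e.g. the Valenbisi export).
csv_data = io.StringIO("station,rides\nA,10\nA,5\nB,7\nB,3\nA,2\n")

total = None
# chunksize=2 yields DataFrames of at most 2 rows each.
for chunk in pd.read_csv(csv_data, chunksize=2):
    partial = chunk.groupby("station")["rides"].sum()
    # Fold each chunk's partial sums into the running total.
    total = partial if total is None else total.add(partial, fill_value=0)

print(total.astype(int).to_dict())  # {'A': 17, 'B': 10}
```

The same fold works for any aggregate that can be combined across chunks (sums, counts, min/max); true row-by-row statistics like medians need more care.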

For Non-Commercial Use Only

I highly recommend participating in this course as a member of DATAQUEST.
