
Yi-Pin-123/Pyspark-Notes

Pyspark Notes

This was forked from the repository for the LinkedIn Learning course High-Performance PySpark: Advanced Strategies for Optimal Data Processing. The full course is available on LinkedIn Learning.

I extended the repository with additional PySpark code.

Content Description

  1. Data Cleaning
  2. Defining Schema
  3. Compression Techniques
  4. Repartitioning
  5. Clustering Model

About

This repository contains PySpark code examples.

Languages

  • Python 98.2%
  • C 0.6%
  • Cython 0.5%
  • C++ 0.3%
  • Shell 0.1%
  • Jupyter Notebook 0.1%
  • Other 0.2%