Notebook and presentation on setting up a Big Data processing chain for image classification on AWS (EC2, EMR, S3, IAM, PySpark).

alex-martineau/Cloud_BigData_Env_Process

📊 Perform Processing in a Big Data Environment on the Cloud

This repository contains deliverables created as part of Project 9 of the Data Scientist course offered by OpenClassrooms & CentraleSupélec.

⚠️ Important: This repository reflects personal work carried out by the author as part of a training course. It is in no way affiliated with, endorsed by, or officially published by OpenClassrooms or CentraleSupélec.

📂 Content

Martineau_Alexandre_1_notebook_032025.ipynb

→ Jupyter notebook implementing a Big Data processing chain for fruit image classification with PySpark and AWS EMR.

  • Image import and preprocessing
  • Feature extraction with MobileNetV2
  • Dimensionality reduction with PCA
  • Saving results (Parquet, CSV)
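
The PCA step can be illustrated locally with NumPy. This is a minimal sketch of the technique only, not the notebook's code: the notebook runs its pipeline on a Spark cluster, whereas this example assumes the image features already fit in memory as a NumPy array. The 1280-column width matches the pooled output of MobileNetV2 without its classification head; the random matrix is a synthetic stand-in for real embeddings, and `pca_reduce` is a hypothetical helper name.

```python
import numpy as np


def pca_reduce(features: np.ndarray, n_components: int) -> tuple[np.ndarray, np.ndarray]:
    """Project each row of `features` onto its top principal components.

    Returns (projected rows, explained variance ratio per kept component).
    """
    # PCA is defined on mean-centered data, so center every feature column.
    centered = features - features.mean(axis=0)
    # Thin SVD: rows of vt are the principal axes, sorted by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    projected = centered @ components.T
    # The variance captured by each axis is proportional to its squared singular value.
    explained = (s ** 2) / np.sum(s ** 2)
    return projected, explained[:n_components]


# Hypothetical stand-in for image embeddings: 200 images x 1280 features.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 1280))
reduced, ratio = pca_reduce(features, n_components=50)
print(reduced.shape)  # (200, 50)
print(f"variance kept: {ratio.sum():.3f}")
```

On a cluster, the same projection would typically be done with Spark ML's `pyspark.ml.feature.PCA` (parameters `k`, `inputCol`, `outputCol`), which avoids collecting all feature vectors onto a single machine.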

Martineau_Alexandre_3_presentation_032025.pptx

→ Presentation slides:

  • Context (Fruits! startup and biodiversity preservation)
  • Principles of Big Data and Cloud Computing
  • Technical architecture (EC2, EMR, S3, IAM)
  • Steps for setting up a Spark cluster on AWS
  • Results and conclusions (scalability, costs, technology dependencies)
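
The cluster-setup steps listed above can be sketched as the request one would send to the EMR API. This is an illustrative sketch under stated assumptions, not the configuration used in the presentation: the release label, instance type, node count, S3 paths, and the helper name `build_emr_request` are all hypothetical, while the two IAM role names are the EMR defaults created by `aws emr create-default-roles`. The function only builds a dictionary of `run_job_flow` keyword arguments, so it runs without AWS credentials.

```python
import json


def build_emr_request(
    name: str,
    log_uri: str,
    bootstrap_script: str,
    release_label: str = "emr-6.10.0",   # assumption: any Spark-enabled EMR release
    instance_type: str = "m5.xlarge",    # assumption: illustrative instance size
    core_nodes: int = 2,                 # assumption: illustrative worker count
) -> dict:
    """Build keyword arguments for boto3's EMR `run_job_flow` call."""
    if core_nodes < 1:
        raise ValueError("an EMR cluster needs at least one core node")
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "LogUri": log_uri,  # S3 prefix where EMR writes cluster logs
        "Applications": [{"Name": "Spark"}, {"Name": "JupyterHub"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": instance_type, "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": instance_type, "InstanceCount": core_nodes},
            ],
            # Keep the cluster running for interactive notebook work.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Bootstrap script (hypothetical) that installs Python packages on every node.
        "BootstrapActions": [
            {"Name": "Install Python dependencies",
             "ScriptBootstrapAction": {"Path": bootstrap_script}},
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",  # instance profile assumed by EC2 nodes
        "ServiceRole": "EMR_DefaultRole",      # role EMR itself uses to manage resources
    }


# Hypothetical bucket and script names, for illustration only.
request = build_emr_request(
    name="fruits-spark-cluster",
    log_uri="s3://example-bucket/emr-logs/",
    bootstrap_script="s3://example-bucket/bootstrap.sh",
)
print(json.dumps(request, indent=2))
```

With credentials configured, the dictionary could be passed as `boto3.client("emr").run_job_flow(**request)`. Keeping the request as plain data makes it easy to review before a billable cluster is launched.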

⚖️ License & Terms of Use

  • All rights reserved © 2025 Alexandre Martineau.
  • No reuse, reproduction, distribution, or modification is permitted without prior agreement.
  • Any commercial or profit-generating use requires the express permission of the author and financial compensation.

👤 Author

Alexandre Christophe Dominique Martineau
