This repository contains deliverables created as part of Project 9 of the Data Scientist course offered by OpenClassrooms & CentraleSupélec.
Martineau_Alexandre_1_notebook_032025.ipynb
→ Jupyter notebook implementing a Big Data processing pipeline for fruit image classification with PySpark on AWS EMR.
- Image import and preprocessing
- Feature extraction with MobileNetV2
- Dimensionality reduction using PCA
- Saving results (Parquet, CSV)
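
The PCA step listed above can be illustrated with a minimal sketch. It is not taken from the notebook: it uses plain NumPy instead of PySpark so that it runs without a cluster, and random data stands in for the image features. The function name `pca_reduce`, the 50-image sample, and the choice of 10 components are all assumptions made for illustration. The width of 1280 matches MobileNetV2's pooled feature vector; whether the notebook extracts features from that layer is also an assumption.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project row-wise feature vectors onto their top principal components.

    Hypothetical helper for illustration; not the notebook's PySpark code.
    """
    # Center each column so the decomposition captures variance, not the mean.
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    # Share of the total variance carried by each component.
    explained = singular_values**2 / np.sum(singular_values**2)
    reduced = centered @ vt[:n_components].T
    return reduced, explained[:n_components]


# Random stand-in for extracted features: 50 images x 1280 values each.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 1280))

reduced, explained = pca_reduce(features, n_components=10)
print(reduced.shape)  # (50, 10)
```

On a cluster, the same idea would typically be applied to a distributed DataFrame of feature vectors rather than an in-memory array, and the reduced vectors would then be written out (for example to Parquet or CSV).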
Martineau_Alexandre_3_presentation_032025.pptx
→ Presentation slides covering:
- Context (Fruits! startup and biodiversity preservation)
- Principles of Big Data and Cloud Computing
- Technical architecture (EC2, EMR, S3, IAM)
- Steps for setting up a Spark cluster on AWS
- Results and conclusions (scalability, costs, technology dependencies)
- All rights reserved © 2025 Alexandre Martineau.
- No reuse, reproduction, distribution, or modification is permitted without prior agreement.
- Any commercial or profit-generating use requires the express permission of the author and financial compensation.
Alexandre Christophe Dominique Martineau