Ml pipeline #11

MrZombie69232 · 2023-09-01T03:13:07Z

I have added the necessary scripts for the machine learning pipeline. Please review and let me know if any updates or changes are required.

…ons and a readme along with requirement.txt.

JackBuck

This is a nice PR - your scripts look easy for an outsider to use. I would just suggest that you look back at the readme again - did ChatGPT write it? 😉

ml_pipeline/readme.md

ml_pipeline/requirememts.txt

JackBuck · 2023-09-01T09:22:31Z

ml_pipeline/requirememts.txt

@@ -0,0 +1,7 @@
+python==3.10.0


The Python version doesn't actually belong in this file - it is just for Python libraries. The reason is that you can't `pip install python==3.10.0`, but you can pip install everything else 😄
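If the aim is still to pin the interpreter, one option is a small runtime check that each script calls on startup. This is only a sketch: it assumes 3.10 is the intended version (taken from the line in the diff above), and `REQUIRED`, `version_ok` and `require_python` are illustrative names that do not exist in this PR.

```python
import sys

# Assumed target version, taken from the python==3.10.0 line in the diff.
REQUIRED = (3, 10)


def version_ok(current=None, required=REQUIRED):
    """Return True if the major.minor of `current` equals `required`."""
    if current is None:
        current = sys.version_info
    return tuple(current[:2]) == tuple(required)


def require_python(current=None, required=REQUIRED):
    """Raise RuntimeError when the interpreter version does not match."""
    if current is None:
        current = sys.version_info
    if not version_ok(current, required):
        found = ".".join(str(part) for part in current[:3])
        wanted = ".".join(str(part) for part in required)
        raise RuntimeError(f"Python {wanted} is required, found {found}")
```

Calling `require_python()` at the top of a script would then fail early with a readable message, while requirements.txt stays limited to installable libraries.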

ml_pipeline/scripts/generate_training_sample.py

ml_pipeline/readme.md

JackBuck · 2023-09-01T09:24:52Z

ml_pipeline/readme.md

+
+Usage:
+```bash
+python predict_with_model.py --config_file /path/to/config.json


Same here. Your prediction.py file (note the different name) hard-codes a config.py path.

Fixed as well
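For reference, a minimal sketch of reading the config path from the command line rather than hard-coding it. The `--config_file` flag matches the readme usage line quoted above; `parse_args` and `load_config` are illustrative names and are not taken from prediction.py.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the command line; --config_file mirrors the readme usage line."""
    parser = argparse.ArgumentParser(description="Run prediction with a trained model.")
    parser.add_argument("--config_file", required=True, help="Path to the JSON config file.")
    return parser.parse_args(argv)


def load_config(path):
    """Read the JSON config at `path` and return it as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A script would then start with something like `config = load_config(parse_args().config_file)`, so the readme command and the code cannot drift apart.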

ml_pipeline/scripts/training.py

JackBuck

Thanks for making the changes Satyam. With a couple of exceptions that I pointed out inline, the readme is a lot better now.

Looking through it though, I am a bit confused with where a couple of functions are defined. Is this actually the version of the code that you used to get your results? If you've refactored it since running it, could you rerun it please, to make sure it all works? Then we can merge it!

JackBuck · 2023-10-08T21:06:47Z

ml_pipeline/readme.md

+
+### 1. Generating Training Points
+
+The script `generate_training_points.py` takes a raster dataset, randomly samples specific classes, and creates a GeoDataFrame containing the sampled points. The sampled points serve as training data for classification.


Should this be generate_training_sample.py?

JackBuck · 2023-10-08T21:06:57Z

ml_pipeline/readme.md

+
+Usage:
+```bash
+python generate_training_points.py --raster_path /path/to/raster/file.tif --num_samples 100 --target_classes 1 2 3 --export_path /path/to/export.shp


JackBuck · 2023-10-08T21:10:04Z

ml_pipeline/readme.md

+
+### 3. Model Training using Training Points
+
+The script `train_model.py` loads the GeoDataFrame generated in the first step, processes the data, sets up a PyCaret experiment, creates a classification model, and saves the trained model along with evaluation plots and reports.


Should this be training.py?

JackBuck · 2023-10-08T21:10:11Z

ml_pipeline/readme.md

+
+Usage:
+```bash
+python train_model.py  /path/to/config.json


JackBuck · 2023-10-08T21:10:58Z

ml_pipeline/readme.md

+
+### 4. Prediction using Pretrained Model
+
+The script `predict_with_model.py` uses a pretrained classification model to make predictions on input raster tiles. It saves the binary and probability prediction outputs.


Should this be prediction.py?
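Name mismatches like these are easy to catch mechanically. A rough sketch, assuming the scripts live in one directory and the readme mentions them by bare file name; `referenced_scripts` and `missing_scripts` are hypothetical helpers, not part of this PR.

```python
import re
from pathlib import Path

# Matches bare Python file names such as "training.py".
SCRIPT_REF = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*\.py)\b")


def referenced_scripts(readme_text):
    """Return the set of *.py file names mentioned anywhere in the readme text."""
    return set(SCRIPT_REF.findall(readme_text))


def missing_scripts(readme_text, scripts_dir):
    """Return the names mentioned in the readme that do not exist in scripts_dir."""
    existing = {path.name for path in Path(scripts_dir).glob("*.py")}
    return sorted(referenced_scripts(readme_text) - existing)
```

Running `missing_scripts(Path("ml_pipeline/readme.md").read_text(), "ml_pipeline/scripts")` should return an empty list once the readme and the script names agree.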

JackBuck · 2023-10-08T21:11:03Z

ml_pipeline/readme.md

+
+Usage:
+```bash
+python predict_with_model.py  /path/to/config.json


JackBuck · 2023-10-08T21:34:59Z

ml_pipeline/scripts/training.py

+    gdf = gpd.read_file(config['Paths']['shapefile_path'])
+    new_df1 = gdf.drop(columns=config['ColumnsToDrop'])
+
+    new_df1 = new_df1.rename(columns=lambda col: extract_number(col))


Is extract_number defined in pycaret.classification? Or elsewhere? I can't see it defined in your code, or in pycaret 🤔 :
https://github.com/pycaret/pycaret/blob/master/pycaret/classification/__init__.py
https://pycaret.readthedocs.io/en/stable/api/classification.html
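For illustration only: if `extract_number` is meant to pull a band number out of a column name, then something along these lines would have to be defined or imported in training.py. The behaviour below is a guess at the intent, not the author's implementation.

```python
import re


def extract_number(column_name):
    """Return the first run of digits in column_name (hypothetical behaviour).

    Names without any digits are returned unchanged so that columns such as
    "geometry" survive the rename.
    """
    match = re.search(r"\d+", str(column_name))
    return match.group(0) if match else column_name
```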

JackBuck · 2023-10-08T21:37:35Z

ml_pipeline/scripts/training.py

+    new_df1 = new_df1.rename(columns=lambda col: extract_number(col))
+
+    # Load global statistics from cache
+    global_stats_dict = {band_name: cache_global_stats(band_name, config['Paths']['pickle_dir']) for band_name in band_names}


Here, I think that cache_global_stats is defined in prediction.py but is used here (without being imported). Since it's only used in training.py, I think that this would be the best file to define it in.
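As a sketch of what defining it locally in training.py could look like: the signature follows the call in the diff above, but the body is an assumption (one pickle file per band, named `<band_name>.pkl`) and may not match what prediction.py actually does.

```python
import pickle
from pathlib import Path


def cache_global_stats(band_name, pickle_dir):
    """Load previously cached statistics for one band from pickle_dir.

    Assumes one pickle file per band named "<band_name>.pkl"; that naming
    scheme is a guess, not taken from the PR.
    """
    cache_path = Path(pickle_dir) / f"{band_name}.pkl"
    if not cache_path.is_file():
        raise FileNotFoundError(f"No cached statistics for band {band_name!r} at {cache_path}")
    with cache_path.open("rb") as handle:
        return pickle.load(handle)
```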

JackBuck · 2023-10-08T21:38:00Z

ml_pipeline/scripts/training.py

+    global_stats_dict = {band_name: cache_global_stats(band_name, config['Paths']['pickle_dir']) for band_name in band_names}
+
+    # Process bands in the dataframe
+    new_df2 = process_bands_in_dataframe(new_df1, band_names, global_stats_dict)


Similarly, where is process_bands_in_dataframe defined?
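Missing definitions like these can be spotted before running anything. A minimal sketch using the standard-library `ast` module; `undefined_calls` is an illustrative name, and the check is deliberately simple: it ignores scoping and cannot see names brought in by a star import.

```python
import ast
import builtins


def undefined_calls(source):
    """Return sorted names called as plain functions but never bound in `source`."""
    tree = ast.parse(source)
    bound = set(dir(builtins))
    called = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import x as y" binds "y".
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            called.add(node.func.id)
    return sorted(called - bound)
```

Feeding it the text of training.py would list every function that is called by bare name without a matching `def`, assignment, or explicit import in that file.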

MrZombie69232 added 3 commits September 1, 2023 04:06

I have added complete pipeline for ml in a new folders with instructi…

3e503b8

…ons and a readme along with requirement.txt.

I have added complete pipeline for ml in a new folders with instructi…

63554a0

…ons and a readme along with requirement.txt.

I have added complete pipeline for ml in a new folders with instructi…

4626a89

…ons and a readme along with requirement.txt.

JackBuck reviewed Sep 1, 2023


MrZombie69232 and others added 4 commits September 1, 2023 11:29

fixed all comments except requiremnts.txt.

be3f2f5

fixed requiremnts.txt and Readme.

f51cae4

fixed requiremnts.txt and Readme.

fd0cbfe

Update readme.md

fe316e5

JackBuck requested changes Oct 8, 2023



