Image Data Loading and Preprocessing in TensorFlow

1. Project Title

Efficient Image Data Loading and Preprocessing for Deep Learning


2. Problem Statement and Goal of Project

Loading and preprocessing image data efficiently is critical for training performant deep learning models. This project demonstrates how to load, preprocess, batch, and visualize image datasets using TensorFlow/Keras utilities, and how to optimize the pipeline so it keeps a GPU fed and scales to larger datasets.


3. Solution Approach

The notebook is organized into key steps:

  1. Directory-based dataset loading – Use tf.keras.utils.image_dataset_from_directory to automatically label and split datasets (see the sketch after this list).
  2. Exploring dataset properties – View shapes, class names, and sample counts.
  3. Data preprocessing – Resize, normalize, and prepare images for model ingestion.
  4. Performance optimization – Apply cache(), shuffle(), and prefetch() for efficient training throughput.
  5. Visualization – Display batches of images with their labels for inspection.
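
A minimal loading sketch for step 1 (assuming an illustrative "dataset/" path, a 180×180 target size, and a batch size of 32 – the notebook's exact values may differ):

import tensorflow as tf

# Labels are inferred from the subfolder names; an optional validation split
# can be requested at load time. All values below are illustrative.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/",
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(180, 180),  # images are resized to this shape on load
    batch_size=32,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/",
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(180, 180),
    batch_size=32,
)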

4. Technologies & Libraries

As used in the notebook:

  • TensorFlow / Keras – Dataset loading, preprocessing, and pipeline optimization.
  • Matplotlib – Visualizing sample images and labels.
  • NumPy – Basic numerical handling (if used).

5. Description of the Dataset

No dataset is provided explicitly – the notebook loads image data from a local directory structure in which subfolder names correspond to class labels.


6. Installation & Execution Guide

Requirements:

pip install tensorflow matplotlib numpy

Run the notebook:

jupyter notebook image_data_loader.ipynb

or in JupyterLab:

jupyter lab image_data_loader.ipynb

Ensure the dataset is organized in a directory with subfolders for each class:

dataset/
├── class1/
│   ├── image1.jpg
│   ├── image2.jpg
├── class2/
│   ├── image3.jpg
│   ├── image4.jpg

7. Key Results / Performance

  • Successfully loaded and labeled images directly from directory structure.
  • Normalized and resized images to a consistent shape for model compatibility.
  • Optimized pipeline with caching, shuffling, and prefetching to reduce training bottlenecks.
  • Verified correct label assignment through visualization.

Example dataset info:

Image shape: (180, 180, 3)
Number of classes: 5
Class names: ['cat', 'dog', 'bird', 'fish', 'horse']
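
A short sketch of how such information can be inspected and how pixel values can be normalized (assuming the train_ds from the loading sketch above; the 180×180, 5-class setup mirrors the example output):

print(train_ds.class_names)          # class names inferred from subfolder names
for images, labels in train_ds.take(1):
    print(images.shape)              # e.g. (32, 180, 180, 3)
    print(labels.shape)              # e.g. (32,)

# Pixel values arrive in [0, 255]; a Rescaling layer maps them to [0, 1].
normalization_layer = tf.keras.layers.Rescaling(1. / 255)
normalized_ds = train_ds.map(lambda x, y: (normalization_layer(x), y))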

8. Screenshots / Sample Output

Visualization sample:

[Image of class 'cat']  [Image of class 'dog']  [Image of class 'bird'] ...
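
One way such a grid can be produced with Matplotlib (a sketch assuming the train_ds loaded earlier; it plots the first nine images of one batch with their class names as titles):

import matplotlib.pyplot as plt

class_names = train_ds.class_names
plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):   # one batch of (images, labels)
    for i in range(9):
        plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")
plt.show()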

Prefetch optimization:

AUTOTUNE = tf.data.AUTOTUNE
dataset = dataset.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
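
In this chain, cache() stores the decoded images after the first epoch so later epochs skip file reading and decoding, shuffle(1000) re-randomizes sample order each epoch using a 1,000-element buffer, and prefetch(AUTOTUNE) overlaps data preparation with model execution so the accelerator is not left waiting on input.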

9. Additional Learnings / Reflections

  • image_dataset_from_directory simplifies loading while handling labeling automatically.
  • Proper caching and prefetching significantly improve GPU utilization.
  • Visual inspection ensures the dataset is loaded correctly before training.
  • A well-prepared data pipeline prevents downstream model performance issues.

💡 Some interactive outputs (e.g., plots, widgets) may not display correctly on GitHub. If so, please view this notebook via nbviewer.org for full rendering.


👤 Author

Mehran Asgari
Email: imehranasgari@gmail.com
GitHub: https://github.com/imehranasgari


📄 License

This project is licensed under the Apache 2.0 License – see the LICENSE file for details.

