When working with image directories, there are many different ways to store data. These differences can lead to discrepancies or added time costs when processing the data for analysis. This was originally raised based on analysis performed by @jenna-tomkinson.
Some of these include:
- Directories with no files
- Duplicate directory names
- Directories with differing file structures or files
- File naming patterns which don't match across directories that otherwise share a common structure
- Differing numbers of image files per directory
- Image file metadata with dissimilar fields or missing data
- PII or potentially private details in the data
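
A few of the items above (directories with no files, duplicate directory names, and image counts per directory) could be checked with a short script. The following is only a minimal sketch using the Python standard library. The function name `summarize_image_dirs`, the `IMAGE_SUFFIXES` set, and the choice to treat the most common image count as the "expected" one are all assumptions made for illustration, not part of the original analysis. Metadata and PII checks are not covered here.

```python
from collections import Counter, defaultdict
from pathlib import Path

# Assumed set of image extensions; adjust for the dataset at hand.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg"}


def summarize_image_dirs(root):
    """Collect simple structural checks for every directory under `root`."""
    root = Path(root)
    dirs = sorted(p for p in root.rglob("*") if p.is_dir())

    # Directories with no files at any depth beneath them.
    empty_dirs = [d for d in dirs if not any(p.is_file() for p in d.rglob("*"))]

    # Directory names that appear more than once anywhere under root.
    by_name = defaultdict(list)
    for d in dirs:
        by_name[d.name].append(d)
    duplicate_names = {name: paths for name, paths in by_name.items() if len(paths) > 1}

    # Number of image files directly inside each directory that has any.
    image_counts = {}
    for d in dirs:
        count = sum(
            1
            for p in d.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        if count:
            image_counts[d] = count

    # Assumption: the most common count is the expected one; flag the rest.
    mismatched_counts = {}
    if image_counts:
        typical = Counter(image_counts.values()).most_common(1)[0][0]
        mismatched_counts = {d: n for d, n in image_counts.items() if n != typical}

    return {
        "empty_dirs": empty_dirs,
        "duplicate_names": duplicate_names,
        "image_counts": image_counts,
        "mismatched_counts": mismatched_counts,
    }
```

Calling `summarize_image_dirs("path/to/images")` returns a dictionary whose entries can be reviewed before analysis begins. Whether a flagged directory is actually a problem depends on the dataset, so the output is best treated as a list of things to look at rather than a pass/fail result.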