This repo demonstrates how to generate synthetic marine ecological data and apply unsupervised machine learning (hierarchical clustering) to explore patterns in policy coverage across marine zones.
- Data Generation: Simulates 20 marine zones with binary presence/absence data for 6 ecological policies.
- Distance Metric: Jaccard distance, ideal for binary attributes.
- Clustering Method: Hierarchical clustering with complete linkage.
- Visualization: Dendrogram to reveal how zones group based on shared protections.
- Output:
  - `generated_marine_zones.csv`: synthetic raw data
  - `clustered_marine_zones.csv`: same data with cluster labels
  - `dendrogram_marine_zones.png`: dendrogram image
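
The distance and clustering steps listed above can be sketched on a toy example. This is a minimal illustration and not necessarily the repo's own script: the five zones, their values, and the choice to cut the tree into two clusters are assumptions made here for demonstration. It uses SciPy's `pdist`, `linkage`, `fcluster`, and `dendrogram`.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import pdist

# Hypothetical toy data: 5 zones x 6 policies (True = protection present).
zone_names = ["A", "B", "C", "D", "E"]
data = np.array(
    [
        [1, 1, 0, 0, 1, 0],  # A
        [1, 1, 0, 0, 1, 1],  # B
        [0, 0, 1, 1, 0, 0],  # C
        [0, 0, 1, 1, 0, 1],  # D
        [1, 0, 0, 0, 1, 0],  # E
    ],
    dtype=bool,
)

# Condensed vector of pairwise Jaccard distances:
# 1 - |shared protections| / |protections present in either zone|.
distances = pdist(data, metric="jaccard")

# Complete linkage: the distance between two clusters is the largest
# pairwise distance between their members.
Z = linkage(distances, method="complete")

# Cut the tree into at most 2 flat clusters (the count is an assumption).
labels = fcluster(Z, t=2, criterion="maxclust")

# no_plot=True returns the dendrogram layout without drawing it;
# dropping that argument draws it with matplotlib instead.
layout = dendrogram(Z, labels=zone_names, no_plot=True)

for name, label in zip(zone_names, labels):
    print(name, label)
```

Zones A and B differ in only one of the four protections that either has, so their Jaccard distance is 0.25 and they merge first. Zones C and D share no protections with A or E, so they end up in a separate cluster.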
Each marine zone is evaluated for the presence (1) or absence (0) of the following protections:
- Coral Reef Protection
- Fishing Ban
- Turtle Nesting Zone
- Oil Drilling Ban
- Marine Sanctuary Status
- Mangrove Forest Protection
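
A presence/absence table with these six protections could be simulated roughly as follows. This is a sketch under stated assumptions: the function name `generate_zones`, the column names, the `Zone_1` … `Zone_20` labels, the seed, and the 50% chance of each protection being present are all hypothetical choices, not taken from the repo.

```python
import numpy as np
import pandas as pd

# Column names are assumed; the repo may spell them differently.
POLICIES = [
    "Coral Reef Protection",
    "Fishing Ban",
    "Turtle Nesting Zone",
    "Oil Drilling Ban",
    "Marine Sanctuary Status",
    "Mangrove Forest Protection",
]


def generate_zones(n_zones: int = 20, seed: int = 0) -> pd.DataFrame:
    """Return a binary zones-by-policies table (1 = present, 0 = absent)."""
    rng = np.random.default_rng(seed)
    # Each cell is drawn independently as 0 or 1 with equal probability.
    values = rng.integers(0, 2, size=(n_zones, len(POLICIES)))
    index = pd.Index([f"Zone_{i + 1}" for i in range(n_zones)], name="Zone")
    return pd.DataFrame(values, columns=POLICIES, index=index)


zones = generate_zones()
print(zones.head())
```

Calling `zones.to_csv("generated_marine_zones.csv")` would then write the table to the raw-data file name used by this project.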
- Clone or download this repo
- Install dependencies:
  `pip install numpy pandas scipy matplotlib`