Fairness-Enhancing Classification Methods For Non-Binary Sensitive Features - How to Fairly Detect Leakages in Water Distribution Networks
This repository contains the implementation of the methods proposed in the paper "Fairness-Enhancing Classification Methods For Non-Binary Sensitive Features - How to Fairly Detect Leakages in Water Distribution Networks" by Janine Strotherm, Inaam Ashraf and Barbara Hammer. This paper is an extended version of the paper "Fairness-Enhancing Ensemble Classification in Water Distribution Networks" by Janine Strotherm and Barbara Hammer.
Especially if AI-supported decisions affect the society, the fairness of such AI-based methodologies constitutes an important area of research. In this contribution, we investigate the applications of AI to the socioeconomically relevant infrastructure of water distribution systems (WDSs). We propose an appropriate definition of protected groups in WDSs and generalized definitions of group fairness that provably coincide with existing definitions in their specific settings. We demonstrate that typical methods for the detection of leakages in WDSs are unfair in this sense. Further, we thus propose a general fairness-enhancing framework as an extension of the specific leakage detection pipeline, but also for an arbitrary learning scheme, to increase the fairness of the AI-based algorithm. Finally, we evaluate and compare several specific instantiations of this framework on a toy and on a realistic WDS to show their utility.
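To make the idea of group fairness for a non-binary sensitive feature more concrete, here is a minimal sketch. It is not taken from the paper or from this repository: the function names, the toy data and the choice of metric (per-group positive-prediction rates compared through a min/max ratio) are illustrative assumptions. With exactly two groups, this ratio reduces to the symmetric form of the familiar disparate-impact ratio.

```python
from collections import defaultdict


def group_positive_rates(y_pred, groups):
    """Return the share of positive predictions for each group label."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(y_pred, groups):
        counts[group][0] += int(pred == 1)
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def min_max_ratio(rates):
    """Ratio of the lowest to the highest group rate; 1.0 means equal rates."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest > 0 else 1.0


# Hypothetical toy data: binary alarm decisions for three groups A, B and C.
y_pred = [1, 1, 0, 1, 0, 0, 1, 1, 1]
groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

rates = group_positive_rates(y_pred, groups)
ratio = min_max_ratio(rates)
```

A ratio close to 1 indicates that all groups receive positive predictions at similar rates. The definitions actually used in the paper may differ from this simplified variant.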
The implementation of the proposed methods can be found in the `Implementation` folder.
The data required for these methods are stored in, or can be generated using, the `2_DataGeneration` subfolder:
- The subfolder `2_DataGeneration/Hanoi` holds the data associated with the Hanoi network, stored as Excel files. It is the same data as used in this previous work. For the data generation, we refer to this previous repository. In this repository, we only store the resulting Excel files.
- The subfolder `2_DataGeneration/L-Town` is a modified version of this previous repository.
  - Due to their sizes, some of the additionally required files cannot be stored in this repository. Therefore, it is required to a) download this .inp file and store it as `2_DataGeneration/L-Town/networks/L-Town/Real/L-TOWN_Real.inp` and b) train a model as specified in this previous repository and store it as `2_DataGeneration/L-Town/trained_models/model_L-TOWN_2880_45_1.pt`.
  - Afterwards, running the `gen_scenario_leakages.py` script generates different leakage scenarios.
  - Subsequently, running the `get_scenario_residuals.py` script generates the data associated with the L-Town network and stores it in a CSV file.
  - Finally, running the `get_node_ids.ipynb` notebook generates the network information required for the network visualization and stores it in a CSV file.
  - The CSV files are not stored in this repository due to their sizes.
- The Excel and CSV files are in turn used in the `3_DataUsage` subfolder.
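The L-Town steps above can be sketched as a small pre-flight check. This is only an illustration under assumptions: the helper names `missing_prerequisites` and `planned_commands` are hypothetical, the paths are assumed to be relative to the `Implementation` folder, and the two scripts are assumed to sit directly in `2_DataGeneration/L-Town` and to run without command-line arguments. None of this is confirmed by the repository.

```python
from pathlib import Path

# Paths as given in this README (assumed relative to the Implementation folder).
L_TOWN = Path("2_DataGeneration") / "L-Town"

# Files that must be obtained manually before any script is run.
REQUIRED_FILES = [
    L_TOWN / "networks" / "L-Town" / "Real" / "L-TOWN_Real.inp",
    L_TOWN / "trained_models" / "model_L-TOWN_2880_45_1.pt",
]

# Scripts in the order listed above; the get_node_ids.ipynb notebook
# is run afterwards, for example interactively in Jupyter.
SCRIPTS_IN_ORDER = ["gen_scenario_leakages.py", "get_scenario_residuals.py"]


def missing_prerequisites(root):
    """Return the required files that do not exist below `root`."""
    root = Path(root)
    return [path for path in REQUIRED_FILES if not (root / path).is_file()]


def planned_commands(python="python"):
    """Return one command per script, in order (assumes no CLI arguments)."""
    return [[python, script] for script in SCRIPTS_IN_ORDER]
```

Calling `missing_prerequisites("Implementation")` before starting the data generation lists any file that still has to be downloaded or trained. The commands returned by `planned_commands()` could then be run one after another from the L-Town subfolder.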
The methods themselves can be run from the `3_DataUsage` subfolder:
- The notebooks `FairnessExploration_Hanoi_extended.ipynb` and `FairnessExploration_L-Town.ipynb` implement the proposed approaches and produce the results.
All requirements for the whole project are listed in the `Implementation/requirements.txt` file.
You can cite the corresponding paper using the following BibTeX entry:
@article{Strotherm2024FairClassification,
author = {Strotherm, Janine and Ashraf, Inaam and Hammer, Barbara},
title = {{Fairness-enhancing classification methods for non-binary sensitive features -- How to fairly detect leakages in water distribution systems}},
year = {2024},
journal = {PeerJ Computer Science},
publisher = {PeerJ},
volume = {10},
pages = {e2317}
}