Corrected german credit data #325

Open
nrkarthikeyan opened this issue Aug 11, 2022 · 2 comments
Labels
datasets (Issue relating to new or existing datasets), good first issue (Good for newcomers), medium (Intermediate skill level may be needed)

Comments

@nrkarthikeyan
Collaborator

nrkarthikeyan commented Aug 11, 2022

The widely used German credit data (already available in the toolkit) apparently has coding errors, so consider including the corrected version:
https://archive.ics.uci.edu/ml/datasets/South+German+Credit+%28UPDATE%29

http://www1.beuth-hochschule.de/FB_II/reports/Report-2019-004.pdf

@nrkarthikeyan added the "good first issue" label Aug 11, 2022
@hoffmansc added the "datasets" label Aug 29, 2022
@nrkarthikeyan added the "medium" label Sep 15, 2022
@nrkarthikeyan
Collaborator Author

Tasks:

  • Ensure the license permits open source use.
  • Verify that this dataset is appropriate for fairness tasks and subset it accordingly (removing unnecessary columns, etc.).
  • Ensure we have instance-level records with protected attributes and outcomes.
  • First create an sklearn-compatible dataset (dataframe), then an appropriate "classic" dataset (second priority).
  • Create a simple notebook where the dataset is consumed and at least simple fairness measures are computed.
  • DO NOT download and incorporate the data; instead, include a function that downloads it, since the data is not hosted in AIF360.
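As a rough illustration of the "simple fairness measures" task, here is a minimal sketch that computes statistical parity difference and disparate impact from instance-level records using plain Python. The field names (`age_group`, `credit`), the privileged value, and the toy records are all hypothetical and are not taken from the South German Credit data; the definitions follow the usual convention of comparing the unprivileged group against the privileged group.

```python
def favorable_rate(records, outcome_key, favorable=1):
    """Share of records whose outcome equals the favorable label."""
    records = list(records)
    if not records:
        raise ValueError("cannot compute a rate for an empty group")
    hits = sum(1 for row in records if row[outcome_key] == favorable)
    return hits / len(records)


def split_by_privilege(records, protected_key, privileged_value):
    """Split records into (privileged, unprivileged) lists."""
    privileged = [r for r in records if r[protected_key] == privileged_value]
    unprivileged = [r for r in records if r[protected_key] != privileged_value]
    return privileged, unprivileged


def statistical_parity_difference(records, protected_key, outcome_key,
                                  privileged_value, favorable=1):
    """P(favorable | unprivileged) - P(favorable | privileged)."""
    priv, unpriv = split_by_privilege(records, protected_key, privileged_value)
    return (favorable_rate(unpriv, outcome_key, favorable)
            - favorable_rate(priv, outcome_key, favorable))


def disparate_impact(records, protected_key, outcome_key,
                     privileged_value, favorable=1):
    """P(favorable | unprivileged) / P(favorable | privileged)."""
    priv, unpriv = split_by_privilege(records, protected_key, privileged_value)
    return (favorable_rate(unpriv, outcome_key, favorable)
            / favorable_rate(priv, outcome_key, favorable))


# Toy, made-up records: one protected attribute and one binary outcome.
toy_records = (
    [{"age_group": "old", "credit": 1}] * 3
    + [{"age_group": "old", "credit": 0}] * 1
    + [{"age_group": "young", "credit": 1}] * 2
    + [{"age_group": "young", "credit": 0}] * 2
)

spd = statistical_parity_difference(toy_records, "age_group", "credit", "old")
di = disparate_impact(toy_records, "age_group", "credit", "old")
```

On the toy records the favorable rate is 0.75 for the privileged group and 0.5 for the unprivileged group, so `spd` is -0.25 and `di` is about 0.667. In the notebook itself these numbers would presumably come from the toolkit's own metric classes rather than hand-written helpers.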

@Ricardo-OB

Ricardo-OB commented Jan 22, 2023

I was working on Colab and also ran into this error in the German Credit notebook: aif360 gave me instructions to download two files and move them to a folder. Running this code solved it:

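%pip install wget
import os
import wget

# Folder where aif360 expects the raw German credit files.
# The path depends on the Python version of the Colab runtime (3.8 here).
output_directory = "/usr/local/lib/python3.8/dist-packages/aif360/data/raw/german"
os.makedirs(output_directory, exist_ok=True)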

german_data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data"
german_doc_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc"

german_data = wget.download(german_data_url, out=output_directory)
german_doc = wget.download(german_doc_url, out=output_directory)
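The workaround above hardcodes the Python 3.8 path, so it may break on other runtimes. The following is a minimal standard-library sketch that looks up the installed package directory instead and skips files that are already present. The helper names (`package_data_dir`, `download_if_missing`, `fetch_german_raw`) are hypothetical and not part of aif360; the `data/raw/german` subfolder and the two URLs are copied from the snippet above.

```python
import importlib.util
import os
import urllib.request

# URLs copied from the comment above.
GERMAN_URLS = [
    "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data",
    "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.doc",
]


def package_data_dir(package_name, *subdirs):
    """Return <installed package dir>/<subdirs...> without hardcoding site-packages."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        raise ImportError(f"{package_name!r} is not an installed package")
    package_root = list(spec.submodule_search_locations)[0]
    return os.path.join(package_root, *subdirs)


def download_if_missing(url, dest_dir):
    """Download url into dest_dir unless a file with that name already exists."""
    os.makedirs(dest_dir, exist_ok=True)
    target = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(target):
        urllib.request.urlretrieve(url, target)
    return target


def fetch_german_raw(package_name="aif360"):
    """Place the raw German credit files where the notebook looks for them."""
    dest = package_data_dir(package_name, "data", "raw", "german")
    return [download_if_missing(url, dest) for url in GERMAN_URLS]
```

Calling `fetch_german_raw()` once in a notebook cell should then replace the manual download step, assuming aif360 is installed and the install directory is writable.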

@akstrek akstrek removed their assignment Sep 22, 2023