This repository contains the code used in this paper. The overall aim is to construct datasets of members and non-members from Gutenberg while minimizing bias.
The code contained here can be used to produce datasets, analyse n-gram distribution overlap, train classifiers, and evaluate MIAs. If you find it useful, please cite it as follows:
Cédric Eichler, Nathan Champeil, Nicolas Anciaux, Alexandra Bensamoun, Héber H. Arcolezi, et al. Nob-MIAs: Non-biased Membership Inference Attacks Assessment on Large Language Models with Ex-Post Dataset Construction. WISE 2024 - 25th International Conference on Web Information Systems Engineering, Dec. 2024, Doha, Qatar. pp. 441-456.
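For convenience, here is a BibTeX entry assembled from the reference above (the entry key and field layout are ours, not an official export):

```bibtex
@inproceedings{eichler2024nobmias,
  author    = {Eichler, C{\'e}dric and Champeil, Nathan and Anciaux, Nicolas and Bensamoun, Alexandra and Arcolezi, H{\'e}ber H. and others},
  title     = {Nob-MIAs: Non-biased Membership Inference Attacks Assessment on Large Language Models with Ex-Post Dataset Construction},
  booktitle = {WISE 2024 - 25th International Conference on Web Information Systems Engineering},
  year      = {2024},
  address   = {Doha, Qatar},
  pages     = {441--456}
}
```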
Every pipeline needs the initial dataset. Go to `get_data` to execute the pipeline that fetches and filters the Gutenberg dataset to build the initial dataset.
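For a rough idea of what this pipeline does, here is a minimal sketch of splitting books into members and non-members by their Gutenberg release date; the file name, column names, and cutoff date below are hypothetical placeholders, not the actual scripts' values:

```python
import pandas as pd

# Hypothetical metadata dump with one row per Gutenberg book.
meta = pd.read_csv("gutenberg_metadata.csv", parse_dates=["issued"])

# Books released on Gutenberg before the models' training cutoff are
# candidate members; books released after it are candidate non-members.
CUTOFF = pd.Timestamp("2023-01-01")  # hypothetical cutoff date
members = meta[meta["issued"] < CUTOFF]
non_members = meta[meta["issued"] >= CUTOFF]
```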
To analyse the n-gram distribution overlap and minimize the related bias:
- Compile bff (see `ngram_analysis/bff`)
- Follow the pipeline in `ngram_analysis/altered_distribution` (a sketch of the overlap measure follows this list)
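One natural way to measure n-gram overlap is the fraction of a document's n-grams that also appear on the member side; here is a naive set-based sketch with toy texts (bff does this at scale, and the exact measure used by the pipeline may differ):

```python
def ngrams(text: str, n: int = 5) -> set:
    """Set of whitespace-token n-grams of `text`."""
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_ratio(candidate: str, member_ngrams: set, n: int = 5) -> float:
    """Fraction of the candidate's n-grams that also occur among members."""
    cand = ngrams(candidate, n)
    return len(cand & member_ngrams) / max(len(cand), 1)

# Toy data; in practice these are the member books and candidate non-members.
member_texts = ["the quick brown fox jumps over the lazy dog every day"]
candidate_texts = ["a quick brown fox jumps over the lazy dog at night"]

member_ngrams = set().union(*(ngrams(t) for t in member_texts))
scores = [overlap_ratio(c, member_ngrams) for c in candidate_texts]
```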
Use the two sampling scripts in `data_models` (a rough sketch of both follows).
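We do not reproduce the scripts here, but conceptually one samples non-members uniformly at random while the other matches the members' overlap-score distribution. A rough sketch, where all names and the binning strategy are our own:

```python
import random

random.seed(0)  # fix the seed so the sampled datasets are reproducible

def sample_random(candidates: list, k: int) -> list:
    """Plain uniform sampling of k non-members."""
    return random.sample(candidates, k)

def sample_matched(candidates: list, scores: list, target: list,
                   k: int, bins: int = 20) -> list:
    """Sample so the candidates' overlap-score histogram mirrors `target`."""
    lo, hi = min(target), max(target)
    width = (hi - lo) / bins or 1.0
    bucket = lambda s: min(max(int((s - lo) / width), 0), bins - 1)

    pools = [[] for _ in range(bins)]   # candidates grouped by score bucket
    for cand, s in zip(candidates, scores):
        pools[bucket(s)].append(cand)

    quota = [0] * bins                  # target mass per bucket
    for s in target:
        quota[bucket(s)] += 1

    scale = k / len(target)
    picked = []
    for pool, q in zip(pools, quota):
        picked += random.sample(pool, min(round(q * scale), len(pool)))
    return picked
```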
To build the dataset minimizing classifiability, use the scripts in `data_model/data_m3`.
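For intuition on the classifiability criterion, here is a sketch that keeps the candidate non-members a simple TF-IDF logistic-regression stand-in cannot confidently tell apart from members; the actual classifier, features, and selection rule in the scripts may differ:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy member / candidate non-member texts; replace with the real datasets.
members = ["the quick brown fox", "a lazy dog sleeps"]
candidates = ["foxes are quick and brown", "dogs sleep all day"]

vec = TfidfVectorizer().fit(members + candidates)
X = vec.transform(members + candidates)
y = [1] * len(members) + [0] * len(candidates)

clf = LogisticRegression().fit(X, y)

# Keep the candidates the classifier is least sure about (membership
# probability closest to 0.5): a low-classifiability non-member set.
proba = clf.predict_proba(vec.transform(candidates))[:, 1]
ranked = sorted(zip(candidates, proba), key=lambda t: abs(t[1] - 0.5))
selected = [c for c, _ in ranked[: len(candidates) // 2]]
```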
The resulting datasets can be found here.
The repository is organised as follows:
- `data_models` contains scripts to construct the three datasets used in the paper (random, minimizing n-gram bias, and minimizing classifiability) from the initial dataset, along with the related models, the scripts to train them, and the results of their evaluation.
- A directory is dedicated to the initial dataset, extracted from Gutenberg following the method of [Meeus et al.](https://arxiv.org/abs/2310.15007).
- `get_data` contains scripts and a pipeline to replicate the data collection.
- Another directory contains the code necessary to evaluate MIAs on Pythia and OpenLlama with our datasets (a minimal loss-based sketch follows this list).
- `ngram_analysis` contains everything needed to compute the n-gram overlap distribution and sample a set of non-members minimizing the related bias.
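As background, the simplest attack in this family is the loss-based MIA: predict membership when the model's loss on a text falls below a threshold. A minimal sketch (the checkpoint, threshold, and text are placeholders; the attacks implemented here may be more sophisticated):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-160m"  # any Pythia / OpenLlama checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def nll(text: str) -> float:
    """Average negative log-likelihood of `text` under the model."""
    ids = tok(text, return_tensors="pt", truncation=True).input_ids
    return model(ids, labels=ids).loss.item()

# LOSS attack: flag as 'member' when the loss is below a threshold
# calibrated on a held-out split (0.5 here is an arbitrary placeholder).
score = nll("Some passage from a Gutenberg book ...")
is_member = score < 0.5
```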