This repo contains a reference implementation of the model merging procedure from "Merging Models with Fisher-Weighted Averaging", along with a few scripts for basic experimentation with HuggingFace models and the GLUE benchmark.
The scripts are all contained in the scripts folder.
The compute_fisher.py script computes the diagonal approximation to the Fisher information matrix of a model on a GLUE task. It saves the Fisher to an HDF5 file for use in merging.
Here are the parameters this script takes:
--model: Either the path to a saved HuggingFace model or the name of a pretrained HuggingFace model from the repository.
--glue_task: The name of the GLUE task to use when computing the Fisher.
--split: The dataset split to use when computing the Fisher.
--fisher_path: The path of the HDF5 file where the computed Fisher will be saved.
--n_examples: The number of examples to use when computing the Fisher.
--batch_size: The batch size.
--sequence_length: The sequence length to use.
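As a rough illustration of what the diagonal Fisher is, here is a minimal NumPy sketch for a toy linear softmax classifier, not a HuggingFace model. It assumes the standard definition: the expected squared gradient of log p(y|x) with respect to each parameter, with y drawn from the model's own predictive distribution (computed exactly here by summing over classes), then averaged over examples. The function names are hypothetical and this is not the repo's implementation. The rows of X play the role of the --n_examples examples.

```python
import numpy as np


def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def diagonal_fisher(W, X):
    """Diagonal Fisher of the toy classifier p(y | x) = softmax(x @ W).

    W has shape (n_features, n_classes); X has shape (n_examples, n_features).
    For class y, d log p(y | x) / dW = outer(x, onehot(y) - p). The Fisher
    diagonal is the expectation of that gradient squared over y ~ p(y | x),
    averaged over the examples.
    """
    n_classes = W.shape[1]
    fisher = np.zeros(W.shape)
    probs = softmax(X @ W)  # shape (n_examples, n_classes)
    onehots = np.eye(n_classes)
    for x, p in zip(X, probs):
        for y in range(n_classes):
            grad = np.outer(x, onehots[y] - p)
            # Weight each class's squared gradient by its model probability.
            fisher += p[y] * grad ** 2
    return fisher / X.shape[0]
```

For example, with W = np.zeros((2, 2)) and X = np.array([[1.0, 2.0]]), both classes have probability 0.5 and diagonal_fisher returns [[0.25, 0.25], [1.0, 1.0]]. Parameters whose gradients barely move the log-likelihood get small Fisher values, which is what later lets merging down-weight them.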
The merging script merges the models and prints the best result found across the merging coefficients it tries.
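For reference, the Fisher-weighted average from the paper, as we understand it, merges M models one parameter at a time. Here j indexes parameters, θ_i is the parameter vector of model i, λ_i are the merging coefficients and F_i is the diagonal Fisher of model i:

```latex
\theta^{*(j)} = \frac{\sum_{i=1}^{M} \lambda_i F_i^{(j)} \theta_i^{(j)}}{\sum_{i=1}^{M} \lambda_i F_i^{(j)}}
\qquad\text{and, with } F_i^{(j)} = 1:\qquad
\theta^{*(j)} = \frac{\sum_{i=1}^{M} \lambda_i \theta_i^{(j)}}{\sum_{i=1}^{M} \lambda_i}
```

The second form is plain (isotropic) weighted averaging, which is what merging without Fishers amounts to.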
Here are the parameters this script takes:
--models: Comma-separated list of models to merge. Each model is either the path to a saved HuggingFace model or the name of a pretrained HuggingFace model from the repository.
--fishers: Optional comma-separated list of Fishers to use. If this flag is omitted, the script performs an isotropic merge (a plain weighted average of the parameters). Otherwise, the number of Fishers must match the number of models in --models, and the i-th Fisher is the Fisher of the i-th model. Each Fisher should be the path to an HDF5 file created by the compute_fisher.py script.
--glue_task: The name of the GLUE task to evaluate on when merging the models.
--split: The dataset split to evaluate on.
--n_examples: The number of examples to use when evaluating.
--batch_size: The batch size.
--sequence_length: The sequence length to use.
--n_coeffs: The total number of different merging coefficients to try.
--coeff_mode: Either 'grid' or 'random'. Grid mode chooses coefficients uniformly spaced on a grid and is only allowed when merging exactly two models. Random mode generates the coefficients randomly.
--fisher_floor: Minimum value for each Fisher entry. Prevents numerical issues when a parameter's Fisher is close to zero in all of the models.
--favor_target_model: Whether to fall back to the first (target) model's parameter value when all Fisher values are below the Fisher floor.
--normalize_fishers: Whether to normalize each Fisher so that it has an L2 norm of 1.
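The sketch below shows how the merging flags could fit together. It is a minimal NumPy version under several assumptions and is not the repo's code. Models and Fishers are plain dicts mapping parameter names to arrays, and models[0] is the target model. The floor is applied to every Fisher, or only to the target's when favor_target_model is set, which is one plausible reading of the flag descriptions. Random mode samples coefficients uniformly from the probability simplex via a Dirichlet draw, which is also an assumption. All function names and the evaluate callable are hypothetical.

```python
import numpy as np


def normalize(fisher):
    """Scale a whole-model Fisher (dict name -> array) to unit global L2 norm."""
    norm = np.sqrt(sum(np.sum(f ** 2) for f in fisher.values()))
    return {name: f / norm for name, f in fisher.items()}


def fisher_merge(models, coeffs, fishers=None, fisher_floor=1e-6,
                 favor_target_model=True, normalize_fishers=False):
    """Fisher-weighted average of models given as dicts name -> array.

    models[0] is the target model. With fishers=None every Fisher entry is 1,
    which reduces to an isotropic (plain weighted) average.
    """
    if fishers is None:
        fishers = [{k: np.ones_like(v, dtype=float) for k, v in m.items()} for m in models]
    elif normalize_fishers:
        fishers = [normalize(f) for f in fishers]
    merged = {}
    for name, target in models[0].items():
        numer = np.zeros(target.shape)
        denom = np.zeros(target.shape)
        for i, (model, fisher, coeff) in enumerate(zip(models, fishers, coeffs)):
            diag = fisher[name]
            # Assumption: floor every Fisher, or only the target's when favoring
            # it, so the target dominates wherever all Fishers are tiny.
            if not favor_target_model or i == 0:
                diag = np.maximum(diag, fisher_floor)
            numer += coeff * diag * model[name]
            denom += coeff * diag
        # Fall back to the target's value if every weight is exactly zero.
        safe = np.where(denom > 0, denom, 1.0)
        merged[name] = np.where(denom > 0, numer / safe, target)
    return merged


def merging_coefficients(n_models, n_coeffs, mode, seed=0):
    """Generate n_coeffs coefficient vectors in 'grid' or 'random' mode."""
    if mode == "grid":
        if n_models != 2:
            raise ValueError("grid mode only supports merging exactly two models")
        return [np.array([t, 1.0 - t]) for t in np.linspace(0.0, 1.0, n_coeffs)]
    if mode == "random":
        # Assumption: sample uniformly from the probability simplex.
        rng = np.random.default_rng(seed)
        return list(rng.dirichlet(np.ones(n_models), size=n_coeffs))
    raise ValueError(f"unknown coeff_mode: {mode!r}")


def best_merge(models, fishers, coeff_list, evaluate, **merge_kwargs):
    """Merge with every coefficient vector and keep the highest-scoring one.

    `evaluate` is a hypothetical callable mapping merged params to a score.
    """
    best = None
    for coeffs in coeff_list:
        score = evaluate(fisher_merge(models, coeffs, fishers, **merge_kwargs))
        if best is None or score > best[0]:
            best = (score, coeffs)
    return best
```

With non-negative coefficients, every merged entry is a convex combination of the models' values for that entry. It therefore always lies between the smallest and largest value any model assigns to it. The Fishers only decide where in that range it lands.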