diff --git a/README.md b/README.md
index fc30e0bc..8151a694 100644
--- a/README.md
+++ b/README.md
@@ -1,1069 +1,148 @@
-## Table of Contents
-
-- [Table of Contents](#table-of-contents)
-- [🔔News](#news)
-- [Editing Demo](#editing-demo)
-- [Knowledge Editing](#knowledge-editing)
- - [Task Definition](#task-definition)
- - [Knowledge insert](#knowledge-insert)
- - [Knowledge update](#knowledge-update)
- - [Knowledge erase](#knowledge-erase)
- - [Evaluation](#evaluation)
-- [🌟Overview](#overview)
- - [Current Implementation](#current-implementation)
- - [Tutorial notebook](#tutorial-notebook)
-- [Requirements](#requirements)
- - [🔧Pip Installation](#pip-installation)
- - [🐳Docker Installation](#docker-installation)
- - [Editing GPU memory usage](#editing-gpu-memory-usage)
-- [📌Use EasyEdit](#use-easyedit)
- - [BaseEditor](#baseeditor)
- - [Introduction by a Simple Example](#introduction-by-a-simple-example)
- - [Evaluation](#evaluation-1)
- - [Trainer](#trainer)
- - [MultimodalEditor](#multimodaleditor)
- - [Introduction by a Simple Example](#introduction-by-a-simple-example-1)
- - [Evaluation](#evaluation-2)
- - [Trainer](#trainer-1)
-- [Use EasyEdit with KnowEdit](#Use-easyedit-with-KnowEdit)
- - [Dataset](#Dataset)
- - [Usage](#usage)
-- [Editing Performance](#editing-performance)
-- [Citation](#citation)
-- [🎉Contributors](#contributors)
- - [Other Related Projects](#other-related-projects)
-
-## 🔔News
-- **2024-02-20 The AAAI2024 tutorial "*Knowledge Editing for Large Language Models*" has been canceled since speakers cannot present in person, we make this ppt[[Github](https://github.com/zjunlp/KnowledgeEditingPapers/blob/main/AAAI2024%40Tutorial_Knowledge%20Editing%20for%20LLMs.pdf)] [[Google Drive](https://drive.google.com/file/d/1fkTbVeRJSWmU7fBDeNf1OhHEkLSofQde/view?usp=sharing)] [[Baidu Pan](https://pan.baidu.com/s/1oJYgaMnxWIBE4kIcJuMSKg?pwd=p9j5)] available to the community**.
-- **2024-02-09 The EasyEdit has supported the Dynamic LoRA model editing method [MELO'AAAI24](https://arxiv.org/abs/2312.11795).**
-- **2024-02-06 We release a new paper: "[EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models](https://arxiv.org/abs/2402.03049)" with an HF demo [EasyInstruct](https://huggingface.co/spaces/zjunlp/EasyInstruct).**
-- **2024-02-06 We release a preliminary tool [EasyDetect](https://github.com/OpenKG-ORG/EasyDetect) for LLM hallucination detection,with a [demo](http://easydetect.openkg.cn/)**.
-- **2024-01-24 The EasyEdit has supported editing [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) (manually update transformers==4.34.0), we have also fixed some bugs in evaluating MEND (slightly influence the performance).**
-- **2024-01-16 The EasyEdit has supported the precise model editing method [PMET'AAAI24](https://arxiv.org/abs/2308.08742).**
-- **2024-01-03 We release a new paper:"[A Comprehensive Study of Knowledge Editing for Large Language Models](https://arxiv.org/abs/2401.01286)" with a new benchmark [KnowEdit](https://huggingface.co/datasets/zjunlp/KnowEdit)! We are looking forward to any comments or discussions on this topic :)**
-- **2023-12-06 The EasyEdit has supported the lifelong model editing method [GRACE'NeurIPS24](https://arxiv.org/abs/2211.11031).**
-- **2023-11-18 Our tutorial "Knowledge Editing for Large Language Models" has been accepted by COLING 2024.**
-- **2023-10-25 Our tutorial "Knowledge Editing for Large Language Models" has been accepted by AAAI 2024.**
-
-
-Previous News
-
-- **2023-10-24 The EasyEdit has supported efficient editing of [Baichuan2](https://github.com/baichuan-inc/Baichuan2), [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B), [InternLM](https://github.com/InternLM/InternLM), [Qwen](https://github.com/QwenLM/Qwen) and fixed several bugs for a better user experience.**
-- **2023-10-14 We release the [MultimodalEditor](#multimodaleditor) based on the paper "[Can We Edit Multimodal Large Language Models?](https://arxiv.org/abs/2310.08475)".**
-- **2023-10-13 We release the paper "[Can We Edit Multimodal Large Language Models?](https://arxiv.org/abs/2310.08475)" accepted by EMNLP 2023.**
-- **2023-10-08 Our paper "[Editing Large Language Models: Problems, Methods, and Opportunities](https://arxiv.org/abs/2305.13172)" has been accepted by EMNLP 2023.**
-- **2023-10-07 The EasyEdit have supported editing models with multiple GPUs, using huggingface [`Accelerate`](https://github.com/zjunlp/EasyEdit/blob/main/hparams/ROME/llama-7b.yaml#L24).**
-- **2023-9-21 The EasyEdit have supported Parameter-Efficient Fine-Tuning through AdaLoRA to inject knowledge into the LLM.**
-- **2023-8-31 The EasyEdit have supported official fine-tuning API for gpt-3.5-turbo to customize ChatGPT for your editing cases.**
-- **2023-8-15 We release the paper "[EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models](https://arxiv.org/abs/2308.07269)."**
-- **2023-7-12 We release version 0.0.1, supporting several knowledge editing techniques for LLMs. EasyEdit helps to better align LLMs with changing needs and values of users.**
-- **2023-5-22 We release the paper "[Editing Large Language Models: Problems, Methods, and Opportunities](https://arxiv.org/abs/2305.13172)" and provide a paper list at [PaperList](https://github.com/zjunlp/KnowledgeEditingPapers).**
-- **2023-3-25 The EasyEdit project has been launched and is under development.**
-
-This repository is a subproject of [KnowLM](https://github.com/zjunlp/KnowLM).
-
-
-
-
----
-
-> A Comprehensive Study of Knowledge Editing for Large Language Models [[paper](https://arxiv.org/abs/2401.01286)][[benchmark](https://huggingface.co/datasets/zjunlp/KnowEdit)][[code](https://github.com/zjunlp/EasyEdit)]
-
-> AAAI 2024 Tutorial [[Google Drive]()] [[Baidu Pan]()]
-
-> AACL 2023 Tutorial [[Google Drive](https://drive.google.com/file/d/1EW-cusC_llCM0wEshkIdYuYrvfBPCDRz/view?usp=sharing)] [[Baidu Pan](https://pan.baidu.com/s/1NupastGJUzcUIAjI64J1tw?pwd=i5an)]
-
-## Editing Demo
-
-There is a demonstration of editing. The GIF file is created by [Terminalizer](https://github.com/faressoft/terminalizer).
-
-
-
-## Knowledge Editing
+## 💡 Conceptual Knowledge Editing
-

+
### Task Definition
-Deployed models may still make unpredictable errors. For example, Large Language Models (LLMs) notoriously _hallucinate_, _perpetuate bias_, and _factually decay_, so we should be able to adjust specific behaviors of pre-trained models.
-
-**Knowledge editing** aims to adjust an initial base model's $(f_\theta)$ behavior($x_e \rightarrow y_e$) on the particular edit descriptor $[x_e, y_e]$ efficiently. There are usually three forms:
-
-#### Knowledge insert
-Inject knowledge that LLMs have not seen before. such as:
-- *How many times has Messi won the World Cup? 0* $\rightarrow$ **1**:
- - $x_e$: How many times has Messi won the World Cup? $\quad$ $y_e$: 1
-
-#### Knowledge update
-LLMs often suffer from knowledge cutoff issue, EasyEdit can update outdated knowledge. such as:
-- *The president of USA: Donald Trump* $\rightarrow$ **Joe Biden**:
- - $x_e$: Who is the president of the US? $\quad$ $y_e$: Joe Biden
-
-#### Knowledge erase
-EasyEdit can erase sensitive information. such as:
-- *The phone number of someone is XXXX* $\rightarrow$ **__**
- - $x_e$: The phone number of someone is $\quad$ $y_e$: __
+A **concept** is a generalization formed in the process of cognition, representing the shared features and essential characteristics of a class of entities.
+Concept editing therefore aims to modify the definition of a concept, thereby altering the behavior of LLMs when they process that concept.
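+
+For intuition, a conceptual edit descriptor pairs a definition-style prompt about a concept $C$ with the new definition the edited model should produce. The snippet below is only a hypothetical illustration: the prompt wording, field names, and definition are made up for this README and are not taken from the ConceptEdit data.
+
+```python
+# Hypothetical conceptual edit descriptor (illustration only, not ConceptEdit data).
+# x_e: a prompt asking for the definition of concept C
+# y_e: the new (possibly counterfactual) definition the edited model should produce
+concept_edit = {
+    "prompt": "What is the definition of the concept 'mammal'?",
+    "target_new": "A mammal is a vertebrate animal that lays eggs and lacks mammary glands.",
+}
+print(concept_edit["prompt"], "->", concept_edit["target_new"])
+```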
-Without influencing the model behavior on unrelated samples, the ultimate goal is to create an edited model $(f_\theta')$.
### Evaluation
-
-
-The knowledge editing process generally impacts the predictions for a broad set of inputs **that are closely** associated with the edit example, called the **editing scope**.
-
-A successful edit should adjust the model’s behavior within the editing scope while remaining unrelated inputs(as below formula).
-
-$$
-f_{\theta_{e}}(x) = \begin{cases}
-y_e & \text{if } x \in I(x_e,y_e) \\
-f_{\theta}(x) & \text{if } x \in O(x_e, y_e) \end{cases}
-$$
-
-In addition to this, the performance of knowledge editing should be measured from multiple dimensions:
+To analyze conceptual knowledge modification, we adopt the metrics used for factual editing (with the concept $C$ as the target rather than a factual instance $t$), adhering to the framework established in the main branch.
- `Reliability`: the success rate of editing with a given editing description
- `Generalization`: the success rate of editing **within** the editing scope
- `Locality`: whether the model's output changes after editing for unrelated inputs
-- `Portability`: the success rate of editing for factual reasoning(one hop, synonym, one-to-one relation)
-- `Efficiency`: time and memory consumption required during the editing process
-
-## 🌟Overview
-
-EasyEdit is a Python package for edit Large Language Models (LLM) like `GPT-J`, `Llama`, `GPT-NEO`, `GPT2`, `T5`(support models from **1B** to **65B**), the objective of which is to alter the behavior of LLMs efficiently within a specific domain without negatively impacting performance across other inputs. It is designed to be easy to use and easy to extend.
-
-
-
-
-
-- EasyEdit contains a unified framework for **Editor**, **Method** and **Evaluate**, respectively representing the editing scenario, editing technique, and evaluation method.
-- Each Knowledge Editing scenario comprises of three components:
-
- - `Editor`: such as BaseEditor(**Factual Knowledge** and **Generation** Editor) for LM, MultiModalEditor(**MultiModal Knowledge**).
- - `Method`: the specific knowledge editing technique used(such as **ROME**, **MEND**, ..).
- - `Evaluate`: **Metrics** for evaluating knowledge editing performance.
- - `Reliability`, `Generalization`, `Locality`, `Portability`
-
-- The current supported knowledge editing techniques are as follows:
- - [FT](https://github.com/kmeng01/rome): Fine-Tuning with $L_\infty$ constraint
- - [SERAC](https://github.com/eric-mitchell/serac): Mitchell et al. Memory-based
- - [IKE](https://github.com/Zce1112zslx/IKE): Ce Zheng et al. In-Context Editing
-
- - [MEND](https://github.com/eric-mitchell/mend): Mitchell et al. Hypernetwork
- - [KN](https://github.com/Hunter-DDM/knowledge-neurons): Damai Dai et al. Locate then Edit
- - [ROME](https://github.com/kmeng01/rome): Kevin Meng et al. Locate and Edit
- - [MEMIT](https://github.com/kmeng01/memit): Kevin Meng et al. Locate and Edit
- - [GRACE](https://github.com/thartvigsen/grace): Thomas Hartvigsen et al. Memory-based
- - [PMET](https://github.com/xpq-tech/PMET): Xiaopeng Li et al. Locate and Edit
- > Due to the limited compatibility of this toolkit and limited by the transformer version, some knowledge editing methods including [T-Patcher](https://github.com/ZeroYuHuang/Transformer-Patcher), [KE](https://github.com/nicola-decao/KnowledgeEditor), [CaliNet](https://github.com/dqxiu/CaliNet)
- are not supported.
-
-#### Current Implementation
-
-You can choose different editing methods according to your specific needs.
-| **Method** | T5 | GPT-2 | GPT-J | GPT-NEO | LlaMA | Baichuan | ChatGLM2 | InternLM | Qwen | Mistral
-| :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: |
-| FT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| AdaLoRA | | | | | ✅ | | | | | |
-| SERAC | ✅ | ✅ | ✅ | | ✅ | | | | | |
-| IKE | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ |
-| MEND | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| KN | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
-| ROME | | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ |
-| MEMIT | | ✅ | ✅ | ✅ | ✅ | ✅ | ✅| ✅ | ✅ | ✅ |
-| GRACE | | ✅| ✅ | | ✅| | | | | |
-| MELO | |✅ | | | | | | | | |
-| PMET | | | ✅ | | ✅| | | | | |
-
-
-
-
-
-
-
-> ❗️❗️ EasyEdit supports editing ChatGPT with FT. An edit for `gpt-3.5-turbo` returns model_name(for example, `ft: GPT-3.5-turbo-0613 :personal::7tWZkLzq`) instead model weights.
-
-> ❗️❗️ If you intend to use Mistral, please update the `transformers` library to version 4.34.0 manually. You can use the following code: `pip install transformers==4.34.0`.
-
-> ❗️❗️ If you intend to use MELO, please get the in ./easyeditor/models/melo/peft_egg and pip install it in your environment.
-
-**Dataset**
-
-**Benchmark: KnowEdit** [[Hugging Face]](https://huggingface.co/datasets/zjunlp/KnowEdit)[[WiseModel]](https://wisemodel.cn/datasets/zjunlp/KnowEdit)[[ModelScope]](https://www.modelscope.cn/datasets/zjunlp/KnowEdit)
-
-
-
- Task |
- Knowledge Insertion |
- Knowledge Modification |
- Knowledge Erasure |
-
-
-
-
- Datasets |
- Wikirecent |
- ZsRE |
- WikiBio |
- WikiDatacounterfact |
- Convsent |
- Sanitation |
-
-
- Type |
- Fact |
- Question Answering |
- Hallucination |
- Counterfact |
- Sentiment |
- Unwanted Info |
-
-
- # Train |
- 570 |
- 10,000 |
- 592 |
- 1,455 |
- 14,390 |
- 80 |
-
-
- # Test |
- 1,266 |
- 1230 |
- 1,392 |
- 885 |
- 800 |
- 80 |
-
-
-
-
-We provide **detailed scripts** for user to easily use KnowEdit, please refer to [examples](https://github.com/zjunlp/EasyEdit/blob/main/examples/KnowEdit.md).
-
- dataset description
-
-- ZsRE: is a context-free question-answering task. Given a question based on the subject and relation, the model is expected to provide the correct object as the answer.
-- Wikirecent: This dataset specifically focuses on triplets that have been recently inserted into WikiData after July 2022.
-- WikiBio: The original dataset was created by prompting GPT-3 to generate 238 Wikipedia-style biographies using subjects from the WikiBio.
-- WikiDatacounterfact: Since tail entities are often not captured by models, and therefore are not suitable for testing modification edits, RippleEdit collects triplets about popular entities, where the subject corresponds to one of the top-viewed pages in Wikipedia.
-- Convsent: This is a sentiment editing task that assesses the model's ability to modify a dialog agent's sentiment on a specific topic without affecting its responses to other topics.
-- Sanitation: This dataset specifically addresses privacy concerns associated with learned language models.
-
-
-
- dataset structure
-
-```text
-knowedit
-├── WikiBio
-│ ├── wikibio-test-all.json
-│ └── wikibio-train-all.json
-├── ZsRE
-│ └── ZsRE-test-all.json
-├── wiki_counterfact
-│ ├── test_cf.json
-│ └── train_cf.json
-├── convsent
-│ ├── blender_test.json
-│ ├── blender_train.json
-│ └── blender_val.json
-├── convsent
-│ ├── trivia_qa_test.json
-│ └── trivia_qa_train.json
-└── wiki_recent
- ├── recent_test.json
- └── recent_train.json
-```
-
-
-
----
-
-**Datasets for Factual Knowledge**
-| **dataset** | Google Drive| BaiduNetDisk | Description |
-| :--------: | :-----------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------: | :--------------------------------------------------------------------------------: |
-| _ZsRE_ plus | [[Google Drive]](https://drive.google.com/file/d/1WRo2SqqgNtZF11Vq0sF5nL_-bHi18Wi4/view?usp=sharing) | [[BaiduNetDisk]](https://pan.baidu.com/s/1cQleUMsNjuDk4BKx2bZkag?pwd=xzky) | Question Answering dataset using question rephrasings |
-| _Counterfact_ plus | [[Google Drive]](https://drive.google.com/file/d/1WRo2SqqgNtZF11Vq0sF5nL_-bHi18Wi4/view?usp=sharing) | [[BaiduNetDisk]](https://pan.baidu.com/s/1cQleUMsNjuDk4BKx2bZkag?pwd=xzky) | Counterfact dataset using Entity replacement |
-
-We provide zsre and counterfact datasets to verify the effectiveness of knowledge editing. You can download them here. [[Google Drive]](https://drive.google.com/file/d/1WRo2SqqgNtZF11Vq0sF5nL_-bHi18Wi4/view?usp=sharing), [[BaiduNetDisk]](https://pan.baidu.com/s/1cQleUMsNjuDk4BKx2bZkag?pwd=xzky).
-- For **locality**, in addition to testing unrelated instances, we also provide tests on distracting ([reference: Detecting Edit Failures...](https://arxiv.org/abs/2305.17553)), other attribution, and other downstream tasks (such as commonsense reasoning).
-- For **portability**, it tests whether the model can apply edited instances for inference. We provide evaluations for one-hop reasoning, subject alias, and inverse relation (eg, a one-to-one relationship between spouses should be bidirectionally edited).
+**Concept-Specific Evaluation Metrics**
- dataset description
+- `Instance Change`: capturing how the edit affects the model's predictions on instances of the edited concept
+- `Concept Consistency`: the semantic similarity between the generated concept definition and the target definition
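+
+In this repository, `Concept Consistency` of the generated definitions is ultimately scored with GPT-4 (see STEP 3 under Run below). As a rough local proxy, one could compare sentence embeddings of a generated definition and a reference definition; the sketch below assumes the `sentence-transformers` package and an off-the-shelf encoder, and is not the GPT-4 evaluation used in the paper.
+
+```python
+# Rough local proxy for Concept Consistency (illustration only).
+# The paper scores consistency with GPT-4; this sketch merely compares sentence
+# embeddings of a generated definition against a reference definition.
+from sentence_transformers import SentenceTransformer, util
+
+encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any off-the-shelf sentence encoder
+generated = "A mammal is a warm-blooded vertebrate that nurses its young with milk."
+reference = "Mammals are vertebrate animals characterized by milk-producing mammary glands."
+emb = encoder.encode([generated, reference], convert_to_tensor=True)
+score = util.cos_sim(emb[0], emb[1]).item()  # closer to 1.0 means more consistent
+print(f"consistency proxy: {score:.3f}")
+```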
-```text
-editing-data
-├── counterfact
-│ ├── counterfact-edit.json
-│ ├── counterfact-train.json
-│ └── counterfact-val.json
-├── locality
-│ ├── Commonsense Task
-│ │ ├── piqa_valid-labels.lst
-│ │ └── piqa_valid.jsonl
-│ ├── Distracting Neighbor
-│ │ └── counterfact_distracting_neighbor.json
-│ └── Other Attribution
-│ └── counterfact_other_attribution.json
-├── portability
-│ ├── Inverse Relation
-│ │ └── zsre_inverse_relation.json
-│ ├── One Hop
-│ │ ├── counterfact_portability_gpt4.json
-│ │ └── zsre_mend_eval_portability_gpt4.json
-│ └── Subject Replace
-│ ├── counterfact_subject_replace.json
-│ └── zsre_subject_replace.json
-└── zsre
- ├── zsre_mend_eval.json
- ├── zsre_mend_train_10000.json
- └── zsre_mend_train.json
-```
-
-- counterfact: original counterfact dataset using Entity replacement
-- zsre: original question answering dataset using question rephrasings
-- locality (evaluation for locality, see details in this [paper](https://arxiv.org/abs/2305.13172))
- - Commonsense Task: evaluation for other downstream tasks such as commonsense task
- - Distracting Neighbor: test on distracting neighborhood ([reference: Detecting Edit Failures...](https://arxiv.org/abs/2305.17553))
- - Other Attribution
-- portability
- - Inverse Relation: evaluation for one-to-one relationship such as `spouse`
- - One Hop: evaluation for one-hop reasoning
- - Subject Replace: evaluation for synonym replacement
-
-
----
-**Datasets for Multimodal Knowledge**
-| **dataset** | Google Drive| BaiduNetDisk | Description |
-| :--------: | :-----------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------: | :--------------------------------------------------------------------------------: |
-| E-IC | [[Google Drive]](https://drive.google.com/drive/folders/1jBdTJxUb9wEeHnvG-RY8dv5_I4QlDpUS?usp=drive_link) | [[BaiduNetDisk]](https://pan.baidu.com/s/1g9nMv-5BJmztxYU-BWRdvg?pwd=ik5c) | dataset for editing _Image Captioning_ |
-| E-VQA | [[Google Drive]](https://drive.google.com/drive/folders/1jBdTJxUb9wEeHnvG-RY8dv5_I4QlDpUS?usp=drive_link) | [[BaiduNetDisk]](https://pan.baidu.com/s/1g9nMv-5BJmztxYU-BWRdvg?pwd=ik5c) | dataset for editing _Visual Question Answering_ |
+## 🌟 Usage
-- All **images** used in **E-IC** and **E-VQA** are available for download at [Google Drive](https://drive.google.com/file/d/1fQzJBFkok5kFZT6QUuT-HCuYKk2Vb93O/view)
-- For **locality**, it is the same as factual editing in order to measure whether unrelated facts retain their outputs.
-- For **multimodal locality**, it assesses the impact of editing on the visual module, which is similar to regular **locality**.
-
- dataset description
-
-```text
-editing-data
-├── caption
-│ ├── caption_train_edit.json
-│ └── caption_eval_edit.json
-├── locality
-│ ├── NQ dataset
-│ │ ├── train.json
-│ │ └── validation.json
-├── multimodal_locality
-│ ├── OK-VQA dataset
-│ │ ├── okvqa_loc.json
-└── vqa
- ├── vqa_train.json
- └── vqa_eval.json
-```
-- Multimodal locality (evaluation for multimodal locality, see dataset's details in this [paper](http://openaccess.thecvf.com/content\_CVPR\_2019/html/Marino\_OK-VQA\_A\_Visual\_Question\_Answering\_Benchmark\_Requiring\_External\_Knowledge\_CVPR\_2019\_paper.html))
-
+### 🎍 Current Implementation
+As reported in the main table of our paper, four editing methods are supported for conceptual knowledge editing.
+| **Method** | GPT-2 | GPT-J | LlaMA2-13B-Chat | Mistral-7B-v0.1 |
+| :--------------: | :--------------: | :--------------: | :--------------: | :--------------: |
+| FT | ✅ | ✅ | ✅ | ✅ |
+| ROME | ✅ | ✅ |✅ | ✅ |
+| MEMIT | ✅ | ✅ | ✅| ✅ |
+| PROMPT | ✅ | ✅ | ✅ | ✅ |
-#### Tutorial notebook
+> ❗️❗️ If you intend to use **"LlaMA2-13B-Chat"** rather than "LlaMA2-13B-Base", please modify the "model_name" in "./hparams/[METHOD]/llama-7b.yaml" or write the .yaml file yourself.
-| **Method** | Description | GPT-2 | LlaMA |
-| :--------: | :----------------------------: | :---------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------: |
-| _IKE_ | In-Context Learning (ICL) Edit | [[Colab-gpt2]](https://colab.research.google.com/drive/1m6Xg05XCs_WZKH0D9KJQqg9z0ZiDhEkL) | [[Colab-llama]](https://colab.research.google.com/drive/1m6Xg05XCs_WZKH0D9KJQqg9z0ZiDhEkL) |
-| _ROME_ | Locate-Then-Edit Neurons | [[Colab-gpt2]](https://colab.research.google.com/drive/1KkyWqyV3BjXCWfdrrgbR-QS3AAokVZbr?usp=sharing) | [[Colab-llama]](https://colab.research.google.com/drive/1W18GPlBCV9K6lDy7eX8V5W0knTLr5r0A) |
-| _MEMIT_ | Locate-Then-Edit Neurons | [[Colab-gpt2]](https://colab.research.google.com/drive/1P1lVklP8bTyh8uxxSuHnHwB91i-1LW6Z) | [[Colab-llama]](https://colab.research.google.com/drive/19fKCKtVBU2fqj6eTvDokGoTrxvXkEPPq) |
-
-
-
-## Requirements
-
-#### 🔧Pip Installation
+### 🔧 Pip Installation (same procedure as the main branch)
**Note: Please use Python 3.9+ for EasyEdit**
+
To get started, simply install conda and run:
```shell
git clone https://github.com/zjunlp/EasyEdit.git
conda create -n EasyEdit python=3.9.7
...
+conda activate EasyEdit
pip install -r requirements.txt
```
-#### 🐳Docker Installation
-
-We packaged the environment, you can download Docker from [this link](https://docs.docker.com/get-docker/).
-
-Pull the Docker image from Docker Hub or Aliyun:
-
-```bash
-docker pull zjunlp/easyedit
-```
-
-```bash
-docker pull registry.cn-hangzhou.aliyuncs.com/zjunlp/easyedit:v1
-```
-
-If you want to build the Docker image locally, you can clone the project to your local machine and build the Docker image:
-
-```bash
-git clone https://github.com/zjunlp/EasyEdit.git
-cd EasyEdit
-docker build -t your-image-name .
-```
-
-Then run the Docker image as a container:
-
-```bash
-docker run -p 8080:80 your-image-name
-```
-#### Editing GPU memory usage
-Our results are all based on the default configuration
-
-| | llama-2-7B | chatglm2 | gpt-j-6b | gpt-xl |
-|:-------:|:----------:|:--------:|:--------:|:------:|
-| FT | 60GB | 58GB | 55GB | 7GB |
-| SERAC | 42GB | 32GB | 31GB | 10GB |
-| IKE | 52GB | 38GB | 38GB | 10GB |
-| MEND | 46GB | 37GB | 37GB | 13GB |
-| KN | 42GB | 39GB | 40GB | 12GB |
-| ROME | 31GB | 29GB | 27GB | 10GB |
-| MEMIT | 33GB | 31GB | 31GB | 11GB |
-| AdaLoRA | 29GB | 24GB | 25GB | 8GB |
-| GRACE | 27GB | | 23GB | 6GB |
-
-## 📌Use EasyEdit
-
-- Edit large language models(LLMs) around **_5 seconds_**
-
-- Following example shows you how to perform editing with EasyEdit. More examples and tutorials can be found at [examples](https://github.com/zjunlp/EasyEdit/tree/main/examples)
-
-### BaseEditor
-
-> `BaseEditor`is the class for Language Modality Knowledge Editing. You can choose the appropriate editing method based on your specific needs.
-
-- Due to different transformer versions and different GPU models, the editing results may fluctuate **slightly**.
-
-#### Introduction by a Simple Example
-
-With the modularity and flexibility of `EasyEdit`, you can easily use it to edit model.
-
-**Step1: Define a PLM as the object to be edited.**
-Choose the PLM to be edited. `EasyEdit` supports partial models(`T5`, `GPTJ`, `GPT-NEO`, `LlaMA` so far) retrievable on [HuggingFace](https://huggingface.co/). The corresponding configuration file directory is `hparams/YUOR_METHOD/YOUR_MODEL.YAML`, such as `hparams/MEND/gpt2-xl.yaml`, set the corresponding `model_name` to select the object for knowledge editing.
-
-```yaml
-model_name: gpt2-xl
-model_class: GPT2LMHeadModel
-tokenizer_class: GPT2Tokenizer
-tokenizer_name: gpt2-xl
-model_parallel: false # true for multi-GPU editing
-```
-
-**Step2: Choose the appropriate Knowledge Editing Method**
-The selection of editing methods is a **crucial** step, as different methods have their own strengths and weaknesses. Users need to consider the trade-off between editing success rate, generalization, and maintaining unrelated performance. For specific performance details of each method, please refer to the paper: [Editing Large Language Models: Problems, Methods, and Opportunities](https://arxiv.org/abs/2305.13172).
-
-```python
-## In this case, we use MEND method, so you should import `MENDHyperParams`
-from easyeditor import MENDHyperParams
-## Loading config from hparams/MEMIT/gpt2-xl.yaml
-hparams = MENDHyperParams.from_hparams('./hparams/MEND/gpt2-xl')
-```
-
-**Step3: Provide the edit descriptor and edit target**
-
-```python
-## edit descriptor: prompt that you want to edit
-prompts = [
- 'What university did Watts Humphrey attend?',
- 'Which family does Ramalinaceae belong to',
- 'What role does Denny Herzig play in football?'
-]
-## You can set `ground_truth` to None !!!(or set to original output)
-ground_truth = ['Illinois Institute of Technology', 'Lecanorales', 'defender']
-## edit target: expected output
-target_new = ['University of Michigan', 'Lamiinae', 'winger']
-```
-
-**Step4: Combine them into a `BaseEditor`**
-`EasyEdit` provides a simple and unified way to init Editor, like huggingface: **from_hparams**.
-
-```python
-## Construct Language Model Editor
-editor = BaseEditor.from_hparams(hparams)
-```
-
-**Step5: Provide the data for evaluation**
-Note that the data for portability and locality are both **optional**(set to None for basic editing success rate evaluation only). The data format for both is a **dict**, for each measurement dimension, you need to provide the corresponding prompt and its corresponding ground truth. Here is an example of the data:
-
-```python
-locality_inputs = {
- 'neighborhood':{
- 'prompt': ['Joseph Fischhof, the', 'Larry Bird is a professional', 'In Forssa, they understand'],
- 'ground_truth': ['piano', 'basketball', 'Finnish']
- },
- 'distracting': {
- 'prompt': ['Ray Charles, the violin Hauschka plays the instrument', 'Grant Hill is a professional soccer Magic Johnson is a professional', 'The law in Ikaalinen declares the language Swedish In Loviisa, the language spoken is'],
- 'ground_truth': ['piano', 'basketball', 'Finnish']
- }
-}
-```
-
-In the above example, we evaluate the performance of the editing methods about "neighborhood" and "distracting".
-
-**Step6: Edit and Evaluation**
-Done! We can conduct Edit and Evaluation for your model to be edited. The `edit` function will return a series of metrics related to the editing process as well as the modified model weights.
-
-```python
-metrics, edited_model, _ = editor.edit(
- prompts=prompts,
- ground_truth=ground_truth,
- target_new=target_new,
- locality_inputs=locality_inputs,
- keep_original_weight=False
-)
-## metrics: edit success, rephrase success, locality e.g.
-## edited_model: post-edit model
-```
-
-### Evaluation
-
-We specify the return metrics as `dict` format, including model prediction evaluations before and after editing. For each edit, it will include the following metrics:
-
-- `rewrite_acc` $\rightarrow$ **Reliablilty**
-- `rephrase_acc` $\rightarrow$ **Generalization**
-- `locality` $\rightarrow$ **Locality**
-- `portablility` $\rightarrow$ **Portablility**
-
-```json
-{
- "post": {
- "rewrite_acc": ,
- "rephrase_acc": ,
- "locality": {
- "YOUR_LOCALITY_KEY": ,
- //...
- },
- "portablility": {
- "YOUR_PORTABILITY_KEY": ,
- //...
- },
- },
- "pre": {
- "rewrite_acc": ,
- "rephrase_acc": ,
- "portablility": {
- "YOUR_PORTABILITY_KEY": ,
- //...
- },
- }
-}
-```
-
-- For evaluation for Reliablilty, you only need to provide the corresponding editing `prompts` and editing `target_new`.
-- For evaluation for Generalization, `rephrase_prompts` are required.
-- For evaluation for Locality and Portablility, you need to define the name of the corresponding metric, as well as `prompts` and `ground_truth`.
- - > Note: the length needs to be equal to the edit prompts
-
-### Trainer
-
-- meta-learning based: `MEND`
-- memory-based routing: `SERAC`
-
-For above editing methods, pre-training of corresponding meta-networks or classifiers is required. Therefore, in EasyEdit, we provide a unified framework for pretraining the relevant network structures. Take the training MEND for example:
-
-- **Step 1** and **Step 2** are the same as the example above, which involves selecting the appropriate editing model and editing method.
-
-**Step3: Provide the edit training set**
-The currently supported and available datasets are: `zsre` and `counterfact`([Google Drive](https://drive.google.com/file/d/1WRo2SqqgNtZF11Vq0sF5nL_-bHi18Wi4/view?usp=sharing)). Please place them in the "data" directory and initialize the dataset_class (`ZsreDataset` for zsre and `CounterFactDataset` for counterfact) to load the corresponding training set.
-
-```python
-train_ds = ZsreDataset('./data/zsre_mend_train.json', config=training_hparams)
-eval_ds = ZsreDataset('./data/zsre_mend_eval.json', config=training_hparams)
-```
-
-**Step4: Combine them into a `Trainer`**
-
-```python
-trainer = EditTrainer(
- config=training_hparams,
- train_set=train_ds,
- val_set=eval_ds
-)
-```
-
-**Step5: Run and Edit**
-Done! We can conduct Run and Evaluation.
-
-```python
-trainer.run()
-```
-
-- Run: The `CHECKPOINT` will be saved to the path `results_dir`.
-- Edit: Set the `archive` field in the **hparams file** to `CHECKPOINT`. EasyEdit will automatically load the corresponding pre-trained weights during the editing process([Go to edit](#use-easyedit)).
-
-**Training Example**
-```python
-from easyeditor import EditTrainer, MENDTrainingHparams, ZsreDataset
-
-training_hparams = MENDTrainingHparams.from_hparams('hparams/TRAINING/MEND/llama-7b.yaml')
-train_ds = ZsreDataset('./data/zsre/zsre_mend_train.json', config=training_hparams)
-eval_ds = ZsreDataset('./data/zsre/zsre_mend_eval.json', config=training_hparams)
-trainer = EditTrainer(
- config=training_hparams,
- train_set=train_ds,
- val_set=eval_ds
-)
-trainer.run()
-```
-
-
-
-
-
-### MultimodalEditor
-
-> `MultimodalEditor` is the class for Multi-Modality Editing. You can choose the appropriate editing method based on your specific needs.
-
-- Due to different transformer versions and different GPU models, the editing results may fluctuate **slightly**.
-
-**M-Generality Results**
-
-
-
-| *VQA* | KE | IKE | SERAC | MEND |
-| :---: | :---------: | :------------: | :--------: | :---------: |
-| MiniGPT-4 | 88.60 | 99.95 | 88.10 | 99.60 |
-| BLIP2 | 74.60 | 99.79 | 99.20 | 99.40 |
-
-| *Caption* | KE | IKE | SERAC | MEND |
-| :---: | :---------: | :------------: | :--------: | :---------: |
-| MiniGPT-4 | 13.60 | 91.00 | 91.47 | 93.35 |
-| BLIP2 | 1.60 | 96.55 | 99.72 | 93.48 |
-
-#### Introduction by a Simple Example
-
-With the modularity and flexibility of `EasyEdit`, you can easily use it to edit model.
-
-**Step1: Define a MLLM as the object to be edited.**
-Choose the MLLM to be edited. `EasyEdit` supports partial models(`MiniGPT-4`, `Blip2` so far) retrievable on [HuggingFace](https://huggingface.co/). The corresponding configuration file directory is `hparams/YUOR_METHOD/YOUR_MODEL.YAML`, such as `hparams/MEND/minigpt4.yaml`, set the corresponding `model_name` to select the object for editing.
-
-```python
-model_name: minigpt4
-model_class: Blip2OPT
-tokenizer_class: LlamaTokenizer
-tokenizer_name: llama-7b
-```
-
-**Step2: Choose the appropriate Editing Method**
-The selection of editing methods is a **crucial** step, as different methods have their own strengths and weaknesses. Users need to consider the trade-off between editing success rate, generalization, and maintaining unrelated performance.
-
-```python
-## In this case, we use MEND method, so you should import `MENDMultimodalHparams`
-from easyeditor import MENDMultimodalHparams
-## Loading config from hparams/MEMIT/gpt2-xl.yaml
-hparams = MENDMultimodalHparams.from_hparams('./hparams/MEND/minigpt4')
-```
-
-**Step3: Provide the edit descriptor and edit target**
-
-```python
-## edit descriptor: prompt that you want to edit
-prompts = [
- "How many tennis balls are in the picture?",
- "What is the red food?"
-]
-## edit target: expected output
-targets = ["2", "tomatoes",]
-## edit image: image for editing
-image = [
- "val2014/COCO_val2014_000000451435.jpg",
- "val2014/COCO_val2014_000000189446.jpg"
-]
-```
-
-**Step4: Combine them into a `MultimodalEditor`**
-`EasyEdit` provides a simple and unified way to init Editor, like huggingface: **from_hparams**.
-
-```python
-## Construct MLLM Editor
-editor = MultimodalEditor.from_hparams(hparams)
-```
-
-**Step5: Provide the data for evaluation**
-Note that the data for locality and multimodal locality are both **optional**(set to None for basic editing success rate evaluation only). The data format for both is a **dict**, for each measurement dimension, you need to provide the corresponding prompt and its corresponding ground truth. Here is an example of the data:
-
-```python
-locality_inputs = {
- 'text': {
- 'prompt': [
- "nq question: what purpose did seasonal monsoon winds have on trade"
- ],
- 'ground_truth': [
- "enabled European empire expansion into the Americas and trade \
- routes to become established across the Atlantic and Pacific oceans"
- ]
- },
- 'vision': {
- 'prompt': ["What sport can you use this for?"],
- 'ground_truth': ["riding"],
- 'image': ["val2014/COCO_val2014_000000297147.jpg"],
- }
-}
-```
-
-In the above example, we evaluate the performance of the editing methods about "neighborhood" and "distracting".
-
-**Step6: Edit and Evaluation**
-Done! We can conduct Edit and Evaluation for your model to be edited. The `edit` function will return a series of metrics related to the editing process as well as the modified model weights.
-
-```python
-metrics, edited_model, _ = editor.edit(
- prompts=prompts,
- target_new=target_new,
- image=image,
- locality_inputs=locality_inputs,
- keep_original_weight=False
-)
-## metrics: edit success, rephrase success, locality e.g.
-## edited_model: post-edit model
-```
+> ❗️❗️ If you intend to use Mistral, please update the `transformers` library to version 4.34.0 manually. You can use the following code: `pip install transformers==4.34.0`.
-### Evaluation
+---
-We specify the return metrics as `dict` format, including model prediction evaluations before and after editing. For each edit, it will include the following metrics:
-- `rewrite_acc` $\rightarrow$ **Reliablilty**
-- `rephrase_acc` $\rightarrow$ **Generalization**
-- `image_rephrase_acc` $\rightarrow$ **Generalization for Multimodal**
-- `locality_acc` $\rightarrow$ **Locality**
-- `multimodal_locality_acc` $\rightarrow$ **Locality for Multimodal**
+### 📂 Data Preparation
-```json
-{
- "post": {
- "rewrite_acc": ,
- "rephrase_acc": ,
- "image_rephrase_acc": ,
- "locality_acc": ,
- "multimodal_locality_acc": ,
- },
- "pre": {
- "rewrite_acc": ,
- "rephrase_acc": ,
- "image_rephrase_acc": ,
- }
-}
-```
+**Dataset for Conceptual Knowledge Editing: ConceptEdit**
+You can download it from [[Google Drive]](https://drive.google.com/drive/folders/1Hp1DfIuj6Ih6ZLVENS-UmgJT8mRBlFC2?usp=drive_link), then put the data in the "./data" folder.
-- For evaluation for Reliablilty, you only need to provide the corresponding editing `prompts` and editing `target_new`.
-- For evaluation for Generalization, `rephrase_prompts` are required.
-- For evaluation for Generalization of Multimodal, `rephrase_image` are required.
-- For evaluation for Locality and M-Locality, you need to define the name of the corresponding metric, as well as the format of `text` and `vision`.
- - > Note: the length needs to be equal to the edit prompts
+**"concept_data.json"** is the main data file containing 452 concepts, 8,767 instances with 22 superclasses.
-### Trainer
+> ❗️❗️ For a quick start, we preprocess the data for experiments under different settings and provide the post-processed files used in the main table. You can follow their format to build your own files if needed.
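+
+After downloading, a quick sanity check such as the one below can confirm the file is readable. It only assumes "concept_data.json" is valid JSON placed under "./data"; the internal field names are not assumed.
+
+```python
+# Quick sanity check for the downloaded ConceptEdit data (structure-agnostic sketch).
+import json
+
+with open("./data/concept_data.json", "r", encoding="utf-8") as f:
+    data = json.load(f)
+
+# The file should cover 452 concepts and 8,767 instances across 22 superclasses;
+# inspect the top-level size and one sample entry to confirm the download looks intact.
+print(type(data).__name__, len(data))
+sample = next(iter(data.items())) if isinstance(data, dict) else data[0]
+print(sample)
+```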
-- meta-learning based: `MEND`
-- memory-based routing: `SERAC`
+
-For above editing methods, pre-training of corresponding meta-networks or classifiers is required. Therefore, in EasyEdit, we provide a unified framework for pretraining the relevant network structures. Take the training SERAC for example:
+### 💻 Run
-- **Step 1** and **Step 2** are the same as the example above, which involves selecting the appropriate editing model and editing method.
+Before you begin running the program, ensure that the necessary files are present and properly set up, specifically the directories **./data**, **./hparams**, and **./hugging_cache**.
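+
+A small convenience check (not part of the repository) to verify these directories exist before launching a run:
+
+```python
+# Convenience check (assumption: run from the repository root).
+from pathlib import Path
+
+for d in ("./data", "./hparams", "./hugging_cache"):
+    print(d, "ok" if Path(d).is_dir() else "MISSING")
+```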
-**Step3: Provide the edit training set**
-The currently supported and available datasets are: `Caption` and `VQA`([Google Drive](https://drive.google.com/drive/folders/1jBdTJxUb9wEeHnvG-RY8dv5_I4QlDpUS?usp=drive_link)). Please place them in the "data" directory and initialize the dataset_class (`CaptionDataset` for Caption and `VQADataset` for VQA) to load the corresponding training set.
-```python
-train_ds = CaptionDataset('data/caption_train_edit.json', config=training_hparams)
-eval_ds = CaptionDataset('data/caption_eval_edit.json', config=training_hparams)
+STEP 1:
+```shell
+python run_concept_editing.py --editing_method=FT --edited_model mistral --hparams_dir=./hparams/FT/mistral-7b --inter
```
-**Step4: Combine them into a `Trainer`**
+> Additional shell script examples for configuring experiments are available in the `command.sh` file.
-```python
-trainer = MultimodalTrainer(
- config=hparams,
- train_set=train_ds,
- val_set=eval_ds
-)
+STEP 2:
+```shell
+python calculate.py --method FT --model mistral --module inter
```
-**Step5: Run and Edit**
-Done! We can conduct Run and Evaluation.
+> Once the intermediate result file has been written to "./final_result_upload/", you can run STEP 2 to obtain the results reported in the main table.
-```python
-trainer.run()
-```
+STEP 3 (OPTIONAL):
-- Run: The `CHECKPOINT` will be saved to the path `results_dir`.
-- Edit: Set the `archive` field in the **hparams file** to `CHECKPOINT`. EasyEdit will automatically load the corresponding pre-trained weights during the editing process([Go to edit](#use-easyedit)).
-**Training Example**
-```python
-hparams = SERACMultimodalTrainingHparams.from_hparams('hparams/TRAINING/SERAC/minigpt4.yaml')
-train_ds = CaptionDataset('data/caption_train_edit.json', config=training_hparams)
-eval_ds = CaptionDataset('data/caption_eval_edit.json', config=training_hparams)
-trainer = MultimodalTrainer(
- config=hparams,
- train_set=train_ds,
- val_set=eval_ds
-)
+Since generation with LLMs can be time-consuming, evaluating `Concept Consistency` is optional. If you wish to compute it, follow these instructions:
-trainer.run()
+1. Uncomment line 113 in the `run_concept_editing.py` file: `concept_consistency = True`
+2. If you also require generation of descriptions before editing, modify line 184 in `easyeditor/editors/concept_editor.py`: `test_concept_consistency=concept_consistency`
+3. With these adjustments, proceed to re-execute STEP 1.
+4. To convert the generated sentences into a **JSON** file for evaluation with GPT-4, execute the following command:
+```shell
+python examples/transform_check.py --method FT --model mistral --module inter
```
- TO DO
-In next version, we plan to:
-
-- Explore and integrate more robust editing methods, focusing on `locality` and `portability` metrics.
-- Provide a comprehensive evaluation suite for editing methods, including fact modification, fact erasure and hallucination erasure.
-- Provide a causal analysis component for analyzing knowledge storage mechanisms.
-- knowledge editing for other tasks(except factual editing), like `personality editing`, etc.
-
-Meanwhile, we will offer long-term maintenance to fix bugs, solve issues and meet new requests. So if you have any problems, please put issues to us.
-
-
-
-# Use EasyEdit with KnowEdit
-## Dataset
-
-KnowEdit is a benchmark dataset of knowledge editing for LLMs. You can easily obtain KnowEdit from HuggingFace, HuggingFace, and ModelScope.
+
-| **dataset** | HuggingFace| HuggingFace | ModelScope |
-| :--------: | :-----------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------: | :--------------------------------------------------------------------------------: |
-| KnowEdit | [[HuggingFace]](https://huggingface.co/datasets/zjunlp/KnowEdit) | [[WiseModel]](https://wisemodel.cn/datasets/zjunlp/KnowEdit) | [[ModelScope]](https://www.modelscope.cn/datasets/zjunlp/KnowEdit) |
+## 📖 Citation
-## Usage
-
-We provide detailed scripts for user to easily use KnowEdit, please refer to [examples](https://github.com/zjunlp/EasyEdit/blob/main/examples/KnowEdit.md).
-
-# Editing Performance
-
-We present editing results of the four metrics on [LlaMA-2-7B](https://huggingface.co/meta-llama/Llama-2-7b-hf) using EasyEdit. We adopt [ZsRE](https://drive.google.com/file/d/1WRo2SqqgNtZF11Vq0sF5nL_-bHi18Wi4/view?usp=sharing) as the test dataset.
-
-> ❗️❗️Editing `llama-2-7B` requires 40G+ VRAM on GPU. (OOM [solution](https://github.com/zjunlp/EasyEdit/issues/9#issuecomment-1687284658))
-
-| | Reliability | Generalization | Locality | Portability |
-| :---: | :---------: | :------------: | :--------: | :---------: |
-| FT | 56.94 | 52.02 | 96.32 | 0.07 |
-| SERAC | 99.49 | 99.13 | **100.00** | 0.13 |
-| IKE | **100.00** | **99.98** | 69.19 | **67.56** |
-| MEND | 94.24 | 90.27 | 97.04 | 0.14 |
-| KN | 28.95 | 28.43 | 65.43 | 0.07 |
-| ROME | 92.45 | 87.04 | 99.63 | 10.46 |
-| MEMIT | 92.94 | 85.97 | 99.49 | 6.03 |
-
-
-
-We also present editing results of KnowEdit on [LlaMA-2-7B](https://huggingface.co/meta-llama/Llama-2-7b-hf) using EasyEdit.
-
-| DataSet | Metric | SERAC | ICE | AdaLoRA | MEND | ROME | MEMIT | FT-L | FT |
-|--------------------------|---------------|--------|--------|---------|--------|--------|--------|--------|--------|
-| **WikiData_recent** | | | | | | | | | |
-| | Edit Succ. | 98.68 | 60.74 | 65.61 | 76.88 | 85.08 | 85.32 | 71.18 | 31.24 |
-| | Portability | 63.52 | 36.93 | 47.22 | 50.11 | 37.45 | 37.94 | 48.71 | 15.91 |
-| | Locality | 100.00 | 33.34 | 55.78 | 92.87 | 66.2 | 64.78 | 63.7 | 3.65 |
-| | Fluency | 553.19 | 531.01 | 537.51 | 586.34 | 574.28 | 566.66 | 549.35 | 428.67 |
-| **ZsRE** | | | | | | | | | |
-| | Edit Succ. | 99.67 | 66.01 | 69.86 | 96.74 | 96.57 | 83.07 | 54.65 | 36.88 |
-| | Portability | 56.48 | 63.94 | 52.95 | 60.41 | 52.20 | 51.43 | 45.02 | 8.72 |
-| | Locality | 30.23 | 23.14 | 72.21 | 92.79 | 27.14 | 25.46 | 71.12 | 0.31 |
-| | Fluency | 410.89 | 541.14 | 532.82 | 524.33 | 570.47 | 559.72 | 474.18 | 471.29 |
-| **WikiBio** | | | | | | | | | |
-| | Edit Succ. | 99.69 | 95.53 | 97.02 | 93.66 | 95.05 | 94.29 | 66.27 | 95.64 |
-| | Locality | 69.79 | 47.90 | 57.87 | 69.51 | 46.96 | 51.56 | 60.14 | 13.38 |
-| | Fluency | 606.95 | 632.92 | 615.86 | 609.39 | 617.25 | 616.65 | 604.00 | 589.22 |
-| **WikiData_counterfact** | | | | | | | | | |
-| | Edit Succ. | 99.99 | 69.83 | 72.14 | 78.82 | 83.21 | 83.41 | 51.12 | 26.78 |
-| | Portability | 76.07 | 45.32 | 55.17 | 57.53 | 38.69 | 40.09 | 39.07 | 16.94 |
-| | Locality | 98.96 | 32.38 | 66.78 | 94.16 | 65.4 | 63.68 | 62.51 | 0.29 |
-| | Fluency | 549.91 | 547.22 | 553.85 | 588.94 | 578.84 | 568.58 | 544.80 | 483.71 |
-| **ConvSent** | | | | | | | | | |
-| | Edit Succ. | 62.75 | 52.78 | 44.89 | 50.76 | 45.79 | 44.75 | 49.50 | 61.93 |
-| | Locality | 0.26 | 49.73 | 0.18 | 3.42 | 0.00 | 0.00 | 0.00 | 0.00 |
-| | Fluency | 458.21 | 621.45 | 606.42 | 379.43 | 606.32 | 602.62 | 607.86 | 546.24 |
-| **Sanitation** | | | | | | | | | |
-| | Edit Succ. | 0.00 | 72.50 | 2.50 | 0.00 | 85.00 | 48.75 | 0.00 | 60.00 |
-| | Locality | 100.00 | 56.58 | 65.50 | 5.29 | 50.31 | 67.47 | 14.78 | 42.61 |
-| | Fluency | 416.29 | 794.15 | 330.44 | 407.18 | 465.12 | 466.10 | 439.10 | 351.39 |
-
-## Citation
-
-Please cite our paper if you use EasyEdit in your work.
+Please cite our paper if you use **ConceptEdit** in your work.
```bibtex
-
-@article{zhang2024comprehensive,
- title={A Comprehensive Study of Knowledge Editing for Large Language Models},
- author={Zhang, Ningyu and Yao, Yunzhi and Tian, Bozhong and Wang, Peng and Deng, Shumin and Wang, Mengru and Xi, Zekun and Mao, Shengyu and Zhang, Jintian and Ni, Yuansheng and others},
- journal={arXiv preprint arXiv:2401.01286},
- year={2024}
-}
-
-@article{wang2023easyedit,
- title={Easyedit: An easy-to-use knowledge editing framework for large language models},
- author={Wang, Peng and Zhang, Ningyu and Xie, Xin and Yao, Yunzhi and Tian, Bozhong and Wang, Mengru and Xi, Zekun and Cheng, Siyuan and Liu, Kangwei and Zheng, Guozhou and others},
- journal={arXiv preprint arXiv:2308.07269},
- year={2023}
-}
-
-@article{yao2023editing,
- title={Editing Large Language Models: Problems, Methods, and Opportunities},
- author={Yao, Yunzhi and Wang, Peng and Tian, Bozhong and Cheng, Siyuan and Li, Zhoubo and Deng, Shumin and Chen, Huajun and Zhang, Ningyu},
- journal={arXiv preprint arXiv:2305.13172},
- year={2023}
-}
-
-@article{cheng2023edit,
- title={Can We Edit Multimodal Large Language Models?},
- author={Cheng, Siyuan and Tian, Bozhong and Liu, Qingbin and Chen, Xi and Wang, Yongheng and Chen, Huajun and Zhang, Ningyu},
- journal={arXiv preprint arXiv:2310.08475},
- year={2023}
-}
-
-@article{mao2023editing,
- title={Editing personality for llms},
- author={Mao, Shengyu and Zhang, Ningyu and Wang, Xiaohan and Wang, Mengru and Yao, Yunzhi and Jiang, Yong and Xie, Pengjun and Huang, Fei and Chen, Huajun},
- journal={arXiv preprint arXiv:2310.02168},
- year={2023}
-}
-
-@misc{knowlm,
- author = {Ningyu Zhang and Jintian Zhang and Xiaohan Wang and Honghao Gui and Kangwei Liu and Yinuo Jiang and Xiang Chen and Shengyu Mao and Shuofei Qiao and Yuqi Zhu and Zhen Bi and Jing Chen and Xiaozhuan Liang and Yixin Ou and Runnan Fang and Zekun Xi and Xin Xu and Lei Li and Peng Wang and Mengru Wang and Yunzhi Yao and Bozhong Tian and Yin Fang and Guozhou Zheng and Huajun Chen},
- title = {KnowLM Technical Report},
- year = {2023},
- url = {http://knowlm.zjukg.cn/},
+@misc{wang2024editing,
+ title={Editing Conceptual Knowledge for Large Language Models},
+ author={Xiaohan Wang and Shengyu Mao and Ningyu Zhang and Shumin Deng and Yunzhi Yao and Yue Shen and Lei Liang and Jinjie Gu and Huajun Chen},
+ year={2024},
+ eprint={2403.06259},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
}
```
-## 🎉Contributors
-
-
-
-
-
-We thank all the contributors to this project, more contributors are welcome!
-
-#### Other Related Projects
+## 🎉 Acknowledgement
-- [ROME](https://github.com/kmeng01/rome)
-- [FastEdit](https://github.com/hiyouga/FastEdit)
-- [GRACE](https://github.com/Thartvigsen/GRACE)
-- [MELO](https://github.com/ECNU-ICALK/MELO)
-- [PMET](https://github.com/xpq-tech/PMET)
-- [PitfallsKnowledgeEditing](https://github.com/zjunlp/PitfallsKnowledgeEditing)
-- [EditBias](https://github.com/zjunlp/EditBias)
-- [WikiLLM](https://github.com/laramohan/wikillm)
+We would like to express our sincere gratitude to [DBpedia](https://www.dbpedia.org/resources/ontology/), [Wikidata](https://www.wikidata.org/wiki/Wikidata:Introduction), [OntoProbe-PLMs](https://github.com/vickywu1022/OntoProbe-PLMs), and [ROME](https://github.com/kmeng01/rome).
-🙌 We would like to express our heartfelt gratitude for the contribution of [FastEdit](https://github.com/hiyouga/FastEdit), [ROME](https://github.com/kmeng01/rome), [GRACE](https://github.com/Thartvigsen/GRACE), [MELO](https://github.com/ECNU-ICALK/MELO), [PMET](https://github.com/xpq-tech/PMET) to our project, as we have utilized portions of their source code in our project. Many thanks to all the colleagues in the community for submitting issues and providing technical support.
+Their contributions are invaluable to the advancement of our work.
diff --git a/README_MainBranch.md b/README_MainBranch.md
new file mode 100644
index 00000000..fc30e0bc
--- /dev/null
+++ b/README_MainBranch.md
@@ -0,0 +1,1069 @@
+
+
+

+
+**An Easy-to-use Knowledge Editing Framework for Large Language Models.**
+
+
+[](https://opensource.org/licenses/MIT)
+
+
+
+---
+
+
+ Overview •
+ Installation •
+ How To Use •
+ Docs •
+ Paper •
+ Benchmark •
+ Contributors •
+ Slides •
+ Video •
+ Featured By AK
+
+
+
+## Table of Contents
+
+- [Table of Contents](#table-of-contents)
+- [🔔News](#news)
+- [Editing Demo](#editing-demo)
+- [Knowledge Editing](#knowledge-editing)
+ - [Task Definition](#task-definition)
+ - [Knowledge insert](#knowledge-insert)
+ - [Knowledge update](#knowledge-update)
+ - [Knowledge erase](#knowledge-erase)
+ - [Evaluation](#evaluation)
+- [🌟Overview](#overview)
+ - [Current Implementation](#current-implementation)
+ - [Tutorial notebook](#tutorial-notebook)
+- [Requirements](#requirements)
+ - [🔧Pip Installation](#pip-installation)
+ - [🐳Docker Installation](#docker-installation)
+ - [Editing GPU memory usage](#editing-gpu-memory-usage)
+- [📌Use EasyEdit](#use-easyedit)
+ - [BaseEditor](#baseeditor)
+ - [Introduction by a Simple Example](#introduction-by-a-simple-example)
+ - [Evaluation](#evaluation-1)
+ - [Trainer](#trainer)
+ - [MultimodalEditor](#multimodaleditor)
+ - [Introduction by a Simple Example](#introduction-by-a-simple-example-1)
+ - [Evaluation](#evaluation-2)
+ - [Trainer](#trainer-1)
+- [Use EasyEdit with KnowEdit](#Use-easyedit-with-KnowEdit)
+ - [Dataset](#Dataset)
+ - [Usage](#usage)
+- [Editing Performance](#editing-performance)
+- [Citation](#citation)
+- [🎉Contributors](#contributors)
+ - [Other Related Projects](#other-related-projects)
+
+## 🔔News
+- **2024-02-20 The AAAI2024 tutorial "*Knowledge Editing for Large Language Models*" has been canceled since speakers cannot present in person, we make this ppt[[Github](https://github.com/zjunlp/KnowledgeEditingPapers/blob/main/AAAI2024%40Tutorial_Knowledge%20Editing%20for%20LLMs.pdf)] [[Google Drive](https://drive.google.com/file/d/1fkTbVeRJSWmU7fBDeNf1OhHEkLSofQde/view?usp=sharing)] [[Baidu Pan](https://pan.baidu.com/s/1oJYgaMnxWIBE4kIcJuMSKg?pwd=p9j5)] available to the community**.
+- **2024-02-09 The EasyEdit has supported the Dynamic LoRA model editing method [MELO'AAAI24](https://arxiv.org/abs/2312.11795).**
+- **2024-02-06 We release a new paper: "[EasyInstruct: An Easy-to-use Instruction Processing Framework for Large Language Models](https://arxiv.org/abs/2402.03049)" with an HF demo [EasyInstruct](https://huggingface.co/spaces/zjunlp/EasyInstruct).**
+- **2024-02-06 We release a preliminary tool [EasyDetect](https://github.com/OpenKG-ORG/EasyDetect) for LLM hallucination detection, with a [demo](http://easydetect.openkg.cn/)**.
+- **2024-01-24 The EasyEdit has supported editing [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) (manually update transformers==4.34.0), we have also fixed some bugs in evaluating MEND (slightly influence the performance).**
+- **2024-01-16 The EasyEdit has supported the precise model editing method [PMET'AAAI24](https://arxiv.org/abs/2308.08742).**
+- **2024-01-03 We release a new paper:"[A Comprehensive Study of Knowledge Editing for Large Language Models](https://arxiv.org/abs/2401.01286)" with a new benchmark [KnowEdit](https://huggingface.co/datasets/zjunlp/KnowEdit)! We are looking forward to any comments or discussions on this topic :)**
+- **2023-12-06 The EasyEdit has supported the lifelong model editing method [GRACE'NeurIPS24](https://arxiv.org/abs/2211.11031).**
+- **2023-11-18 Our tutorial "Knowledge Editing for Large Language Models" has been accepted by COLING 2024.**
+- **2023-10-25 Our tutorial "Knowledge Editing for Large Language Models" has been accepted by AAAI 2024.**
+
+
+Previous News
+
+- **2023-10-24 The EasyEdit has supported efficient editing of [Baichuan2](https://github.com/baichuan-inc/Baichuan2), [ChatGLM2](https://github.com/THUDM/ChatGLM2-6B), [InternLM](https://github.com/InternLM/InternLM), [Qwen](https://github.com/QwenLM/Qwen) and fixed several bugs for a better user experience.**
+- **2023-10-14 We release the [MultimodalEditor](#multimodaleditor) based on the paper "[Can We Edit Multimodal Large Language Models?](https://arxiv.org/abs/2310.08475)".**
+- **2023-10-13 We release the paper "[Can We Edit Multimodal Large Language Models?](https://arxiv.org/abs/2310.08475)" accepted by EMNLP 2023.**
+- **2023-10-08 Our paper "[Editing Large Language Models: Problems, Methods, and Opportunities](https://arxiv.org/abs/2305.13172)" has been accepted by EMNLP 2023.**
+- **2023-10-07 EasyEdit has supported editing models with multiple GPUs, using huggingface [`Accelerate`](https://github.com/zjunlp/EasyEdit/blob/main/hparams/ROME/llama-7b.yaml#L24).**
+- **2023-9-21 EasyEdit has supported Parameter-Efficient Fine-Tuning through AdaLoRA to inject knowledge into the LLM.**
+- **2023-8-31 EasyEdit has supported the official fine-tuning API for gpt-3.5-turbo to customize ChatGPT for your editing cases.**
+- **2023-8-15 We release the paper "[EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models](https://arxiv.org/abs/2308.07269)."**
+- **2023-7-12 We release version 0.0.1, supporting several knowledge editing techniques for LLMs. EasyEdit helps to better align LLMs with changing needs and values of users.**
+- **2023-5-22 We release the paper "[Editing Large Language Models: Problems, Methods, and Opportunities](https://arxiv.org/abs/2305.13172)" and provide a paper list at [PaperList](https://github.com/zjunlp/KnowledgeEditingPapers).**
+- **2023-3-25 The EasyEdit project has been launched and is under development.**
+
+This repository is a subproject of [KnowLM](https://github.com/zjunlp/KnowLM).
+
+
+
+
+---
+
+> A Comprehensive Study of Knowledge Editing for Large Language Models [[paper](https://arxiv.org/abs/2401.01286)][[benchmark](https://huggingface.co/datasets/zjunlp/KnowEdit)][[code](https://github.com/zjunlp/EasyEdit)]
+
+> AAAI 2024 Tutorial [[Google Drive]()] [[Baidu Pan]()]
+
+> AACL 2023 Tutorial [[Google Drive](https://drive.google.com/file/d/1EW-cusC_llCM0wEshkIdYuYrvfBPCDRz/view?usp=sharing)] [[Baidu Pan](https://pan.baidu.com/s/1NupastGJUzcUIAjI64J1tw?pwd=i5an)]
+
+
+## Editing Demo
+
+There is a demonstration of editing. The GIF file is created by [Terminalizer](https://github.com/faressoft/terminalizer).
+
+
+
+## Knowledge Editing
+
+
+

+
+
+### Task Definition
+
+Deployed models may still make unpredictable errors. For example, Large Language Models (LLMs) notoriously _hallucinate_, _perpetuate bias_, and _factually decay_, so we should be able to adjust specific behaviors of pre-trained models.
+
+**Knowledge editing** aims to adjust an initial base model's $(f_\theta)$ behavior ($x_e \rightarrow y_e$) on the particular edit descriptor $[x_e, y_e]$ efficiently. There are usually three forms:
+
+#### Knowledge insert
+Inject knowledge that LLMs have not seen before, such as:
+- *How many times has Messi won the World Cup? 0* $\rightarrow$ **1**:
+ - $x_e$: How many times has Messi won the World Cup? $\quad$ $y_e$: 1
+
+#### Knowledge update
+LLMs often suffer from the knowledge cutoff issue; EasyEdit can update outdated knowledge, such as:
+- *The president of USA: Donald Trump* $\rightarrow$ **Joe Biden**:
+ - $x_e$: Who is the president of the US? $\quad$ $y_e$: Joe Biden
+
+#### Knowledge erase
+EasyEdit can erase sensitive information, such as:
+- *The phone number of someone is XXXX* $\rightarrow$ **__**
+ - $x_e$: The phone number of someone is $\quad$ $y_e$: __
+
+Without influencing the model behavior on unrelated samples, the ultimate goal is to create an edited model $(f_\theta')$.
+
+### Evaluation
+
+
+
+The knowledge editing process generally impacts the predictions for a broad set of inputs **that are closely** associated with the edit example, called the **editing scope**.
+
+A successful edit should adjust the model’s behavior within the editing scope while leaving unrelated inputs unchanged (as in the formula below).
+
+$$
+f_{\theta_{e}}(x) = \begin{cases}
+y_e & \text{if } x \in I(x_e,y_e) \\
+f_{\theta}(x) & \text{if } x \in O(x_e, y_e) \end{cases}
+$$
+
+In addition, the performance of knowledge editing should be measured along multiple dimensions (a toy sketch follows the list):
+
+- `Reliability`: the success rate of editing on the given edit descriptors
+- `Generalization`: the success rate of editing **within** the editing scope (e.g., on rephrased prompts)
+- `Locality`: whether the model's output on unrelated inputs remains unchanged after editing
+- `Portability`: the success rate of editing when the edited fact must be used in reasoning (one-hop, synonyms, one-to-one relations)
+- `Efficiency`: the time and memory consumption required during the editing process
+
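+As a rough illustration (not EasyEdit's actual implementation), the first three metrics can be seen as comparing the edited model against the original model inside and outside the editing scope:
+
+```python
+# Toy sketch: `edited` and `base` are callables mapping a prompt string to the model's answer.
+def toy_edit_metrics(edited, base, edit, in_scope_prompts, out_of_scope_prompts):
+    # Reliability: does the edited model give the new answer on the edit prompt itself?
+    reliability = float(edited(edit["prompt"]) == edit["target_new"])
+    # Generalization: success rate on other prompts inside the editing scope (e.g., rephrasings).
+    generalization = sum(edited(p) == edit["target_new"] for p in in_scope_prompts) / len(in_scope_prompts)
+    # Locality: agreement with the original model on prompts outside the editing scope.
+    locality = sum(edited(p) == base(p) for p in out_of_scope_prompts) / len(out_of_scope_prompts)
+    return {"reliability": reliability, "generalization": generalization, "locality": locality}
+```
+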
+## 🌟Overview
+
+EasyEdit is a Python package for editing Large Language Models (LLMs) such as `GPT-J`, `Llama`, `GPT-NEO`, `GPT2`, and `T5` (supporting models from **1B** to **65B** parameters). Its objective is to alter the behavior of LLMs efficiently within a specific domain without negatively impacting performance on other inputs. It is designed to be easy to use and easy to extend.
+
+
+
+
+
+- EasyEdit contains a unified framework for **Editor**, **Method** and **Evaluate**, respectively representing the editing scenario, editing technique, and evaluation method.
+- Each knowledge editing scenario comprises three components:
+
+ - `Editor`: such as `BaseEditor` (**Factual Knowledge** and **Generation** editor) for LMs and `MultimodalEditor` (**Multimodal Knowledge**).
+ - `Method`: the specific knowledge editing technique used (such as **ROME**, **MEND**, ...).
+ - `Evaluate`: **Metrics** for evaluating knowledge editing performance.
+ - `Reliability`, `Generalization`, `Locality`, `Portability`
+
+- The currently supported knowledge editing techniques are as follows:
+ - [FT](https://github.com/kmeng01/rome): Fine-Tuning with $L_\infty$ constraint
+ - [SERAC](https://github.com/eric-mitchell/serac): Mitchell et al. Memory-based
+ - [IKE](https://github.com/Zce1112zslx/IKE): Ce Zheng et al. In-Context Editing
+
+ - [MEND](https://github.com/eric-mitchell/mend): Mitchell et al. Hypernetwork
+ - [KN](https://github.com/Hunter-DDM/knowledge-neurons): Damai Dai et al. Locate then Edit
+ - [ROME](https://github.com/kmeng01/rome): Kevin Meng et al. Locate and Edit
+ - [MEMIT](https://github.com/kmeng01/memit): Kevin Meng et al. Locate and Edit
+ - [GRACE](https://github.com/thartvigsen/grace): Thomas Hartvigsen et al. Memory-based
+ - [PMET](https://github.com/xpq-tech/PMET): Xiaopeng Li et al. Locate and Edit
+ > Due to the limited compatibility of this toolkit and the constraints of the required transformers version, some knowledge editing methods, including [T-Patcher](https://github.com/ZeroYuHuang/Transformer-Patcher), [KE](https://github.com/nicola-decao/KnowledgeEditor), and [CaliNet](https://github.com/dqxiu/CaliNet),
+ > are not supported.
+
+#### Current Implementation
+
+You can choose different editing methods according to your specific needs.
+| **Method** | T5 | GPT-2 | GPT-J | GPT-NEO | LlaMA | Baichuan | ChatGLM2 | InternLM | Qwen | Mistral
+| :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: | :-------: |
+| FT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| AdaLoRA | | | | | ✅ | | | | | |
+| SERAC | ✅ | ✅ | ✅ | | ✅ | | | | | |
+| IKE | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ |
+| MEND | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| KN | ✅ | ✅ | ✅ | | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
+| ROME | | ✅ | ✅ | ✅ | ✅ | ✅ |✅ | ✅ | ✅ | ✅ |
+| MEMIT | | ✅ | ✅ | ✅ | ✅ | ✅ | ✅| ✅ | ✅ | ✅ |
+| GRACE | | ✅| ✅ | | ✅| | | | | |
+| MELO | |✅ | | | | | | | | |
+| PMET | | | ✅ | | ✅| | | | | |
+
+
+
+
+
+
+
+> ❗️❗️ EasyEdit supports editing ChatGPT with FT. An edit for `gpt-3.5-turbo` returns a model name (for example, `ft: GPT-3.5-turbo-0613 :personal::7tWZkLzq`) instead of model weights.
+
+> ❗️❗️ If you intend to use Mistral, please update the `transformers` library to version 4.34.0 manually. You can use the following command: `pip install transformers==4.34.0`.
+
+> ❗️❗️ If you intend to use MELO, please get the `peft_egg` package in `./easyeditor/models/melo/peft_egg` and pip install it in your environment.
+
+**Dataset**
+
+**Benchmark: KnowEdit** [[Hugging Face]](https://huggingface.co/datasets/zjunlp/KnowEdit)[[WiseModel]](https://wisemodel.cn/datasets/zjunlp/KnowEdit)[[ModelScope]](https://www.modelscope.cn/datasets/zjunlp/KnowEdit)
+
+
+
+| Task | Knowledge Insertion | Knowledge Modification | Knowledge Modification | Knowledge Modification | Knowledge Modification | Knowledge Erasure |
+| :--: | :--: | :--: | :--: | :--: | :--: | :--: |
+| Datasets | Wikirecent | ZsRE | WikiBio | WikiDatacounterfact | Convsent | Sanitation |
+| Type | Fact | Question Answering | Hallucination | Counterfact | Sentiment | Unwanted Info |
+| # Train | 570 | 10,000 | 592 | 1,455 | 14,390 | 80 |
+| # Test | 1,266 | 1,230 | 1,392 | 885 | 800 | 80 |
+
+
+
+We provide **detailed scripts** for users to easily use KnowEdit; please refer to [examples](https://github.com/zjunlp/EasyEdit/blob/main/examples/KnowEdit.md).
+
+**Dataset description**
+
+- ZsRE: a context-free question-answering task. Given a question based on the subject and relation, the model is expected to provide the correct object as the answer.
+- Wikirecent: this dataset specifically focuses on triplets that were inserted into WikiData after July 2022.
+- WikiBio: the original dataset was created by prompting GPT-3 to generate 238 Wikipedia-style biographies using subjects from the WikiBio dataset.
+- WikiDatacounterfact: since tail entities are often not captured by models and are therefore not suitable for testing modification edits, RippleEdit collects triplets about popular entities, where the subject corresponds to one of the top-viewed pages in Wikipedia.
+- Convsent: This is a sentiment editing task that assesses the model's ability to modify a dialog agent's sentiment on a specific topic without affecting its responses to other topics.
+- Sanitation: This dataset specifically addresses privacy concerns associated with learned language models.
+
+
+
+**Dataset structure**
+
+```text
+knowedit
+├── WikiBio
+│ ├── wikibio-test-all.json
+│ └── wikibio-train-all.json
+├── ZsRE
+│ └── ZsRE-test-all.json
+├── wiki_counterfact
+│ ├── test_cf.json
+│ └── train_cf.json
+├── convsent
+│ ├── blender_test.json
+│ ├── blender_train.json
+│ └── blender_val.json
+├── convsent
+│ ├── trivia_qa_test.json
+│ └── trivia_qa_train.json
+└── wiki_recent
+ ├── recent_test.json
+ └── recent_train.json
+```
+
+
+
+---
+
+**Datasets for Factual Knowledge**
+| **dataset** | Google Drive| BaiduNetDisk | Description |
+| :--------: | :-----------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------: | :--------------------------------------------------------------------------------: |
+| _ZsRE_ plus | [[Google Drive]](https://drive.google.com/file/d/1WRo2SqqgNtZF11Vq0sF5nL_-bHi18Wi4/view?usp=sharing) | [[BaiduNetDisk]](https://pan.baidu.com/s/1cQleUMsNjuDk4BKx2bZkag?pwd=xzky) | Question Answering dataset using question rephrasings |
+| _Counterfact_ plus | [[Google Drive]](https://drive.google.com/file/d/1WRo2SqqgNtZF11Vq0sF5nL_-bHi18Wi4/view?usp=sharing) | [[BaiduNetDisk]](https://pan.baidu.com/s/1cQleUMsNjuDk4BKx2bZkag?pwd=xzky) | Counterfact dataset using Entity replacement |
+
+
+We provide zsre and counterfact datasets to verify the effectiveness of knowledge editing. You can download them here. [[Google Drive]](https://drive.google.com/file/d/1WRo2SqqgNtZF11Vq0sF5nL_-bHi18Wi4/view?usp=sharing), [[BaiduNetDisk]](https://pan.baidu.com/s/1cQleUMsNjuDk4BKx2bZkag?pwd=xzky).
+
+- For **locality**, in addition to testing unrelated instances, we also provide tests on distracting neighbors ([reference: Detecting Edit Failures...](https://arxiv.org/abs/2305.17553)), other attributions, and other downstream tasks (such as commonsense reasoning).
+- For **portability**, we test whether the model can apply edited instances in inference. We provide evaluations for one-hop reasoning, subject aliases, and inverse relations (e.g., a one-to-one relationship such as spouse should be edited bidirectionally).
+
+**Dataset description**
+
+```text
+editing-data
+├── counterfact
+│ ├── counterfact-edit.json
+│ ├── counterfact-train.json
+│ └── counterfact-val.json
+├── locality
+│ ├── Commonsense Task
+│ │ ├── piqa_valid-labels.lst
+│ │ └── piqa_valid.jsonl
+│ ├── Distracting Neighbor
+│ │ └── counterfact_distracting_neighbor.json
+│ └── Other Attribution
+│ └── counterfact_other_attribution.json
+├── portability
+│ ├── Inverse Relation
+│ │ └── zsre_inverse_relation.json
+│ ├── One Hop
+│ │ ├── counterfact_portability_gpt4.json
+│ │ └── zsre_mend_eval_portability_gpt4.json
+│ └── Subject Replace
+│ ├── counterfact_subject_replace.json
+│ └── zsre_subject_replace.json
+└── zsre
+ ├── zsre_mend_eval.json
+ ├── zsre_mend_train_10000.json
+ └── zsre_mend_train.json
+```
+
+- counterfact: original counterfact dataset using Entity replacement
+- zsre: original question answering dataset using question rephrasings
+- locality (evaluation for locality, see details in this [paper](https://arxiv.org/abs/2305.13172))
+ - Commonsense Task: evaluation for other downstream tasks such as commonsense task
+ - Distracting Neighbor: test on distracting neighborhood ([reference: Detecting Edit Failures...](https://arxiv.org/abs/2305.17553))
+ - Other Attribution
+- portability
+ - Inverse Relation: evaluation for one-to-one relationship such as `spouse`
+ - One Hop: evaluation for one-hop reasoning
+ - Subject Replace: evaluation for synonym replacement
+
+
+---
+
+**Datasets for Multimodal Knowledge**
+| **dataset** | Google Drive| BaiduNetDisk | Description |
+| :--------: | :-----------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------: | :--------------------------------------------------------------------------------: |
+| E-IC | [[Google Drive]](https://drive.google.com/drive/folders/1jBdTJxUb9wEeHnvG-RY8dv5_I4QlDpUS?usp=drive_link) | [[BaiduNetDisk]](https://pan.baidu.com/s/1g9nMv-5BJmztxYU-BWRdvg?pwd=ik5c) | dataset for editing _Image Captioning_ |
+| E-VQA | [[Google Drive]](https://drive.google.com/drive/folders/1jBdTJxUb9wEeHnvG-RY8dv5_I4QlDpUS?usp=drive_link) | [[BaiduNetDisk]](https://pan.baidu.com/s/1g9nMv-5BJmztxYU-BWRdvg?pwd=ik5c) | dataset for editing _Visual Question Answering_ |
+
+- All **images** used in **E-IC** and **E-VQA** are available for download at [Google Drive](https://drive.google.com/file/d/1fQzJBFkok5kFZT6QUuT-HCuYKk2Vb93O/view)
+- For **locality**, it is the same as factual editing in order to measure whether unrelated facts retain their outputs.
+- For **multimodal locality**, it assesses the impact of editing on the visual module, which is similar to regular **locality**.
+
+**Dataset description**
+
+```text
+editing-data
+├── caption
+│ ├── caption_train_edit.json
+│ └── caption_eval_edit.json
+├── locality
+│ ├── NQ dataset
+│ │ ├── train.json
+│ │ └── validation.json
+├── multimodal_locality
+│ ├── OK-VQA dataset
+│ │ ├── okvqa_loc.json
+└── vqa
+ ├── vqa_train.json
+ └── vqa_eval.json
+```
+- Multimodal locality (evaluation for multimodal locality, see dataset's details in this [paper](http://openaccess.thecvf.com/content\_CVPR\_2019/html/Marino\_OK-VQA\_A\_Visual\_Question\_Answering\_Benchmark\_Requiring\_External\_Knowledge\_CVPR\_2019\_paper.html))
+
+
+#### Tutorial notebook
+
+| **Method** | Description | GPT-2 | LlaMA |
+| :--------: | :----------------------------: | :---------------------------------------------------------------------------------------------------: | :----------------------------------------------------------------------------------------: |
+| _IKE_ | In-Context Learning (ICL) Edit | [[Colab-gpt2]](https://colab.research.google.com/drive/1m6Xg05XCs_WZKH0D9KJQqg9z0ZiDhEkL) | [[Colab-llama]](https://colab.research.google.com/drive/1m6Xg05XCs_WZKH0D9KJQqg9z0ZiDhEkL) |
+| _ROME_ | Locate-Then-Edit Neurons | [[Colab-gpt2]](https://colab.research.google.com/drive/1KkyWqyV3BjXCWfdrrgbR-QS3AAokVZbr?usp=sharing) | [[Colab-llama]](https://colab.research.google.com/drive/1W18GPlBCV9K6lDy7eX8V5W0knTLr5r0A) |
+| _MEMIT_ | Locate-Then-Edit Neurons | [[Colab-gpt2]](https://colab.research.google.com/drive/1P1lVklP8bTyh8uxxSuHnHwB91i-1LW6Z) | [[Colab-llama]](https://colab.research.google.com/drive/19fKCKtVBU2fqj6eTvDokGoTrxvXkEPPq) |
+
+
+
+## Requirements
+
+#### 🔧Pip Installation
+
+**Note: Please use Python 3.9+ for EasyEdit**
+To get started, simply install conda and run:
+
+```shell
+git clone https://github.com/zjunlp/EasyEdit.git
+conda create -n EasyEdit python=3.9.7
+...
+pip install -r requirements.txt
+```
+
+#### 🐳Docker Installation
+
+We have packaged the environment; you can download and install Docker from [this link](https://docs.docker.com/get-docker/).
+
+Pull the Docker image from Docker Hub or Aliyun:
+
+```bash
+docker pull zjunlp/easyedit
+```
+
+```bash
+docker pull registry.cn-hangzhou.aliyuncs.com/zjunlp/easyedit:v1
+```
+
+If you want to build the Docker image locally, you can clone the project to your local machine and build the Docker image:
+
+```bash
+git clone https://github.com/zjunlp/EasyEdit.git
+cd EasyEdit
+docker build -t your-image-name .
+```
+
+Then run the Docker image as a container:
+
+```bash
+docker run -p 8080:80 your-image-name
+```
+#### Editing GPU memory usage
+Our results are all based on the default configuration.
+
+| | llama-2-7B | chatglm2 | gpt-j-6b | gpt2-xl |
+|:-------:|:----------:|:--------:|:--------:|:------:|
+| FT | 60GB | 58GB | 55GB | 7GB |
+| SERAC | 42GB | 32GB | 31GB | 10GB |
+| IKE | 52GB | 38GB | 38GB | 10GB |
+| MEND | 46GB | 37GB | 37GB | 13GB |
+| KN | 42GB | 39GB | 40GB | 12GB |
+| ROME | 31GB | 29GB | 27GB | 10GB |
+| MEMIT | 33GB | 31GB | 31GB | 11GB |
+| AdaLoRA | 29GB | 24GB | 25GB | 8GB |
+| GRACE | 27GB | | 23GB | 6GB |
+
+## 📌Use EasyEdit
+
+- Edit large language models (LLMs) in around **_5 seconds_**
+
+- The following example shows how to perform editing with EasyEdit. More examples and tutorials can be found at [examples](https://github.com/zjunlp/EasyEdit/tree/main/examples).
+
+### BaseEditor
+
+> `BaseEditor` is the class for language-modality knowledge editing. You can choose the appropriate editing method based on your specific needs.
+
+- Due to different transformer versions and different GPU models, the editing results may fluctuate **slightly**.
+
+#### Introduction by a Simple Example
+
+With the modularity and flexibility of `EasyEdit`, you can easily use it to edit models.
+
+**Step1: Define a PLM as the object to be edited.**
+Choose the PLM to be edited. `EasyEdit` supports a subset of models (`T5`, `GPTJ`, `GPT-NEO`, `LlaMA` so far) retrievable from [HuggingFace](https://huggingface.co/). The corresponding configuration file directory is `hparams/YOUR_METHOD/YOUR_MODEL.yaml`, such as `hparams/MEND/gpt2-xl.yaml`; set the corresponding `model_name` to select the object for knowledge editing.
+
+```yaml
+model_name: gpt2-xl
+model_class: GPT2LMHeadModel
+tokenizer_class: GPT2Tokenizer
+tokenizer_name: gpt2-xl
+model_parallel: false # true for multi-GPU editing
+```
+
+**Step2: Choose the appropriate Knowledge Editing Method**
+The selection of editing methods is a **crucial** step, as different methods have their own strengths and weaknesses. Users need to consider the trade-off between editing success rate, generalization, and maintaining unrelated performance. For specific performance details of each method, please refer to the paper: [Editing Large Language Models: Problems, Methods, and Opportunities](https://arxiv.org/abs/2305.13172).
+
+```python
+## In this case, we use MEND method, so you should import `MENDHyperParams`
+from easyeditor import MENDHyperParams
+## Loading config from hparams/MEND/gpt2-xl.yaml
+hparams = MENDHyperParams.from_hparams('./hparams/MEND/gpt2-xl')
+```
+
+**Step3: Provide the edit descriptor and edit target**
+
+```python
+## edit descriptor: prompt that you want to edit
+prompts = [
+ 'What university did Watts Humphrey attend?',
+ 'Which family does Ramalinaceae belong to',
+ 'What role does Denny Herzig play in football?'
+]
+## You can set `ground_truth` to None (or to the model's original output)
+ground_truth = ['Illinois Institute of Technology', 'Lecanorales', 'defender']
+## edit target: expected output
+target_new = ['University of Michigan', 'Lamiinae', 'winger']
+```
+
+**Step4: Combine them into a `BaseEditor`**
+`EasyEdit` provides a simple and unified way to initialize the Editor, similar to Hugging Face: **from_hparams**.
+
+```python
+## Construct Language Model Editor
+from easyeditor import BaseEditor
+editor = BaseEditor.from_hparams(hparams)
+```
+
+**Step5: Provide the data for evaluation**
+Note that the data for portability and locality are both **optional** (set them to None to evaluate only the basic editing success rate). The data format for both is a **dict**; for each measurement dimension, you need to provide the corresponding prompts and their ground truths. Here is an example of the data:
+
+```python
+locality_inputs = {
+ 'neighborhood':{
+ 'prompt': ['Joseph Fischhof, the', 'Larry Bird is a professional', 'In Forssa, they understand'],
+ 'ground_truth': ['piano', 'basketball', 'Finnish']
+ },
+ 'distracting': {
+ 'prompt': ['Ray Charles, the violin Hauschka plays the instrument', 'Grant Hill is a professional soccer Magic Johnson is a professional', 'The law in Ikaalinen declares the language Swedish In Loviisa, the language spoken is'],
+ 'ground_truth': ['piano', 'basketball', 'Finnish']
+ }
+}
+```
+
+In the above example, we evaluate the locality of the editing methods on the "neighborhood" and "distracting" dimensions.
+
+**Step6: Edit and Evaluation**
+Done! We can now conduct editing and evaluation for the model. The `edit` function returns a series of metrics related to the editing process as well as the modified model weights.
+
+```python
+metrics, edited_model, _ = editor.edit(
+ prompts=prompts,
+ ground_truth=ground_truth,
+ target_new=target_new,
+ locality_inputs=locality_inputs,
+ keep_original_weight=False
+)
+## metrics: edit success, rephrase success, locality, etc.
+## edited_model: post-edit model
+```
+
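+As a quick sanity check, you can query the returned `edited_model` directly. The sketch below assumes the editor exposes its tokenizer as `editor.tok` and that the model fits on a single device:
+
+```python
+tok = editor.tok
+batch = tok(prompts, return_tensors='pt', padding=True).to(edited_model.device)
+outputs = edited_model.generate(**batch, max_new_tokens=10)
+print(tok.batch_decode(outputs, skip_special_tokens=True))
+```
+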
+### Evaluation
+
+The returned metrics are in `dict` format and include model prediction evaluations before and after editing. For each edit, the following metrics are included:
+
+- `rewrite_acc` $\rightarrow$ **Reliability**
+- `rephrase_acc` $\rightarrow$ **Generalization**
+- `locality` $\rightarrow$ **Locality**
+- `portability` $\rightarrow$ **Portability**
+
+```json
+{
+ "post": {
+ "rewrite_acc": ,
+ "rephrase_acc": ,
+ "locality": {
+ "YOUR_LOCALITY_KEY": ,
+ //...
+ },
+ "portability": {
+ "YOUR_PORTABILITY_KEY": ,
+ //...
+ },
+ },
+ "pre": {
+ "rewrite_acc": ,
+ "rephrase_acc": ,
+ "portability": {
+ "YOUR_PORTABILITY_KEY": ,
+ //...
+ },
+ }
+}
+```
+
+- For the evaluation of Reliability, you only need to provide the corresponding editing `prompts` and editing `target_new`.
+- For the evaluation of Generalization, `rephrase_prompts` are required.
+- For the evaluation of Locality and Portability, you need to define the name of the corresponding metric, as well as `prompts` and `ground_truth` (see the sketch below).
+ - > Note: their length needs to be equal to that of the edit prompts.
+
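+For example, a sketch of these extra evaluation inputs might look like the following, reusing the `prompts`, `target_new`, and `locality_inputs` from the steps above and assuming `rephrase_prompts` and `portability_inputs` arguments that mirror the `locality_inputs` format:
+
+```python
+## Rephrased versions of the three edit prompts (for Generalization).
+rephrase_prompts = [
+    'Which university did Watts Humphrey go to?',
+    'What family is Ramalinaceae part of?',
+    'What position does Denny Herzig play in football?'
+]
+## Portability inputs: you choose the dimension name (here 'one_hop'); placeholders are illustrative.
+portability_inputs = {
+    'one_hop': {
+        'prompt': ['<one-hop question about edit 1>', '<about edit 2>', '<about edit 3>'],
+        'ground_truth': ['<expected answer 1>', '<expected answer 2>', '<expected answer 3>']
+    }
+}
+metrics, edited_model, _ = editor.edit(
+    prompts=prompts,
+    target_new=target_new,
+    rephrase_prompts=rephrase_prompts,
+    portability_inputs=portability_inputs,
+    locality_inputs=locality_inputs,
+    keep_original_weight=False
+)
+```
+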
+### Trainer
+
+- meta-learning based: `MEND`
+- memory-based routing: `SERAC`
+
+For the above editing methods, pre-training of the corresponding meta-networks or classifiers is required. Therefore, EasyEdit provides a unified framework for pretraining the relevant network structures. Take training MEND as an example:
+
+- **Step 1** and **Step 2** are the same as the example above, which involves selecting the appropriate editing model and editing method.
+
+**Step3: Provide the edit training set**
+The currently supported and available datasets are `zsre` and `counterfact` ([Google Drive](https://drive.google.com/file/d/1WRo2SqqgNtZF11Vq0sF5nL_-bHi18Wi4/view?usp=sharing)). Please place them in the "data" directory and initialize the dataset_class (`ZsreDataset` for zsre and `CounterFactDataset` for counterfact) to load the corresponding training set.
+
+```python
+train_ds = ZsreDataset('./data/zsre_mend_train.json', config=training_hparams)
+eval_ds = ZsreDataset('./data/zsre_mend_eval.json', config=training_hparams)
+```
+
+**Step4: Combine them into a `Trainer`**
+
+```python
+trainer = EditTrainer(
+ config=training_hparams,
+ train_set=train_ds,
+ val_set=eval_ds
+)
+```
+
+**Step5: Run and Edit**
+Done! We can now run training and evaluation.
+
+```python
+trainer.run()
+```
+
+- Run: the `CHECKPOINT` will be saved to the path `results_dir`.
+- Edit: set the `archive` field in the **hparams file** to the `CHECKPOINT` path (a sketch is shown below). EasyEdit will automatically load the corresponding pre-trained weights during the editing process ([Go to edit](#use-easyedit)).
+
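+As an illustration, the relevant fields of the MEND hparams file might look like this (a sketch only; the paths are placeholders, so check the templates under `hparams/MEND/` for the full file):
+
+```yaml
+alg_name: "MEND"
+model_name: gpt2-xl
+archive: ./results/models/MEND/mend-gpt2-xl.pt   # <- path to the trained CHECKPOINT
+```
+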
+**Training Example**
+```python
+from easyeditor import EditTrainer, MENDTrainingHparams, ZsreDataset
+
+training_hparams = MENDTrainingHparams.from_hparams('hparams/TRAINING/MEND/llama-7b.yaml')
+train_ds = ZsreDataset('./data/zsre/zsre_mend_train.json', config=training_hparams)
+eval_ds = ZsreDataset('./data/zsre/zsre_mend_eval.json', config=training_hparams)
+trainer = EditTrainer(
+ config=training_hparams,
+ train_set=train_ds,
+ val_set=eval_ds
+)
+trainer.run()
+```
+
+
+
+
+
+### MultimodalEditor
+
+> `MultimodalEditor` is the class for Multi-Modality Editing. You can choose the appropriate editing method based on your specific needs.
+
+- Due to different transformer versions and different GPU models, the editing results may fluctuate **slightly**.
+
+**M-Generality Results**
+
+
+
+| *VQA* | KE | IKE | SERAC | MEND |
+| :---: | :---------: | :------------: | :--------: | :---------: |
+| MiniGPT-4 | 88.60 | 99.95 | 88.10 | 99.60 |
+| BLIP2 | 74.60 | 99.79 | 99.20 | 99.40 |
+
+| *Caption* | KE | IKE | SERAC | MEND |
+| :---: | :---------: | :------------: | :--------: | :---------: |
+| MiniGPT-4 | 13.60 | 91.00 | 91.47 | 93.35 |
+| BLIP2 | 1.60 | 96.55 | 99.72 | 93.48 |
+
+#### Introduction by a Simple Example
+
+With the modularity and flexibility of `EasyEdit`, you can easily use it to edit models.
+
+**Step1: Define a MLLM as the object to be edited.**
+Choose the MLLM to be edited. `EasyEdit` supports a subset of models (`MiniGPT-4`, `Blip2` so far) retrievable from [HuggingFace](https://huggingface.co/). The corresponding configuration file directory is `hparams/YOUR_METHOD/YOUR_MODEL.yaml`, such as `hparams/MEND/minigpt4.yaml`; set the corresponding `model_name` to select the object for editing.
+
+```yaml
+model_name: minigpt4
+model_class: Blip2OPT
+tokenizer_class: LlamaTokenizer
+tokenizer_name: llama-7b
+```
+
+**Step2: Choose the appropriate Editing Method**
+The selection of editing methods is a **crucial** step, as different methods have their own strengths and weaknesses. Users need to consider the trade-off between editing success rate, generalization, and maintaining unrelated performance.
+
+```python
+## In this case, we use MEND method, so you should import `MENDMultimodalHparams`
+from easyeditor import MENDMultimodalHparams
+## Loading config from hparams/MEND/minigpt4.yaml
+hparams = MENDMultimodalHparams.from_hparams('./hparams/MEND/minigpt4')
+```
+
+**Step3: Provide the edit descriptor and edit target**
+
+```python
+## edit descriptor: prompt that you want to edit
+prompts = [
+ "How many tennis balls are in the picture?",
+ "What is the red food?"
+]
+## edit target: expected output
+target_new = ["2", "tomatoes"]
+## edit image: image for editing
+image = [
+ "val2014/COCO_val2014_000000451435.jpg",
+ "val2014/COCO_val2014_000000189446.jpg"
+]
+```
+
+**Step4: Combine them into a `MultimodalEditor`**
+`EasyEdit` provides a simple and unified way to initialize the Editor, similar to Hugging Face: **from_hparams**.
+
+```python
+## Construct MLLM Editor
+from easyeditor import MultimodalEditor
+editor = MultimodalEditor.from_hparams(hparams)
+```
+
+**Step5: Provide the data for evaluation**
+Note that the data for locality and multimodal locality are both **optional** (set them to None to evaluate only the basic editing success rate). The data format for both is a **dict**; for each measurement dimension, you need to provide the corresponding prompts and their ground truths. Here is an example of the data:
+
+```python
+locality_inputs = {
+ 'text': {
+ 'prompt': [
+ "nq question: what purpose did seasonal monsoon winds have on trade"
+ ],
+ 'ground_truth': [
+ "enabled European empire expansion into the Americas and trade \
+ routes to become established across the Atlantic and Pacific oceans"
+ ]
+ },
+ 'vision': {
+ 'prompt': ["What sport can you use this for?"],
+ 'ground_truth': ["riding"],
+ 'image': ["val2014/COCO_val2014_000000297147.jpg"],
+ }
+}
+```
+
+In the above example, we evaluate the locality of the editing methods on the textual ("text") and visual ("vision") dimensions.
+
+**Step6: Edit and Evaluation**
+Done! We can now conduct editing and evaluation for the model. The `edit` function returns a series of metrics related to the editing process as well as the modified model weights.
+
+```python
+metrics, edited_model, _ = editor.edit(
+ prompts=prompts,
+ target_new=target_new,
+ image=image,
+ locality_inputs=locality_inputs,
+ keep_original_weight=False
+)
+## metrics: edit success, rephrase success, locality, etc.
+## edited_model: post-edit model
+```
+
+### Evaluation
+
+The returned metrics are in `dict` format and include model prediction evaluations before and after editing. For each edit, the following metrics are included:
+
+- `rewrite_acc` $\rightarrow$ **Reliability**
+- `rephrase_acc` $\rightarrow$ **Generalization**
+- `image_rephrase_acc` $\rightarrow$ **Generalization for Multimodal**
+- `locality_acc` $\rightarrow$ **Locality**
+- `multimodal_locality_acc` $\rightarrow$ **Locality for Multimodal**
+
+```json
+{
+ "post": {
+ "rewrite_acc": ,
+ "rephrase_acc": ,
+ "image_rephrase_acc": ,
+ "locality_acc": ,
+ "multimodal_locality_acc": ,
+ },
+ "pre": {
+ "rewrite_acc": ,
+ "rephrase_acc": ,
+ "image_rephrase_acc": ,
+ }
+}
+```
+
+- For the evaluation of Reliability, you only need to provide the corresponding editing `prompts` and editing `target_new`.
+- For the evaluation of Generalization, `rephrase_prompts` are required.
+- For the evaluation of Multimodal Generalization, `rephrase_image` is required.
+- For the evaluation of Locality and M-Locality, you need to define the name of the corresponding metric and provide data in the `text` and `vision` formats shown above.
+ - > Note: their length needs to be equal to that of the edit prompts.
+
+### Trainer
+
+- meta-learning based: `MEND`
+- memory-based routing: `SERAC`
+
+For the above editing methods, pre-training of the corresponding meta-networks or classifiers is required. Therefore, EasyEdit provides a unified framework for pretraining the relevant network structures. Take training SERAC as an example:
+
+- **Step 1** and **Step 2** are the same as the example above, which involves selecting the appropriate editing model and editing method.
+
+**Step3: Provide the edit training set**
+The currently supported and available datasets are `Caption` and `VQA` ([Google Drive](https://drive.google.com/drive/folders/1jBdTJxUb9wEeHnvG-RY8dv5_I4QlDpUS?usp=drive_link)). Please place them in the "data" directory and initialize the dataset_class (`CaptionDataset` for Caption and `VQADataset` for VQA) to load the corresponding training set.
+
+```python
+train_ds = CaptionDataset('data/caption_train_edit.json', config=training_hparams)
+eval_ds = CaptionDataset('data/caption_eval_edit.json', config=training_hparams)
+```
+
+**Step4: Combine them into a `Trainer`**
+
+```python
+trainer = MultimodalTrainer(
+ config=hparams,
+ train_set=train_ds,
+ val_set=eval_ds
+)
+```
+
+**Step5: Run and Edit**
+Done! We can now run training and evaluation.
+
+```python
+trainer.run()
+```
+
+- Run: the `CHECKPOINT` will be saved to the path `results_dir`.
+- Edit: set the `archive` field in the **hparams file** to the `CHECKPOINT` path. EasyEdit will automatically load the corresponding pre-trained weights during the editing process ([Go to edit](#use-easyedit)).
+
+**Training Example**
+```python
+from easyeditor import MultimodalTrainer, CaptionDataset, SERACMultimodalTrainingHparams
+
+training_hparams = SERACMultimodalTrainingHparams.from_hparams('hparams/TRAINING/SERAC/minigpt4.yaml')
+train_ds = CaptionDataset('data/caption_train_edit.json', config=training_hparams)
+eval_ds = CaptionDataset('data/caption_eval_edit.json', config=training_hparams)
+trainer = MultimodalTrainer(
+ config=training_hparams,
+ train_set=train_ds,
+ val_set=eval_ds
+)
+
+trainer.run()
+```
+
+
+**TO DO**
+
+In the next version, we plan to:
+
+- Explore and integrate more robust editing methods, focusing on the `locality` and `portability` metrics.
+- Provide a comprehensive evaluation suite for editing methods, including fact modification, fact erasure and hallucination erasure.
+- Provide a causal analysis component for analyzing knowledge storage mechanisms.
+- Support knowledge editing for other tasks (beyond factual editing), such as `personality editing`.
+
+Meanwhile, we will offer long-term maintenance to fix bugs, resolve issues, and meet new requests. If you have any problems, please open an issue.
+
+
+
+# Use EasyEdit with KnowEdit
+## Dataset
+
+KnowEdit is a benchmark dataset for knowledge editing of LLMs. You can easily obtain KnowEdit from HuggingFace, WiseModel, and ModelScope.
+
+| **dataset** | HuggingFace | WiseModel | ModelScope |
+| :--------: | :-----------------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------: | :--------------------------------------------------------------------------------: |
+| KnowEdit | [[HuggingFace]](https://huggingface.co/datasets/zjunlp/KnowEdit) | [[WiseModel]](https://wisemodel.cn/datasets/zjunlp/KnowEdit) | [[ModelScope]](https://www.modelscope.cn/datasets/zjunlp/KnowEdit) |
+
+
+## Usage
+
+We provide detailed scripts for users to easily use KnowEdit; please refer to [examples](https://github.com/zjunlp/EasyEdit/blob/main/examples/KnowEdit.md).
+
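+A typical invocation follows the pattern of the other example scripts in this repository; the script name and flags below are illustrative only, so please check the linked examples for the exact commands:
+
+```shell
+python run_knowedit_llama2.py \
+    --editing_method ROME \
+    --hparams_dir ./hparams/ROME/llama-7b \
+    --data_dir ./data/knowedit/ZsRE/ZsRE-test-all.json
+```
+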
+# Editing Performance
+
+We present editing results of the four metrics on [LlaMA-2-7B](https://huggingface.co/meta-llama/Llama-2-7b-hf) using EasyEdit. We adopt [ZsRE](https://drive.google.com/file/d/1WRo2SqqgNtZF11Vq0sF5nL_-bHi18Wi4/view?usp=sharing) as the test dataset.
+
+> ❗️❗️Editing `llama-2-7B` requires 40G+ VRAM on GPU. (OOM [solution](https://github.com/zjunlp/EasyEdit/issues/9#issuecomment-1687284658))
+
+| | Reliability | Generalization | Locality | Portability |
+| :---: | :---------: | :------------: | :--------: | :---------: |
+| FT | 56.94 | 52.02 | 96.32 | 0.07 |
+| SERAC | 99.49 | 99.13 | **100.00** | 0.13 |
+| IKE | **100.00** | **99.98** | 69.19 | **67.56** |
+| MEND | 94.24 | 90.27 | 97.04 | 0.14 |
+| KN | 28.95 | 28.43 | 65.43 | 0.07 |
+| ROME | 92.45 | 87.04 | 99.63 | 10.46 |
+| MEMIT | 92.94 | 85.97 | 99.49 | 6.03 |
+
+
+
+We also present editing results of KnowEdit on [LlaMA-2-7B](https://huggingface.co/meta-llama/Llama-2-7b-hf) using EasyEdit.
+
+| DataSet | Metric | SERAC | ICE | AdaLoRA | MEND | ROME | MEMIT | FT-L | FT |
+|--------------------------|---------------|--------|--------|---------|--------|--------|--------|--------|--------|
+| **WikiData_recent** | | | | | | | | | |
+| | Edit Succ. | 98.68 | 60.74 | 65.61 | 76.88 | 85.08 | 85.32 | 71.18 | 31.24 |
+| | Portability | 63.52 | 36.93 | 47.22 | 50.11 | 37.45 | 37.94 | 48.71 | 15.91 |
+| | Locality | 100.00 | 33.34 | 55.78 | 92.87 | 66.2 | 64.78 | 63.7 | 3.65 |
+| | Fluency | 553.19 | 531.01 | 537.51 | 586.34 | 574.28 | 566.66 | 549.35 | 428.67 |
+| **ZsRE** | | | | | | | | | |
+| | Edit Succ. | 99.67 | 66.01 | 69.86 | 96.74 | 96.57 | 83.07 | 54.65 | 36.88 |
+| | Portability | 56.48 | 63.94 | 52.95 | 60.41 | 52.20 | 51.43 | 45.02 | 8.72 |
+| | Locality | 30.23 | 23.14 | 72.21 | 92.79 | 27.14 | 25.46 | 71.12 | 0.31 |
+| | Fluency | 410.89 | 541.14 | 532.82 | 524.33 | 570.47 | 559.72 | 474.18 | 471.29 |
+| **WikiBio** | | | | | | | | | |
+| | Edit Succ. | 99.69 | 95.53 | 97.02 | 93.66 | 95.05 | 94.29 | 66.27 | 95.64 |
+| | Locality | 69.79 | 47.90 | 57.87 | 69.51 | 46.96 | 51.56 | 60.14 | 13.38 |
+| | Fluency | 606.95 | 632.92 | 615.86 | 609.39 | 617.25 | 616.65 | 604.00 | 589.22 |
+| **WikiData_counterfact** | | | | | | | | | |
+| | Edit Succ. | 99.99 | 69.83 | 72.14 | 78.82 | 83.21 | 83.41 | 51.12 | 26.78 |
+| | Portability | 76.07 | 45.32 | 55.17 | 57.53 | 38.69 | 40.09 | 39.07 | 16.94 |
+| | Locality | 98.96 | 32.38 | 66.78 | 94.16 | 65.4 | 63.68 | 62.51 | 0.29 |
+| | Fluency | 549.91 | 547.22 | 553.85 | 588.94 | 578.84 | 568.58 | 544.80 | 483.71 |
+| **ConvSent** | | | | | | | | | |
+| | Edit Succ. | 62.75 | 52.78 | 44.89 | 50.76 | 45.79 | 44.75 | 49.50 | 61.93 |
+| | Locality | 0.26 | 49.73 | 0.18 | 3.42 | 0.00 | 0.00 | 0.00 | 0.00 |
+| | Fluency | 458.21 | 621.45 | 606.42 | 379.43 | 606.32 | 602.62 | 607.86 | 546.24 |
+| **Sanitation** | | | | | | | | | |
+| | Edit Succ. | 0.00 | 72.50 | 2.50 | 0.00 | 85.00 | 48.75 | 0.00 | 60.00 |
+| | Locality | 100.00 | 56.58 | 65.50 | 5.29 | 50.31 | 67.47 | 14.78 | 42.61 |
+| | Fluency | 416.29 | 794.15 | 330.44 | 407.18 | 465.12 | 466.10 | 439.10 | 351.39 |
+
+## Citation
+
+Please cite our paper if you use EasyEdit in your work.
+
+```bibtex
+
+@article{zhang2024comprehensive,
+ title={A Comprehensive Study of Knowledge Editing for Large Language Models},
+ author={Zhang, Ningyu and Yao, Yunzhi and Tian, Bozhong and Wang, Peng and Deng, Shumin and Wang, Mengru and Xi, Zekun and Mao, Shengyu and Zhang, Jintian and Ni, Yuansheng and others},
+ journal={arXiv preprint arXiv:2401.01286},
+ year={2024}
+}
+
+@article{wang2023easyedit,
+ title={Easyedit: An easy-to-use knowledge editing framework for large language models},
+ author={Wang, Peng and Zhang, Ningyu and Xie, Xin and Yao, Yunzhi and Tian, Bozhong and Wang, Mengru and Xi, Zekun and Cheng, Siyuan and Liu, Kangwei and Zheng, Guozhou and others},
+ journal={arXiv preprint arXiv:2308.07269},
+ year={2023}
+}
+
+@article{yao2023editing,
+ title={Editing Large Language Models: Problems, Methods, and Opportunities},
+ author={Yao, Yunzhi and Wang, Peng and Tian, Bozhong and Cheng, Siyuan and Li, Zhoubo and Deng, Shumin and Chen, Huajun and Zhang, Ningyu},
+ journal={arXiv preprint arXiv:2305.13172},
+ year={2023}
+}
+
+@article{cheng2023edit,
+ title={Can We Edit Multimodal Large Language Models?},
+ author={Cheng, Siyuan and Tian, Bozhong and Liu, Qingbin and Chen, Xi and Wang, Yongheng and Chen, Huajun and Zhang, Ningyu},
+ journal={arXiv preprint arXiv:2310.08475},
+ year={2023}
+}
+
+@article{mao2023editing,
+ title={Editing personality for llms},
+ author={Mao, Shengyu and Zhang, Ningyu and Wang, Xiaohan and Wang, Mengru and Yao, Yunzhi and Jiang, Yong and Xie, Pengjun and Huang, Fei and Chen, Huajun},
+ journal={arXiv preprint arXiv:2310.02168},
+ year={2023}
+}
+
+@misc{knowlm,
+ author = {Ningyu Zhang and Jintian Zhang and Xiaohan Wang and Honghao Gui and Kangwei Liu and Yinuo Jiang and Xiang Chen and Shengyu Mao and Shuofei Qiao and Yuqi Zhu and Zhen Bi and Jing Chen and Xiaozhuan Liang and Yixin Ou and Runnan Fang and Zekun Xi and Xin Xu and Lei Li and Peng Wang and Mengru Wang and Yunzhi Yao and Bozhong Tian and Yin Fang and Guozhou Zheng and Huajun Chen},
+ title = {KnowLM Technical Report},
+ year = {2023},
+ url = {http://knowlm.zjukg.cn/},
+}
+```
+
+## 🎉Contributors
+
+
+
+
+
+We thank all the contributors to this project, more contributors are welcome!
+
+#### Other Related Projects
+
+- [ROME](https://github.com/kmeng01/rome)
+- [FastEdit](https://github.com/hiyouga/FastEdit)
+- [GRACE](https://github.com/Thartvigsen/GRACE)
+- [MELO](https://github.com/ECNU-ICALK/MELO)
+- [PMET](https://github.com/xpq-tech/PMET)
+- [PitfallsKnowledgeEditing](https://github.com/zjunlp/PitfallsKnowledgeEditing)
+- [EditBias](https://github.com/zjunlp/EditBias)
+- [WikiLLM](https://github.com/laramohan/wikillm)
+
+🙌 We would like to express our heartfelt gratitude for the contribution of [FastEdit](https://github.com/hiyouga/FastEdit), [ROME](https://github.com/kmeng01/rome), [GRACE](https://github.com/Thartvigsen/GRACE), [MELO](https://github.com/ECNU-ICALK/MELO), [PMET](https://github.com/xpq-tech/PMET) to our project, as we have utilized portions of their source code in our project. Many thanks to all the colleagues in the community for submitting issues and providing technical support.
diff --git a/calculate.py b/calculate.py
new file mode 100644
index 00000000..f4825574
--- /dev/null
+++ b/calculate.py
@@ -0,0 +1,102 @@
+import json
+import argparse
+import math
+from numpy import mean
+
+
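+# This script aggregates an EasyEdit result file: it averages the post-edit and
+# pre-edit rewrite_acc (Reliability), rephrase_acc (Generalization) and
+# neighborhood_acc (Locality), and finally reports the change in the instance
+# metric between pre- and post-edit.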
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser()
+ parser.add_argument("--method", default="FT", type=str)
+ parser.add_argument("--model", default="mistral",type=str)
+ parser.add_argument("--module",default="intra",type=str)
+ args = parser.parse_args()
+
+ result_dir = "./final_result_upload"
+ with open(f"{result_dir}/{args.method}_results_{args.model}_{args.module}.json", "r") as f:
+ result = json.load(f)
+
+ rewrite_acc = 0
+ rephrase_acc = 0
+ locality = 0
+ loc_list = []
+ instance = 0
+ port_list = []
+
+ for i, item in enumerate(result):
+
+ case = item["post"]
+ # print(case)
+ if not math.isnan(case["rewrite_acc"][0]):
+ rewrite_acc = ((rewrite_acc * i) + mean(case["rewrite_acc"][0])) / (i + 1)
+ else:
+ print(f'{i}: {case}')
+ if not math.isnan(case["rephrase_acc"][0]):
+ rephrase_acc = ((rephrase_acc * i) + mean(case["rephrase_acc"][0])) / (i + 1)
+ else:
+ print(f'{i}: {case}')
+
+ locality_ = 0
+ instance_ = 0
+ if "locality" in case.keys() and case["locality"]:
+ if "neighborhood_acc" in case["locality"].keys():
+ locality_ += mean(case["locality"]["neighborhood_acc"])
+ if not math.isnan(locality_):
+ loc_list.append(locality_)
+
+ if "instance" in case.keys() and case["instance"]:
+ if "instance_change" in case["instance"].keys():
+ if case["instance"]["instance_change"] == -1:
+ case["instance"]["instance_change"] = 0
+ instance_ += mean(case["instance"]["instance_change"])
+ if not math.isnan(instance_):
+ port_list.append(instance_)
+ locality = mean(loc_list) if loc_list else 0
+ instance = mean(port_list) if port_list else 0
+
+ sub1 = instance
+ # print(f'dir: {result_dir}\npost\nrewrite_acc: {rewrite_acc*100}\nlocality: {locality*100}\nrephrase_acc: {rephrase_acc*100}\ninstance_new: {instance}\n')
+ print(f'dir: {result_dir}\npost\nReliability: {rewrite_acc*100}\nGeneralization: {rephrase_acc*100}\nLocality: {locality*100}')
+
+
+ rewrite_acc = 0
+ rephrase_acc = 0
+ locality = 0
+ loc_list = []
+ instance = 0
+ port_list = []
+
+ for i, item in enumerate(result):
+ case = item["pre"]
+ if not math.isnan(case["rewrite_acc"][0]):
+ rewrite_acc = ((rewrite_acc * i) + mean(case["rewrite_acc"][0])) / (i + 1)
+ else:
+ print(f'{i}: {case}')
+ if not math.isnan(case["rephrase_acc"][0]):
+ rephrase_acc = ((rephrase_acc * i) + mean(case["rephrase_acc"][0])) / (i + 1)
+ else:
+ print(f'{i}: {case}')
+
+ locality_ = 0
+ instance_ = 0
+ if "locality" in case.keys() and case["locality"]:
+ if "neighborhood_acc" in case["locality"].keys():
+ locality_ += mean(case["locality"]["neighborhood_acc"])
+ if not math.isnan(locality_):
+ loc_list.append(locality_)
+
+ if "instance" in case.keys() and case["instance"]:
+ if "instance_change" in case["instance"].keys():
+ if case["instance"]["instance_change"] == -1:
+ case["instance"]["instance_change"] = 0
+ instance_ += mean(case["instance"]["instance_change"])
+ if not math.isnan(instance_):
+ port_list.append(instance_)
+ locality = mean(loc_list) if loc_list else 0
+ instance = mean(port_list) if port_list else 0
+ sub2 = instance
+
+
+ # print(f'dir: {result_dir}\npre\nrewrite_acc: {rewrite_acc*100}\nlocality: {locality*100}\nrephrase_acc: {rephrase_acc*100}\ninstance_new: {instance}\n')
+
+ print('instance_change: ',end='')
+ print((sub2-sub1)*100)
\ No newline at end of file
diff --git a/command.sh b/command.sh
new file mode 100644
index 00000000..38a9797f
--- /dev/null
+++ b/command.sh
@@ -0,0 +1,55 @@
+# Model gpt2-xl
+
+# python run_concept_editing.py --editing_method=FT --edited_model gpt2 --hparams_dir=./hparams/FT/gpt2-xl
+# python run_concept_editing.py --editing_method=FT --edited_model gpt2 --hparams_dir=./hparams/FT/gpt2-xl --inter
+
+# python run_concept_editing.py --editing_method=ROME --edited_model gpt2 --hparams_dir=./hparams/ROME/gpt2-xl
+# python run_concept_editing.py --editing_method=ROME --edited_model gpt2 --hparams_dir=./hparams/ROME/gpt2-xl --inter
+
+# python run_concept_editing.py --editing_method=MEMIT --edited_model gpt2 --hparams_dir=./hparams/MEMIT/gpt2-xl
+# python run_concept_editing.py --editing_method=MEMIT --edited_model gpt2 --hparams_dir=./hparams/MEMIT/gpt2-xl --inter
+
+# python run_concept_editing.py --editing_method=PROMPT --edited_model gpt2 --hparams_dir=None
+# python run_concept_editing.py --editing_method=PROMPT --edited_model gpt2 --hparams_dir=None --inter
+
+# Model gpt-j-6B
+
+# python run_concept_editing.py --editing_method=FT --edited_model gptj --hparams_dir=./hparams/FT/gpt-j-6B
+# python run_concept_editing.py --editing_method=FT --edited_model gptj --hparams_dir=./hparams/FT/gpt-j-6B --inter
+
+# python run_concept_editing.py --editing_method=ROME --edited_model gptj --hparams_dir=./hparams/ROME/gpt-j-6B
+# python run_concept_editing.py --editing_method=ROME --edited_model gptj --hparams_dir=./hparams/ROME/gpt-j-6B --inter
+
+# python run_concept_editing.py --editing_method=MEMIT --edited_model gptj --hparams_dir=./hparams/MEMIT/gpt-j-6B
+# python run_concept_editing.py --editing_method=MEMIT --edited_model gptj --hparams_dir=./hparams/MEMIT/gpt-j-6B --inter
+
+# python run_concept_editing.py --editing_method=PROMPT --edited_model gptj --hparams_dir=None
+# python run_concept_editing.py --editing_method=PROMPT --edited_model gptj --hparams_dir=None --inter
+
+# Model llama2-chat-7b
+
+# python run_concept_editing.py --editing_method=FT --edited_model llama2chat --hparams_dir=./hparams/FT/llama-7b
+# python run_concept_editing.py --editing_method=FT --edited_model llama2chat --hparams_dir=./hparams/FT/llama-7b --inter
+
+# python run_concept_editing.py --editing_method=ROME --edited_model llama2chat --hparams_dir=./hparams/ROME/llama-7b
+# python run_concept_editing.py --editing_method=ROME --edited_model llama2chat --hparams_dir=./hparams/ROME/llama-7b --inter
+
+# python run_concept_editing.py --editing_method=MEMIT --edited_model llama2chat --hparams_dir=./hparams/MEMIT/llama-7b
+# python run_concept_editing.py --editing_method=MEMIT --edited_model llama2chat --hparams_dir=./hparams/MEMIT/llama-7b --inter
+
+# python run_concept_editing.py --editing_method=PROMPT --edited_model llama2chat --hparams_dir=None
+# python run_concept_editing.py --editing_method=PROMPT --edited_model llama2chat --hparams_dir=None --inter
+
+# Model mistral-7b
+
+# python run_concept_editing.py --editing_method=FT --hparams_dir=./hparams/FT/mistral-7b
+# python run_concept_editing.py --editing_method=FT --hparams_dir=./hparams/FT/mistral-7b --inter
+
+# python run_concept_editing.py --editing_method=ROME --edited_model mistral --hparams_dir=./hparams/ROME/mistral-7b
+# python run_concept_editing.py --editing_method=ROME --edited_model mistral --hparams_dir=./hparams/ROME/mistral-7b --inter
+
+# python run_concept_editing.py --editing_method=MEMIT --edited_model mistral --hparams_dir=./hparams/MEMIT/mistral-7b
+# python run_concept_editing.py --editing_method=MEMIT --edited_model mistral --hparams_dir=./hparams/MEMIT/mistral-7b --inter
+
+# python run_concept_editing.py --editing_method=PROMPT --edited_model mistral --hparams_dir=None
+# python run_concept_editing.py --editing_method=PROMPT --edited_model mistral --hparams_dir=None --inter
diff --git a/easyeditor/editors/__init__.py b/easyeditor/editors/__init__.py
index dd94df5c..3b8fea99 100644
--- a/easyeditor/editors/__init__.py
+++ b/easyeditor/editors/__init__.py
@@ -1,2 +1,3 @@
from .editor import *
from .multimodal_editor import *
+from .concept_editor import *
\ No newline at end of file
diff --git a/easyeditor/editors/concept_editor.py b/easyeditor/editors/concept_editor.py
new file mode 100644
index 00000000..0f43d461
--- /dev/null
+++ b/easyeditor/editors/concept_editor.py
@@ -0,0 +1,329 @@
+import os.path
+from typing import Optional, Union, List, Tuple, Dict
+from time import time
+from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModel
+from transformers import LlamaTokenizer, LlamaForCausalLM
+from transformers import GPT2TokenizerFast, GPT2Tokenizer
+from tqdm import tqdm
+import json
+import torch
+import logging
+import numpy as np
+import random
+# from .editor import BaseEditor
+from ..util.globals import *
+from ..evaluate import compute_concept_edit_quality
+from ..util import nethook
+from ..util.hparams import HyperParams
+from ..util.alg_dict import *
+
+logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s - %(message)s',
+ datefmt = '%m/%d/%Y %H:%M:%S',
+ level = logging.INFO)
+
+LOG = logging.getLogger(__name__)
+os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
+
+def make_logs():
+
+ f_h, s_h = get_handler('logs', log_name='run.log')
+ LOG.addHandler(f_h)
+ LOG.addHandler(s_h)
+
+def seed_everything(seed):
+ if seed >= 10000:
+ raise ValueError("seed number should be less than 10000")
+ if torch.distributed.is_initialized():
+ rank = torch.distributed.get_rank()
+ else:
+ rank = 0
+ seed = (rank * 100000) + seed
+
+ torch.manual_seed(seed)
+ np.random.seed(seed)
+ random.seed(seed)
+
+seed_everything(42)
+
+
+# class ConceptEditor(BaseEditor):
+class ConceptEditor:
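+ """Editor for concept-level knowledge editing.
+
+ Besides the standard editing algorithms registered in ALG_DICT, this class supports a
+ training-free 'prompt' baseline that simply prepends the new concept definition
+ ("Definition of <subject>: <target_new>") to the input at evaluation time.
+ """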
+
+ @classmethod
+ def from_hparams(cls, hparams: HyperParams, prompt_hparams: Dict= None):
+ if hparams is None :
+ if prompt_hparams is None:
+ raise NotImplementedError
+ phparams = HyperParams()
+ phparams.alg_name = 'prompt'
+ phparams.model_name = prompt_hparams['model_name']
+ phparams.device = prompt_hparams['device']
+ phparams.max_length = 40
+ phparams.model_parallel = False
+ return cls(phparams)
+ return cls(hparams)
+
+ # def __init__(self):
+ # super().__init__()
+
+ def __init__(self,
+ hparams: HyperParams,
+ ):
+
+ assert hparams is not None, print('Error: hparams is None.')
+
+ self.model_name = hparams.model_name
+ if hparams.alg_name != 'prompt':
+ self.apply_algo = ALG_DICT[hparams.alg_name]
+ self.alg_name = hparams.alg_name
+
+ make_logs()
+
+ LOG.info("Instantiating model")
+
+ if type(self.model_name) is str:
+ device_map = 'auto' if hparams.model_parallel else None
+ torch_dtype = torch.float16 if hasattr(hparams, 'fp16') and hparams.fp16 else torch.float32
+ # if 't5' in self.model_name.lower():
+ # self.model = T5ForConditionalGeneration.from_pretrained(self.model_name, torch_dtype=torch_dtype, device_map=device_map)
+ # self.tok = T5Tokenizer.from_pretrained(self.model_name)
+ # elif 'gpt-3.5' in self.model_name.lower():
+ # self.model, self.tok = None, None
+ if 'gpt' in self.model_name.lower():
+ self.model = AutoModelForCausalLM.from_pretrained(self.model_name, torch_dtype=torch_dtype, device_map=device_map)
+ self.tok = GPT2Tokenizer.from_pretrained(self.model_name)
+ self.tok.pad_token_id = self.tok.eos_token_id
+ elif 'llama' in self.model_name.lower():
+ self.model = LlamaForCausalLM.from_pretrained(self.model_name, torch_dtype=torch_dtype, device_map=device_map)
+ self.tok = LlamaTokenizer.from_pretrained(self.model_name)
+ self.tok.pad_token_id = self.tok.eos_token_id
+ # elif 'baichuan' in self.model_name.lower():
+ # self.model = AutoModelForCausalLM.from_pretrained(self.model_name, torch_dtype=torch_dtype, trust_remote_code=True, device_map=device_map)
+ # self.tok = AutoTokenizer.from_pretrained(self.model_name,trust_remote_code=True)
+ # self.tok.pad_token_id = self.tok.eos_token_id
+ # elif 'chatglm' in self.model_name.lower():
+ # self.model = AutoModel.from_pretrained(self.model_name,trust_remote_code=True, torch_dtype=torch_dtype, device_map=device_map)
+ # self.tok = AutoTokenizer.from_pretrained(self.model_name,trust_remote_code=True)
+ # self.tok.unk_token_id = 64787
+ # # self.tok.pad_token_id = self.tok.eos_token_id
+ # elif 'internlm' in self.model_name.lower():
+ # self.model = AutoModel.from_pretrained(self.model_name,trust_remote_code=True, torch_dtype=torch_dtype, device_map=device_map)
+ # self.tok = AutoTokenizer.from_pretrained(self.model_name,trust_remote_code=True)
+ # self.tok.pad_token_id = self.tok.eos_token_id
+ # elif 'qwen' in self.model_name.lower():
+ # self.model = AutoModelForCausalLM.from_pretrained(self.model_name,fp32=False,trust_remote_code=True, device_map=device_map)
+ # self.tok = AutoTokenizer.from_pretrained(self.model_name, eos_token='<|endoftext|>', pad_token='<|endoftext|>',unk_token='<|endoftext|>', trust_remote_code=True)
+ elif 'mistral' in self.model_name.lower():
+ self.model = AutoModelForCausalLM.from_pretrained(self.model_name, torch_dtype=torch_dtype, device_map=device_map)
+ self.tok = AutoTokenizer.from_pretrained(self.model_name)
+ self.tok.pad_token_id = self.tok.eos_token_id
+ else:
+ raise NotImplementedError
+
+ if self.tok is not None and (isinstance(self.tok, GPT2Tokenizer) or isinstance(self.tok, GPT2TokenizerFast) or isinstance(self.tok, LlamaTokenizer)) and (hparams.alg_name not in ['ROME', 'MEMIT']):
+ LOG.info('AutoRegressive Model detected, set the padding side of Tokenizer to left...')
+ self.tok.padding_side = 'left'
+ if self.tok is not None and ('mistral' in self.model_name.lower()) and (hparams.alg_name in ['ROME', 'MEMIT']):
+ LOG.info('AutoRegressive Model detected, set the padding side of Tokenizer to right...')
+ self.tok.padding_side = 'right'
+ else:
+ self.model, self.tok = self.model_name
+
+ if hparams.model_parallel:
+ hparams.device = str(self.model.device).split(":")[1]
+ if not hparams.model_parallel and hasattr(hparams, 'device'):
+ self.model.to(f'cuda:{hparams.device}')
+
+ self.hparams = hparams
+
+
+ def edit(self,
+ prompts: Union[str, List[str]],
+ target_new: Union[str, List[str]],
+ ground_truth: Optional[Union[str, List[str]]] = None,
+ rephrase_prompts: Optional[Union[str, List[str]]] = None,
+ locality_inputs: Optional[Dict] = None,
+ instance_inputs: Optional[Dict] = None,
+ keep_original_weight=False,
+ verbose=True,
+ **kwargs
+ ):
+ concept_consistency = kwargs['concept_consistency'] if 'concept_consistency' in kwargs.keys() else False
+ if isinstance(prompts, List):
+ assert len(prompts) == len(target_new)
+ else:
+ prompts, target_new = [prompts,], [target_new,]
+
+ if hasattr(self.hparams, 'batch_size'): # For Singleton Editing, bs=1
+ self.hparams.batch_size = 1
+
+ if ground_truth is not None:
+ if isinstance(ground_truth, str):
+ ground_truth = [ground_truth,]
+ else:
+ assert len(ground_truth) == len(prompts)
+ else: # Default ground truth is <|endoftext|>
+ ground_truth = ['<|endoftext|>' for _ in range(len(prompts))]
+
+ if "requests" in kwargs.keys():
+ requests = kwargs["requests"]
+ else:
+ requests = self._prepare_requests(prompts, target_new, ground_truth, rephrase_prompts,
+ locality_inputs, instance_inputs, **kwargs)
+ if hasattr(self.hparams, 'batch_size') :
+ assert self.hparams.batch_size == 1, print(f'Single Edit, pls set the batch_size to 1....')
+
+ all_metrics = []
+ if 'pre_edit' in kwargs and kwargs['pre_edit'] is not None:
+ metrics = kwargs['pre_edit']
+ all_metrics = metrics
+ else:
+ for i, request in tqdm(enumerate(requests)):
+ metrics = {
+ "pre": compute_concept_edit_quality(self.model, self.model_name, self.hparams, self.tok, request,
+ self.hparams.device, test_concept_consistency=False)
+ }
+ all_metrics.append(metrics)
+ for i, request in enumerate(requests):
+ start = time()
+
+ if self.alg_name == 'prompt':
+ PMT = f"Definition of {request['subject']}: {request['target_new']}\n"
+ exec_time = time() - start
+ LOG.info(f"Execution {i} editing took {exec_time}")
+ start = time()
+ all_metrics[i].update({
+ 'case_id': i,
+ "requested_rewrite": request,
+ "time": exec_time,
+ "post": compute_concept_edit_quality(self.model, self.model_name, self.hparams, self.tok, request,
+ self.hparams.device, test_concept_consistency=concept_consistency, P=PMT),
+ })
+
+ edited_model = self.model
+ weights_copy = None
+ else:
+ edited_model, weights_copy = self.apply_algo(
+ self.model,
+ self.tok,
+ [request],
+ self.hparams,
+ copy=False,
+ return_orig_weights=True,
+ keep_original_weight=keep_original_weight,
+ train_ds= None
+ )
+ exec_time = time() - start
+ LOG.info(f"Execution {i} editing took {exec_time}")
+
+ start = time()
+ all_metrics[i].update({
+ 'case_id': i,
+ "requested_rewrite": request,
+ "time": exec_time,
+ "post": compute_concept_edit_quality(edited_model, self.model_name, self.hparams, self.tok, request, self.hparams.device, test_concept_consistency=concept_consistency),
+ })
+ with torch.no_grad():
+ for k, v in weights_copy.items():
+ nethook.get_parameter(self.model, k)[...] = v.to(f"cuda:{self.hparams.device}")
+ if 'locality' in all_metrics[i]['post'].keys():
+ for locality_key in request['locality'].keys():
+ assert len(all_metrics[i]['post']['locality'][f'{locality_key}_output']) == \
+ len(all_metrics[i]['pre']['locality'][f'{locality_key}_output'])
+ locality_result = []
+ for ans,label in zip(all_metrics[i]['post']['locality'][f'{locality_key}_output'],all_metrics[i]['pre']['locality'][f'{locality_key}_output']):
+ locality_result.append(np.mean(np.equal(ans, label)))
+ all_metrics[i]['post']['locality'][f'{locality_key}_acc'] = locality_result
+ all_metrics[i]['post']['locality'].pop(f'{locality_key}_output')
+ all_metrics[i]['pre'].pop('locality')
+
+ LOG.info(f"Evaluation took {time() - start}")
+
+ if verbose:
+ LOG.info(
+ f"{i} editing: {request['prompt']} -> {request['target_new']} \n {all_metrics[i]}"
+ )
+
+ return all_metrics, edited_model, weights_copy
+
+ def _prepare_requests(self,
+ prompts: Union[str, List[str]],
+ target_new: Union[str, List[str]],
+ ground_truth: Union[str, List[str]],
+ rephrase_prompts: Optional[Union[str, List[str]]] = None,
+ locality_inputs: Optional[Dict] = None,
+ instance_inputs: Optional[Dict] = None,
+ **kwargs
+ ):
+
+ requests = [{
+ 'prompt': prompt,
+ 'target_new': target_new_,
+ 'ground_truth': ground_truth_,
+ 'instance': {},
+ 'locality': {}
+ }
+ for prompt, ground_truth_, target_new_ in zip(prompts, ground_truth, target_new)
+ ]
+
+ if 'subject' in kwargs:
+ if isinstance(kwargs['subject'], str):
+ kwargs['subject'] = [kwargs['subject'],]
+ else:
+ assert len(kwargs['subject']) == len(prompts)
+ for prompt_, subject_ in zip(prompts, kwargs['subject']):
+ assert subject_ in prompt_, print(f'Subject:{subject_} do not exist in prompt: {prompt_}')
+
+ for i, request in enumerate(requests):
+ request.update(
+ {
+ 'subject': kwargs['subject'][i]
+ }
+ )
+
+ if rephrase_prompts is not None:
+ if isinstance(rephrase_prompts, str):
+ rephrase_prompts = [rephrase_prompts,]
+
+ for i, request in enumerate(requests):
+ request.update(
+ {
+ 'rephrase_prompt': rephrase_prompts[i],
+ }
+ )
+ if locality_inputs is not None:
+ for locality_key in locality_inputs.keys():
+ if isinstance(locality_inputs[locality_key]['prompt'], str):
+ locality_inputs[locality_key]['prompt'] = [locality_inputs[locality_key]['prompt'],]
+ locality_inputs[locality_key]['ground_truth'] = [locality_inputs[locality_key]['ground_truth'], ]
+ assert len(locality_inputs[locality_key]['prompt']) == len(locality_inputs[locality_key]['ground_truth']) \
+ == len(requests), print('One Edit instance needs one locality input.....')
+
+ for i, request in enumerate(requests):
+ if locality_inputs[locality_key]['prompt'][i] is not None:
+ request['locality'].update(
+ {
+ locality_key: {
+ f'prompt': locality_inputs[locality_key]['prompt'][i],
+ f'ground_truth': locality_inputs[locality_key]['ground_truth'][i]
+ }
+ }
+ )
+
+ if instance_inputs is not None:
+ for instance_key in instance_inputs.keys():
+ if isinstance(instance_inputs[instance_key]['prompt'], str):
+ instance_inputs[instance_key]['prompt'] = [instance_inputs[instance_key]['prompt'],]
+ for i, request in enumerate(requests):
+ if instance_inputs[instance_key]['prompt'][i] is not None:
+ request['instance'].update(
+ {
+ instance_key: {
+ 'prompt': instance_inputs[instance_key]['prompt'][i]
+ }
+ }
+ )
+ return requests
+
+ def b(self):
+ print("ConceptEditor's b function")
\ No newline at end of file
diff --git a/easyeditor/evaluate/evaluate.py b/easyeditor/evaluate/evaluate.py
index 93647cf7..5b1c63ef 100644
--- a/easyeditor/evaluate/evaluate.py
+++ b/easyeditor/evaluate/evaluate.py
@@ -20,6 +20,8 @@
test_batch_prediction_acc,
test_prediction_acc,
test_generation_quality,
+ test_concept_gen,
+ test_instance_change,
PPL,
kl_loc_loss,
es_sent,
@@ -703,3 +705,59 @@ def get_edit_labels(ids, prompts=None):
if test_generation:
result['fluency'] = test_generation_quality(model=model,tok=tok,prefixes=metric_kwargs["inner_q"] if isinstance(metric_kwargs["inner_q"],list) else [metric_kwargs["inner_q"],], max_out_len=100)
return result
+
+def compute_concept_edit_quality(
+ model,
+ model_name,
+ hparams: HyperParams,
+ tok: AutoTokenizer,
+ record: typing.Dict,
+ device,
+ eval_metric: str = 'token_em',
+ test_concept_consistency: bool = False,
+ P: typing.Optional[str] = None
+) -> typing.Dict:
+
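+ # Evaluate a single concept edit: rewrite (and optional rephrase) accuracy, locality on
+ # the provided locality prompts, instance-membership change, and optionally the raw
+ # generated concept text. P is an optional prompt prefix prepended to every prompt.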
+ target_new, ground_truth = (
+ record[x] for x in ["target_new", "ground_truth"]
+ )
+ if P is None:
+ PMT = ''
+ else:
+ PMT = str(P)
+
+ rewrite_prompts = record["prompt"]
+ rephrase_prompts = record["rephrase_prompt"] if 'rephrase_prompt' in record.keys() else None
+
+ ret = compute_rewrite_or_rephrase_quality(model, model_name, hparams, tok,
+ PMT + rewrite_prompts, target_new, device=device, eval_metric=eval_metric)
+ if test_concept_consistency:
+ least_length_gen = 40
+ ret['gen_concept_text']= test_concept_gen(model,tok,least_length_gen,
+ PMT + rewrite_prompts,target_new,device=device)
+
+ ret['locality'] = {}
+ ret['instance'] = {}
+ if rephrase_prompts is not None:
+ ret.update(
+ compute_rewrite_or_rephrase_quality(model, model_name, hparams, tok,
+ PMT + rephrase_prompts, target_new, device=device, test_rephrase=True, eval_metric=eval_metric)
+ )
+
+ if 'locality' in record.keys() and any(record['locality']):
+ for locality_key in record['locality'].keys():
+ ret['locality'].update(
+ compute_locality_quality(model, model_name, hparams, tok, locality_key,
+ PMT + record['locality'][locality_key]['prompt'],
+ record['locality'][locality_key]['ground_truth'], device=device)
+ )
+
+ if 'instance' in record.keys() and any(record['instance']):
+ for instance_key in record['instance'].keys():
+ ret['instance'].update(
+ {'instance_change': test_instance_change(model,tok,hparams.max_length,
+ record['instance'][instance_key]['prompt'], 'yes', device=device, P=P)[0]}
+ )
+
+ return ret
+
diff --git a/easyeditor/evaluate/evaluate_utils.py b/easyeditor/evaluate/evaluate_utils.py
index a9dee940..1ec28b52 100644
--- a/easyeditor/evaluate/evaluate_utils.py
+++ b/easyeditor/evaluate/evaluate_utils.py
@@ -391,3 +391,66 @@ def F1(model, tok, hparams, prompts, targets, device, locality=False):
labels = slice_list(labels,prompt_len,left=False)
return f1_score(answers, labels, average='macro')
+
+
+
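+# Yes/no instance-membership probe: returns 1.0 when the model answers "yes", 0.0 for "no",
+# and -1.0 when the answer is neither (unparsable response).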
+def test_instance_change(model, tok, max_length, prompts, targets, device, P=None):
+ demo1_str = "Whether FrancoAngeli belongs to category publisher? Yes\nWhether And Other Stories belongs to category people? No\n"
+ if isinstance(prompts, str):
+ prompts, targets = [prompts,], [targets,]
+ # Prepend the optional prefix P and the two in-context yes/no demonstrations to every prompt.
+ prefix = demo1_str if P is None else P + demo1_str
+ prompts = [prefix + prompt for prompt in prompts]
+
+ prompt_target = [prompt + ' ' + target for prompt, target in zip(prompts,targets)]
+ max_prompt_len = max([len(tok.encode(_)) for _ in prompt_target]) + 1
+ prompt_tok = tok(
+ prompts,
+ padding=True,
+ truncation=True,
+ max_length=max(max_length, max_prompt_len),
+ return_tensors="pt",
+ )
+ with torch.no_grad():
+ pre_edit_outputs = model.generate(
+ input_ids=prompt_tok['input_ids'].to(f"cuda:{device}"),
+ attention_mask=prompt_tok['attention_mask'].to(f"cuda:{device}"),
+ max_new_tokens=2
+ )
+
+ model_response = [tok.decode(x, skip_special_tokens=True) for x in pre_edit_outputs.detach().cpu().numpy().tolist()]
+ answer = model_response[0][model_response[0].rfind('?')+2:]
+ # print(model_response[0], answer)
+
+ if "yes" in answer.lower():
+ return np.ones(1)
+ else:
+ if "no" not in answer.lower():
+ print(f"entity error in define yes or no: {answer}")
+ return np.array([-1.0])
+ return np.zeros(1)
+
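+# Free-form concept generation: continue the concept prompt for up to 40 new tokens and
+# return only the newly generated text (used for the concept-consistency check).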
+def test_concept_gen(model, tok, max_length, prompts, targets, device):
+ if isinstance(prompts, str):
+ prompts,targets = [prompts,], [targets,]
+ prompts = [prompt + ' ' for prompt in prompts]
+ prompt_target = [prompt + ' ' + target for prompt, target in zip(prompts,targets)]
+ max_prompt_len = max([len(tok.encode(_)) for _ in prompt_target]) + 1
+ prompt_tok = tok(
+ prompts,
+ padding=True,
+ truncation=True,
+ max_length=max(max_length, max_prompt_len),
+ return_tensors="pt",
+ )
+ with torch.no_grad():
+ pre_edit_outputs = model.generate(
+ input_ids=prompt_tok['input_ids'].to(f"cuda:{device}"),
+ attention_mask=prompt_tok['attention_mask'].to(f"cuda:{device}"),
+ max_new_tokens=40
+ )
+
+ model_response = [tok.decode(x, skip_special_tokens=True) for x in pre_edit_outputs.detach().cpu().numpy().tolist()]
+ answer = model_response[0][len(prompts[0]):]
+ return answer
\ No newline at end of file
diff --git a/figs/flow1.gif b/figs/flow1.gif
new file mode 100644
index 00000000..fb55651b
Binary files /dev/null and b/figs/flow1.gif differ
diff --git a/run_concept_editing.py b/run_concept_editing.py
new file mode 100644
index 00000000..bee510a1
--- /dev/null
+++ b/run_concept_editing.py
@@ -0,0 +1,107 @@
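+# Run concept-level knowledge editing with FT / MEMIT / ROME / PROMPT and dump per-edit metrics.
+# Example invocation (paths and the hparams file are illustrative, not shipped defaults):
+#   python run_concept_editing.py --editing_method ROME --edited_model gpt2 \
+#       --hparams_dir ./hparams/ROME/gpt2-xl --inter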
+import os.path
+import sys
+sys.path.append('..')
+import json
+import random
+from easyeditor import FTHyperParams, MEMITHyperParams, ROMEHyperParams, HyperParams
+from easyeditor import ConceptEditor
+import numpy as np
+import torch
+
+
+import argparse
+
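+# Supported --edited_model aliases and the local checkpoint paths they map to.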
+models_implement = ['mistral','llama2chat','gpt2','gptj']
+model_names = ['./hugging_cache/Mistral-7B-v0.1','./hugging_cache/llama2-7b-chat','./hugging_cache/gpt2-xl','./hugging_cache/gpt-j-6B']
+
+def setup_seed(seed):
+ torch.manual_seed(seed)
+ torch.cuda.manual_seed_all(seed)
+ np.random.seed(seed)
+ random.seed(seed)
+ torch.backends.cudnn.deterministic = True
+
+if __name__ == "__main__":
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--edited_model', required=True, type=str)
+ parser.add_argument('--editing_method', required=True, type=str)
+ parser.add_argument('--hparams_dir', required=True, type=str)
+ parser.add_argument('--data_dir', default='./data', type=str)
+ parser.add_argument('--metrics_save_dir', default='./final_result_upload', type=str)
+ parser.add_argument('--inter', action='store_true')
+
+ args = parser.parse_args()
+
+ if args.edited_model not in models_implement:
+ raise NotImplementedError
+
+ if args.editing_method == 'FT':
+ editing_hparams = FTHyperParams
+ elif args.editing_method == 'MEMIT':
+ editing_hparams = MEMITHyperParams
+ elif args.editing_method == 'ROME':
+ editing_hparams = ROMEHyperParams
+ elif args.editing_method == 'PROMPT':
+ editing_hparams = HyperParams
+ else:
+ raise NotImplementedError
+
+ if args.inter:
+ module = "inter"
+ else:
+ module = "intra"
+
+ test_data = json.load(open(os.path.join(args.data_dir, f"final_{args.edited_model}_{module}.json"), 'r', encoding='utf-8'))
+
+
+ # Set the random seed for reproducibility
+ setup_seed(42)
+
+ # test_data= test_data[:5]
+
+ prompts = [test_data_['prompt'] for test_data_ in test_data]
+ rephrase_prompts = [edit_data_['phrase_prompt'] for edit_data_ in test_data]
+ target_new = [edit_data_['target_new_desc'] for edit_data_ in test_data]
+ entity_prompts = [edit_data_['instance_prompt'] for edit_data_ in test_data]
+ in_locality_prompts = [edit_data_['locality_prompt'] for edit_data_ in test_data]
+ in_locality_ans = [edit_data_['locality_answer'] for edit_data_ in test_data]
+
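+ # One locality prompt/answer pair and one instance-membership prompt per edit; the outer
+ # keys ('neighborhood', 'instance') are probe names carried through to the metrics.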
+ locality_inputs = {
+ 'neighborhood':{
+ 'prompt': in_locality_prompts,
+ 'ground_truth': in_locality_ans
+ }
+ }
+ instance_inputs = {
+ 'instance':{
+ 'prompt': entity_prompts
+ },
+ }
+
+ subject = [edit_data_['label'] for edit_data_ in test_data]
+ train_ds = None
+
+
+
+ if args.editing_method == 'PROMPT':
+ prompt_hparams = {'model_name': model_names[models_implement.index(args.edited_model)], 'device': 0}
+ hparams = None
+ editor = ConceptEditor.from_hparams(hparams, prompt_hparams)
+
+ else:
+ hparams = editing_hparams.from_hparams(args.hparams_dir)
+ editor = ConceptEditor.from_hparams(hparams)
+
+ metrics, edited_model, _ = editor.edit(
+ prompts=prompts,
+ rephrase_prompts=rephrase_prompts,
+ target_new=target_new,
+ subject=subject,
+ train_ds=train_ds,
+ locality_inputs=locality_inputs,
+ instance_inputs=instance_inputs,
+ # concept_consistency = True,
+ keep_original_weight=True
+ )
+
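+ # Write per-edit metrics; the file name encodes editing method, model and intra/inter setting.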
+ os.makedirs(args.metrics_save_dir, exist_ok=True)
+ json.dump(metrics, open(os.path.join(args.metrics_save_dir, f'{args.editing_method}_results_{args.edited_model}_{module}.json'), 'w'), indent=4)
diff --git a/transform_check.py b/transform_check.py
new file mode 100644
index 00000000..0ce096ae
--- /dev/null
+++ b/transform_check.py
@@ -0,0 +1,77 @@
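+# Build judging prompts that compare each generated concept definition ('gen_concept_text')
+# with the edited and original definitions, reading results saved by run_concept_editing.py.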
+import json
+import argparse
+import os
+
+data = json.load(open('./data/concept_data.json'))
+
+parser = argparse.ArgumentParser()
+parser.add_argument("--method", default="FT", type=str)
+parser.add_argument("--model", default="llama2chat",type=str)
+parser.add_argument("--module",default="intra",type=str)
+args = parser.parse_args()
+
+
+def process_str(s):
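+ # Trim the generated text to roughly its first sentence after stripping leading
+ # whitespace, newlines, colons and quotation marks.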
+ s = s.lstrip().replace('\n','')
+ while s.startswith(':') or s.startswith('\"'):
+ s = s[1:]
+ if '.' in s:
+ first_period_index = s.find('.')
+ if s[first_period_index-1].isdigit():
+ next_period_index = s.find('.', first_period_index + 20)
+ return s[first_period_index +1 :next_period_index]
+ return s[:first_period_index+1]
+ else:
+ return s
+
+
+
+
+str_temp ='''Prediction sentence: [PREDICTION]
+
+Sentence A: [TARGET].
+Sentence B: [GROUND].
+
+Check the prediction sentence and give a score from -1 to 1:
+Score 1: close meaning to sentence A
+Score 0: neither relevant to A nor B
+Score -1: close meaning to sentence B
+
+Output format is {Score:{}, Reason:{}}
+'''
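+# Judging template: +1 if the prediction is close to the edited definition (sentence A),
+# -1 if it is close to the original definition (sentence B), 0 if neither.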
+
+print(str_temp)
+
+
+out_dir = 'trans_check'
+os.makedirs(out_dir, exist_ok=True)
+result_dir = "final_result_upload"
+with open(f"{result_dir}/{args.method}_results_{args.model}_{args.module}.json", "r") as f:
+ result = json.load(f)
+
+outputs = []
+result = [i['post']['gen_concept_text'] for i in result]
+for idx, entry in enumerate(data):
+ item = {}
+ item['id'] = idx
+ item['label'] = entry['concept_name']
+ target_str = entry[f'module_{args.module}']['replace_def']
+ ground_str = entry['concept_def']
+ processed_str = process_str(result[idx])
+
+ input_str = str_temp.replace('[PREDICTION]',processed_str).replace('[TARGET]',target_str).replace('[GROUND]',ground_str)
+ item['input_str'] = input_str
+ outputs.append(item)
+
+json.dump(outputs,open(f'{out_dir}/{args.method}_check_{args.model}_{args.module}.json','w'),indent=4)
+
+
+
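+# Spot-check: print five randomly sampled judging prompts.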
+import random
+a = random.sample(outputs,5)
+a = [i['input_str'] for i in a]
+for t in a:
+ print(t)
+ print('='*40)
+
\ No newline at end of file