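Chapter 7: Finetuning to Follow Instructions

This folder contains utility code that can be used to prepare an instruction dataset.

Install the additional package requirements via: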

pip install -r requirements-extra.txt

Finding Near Duplicates

The find-near-duplicates.py script can be used to identify duplicates and near duplicates in an instruction dataset. For example,

python find-near-duplicates.py --json_file instructions-examples.json
scikit-learn version: 1.3.1


==================================================
Searching 'instruction' for duplicates ...
==================================================
Duplicate pair found with similarity 0.94:
1. Edit the following sentence to make it more formal.
2. Edit the sentence to make it more formal.

Duplicate pair found with similarity 1.00:
1. Name a dwarf planet in our solar system.
2. Name a dwarf planet in our solar system.

Duplicate pair found with similarity 0.91:
1. Change the sentences from active voice to passive voice.
2. Change the sentence from passive to active voice.



==================================================
Searching 'input' for duplicates ...
==================================================
No duplicates found


==================================================
Searching 'output' for duplicates ...
==================================================
Duplicate pair found with similarity 1.00:
1. One dwarf planet in our solar system is Pluto.
2. One dwarf planet in our solar system is Pluto.


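You can use the --threshold setting with a value between 0 and 1 to decrease or increase the sensitivity: a lower value reports more pairs as duplicates, a higher value reports fewer. The default threshold is 0.9.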
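To illustrate how a similarity threshold changes what gets reported, here is a minimal sketch that assumes TF-IDF vectors compared by cosine similarity with scikit-learn. The function name `find_near_duplicates` and the sample texts are hypothetical, and the script itself may compute similarity differently.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def find_near_duplicates(texts, threshold=0.9):
    """Return (i, j, similarity) for each pair of texts scoring above threshold.

    Hypothetical helper for illustration; not taken from find-near-duplicates.py.
    """
    # Nothing to compare with fewer than two texts
    if len(texts) < 2:
        return []

    # Assumption: represent each text as a TF-IDF vector
    vectors = TfidfVectorizer().fit_transform(texts)
    # Pairwise cosine similarity between all texts (values in [0, 1] for TF-IDF)
    sims = cosine_similarity(vectors)

    pairs = []
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):  # visit each unordered pair once
            if sims[i, j] > threshold:
                pairs.append((i, j, float(sims[i, j])))
    return pairs


examples = [
    "Name a dwarf planet in our solar system.",
    "Name a dwarf planet in our solar system.",
    "Name a planet in our solar system.",
    "Translate the following sentence into French.",
]

strict = find_near_duplicates(examples, threshold=0.9)
loose = find_near_duplicates(examples, threshold=0.5)

for i, j, sim in loose:
    print(f"Duplicate pair found with similarity {sim:.2f}:")
    print(f"1. {examples[i]}")
    print(f"2. {examples[j]}")
```

Lowering the threshold can only add pairs to the result, so a lower setting is the more sensitive one.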

 

Creating Passive Voice Entries

{  
   'instruction': 'Identify the verb in the following sentence',
   'input': 'The cat sleeps on the couch.',
   'output': 'The verb in the sentence is "sleeps."',
   'output_2': 'The sentence is "sleeps."'   #  <---- Newly created entry
}
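The entry above keeps its original fields and gains one new key. The following sketch shows one way such a key could be added to every entry in a dataset. The helper `add_rewritten_output` and the stand-in `placeholder_rewrite` are hypothetical; this section does not say how the text under `output_2` is produced, so the rewriting step is left as a user-supplied callable.

```python
import json


def add_rewritten_output(entries, rewrite, new_key="output_2"):
    """Return copies of the entries with rewrite(entry['output']) stored under new_key.

    Hypothetical helper; `rewrite` is any callable that maps a string to a string.
    """
    updated = []
    for entry in entries:
        new_entry = dict(entry)  # shallow copy so the input entry is not modified
        new_entry[new_key] = rewrite(entry["output"])
        updated.append(new_entry)
    return updated


def placeholder_rewrite(text):
    # Stand-in only: returns the text unchanged. A real rewriter would
    # return a passive-voice version of the text instead.
    return text


entries = [
    {
        "instruction": "Identify the verb in the following sentence",
        "input": "The cat sleeps on the couch.",
        "output": 'The verb in the sentence is "sleeps."',
    }
]

updated_entries = add_rewritten_output(entries, placeholder_rewrite)
print(json.dumps(updated_entries, indent=4))
```

Copying each entry keeps the original dataset intact, which makes it easy to compare the old and new fields before writing the result back to disk.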