
Option to split during conversion #6942

Merged · 73 commits · Jun 24, 2024
Changes from 69 commits (of 73)

Commits
874c341
support splits in convert.py
christianazinn Apr 27, 2024
72cbd4e
Support split by size and dry run to write estimated shards/filesizes
christianazinn Apr 28, 2024
702a744
Move split functionality to new GGUFManager class
christianazinn Apr 28, 2024
c33bdf3
fix improper function signature
christianazinn Apr 29, 2024
b7c6120
tentative push of convert-hf-to-gguf support
christianazinn May 5, 2024
14b3291
Merge branch 'master' into convert-split
mofosyne May 9, 2024
87a98a5
resolve merge + SplitArguments for easier parsing
christianazinn May 10, 2024
2dd7841
Merge remote-tracking branch 'origin' into convert-split
christianazinn May 23, 2024
3ff27ef
Fix eager tensor memory leak and remove convert.py changes
christianazinn May 23, 2024
6b5c375
refactor SplitStrategy to be a deque
christianazinn May 24, 2024
09baf2f
fix Q8 quantization
christianazinn Jun 3, 2024
240243e
remove unnecessary imports in gguf_manager
christianazinn Jun 3, 2024
140eb52
Merge branch 'master' into convert-split
christianazinn Jun 3, 2024
a9c7703
fix final? merge issue
christianazinn Jun 3, 2024
efead04
fix gguf_writer placement and remove comments
christianazinn Jun 3, 2024
c8ecbc6
oops, actually fix gguf_writer placement
christianazinn Jun 3, 2024
3e9430d
reduce duplicated code from gguf_writer
christianazinn Jun 5, 2024
f6fd3ea
further simplify GGUFManager
christianazinn Jun 5, 2024
bb5ee02
simplify even further and standardize with GGUFWriter
christianazinn Jun 5, 2024
5ad397d
reduce diffs with master
christianazinn Jun 5, 2024
ce7e698
form shards while adding tensors, SHA256 sums agree with master
christianazinn Jun 5, 2024
706bd69
re-add type hint
christianazinn Jun 6, 2024
6a05183
GGUFWriter compatibility fix
christianazinn Jun 6, 2024
3328b0a
Shard dataclass and un-negative dont_add_architecture
christianazinn Jun 6, 2024
1cbab22
type consistency in format_n_bytes_to_str
christianazinn Jun 6, 2024
2037eab
move kv keys to constants.py
christianazinn Jun 6, 2024
83e4a3f
make pathlib explicit
christianazinn Jun 6, 2024
13ffe22
base-1024 bytes to base-1000
christianazinn Jun 6, 2024
6d3a256
rename GGUFManager to GGUFWriterSplit
christianazinn Jun 7, 2024
1312e28
Update gguf-py/gguf/constants.py
christianazinn Jun 7, 2024
5f29d4a
fix convert-hf-to-gguf.py permissions
christianazinn Jun 7, 2024
0283fc1
fix line endings
christianazinn Jun 7, 2024
dc5cf5f
Update gguf-py/gguf/gguf_writer_split.py
christianazinn Jun 7, 2024
e093dfb
convert-hf : restore executable file permission
compilade Jun 7, 2024
9576965
examples/convert-legacy-llama.py: restore executable file permission
christianazinn Jun 8, 2024
c6ae1d6
reinstate original gguf package import and fix type annotation
christianazinn Jun 8, 2024
2e70fa1
attempt to appease the linter
christianazinn Jun 8, 2024
891b19c
attempt 2 to appease the linter
christianazinn Jun 8, 2024
02be0dd
attempt 3 to appease the linter
christianazinn Jun 8, 2024
f658e91
comma consistency
christianazinn Jun 8, 2024
079dfe3
Update convert-hf-to-gguf.py
christianazinn Jun 8, 2024
282e71f
edit cmd line args
christianazinn Jun 9, 2024
666bb09
Merge branch 'master' into convert-split
christianazinn Jun 9, 2024
03cc9bc
use simplification from #7827
christianazinn Jun 9, 2024
97dd416
kv/ti data are still wrong
christianazinn Jun 9, 2024
ff2dd7d
try to refactor kv data (still fails)
christianazinn Jun 9, 2024
ba1be97
fix ti data messiness
christianazinn Jun 9, 2024
69d6e7a
Merge branch 'master' into convert-split
christianazinn Jun 9, 2024
0779f2f
tidy up
christianazinn Jun 9, 2024
a234bf8
fix linting
christianazinn Jun 9, 2024
49b9fbe
actually make the linter happy
christianazinn Jun 9, 2024
0471f67
cleanup round 1
christianazinn Jun 9, 2024
5a96b8f
remove SplitStrategy, SplitArguments
christianazinn Jun 9, 2024
f7ecd99
appease linter
christianazinn Jun 9, 2024
9d7f694
fix typing and clean up
christianazinn Jun 9, 2024
0417104
fix linting
christianazinn Jun 9, 2024
70a6bc9
Update gguf-py/gguf/gguf_writer.py
christianazinn Jun 9, 2024
1e2d9cb
progress bar, fix split logic
christianazinn Jun 9, 2024
f7e7983
Update gguf-py/gguf/gguf_writer.py
christianazinn Jun 10, 2024
79bd2bf
catch oversights
christianazinn Jun 10, 2024
7eea552
Update gguf-py/gguf/gguf_writer.py
christianazinn Jun 10, 2024
99f9a24
Update gguf-py/gguf/gguf_writer.py
christianazinn Jun 10, 2024
ad02c94
Update gguf-py/gguf/gguf_writer.py
christianazinn Jun 10, 2024
c1b1a29
Update gguf-py/gguf/gguf_writer.py
christianazinn Jun 10, 2024
4550826
Update gguf-py/gguf/gguf_writer.py
christianazinn Jun 10, 2024
efa0609
swap bar orders
christianazinn Jun 10, 2024
b843445
Update gguf-py/gguf/gguf_writer.py
christianazinn Jun 10, 2024
854bd64
Update gguf-py/gguf/gguf_writer.py
christianazinn Jun 10, 2024
05b183f
compatibility fix
christianazinn Jun 10, 2024
e9895d2
Update gguf-py/gguf/gguf_writer.py
christianazinn Jun 10, 2024
4e4e376
Merge branch 'master' into convert-split
christianazinn Jun 15, 2024
163712e
Update convert-hf-to-gguf.py
mofosyne Jun 23, 2024
6e4182c
Merge branch 'master' into convert-split
christianazinn Jun 24, 2024
59 changes: 52 additions & 7 deletions convert-hf-to-gguf.py
@@ -65,7 +65,8 @@ class Model:
# subclasses should define this!
model_arch: gguf.MODEL_ARCH

def __init__(self, dir_model: Path, ftype: gguf.LlamaFileType, fname_out: Path, is_big_endian: bool, use_temp_file: bool, eager: bool, model_name: str | None):
def __init__(self, dir_model: Path, ftype: gguf.LlamaFileType, fname_out: Path, is_big_endian: bool, use_temp_file: bool, eager: bool,
model_name: str | None, split_max_tensors: int = 0, split_max_size: int = 0, dry_run: bool = False, small_first_shard: bool = False):
if type(self) is Model:
raise TypeError(f"{type(self).__name__!r} should not be directly instantiated")
self.dir_model = dir_model
@@ -96,7 +97,8 @@ def __init__(self, dir_model: Path, ftype: gguf.LlamaFileType, fname_out: Path,
ftype_lw: str = ftype_up.lower()
# allow templating the file name with the output ftype, useful with the "auto" ftype
self.fname_out = fname_out.parent / fname_out.name.format(ftype_lw, outtype=ftype_lw, ftype=ftype_lw, OUTTYPE=ftype_up, FTYPE=ftype_up)
self.gguf_writer = gguf.GGUFWriter(path=None, arch=gguf.MODEL_ARCH_NAMES[self.model_arch], endianess=self.endianess, use_temp_file=self.use_temp_file)
self.gguf_writer = gguf.GGUFWriter(None, gguf.MODEL_ARCH_NAMES[self.model_arch], endianess=self.endianess, use_temp_file=self.use_temp_file,
split_max_tensors=split_max_tensors, split_max_size=split_max_size, dry_run=dry_run, small_first_shard=small_first_shard)

@classmethod
def __init_subclass__(cls):
@@ -332,6 +334,8 @@ def write(self):
self.gguf_writer.close()

def write_vocab(self):
if len(self.gguf_writer.tensors) != 1:
raise ValueError('Splitting the vocabulary is not supported')
self.gguf_writer.write_header_to_file(self.fname_out)
self.gguf_writer.write_kv_data_to_file()
self.gguf_writer.close()
@@ -2801,10 +2805,44 @@ def parse_args() -> argparse.Namespace:
"--verbose", action="store_true",
help="increase output verbosity",
)
parser.add_argument(
"--split-max-tensors", type=int, default=0,
help="max tensors in each split",
)
parser.add_argument(
"--split-max-size", type=str, default="0",
help="max size per split N(M|G)",
)
parser.add_argument(
"--dry-run", action="store_true",
help="only print out a split plan and exit, without writing any new files",
)
parser.add_argument(
"--no-tensor-first-split", action="store_true",
help="do not add tensors to the first split (disabled by default)"
)
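The two new limits compose: --split-max-tensors caps the tensor count per shard, --split-max-size caps its byte size, and 0 disables either. A minimal sketch of a greedy shard planner under those rules (plan_shards is illustrative, not the PR's actual helper):

```python
def plan_shards(tensor_sizes: list[int], max_tensors: int = 0, max_size: int = 0) -> list[list[int]]:
    """Greedily group tensor byte-sizes into shards; a limit of 0 disables it."""
    shards: list[list[int]] = []
    current: list[int] = []
    current_bytes = 0
    for size in tensor_sizes:
        over_count = max_tensors and len(current) >= max_tensors
        # a single tensor larger than max_size still gets its own shard
        over_bytes = max_size and current and current_bytes + size > max_size
        if over_count or over_bytes:
            shards.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        shards.append(current)
    return shards


# 10 tensors, at most 4 per shard -> shards of 4, 4, 2
print([len(s) for s in plan_shards([100] * 10, max_tensors=4)])  # [4, 4, 2]
```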

return parser.parse_args()


def split_str_to_n_bytes(split_str: str) -> int:
if split_str.endswith("K"):
n = int(split_str[:-1]) * 1000
elif split_str.endswith("M"):
n = int(split_str[:-1]) * 1000 * 1000
elif split_str.endswith("G"):
n = int(split_str[:-1]) * 1000 * 1000 * 1000
elif split_str.isnumeric():
n = int(split_str)
else:
raise ValueError(f"Invalid split size: {split_str}, must be a number, optionally followed by K, M, or G")

if n < 0:
raise ValueError(f"Invalid split size: {split_str}, must be positive")

return n
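Note that the suffixes are decimal, base-1000 units (commit 13ffe22 changed this from base-1024), so "4G" means 4,000,000,000 bytes rather than 2^32. Restating the function so the example runs standalone:

```python
def split_str_to_n_bytes(split_str: str) -> int:
    # mirrors the converter's parser: decimal K/M/G suffixes, or a plain integer
    if split_str.endswith("K"):
        n = int(split_str[:-1]) * 1000
    elif split_str.endswith("M"):
        n = int(split_str[:-1]) * 1000 * 1000
    elif split_str.endswith("G"):
        n = int(split_str[:-1]) * 1000 * 1000 * 1000
    elif split_str.isnumeric():
        n = int(split_str)
    else:
        raise ValueError(f"Invalid split size: {split_str}, must be a number, optionally followed by K, M, or G")
    if n < 0:
        raise ValueError(f"Invalid split size: {split_str}, must be positive")
    return n


print(split_str_to_n_bytes("4G"))    # 4000000000
print(split_str_to_n_bytes("250M"))  # 250000000
```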


def main() -> None:
args = parse_args()

@@ -2837,6 +2875,10 @@ def main() -> None:
"auto": gguf.LlamaFileType.GUESSED,
}

if args.use_temp_file and (args.split_max_tensors > 0 or args.split_max_size != "0"):
logger.error("Error: Cannot use temp file when splitting")
sys.exit(1)

if args.outfile is not None:
fname_out = args.outfile
else:
@@ -2854,7 +2896,10 @@ def main() -> None:
logger.error(f"Model {hparams['architectures'][0]} is not supported")
sys.exit(1)

model_instance = model_class(dir_model, ftype_map[args.outtype], fname_out, args.bigendian, args.use_temp_file, args.no_lazy, args.model_name)
model_instance = model_class(dir_model, ftype_map[args.outtype], fname_out, args.bigendian, args.use_temp_file,
args.no_lazy, args.model_name, split_max_tensors=args.split_max_tensors,
split_max_size=split_str_to_n_bytes(args.split_max_size), dry_run=args.dry_run,
small_first_shard=args.no_tensor_first_split)

logger.info("Set model parameters")
model_instance.set_gguf_parameters()
@@ -2865,13 +2910,13 @@
model_instance.gguf_writer.add_quantization_version(gguf.GGML_QUANT_VERSION)

if args.vocab_only:
logger.info(f"Exporting model vocab to '{model_instance.fname_out}'")
logger.info("Exporting model vocab...")
model_instance.write_vocab()
logger.info("Model vocab successfully exported.")
else:
logger.info(f"Exporting model to '{model_instance.fname_out}'")
logger.info("Exporting model...")
model_instance.write()

logger.info(f"Model successfully exported to '{model_instance.fname_out}'")
logger.info("Model successfully exported.")


if __name__ == '__main__':
5 changes: 5 additions & 0 deletions gguf-py/gguf/constants.py
@@ -72,6 +72,11 @@ class Rope:
SCALING_FINETUNED = "{arch}.rope.scaling.finetuned"
SCALING_YARN_LOG_MUL = "{arch}.rope.scaling.yarn_log_multiplier"

class Split:
LLM_KV_SPLIT_NO = "split.no"
LLM_KV_SPLIT_COUNT = "split.count"
LLM_KV_SPLIT_TENSORS_COUNT = "split.tensors.count"
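These keys carry per-shard bookkeeping: split.no is the shard's index, split.count the total number of shards, and split.tensors.count the tensor total across all shards. A hedged sketch of the metadata each shard would carry (split_metadata is an illustrative helper, not part of the PR):

```python
# the key strings added to constants.py in this PR
LLM_KV_SPLIT_NO = "split.no"
LLM_KV_SPLIT_COUNT = "split.count"
LLM_KV_SPLIT_TENSORS_COUNT = "split.tensors.count"


def split_metadata(shard_index: int, total_shards: int, total_tensors: int) -> dict[str, int]:
    """Per-shard KV entries described by the Split keys above."""
    return {
        LLM_KV_SPLIT_NO: shard_index,
        LLM_KV_SPLIT_COUNT: total_shards,
        LLM_KV_SPLIT_TENSORS_COUNT: total_tensors,
    }


# first shard of a 3-shard split holding 291 tensors overall
print(split_metadata(0, 3, 291))
```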

class SSM:
CONV_KERNEL = "{arch}.ssm.conv_kernel"
INNER_SIZE = "{arch}.ssm.inner_size"