loading big models into memory #3153
```python
from huggingface_hub import snapshot_download
import torch
from accelerate import infer_auto_device_map
from transformers import AutoModelForCausalLM, AutoConfig

checkpoint = "marcsun13/gpt2-xl-linear-sharded"
weights_location = snapshot_download(repo_id=checkpoint)

# Instead of loading directly from the checkpoint, use "gpt2-xl" as the base
# config and load the sharded weights into it.
config = AutoConfig.from_pretrained("gpt2-xl")

# Load the model using the gpt2-xl configuration and the downloaded sharded weights.
model = AutoModelForCausalLM.from_pretrained(
    weights_location, config=config, torch_dtype=torch.float16, ignore_mismatched_sizes=True
)

# Use the instantiated model object with infer_auto_device_map.
device_map = infer_auto_device_map(
    model, max_memory={0: "10GiB", "cpu": "10GiB"}
)
```
```python
from accelerate import init_empty_weights
from mingpt.model import GPT

model_config = GPT.get_default_config()
model_config.model_type = 'gpt2-xl'
model_config.vocab_size = 50257
model_config.block_size = 1024
with init_empty_weights():
    model = GPT(model_config)

from accelerate import load_checkpoint_and_dispatch

model = load_checkpoint_and_dispatch(
    model, checkpoint=weights_location, device_map="auto", no_split_module_classes=['Block']
)
model.hf_device_map

from mingpt.bpe import BPETokenizer

tokenizer = BPETokenizer()
inputs = tokenizer("Who is Napoleon Bonaparte?").to(0)

# Use 'inputs' instead of 'x1' here.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)[0]
tokenizer.decode(outputs.cpu().squeeze())
```
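`model.hf_device_map` is just a plain dict mapping module names to devices, so a quick sanity check after dispatch is to scan its values for `"disk"` offload, which would make generation very slow. A minimal sketch, using a hypothetical device map of the kind `hf_device_map` returns:

```python
def check_device_map(device_map):
    """Return the set of devices used, warning if any module was offloaded to disk."""
    devices = set(device_map.values())
    if "disk" in devices:
        print("warning: some modules are offloaded to disk; generation will be slow")
    return devices

# Hypothetical map; a real one comes from model.hf_device_map after dispatch.
example_map = {"transformer.wte": 0, "transformer.h.0": 0, "transformer.h.1": "cpu"}
print(check_device_map(example_map))
```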
Fetching 9 files: 100% 9/9 [00:00<00:00, 370.91it/s]
Loading checkpoint shards: 100% 7/7 [00:01<00:00, 4.30it/s]
Some weights of GPT2LMHeadModel were not initialized from the model checkpoint at /root/.cache/huggingface/hub/models--marcsun13--gpt2-xl-linear-sharded/snapshots/aeb281f0cd2bfc947d4702b27aecd9194c322c7e and are newly initialized because the shapes did not match:
- transformer.h.0.attn.c_attn.weight: found shape torch.Size([4800, 1600]) in the checkpoint and torch.Size([1600, 4800]) in the model instantiated
- transformer.h.0.mlp.c_fc.weight: found shape torch.Size([6400, 1600]) in the checkpoint and torch.Size([1600, 6400]) in the model instantiated
- transformer.h.0.mlp.c_proj.weight: found shape torch.Size([1600, 6400]) in the checkpoint and torch.Size([6400, 1600]) in the model instantiated
- [... the same three transposed-shape warnings repeat for transformer.h.1 through transformer.h.47 ...]
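Note the pattern in these warnings: every checkpoint shape is exactly the transpose of the model's shape (e.g. [4800, 1600] vs [1600, 4800]). That is consistent with this checkpoint storing weights in `nn.Linear` layout (out_features, in_features), while `GPT2LMHeadModel` uses `Conv1D` layers that store them transposed; with `ignore_mismatched_sizes=True`, those weights are then silently reinitialized rather than transposed. A toy sketch of the reconciliation, using plain nested lists in place of tensors and tiny stand-in dimensions:

```python
def transpose(matrix):
    """Swap the two axes of a nested-list 'tensor'."""
    return [list(row) for row in zip(*matrix)]

def shape(matrix):
    return (len(matrix), len(matrix[0]))

# The checkpoint stores c_attn.weight as (4800, 1600); the Conv1D module wants
# (1600, 4800). Use tiny stand-in dimensions so the example runs instantly.
ckpt_weight = [[0.0] * 16 for _ in range(48)]   # stands in for (4800, 1600)
wanted = (16, 48)                               # stands in for (1600, 4800)

# If the checkpoint shape is the reversed module shape, transposing fixes it.
fixed = transpose(ckpt_weight) if shape(ckpt_weight) == wanted[::-1] else ckpt_weight
print(shape(fixed))  # (16, 48): transposing makes the checkpoint tensor fit
```

In practice this is why the checkpoint is named "linear-sharded": it is meant to be loaded into a Linear-based reimplementation (like minGPT), not back into the Conv1D-based `GPT2LMHeadModel`.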
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
number of parameters: 1557.61M
Who is Napoleon Bonaparte?

Napoleon Bonaparte was a French general who led the French army during the French Revolution. He was the first to use the term "Napoleon" to describe himself.

What is the name of the French Revolution?

The French Revolution was a period of political and social upheaval in France that began in 1789. It was the first of the French revolutions, and was the first to be led by a man.

[... the same question and answer repeat until the 512-token output is cut off ...]
If I have a single 16 GB Vega GPU and a CPU, how do I run a model larger than 16 GB split across the Vega and the CPU so that I still benefit from the Vega's acceleration? Is the code I ran correct, or can it be modified to get good results?

What are the steps, from A to Z, for running a model larger than the 16 GB Vega across the GPU and the CPU? Starting from downloading the model, then creating an empty model, then loading the weights into it, then running it on a prompt or completing text.
@werruww please do not spam this with nearly the same result. It makes us think that this is an LLM instead of a real problem, and bloats our notifications as well |
In general, do |
This is the code (fragments as posted); what modification is needed?

```python
from huggingface_hub import snapshot_download
from accelerate import init_empty_weights

model_config = GPT.get_default_config()
with init_empty_weights():

from accelerate import load_checkpoint_and_dispatch
model = load_checkpoint_and_dispatch(

from mingpt.bpe import BPETokenizer

# Change x1 to inputs
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)[0]

# device_map="auto"
```
clear and pot |
If you allow, could you write complete code that I can trust?
If possible, a Colab notebook (TPU, 24 GB).
ValueError Traceback (most recent call last)

ValueError: Trying to set a tensor of shape torch.Size([32768, 4096]) in "weight" (which has shape torch.Size([32768, 768])), this looks incorrect.

Code (fragments as posted):

```python
from huggingface_hub import snapshot_download
from accelerate import init_empty_weights

model_config = GPT.get_default_config()
with init_empty_weights():

from accelerate import load_checkpoint_and_dispatch
model = load_checkpoint_and_dispatch(

model.hf_device_map

from mingpt.bpe import BPETokenizer

# Use 'inputs' instead of 'x1' for model generation
outputs = model.generate(inputs, max_new_tokens=1024, do_sample=False)[0]
```
Extended vocabulary to 32768 |
I ran the code on Colab (T4, 12 GB RAM).
ValueError Traceback (most recent call last)

ValueError: Trying to set a tensor of shape torch.Size([32768, 4096]) in "weight" (which has shape torch.Size([32768, 768])), this looks incorrect.
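The error message itself is informative: [32768, 4096] is a Mistral-sized embedding (vocab 32768, hidden size 4096), while [32768, 768] is what minGPT builds here, since `GPT.get_default_config()` keeps an embedding width of 768 even after the vocabulary is widened to 32768. The empty model's hidden size never matched the checkpoint's, so the weight copy fails. A small sketch of the check that fails, using plain tuples in place of tensor shapes (the helper name is hypothetical):

```python
def can_load(checkpoint_shape, module_shape):
    """A checkpoint tensor can only be copied into a module tensor of the same shape."""
    return checkpoint_shape == module_shape

mistral_embedding = (32768, 4096)  # vocab_size x hidden_size from the checkpoint
mingpt_embedding = (32768, 768)    # vocab widened to 32768, but n_embd stayed 768

print(can_load(mistral_embedding, mingpt_embedding))  # False, hence the ValueError
```

The fix is to build the empty model with the checkpoint's own architecture and dimensions, not minGPT's GPT-2 defaults.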
```json
{
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 32768,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.42.0.dev0",
  "use_cache": true,
  "vocab_size": 32768
}
```
There is no `config.block_size`.
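Right: minGPT's config field names do not line up with this config.json. `block_size` corresponds to `max_position_embeddings`, `n_embd` to `hidden_size`, and so on. A hedged sketch of translating the relevant fields (the mapping below is illustrative only and covers just the fields a minGPT-style config uses; it is not an official converter, and a Mistral checkpoint would still need matching layer code, not just matching sizes):

```python
# Translate HF-style config keys to minGPT-style ones (illustrative mapping only).
HF_TO_MINGPT = {
    "max_position_embeddings": "block_size",
    "hidden_size": "n_embd",
    "num_hidden_layers": "n_layer",
    "num_attention_heads": "n_head",
    "vocab_size": "vocab_size",
}

def translate_config(hf_config):
    """Pick out the fields minGPT needs, renamed to its conventions."""
    return {ours: hf_config[theirs]
            for theirs, ours in HF_TO_MINGPT.items() if theirs in hf_config}

# Values taken from the config.json posted above.
mistral = {"max_position_embeddings": 32768, "hidden_size": 4096,
           "num_hidden_layers": 32, "num_attention_heads": 32, "vocab_size": 32768}
print(translate_config(mistral))
```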
Code (fragments and comments as posted, reflowed):

```python
from huggingface_hub import snapshot_download
import torch
import torch.nn as nn  # import the torch.nn module and alias it as nn

# Download checkpoint weights
checkpoint = "openai-community/gpt2"

# Initialize the model and load weights directly,
# instead of using init_empty_weights:
model = GPT2LMHeadModel.from_pretrained(checkpoint, torch_dtype=torch.float16)

# Load the checkpoint weights into the model, dispatching to appropriate devices.
# Note: to load specific weights from the checkpoint file, load the state_dict explicitly.
model = load_checkpoint_and_dispatch(

# Option 1: load the pre-trained GPT-2 model directly with AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Option 2: update the device_map to be compatible with the Sequential model.
# NOTE: this assumes the checkpoint is compatible with your sequential model;
# more likely you will need to create a model compatible with your checkpoint.
device_map = {}
for i in range(1000):
    device_map[f"{i}.weight"] = "cpu"  # map weights of each layer to CPU
    device_map[f"{i}.bias"] = "cpu"    # map biases of each layer to CPU

# Load the checkpoint and dispatch
model = load_checkpoint_and_dispatch(

# Instantiate the GPT-2 tokenizer instead of ByteLevelBPETokenizer
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

inputs = tokenizer("Hello, my name is", return_tensors="pt").input_ids.to("cpu")
outputs = model.generate(inputs, max_new_tokens=10, do_sample=False)[0]
```
/usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:1601: FutureWarning:
Colab, with no T4 and no TPU.
How do I do this with a model outside the GPT family, without minGPT, such as Mistral, Phi-3.5, Llama 3.1, or Qwen?
System Info
Information
Tasks
One of the `no_trainer` scripts in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)

Reproduction
```shell
!git clone https://github.com/karpathy/minGPT.git
!pip install minGPT/
!pip install huggingface_hub
!pip install accelerate --upgrade
```
Expected behavior
The code runs correctly.
python is a popular open source Python library for data analysis. It is used by many Python developers to perform data analysis tasks.

Python is a very popular programming language. It is used by many people to do many things.

[... the same sentence repeats for the rest of the output ...]
who is python?

I'm not sure. [... repeated until the output ends ...]