Add CogVLM #27718

Closed
wants to merge 30 commits
Changes from 1 commit
Commits (30)
1e16091
First draft
NielsRogge Nov 23, 2023
c75720b
Improve conversion script
NielsRogge Nov 23, 2023
ed527a0
More improvements
NielsRogge Nov 23, 2023
e633dca
More improvements
NielsRogge Nov 23, 2023
988d430
Add config attributes, improve conversion script
NielsRogge Nov 24, 2023
79cd06c
Make conversion work
NielsRogge Nov 24, 2023
8be1ded
Rename images to pixel_values
NielsRogge Nov 24, 2023
202fcc2
Add processor
NielsRogge Nov 25, 2023
b76b1b9
Remove einops dependency
NielsRogge Nov 25, 2023
185151d
Remove xformers dependency
NielsRogge Nov 25, 2023
1b4de2a
Improve vision config
NielsRogge Nov 25, 2023
98d47a2
Update test
NielsRogge Nov 25, 2023
4f1aa8b
Fix more tests, update conversion script
NielsRogge Nov 26, 2023
17581cc
Fix more tests
NielsRogge Nov 26, 2023
d10cbca
Fix more tests, add docstrings
NielsRogge Nov 26, 2023
5efde22
Improve variable names, docstrings
NielsRogge Nov 26, 2023
7ddd120
Improve more variable names
NielsRogge Nov 26, 2023
4071e89
Leverage _prepare_4d_causal_attention_mask
NielsRogge Nov 26, 2023
e6bd4ed
Rename classes
NielsRogge Nov 26, 2023
a80529f
Remove script
NielsRogge Nov 26, 2023
38ed9bf
Update README and docs
NielsRogge Nov 27, 2023
79f981d
Use native torch rotary embeddings
NielsRogge Nov 28, 2023
2ea6b18
Remove triton dependency
NielsRogge Dec 6, 2023
7f1e274
Remove file
NielsRogge Dec 6, 2023
d3c5fc3
Make fixup
NielsRogge Dec 6, 2023
456a439
Make fixup
NielsRogge Dec 6, 2023
3410c80
Merge branch 'main' into add_cogvlm
younesbelkada Dec 7, 2023
cc5e3ad
More improvements
NielsRogge Dec 16, 2023
2cba884
Add print statements
NielsRogge Dec 18, 2023
1cfabaf
Debug more
NielsRogge Dec 21, 2023
Add config attributes, improve conversion script
NielsRogge committed Dec 6, 2023
commit 988d430f7efa324735e4d1834934694e22c197ee
63 changes: 59 additions & 4 deletions src/transformers/models/cogvlm/configuration_cogvlm.py
@@ -132,8 +132,37 @@ class CogVLMConfig(PretrainedConfig):
documentation from [`PretrainedConfig`] for more information.

Args:
-        kwargs (*optional*):
-            Dictionary of keyword arguments.
+        vocab_size (`int`, *optional*, defaults to 32000):
+            Vocabulary size of the CogVLM model. Defines the number of different tokens that can be represented by
+            the `input_ids` passed when calling [`CogVLMModel`].
+        hidden_size (`int`, *optional*, defaults to 4096):
+            Dimension of the hidden representations.
+        intermediate_size (`int`, *optional*, defaults to 11008):
+            Dimension of the MLP representations.
+        num_hidden_layers (`int`, *optional*, defaults to 32):
+            Number of hidden layers in the Transformer decoder.
+        num_attention_heads (`int`, *optional*, defaults to 32):
+            Number of attention heads for each attention layer in the Transformer decoder.
+        hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
+            The non-linear activation function (function or string) in the decoder.
+        max_position_embeddings (`int`, *optional*, defaults to 2048):
+            The maximum sequence length that this model might ever be used with. Llama 1 supports up to 2048 tokens,
+            Llama 2 up to 4096, CodeLlama up to 16384.
+        initializer_range (`float`, *optional*, defaults to 0.02):
+            The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+        rms_norm_eps (`float`, *optional*, defaults to 1e-06):
+            The epsilon used by the rms normalization layers.
+        use_cache (`bool`, *optional*, defaults to `True`):
+            Whether or not the model should return the last key/values attentions (not used by all models). Only
+            relevant if `config.is_decoder=True`.
+        pad_token_id (`int`, *optional*):
+            Padding token id.
+        bos_token_id (`int`, *optional*, defaults to 1):
+            Beginning of stream token id.
+        eos_token_id (`int`, *optional*, defaults to 2):
+            End of stream token id.
+        tie_word_embeddings (`bool`, *optional*, defaults to `False`):
+            Whether to tie the input and output word embeddings.

Example:

@@ -152,8 +181,34 @@ class CogVLMConfig(PretrainedConfig):

model_type = "cogvlm"

-    def __init__(self, **kwargs):
-        super().__init__(**kwargs)
+    def __init__(
+        self,
+        vocab_size=32000,
+        hidden_size=4096,
+        intermediate_size=11008,
+        num_hidden_layers=32,
+        num_attention_heads=32,
+        hidden_act="silu",
+        max_position_embeddings=2048,
+        initializer_range=0.02,
+        rms_norm_eps=1e-6,
+        use_cache=True,
+        pad_token_id=None,
+        bos_token_id=1,
+        eos_token_id=2,
+        tie_word_embeddings=False,
+        **kwargs,
+    ):
+        self.vocab_size = vocab_size
+        self.max_position_embeddings = max_position_embeddings
+        self.hidden_size = hidden_size
+        self.intermediate_size = intermediate_size
+        self.num_hidden_layers = num_hidden_layers
+        self.num_attention_heads = num_attention_heads
+        self.hidden_act = hidden_act
+        self.initializer_range = initializer_range
+        self.rms_norm_eps = rms_norm_eps
+        self.use_cache = use_cache

if vision_config is None:
vision_config = {}
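As a quick reference for how the expanded configuration would be used, here is a minimal sketch, assuming this branch is installed; `CogVLMConfig` is the in-progress class from this PR, and the keyword arguments simply mirror the defaults in the new signature. A `vision_config` keyword presumably also exists (the `if vision_config is None` handling at the end of the hunk implies it), but since its signature is not shown here it is left out of the call.

```python
# Minimal sketch, assuming the in-progress CogVLMConfig from this branch.
from transformers.models.cogvlm.configuration_cogvlm import CogVLMConfig

# Every keyword below matches the default in the new signature, so this
# call is equivalent to CogVLMConfig() with no arguments.
config = CogVLMConfig(
    vocab_size=32000,
    hidden_size=4096,
    intermediate_size=11008,
    num_hidden_layers=32,
    num_attention_heads=32,
    hidden_act="silu",
    max_position_embeddings=2048,
    initializer_range=0.02,
    rms_norm_eps=1e-6,
    use_cache=True,
)

print(config.model_type)   # "cogvlm"
print(config.hidden_size)  # 4096
```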
@@ -57,6 +57,7 @@ def convert_cogvlm_checkpoint(model_name, pytorch_dump_folder_path=None, push_to
inputs = original_model.build_conversation_input_ids(
tokenizer, query=query, history=[], images=[image]
) # chat mode
+    original_pixel_values = inputs["images"][0]
inputs = {
"input_ids": inputs["input_ids"].unsqueeze(0).to("cuda"),
"token_type_ids": inputs["token_type_ids"].unsqueeze(0).to("cuda"),
@@ -75,7 +76,7 @@ def convert_cogvlm_checkpoint(model_name, pytorch_dump_folder_path=None, push_to
# model = CogVLMForCausalLM(config)

# create processor
-    image_size = 224
+    image_size = original_model.config.vision_config["image_size"]
image_processor = CLIPImageProcessor(
size={"height": image_size, "width": image_size},
do_center_crop=False,
@@ -85,6 +86,9 @@ def convert_cogvlm_checkpoint(model_name, pytorch_dump_folder_path=None, push_to
processor = CogVLMProcessor(image_processor=image_processor, tokenizer=tokenizer)
pixel_values = processor(images=image, return_tensors="pt").pixel_values.to("cuda")

+    # verify pixel values
+    assert torch.allclose(pixel_values, original_pixel_values.to(pixel_values.device))

# make sure processor creates exact same pixel values
# assert torch.allclose(pixel_values, original_pixel_values.to(pixel_values.device))

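For reviewers who want to reproduce the pixel-value equivalence check outside the conversion script, here is a minimal self-contained sketch. The image URL and `image_size` value are placeholders: in the script, `image_size` is read from `original_model.config.vision_config["image_size"]` and the reference tensor comes from `build_conversation_input_ids(...)["images"][0]`.

```python
# Minimal sketch of the pixel-value equivalence check performed above.
# Assumptions: image_size is a placeholder, and reference_pixel_values
# stands in for the original model's preprocessing output.
import requests
import torch
from PIL import Image
from transformers import CLIPImageProcessor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # placeholder test image
image = Image.open(requests.get(url, stream=True).raw)

image_size = 224  # placeholder; the script reads this from the original config
image_processor = CLIPImageProcessor(
    size={"height": image_size, "width": image_size},
    do_center_crop=False,
)
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values

# Stand-in for inputs["images"][0]; with the real checkpoint this tensor
# would come from original_model.build_conversation_input_ids(...).
reference_pixel_values = pixel_values.clone()

assert pixel_values.shape == (1, 3, image_size, image_size)
assert torch.allclose(pixel_values, reference_pixel_values)
```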