Skip to content

Commit

Permalink
Merge branch 'huggingface:main' into custom-separator
Browse files Browse the repository at this point in the history
  • Loading branch information
beep-bebop authored Aug 28, 2024
2 parents a3a354c + c35d2cc commit 07c6db0
Show file tree
Hide file tree
Showing 128 changed files with 8,124 additions and 423 deletions.
3 changes: 0 additions & 3 deletions .github/workflows/self-scheduled-caller.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,6 @@ name: Self-hosted runner (scheduled)


on:
repository_dispatch:
schedule:
- cron: "17 2 * * *"
push:
branches:
- run_scheduled_ci*
Expand Down
2 changes: 1 addition & 1 deletion docker/consistency.dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ ARG REF=main
RUN apt-get update && apt-get install -y time git g++ pkg-config make git-lfs
ENV UV_PYTHON=/usr/local/bin/python
RUN pip install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools GitPython
RUN pip install --no-cache-dir --upgrade 'torch' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu
RUN pip install --no-cache-dir --upgrade 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
# tensorflow pin matching setup.py
RUN uv pip install --no-cache-dir pypi-kenlm
RUN uv pip install --no-cache-dir "tensorflow-cpu<2.16" "tf-keras<2.16"
Expand Down
2 changes: 1 addition & 1 deletion docker/transformers-pytorch-deepspeed-amd-gpu/Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ RUN apt update && \
apt clean && \
rm -rf /var/lib/apt/lists/*

RUN python3 -m pip install --no-cache-dir --upgrade pip ninja "pydantic<2"
RUN python3 -m pip install --no-cache-dir --upgrade pip ninja "pydantic>=2.0.0"
RUN python3 -m pip uninstall -y apex torch torchvision torchaudio
RUN python3 -m pip install torch==$PYTORCH torchvision==$TORCH_VISION torchaudio==$TORCH_AUDIO --index-url https://download.pytorch.org/whl/rocm$ROCM --no-cache-dir

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -49,5 +49,5 @@ RUN DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed -
RUN cd transformers && python3 setup.py develop

# The base image ships with `pydantic==1.8.2` which is not working - i.e. the next command fails
RUN python3 -m pip install -U --no-cache-dir "pydantic<2"
RUN python3 -m pip install -U --no-cache-dir "pydantic>=2.0.0"
RUN python3 -c "from deepspeed.launcher.runner import main"
4 changes: 4 additions & 0 deletions docs/source/en/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -412,6 +412,8 @@
title: GPTSAN Japanese
- local: model_doc/gpt-sw3
title: GPTSw3
- local: model_doc/granite
title: Granite
- local: model_doc/herbert
title: HerBERT
- local: model_doc/ibert
Expand Down Expand Up @@ -514,6 +516,8 @@
title: Qwen2Audio
- local: model_doc/qwen2_moe
title: Qwen2MoE
- local: model_doc/qwen2_vl
title: Qwen2VL
- local: model_doc/rag
title: RAG
- local: model_doc/realm
Expand Down
2 changes: 1 addition & 1 deletion docs/source/en/custom_models.md
Original file line number Diff line number Diff line change
Expand Up @@ -185,7 +185,7 @@ class ResnetModelForImageClassification(PreTrainedModel):
def forward(self, tensor, labels=None):
logits = self.model(tensor)
if labels is not None:
loss = torch.nn.cross_entropy(logits, labels)
loss = torch.nn.functional.cross_entropy(logits, labels)
return {"loss": loss, "logits": logits}
return {"logits": logits}
```
Expand Down
2 changes: 2 additions & 0 deletions docs/source/en/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,7 @@ Flax), PyTorch, and/or TensorFlow.
| [GPT-Sw3](model_doc/gpt-sw3) ||||
| [GPTBigCode](model_doc/gpt_bigcode) ||||
| [GPTSAN-japanese](model_doc/gptsan-japanese) ||||
| [Granite](model_doc/granite) ||||
| [Graphormer](model_doc/graphormer) ||||
| [Grounding DINO](model_doc/grounding-dino) ||||
| [GroupViT](model_doc/groupvit) ||||
Expand Down Expand Up @@ -260,6 +261,7 @@ Flax), PyTorch, and/or TensorFlow.
| [Qwen2](model_doc/qwen2) ||||
| [Qwen2Audio](model_doc/qwen2_audio) ||||
| [Qwen2MoE](model_doc/qwen2_moe) ||||
| [Qwen2VL](model_doc/qwen2_vl) ||||
| [RAG](model_doc/rag) ||||
| [REALM](model_doc/realm) ||||
| [RecurrentGemma](model_doc/recurrent_gemma) ||||
Expand Down
15 changes: 14 additions & 1 deletion docs/source/en/model_doc/blip-2.md
Original file line number Diff line number Diff line change
Expand Up @@ -87,4 +87,17 @@ If you're interested in submitting a resource to be included here, please feel f

[[autodoc]] Blip2ForConditionalGeneration
- forward
- generate
- generate

## Blip2ForImageTextRetrieval

[[autodoc]] Blip2ForImageTextRetrieval
- forward

## Blip2TextModelWithProjection

[[autodoc]] Blip2TextModelWithProjection

## Blip2VisionModelWithProjection

[[autodoc]] Blip2VisionModelWithProjection
74 changes: 74 additions & 0 deletions docs/source/en/model_doc/granite.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,74 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Granite

## Overview

The Granite model was proposed in [Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler](https://arxiv.org/abs/2408.13359) by Yikang Shen, Matthew Stallone, Mayank Mishra, Gaoyuan Zhang, Shawn Tan, Aditya Prasad, Adriana Meza Soria, David D. Cox and Rameswar Panda.

PowerLM-3B is a 3B state-of-the-art small language model trained with the Power learning rate scheduler. It is trained on a wide range of open-source and synthetic datasets with permissive licenses. PowerLM-3B has shown promising results compared to other models in the size categories across various benchmarks, including natural language multi-choices, code generation, and math reasoning.

The abstract from the paper is the following:

*Finding the optimal learning rate for language model pretraining is a challenging task.
This is not only because there is a complicated correlation between learning rate, batch size, number of training tokens, model size, and other hyperparameters but also because it is prohibitively expensive to perform a hyperparameter search for large language models with Billions or Trillions of parameters. Recent studies propose using small proxy models and small corpus to perform hyperparameter searches and transposing the optimal parameters to large models and large corpus. While the zero-shot transferability is theoretically and empirically proven for model size related hyperparameters, like depth and width, the zero-shot transfer from small corpus to large corpus is underexplored.
In this paper, we study the correlation between optimal learning rate, batch size, and number of training tokens for the recently proposed WSD scheduler. After thousands of small experiments, we found a power-law relationship between variables and demonstrated its transferability across model sizes. Based on the observation, we propose a new learning rate scheduler, Power scheduler, that is agnostic about the number of training tokens and batch size. The experiment shows that combining the Power scheduler with Maximum Update Parameterization (\mup) can consistently achieve impressive performance with one set of hyperparameters regardless of the number of training tokens, batch size, model size, and even model architecture. Our 3B dense and MoE models trained with the Power scheduler achieve comparable performance as state-of-the-art small language models.
We [open source](https://huggingface.co/collections/ibm/power-lm-66be64ae647ddf11b9808000) these pretrained models.*

Tips:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm/PowerLM-3b"
tokenizer = AutoTokenizer.from_pretrained(model_path)

# drop device_map if running on CPU
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
model.eval()

# change input text as desired
prompt = "Write a code to find the maximum value in a list of numbers."

# tokenize the text
input_tokens = tokenizer(prompt, return_tensors="pt")
# generate output tokens
output = model.generate(**input_tokens, max_new_tokens=100)
# decode output tokens into text
output = tokenizer.batch_decode(output)
# loop over the batch to print, in this example the batch size is 1
for i in output:
print(i)
```

This model was contributed by [mayank-mishra](https://huggingface.co/mayank-mishra).


## GraniteConfig

[[autodoc]] GraniteConfig

## GraniteModel

[[autodoc]] GraniteModel
- forward

## GraniteForCausalLM

[[autodoc]] GraniteForCausalLM
- forward
4 changes: 2 additions & 2 deletions docs/source/en/model_doc/mamba2.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,11 +39,11 @@ The original code can be found [here](https://github.com/state-spaces/mamba).

### A simple generation example:
```python
from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
from transformers import Mamba2Config, Mamba2ForCausalLM, AutoTokenizer
import torch
model_id = 'mistralai/Mamba-Codestral-7B-v0.1'
tokenizer = AutoTokenizer.from_pretrained(model_id, revision='refs/pr/9', from_slow=True, legacy=False)
model = MambaForCausalLM.from_pretrained(model_id, revision='refs/pr/9')
model = Mamba2ForCausalLM.from_pretrained(model_id, revision='refs/pr/9')
input_ids = tokenizer("Hey how are you doing?", return_tensors= "pt")["input_ids"]

out = model.generate(input_ids, max_new_tokens=10)
Expand Down
Loading

0 comments on commit 07c6db0

Please sign in to comment.