
Commit 6cd9ba3

Merge branch 'main' into add-qwen2moe-modelcard
2 parents cb1537a + 3fb7e7b

File tree: 9 files changed, +26 −231 lines

docker/transformers-all-latest-gpu/Dockerfile

Lines changed: 0 additions & 4 deletions
@@ -10,8 +10,6 @@ SHELL ["sh", "-lc"]
 # to be used as arguments for docker build (so far).
 
 ARG PYTORCH='2.6.0'
-# (not always a valid torch version)
-ARG INTEL_TORCH_EXT='2.3.0'
 # Example: `cu102`, `cu113`, etc.
 ARG CUDA='cu121'
 # Disable kernel mapping for now until all tests pass
@@ -32,8 +30,6 @@ RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] &&
 
 RUN python3 -m pip uninstall -y flax jax
 
-RUN python3 -m pip install --no-cache-dir intel_extension_for_pytorch==$INTEL_TORCH_EXT -f https://developer.intel.com/ipex-whl-stable-cpu
-
 RUN python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git pytesseract
 RUN python3 -m pip install -U "itsdangerous<2.1.0"

docs/source/en/perf_infer_cpu.md

Lines changed: 0 additions & 23 deletions
@@ -78,26 +78,3 @@ python examples/pytorch/question-answering/run_qa.py \
 --no_cuda \
 --jit_mode_eval
 ```
-
-## IPEX
-
-[Intel Extension for PyTorch](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/getting_started.html) (IPEX) offers additional optimizations for PyTorch on Intel CPUs. IPEX further optimizes TorchScript with [graph optimization](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/features/graph_optimization.html) which fuses operations like Multi-head attention, Concat Linear, Linear + Add, Linear + Gelu, Add + LayerNorm, and more, into single kernels for faster execution.
-
-Make sure IPEX is installed, and set the `--use_opex` and `--jit_mode_eval` flags in [`Trainer`] to enable IPEX graph optimization and TorchScript.
-
-```bash
-!pip install intel_extension_for_pytorch
-```
-
-```bash
-python examples/pytorch/question-answering/run_qa.py \
---model_name_or_path csarron/bert-base-uncased-squad-v1 \
---dataset_name squad \
---do_eval \
---max_seq_length 384 \
---doc_stride 128 \
---output_dir /tmp/ \
---no_cuda \
---use_ipex \
---jit_mode_eval
-```
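
With the IPEX section deleted, this page's inference guidance rests on TorchScript alone via `--jit_mode_eval`. As a minimal sketch of the same idea in plain PyTorch (reusing the checkpoint from the example above; `torchscript=True` makes the model return trace-friendly tuples):

```python
# Minimal sketch: TorchScript CPU inference without IPEX.
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

checkpoint = "csarron/bert-base-uncased-squad-v1"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint, torchscript=True)
model.eval()

inputs = tokenizer("Who wrote it?", "It was written by Alice.", return_tensors="pt")

with torch.no_grad():
    # Trace once with example inputs, then reuse the optimized graph.
    traced = torch.jit.trace(model, (inputs["input_ids"], inputs["attention_mask"]))
    start_logits, end_logits = traced(inputs["input_ids"], inputs["attention_mask"])
```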

docs/source/en/perf_train_cpu.md

Lines changed: 2 additions & 25 deletions
@@ -17,30 +17,9 @@ rendered properly in your Markdown viewer.
 
 A modern CPU is capable of efficiently training large models by leveraging the underlying optimizations built into the hardware and training on fp16 or bf16 data types.
 
-This guide focuses on how to train large models on an Intel CPU using mixed precision and the [Intel Extension for PyTorch (IPEX)](https://intel.github.io/intel-extension-for-pytorch/index.html) library.
+This guide focuses on how to train large models on an Intel CPU using mixed precision. AMP is enabled for CPU backends training with PyTorch.
 
-You can Find your PyTorch version by running the command below.
-
-```bash
-pip list | grep torch
-```
-
-Install IPEX with the PyTorch version from above.
-
-```bash
-pip install intel_extension_for_pytorch==<version_name> -f https://developer.intel.com/ipex-whl-stable-cpu
-```
-
-> [!TIP]
-> Refer to the IPEX [installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation) guide for more details.
-
-IPEX provides additional performance optimizations for Intel CPUs. These include additional CPU instruction level architecture (ISA) support such as [Intel AVX512-VNNI](https://en.wikichip.org/wiki/x86/avx512_vnni) and [Intel AMX](https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/what-is-intel-amx.html). Both of these features are designed to accelerate matrix multiplication. Older AMD and Intel CPUs with only Intel AVX2, however, aren't guaranteed better performance with IPEX.
-
-IPEX also supports [Auto Mixed Precision (AMP)](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/features/amp.html) training with the fp16 and bf16 data types. Reducing precision speeds up training and reduces memory usage because it requires less computation. The loss in accuracy from using full-precision is minimal. 3rd, 4th, and 5th generation Intel Xeon Scalable processors natively support bf16, and the 6th generation processor also natively supports fp16 in addition to bf16.
-
-AMP is enabled for CPU backends training with PyTorch.
-
-[`Trainer`] supports AMP training with a CPU by adding the `--use_cpu`, `--use_ipex`, and `--bf16` parameters. The example below demonstrates the [run_qa.py](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) script.
+[`Trainer`] supports AMP training with CPU by adding the `--use_cpu`, and `--bf16` parameters. The example below demonstrates the [run_qa.py](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) script.
 
 ```bash
 python run_qa.py \
@@ -54,7 +33,6 @@ python run_qa.py \
 --max_seq_length 384 \
 --doc_stride 128 \
 --output_dir /tmp/debug_squad/ \
---use_ipex \
 --bf16 \
 --use_cpu
 ```
@@ -65,7 +43,6 @@ These parameters can also be added to [`TrainingArguments`] as shown below.
 training_args = TrainingArguments(
     output_dir="./outputs",
     bf16=True,
-    use_ipex=True,
     use_cpu=True,
 )
 ```
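
The surviving doc text leans on the fact that AMP is native to PyTorch's CPU backend, which is the justification for dropping IPEX here. A minimal sketch of that native path, using a toy `nn.Linear` stand-in rather than a real Transformer:

```python
# Minimal sketch of native CPU AMP: torch.autocast on the "cpu"
# device with bfloat16, no extra library needed.
import torch
from torch import nn

model = nn.Linear(64, 2)  # toy model standing in for a Transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 64)
y = torch.randint(0, 2, (8,))

optimizer.zero_grad()
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    loss = loss_fn(model(x), y)
loss.backward()  # gradients are computed outside the autocast context
optimizer.step()
```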

docs/source/en/perf_train_cpu_many.md

Lines changed: 2 additions & 5 deletions
@@ -75,8 +75,7 @@ python3 run_qa.py \
 --doc_stride 128 \
 --output_dir /tmp/debug_squad/ \
 --no_cuda \
---ddp_backend ccl \
---use_ipex
+--ddp_backend ccl
 ```
 
 </hfoption>
@@ -115,7 +114,6 @@ python3 run_qa.py \
 --output_dir /tmp/debug_squad/ \
 --no_cuda \
 --ddp_backend ccl \
---use_ipex \
 --bf16
 ```
 
@@ -201,8 +199,7 @@ spec:
 --output_dir /tmp/pvc-mount/output_$(date +%Y%m%d_%H%M%S) \
 --no_cuda \
 --ddp_backend ccl \
---bf16 \
---use_ipex;
+--bf16;
 env:
 - name: LD_PRELOAD
   value: "/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4.5.9:/usr/local/lib/libiomp5.so"

src/transformers/trainer.py

Lines changed: 0 additions & 28 deletions
@@ -157,7 +157,6 @@
     is_galore_torch_available,
     is_grokadamw_available,
     is_in_notebook,
-    is_ipex_available,
     is_liger_kernel_available,
     is_lomo_available,
     is_peft_available,
@@ -1916,29 +1915,6 @@ def torch_jit_model_eval(self, model, dataloader, training=False):
 
         return model
 
-    def ipex_optimize_model(self, model, training=False, dtype=torch.float32):
-        if not is_ipex_available():
-            raise ImportError(
-                "Using IPEX but IPEX is not installed or IPEX's version does not match current PyTorch, please refer"
-                " to https://github.com/intel/intel-extension-for-pytorch."
-            )
-
-        import intel_extension_for_pytorch as ipex
-
-        if not training:
-            model.eval()
-            dtype = torch.bfloat16 if not self.is_in_train and self.args.bf16_full_eval else dtype
-            # conv_bn_folding is disabled as it fails in symbolic tracing, resulting in ipex warnings
-            model = ipex.optimize(model, dtype=dtype, level="O1", conv_bn_folding=False, inplace=not self.is_in_train)
-        else:
-            if not model.training:
-                model.train()
-            model, self.optimizer = ipex.optimize(
-                model, dtype=dtype, optimizer=self.optimizer, inplace=True, level="O1"
-            )
-
-        return model
-
     def compare_trainer_and_checkpoint_args(self, training_args, trainer_state):
         attributes_map = {
             "logging_steps": "logging_steps",
@@ -1968,10 +1944,6 @@ def compare_trainer_and_checkpoint_args(self, training_args, trainer_state):
             logger.warning_once(warning_str)
 
     def _wrap_model(self, model, training=True, dataloader=None):
-        if self.args.use_ipex:
-            dtype = torch.bfloat16 if self.use_cpu_amp else torch.float32
-            model = self.ipex_optimize_model(model, training, dtype=dtype)
-
         if is_sagemaker_mp_enabled():
             # Wrapping the base model twice in a DistributedModel will raise an error.
             if isinstance(self.model_wrapped, smp.model.DistributedModel):
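
The deleted `ipex_optimize_model` hook shows the recipe for anyone who still wants this behavior outside [`Trainer`]: call `ipex.optimize` on the eval-mode model for inference, or on the model and optimizer pair for training. A standalone sketch, assuming IPEX is installed separately and version-matched to PyTorch:

```python
# Standalone sketch of the removed optimization step, applied outside Trainer.
# Assumes intel_extension_for_pytorch is installed (no longer a Trainer dependency).
import torch
import intel_extension_for_pytorch as ipex
from torch import nn

model = nn.Linear(64, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Inference: optimize the eval-mode model alone.
model.eval()
eval_model = ipex.optimize(model, dtype=torch.bfloat16, level="O1")

# Training: optimize model and optimizer together, as the old hook did.
model.train()
model, optimizer = ipex.optimize(model, dtype=torch.bfloat16, optimizer=optimizer, level="O1")
```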

src/transformers/training_args.py

Lines changed: 6 additions & 0 deletions
@@ -1581,6 +1581,12 @@ def __post_init__(self):
                 FutureWarning,
             )
             self.use_cpu = self.no_cuda
+        if self.use_ipex:
+            warnings.warn(
+                "using `use_ipex` is deprecated and will be removed in version 4.54 of 🤗 Transformers. "
+                "You only need PyTorch for the needed optimizations on Intel CPU and XPU.",
+                FutureWarning,
+            )
 
         self.eval_strategy = IntervalStrategy(self.eval_strategy)
         self.logging_strategy = IntervalStrategy(self.logging_strategy)
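
The new guard follows the deprecation pattern already used just above it for `no_cuda`: the flag still parses, but setting it emits a `FutureWarning` in `__post_init__`. A stripped-down, hypothetical reduction of the pattern (not the real `TrainingArguments`):

```python
# Hypothetical reduction of the deprecation pattern added above.
import warnings
from dataclasses import dataclass


@dataclass
class ToyArguments:
    use_ipex: bool = False

    def __post_init__(self):
        if self.use_ipex:
            warnings.warn(
                "using `use_ipex` is deprecated and will be removed in version 4.54 of 🤗 Transformers. "
                "You only need PyTorch for the needed optimizations on Intel CPU and XPU.",
                FutureWarning,
            )


ToyArguments(use_ipex=True)  # emits the FutureWarning; the flag otherwise does nothing
```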

tests/models/canine/test_modeling_canine.py

Lines changed: 8 additions & 0 deletions
@@ -241,6 +241,14 @@ def setUp(self):
         # we set has_text_modality to False as the config has no vocab_size attribute
         self.config_tester = ConfigTester(self, config_class=CanineConfig, has_text_modality=False, hidden_size=37)
 
+    @unittest.skip("failing. Will fix only when the community opens an issue for it.")
+    def test_torchscript_output_hidden_state(self):
+        pass
+
+    @unittest.skip("failing. Will fix only when the community opens an issue for it.")
+    def test_torchscript_simple(self):
+        pass
+
     def test_config(self):
         self.config_tester.run_common_tests()

tests/models/moonshine/test_modeling_moonshine.py

Lines changed: 8 additions & 0 deletions
@@ -150,6 +150,14 @@ def setUp(self):
         self.model_tester = MoonshineModelTester(self)
         self.config_tester = ConfigTester(self, config_class=MoonshineConfig)
 
+    @unittest.skip("failing. Will fix only when the community opens an issue for it.")
+    def test_torchscript_output_hidden_state(self):
+        pass
+
+    @unittest.skip("failing. Will fix only when the community opens an issue for it.")
+    def test_torchscript_simple(self):
+        pass
+
     def test_config(self):
         self.config_tester.run_common_tests()
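
Both test files apply the same mechanism: an `@unittest.skip` override shadows the inherited common test, so the suite reports a skip instead of a failure. A self-contained sketch of the pattern:

```python
# Self-contained illustration of overriding an inherited test with a skip.
import unittest


class CommonTests:
    def test_torchscript_simple(self):
        raise AssertionError("fails for some architectures")


class MoonshineLikeTest(CommonTests, unittest.TestCase):
    @unittest.skip("failing. Will fix only when the community opens an issue for it.")
    def test_torchscript_simple(self):  # shadows the failing inherited version
        pass


if __name__ == "__main__":
    unittest.main()  # reports 1 skipped, 0 failed
```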

tests/trainer/test_trainer.py

Lines changed: 0 additions & 146 deletions
@@ -79,7 +79,6 @@
     require_deepspeed,
     require_galore_torch,
     require_grokadamw,
-    require_intel_extension_for_pytorch,
     require_liger_kernel,
     require_lomo,
     require_non_hpu,
@@ -1325,37 +1324,6 @@ def test_number_of_steps_in_training(self):
         train_output = trainer.train()
         self.assertEqual(train_output.global_step, 10)
 
-    @require_torch_bf16
-    @require_intel_extension_for_pytorch
-    def test_number_of_steps_in_training_with_ipex(self):
-        for mix_bf16 in [True, False]:
-            tmp_dir = self.get_auto_remove_tmp_dir()
-            # Regular training has n_epochs * len(train_dl) steps
-            trainer = get_regression_trainer(
-                learning_rate=0.1, use_ipex=True, bf16=mix_bf16, use_cpu=True, output_dir=tmp_dir
-            )
-            train_output = trainer.train()
-            self.assertEqual(train_output.global_step, self.n_epochs * 64 / trainer.args.train_batch_size)
-
-            # Check passing num_train_epochs works (and a float version too):
-            trainer = get_regression_trainer(
-                learning_rate=0.1,
-                num_train_epochs=1.5,
-                use_ipex=True,
-                bf16=mix_bf16,
-                use_cpu=True,
-                output_dir=tmp_dir,
-            )
-            train_output = trainer.train()
-            self.assertEqual(train_output.global_step, int(1.5 * 64 / trainer.args.train_batch_size))
-
-            # If we pass a max_steps, num_train_epochs is ignored
-            trainer = get_regression_trainer(
-                learning_rate=0.1, max_steps=10, use_ipex=True, bf16=mix_bf16, use_cpu=True, output_dir=tmp_dir
-            )
-            train_output = trainer.train()
-            self.assertEqual(train_output.global_step, 10)
-
     def test_torch_compile_loss_func_compatibility(self):
         config = LlamaConfig(vocab_size=100, hidden_size=32, num_hidden_layers=3, num_attention_heads=4)
         tiny_llama = LlamaForCausalLM(config)
@@ -2628,69 +2596,6 @@ def test_evaluate_with_jit(self):
         expected_acc = AlmostAccuracy()((pred + 1, y))["accuracy"]
         self.assertAlmostEqual(results["eval_accuracy"], expected_acc)
 
-    @require_torch_bf16
-    @require_intel_extension_for_pytorch
-    def test_evaluate_with_ipex(self):
-        for mix_bf16 in [True, False]:
-            with tempfile.TemporaryDirectory() as tmp_dir:
-                trainer = get_regression_trainer(
-                    a=1.5,
-                    b=2.5,
-                    use_ipex=True,
-                    compute_metrics=AlmostAccuracy(),
-                    bf16=mix_bf16,
-                    use_cpu=True,
-                    output_dir=tmp_dir,
-                )
-                results = trainer.evaluate()
-
-                x, y = trainer.eval_dataset.x, trainer.eval_dataset.ys[0]
-                pred = 1.5 * x + 2.5
-                expected_loss = ((pred - y) ** 2).mean()
-                self.assertAlmostEqual(results["eval_loss"], expected_loss)
-                expected_acc = AlmostAccuracy()((pred, y))["accuracy"]
-                self.assertAlmostEqual(results["eval_accuracy"], expected_acc)
-
-                # With a number of elements not a round multiple of the batch size
-                trainer = get_regression_trainer(
-                    a=1.5,
-                    b=2.5,
-                    use_ipex=True,
-                    eval_len=66,
-                    compute_metrics=AlmostAccuracy(),
-                    bf16=mix_bf16,
-                    use_cpu=True,
-                    output_dir=tmp_dir,
-                )
-                results = trainer.evaluate()
-
-                x, y = trainer.eval_dataset.x, trainer.eval_dataset.ys[0]
-                pred = 1.5 * x + 2.5
-                expected_loss = ((pred - y) ** 2).mean()
-                self.assertAlmostEqual(results["eval_loss"], expected_loss)
-                expected_acc = AlmostAccuracy()((pred, y))["accuracy"]
-                self.assertAlmostEqual(results["eval_accuracy"], expected_acc)
-
-                # With logits preprocess
-                trainer = get_regression_trainer(
-                    a=1.5,
-                    b=2.5,
-                    use_ipex=True,
-                    compute_metrics=AlmostAccuracy(),
-                    preprocess_logits_for_metrics=lambda logits, labels: logits + 1,
-                    bf16=mix_bf16,
-                    use_cpu=True,
-                    output_dir=tmp_dir,
-                )
-                results = trainer.evaluate()
-
-                x, y = trainer.eval_dataset.x, trainer.eval_dataset.ys[0]
-                pred = 1.5 * x + 2.5
-                expected_loss = ((pred - y) ** 2).mean()
-                self.assertAlmostEqual(results["eval_loss"], expected_loss)
-                expected_acc = AlmostAccuracy()((pred + 1, y))["accuracy"]
-                self.assertAlmostEqual(results["eval_accuracy"], expected_acc)
-
     def test_predict(self):
         with tempfile.TemporaryDirectory() as tmp_dir:
             trainer = get_regression_trainer(a=1.5, b=2.5, output_dir=tmp_dir)
@@ -2830,57 +2735,6 @@ def test_predict_with_jit(self):
             self.assertTrue(np.array_equal(labels[0], trainer.eval_dataset.ys[0]))
             self.assertTrue(np.array_equal(labels[1], trainer.eval_dataset.ys[1]))
 
-    @require_torch_bf16
-    @require_intel_extension_for_pytorch
-    def test_predict_with_ipex(self):
-        for mix_bf16 in [True, False]:
-            with tempfile.TemporaryDirectory() as tmp_dir:
-                trainer = get_regression_trainer(
-                    a=1.5, b=2.5, use_ipex=True, bf16=mix_bf16, use_cpu=True, output_dir=tmp_dir
-                )
-                preds = trainer.predict(trainer.eval_dataset).predictions
-                x = trainer.eval_dataset.x
-                self.assertTrue(np.allclose(preds, 1.5 * x + 2.5))
-
-                # With a number of elements not a round multiple of the batch size
-                trainer = get_regression_trainer(
-                    a=1.5, b=2.5, eval_len=66, use_ipex=True, bf16=mix_bf16, use_cpu=True, output_dir=tmp_dir
-                )
-                preds = trainer.predict(trainer.eval_dataset).predictions
-                x = trainer.eval_dataset.x
-                self.assertTrue(np.allclose(preds, 1.5 * x + 2.5))
-
-                # With more than one output of the model
-                trainer = get_regression_trainer(
-                    a=1.5, b=2.5, double_output=True, use_ipex=True, bf16=mix_bf16, use_cpu=True, output_dir=tmp_dir
-                )
-                preds = trainer.predict(trainer.eval_dataset).predictions
-                x = trainer.eval_dataset.x
-                self.assertEqual(len(preds), 2)
-                self.assertTrue(np.allclose(preds[0], 1.5 * x + 2.5))
-                self.assertTrue(np.allclose(preds[1], 1.5 * x + 2.5))
-
-                # With more than one output/label of the model
-                trainer = get_regression_trainer(
-                    a=1.5,
-                    b=2.5,
-                    double_output=True,
-                    label_names=["labels", "labels_2"],
-                    use_ipex=True,
-                    bf16=mix_bf16,
-                    use_cpu=True,
-                    output_dir=tmp_dir,
-                )
-                outputs = trainer.predict(trainer.eval_dataset)
-                preds = outputs.predictions
-                labels = outputs.label_ids
-                x = trainer.eval_dataset.x
-                self.assertEqual(len(preds), 2)
-                self.assertTrue(np.allclose(preds[0], 1.5 * x + 2.5))
-                self.assertTrue(np.allclose(preds[1], 1.5 * x + 2.5))
-                self.assertTrue(np.array_equal(labels[0], trainer.eval_dataset.ys[0]))
-                self.assertTrue(np.array_equal(labels[1], trainer.eval_dataset.ys[1]))
-
     def test_dynamic_shapes(self):
         eval_dataset = DynamicShapesDataset(batch_size=self.batch_size)
         model = RegressionModel(a=2, b=1)
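
The deleted IPEX variants mirrored `test_evaluate_with_jit` and `test_predict_with_jit`; equivalent CPU coverage survives through the `bf16` and `use_cpu` arguments alone. A minimal sketch of such a check, assuming this file's existing `get_regression_trainer` and `AlmostAccuracy` helpers:

```python
# Sketch of the surviving CPU bf16 path, without use_ipex.
# get_regression_trainer and AlmostAccuracy are this file's existing helpers.
import tempfile


def check_cpu_bf16_evaluate():
    with tempfile.TemporaryDirectory() as tmp_dir:
        trainer = get_regression_trainer(
            a=1.5,
            b=2.5,
            compute_metrics=AlmostAccuracy(),
            bf16=True,
            use_cpu=True,
            output_dir=tmp_dir,
        )
        results = trainer.evaluate()

        # Same regression check the deleted tests performed.
        x, y = trainer.eval_dataset.x, trainer.eval_dataset.ys[0]
        pred = 1.5 * x + 2.5
        expected_loss = ((pred - y) ** 2).mean()
        assert abs(results["eval_loss"] - expected_loss) < 1e-4
```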
