Description
Hi guys,
I'm trying to use the cpu_offload feature of the DeepSpeed integration with HuggingFace's Trainer on a single GPU on AWS SageMaker (an ml.p2.xlarge instance). However, I've been struggling for quite some time to get it to work properly. Here are the versions I'm currently using:
CUDA: 10.1 (V10.1.243)
transformers: 4.6.0
pytorch: 1.7.1
deepspeed: 0.3.16 / 0.4.0 (master)
OS: Amazon Linux AMI 2018.03 (x86_64)
gcc/g++/c++: (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
I've taken a look at similar issues (#889, #694, #885) but haven't had any success so far. So far I have tried:
- Changing the versions of the pytorch, deepspeed and transformers libraries
- Pre-building the DeepSpeed ops (DS_BUILD_OPS=1 and DS_BUILD_CPU_ADAM=1); see the sketch after this list
- Installing DeepSpeed and Transformers from source
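For reference, the pre-build attempts looked roughly like this (a sketch; the exact flags and the source checkout are from memory, so treat them as assumptions):
# Pre-build all ops, or just the CPU Adam op, instead of relying on JIT compilation
DS_BUILD_OPS=1 pip install deepspeed
DS_BUILD_CPU_ADAM=1 pip install deepspeed
# Install from source with the CPU Adam op pre-built
git clone https://github.com/microsoft/DeepSpeed.git
cd DeepSpeed
DS_BUILD_CPU_ADAM=1 pip install .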
Here is a simplified version of the code I'm running on SageMaker (MultilabelTrainer, TextClassifier and the dataset/metric objects are custom and omitted here):
import os
from transformers import TrainingArguments, EarlyStoppingCallback, PrinterCallback

# Emulate a single-process distributed setup so DeepSpeed can initialize inside the notebook
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '9993' # modify if RuntimeError: Address already in use
os.environ['RANK'] = "0"
os.environ['LOCAL_RANK'] = "0"
os.environ['WORLD_SIZE'] = "1"
# Training Args
MAX_LEN = 512
TRAIN_BATCH_SIZE = 8
VAL_BATCH_SIZE = 8
EPOCHS = 1
LEARNING_RATE = 1e-05
args = TrainingArguments(
output_dir = "../flujo_nlp/outputs/",
overwrite_output_dir = True,
per_device_train_batch_size = TRAIN_BATCH_SIZE,
per_device_eval_batch_size = VAL_BATCH_SIZE,
learning_rate = LEARNING_RATE,
weight_decay = 0.01,
max_grad_norm = 1.0,
num_train_epochs = EPOCHS, # if max_steps is set, it takes precedence for training
max_steps = 2000, # 2000
evaluation_strategy = "steps",
eval_steps = 200, # 200
lr_scheduler_type = 'linear',
warmup_ratio = 0.0,
warmup_steps = 0,
logging_dir = "../flujo_nlp/logs/",
logging_strategy = 'steps',
logging_steps = 200,
seed = 42,
fp16 = False,
dataloader_drop_last = False,
dataloader_num_workers = 0,
label_names = ["labels"],
load_best_model_at_end = True,
metric_for_best_model = "eval_loss",
greater_is_better = False,
ignore_data_skip = False,
deepspeed = "deepspeed_config_1gpu.json"
)
cbks = [
EarlyStoppingCallback(early_stopping_patience = 2, early_stopping_threshold = 0),
PrinterCallback()
]
# Trainer
trainer = MultilabelTrainer(
num_labels = n_labels,
loss_fct = loss,
model = TextClassifier(model_dict["model_path"], n_labels, loss, n_extra_layers = n_extra_layers),
args = args,
train_dataset = train_dataset,
eval_dataset = val_dataset,
compute_metrics = compute_metrics_fct,
callbacks = cbks
)
train_output = trainer.train()
My JSON config file (deepspeed_config_1gpu.json):
{
"fp16": {
"enabled": "auto",
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1
},
"optimizer": {
"type": "AdamW",
"params": {
"lr": "auto",
"betas": "auto",
"eps": "auto",
"weight_decay": "auto"
}
},
"scheduler": {
"type": "WarmupLR",
"params": {
"warmup_min_lr": "auto",
"warmup_max_lr": "auto",
"warmup_num_steps": "auto"
}
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 2e8,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 2e8,
"contiguous_gradients": true,
"cpu_offload": true
},
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"train_batch_size": "auto",
"steps_per_print": 2000,
"wall_clock_breakdown": false
}
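As a side note, the DeepSpeed log below warns that cpu_offload is deprecated in favour of offload_optimizer, so I assume the zero_optimization block will eventually need to look roughly like this (a sketch based on that warning; the pin_memory flag is my assumption):
"zero_optimization": {
  "stage": 2,
  "allgather_partitions": true,
  "allgather_bucket_size": 2e8,
  "overlap_comm": true,
  "reduce_scatter": true,
  "reduce_bucket_size": 2e8,
  "contiguous_gradients": true,
  "offload_optimizer": {
    "device": "cpu",
    "pin_memory": true
  }
}
Either spelling should still require the cpu_adam op to build, so I don't expect this alone to fix the error below.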
The output of running ds_report:
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
/bin/sh: line 0: type: llvm-config: not found
/bin/sh: line 0: type: llvm-config-9: not found
[WARNING] sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing. Can be fixed by: `apt install libaio-dev`.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch']
torch version .................... 1.7.1
torch cuda version ............... 10.1
nvcc version ..................... 10.1
deepspeed install path ........... ['/home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/deepspeed']
deepspeed info ................... 0.4.0+11e94e6, 11e94e6, master
deepspeed wheel compiled w. ...... torch 1.7, cuda 10.1
This is the stack trace I get when I run the code without pre-building the ops:
[2021-06-02 21:06:31,700] [INFO] [logging.py:60:log_dist] [Rank 0] DeepSpeed info: version=0.4.0+11e94e6, git-hash=11e94e6, git-branch=master
[2021-06-02 21:06:31,707] [WARNING] [config.py:80:_sanity_check] DeepSpeedConfig: cpu_offload is deprecated. Please use offload_optimizer.
[2021-06-02 21:06:31,858] [INFO] [utils.py:13:_initialize_parameter_parallel_groups] data_parallel_size: 1, parameter_parallel_size: 1
[2021-06-02 21:06:31,968] [INFO] [engine.py:173:__init__] DeepSpeed Flops Profiler Enabled: False
Using /home/ec2-user/.cache/torch_extensions as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/ec2-user/.cache/torch_extensions/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _run_ninja_build(build_directory, verbose, error_prefix)
1538 check=True,
-> 1539 env=env)
1540 else:
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
437 raise CalledProcessError(retcode, process.args,
--> 438 output=stdout, stderr=stderr)
439 return CompletedProcess(process.args, retcode, stdout, stderr)
CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-13-4a7b77bf5678> in <module>
46 )
47
---> 48 train_output = trainer.train()
49
50 # Evalua
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, **kwargs)
1112 if args.deepspeed:
1113 deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
-> 1114 self, num_training_steps=max_steps, resume_from_checkpoint=resume_from_checkpoint
1115 )
1116 self.model = deepspeed_engine.module
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/transformers/integrations.py in deepspeed_init(trainer, num_training_steps, resume_from_checkpoint)
520 config_params=config,
521 optimizer=optimizer,
--> 522 lr_scheduler=lr_scheduler,
523 )
524
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/deepspeed/__init__.py in initialize(args, model, optimizer, model_parameters, training_data, lr_scheduler, mpu, dist_init_required, collate_fn, config, config_params)
134 collate_fn=collate_fn,
135 config=config,
--> 136 config_params=config_params)
137 else:
138 assert mpu is None, "mpu must be None with pipeline parallelism"
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/deepspeed/runtime/engine.py in __init__(self, args, model, optimizer, model_parameters, training_data, lr_scheduler, mpu, dist_init_required, collate_fn, config, config_params, dont_change_device)
185 self.lr_scheduler = None
186 if model_parameters or optimizer:
--> 187 self._configure_optimizer(optimizer, model_parameters)
188 self._configure_lr_scheduler(lr_scheduler)
189 self._report_progress(0)
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/deepspeed/runtime/engine.py in _configure_optimizer(self, client_optimizer, model_parameters)
687 logger.info('Using client Optimizer as basic optimizer')
688 else:
--> 689 basic_optimizer = self._configure_basic_optimizer(model_parameters)
690 if self.global_rank == 0:
691 logger.info(
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/deepspeed/runtime/engine.py in _configure_basic_optimizer(self, model_parameters)
758 optimizer = DeepSpeedCPUAdam(model_parameters,
759 **optimizer_parameters,
--> 760 adamw_mode=effective_adam_w_mode)
761 else:
762 from deepspeed.ops.adam import FusedAdam
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/deepspeed/ops/adam/cpu_adam.py in __init__(self, model_params, lr, bias_correction, betas, eps, weight_decay, amsgrad, adamw_mode)
76 DeepSpeedCPUAdam.optimizer_id = DeepSpeedCPUAdam.optimizer_id + 1
77 self.adam_w_mode = adamw_mode
---> 78 self.ds_opt_adam = CPUAdamBuilder().load()
79
80 self.ds_opt_adam.create_adam(self.opt_id,
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py in load(self, verbose)
214 return importlib.import_module(self.absolute_name())
215 else:
--> 216 return self.jit_load(verbose)
217
218 def jit_load(self, verbose=True):
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/deepspeed/ops/op_builder/builder.py in jit_load(self, verbose)
251 extra_cuda_cflags=self.nvcc_args(),
252 extra_ldflags=self.extra_ldflags(),
--> 253 verbose=verbose)
254 build_duration = time.time() - start_build
255 if verbose:
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, keep_intermediates)
995 with_cuda,
996 is_python_module,
--> 997 keep_intermediates=keep_intermediates)
998
999
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, keep_intermediates)
1200 build_directory=build_directory,
1201 verbose=verbose,
-> 1202 with_cuda=with_cuda)
1203 finally:
1204 baton.release()
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _write_ninja_file_and_build_library(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda)
1298 build_directory,
1299 verbose,
-> 1300 error_prefix="Error building extension '{}'".format(name))
1301
1302
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/utils/cpp_extension.py in _run_ninja_build(build_directory, verbose, error_prefix)
1553 if hasattr(error, 'output') and error.output: # type: ignore
1554 message += ": {}".format(error.output.decode()) # type: ignore
-> 1555 raise RuntimeError(message) from e
1556
1557
RuntimeError: Error building extension 'cpu_adam'
And finally, the output I get if I try to pre-build while installing with DS_BUILD_CPU_ADAM=1 pip install deepspeed:
Collecting deepspeed
Downloading deepspeed-0.3.16.tar.gz (385 kB)
|████████████████████████████████| 385 kB 19.1 MB/s eta 0:00:01
Requirement already satisfied: torch>=1.2 in /home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages (from deepspeed) (1.7.1)
Requirement already satisfied: torchvision>=0.4.0 in /home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages (from deepspeed) (0.8.2)
Requirement already satisfied: tqdm in /home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages (from deepspeed) (4.61.0)
Collecting tensorboardX==1.8
Downloading tensorboardX-1.8-py2.py3-none-any.whl (216 kB)
|████████████████████████████████| 216 kB 46.3 MB/s eta 0:00:01
Collecting ninja
Downloading ninja-1.10.0.post2-py3-none-manylinux1_x86_64.whl (107 kB)
|████████████████████████████████| 107 kB 57.8 MB/s eta 0:00:01
Requirement already satisfied: numpy in /home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages (from deepspeed) (1.19.2)
Requirement already satisfied: psutil in /home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages (from deepspeed) (5.8.0)
Requirement already satisfied: protobuf>=3.2.0 in /home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages (from tensorboardX==1.8->deepspeed) (3.15.8)
Requirement already satisfied: six in /home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages (from tensorboardX==1.8->deepspeed) (1.15.0)
Requirement already satisfied: typing_extensions in /home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages (from torch>=1.2->deepspeed) (3.7.4.3)
Requirement already satisfied: dataclasses in /home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages (from torch>=1.2->deepspeed) (0.8)
Requirement already satisfied: pillow>=4.1.1 in /home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages (from torchvision>=0.4.0->deepspeed) (8.1.0)
Building wheels for collected packages: deepspeed
Building wheel for deepspeed (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /home/ec2-user/anaconda3/envs/pytorch_latest_p36/bin/python3.6 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-mg0l8e9x/deepspeed_429da1b49e0440b5894b5291a4e649c0/setup.py'"'"'; __file__='"'"'/tmp/pip-install-mg0l8e9x/deepspeed_429da1b49e0440b5894b5291a4e649c0/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-p5cawauv
cwd: /tmp/pip-install-mg0l8e9x/deepspeed_429da1b49e0440b5894b5291a4e649c0/
Complete output (254 lines):
DS_BUILD_OPS=0
/bin/sh: line 0: type: llvm-config: not found
/bin/sh: line 0: type: llvm-config-9: not found
[WARNING] sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing.
/bin/sh: line 0: type: llvm-config: not found
/bin/sh: line 0: type: llvm-config-9: not found
[WARNING] sparse_attn requires one of the following commands '['llvm-config', 'llvm-config-9']', but it does not exist!
[WARNING] async_io requires the libraries: ['libaio-dev'] but are missing.
Install Ops={'cpu_adam': 1, 'fused_adam': False, 'fused_lamb': False, 'sparse_attn': False, 'transformer': False, 'stochastic_transformer': False, 'utils': False, 'async_io': False}
fatal: not a git repository (or any of the parent directories): .git
version=0.3.16, git_hash=unknown, git_branch=unknown
install_requires=['torch>=1.2', 'torchvision>=0.4.0', 'tqdm', 'tensorboardX==1.8', 'ninja', 'numpy', 'psutil']
compatible_ops={'cpu_adam': True, 'fused_adam': True, 'fused_lamb': True, 'sparse_attn': False, 'transformer': True, 'stochastic_transformer': True, 'utils': True, 'async_io': False}
ext_modules=[<setuptools.extension.Extension('deepspeed.ops.adam.cpu_adam_op') at 0x7f10bd356668>]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/deepspeed
copying deepspeed/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed
copying deepspeed/constants.py -> build/lib.linux-x86_64-3.6/deepspeed
copying deepspeed/git_version_info_installed.py -> build/lib.linux-x86_64-3.6/deepspeed
copying deepspeed/git_version_info.py -> build/lib.linux-x86_64-3.6/deepspeed
copying deepspeed/env_report.py -> build/lib.linux-x86_64-3.6/deepspeed
creating build/lib.linux-x86_64-3.6/op_builder
copying op_builder/__init__.py -> build/lib.linux-x86_64-3.6/op_builder
copying op_builder/fused_lamb.py -> build/lib.linux-x86_64-3.6/op_builder
copying op_builder/transformer.py -> build/lib.linux-x86_64-3.6/op_builder
copying op_builder/utils.py -> build/lib.linux-x86_64-3.6/op_builder
copying op_builder/async_io.py -> build/lib.linux-x86_64-3.6/op_builder
copying op_builder/builder.py -> build/lib.linux-x86_64-3.6/op_builder
copying op_builder/fused_adam.py -> build/lib.linux-x86_64-3.6/op_builder
copying op_builder/cpu_adam.py -> build/lib.linux-x86_64-3.6/op_builder
copying op_builder/sparse_attn.py -> build/lib.linux-x86_64-3.6/op_builder
copying op_builder/stochastic_transformer.py -> build/lib.linux-x86_64-3.6/op_builder
creating build/lib.linux-x86_64-3.6/deepspeed/ops
copying deepspeed/ops/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/ops
copying deepspeed/ops/module_inject.py -> build/lib.linux-x86_64-3.6/deepspeed/ops
creating build/lib.linux-x86_64-3.6/deepspeed/module_inject
copying deepspeed/module_inject/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/module_inject
copying deepspeed/module_inject/inject.py -> build/lib.linux-x86_64-3.6/deepspeed/module_inject
copying deepspeed/module_inject/replace_module.py -> build/lib.linux-x86_64-3.6/deepspeed/module_inject
creating build/lib.linux-x86_64-3.6/deepspeed/utils
copying deepspeed/utils/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/utils
copying deepspeed/utils/zero_to_fp32.py -> build/lib.linux-x86_64-3.6/deepspeed/utils
copying deepspeed/utils/timer.py -> build/lib.linux-x86_64-3.6/deepspeed/utils
copying deepspeed/utils/logging.py -> build/lib.linux-x86_64-3.6/deepspeed/utils
copying deepspeed/utils/distributed.py -> build/lib.linux-x86_64-3.6/deepspeed/utils
creating build/lib.linux-x86_64-3.6/deepspeed/elasticity
copying deepspeed/elasticity/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/elasticity
copying deepspeed/elasticity/config.py -> build/lib.linux-x86_64-3.6/deepspeed/elasticity
copying deepspeed/elasticity/constants.py -> build/lib.linux-x86_64-3.6/deepspeed/elasticity
copying deepspeed/elasticity/elasticity.py -> build/lib.linux-x86_64-3.6/deepspeed/elasticity
creating build/lib.linux-x86_64-3.6/deepspeed/launcher
copying deepspeed/launcher/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/launcher
copying deepspeed/launcher/constants.py -> build/lib.linux-x86_64-3.6/deepspeed/launcher
copying deepspeed/launcher/launch.py -> build/lib.linux-x86_64-3.6/deepspeed/launcher
copying deepspeed/launcher/runner.py -> build/lib.linux-x86_64-3.6/deepspeed/launcher
copying deepspeed/launcher/multinode_runner.py -> build/lib.linux-x86_64-3.6/deepspeed/launcher
creating build/lib.linux-x86_64-3.6/deepspeed/pipe
copying deepspeed/pipe/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/pipe
creating build/lib.linux-x86_64-3.6/deepspeed/profiling
copying deepspeed/profiling/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/profiling
copying deepspeed/profiling/config.py -> build/lib.linux-x86_64-3.6/deepspeed/profiling
copying deepspeed/profiling/constants.py -> build/lib.linux-x86_64-3.6/deepspeed/profiling
creating build/lib.linux-x86_64-3.6/deepspeed/runtime
copying deepspeed/runtime/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime
copying deepspeed/runtime/config_utils.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime
copying deepspeed/runtime/dataloader.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime
copying deepspeed/runtime/engine.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime
copying deepspeed/runtime/config.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime
copying deepspeed/runtime/utils.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime
copying deepspeed/runtime/constants.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime
copying deepspeed/runtime/lr_schedules.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime
copying deepspeed/runtime/progressive_layer_drop.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime
copying deepspeed/runtime/csr_tensor.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime
creating build/lib.linux-x86_64-3.6/deepspeed/ops/sparse_attention
copying deepspeed/ops/sparse_attention/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/sparse_attention
copying deepspeed/ops/sparse_attention/sparse_self_attention.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/sparse_attention
copying deepspeed/ops/sparse_attention/matmul.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/sparse_attention
copying deepspeed/ops/sparse_attention/softmax.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/sparse_attention
copying deepspeed/ops/sparse_attention/bert_sparse_self_attention.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/sparse_attention
copying deepspeed/ops/sparse_attention/sparsity_config.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/sparse_attention
copying deepspeed/ops/sparse_attention/sparse_attention_utils.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/sparse_attention
creating build/lib.linux-x86_64-3.6/deepspeed/ops/aio
copying deepspeed/ops/aio/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/aio
creating build/lib.linux-x86_64-3.6/deepspeed/ops/transformer
copying deepspeed/ops/transformer/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/transformer
copying deepspeed/ops/transformer/transformer.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/transformer
creating build/lib.linux-x86_64-3.6/deepspeed/ops/lamb
copying deepspeed/ops/lamb/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/lamb
copying deepspeed/ops/lamb/fused_lamb.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/lamb
creating build/lib.linux-x86_64-3.6/deepspeed/ops/adam
copying deepspeed/ops/adam/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/adam
copying deepspeed/ops/adam/fused_adam.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/adam
copying deepspeed/ops/adam/multi_tensor_apply.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/adam
copying deepspeed/ops/adam/cpu_adam.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/adam
creating build/lib.linux-x86_64-3.6/deepspeed/ops/op_builder
copying deepspeed/ops/op_builder/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/op_builder
copying deepspeed/ops/op_builder/fused_lamb.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/op_builder
copying deepspeed/ops/op_builder/transformer.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/op_builder
copying deepspeed/ops/op_builder/utils.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/op_builder
copying deepspeed/ops/op_builder/async_io.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/op_builder
copying deepspeed/ops/op_builder/builder.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/op_builder
copying deepspeed/ops/op_builder/fused_adam.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/op_builder
copying deepspeed/ops/op_builder/cpu_adam.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/op_builder
copying deepspeed/ops/op_builder/sparse_attn.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/op_builder
copying deepspeed/ops/op_builder/stochastic_transformer.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/op_builder
creating build/lib.linux-x86_64-3.6/deepspeed/ops/sparse_attention/trsrc
copying deepspeed/ops/sparse_attention/trsrc/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/ops/sparse_attention/trsrc
creating build/lib.linux-x86_64-3.6/deepspeed/profiling/flops_profiler
copying deepspeed/profiling/flops_profiler/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/profiling/flops_profiler
copying deepspeed/profiling/flops_profiler/profiler.py -> build/lib.linux-x86_64-3.6/deepspeed/profiling/flops_profiler
creating build/lib.linux-x86_64-3.6/deepspeed/runtime/activation_checkpointing
copying deepspeed/runtime/activation_checkpointing/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/activation_checkpointing
copying deepspeed/runtime/activation_checkpointing/config.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/activation_checkpointing
copying deepspeed/runtime/activation_checkpointing/checkpointing.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/activation_checkpointing
creating build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
copying deepspeed/runtime/zero/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
copying deepspeed/runtime/zero/config.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
copying deepspeed/runtime/zero/stage3.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
copying deepspeed/runtime/zero/linear.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
copying deepspeed/runtime/zero/utils.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
copying deepspeed/runtime/zero/offload_config.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
copying deepspeed/runtime/zero/constants.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
copying deepspeed/runtime/zero/stage2.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
copying deepspeed/runtime/zero/contiguous_memory_allocator.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
copying deepspeed/runtime/zero/tiling.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
copying deepspeed/runtime/zero/test.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
copying deepspeed/runtime/zero/offload_constants.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
copying deepspeed/runtime/zero/stage1.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
copying deepspeed/runtime/zero/partition_parameters.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/zero
creating build/lib.linux-x86_64-3.6/deepspeed/runtime/swap_tensor
copying deepspeed/runtime/swap_tensor/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/swap_tensor
copying deepspeed/runtime/swap_tensor/aio_config.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/swap_tensor
copying deepspeed/runtime/swap_tensor/utils.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/swap_tensor
copying deepspeed/runtime/swap_tensor/pipelined_optimizer_swapper.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/swap_tensor
copying deepspeed/runtime/swap_tensor/constants.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/swap_tensor
copying deepspeed/runtime/swap_tensor/async_swapper.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/swap_tensor
copying deepspeed/runtime/swap_tensor/optimizer_utils.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/swap_tensor
copying deepspeed/runtime/swap_tensor/partitioned_optimizer_swapper.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/swap_tensor
copying deepspeed/runtime/swap_tensor/partitioned_param_swapper.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/swap_tensor
creating build/lib.linux-x86_64-3.6/deepspeed/runtime/pipe
copying deepspeed/runtime/pipe/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/pipe
copying deepspeed/runtime/pipe/engine.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/pipe
copying deepspeed/runtime/pipe/p2p.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/pipe
copying deepspeed/runtime/pipe/schedule.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/pipe
copying deepspeed/runtime/pipe/module.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/pipe
copying deepspeed/runtime/pipe/topology.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/pipe
creating build/lib.linux-x86_64-3.6/deepspeed/runtime/compression
copying deepspeed/runtime/compression/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/compression
copying deepspeed/runtime/compression/cupy.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/compression
creating build/lib.linux-x86_64-3.6/deepspeed/runtime/fp16
copying deepspeed/runtime/fp16/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/fp16
copying deepspeed/runtime/fp16/unfused_optimizer.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/fp16
copying deepspeed/runtime/fp16/fused_optimizer.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/fp16
copying deepspeed/runtime/fp16/loss_scaler.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/fp16
creating build/lib.linux-x86_64-3.6/deepspeed/runtime/comm
copying deepspeed/runtime/comm/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/comm
copying deepspeed/runtime/comm/mpi.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/comm
copying deepspeed/runtime/comm/nccl.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/comm
creating build/lib.linux-x86_64-3.6/deepspeed/runtime/fp16/onebit
copying deepspeed/runtime/fp16/onebit/__init__.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/fp16/onebit
copying deepspeed/runtime/fp16/onebit/adam.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/fp16/onebit
copying deepspeed/runtime/fp16/onebit/lamb.py -> build/lib.linux-x86_64-3.6/deepspeed/runtime/fp16/onebit
running egg_info
writing deepspeed.egg-info/PKG-INFO
writing dependency_links to deepspeed.egg-info/dependency_links.txt
writing requirements to deepspeed.egg-info/requires.txt
writing top-level names to deepspeed.egg-info/top_level.txt
reading manifest file 'deepspeed.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.cc' under directory 'deepspeed'
warning: no files found matching '*.tr' under directory 'csrc'
warning: no files found matching '*.cc' under directory 'csrc'
writing manifest file 'deepspeed.egg-info/SOURCES.txt'
creating build/lib.linux-x86_64-3.6/deepspeed/ops/csrc
creating build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/adam
copying deepspeed/ops/csrc/adam/compat.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/adam
copying deepspeed/ops/csrc/adam/cpu_adam.cpp -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/adam
copying deepspeed/ops/csrc/adam/custom_cuda_kernel.cu -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/adam
copying deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/adam
copying deepspeed/ops/csrc/adam/multi_tensor_adam.cu -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/adam
copying deepspeed/ops/csrc/adam/multi_tensor_apply.cuh -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/adam
creating build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio
creating build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/common
copying deepspeed/ops/csrc/aio/common/deepspeed_aio_common.cpp -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/common
copying deepspeed/ops/csrc/aio/common/deepspeed_aio_common.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/common
copying deepspeed/ops/csrc/aio/common/deepspeed_aio_types.cpp -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/common
copying deepspeed/ops/csrc/aio/common/deepspeed_aio_types.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/common
copying deepspeed/ops/csrc/aio/common/deepspeed_aio_utils.cpp -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/common
copying deepspeed/ops/csrc/aio/common/deepspeed_aio_utils.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/common
creating build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/py_lib
copying deepspeed/ops/csrc/aio/py_lib/deepspeed_aio_thread.cpp -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/py_lib
copying deepspeed/ops/csrc/aio/py_lib/deepspeed_aio_thread.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/py_lib
copying deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio.cpp -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/py_lib
copying deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/py_lib
copying deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio_handle.cpp -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/py_lib
copying deepspeed/ops/csrc/aio/py_lib/deepspeed_py_aio_handle.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/py_lib
copying deepspeed/ops/csrc/aio/py_lib/deepspeed_py_copy.cpp -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/py_lib
copying deepspeed/ops/csrc/aio/py_lib/deepspeed_py_copy.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/py_lib
copying deepspeed/ops/csrc/aio/py_lib/py_ds_aio.cpp -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/aio/py_lib
creating build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/StopWatch.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/Timer.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/context.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/cpu_adam.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/cublas_wrappers.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/custom_cuda_layers.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/dropout.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/ds_transformer_cuda.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/feed_forward.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/gelu.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/gemm_test.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/general_kernels.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/normalize_layer.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/softmax.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/strided_batch_gemm.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
copying deepspeed/ops/csrc/includes/type_shim.h -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/includes
creating build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/lamb
copying deepspeed/ops/csrc/lamb/fused_lamb_cuda.cpp -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/lamb
copying deepspeed/ops/csrc/lamb/fused_lamb_cuda_kernel.cu -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/lamb
creating build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/sparse_attention
copying deepspeed/ops/csrc/sparse_attention/utils.cpp -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/sparse_attention
creating build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/transformer
copying deepspeed/ops/csrc/transformer/cublas_wrappers.cu -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/transformer
copying deepspeed/ops/csrc/transformer/dropout_kernels.cu -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/transformer
copying deepspeed/ops/csrc/transformer/ds_transformer_cuda.cpp -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/transformer
copying deepspeed/ops/csrc/transformer/gelu_kernels.cu -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/transformer
copying deepspeed/ops/csrc/transformer/general_kernels.cu -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/transformer
copying deepspeed/ops/csrc/transformer/normalize_kernels.cu -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/transformer
copying deepspeed/ops/csrc/transformer/softmax_kernels.cu -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/transformer
copying deepspeed/ops/csrc/transformer/transform_kernels.cu -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/transformer
creating build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/utils
copying deepspeed/ops/csrc/utils/flatten_unflatten.cpp -> build/lib.linux-x86_64-3.6/deepspeed/ops/csrc/utils
copying deepspeed/ops/sparse_attention/trsrc/matmul.tr -> build/lib.linux-x86_64-3.6/deepspeed/ops/sparse_attention/trsrc
copying deepspeed/ops/sparse_attention/trsrc/softmax_bwd.tr -> build/lib.linux-x86_64-3.6/deepspeed/ops/sparse_attention/trsrc
copying deepspeed/ops/sparse_attention/trsrc/softmax_fwd.tr -> build/lib.linux-x86_64-3.6/deepspeed/ops/sparse_attention/trsrc
running build_ext
building 'deepspeed.ops.adam.cpu_adam_op' extension
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/csrc
creating build/temp.linux-x86_64-3.6/csrc/adam
/home/ec2-user/anaconda3/envs/pytorch_latest_p36/bin/x86_64-conda-linux-gnu-cc -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/ec2-user/anaconda3/envs/pytorch_latest_p36/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/ec2-user/anaconda3/envs/pytorch_latest_p36/include -fPIC -Icsrc/includes -I/usr/local/cuda-10.1/include -I/home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/include -I/home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/include/TH -I/home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.1/include -I/home/ec2-user/anaconda3/envs/pytorch_latest_p36/include/python3.6m -c csrc/adam/cpu_adam.cpp -o build/temp.linux-x86_64-3.6/csrc/adam/cpu_adam.o -O3 -std=c++14 -L/usr/local/cuda-10.1/lib64 -lcudart -lcublas -g -Wno-reorder -march=native -fopenmp -D__AVX256__ -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
/usr/local/cuda-10.1/bin/nvcc -Icsrc/includes -I/usr/local/cuda-10.1/include -I/home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/include -I/home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/include/TH -I/home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda-10.1/include -I/home/ec2-user/anaconda3/envs/pytorch_latest_p36/include/python3.6m -c csrc/adam/custom_cuda_kernel.cu -o build/temp.linux-x86_64-3.6/csrc/adam/custom_cuda_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -DTORCH_EXTENSION_NAME=cpu_adam_op -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin /home/ec2-user/anaconda3/envs/pytorch_latest_p36/bin/x86_64-conda-linux-gnu-cc
In file included from /usr/local/cuda-10.1/include/cuda_runtime.h:83,
from <command-line>:
/usr/local/cuda-10.1/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
| ^~~~~
error: command '/usr/local/cuda-10.1/bin/nvcc' failed with exit status 1
Any input on how to solve this issue would be very much appreciated!