Repo for coding challenge done for a job interview.
It involved running inference on the new OPT models and trying to make that inference faster and more efficient.
For example, to optimize and quantize the 13b model, run:
$ mkdir 13b
$ cd 13b
$ python3 ../quantization_tools.py --specific_model=opt-13b --optimize --quantize
$ cd ..
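quantization_tools.py is this repo's own script, so its internals aren't reproduced here; as a rough, hedged sketch of the kind of graph-optimization/fusion pass it presumably performs, using onnxruntime.transformers directly (the paths and the GPT-2-style fusion pattern are assumptions, not the script's actual code):

from onnxruntime.transformers import optimizer

# Sketch only: fuse/optimize an exported OPT decoder graph.
# The input path is hypothetical; opt-13b uses 40 attention heads and a hidden size of 5120.
optimized = optimizer.optimize_model(
    "13b/decoder_model.onnx",   # assumed location of the exported ONNX graph
    model_type="gpt2",          # assumption: decoder-only (GPT-2-style) fusion patterns for OPT
    num_heads=40,
    hidden_size=5120,
)
optimized.save_model_to_file("13b/decoder_model_optimized.onnx")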
Again using 13b as an example, you can benchmark with:
$ python3 benchmark.py --model_version=13b --num_samples=10
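benchmark.py is also repo-specific; conceptually, a latency benchmark over num_samples runs might look like the following minimal sketch (the model path, input names, and dummy inputs are assumptions, not the script's actual code):

import time
import numpy as np
import onnxruntime as ort

# Sketch only: average single-pass latency of a (hypothetical) quantized decoder.
session = ort.InferenceSession("13b/decoder_model_quantized.onnx")
input_ids = np.array([[2, 100, 200, 300]], dtype=np.int64)  # dummy token ids
attention_mask = np.ones_like(input_ids)
latencies = []
for _ in range(10):  # num_samples
    start = time.perf_counter()
    session.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
    latencies.append(time.perf_counter() - start)
print(f"mean latency: {1000 * sum(latencies) / len(latencies):.1f} ms")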
- Set up app_config.json: optimization_level can be "baseline", "onnx", "onnxruntime", "fusion", or "quantized"; opt_version selects the model version ("125m", "250m", etc.). An example config is shown below.
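For instance, a minimal app_config.json selecting the quantized 13b model might look like this (only the two fields named above are assumed; any other keys the app reads are not shown):

{
    "optimization_level": "quantized",
    "opt_version": "13b"
}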
$ python3 app.py
The app above is not yet properly implemented.
Additionally, one line in optimum/onnxruntime had to be changed, namely adding DisableShapeInference to the extra options (a sketch of the equivalent call is shown below).
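The exact call site inside optimum/onnxruntime isn't reproduced here; the effect of the change corresponds to passing that extra option through to onnxruntime's quantizer, roughly like this hedged sketch (paths are hypothetical):

from onnxruntime.quantization import quantize_dynamic, QuantType

# Sketch only: dynamic int8 quantization with shape inference disabled via extra_options.
quantize_dynamic(
    "13b/decoder_model_optimized.onnx",   # hypothetical input path
    "13b/decoder_model_quantized.onnx",   # hypothetical output path
    weight_type=QuantType.QInt8,
    extra_options={"DisableShapeInference": True},  # the one-line change described above
)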