TensorRT-LLM is a Python library that facilitates the creation and optimization of Large Language Models (LLMs) for efficient inference on NVIDIA GPUs. TensorRT-LLM supports various quantization modes, including INT4 and INT8 weights, along with FP16 activations, allowing users to maximize performance and minimize memory usage. It also provides pre-defined models that can be easily customized and extended to meet specific requirements, and it integrates with the NVIDIA Triton Inference Server for production deployment.
We can run the TensorRT-LLM benchmark for two models: Llama2 7B Chat and Mistral-7B v0.1 Instruct. Here is how to run the benchmark for TensorRT-LLM:
./bench_tensorrtllm/bench.sh \
--prompt <value> \ # Enter a prompt string
--max_tokens <value> \ # Maximum number of tokens to output
--repetitions <value> \ # Number of repetitions to be made for the prompt.
--device <cpu/cuda/metal> \ # The device on which we want to benchmark.
--model_name <name-of-the-model> # The name of the model. (options: 'llama' for Llama2 and 'mistral' for Mistral-7B-v0.1)
To get started quickly you can simply run:
./bench_tensorrtllm/bench.sh -d cuda -n llama
This uses all the default values (see the bench.sh file) and runs the benchmarks for the Llama 2 and Mistral models. You can find all the benchmark results for Llama 2 7B Chat here, and Mistral 7B Instruct here.
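If you prefer to drive the script from Python with explicit values instead of the defaults, a minimal sketch could look like the following. It only assembles the command line from the long flags listed above and prints it; the helper name `build_bench_command` and the example values (100 tokens, 3 repetitions) are assumptions made for illustration and are not part of the repository.

```python
import shlex
from typing import List, Optional

# Path to the benchmark script, as used in the commands above.
BENCH_SCRIPT = "./bench_tensorrtllm/bench.sh"


def build_bench_command(
    model_name: str,
    device: str = "cuda",
    prompt: Optional[str] = None,
    max_tokens: Optional[int] = None,
    repetitions: Optional[int] = None,
) -> List[str]:
    """Assemble the argument list for bench.sh (hypothetical helper).

    Flags left as None are omitted so that bench.sh falls back to its defaults.
    """
    cmd = [BENCH_SCRIPT, "--device", device, "--model_name", model_name]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    if max_tokens is not None:
        cmd += ["--max_tokens", str(max_tokens)]
    if repetitions is not None:
        cmd += ["--repetitions", str(repetitions)]
    return cmd


# Print one command per supported model; the values are illustrative only.
for name in ("llama", "mistral"):
    print(shlex.join(build_bench_command(name, max_tokens=100, repetitions=3)))
```

To actually launch a run, the returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.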
Please note: we treat the outputs of the raw (float32 precision) weights from HuggingFace Transformers (PyTorch backend) as our ground truth. We compare this ground truth with the model's output in the other three precisions, i.e. float16, int8 and int4. This is our system prompt:
You answers should always be to the point, precise and not more than 2 sentences strictly
Question | float32 | float16 | int8 | int4 | Ground Truth |
---|---|---|---|---|---|
I'm making pancakes for breakfast. I added a cup of flour, a teaspoon of salt, and a few tablespoons of sugar to a bowl. I stirred it together, then added a cup of milk, a beaten egg, and a few tablespoons of oil, and stirred until just mixed. Then I put 1/4 a cup on a hot frying pan, and flipped it when brown. But they're terrible! Why? List the main reason. Answer as much precise as possible with one sentence. | The main reason for the terrible pancakes could be that the batter was too thick, resulting in a dense and dry texture. | The main reason for the terrible pancakes could be that the batter was too thick, resulting in a dense and dry texture. | The main reason for the terrible pancakes could be that the batter was too thick, resulting in dense and dry pancakes. | The main reason for the terrible pancakes could be that the batter was not allowed to rest before cooking, which caused the flour to become tough and dense. | The main reason for the terrible pancakes could be that the batter was too thick, which resulted in a dry and tough texture. |
42 birds are sitting on a tree branch. A hunter passes, shoots one dead, and misses two. How many birds are left on the branch? Answer as much precise as possible with one sentence. | One bird is left on the branch. | One bird is left on the branch. | One bird is left on the branch. | There is one bird left on the branch. | One bird is left on the branch. |
How many mistakes can you find in the following sentence? Sentence: Its to much to loose if your talking about hundred’s of dollars. Answer as much precise as possible with one sentence. | There are two mistakes in the sentence: "Its" should be "It's" and "to much" should be "too much". | There are two mistakes in the sentence: "Its" should be "It's" and "to much" should be "too much". | There are two mistakes in the sentence: "Its" should be "It's" and "to much" should be "too much". | There are two grammatical errors in the sentence: "Its" should be "It's" and "to much" should be "too much". | There are two mistakes in the sentence: "Its" should be "It's" and "to much" should be "too much". |
You are an expert on agricultural chemical compounds. Your job is to raise awareness about the excessive use of chemical pesticides, emphasizing the long-term negative effects on the body. Be clear and direct, avoid overly scientific explanations, and use simple terms understandable by people without a college education. Answer as much precise as possible with one sentence. | Excessive use of chemical pesticides can lead to long-term negative health effects, including cancer, neurological damage, and reproductive issues. | Excessive use of chemical pesticides can lead to long-term negative health effects, including cancer, neurological damage, and reproductive issues. | Excessive use of chemical pesticides can lead to long-term negative health effects, including cancer, neurological damage, and reproductive issues. | Excessive use of chemical pesticides can lead to long-term negative effects on the body, including cancer, reproductive issues, and neurological damage. | Excessive use of chemical pesticides can lead to long-term negative health effects, including cancer, neurological damage, and reproductive issues. |
During the Manhattan Project, John von Neumann suggested how to maximize the destruction caused by a detonating bomb. Which was his idea? Answer as much precise as possible with one sentence. | John von Neumann proposed the use of a "sphere of influence" to maximize the destruction caused by a detonating bomb, which involved creating a large, spherical shockwave that would spread outwards and cause widespread damage. | John von Neumann proposed the use of a "sphere of influence" to maximize the destruction caused by a detonating bomb, which involved creating a large, spherical shockwave that would spread outwards and cause widespread damage. | John von Neumann proposed the use of a "sphere of influence" to maximize the destruction caused by a detonating bomb, which involved calculating the optimal altitude and angle of detonation to create a larger explosion radius. | John von Neumann suggested using a spherical configuration of bombs to maximize the destructive force of a detonation. | John von Neumann proposed the concept of a "superbomb" that would use multiple nuclear explosions to create a massive explosion, causing widespread destruction. |
Question | float32 | float16 | int8 | int4 | Ground Truth |
---|---|---|---|---|---|
I'm making pancakes for breakfast. I added a cup of flour, a teaspoon of salt, and a few tablespoons of sugar to a bowl. I stirred it together, then added a cup of milk, a beaten egg, and a few tablespoons of oil, and stirred until just mixed. Then I put 1/4 a cup on a hot frying pan, and flipped it when brown. But they're terrible! Why? List the main reason. Answer as much precise as possible with one sentence. | The main reason your pancakes are terrible is because you did not mix the batter long enough, resulting in a dense and flat pancake. | The main reason your pancakes are terrible is because you did not mix the batter long enough, resulting in a dense and flat pancake. | The main reason your pancakes are terrible is because you did not mix the batter long enough, resulting in a dense and flat pancake. | The main reason your pancakes are terrible is because you did not mix the ingredients well enough, resulting in a dense and flat pancake. | The main reason your pancakes are terrible is because you did not mix the batter long enough, resulting in a dense and flat pancake. |
42 birds are sitting on a tree branch. A hunter passes, shoots one dead, and misses two. How many birds are left on the branch? Answer as much precise as possible with one sentence. | After the hunter shoots and kills one bird, 41 birds remain on the branch. | After the hunter shoots and kills one bird, 41 birds remain on the branch. | After the hunter shoots and kills one bird, 41 birds remain on the branch. | After the hunter shoots and kills one bird, 41 birds remain on the branch. | After the hunter shoots and kills one bird, 41 birds remain on the branch. |
How many mistakes can you find in the following sentence? Sentence: Its to much to loose if your talking about hundred’s of dollars. Answer as much precise as possible with one sentence. | There are 3 mistakes in the sentence: * "to loose" should be "to lose" * "hundred's" should be "hundreds" * "if your" should be "if you're" | There are 3 mistakes in the sentence: * "to loose" should be "to lose" * "hundred's" should be "hundreds" * "if your" should be "if you're" | There are 3 mistakes in the sentence: * "to loose" should be "to lose" * "hundred's" should be "hundreds" * "if your" should be "if you're" | There are 3 mistakes in the sentence: 1. "to loose" should be "too loose" (a common spelling mistake). 2. "hundred's" should be "hundreds" (the correct spelling). 3. The phrase "if your talking" is unnecessary and can be removed. The corrected sentence would be: "It's too much to lose if you're talking about hundreds of dollars." | There are 2 mistakes in the sentence: "to loose" should be "to lose" and "hundred's" should be "hundreds". |
You are an expert on agricultural chemical compounds. Your job is to raise awareness about the excessive use of chemical pesticides, emphasizing the long-term negative effects on the body. Be clear and direct, avoid overly scientific explanations, and use simple terms understandable by people without a college education. Answer as much precise as possible with one sentence. | Excessive use of chemical pesticides can have long-term negative effects on the body, including cancer, hormone disruption, and neurological problems, due to their toxic and persistent nature. Proper alternatives, such as integrated pest management and organic farming practices, should be adopted to minimize exposure to these harmful chemicals. | Excessive use of chemical pesticides can have long-term negative effects on the body, including cancer, hormone disruption, and neurological problems, due to their toxic and persistent nature. Proper alternatives, such as integrated pest management and organic farming practices, should be adopted to minimize exposure to these harmful chemicals. | Excessive use of chemical pesticides can have long-term negative effects on the body, including increased risk of cancer, hormone disruption, and neurological problems, which can persist even after exposure has stopped. Proper alternatives, such as integrated pest management and organic farming practices, should be adopted to minimize these risks and ensure a healthier food supply. | Excessive use of chemical pesticides can have long-term negative effects on the body, including cancer, hormone disruption, and neurological problems, due to the toxic chemicals they contain. These chemicals can also contaminate soil, water, and air, causing widespread harm to the environment and wildlife. | Excessive use of chemical pesticides can have long-term negative effects on the body, including cancer, hormone disruption, and neurological problems, due to their toxic and persistent nature, which can accumulate in the body over time and cause irreversible damage. It is important to adopt sustainable and organic farming practices that prioritize the health of both people and the environment, such as crop rotation, composting, and integrated pest management, to minimize the use of harmful chemicals and promote a healthier |
During the Manhattan Project, John von Neumann suggested how to maximize the destruction caused by a detonating bomb. Which was his idea? Answer as much precise as possible with one sentence. | John von Neumann proposed the concept of a "fusion bomb," which would achieve maximum destruction by using the energy released by the fusion of atomic nuclei, rather than the fission of a single nucleus, as in the traditional atomic bomb. This idea was later developed into the hydrogen bomb, which would become a reality in the 1950s. | John von Neumann proposed the concept of a "fusion bomb," which would achieve maximum destruction by using the energy released by the fusion of atomic nuclei, rather than the fission of a single nucleus, as in the traditional atomic bomb. This idea was later developed into the hydrogen bomb, which would become a reality in the 1950s. | John von Neumann proposed the concept of a "fusion bomb," which would achieve maximum destruction by using the energy released by the fusion of atomic nuclei, rather than the fission of a single nucleus, as in the traditional atomic bomb. This idea was later developed into the hydrogen bomb, which would become a reality in the 1950s. | John von Neumann proposed the concept of a "fission-fusion" bomb, which would use the energy released by a fission reaction to compress and ignite a second fission bomb, resulting in a much more powerful and destructive explosion. This idea was later developed into the Teller-Ulam design for the hydrogen bomb. | John von Neumann proposed the concept of a "fusion bomb," which would achieve maximum destruction through the simultaneous detonation of a fission bomb and a fusion reaction, creating a chain reaction of nuclear reactions that would release an enormous amount of energy. This idea was later developed into the hydrogen bomb. |
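
The tables above are meant to be compared by eye, but a rough numeric check can help spot where a precision drifts from the ground truth. The sketch below is one possible approach, not the method used to produce these tables: it scores word-level overlap with Python's standard `difflib.SequenceMatcher`. The helper name `similarity` is hypothetical, and the strings are copied from the "42 birds" row of the first table. A lexical score like this says nothing about whether an answer is factually correct.

```python
from difflib import SequenceMatcher


def similarity(candidate: str, reference: str) -> float:
    """Word-level similarity in [0, 1]; 1.0 means identical word sequences."""
    cand_words = candidate.lower().split()
    ref_words = reference.lower().split()
    return SequenceMatcher(None, cand_words, ref_words).ratio()


# Strings taken from the "42 birds" row of the first table above.
ground_truth = "One bird is left on the branch."
outputs = {
    "float32": "One bird is left on the branch.",
    "float16": "One bird is left on the branch.",
    "int8": "One bird is left on the branch.",
    "int4": "There is one bird left on the branch.",
}

# Report how closely each precision's answer matches the ground truth.
for precision, text in outputs.items():
    print(f"{precision}: {similarity(text, ground_truth):.2f}")
```

Identical answers score 1.0, while the reworded int4 answer scores lower even though it carries the same meaning, which is why such a score should only be used as a first filter.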
- Running this benchmark requires the HuggingFace Llama2-7B Chat weights, so it assumes that you have already agreed to the required terms and conditions and have been verified to download the weights.
- TensorRT-LLM only works with CUDA, so it does not support Metal or CPU.
- For benchmarking quantized models at INT4/INT8 precision, TensorRT-LLM does not fully quantize the model to INT8/INT4; rather, it applies a mixed-precision quantization technique. So instead of INT4/8 we use Float16-INT4/8 quantized models. You can learn more about it in the TensorRT-LLM Llama2 example.
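
To make the last note more concrete, here is a toy sketch of the general idea behind weight-only (mixed-precision) quantization: weights are stored as small integers with a per-row scale, while activations stay in floating point. This is an illustration under simplifying assumptions and not TensorRT-LLM's implementation. Plain Python floats stand in for FP16 activations, the quantization is symmetric INT8 per row, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_row_int8(row: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-row quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [round(w / scale) for w in row], scale


def matvec_weight_only(
    q_rows: List[List[int]], scales: List[float], activations: List[float]
) -> List[float]:
    """Multiply integer weights by float activations, rescaling each row's result."""
    out = []
    for q_row, scale in zip(q_rows, scales):
        # Activations are never quantized; only the stored weights are integers.
        acc = sum(q * x for q, x in zip(q_row, activations))
        out.append(acc * scale)
    return out


# Tiny made-up weight matrix and activation vector for illustration.
weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]
activations = [1.0, 2.0, 3.0]

quantized = [quantize_row_int8(row) for row in weights]
q_rows = [q for q, _ in quantized]
scales = [s for _, s in quantized]

approx = matvec_weight_only(q_rows, scales, activations)
exact = [sum(w * x for w, x in zip(row, activations)) for row in weights]
print("quantized result:", approx)
print("full-precision result:", exact)
```

The two results differ slightly because of rounding in the stored weights, which is the kind of small drift visible in the int8 and int4 columns of the tables above.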