
@SrivastavaKshitij (Contributor)

Hi @jaybdub:

Hope you are doing well. This PR:

  1. fixes the documentation for the QAT section
  2. makes the output of test.py much more readable
  3. adds a pSNR (peak signal-to-noise ratio) test for all the converters

I have been using the pSNR test for quite some time to check the correctness of TRT conversions instead of max difference, because pSNR is a more robust test than max difference.

There is an issue with the interpolate layer in TRT >= 7 when align_corners=True. I added a unit test with a similar set of parameters, for which internal model metrics were showing a regression. The unit test passed the repo's max-difference test but failed the pSNR test. I will come back to this test case at the end.

Usually, when we calculate pSNR at FP32, if pSNR >= 100 dB it is safe to say that the conversion was fine. The best case is when pSNR = NaN, i.e., the MSE (mean squared error) was zero, which makes pSNR infinite and means that the two output tensors (PyTorch model output and TRT model output) were identical.
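
For reference, a minimal sketch of such a pSNR check (the helper name psnr_db and the choice of the reference tensor's peak value are my assumptions, not necessarily what the PR implements):

```python
import math
import torch

def psnr_db(y_ref: torch.Tensor, y_test: torch.Tensor) -> float:
    """Peak signal-to-noise ratio between a reference and a test tensor, in dB.

    Returns inf when the MSE is zero, i.e. the two outputs are identical.
    """
    mse = torch.mean((y_ref - y_test) ** 2).item()
    if mse == 0.0:
        return float("inf")
    peak = y_ref.abs().max().item()  # take the reference output's dynamic range as the peak
    return 10.0 * math.log10(peak ** 2 / mse)
```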

Although I haven't added a pSNR test at FP16 to the repo, I can summarize my findings here so that, if you like, we can add those later on.

Inspired by image-processing basics:

  1. If pSNR at FP32 >= 100 dB, we can safely say that the conversion is good.
  2. pSNR at FP16 ~= (pSNR at FP32) / 2 - x, where x ~= 10 dB.

Based on numerous experiments that I ran, x is around 10 dB and accounts for variance.

There is no mathematical explanation for this, only a toolchain one.
TRT fuses layers and uses different kernels to execute the same layers after fusion, so these optimizations introduce some bias (as one would call it). However, if you look at the model as a whole, the net effect is zero.
Since we are comparing an unoptimized network with an optimized one, value by value, we can see that difference in the pSNR test. However, it will not show up when we run a PR.

For example:

pSNR at FP32 = 120 dB
pSNR at FP16 = 120 / 2 - 10 = 50 dB

For FP32, if pSNR is less than 100 dB (regardless of the above example), then there is a conversion issue.
For FP16, if pSNR is less than 50 dB (in the above example), then there is a conversion issue.

pSNR will drop if the MSE is high.
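
To make the rule concrete, the thresholds above could be folded into a hypothetical helper like this (the function name, the Optional FP16 argument, and the 10 dB margin default are my own, following the heuristic described earlier):

```python
from typing import Optional

def conversion_ok(psnr_fp32_db: float, psnr_fp16_db: Optional[float] = None) -> bool:
    """Pass/fail based on the rule-of-thumb thresholds discussed above."""
    if psnr_fp32_db < 100.0:                  # FP32: below 100 dB means a conversion issue
        return False
    if psnr_fp16_db is not None:
        expected = psnr_fp32_db / 2.0 - 10.0  # pSNR(FP16) ~= pSNR(FP32) / 2 - 10 dB
        if psnr_fp16_db < expected:           # FP16 falls short of the expected floor
            return False
    return True
```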

Now, back to the interpolate conversion issue:

| Name | Data Type | Input Shapes | torch2trt kwargs | Max Error | pSNR (dB) | MSE | FPS (PyTorch) | FPS (TensorRT) | Latency (PyTorch, ms) | Latency (TensorRT, ms) |
|---|---|---|---|---|---|---|---|---|---|---|
| torch2trt.converters.interpolate.test_bilinear_mode | float32 | [(1, 4, 12, 12)] | {} | 2.38E-07 | 160.7023 | 0.0000 | 4.08e+04 | 1.58e+04 | 0.0425 | 0.0799 |
| torch2trt.converters.interpolate.test_align_corner | float32 | [(1, 3, 12, 12)] | {} | 1.90E+00 | 14.5676 | 0.2077 | 4.21e+04 | 1.58e+04 | 0.0423 | 0.0792 |
| torch2trt.converters.interpolate.test_align_corner_functional | float32 | [(1, 3, 12, 12)] | {} | 2.01E+00 | 16.1512 | 0.1890 | 4.09e+04 | 1.54e+04 | 0.0433 | 0.0819 |
| torch2trt.converters.interpolate.test_bilinear_mode_odd_input_shape | float32 | [(1, 5, 13, 13)] | {} | 2.38E-07 | 155.6233 | 0.0000 | 4.2e+04 | 1.59e+04 | 0.0419 | 0.0798 |

As you can see, all four tests pass the max-difference test. However, the middle two tests are cases where align_corners=True and pSNR < 20 dB at FP32, which indicates a problem in the conversion and leads to degradation in model metrics. The first and fourth tests have align_corners=False, and their pSNR is well above 100 dB.
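
For context, a minimal reproduction of the failing configuration might look like the sketch below (my own illustration, not the exact unit test from the PR; it computes pSNR inline rather than reusing the psnr_db helper sketched above):

```python
import math
import torch
from torch2trt import torch2trt

# Bilinear resize with align_corners=True: the configuration that regresses on TRT >= 7
model = torch.nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True).cuda().eval()
x = torch.randn(1, 3, 12, 12).cuda()

model_trt = torch2trt(model, [x])

y, y_trt = model(x), model_trt(x)
mse = torch.mean((y - y_trt) ** 2).item()  # nonzero in the failing case
peak = y.abs().max().item()
print(10 * math.log10(peak ** 2 / mse))    # lands well below 100 dB on affected TRT versions
```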

I have already raised the issue with NVIDIA and it is being looked into. I think the pSNR test will make our unit tests more robust, and we should definitely add it.

Let me know if you have any questions.

Thanks

Kshitij Srivastava

@jaybdub (Contributor) commented Aug 9, 2021

Hi @SrivastavaKshitij ,

Thanks for sharing this. I just reviewed the PR, and it looks good to me.

I'm going to test it quickly and, assuming everything goes smoothly, it should be good to merge.

Best,
John

jaybdub merged commit 311f328 into NVIDIA-AI-IOT:master on Aug 9, 2021