20210619

[Bug caused by CUDA 10.2]

I finally got solov2_trt to compile, only to hit another problem at runtime.

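The build produced the demo_solov2_coco executable (under build/examples).
The platform provides solov2_lite_coco_704.onnx, which has to be converted into a TensorRT engine with onnx2trt.
See
< https://github.com/onnx/onnx-tensorrt >
Follow the installation steps there, then run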

onnx2trt my_model.onnx -o my_engine.trt
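
A slightly more defensive version of the conversion step, as a rough sketch only. convert_engine is a made-up helper name and the file names in the usage comment are placeholders. It uses nothing beyond the onnx2trt invocation shown above, and simply checks that the input file and the tool are present before converting.

```shell
# convert_engine ONNX_FILE ENGINE_FILE
# Hypothetical helper: wraps "onnx2trt <onnx> -o <engine>" with two sanity checks.
convert_engine() {
  onnx="$1"
  engine="$2"

  # Refuse to run if the ONNX model is missing.
  if [ ! -f "$onnx" ]; then
    echo "missing ONNX file: $onnx" >&2
    return 1
  fi

  # Refuse to run if onnx2trt has not been installed / is not on PATH.
  if ! command -v onnx2trt >/dev/null 2>&1; then
    echo "onnx2trt not found on PATH" >&2
    return 2
  fi

  onnx2trt "$onnx" -o "$engine"
}

# Usage (placeholder file names):
#   convert_engine solov2_lite_coco_704.onnx model.trt
```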

Once the TRT engine has been built, you can run
./demo_solov2_coco -trt_f model.trt -data_f a.mp4

And then, sure enough, there was a pitfall!
The error output:
OpenCV: FFMPEG: tag 0x47504a4d/'MJPG' is not supported with codec id 7 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
[06/19/2021-19:52:38] [F] [TRT] Assertion failed: cublasStatus == CUBLAS_STATUS_SUCCESS
../rtSafe/cublas/cublasLtWrapper.cpp:279
Aborting...
[06/19/2021-19:52:38] [E] [TRT] FAILED_EXECUTION: std::exception
[06/19/2021-19:52:38] [E] Execute Failed!
[06/19/2021-19:52:38] [I] input data size: 1486848Infer time is 0ms
[06/19/2021-19:52:38] [F] [TRT] Assertion failed: cublasStatus == CUBLAS_STATUS_SUCCESS
../rtSafe/cublas/cublasLtWrapper.cpp:279
Aborting...
[06/19/2021-19:52:38] [E] [TRT] FAILED_EXECUTION: std::exception
[06/19/2021-19:52:38] [E] Execute Failed!
[06/19/2021-19:52:38] [I] Infer time is 0ms
[06/19/2021-19:52:39] [F] [TRT] Assertion failed: cublasStatus == CUBLAS_STATUS_SUCCESS
../rtSafe/cublas/cublasLtWrapper.cpp:279
Aborting...
[06/19/2021-19:52:39] [E] [TRT] FAILED_EXECUTION: std::exception
[06/19/2021-19:52:39] [E] Execute Failed!
[06/19/2021-19:52:39] [I] Infer time is 0ms
[06/19/2021-19:52:39] [F] [TRT] Assertion failed: cublasStatus == CUBLAS_STATUS_SUCCESS
../rtSafe/cublas/cublasLtWrapper.cpp:279
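
When the output scrolls past quickly, it can help to save it to a file and count how often the cuBLAS assertion fires. The snippet below is only a sketch: count_cublas_failures and run.log are names invented here, and the search string is copied from the log above.

```shell
# count_cublas_failures LOGFILE
# Hypothetical helper: prints how many lines of LOGFILE contain the
# cuBLAS assertion message seen in the log above.
count_cublas_failures() {
  # -F: match the message as a fixed string; -c: print the number of matching lines.
  # "|| true" keeps the exit status 0 when the count is 0.
  grep -cF 'Assertion failed: cublasStatus == CUBLAS_STATUS_SUCCESS' "$1" || true
}

# Usage (run.log is a placeholder name):
#   ./demo_solov2_coco -trt_f model.trt -data_f a.mp4 2>&1 | tee run.log
#   count_cublas_failures run.log
```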


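After some digging, this turned out to be a CUDA 10.2 problem; downloading the patch (Patch 2) from the official site and installing it fixes it.
The official description of the patch: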
Patch 2 (Released Nov 17, 2020)
This patch fixes an issue in cuBLAS library batched GEMM APIs which caused silent corruption of data in uncommon cases with large batch counts in mixed precision and fast math.
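
Before and after installing the patch, it may be worth checking which CUDA toolkit and cuBLAS libraries the machine actually has. This is a sketch under assumptions: show_cuda_info is a made-up name, /usr/local/cuda is assumed as the default install location when CUDA_HOME is not set, and the exact text printed will depend on the installation.

```shell
# show_cuda_info
# Hypothetical helper: prints the CUDA toolkit version and the cuBLAS
# libraries known to the dynamic linker.
show_cuda_info() {
  # Assumption: the toolkit lives in $CUDA_HOME, or /usr/local/cuda if unset.
  cuda_home="${CUDA_HOME:-/usr/local/cuda}"

  if [ -f "$cuda_home/version.txt" ]; then
    # CUDA 10.x toolkits ship a plain-text version file.
    cat "$cuda_home/version.txt"
  elif command -v nvcc >/dev/null 2>&1; then
    nvcc --version
  else
    echo "CUDA toolkit not found under $cuda_home"
  fi

  # List cuBLAS shared libraries from the linker cache, if ldconfig is available.
  if command -v ldconfig >/dev/null 2>&1; then
    ldconfig -p | grep -i cublas || echo "no cuBLAS libraries in the linker cache"
  fi
}

# Usage:
#   show_cuda_info
```

Comparing the output from before and after the patch is a quick way to confirm that the installer actually touched the cuBLAS libraries.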

And with that, I had finally made it through every pitfall.

[Image: Screenshot from 2021-06-19 21-52-21.png]