TensorRT Summary 2

Notes on what I learned from studying the Tensorflow-TensorRT project.

https://github.com/ardianumam/Tensorflow-TensorRT

  1. Read input TensorFlow model
  2. Convert to frozen model ".pb"
  3. Convert (optimize) to TensorRT model
  4. Inference using TensorRT model
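
Steps 2 and 3 above can be sketched roughly as follows. This is a minimal sketch, assuming TensorFlow 1.x, where the TF-TRT integration lives under tensorflow.contrib.tensorrt; the function name optimize_frozen_graph, the workspace size, and the precision mode are illustrative assumptions, not values taken from the project.

```python
def optimize_frozen_graph(frozen_pb_path, output_names, max_batch_size=1):
    """Load a frozen .pb file and return a TensorRT-optimized GraphDef.

    Hypothetical helper; works only with TensorFlow 1.x.
    """
    # Imported lazily because tensorflow.contrib exists only in TensorFlow 1.x.
    import tensorflow as tf
    import tensorflow.contrib.tensorrt as trt

    # Read the frozen graph produced in step 2.
    with tf.gfile.GFile(frozen_pb_path, "rb") as f:
        frozen_graph = tf.GraphDef()
        frozen_graph.ParseFromString(f.read())

    # Step 3: replace supported subgraphs with TensorRT engines.
    return trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=output_names,              # e.g. ["output_tensor/Softmax"], an assumed name
        max_batch_size=max_batch_size,
        max_workspace_size_bytes=1 << 30,  # 1 GiB, an assumed value
        precision_mode="FP16",             # assumed; "FP32" is also accepted
    )
```

The returned GraphDef can then be imported with tf.import_graph_def and run in a Session like any other frozen graph (step 4).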

Train on the MNIST dataset with Keras, then convert the model with TensorRT to speed up inference.

1. Set up training and read the images

keras.preprocessing.image.ImageDataGenerator(featurewise_center=False,  
                                             samplewise_center=False, 
                                             featurewise_std_normalization=False, 
                                             samplewise_std_normalization=False, 
                                             zca_whitening=False, 
                                             zca_epsilon=1e-06, 
                                             rotation_range=0, 
                                             width_shift_range=0.0, 
                                             height_shift_range=0.0, 
                                             brightness_range=None, 
                                             shear_range=0.0, 
                                             zoom_range=0.0, 
                                             channel_shift_range=0.0, 
                                             fill_mode='nearest', 
                                             cval=0.0, 
                                             horizontal_flip=False, 
                                             vertical_flip=False, 
                                             rescale=None, 
                                             preprocessing_function=None, 
                                             data_format=None, 
                                             validation_split=0.0, 
                                             dtype=None)

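Generates batches of tensor image data with real-time data augmentation. The data is looped over indefinitely.

rescale: rescaling factor. Defaults to None. If None or 0, no rescaling is applied; otherwise the data is multiplied by the value provided. (The documentation these notes were taken from says this happens before any other transformation; the current Keras documentation says it is applied after all other transformations.)

Setting the rescale factor to 1/255 scales pixel values into the range 0 to 1, which helps the model converge and reduces the risk of "dead" neurons.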
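
To make the effect of rescale=1/255 concrete, here is a small NumPy sketch of the same multiplication. The 2x2 image is made-up data, and this only imitates the rescaling step, not the rest of ImageDataGenerator.

```python
import numpy as np

# A hypothetical 2x2 grayscale image with 8-bit pixel values.
image = np.array([[0, 51], [102, 255]], dtype=np.uint8)

# rescale=1/255 multiplies every pixel value by the factor.
rescaled = image.astype(np.float32) * (1.0 / 255)

print(rescaled)
```

After the multiplication every value lies between 0 and 1, with 0 staying 0 and 255 mapping to approximately 1.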

Create train_generator and testing_generator.

2. Define the network

Define the Keras network with Sequential: input_tensor -> Conv2D_1 -> Conv2D_2 -> Conv2D_3 -> Flatten -> Dense -> output_tensor

Layer (type)                 Output Shape              Param #   
=================================================================
input_tensor (Conv2D)        (None, 28, 28, 20)        520       
_________________________________________________________________
activation (Activation)      (None, 28, 28, 20)        0         
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 20)        0         
_________________________________________________________________
conv2d (Conv2D)              (None, 14, 14, 20)        10020     
_________________________________________________________________
activation_1 (Activation)    (None, 14, 14, 20)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 20)          0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 7, 7, 20)          10020     
_________________________________________________________________
activation_2 (Activation)    (None, 7, 7, 20)          0         
_________________________________________________________________
flatten (Flatten)            (None, 980)               0         
_________________________________________________________________
dense (Dense)                (None, 10)                9810      
_________________________________________________________________
output_tensor (Dense)        (None, 3)                 33        
=================================================================
Total params: 30,403
Trainable params: 30,403
Non-trainable params: 0
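
The parameter counts in the summary can be checked by hand. The sketch below assumes 5x5 kernels and a single input channel (grayscale MNIST); neither is stated in these notes, but both are consistent with the 520 parameters of the first layer. The helper names are hypothetical.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # Each filter has kernel_h * kernel_w * in_channels weights plus one bias.
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units, out_units):
    # Each output unit has in_units weights plus one bias.
    return (in_units + 1) * out_units


layer_params = [
    conv2d_params(5, 5, 1, 20),    # input_tensor (Conv2D), assumed 5x5 kernel
    conv2d_params(5, 5, 20, 20),   # conv2d
    conv2d_params(5, 5, 20, 20),   # conv2d_1
    dense_params(7 * 7 * 20, 10),  # dense, fed by the 980 flattened values
    dense_params(10, 3),           # output_tensor
]
total = sum(layer_params)

print(layer_params, total)  # [520, 10020, 10020, 9810, 33] 30403
```

The pooling, activation, and flatten layers have no weights, which is why they show 0 parameters.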

3. Train the network

The model is trained with model.fit_generator.

4. Inference

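Load the network with load_model, and convert the input to a NumPy array with np.asarray.

Difference between predict_classes and predict:

predict() returns numeric values giving the probability that the sample belongs to each class. numpy.argmax() can then be used to pick the class with the highest probability as the predicted label.

predict_classes() returns the class index directly, i.e. the label of the class the sample belongs to.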
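
The relationship between the two can be shown with NumPy alone. The probabilities below are made-up values standing in for what predict() might return for two samples and three classes. Note that predict_classes() exists only on Sequential models and was removed in later TensorFlow releases, so the argmax form is the more portable one.

```python
import numpy as np

# Hypothetical predict() output: one row per sample, one column per class.
probabilities = np.array([[0.1, 0.7, 0.2],
                          [0.6, 0.3, 0.1]])

# Index of the highest probability in each row, which is what
# predict_classes() would report for a softmax output.
labels = np.argmax(probabilities, axis=1)

print(labels)  # [1 0]
```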

Problems encountered

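Q1 a bytes-like object is required, not 'str'

This error appeared when saving the trained model. I spent a long time trying encoding fixes without success; upgrading hdf5 from 2.4.0 to 2.5.0 finally resolved it.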
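
For reference, this message is what Python 3 raises in general when a str is passed where bytes are expected. The sketch below reproduces it with an in-memory binary stream; it illustrates the error type only and is not a reconstruction of what happened inside the model-saving code.

```python
import io

buffer = io.BytesIO()  # a binary stream: it accepts bytes, not str

try:
    buffer.write("weights")  # passing a str triggers the error
except TypeError as err:
    message = str(err)

# Encoding the string to bytes first is the usual fix in user code.
buffer.write("weights".encode("utf-8"))

print(message)
```

When the error comes from inside a library rather than your own code, as it did here, a version mismatch between packages is a plausible cause, which fits the upgrade having fixed it.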

Q2 Cuda Error in nvinfer1::cudnn::findFastestTactic

Possibly a CUDA memory management problem that caused the GPU to run out of memory. It went away after retrying a few times.

==================== YOLOv3 =====================

The yolov3_gpu_nms weights file is converted into TensorRT_YOLOv3_2.pb.

Read the model's input and output tensors:

input_tensor, output_tensors = \
utils.read_pb_return_tensors(tf.get_default_graph(),
                             TENSORRT_YOLOv3_MODEL,
                             ["Placeholder:0", "concat_9:0", "mul_9:0"])

Run inside a Session:

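# Add a batch dimension to the resized image and run the graph.
boxes, scores = sess.run(output_tensors,
                         feed_dict={input_tensor:
                                    np.expand_dims(img_resized, axis=0)})
# Filter by score and apply non-maximum suppression on the CPU.
boxes, scores, labels = utils.cpu_nms(boxes,
                                      scores,
                                      num_classes,
                                      score_thresh=0.4,
                                      iou_thresh=0.5)
# Draw the remaining boxes on the original image.
image = utils.draw_boxes(image, boxes, scores, labels,
                         classes, SIZE, show=False)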
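
To show what the score_thresh and iou_thresh arguments control, here is a minimal single-class sketch of greedy non-maximum suppression in pure Python. It is not the project's utils.cpu_nms; the function names, the box format (x_min, y_min, x_max, y_max), and the sample data are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, score_thresh=0.4, iou_thresh=0.5):
    """Return indices of the kept boxes, highest score first."""
    # Drop low-confidence boxes, then visit the rest by descending score.
    candidates = [i for i, s in enumerate(scores) if s >= score_thresh]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = greedy_nms(boxes, scores)

print(kept)  # [0, 2]
```

In this example the second box overlaps the first with an IoU of about 0.68 and is suppressed, and the last box is dropped because its score is below 0.4.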