Adds profile observer to system. This outputs the following information:
1) Input tensor sizes
2) Argument list
3) Output tensor sizes
4) Operator run time

Example output:

I0206 14:00:51.217067 1730559 profile_observer_gpu.cc:53] --------- Starting operator Conv op#0 ---------
I0206 14:00:51.217073 1730559 profile_observer_gpu.cc:65] Input 0: Tensor gpu_0/data of type float. Dims: (32,3,227,227,):
I0206 14:00:51.217077 1730559 profile_observer_gpu.cc:65] Input 1: Tensor gpu_0/conv1_w of type float. Dims: (64,3,7,7,):
I0206 14:00:51.217082 1730559 profile_observer_gpu.cc:71] Argument 0: name: "kernel" i: 7
I0206 14:00:51.217087 1730559 profile_observer_gpu.cc:71] Argument 1: name: "enable_tensor_core" i: 0
I0206 14:00:51.217089 1730559 profile_observer_gpu.cc:71] Argument 2: name: "exhaustive_search" i: 1
I0206 14:00:51.217092 1730559 profile_observer_gpu.cc:71] Argument 3: name: "float16_compute" i: 0
I0206 14:00:51.217095 1730559 profile_observer_gpu.cc:71] Argument 4: name: "stride" i: 2
I0206 14:00:51.217099 1730559 profile_observer_gpu.cc:71] Argument 5: name: "pad" i: 3
I0206 14:00:51.217103 1730559 profile_observer_gpu.cc:71] Argument 6: name: "order" s: "NCHW"
I0206 14:00:51.217105 1730559 profile_observer_gpu.cc:71] Argument 7: name: "ws_nbytes_limit" i: 67108864
I0206 14:00:51.217109 1730559 profile_observer_gpu.cc:85] Output 0: Tensor gpu_0/conv1 of type float. Dims: (32,64,114,114,):
I0206 14:00:51.217111 1730559 profile_observer_gpu.cc:88] --------- Finished operator Conv in 1.12685 ms ---------

Example output for internal RNN op (from seq2seq):

I0219 18:57:06.779331 2960991 profile_observer_gpu.cc:52] --------- Starting operator LSTMUnit op#3161697160-7 ---------
I0219 18:57:06.779336 2960991 profile_observer_gpu.cc:59] Input 0: Tensor model0/encoder/layer3/lstm/hidden_t_prev of type float. Dims: (1,1,512,):
I0219 18:57:06.779340 2960991 profile_observer_gpu.cc:59] Input 1: Tensor model0/encoder/layer3/lstm/cell_t_prev of type float. Dims: (1,1,512,):
I0219 18:57:06.779343 2960991 profile_observer_gpu.cc:59] Input 2: Tensor model0/encoder/layer3/lstm/gates_t of type float. Dims: (1,1,2048,):
I0219 18:57:06.779346 2960991 profile_observer_gpu.cc:59] Input 3: Tensor encoder_lengths of type int. Dims: (1,):
I0219 18:57:06.779350 2960991 profile_observer_gpu.cc:59] Input 4: Tensor timestep_rnnexec_t24 of type int. Dims: (1,):
I0219 18:57:06.779353 2960991 profile_observer_gpu.cc:70] Argument 0: name: "no_sequence_lengths" i: 0
I0219 18:57:06.779357 2960991 profile_observer_gpu.cc:70] Argument 1: name: "drop_states" i: 0
I0219 18:57:06.779362 2960991 profile_observer_gpu.cc:70] Argument 2: name: "forget_bias" f: 0
I0219 18:57:06.779366 2960991 profile_observer_gpu.cc:79] Output 0: Tensor model0/encoder/layer3/lstm/hidden_t of type float. Dims: (1,1,512,):
I0219 18:57:06.779369 2960991 profile_observer_gpu.cc:79] Output 1: Tensor model0/encoder/layer3/lstm/cell_t of type float. Dims: (1,1,512,):
I0219 18:57:06.779372 2960991 profile_observer_gpu.cc:89] RecurrentNetwork 3161697160: order: 7
I0219 18:57:06.779373 2960991 profile_observer_gpu.cc:92] --------- Finished operator LSTMUnit in 0.00153923 ms ---------

Existing deficiencies:
1) Need support to create separate CPU and GPU builds

Once this is approved, I'll port the changes over to OSS.
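
For anyone who wants to post-process this output, here is a minimal sketch of a log summarizer. It is not part of this commit. The regular expressions are inferred only from the example lines above and may need adjusting if the format changes, and the names `summarize`, `parse_dims`, and `SAMPLE` are hypothetical.

```python
import re
from collections import defaultdict

# Patterns are assumptions inferred from the example output, e.g.
#   "--------- Finished operator Conv in 1.12685 ms ---------"
#   "Input 0: Tensor gpu_0/data of type float. Dims: (32,3,227,227,):"
FINISHED_RE = re.compile(r"Finished operator (\S+) in ([0-9.eE+-]+) ms")
TENSOR_RE = re.compile(
    r"(Input|Output) (\d+): Tensor (\S+) of type (\w+)\. Dims: \(([\d,]*)\)"
)


def parse_dims(text):
    """Turn '32,3,227,227,' into (32, 3, 227, 227); the trailing comma is ignored."""
    return tuple(int(d) for d in text.split(",") if d)


def summarize(lines):
    """Return (total run time in ms per operator type, list of tensor records)."""
    totals = defaultdict(float)
    tensors = []
    for line in lines:
        m = TENSOR_RE.search(line)
        if m:
            kind, index, name, dtype, dims = m.groups()
            tensors.append((kind, int(index), name, dtype, parse_dims(dims)))
            continue
        m = FINISHED_RE.search(line)
        if m:
            totals[m.group(1)] += float(m.group(2))
    return dict(totals), tensors


# A few lines copied from the Conv example above.
SAMPLE = [
    "I0206 14:00:51.217067 1730559 profile_observer_gpu.cc:53] --------- Starting operator Conv op#0 ---------",
    "I0206 14:00:51.217073 1730559 profile_observer_gpu.cc:65] Input 0: Tensor gpu_0/data of type float. Dims: (32,3,227,227,):",
    "I0206 14:00:51.217109 1730559 profile_observer_gpu.cc:85] Output 0: Tensor gpu_0/conv1 of type float. Dims: (32,64,114,114,):",
    "I0206 14:00:51.217111 1730559 profile_observer_gpu.cc:88] --------- Finished operator Conv in 1.12685 ms ---------",
]

totals, tensors = summarize(SAMPLE)
for op_type, ms in sorted(totals.items()):
    print(f"{op_type}: {ms} ms")
```

Summing per operator type makes it easy to see which operators dominate run time across a full net. The tensor records keep the dims so large inputs can be matched to slow operators.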