Store SparseTensors in a Map inside a container for Queue round-trip.
This is much more efficient than serializing the underlying Tensors to strings and deserializing them on the other side. Instead we pass through the keys of the SparseTensors inside the Map. The methods are kept private for use by the queueing wrappers.

Includes benchmarks showing that the wall time is almost 50% of the wall time of the sparse serialization/deserialization wrappers:

I1003 17:24:34.355306 18675 benchmark.py:77] Benchmark [BenchmarkSparseTensorsMapVsSerialization.benchmark_very_large_2d_float_st_tensor_maps] iters: 2000, wall_time: 0.00260997, cpu_time: -1,throughput: -1
I1003 17:24:42.735983 18675 benchmark.py:77] Benchmark [BenchmarkSparseTensorsMapVsSerialization.benchmark_vey_large_2d_float_st_serialization] iters: 2000, wall_time: 0.00415492, cpu_time: -1,throughput: -1

*** Update: After updates to sparse_tensor.h's concat code (pushed in a sister PR), both benchmarks speed up:

I1004 09:39:30.630354 24400 benchmark.py:77] Benchmark [BenchmarkSparseTensorsMapVsSerialization.benchmark_very_large_2d_float_st_tensor_maps] iters: 2000, wall_time: 0.0022105
I1004 09:39:38.125391 24400 benchmark.py:77] Benchmark [BenchmarkSparseTensorsMapVsSerialization.benchmark_very_large_2d_float_st_serialization] iters: 2000, wall_time: 0.00372696

*** Update 2: After properly placed std::moves in the sparse_tensors_map code, the tensor-maps benchmark is faster still:

Benchmark [BenchmarkSparseTensorsMapVsSerialization.benchmark_very_large_2d_float_st_tensor_maps] iters: 2000, wall_time: 0.00187492

Total speedup is now: 0.00415492 / 0.00187492 = 2.2x

Change: 135805924
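For illustration, here is a minimal sketch of the round-trip this enables, assuming the private helpers _add_sparse_to_tensors_map and _take_many_sparse_from_tensors_map in tensorflow.python.ops.sparse_ops and TF 1.x-era queue APIs; the exact names and signatures in this change may differ.

import tensorflow as tf
from tensorflow.python.ops import sparse_ops

sp = tf.SparseTensor(indices=[[0, 0], [1, 2]],
                     values=[1.0, 2.0],
                     dense_shape=[2, 3])

# Producer: stash the SparseTensor in a process-local map and get back a
# scalar int64 handle (its key in the map), which is cheap to enqueue.
# NOTE: private API; an assumption based on the commit description.
handle = sparse_ops._add_sparse_to_tensors_map(sp, shared_name="sp_map")

queue = tf.FIFOQueue(capacity=10, dtypes=[tf.int64], shapes=[[]])
enqueue_op = queue.enqueue([handle])

# Consumer: dequeue a batch of handles and rebuild a single SparseTensor
# whose first dimension indexes the batch (rank goes up by one).
handles = queue.dequeue_many(2)
batch = sparse_ops._take_many_sparse_from_tensors_map(
    sparse_map_op=handle.op, sparse_handles=handles, rank=2)

with tf.Session() as sess:
  sess.run(enqueue_op)  # each run stores a new copy and enqueues its key
  sess.run(enqueue_op)
  print(sess.run(batch))  # SparseTensorValue with dense_shape [2, 2, 3]

By contrast, the serialization path measured in the benchmark (the public tf.serialize_sparse / tf.deserialize_many_sparse wrappers) pushes three string tensors per SparseTensor through the queue; the map path replaces them with a single int64 key.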