Reconstruction of High-precision Semantic Map
SENSORS 2020 (POSTER)

  • Xinyuan Tu
    Nanjing University
  • Jian Zhang
    Nanjing University
  • Runhao Luo
    Nanjing University
  • Kai Wang
    Nanjing University
  • Qingji Zeng
    Nanjing University

  • Yu Zhou#
    Nanjing University
  • Yao Yu
    Nanjing University
  • Sidan Du
    Nanjing University
# denotes corresponding author

Abstract

[Figure: system overview]

We present a real-time Truncated Signed Distance Field (TSDF)-based three-dimensional (3D) semantic reconstruction method for Light Detection and Ranging (LiDAR) point clouds, which achieves incremental surface reconstruction and highly accurate semantic segmentation. High-precision, real-time 3D semantic reconstruction from LiDAR data is important but challenging, because high-accuracy LiDAR data is massive for 3D reconstruction. We therefore propose a line-of-sight algorithm to update the implicit surface incrementally. Meanwhile, to use semantic information more effectively, we propose an online attention-based spatial and temporal feature fusion method, which is tightly integrated into the reconstruction system. We parallelize the reconstruction and semantic fusion processes, which achieves real-time performance. We demonstrate our approach on the CARLA dataset, the Apollo dataset, and our own dataset. Compared with state-of-the-art mapping methods, our method has a clear advantage in both quality and speed, which meets the needs of robotic mapping and navigation.
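As a concrete illustration of the line-of-sight idea, the sketch below walks every voxel along the ray from the sensor to a measured LiDAR point and blends a truncated signed distance into the grid. It is a minimal sketch, not the paper's exact algorithm: the voxel size, truncation distance, hash-map storage, and constant per-observation weight are all assumptions for illustration.

    import numpy as np

    voxel_size = 0.1     # voxel edge length in meters (assumed)
    trunc_dist = 0.3     # TSDF truncation distance (assumed)
    tsdf = {}            # voxel index -> (tsdf value, accumulated weight)

    def update_tsdf_along_ray(sensor_origin, point):
        """Walk the line of sight from the sensor to a measured point and
        update every voxel it crosses with a truncated signed distance.
        sensor_origin and point are (3,) numpy arrays in world coordinates."""
        direction = point - sensor_origin
        depth = np.linalg.norm(direction)
        direction = direction / depth
        # Sample the ray up to one truncation band behind the surface.
        for t in np.arange(0.0, depth + trunc_dist, voxel_size):
            sdf = depth - t                      # signed distance to the surface
            if sdf < -trunc_dist:
                break                            # too far behind the surface, stop
            tsdf_value = min(1.0, sdf / trunc_dist)
            sample = sensor_origin + t * direction
            key = tuple(np.floor(sample / voxel_size).astype(int))
            old_v, old_w = tsdf.get(key, (0.0, 0.0))
            w = 1.0                              # per-observation weight (assumed constant)
            tsdf[key] = ((old_v * old_w + tsdf_value * w) / (old_w + w), old_w + w)

In a real system this update would run per point for each incoming scan, so the implicit surface grows incrementally as new scans arrive.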

Results Overview

[Figure: overview of reconstruction results]

Spatial-Temporal Feature Fusion

[Figure: spatial-temporal feature fusion pipeline]

First, the incremental reconstruction provides each voxel's normal, which, together with the sensor position, is fed into our Observation Adaptive Network (OAN) to estimate the effectiveness of each observation. Second, a 2D semantic segmentation network extracts image features, and the LiDAR point cloud is projected onto the image to obtain the feature corresponding to each voxel; this image feature is also fed into the OAN and is used, together with the voxel's normal and the sensor position, to update the current voxel's state. Finally, we use an Attention-Based Spatial Fusion Network (ABSFN) to fuse the states of voxels within a limited spatial neighborhood to obtain the current voxel's semantic label.
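The projection step above can be sketched as follows: LiDAR points are transformed into the camera frame, projected with the camera intrinsics, and the 2D feature map of the segmentation network is sampled at the resulting pixels. The function name, the (H, W, C) feature-map layout, and the nearest-pixel lookup are illustrative assumptions rather than the exact implementation.

    import numpy as np

    def sample_image_features(points_lidar, feature_map, K, T_cam_lidar):
        """points_lidar: (N, 3) LiDAR points; feature_map: (H, W, C) output of the
        2D semantic segmentation backbone; K: (3, 3) camera intrinsics;
        T_cam_lidar: (4, 4) extrinsics from the LiDAR frame to the camera frame."""
        n = points_lidar.shape[0]
        homo = np.hstack([points_lidar, np.ones((n, 1))])
        points_cam = (T_cam_lidar @ homo.T).T[:, :3]       # transform to camera frame
        uvz = (K @ points_cam.T).T
        u = (uvz[:, 0] / uvz[:, 2]).astype(int)            # pixel column
        v = (uvz[:, 1] / uvz[:, 2]).astype(int)            # pixel row
        h, w, c = feature_map.shape
        valid = (points_cam[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        features = np.zeros((n, c), dtype=feature_map.dtype)
        features[valid] = feature_map[v[valid], u[valid]]  # nearest-pixel lookup
        return features, valid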

OAN. We assume there are two main factors related to the effectiveness of an observation: first, the location of the observation $L^k_i$ in the local coordinate system centered on the current voxel; second, the normal of the current voxel $N^k_i$. The combination of normal and position represents the validity of the observation from a geometric perspective: when the angle between the observation direction and the normal is close to 90 degrees, the observation is unreliable; when it is close to 0 degrees, the observation is reliable. Consequently, we use a GRU to update the observation state $E^k_i$, taking the voxel's normal and the sensor's local position as input.
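A minimal sketch of such a GRU update is given below, assuming a PyTorch GRUCell whose input concatenates the voxel normal, the sensor position in the voxel-centered frame, and (following the pipeline description above) the projected image feature. The class name and feature dimensions are assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class ObservationAdaptiveNetwork(nn.Module):
        def __init__(self, image_feat_dim=64, state_dim=64):
            super().__init__()
            # input = 3 (voxel normal) + 3 (local sensor position) + image feature
            self.gru = nn.GRUCell(6 + image_feat_dim, state_dim)

        def forward(self, normal, sensor_pos_local, image_feat, prev_state):
            """normal: (B, 3); sensor_pos_local: (B, 3); image_feat: (B, C);
            prev_state: (B, state_dim) observation state from earlier scans."""
            x = torch.cat([normal, sensor_pos_local, image_feat], dim=-1)
            return self.gru(x, prev_state)   # updated observation state E_i^k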

ABSFN. To make the most of spatial features, we use a self-attention mechanism, whose inputs are the hidden state stored in each voxel $F^k_i$ and the offsets of neighboring voxels to the current voxel $O^k_{i,j}$, to explicitly measure the correlation between the current voxel and its adjacent voxels.
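The sketch below illustrates one way such attention-based fusion could look: the current voxel's hidden state forms the query, and each neighbor's hidden state concatenated with its offset $O^k_{i,j}$ forms the keys and values. Layer sizes, the number of classes, and the single-head formulation are assumptions for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentionSpatialFusion(nn.Module):
        def __init__(self, state_dim=64, num_classes=20):
            super().__init__()
            self.query = nn.Linear(state_dim, state_dim)
            self.key = nn.Linear(state_dim + 3, state_dim)     # +3 for the voxel offset
            self.value = nn.Linear(state_dim + 3, state_dim)
            self.classifier = nn.Linear(state_dim, num_classes)

        def forward(self, center_state, neighbor_states, neighbor_offsets):
            """center_state: (B, D); neighbor_states: (B, K, D);
            neighbor_offsets: (B, K, 3) offsets of the K neighbors to the center voxel."""
            q = self.query(center_state).unsqueeze(1)                        # (B, 1, D)
            kv_in = torch.cat([neighbor_states, neighbor_offsets], dim=-1)   # (B, K, D+3)
            k, v = self.key(kv_in), self.value(kv_in)
            attn = F.softmax((q * k).sum(-1) / k.shape[-1] ** 0.5, dim=-1)   # (B, K)
            fused = (attn.unsqueeze(-1) * v).sum(dim=1)                      # (B, D)
            return self.classifier(fused)                                    # per-class logits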

The website template was borrowed from Michaël Gharbi.