Researchers from Imperial College London have proposed a new randomly connected neural network for self-supervised monocular depth estimation in computer vision

2021-12-14 13:30:22 By : Mr. Bill Wu

Depth estimation is one of the basic problems in computer vision, and it is essential for a wide range of applications such as robotic vision or surgical navigation.

In recent years, various deep learning-based methods have been developed to provide end-to-end solutions for depth and disparity estimation. One such approach is self-supervised monocular depth estimation, that is, inferring the depth of a scene from a single image. For disparity estimation, most of these models rely on U-Net-based designs.

Although humans can easily perceive relative depth, the same task has proven very challenging for machines, partly because no optimal network architecture is known. To address this, the researchers turned to more complex architectures capable of generating high-resolution photometric output.

The research team at the Hamlyn Centre at Imperial College London introduced a unique randomly connected encoder-decoder architecture for self-supervised monocular depth estimation. The success of the approach rests on two ingredients: an architecture able to extract high-order features from a single image, and a loss function that imposes a reliable feature distribution.

The starting point of this research is to challenge the common assumption that the pattern of connections between layers does not matter. The researchers built their method by modelling randomly connected neural networks as graphs, with each node acting as a convolutional layer and the edges between nodes drawn by a random graph generation method. Once a graph is created, it is converted into a neural network using a deep learning toolkit such as PyTorch.
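The graph-to-network step described above can be illustrated with a minimal sketch. The paper's actual graph generator is not specified here, so this assumes a simple construction: nodes are given a fixed order and edges only point forward, which guarantees the graph is acyclic; each node would then be mapped to a convolutional layer.

```python
import random

def random_dag(num_nodes, edge_prob, seed=0):
    """Sample a random directed acyclic graph.

    Nodes are ordered 0..num_nodes-1 and edges only go from lower to
    higher indices, so the result is acyclic by construction. In the
    network, each node becomes a convolutional layer that aggregates
    the outputs of its parent nodes.
    """
    rng = random.Random(seed)
    edges = []
    for dst in range(1, num_nodes):
        parents = [src for src in range(dst) if rng.random() < edge_prob]
        if not parents:  # guarantee every node receives at least one input
            parents = [rng.randrange(dst)]
        edges.extend((src, dst) for src in parents)
    return edges
```

A toolkit such as PyTorch would then instantiate one convolution per node and wire the forward pass along these edges in topological order.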

A cascaded random search method is introduced to generate arbitrary network architectures, ensuring an effective search of the connection space. In addition, a new variant of the U-Net topology was developed to improve the spatial and semantic expressiveness of the skip-connection feature maps.
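The article does not detail the cascaded search procedure, but its core loop follows the usual random-search pattern: sample candidate architectures, evaluate each, and keep the best. The sketch below assumes placeholder `sample_architecture` and `evaluate` callables standing in for the real generator and validation metric.

```python
import random

def random_search(sample_architecture, evaluate, num_trials, seed=0):
    """Generic random architecture search: sample candidates, keep the best.

    `sample_architecture(rng)` returns a candidate architecture and
    `evaluate(arch)` returns a score to minimise (e.g. validation
    photometric error). Both are hypothetical placeholders here.
    """
    rng = random.Random(seed)
    best_arch, best_score = None, float("inf")
    for _ in range(num_trials):
        arch = sample_architecture(rng)
        score = evaluate(arch)
        if score < best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score
```

A cascaded variant would run this loop in stages, narrowing the connection space around the best candidates from the previous stage.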

Unlike an ordinary U-Net, this design places a convolution (a learnable layer) in the skip connection itself. This lets the network make better use of the deep semantic features in the encoder feature maps, which are usually carried in the channel dimension but not explicitly exploited.
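A learnable skip connection of this kind can be sketched in PyTorch as follows. The channel counts, activation, and kernel size here are illustrative assumptions, not the paper's exact configuration: the point is only that the encoder feature map passes through a convolution before being fused with the decoder feature map, rather than being copied unchanged.

```python
import torch
import torch.nn as nn

class ConvSkip(nn.Module):
    """Skip connection with a learnable 3x3 convolution, in place of the
    plain identity copy used in a standard U-Net (illustrative sketch)."""

    def __init__(self, enc_channels, dec_channels):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(enc_channels, dec_channels, kernel_size=3, padding=1),
            nn.ELU(inplace=True),
        )

    def forward(self, enc_feat, dec_feat):
        # Refine the encoder feature map, then concatenate it with the
        # decoder feature map at the same spatial resolution.
        return torch.cat([self.refine(enc_feat), dec_feat], dim=1)
```

In a plain U-Net the skip would simply be `torch.cat([enc_feat, dec_feat], dim=1)`; the extra convolution gives the network a chance to reweight the encoder's semantic channels before fusion.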

The researchers report that a multi-scale loss function is essential to improving the image reconstruction process. Combining these multi-scale terms yields a new loss function that raises reconstruction quality: it extends adversarial and perceptual losses on deep features to multiple scales, enabling high-quality view synthesis and error computation.
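The multi-scale idea can be illustrated with a minimal photometric term. This sketch uses only an L1 loss averaged over downsampled resolutions, whereas the paper additionally applies adversarial and perceptual terms at each scale; the number of scales here is an arbitrary choice.

```python
import torch
import torch.nn.functional as F

def multiscale_l1_loss(pred, target, num_scales=4):
    """Average an L1 photometric loss over several resolutions.

    Penalising the reconstruction at progressively downsampled scales
    encourages the network to match both coarse scene structure and
    fine detail (illustrative sketch of the multi-scale principle).
    """
    loss = 0.0
    for s in range(num_scales):
        if s > 0:
            pred = F.avg_pool2d(pred, kernel_size=2)
            target = F.avg_pool2d(target, kernel_size=2)
        loss = loss + F.l1_loss(pred, target)
    return loss / num_scales
```

In a full model, each term in this sum would be replaced by the combination of photometric, perceptual, and adversarial losses evaluated at that scale.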

The researchers compared their method with state-of-the-art self-supervised depth estimation methods on two surgical datasets. The results show that even a randomly connected network built from standard convolution operations, distinguished only by its interconnections, can learn the task effectively. The study also found that the multi-scale penalty in the loss function is essential for recovering finer details.

The broader aim of this research is to lay a foundation for further work on neural network architecture design. The experimental results may help research move from the traditional U-Net and manual trial-and-error procedures towards more automated design methods.

Paper: https://www.tandfonline.com/doi/full/10.1080/21681163.2021.1997648

Reference: https://www.imperial.ac.uk/news/230818/randomly-connected-neural-network-self-supervision-monocular/
