@diyer22 2017-12-02T15:43:15.000000Z Word count 4073 Views 1013

review of segmentation

dl

2.2 Transfer Learning

2.3 Data Preprocessing and Augmentation

Despite the power and flexibility of the FCN model, it still lacks various features, which hinders its application to certain problems and situations:

its inherent spatial invariance does not take into account useful global context information
no instance-awareness is present by default
efficiency is still far from real-time execution at high resolutions
it is not completely suited for unstructured data such as 3D point clouds or models

4.1 Decoder Variants

4.2 Integrating Context Knowledge
Even pure CNNs, without pooling layers, are limited, since the receptive field of their units can only grow linearly with the number of layers.
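The linear growth can be checked with a little arithmetic. The sketch below assumes stride-1 convolutions with a square kernel and no pooling; the function name `receptive_field` is made up for this note. Each extra layer adds `kernel_size - 1` pixels to the receptive field along one axis.

```python
def receptive_field(num_layers, kernel_size=3):
    """Receptive field (pixels along one axis) of stacked stride-1 convolutions.

    Assumes no pooling and no dilation: every layer widens the field
    by (kernel_size - 1) pixels, so growth is linear in depth.
    """
    return 1 + num_layers * (kernel_size - 1)


for depth in (1, 2, 4, 8):
    print(depth, receptive_field(depth))
# 1 3
# 2 5
# 4 9
# 8 17
```

With 3x3 kernels, covering a 500-pixel-wide context would need roughly 250 such layers, which is why the approaches listed next are used instead.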

Many approaches can be taken to make CNNs aware of that global information:

refinement as a post-processing step with Conditional Random Fields (CRFs)
dilated convolutions
multi-scale aggregation
defer the context modeling to RNNs

4.2.1 Conditional Random Fields

4.2.2 Dilated Convolutions
Those works also show a common trend: dilated convolutions are tightly coupled to multi-scale context aggregation
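As a rough illustration of why dilation helps with context, here is a minimal 1D sketch in plain Python. The helper names and the dilation schedule `[1, 2, 4, 8]` are assumptions chosen for the example, not taken from any specific paper. Kernel taps are spaced `dilation` samples apart, so stacking layers with growing dilation widens the receptive field much faster than plain convolutions do.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1D cross-correlation with taps spaced `dilation` samples apart."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span)
    ]


def stacked_receptive_field(kernel_size, dilations):
    """Receptive field (1D) of stride-1 layers with the given dilation rates."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


print(dilated_conv1d(list(range(10)), [1, 1, 1], dilation=2))
# [6, 9, 12, 15, 18, 21]
print(stacked_receptive_field(3, [1, 1, 1, 1]))  # 9  (plain convolutions)
print(stacked_receptive_field(3, [1, 2, 4, 8]))  # 31 (doubling dilation)
```

The number of weights per layer is the same in both cases; only the spacing of the taps changes.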

4.2.3 Multi-scale Prediction
the filters will implicitly learn to detect features at specific scales (presumably with a certain degree of invariance)

One option is to use multi-scale networks, which generally combine multiple networks that target different scales and then merge their predictions to produce a single output
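The merging step could look like the sketch below. It assumes each scale-specific network has already produced per-class score maps, that the maps are square, and that the merge is a plain average after nearest-neighbour upsampling. Real systems may merge with learned layers instead; all names here are hypothetical.

```python
def upsample_nearest(score_map, factor):
    """Repeat every cell `factor` times along both axes."""
    out = []
    for row in score_map:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def merge_predictions(per_scale_scores, out_size):
    """Average per-class score maps from several scales, then take the argmax.

    per_scale_scores: one entry per scale, each a [num_classes][h][w] nested
    list with square maps whose side divides out_size.
    """
    num_classes = len(per_scale_scores[0])
    upsampled = [
        [upsample_nearest(m, out_size // len(m)) for m in scores]
        for scores in per_scale_scores
    ]
    labels = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            avg = [
                sum(s[k][r][c] for s in upsampled) / len(upsampled)
                for k in range(num_classes)
            ]
            row.append(avg.index(max(avg)))
        labels.append(row)
    return labels


# Two classes: a fine 2x2 prediction and a coarse 1x1 prediction.
fine = [[[0.9, 0.4], [0.2, 0.6]], [[0.1, 0.6], [0.8, 0.4]]]
coarse = [[[0.7]], [[0.3]]]
print(merge_predictions([fine, coarse], 2))  # [[0, 0], [1, 0]]
```

In the example, the fine map alone would label the top-right pixel as class 1, but the coarse context pulls it to class 0.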

[74] 2015 “Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture,” in Proceedings of the IEEE International Conference on Computer Vision, 2015
like SegNet

[76] 2015 “Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture,” in Proceedings of the IEEE International Conference on Computer Vision, 2015
feature fusion

4.2.4 Feature Fusion

[77],[84]

4.3 Instance Segmentation

4.4 RGB-D Data
Different techniques such as Horizontal Height Angle (HHA) [11] are used for encoding the depth into three channels as follows: horizontal disparity, height above ground, and the angle between local surface normal and the inferred gravity direction

leverage a multi-view approach to improve existing single-view works

4.5 3D Data
take a point cloud and parse it through a dense voxel grid, generating a set of occupancy voxels which are used as input to a 3D CNN to produce one label per voxel. They then map back the labels to the point cloud
it has some disadvantages:
* quantization
* loss of spatial information
* unnecessarily large representations
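The voxelize, label, and map-back pipeline can be sketched in a few lines. The 3D CNN is replaced here by a placeholder rule (`fake_voxel_labeler`, invented for this example) so that the snippet runs on its own; the voxel size and the sample points are also assumptions.

```python
import math


def voxelize(points, voxel_size):
    """Map each (x, y, z) point to the integer index of its voxel."""
    return [tuple(math.floor(c / voxel_size) for c in p) for p in points]


def fake_voxel_labeler(voxels):
    """Placeholder for the 3D CNN: one label per occupied voxel."""
    return {v: ("floor" if v[2] == 0 else "other") for v in voxels}


def label_points(points, voxel_size):
    point_voxels = voxelize(points, voxel_size)
    occupied = set(point_voxels)               # occupancy grid (sparse form)
    voxel_labels = fake_voxel_labeler(occupied)
    # Map voxel labels back to the original points.
    return [voxel_labels[v] for v in point_voxels]


points = [(0.1, 0.2, 0.1), (0.4, 0.3, 0.2), (0.2, 0.1, 1.7), (2.3, 0.1, 0.2)]
print(voxelize(points, 1.0))
# [(0, 0, 0), (0, 0, 0), (0, 0, 1), (2, 0, 0)]
print(label_points(points, 1.0))
# ['floor', 'floor', 'other', 'floor']
```

The first two points fall into the same voxel and therefore always receive the same label, which is the quantization problem listed above. A dense grid would also have to store every empty voxel between the occupied ones, which is where the unnecessarily large representation comes from.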

PointNet is based on fully connected layers instead of convolutional ones
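A toy sketch of the idea, under stated assumptions: one fully connected layer with ReLU is applied with the same weights to every point, and a max over points aggregates the result into a global feature. The weights below are arbitrary illustrative values and the real network is much deeper. Because max is symmetric, the point order does not matter.

```python
def shared_fc(point, weights, bias):
    """Apply one fully connected layer + ReLU to a single point."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, point)) + b)
        for row, b in zip(weights, bias)
    ]


def global_feature(points, weights, bias):
    """Max-pool the per-point features over all points (order-independent)."""
    per_point = [shared_fc(p, weights, bias) for p in points]
    return [max(column) for column in zip(*per_point)]


# Arbitrary illustrative weights: 3 input coordinates -> 4 features.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
b = [0.0, 0.0, 0.0, 0.0]
cloud = [(1.0, 2.0, 3.0), (4.0, 0.0, -1.0), (0.0, 5.0, 2.0)]

print(global_feature(cloud, W, b))                                   # [4.0, 5.0, 3.0, 7.0]
print(global_feature(cloud, W, b) == global_feature(cloud[::-1], W, b))  # True
```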

4.6 Video Sequences
features from shallow layers change faster than deep ones, which motivates processing layers at different update rates depending on their depth. By doing this, deep features can be persisted over frames thanks to their semantic stability, thus saving inference time
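The update-rate idea can be sketched as a two-stage pipeline in which the shallow stage runs on every frame and the deep stage is recomputed only every few frames, with its output cached in between. The fixed schedule, the class name `ClockworkPipeline`, and the toy stage functions are assumptions for illustration only.

```python
class ClockworkPipeline:
    """Run a cheap shallow stage every frame and a costly deep stage every
    `deep_period` frames, reusing the cached deep features in between."""

    def __init__(self, shallow_fn, deep_fn, deep_period):
        self.shallow_fn = shallow_fn
        self.deep_fn = deep_fn
        self.deep_period = deep_period
        self.frame_index = 0
        self.cached_deep = None
        self.deep_calls = 0

    def step(self, frame):
        shallow = self.shallow_fn(frame)           # always refreshed
        if self.frame_index % self.deep_period == 0:
            self.cached_deep = self.deep_fn(shallow)  # refreshed on schedule
            self.deep_calls += 1
        self.frame_index += 1
        return shallow, self.cached_deep


# Toy stages standing in for shallow and deep network layers.
pipeline = ClockworkPipeline(lambda f: f * 2, lambda s: s + 100, deep_period=3)
outputs = [pipeline.step(frame) for frame in range(6)]
print(outputs)
# [(0, 100), (2, 100), (4, 100), (6, 106), (8, 106), (10, 106)]
print(pipeline.deep_calls)  # 2
```

Over six frames the deep stage runs twice instead of six times; the price is that deep features are up to two frames stale.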

5 DISCUSSION
we will gather the results of the methods on the most representative datasets using the previously described metrics.
5.1 Evaluation Metrics

As we have observed, many methods report results on non-standard datasets or are not even tested at all. That makes comparisons impossible

5.3 Summary

DeepLab is the most solid method, outperforming the rest on almost every single RGB image dataset by a significant margin.
The 2.5D or multimodal datasets are dominated by recurrent networks such as LSTM-CF.
3D data segmentation still has a long way to go, with PointNet paving the way for future research on dealing with unordered point clouds without any kind of preprocessing or discretization.
Dealing with video sequences is another green area with no clear direction, but Clockwork Convnets are the most promising approach thanks to their efficiency and accuracy duality.
3D convolutions are worth remarking due to their power and flexibility to process multichannel inputs, making them successful at capturing both spatial and temporal information

5.4 Future Research Directions

3D datasets: data is lacking; ILSVRC will feature 3D data in 2018

Sequence datasets: lack of large-scale data

Point cloud segmentation using Graph Convolutional Networks (GCNs): treat point clouds as graphs and apply convolutions over them

Context knowledge:

Real-time segmentation:

Memory: pruning to simplify a network

Temporal coherency on sequences: it is important to work on video streams

Multi-view integration: Use of multiple views in recently proposed segmentation works is mostly limited to RGB-D cameras and in particular focused on single-object segmentation.

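2D general images
2D traffic images
2D other datasets
2.5D (RGB-D) datasets
3D CAD datasets
3D point cloud datasets

Introduction to FCN
Decoder variants (SegNet)
Integrating global information
Conditional Random Fields (CRF)
Dilated convolutions
Multi-scale
Feature fusion
RNN
Instance segmentation
2.5D (RGB-D) data
3D data
Video sequences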
