@diyer22
2017-12-02T15:43:15.000000Z
dl
2.2 Transfer Learning
2.3 Data Preprocessing and Augmentation
Despite the power and flexibility of the FCN model, it still lacks various features which hinder its application to certain problems and situations:
4.1 Decoder Variants
4.2 Integrating Context Knowledge
Even pure CNNs (without pooling layers) are limited, since the receptive field of their units can only grow linearly with the number of layers.
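To make the linear growth concrete, here is a minimal sketch that computes the receptive field of a unit on top of stacked stride-1, undilated convolutions. It assumes k x k kernels and no pooling, matching the "without pooling layers" case; the function name is hypothetical.

```python
def receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Receptive field (pixels along one axis) of a unit on top of
    `num_layers` stacked stride-1, undilated convolutions."""
    rf = 1  # a single input pixel
    for _ in range(num_layers):
        # Each k x k layer extends the field by (k - 1) pixels in total.
        rf += kernel_size - 1
    return rf


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 50):
        print(n, receptive_field(n))
```

With 3 x 3 kernels the field is 1 + 2L pixels, so even 50 layers only cover 101 pixels, far less than a typical input image.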
Many approaches can be taken to make CNNs aware of that global information:
4.2.1 Conditional Random Fields
4.2.2 Dilated Convolutions
Those works also show a common trend: dilated convolutions are tightly coupled to multi-scale context aggregation
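A minimal 1D sketch of why dilation helps with context aggregation: taps of a dilated kernel are `dilation` samples apart, so doubling the dilation at every layer (the schedule used in Yu and Koltun's context module) grows the receptive field exponentially while the weights per layer stay fixed. The helper names, zero padding, and odd kernel length are assumptions for illustration.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], w: Sequence[float], dilation: int) -> List[float]:
    """'Same'-size 1D dilated convolution (cross-correlation, zero padding).

    Assumes an odd-length kernel `w` centred on each output position.
    """
    center = (len(w) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = i + (j - center) * dilation  # neighbouring taps are `dilation` apart
            if 0 <= idx < len(x):
                acc += wj * x[idx]
        out.append(acc)
    return out


def receptive_field_dilated(dilations: Sequence[int], kernel_size: int = 3) -> int:
    """Receptive field after stacking stride-1 convolutions with the given dilations."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    dilations = [1, 2, 4, 8]
    # Push an impulse through the stack and count which outputs it reaches.
    signal = [0.0] * 41
    signal[20] = 1.0
    for d in dilations:
        signal = dilated_conv1d(signal, [1.0, 1.0, 1.0], d)
    print(sum(1 for v in signal if v != 0.0))  # 31
    print(receptive_field_dilated(dilations))  # 31
```

With dilations 1, 2, 4, ..., 2^(L-1) and 3-tap kernels the field is 2^(L+1) - 1 (31 for L = 4), versus 2L + 1 = 9 for four undilated layers.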
4.2.3 Multi-scale Prediction
The filters will implicitly learn to detect features at specific scales (presumably with a certain degree of invariance).
One approach is to use multi-scale networks, which generally employ multiple networks that target different scales and then merge their predictions to produce a single output.
[74] 2015 "Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture," in Proceedings of the IEEE International Conference on Computer Vision, 2015
Like SegNet
[76] 2015 "Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture," in Proceedings of the IEEE International Conference on Computer Vision, 2015
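The merging step of multi-scale prediction can be sketched as follows: each (hypothetical) model sees the input resized to its scale, its per-pixel class scores are resized back to the input resolution, the scores are averaged, and the argmax gives one label per pixel. Nearest-neighbour resizing, score averaging, and `toy_model` are assumptions; published methods use learned networks and may fuse differently.

```python
from typing import Callable, Dict, List

Grid = List[List[object]]
Scores = List[List[List[float]]]  # H x W x C class scores


def resize_nearest(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize of a 2D grid (cells may be scalars or score lists)."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
            for i in range(out_h)]


def multiscale_predict(image: Grid, models: Dict[float, Callable[[Grid], Scores]]) -> List[List[int]]:
    h, w = len(image), len(image[0])
    per_scale = []
    for scale, model in models.items():
        sh, sw = max(1, round(h * scale)), max(1, round(w * scale))
        scores = model(resize_nearest(image, sh, sw))
        per_scale.append(resize_nearest(scores, h, w))  # back to input resolution
    labels = []
    for i in range(h):
        row = []
        for j in range(w):
            # Average class scores over scales, then pick the best class.
            avg = [sum(s) / len(per_scale) for s in zip(*(p[i][j] for p in per_scale))]
            row.append(max(range(len(avg)), key=avg.__getitem__))
        labels.append(row)
    return labels


def toy_model(img: Grid) -> Scores:
    # Hypothetical stand-in for a segmentation network: 2 classes from intensity.
    return [[[1.0 - v, v] for v in row] for row in img]


if __name__ == "__main__":
    image = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    print(multiscale_predict(image, {0.5: toy_model, 1.0: toy_model}))
```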
4.2.4 Feature Fusion
[77], [84]
4.3 Instance Segmentation
4.4 RGB-D Data
Different techniques, such as HHA (Horizontal disparity, Height above ground, Angle) [11], are used to encode depth into three channels: horizontal disparity, height above ground, and the angle between the local surface normal and the inferred gravity direction.
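A simplified sketch of the three HHA-style channels, computed from an H x W grid of camera-frame 3D points. It assumes the upward direction is already known (real HHA infers gravity from the data), takes the lowest point as the ground, uses finite differences for surface normals, and skips the rescaling to image value ranges; `hha_like` and its helpers are hypothetical names.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def hha_like(points: List[List[Vec]], up: Vec) -> List[List[Vec]]:
    """Per-pixel (disparity, height above ground, angle to `up` in degrees).

    Assumes z > 0 is depth, `up` is a unit vector opposite to gravity,
    the lowest point is the ground, and the grid is at least 2 x 2.
    """
    h, w = len(points), len(points[0])
    heights = [[_dot(p, up) for p in row] for row in points]
    ground = min(min(row) for row in heights)
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Central differences (one-sided at the borders) span the local surface.
            dx = _sub(points[i][min(j + 1, w - 1)], points[i][max(j - 1, 0)])
            dy = _sub(points[min(i + 1, h - 1)][j], points[max(i - 1, 0)][j])
            n = _cross(dx, dy)
            cos_a = abs(_dot(n, up)) / math.sqrt(_dot(n, n))  # abs: normal sign is ambiguous
            angle = math.degrees(math.acos(min(1.0, cos_a)))
            row.append((1.0 / points[i][j][2], heights[i][j] - ground, angle))
        out.append(row)
    return out
```

On a horizontal floor every pixel gets an angle near 0 degrees, while a wall facing the camera gets about 90 degrees.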
Some works leverage a multi-view approach to improve existing single-view methods.
4.5 3D Data
One approach is to take a point cloud and pass it through a dense voxel grid, generating a set of occupancy voxels which are used as input to a 3D CNN that produces one label per voxel. The labels are then mapped back to the point cloud.
This approach has some disadvantages:
* quantization
* loss of spatial information
* unnecessarily large representations
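The voxel pipeline described above can be sketched as: quantize each point to a voxel index, feed the set of occupied voxels to a per-voxel classifier, then copy each voxel's label back to its points. `toy_classifier` is a hypothetical rule standing in for the 3D CNN, and the voxel size is arbitrary.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(points: Sequence[Point], voxel_size: float) -> List[Voxel]:
    """Index of the voxel that contains each point (the quantization step)."""
    return [tuple(math.floor(c / voxel_size) for c in p) for p in points]


def dense_grid_size(voxels: Sequence[Voxel]) -> int:
    """Number of cells in the dense bounding grid a 3D CNN would have to process."""
    size = 1
    for axis in range(3):
        coords = [v[axis] for v in voxels]
        size *= max(coords) - min(coords) + 1
    return size


def segment_point_cloud(points: Sequence[Point], voxel_size: float,
                        classify_voxels: Callable[[Set[Voxel]], Dict[Voxel, str]]) -> List[str]:
    voxels = voxelize(points, voxel_size)
    occupied = set(voxels)                    # occupancy voxels fed to the network
    voxel_labels = classify_voxels(occupied)  # one label per voxel
    return [voxel_labels[v] for v in voxels]  # map labels back to the points


def toy_classifier(occupied: Set[Voxel]) -> Dict[Voxel, str]:
    # Hypothetical rule standing in for a trained 3D CNN.
    return {v: "ground" if v[2] == 0 else "object" for v in occupied}


if __name__ == "__main__":
    pts = [(0.05, 0.05, 0.05), (0.09, 0.01, 0.02), (0.05, 0.05, 0.35)]
    print(segment_point_cloud(pts, 0.1, toy_classifier))  # ['ground', 'ground', 'object']
    print(dense_grid_size(voxelize(pts, 0.1)), len(set(voxelize(pts, 0.1))))  # 4 2
```

The example shows the listed drawbacks: the first two points fall into the same voxel and are forced to share a label (quantization, loss of spatial information), and the dense grid has 4 cells for only 2 occupied voxels (unnecessarily large representations).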
PointNet is based on fully connected layers instead of convolutional ones
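A stripped-down sketch of the PointNet idea for segmentation: the same fully connected layers are applied to every point, max pooling over points yields an order-independent global feature, and each point is classified from its own feature concatenated with that global feature. It omits PointNet's transform networks and normalization; all names, sizes, and the random weights are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Layer = Tuple[Matrix, List[float]]


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Fully connected layer: `weights` has shape (out_features, in_features)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def random_layer(rng: random.Random, n_out: int, n_in: int) -> Layer:
    return ([[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)


def pointnet_segment(points, shared1: Layer, shared2: Layer, head: Layer) -> List[int]:
    # Same fully connected weights applied to every point independently.
    local = [relu(dense(relu(dense(p, *shared1)), *shared2)) for p in points]
    # Max pooling is symmetric, so the global feature ignores point order.
    global_feat = [max(f[k] for f in local) for k in range(len(local[0]))]
    # Per-point classifier sees its own feature plus the global context.
    scores = [dense(f + global_feat, *head) for f in local]
    return [max(range(len(s)), key=s.__getitem__) for s in scores]


if __name__ == "__main__":
    rng = random.Random(0)
    shared1, shared2 = random_layer(rng, 16, 3), random_layer(rng, 32, 16)
    head = random_layer(rng, 4, 64)
    cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(10)]
    labels = pointnet_segment(cloud, shared1, shared2, head)
    # Reordering the input points only reorders the output labels.
    assert pointnet_segment(cloud[::-1], shared1, shared2, head) == labels[::-1]
    print(labels)
```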
4.6 Video Sequences
Features from shallow layers change faster than deep ones, so the layers can be processed at different update rates depending on their depth. By doing this, deep features can persist over frames thanks to their semantic stability, thus saving inference time.
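A minimal sketch of depth-dependent update rates, in the spirit of clockwork convnets (Shelhamer et al.): each stage fires only every `period` frames and otherwise reuses its cached output. The class, the fixed schedule, and the toy stages are hypothetical; the original work also explores adaptive schedules.

```python
from typing import Any, Callable, List, Optional, Sequence


class ClockworkPipeline:
    """Stage i is recomputed every periods[i] frames; skipped stages reuse their cache.

    Periods should be non-decreasing and each should divide the next, so whenever
    a deep stage fires, all shallower stages fired on the same frame.
    """

    def __init__(self, stages: Sequence[Callable[[Any], Any]], periods: Sequence[int]):
        self.stages = list(stages)
        self.periods = list(periods)
        self.cache: List[Optional[Any]] = [None] * len(self.stages)
        self.calls = [0] * len(self.stages)  # how many times each stage actually ran
        self.frame_index = 0

    def step(self, frame: Any) -> Any:
        x = frame
        for i, (stage, period) in enumerate(zip(self.stages, self.periods)):
            if self.frame_index % period == 0:
                x = stage(x)       # fire this stage on the current input
                self.cache[i] = x
                self.calls[i] += 1
            else:
                x = self.cache[i]  # persist the (semantically stable) cached feature
        self.frame_index += 1
        return x


if __name__ == "__main__":
    pipe = ClockworkPipeline(
        stages=[lambda f: f + 1, lambda f: f * 10, lambda f: f - 3],  # toy stages
        periods=[1, 2, 4],
    )
    outputs = [pipe.step(frame) for frame in range(8)]
    print(pipe.calls)  # [8, 4, 2]
    print(outputs)     # [7, 7, 7, 7, 47, 47, 47, 47]
```

Over 8 frames the deepest stage runs only twice, which is where the inference-time saving comes from.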
5 DISCUSSION
We will gather the results of the methods on the most representative datasets using the previously described metrics.
5.1 Evaluation Metrics
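For reference, two accuracy metrics commonly used to compare segmentation methods are pixel accuracy (correct pixels over all pixels) and mean intersection over union, with IoU_i = TP_i / (TP_i + FP_i + FN_i) averaged over classes. The sketch below computes both from a confusion matrix on flattened label maps; skipping classes absent from both maps is one common convention, assumed here.

```python
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """cm[i][j] counts pixels of ground-truth class i predicted as class j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def pixel_accuracy(cm: List[List[int]]) -> float:
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(map(sum, cm))


def mean_iou(cm: List[List[int]]) -> float:
    ious = []
    for i in range(len(cm)):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                 # class-i pixels predicted as something else
        fp = sum(row[i] for row in cm) - tp  # other pixels predicted as class i
        union = tp + fp + fn
        if union > 0:                        # skip classes absent from both maps
            ious.append(tp / union)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
    print(pixel_accuracy(cm))       # 0.75
    print(round(mean_iou(cm), 4))   # 0.5833
```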
As we have observed, many methods report results on non-standard datasets, or are not tested at all. That makes comparisons impossible.
5.3 Summary
5.4 Future Research Directions
3D datasets: there is a lack of data; ILSVRC will feature 3D data in 2018
Sequence datasets: lack of large-scale data
Point cloud segmentation using Graph Convolutional Networks (GCNs): treat point clouds as graphs and apply convolutions over them
Context knowledge:
Real-time segmentation:
Memory: pruning to simplify a network
Temporal coherency on sequences: it is important to work on video streams
Multi-view integration: Use of multiple views in recently proposed segmentation works is mostly limited to RGB-D cameras and in particular focused on single-object segmentation.
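To illustrate the "point clouds as graphs" direction from the list above, here is a minimal sketch: connect each point to its k nearest neighbours, then apply one graph convolution that mixes a point's own feature with the mean of its neighbours' features. This mean-aggregation layer is only one simple GCN variant chosen for illustration; the function names and weight shapes are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = List[List[float]]


def knn_graph(points: Sequence[Point], k: int) -> List[List[int]]:
    """Indices of the k nearest other points for each point (Euclidean distance)."""
    graph = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))  # stable: ties keep index order
        graph.append(others[:k])
    return graph


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def graph_conv(features: Sequence[Sequence[float]], graph: List[List[int]],
               w_self: Matrix, w_neigh: Matrix) -> List[List[float]]:
    """One layer: h_i' = ReLU(W_self h_i + W_neigh * mean of neighbour features)."""
    out = []
    for i, h in enumerate(features):
        nbrs = [features[j] for j in graph[i]]
        mean = [sum(col) / len(nbrs) for col in zip(*nbrs)]
        combined = [a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, mean))]
        out.append([max(0.0, c) for c in combined])
    return out


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    g = knn_graph(pts, k=1)
    print(g)  # [[1], [0], [1], [2]]
```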
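For the memory item in the list above, the simplest form of pruning can be sketched as unstructured magnitude pruning: zero out the given fraction of weights with the smallest absolute value. This is one common variant chosen as an assumption; in practice pruning is usually followed by fine-tuning, and structured pruning (removing whole filters or channels) is what actually shrinks dense tensors.

```python
from typing import List, Sequence


def magnitude_prune(weights: Sequence[float], fraction: float) -> List[float]:
    """Zero out the `fraction` of weights with the smallest absolute value."""
    n_prune = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


if __name__ == "__main__":
    w = [0.1, -0.5, 0.05, 2.0, -0.01]
    print(magnitude_prune(w, 0.4))  # [0.1, -0.5, 0.0, 2.0, 0.0]
```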
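2D general images
2D traffic-scene images
2D other datasets
2.5D (RGB-D) datasets
3D CAD datasets
3D point cloud datasets
Introduction to FCN
Decoder variants (SegNet)
Integrating global information
Conditional Random Fields (CRF)
Dilated Convolutions
Multi-scale
Feature fusion
RNN
Instance segmentation
2.5D (RGB-D) data
3D data
Video sequences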